When my team is out in the field demonstrating our ability to search television for mentions of interest, we occasionally get questions about the variety of applications for the data. In general, SnapStream customers are able to search, clip and distribute content of interest from a single user interface. Using closed captioning, SnapStream takes a user directly to any mention of interest within a recorded broadcast. Occasionally, especially in a university or public relations setting, the topic turns to statistical analysis. While our solution does not currently offer a statistical engine, the data is easily exported for analysis.
Politics aside, I downloaded the closed-caption text from Wednesday night's (January 27, 2010) State of the Union address. Using the word cloud creator at Wordle.net, I quickly created the symbolic representation of the speech shown above. The 75-minute speech generated over 7,000 words, of which the top 200 are represented in the cloud. Wordle.net has an option to remove "common" words (these appear to be mostly prepositions, conjunctions and pronouns) to get at the meat of the text. The more often a word appears in the text, the larger it is portrayed in the cloud. The top five words from the speech were People, Americans, Year, Jobs and Work. The overall process took less than 5 minutes; in fact, it took considerably less time than writing this post. ;-)
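For anyone curious about what sits underneath a word cloud, the core step is simply counting words, discarding "common" ones, and keeping the most frequent. Here is a minimal Python sketch of that idea. It is not Wordle.net's actual code: the `top_words` function and the `COMMON_WORDS` list are hypothetical stand-ins, and Wordle's real common-word list is not something we know.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of "common" words to ignore.
# Wordle.net's real list is not documented here; substitute your own.
COMMON_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "for", "with", "at", "by",
    "from", "that", "this", "it", "is", "are", "was", "were", "be", "we", "our",
    "you", "your", "i", "they", "their", "he", "she", "but", "or", "not", "as",
}

def top_words(text, n=200, stopwords=COMMON_WORDS):
    """Return the n most frequent non-common words as (word, count) pairs."""
    # Lowercase, then pull out words (allowing one internal apostrophe, e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Count only the words that are not on the common-word list.
    counts = Counter(w for w in words if w not in stopwords)
    # Highest counts first; ties keep the order in which words first appeared.
    return counts.most_common(n)

sample = "People want jobs. The people need jobs and work, and people work hard."
print(top_words(sample, n=3))
```

Running this prints `[('people', 3), ('jobs', 2), ('work', 2)]`. A cloud generator would then scale each word's font size by its count, which is why the most frequent words appear largest.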
For customers who are currently using our television search solution, here is a guide to the process. First, locate the video content by searching (or, in this case, browsing) within the user interface. If the topic of interest is short, you can use the clipping feature to trim down the closed caption transcript. When clipping video, SnapStream automatically trims the closed caption transcript as well, so the clip is also searchable. Browse the library for the show or clip, and instead of playing the file, choose "Download Transcript" from the about program page. A dialog will open asking where you want to save the text file of the transcript. The only massaging required is to remove the timestamps, which can be done in any text editor. Finally, paste the text into Wordle.net's create page and sit back to admire your work.
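If you would rather script the timestamp cleanup than do it by hand, a few lines of Python will do. This is only a sketch: it assumes each transcript line starts with a timestamp shaped like `00:01:23` or `[00:01:23.456]`, which may not match the exact layout of the downloaded file, so adjust the pattern to what you actually see. The `strip_timestamps` name is hypothetical.

```python
import re

# ASSUMED timestamp shape: optional "[", then H:MM:SS or HH:MM:SS, optional
# fractional seconds, optional "]", at the start of a line. Adjust as needed.
TIMESTAMP = re.compile(r"^\s*\[?\d{1,2}:\d{2}:\d{2}(?:[.,]\d+)?\]?\s*")

def strip_timestamps(transcript):
    """Remove a leading timestamp from each line and join the remaining text."""
    # Apply the pattern to each line separately, so "^" anchors at each line start.
    lines = (TIMESTAMP.sub("", line) for line in transcript.splitlines())
    # Skip lines that held only a timestamp, then join everything with spaces.
    return " ".join(line.strip() for line in lines if line.strip())

sample = "[00:00:05] GOOD EVENING\n00:00:08.250 AND WELCOME\n00:00:10\n"
print(strip_timestamps(sample))
```

This prints `GOOD EVENING AND WELCOME`, plain text that is ready to paste into a word cloud tool.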
How did we learn of this capability? Interestingly enough, a customer introduced us to it during last year's election. Their ultimate goal was statistical analysis of candidate speeches; the cloud representation was just a by-product. We never cease to be amazed at the uses customers find for our products.