Whenever Google introduces a new app or service not based on search or advertising (its primary revenue generator), a deafening chorus of analysts and pundits immediately bemoans the fact that the company is once again dabbling in a domain it doesn’t understand and can never hope to monetize. Add to that long list Microsoft CEO Steve Ballmer, here on Google’s Android:
“I don’t really understand their strategy. Maybe somebody else does. If I went to my shareholder meeting, my analyst meeting, and said, ‘hey, we’ve just launched a new product that has no revenue model!’…I’m not sure that my investors would take that very well. But that’s kind of what Google’s telling their investors about Android.”
Given Microsoft’s recent performance, it’s no longer possible to discern whether Ballmer is being disingenuous or simply naive here. I’m sure a lot of people at 1600 Amphitheatre Parkway in Mountain View must be laughing at their competitors’ inability to connect the dots, and wishing it would continue forever.
So how are dots connected at Google then?
Exhibit A: Two years ago, Google acquired the popular video site YouTube for $1.65 billion. It’s no secret that the company has had a tough time finding ways to monetize the video service. Unfortunately, what Google does better than any other company, indexing and searching text-based documents, doesn’t work with video. Video is opaque. One sees a poster frame, a title and often a brief description of a YouTube video, but beyond that, a 5-minute video may contain thousands of words (searchable and relatable words) that remain unseen. While a user can easily scroll up and down to scan online text or just use the browser’s search function to quickly locate relevant parts, finding and navigating to specific parts of a video is cumbersome at best and impossible in most cases.
Wouldn’t it be nice if video could be indexed?
The technology for converting the audio tracks of video via speech recognition to plain text indexed to time-code has been around for a while. Google would like to dramatically improve the process by perfecting speech recognition. In order to do that it needs better phoneme data from a diverse population. How does Google get a hold of massive phoneme datasets? Offer a free service, the kind that gets Ballmer seemingly so upset.
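To make "plain text indexed to time-code" concrete, here is a minimal sketch in Python. It assumes a speech recognizer has already produced a transcript as (start time, text) pairs; the sample transcript and the names `build_time_index` and `find` are made up for illustration and say nothing about how Google actually does it.

```python
from collections import defaultdict

def build_time_index(transcript):
    """Map each lowercase word to the start times (in seconds) of the
    transcript segments in which it is spoken."""
    index = defaultdict(list)
    for start_seconds, text in transcript:
        for raw in text.lower().split():
            word = raw.strip(".,!?\"'")
            if not word:
                continue
            # Record each segment only once per word.
            if not index[word] or index[word][-1] != start_seconds:
                index[word].append(start_seconds)
    return index

def find(index, word):
    """Return the segment start times where `word` occurs, or an empty list."""
    return index.get(word.lower(), [])

# Hypothetical recognizer output: (segment start in seconds, recognized text).
transcript = [
    (0.0, "Welcome to the cooking show"),
    (12.5, "Today we bake sourdough bread"),
    (95.0, "Let the bread cool before slicing."),
]

index = build_time_index(transcript)
print(find(index, "bread"))
```

With an index like this, a search for a word returns the points in the video where it is spoken, so a player could jump straight there. The quality of the result depends entirely on how accurate the recognized text is, which is where better phoneme data comes in.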
At up to $2 for each operator-assisted call, directory assistance has become a $7.4 billion industry. Google’s automated 411 service, however, is free and lets you find and call local businesses. In an InfoWorld interview last year, Marissa Mayer, Google’s VP of Search Products & User Experience, explained why GOOG-411 was created:
Google Video has had an interesting evolution. When we first launched it, it was based on closed captions, so literally a transcription of the program, but interestingly, you couldn’t play video. So we changed it so that you could play video, and now we’re searching the meta content. That said, one of the future elements of what’s likely to happen in search is around speech recognition.
You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.
The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video, we can do it with high accuracy.
Google even tested outdoor ads for GOOG-411 in New York state:
Exhibit B: Another way to collect useful speech recognition data is of course another free app, this time for the iPhone: Google Mobile App. It allows users to bypass typing and speak their search queries directly into the phone:
It wouldn’t be far-fetched, for example, to imagine that Google could one day correlate location data from the iPhone GPS with the dialect and pronunciation characteristics of the user, or with any detectable variation in speech articulation between morning and late-night timeframes, and so on. Google hasn’t even begun to scratch the surface of what can be mined from the immense datasets it’s been accumulating.
To Google, which invented the ultimate relationship manager, PageRank, it’s all a matter of connecting the dots:
Exhibit C: Wherever the dots lead, Google will erect services to collect and enhance its datasets, even if it means coming up with a free but somewhat contrived “game” to improve its image indexing, as in Image Labeler:
To the fat, lazy and slumbering Microsoft, these are all unmonetizable services that Ballmer claims his investors would never allow him to let loose on the market. It’s apparently easier to convert a desktop app, slap a “Live” label on it and shove it online to connect the dots to the bank. Is it any wonder that Microsoft hasn’t made any dent in Google’s online dominance?