Google shows Microsoft how to connect the dots

Whenever Google introduces a new app or service not based on search or advertising (its primary revenue generator), a deafening chorus of analysts and pundits immediately bemoans the fact that the company is once again dabbling in a domain it doesn’t understand and can never hope to monetize. Add to that long list Microsoft CEO Steve Ballmer, here on Google’s Android:

“I don’t really understand their strategy. Maybe somebody else does. If I went to my shareholder meeting, my analyst meeting, and said, ‘hey, we’ve just launched a new product that has no revenue model!’…I’m not sure that my investors would take that very well. But that’s kind of what Google’s telling their investors about Android.”

Given Microsoft’s recent performance, it’s no longer possible to discern whether Ballmer is being disingenuous or simply naive here. I’m sure a lot of people at 1600 Amphitheatre Parkway in Mountain View must be laughing at their competitors’ inability to connect the dots and wishing it would continue forever.

So how are dots connected at Google then?

Exhibit A: Two years ago, Google acquired the popular video site YouTube for $1.65 billion. It’s no secret that the company has had a tough time finding ways to monetize the video service. Unfortunately, what Google does better than any other company in indexing and searching text-based documents doesn’t work with video. Video is opaque. One sees a poster frame, a title and often a brief description of a YouTube video, but beyond that, a 5 minute video may contain thousands of words — searchable and relatable words — that remain unseen. While a user can easily scroll up and down to scan online text or just use the browser’s search function to quickly locate relevant parts, finding and navigating to specific parts of video is cumbersome at best, non-existent in most cases.

Wouldn’t it be nice if video could be indexed?

The technology for converting the audio tracks of video via speech recognition to plain text indexed to time-code has been around for a while. Google would like to dramatically improve the process by perfecting speech recognition. In order to do that it needs better phoneme data from a diverse population. How does Google get a hold of massive phoneme datasets? Offer a free service, the kind that gets Ballmer seemingly so upset.
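
To make the idea of text "indexed to time-code" concrete, here is a minimal Python sketch. It assumes a speech recognizer has already produced a list of (start time, word) pairs; the sample transcript and the names build_index and find are invented for illustration and are not anything Google has described.

```python
from collections import defaultdict

# Hypothetical recognizer output: (start time in seconds, recognized word).
transcript = [
    (0.0, "welcome"),
    (0.6, "to"),
    (0.9, "the"),
    (1.2, "android"),
    (1.9, "demo"),
    (75.4, "android"),
    (76.0, "market"),
]


def build_index(words):
    """Map each lowercased word to the sorted times at which it is spoken."""
    index = defaultdict(list)
    for start, word in words:
        index[word.lower()].append(start)
    return {word: sorted(times) for word, times in index.items()}


def find(index, query):
    """Return the time-codes where the query word occurs (empty list if none)."""
    return index.get(query.lower(), [])


index = build_index(transcript)
print(find(index, "Android"))  # [1.2, 75.4]
```

With an index like this, a search hit can point to the exact second of a video rather than to the video as a whole, which is what makes the spoken words usable by a text search engine.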

Dialing GOOG-411

At up to $2 for each operator-assisted call, directory assistance has become a $7.4 billion industry. Google’s automated 411 service, however, is free and lets you find and call local businesses. In an InfoWorld interview last year, Marissa Mayer, Google’s VP of Search Products & User Experience, explained why GOOG-411 was created:

Google Video has had an interesting evolution. When we first launched it, it was based on closed captions, so literally a transcription of the program, but interestingly, you couldn’t play video. So we changed it so that you could play video, and now we’re searching the meta content. That said, one of the future elements of what’s likely to happen in search is around speech recognition.

You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.

The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video, we can do it with high accuracy.

Google even tested outdoor ads for GOOG-411 in New York state:

goog411.jpg

Say what?

Exhibit B: Another way to collect useful speech recognition data is of course another free app, this time for the iPhone: Google Mobile App. It allows users to bypass typing and speak their search queries directly into the phone:


speaknow.jpg

It wouldn’t be far fetched, for example, to imagine Google could one day correlate location data from the iPhone GPS to dialect and pronunciation characteristics of the user or to any detectable variation in speech articulation between morning and late night timeframes, and so on. Google hasn’t even begun to scratch the surface of mine-ability of the immense datasets it’s been accumulating.

To Google, which invented the ultimate relationship manager, PageRank, it’s all a matter of connecting the dots:

google-dots.png

Exhibit C: Wherever the dots lead, Google will erect services to collect and enhance its datasets, even if it means coming up with a free but somewhat contrived “game” to improve its image indexing, as in Image Labeler:

google-imagelabeler.jpg

To the fat, lazy and slumbering Microsoft these are all unmonetizable services which Ballmer claims his board would never allow him to let loose on the market. It’s apparently easier to convert a desktop app, slap a “Live” label on it and shove it online to connect the dots to the bank. Is it any wonder that Microsoft hasn’t made any dent in Google’s online dominance?

62 thoughts on “Google shows Microsoft how to connect the dots”

  1. Pingback: Corporations and Hypocrisy: Inconvenient truths about Google « counternotions | Fibonacci Love

  2. Pingback: Corporations and Hypocrisy: Inconvenient truths about Google « counternotions

  3. Pingback: YouTube’s Real Reason For h.264/aac Samuel Thurston

  4. Pingback: When is it “too early” for a new product? « counternotions

  5. “Innovation works in unexpected ways: that team’s efforts turned into GOOG-411 directory service which refines phoneme recognition so that speech-to-text conversion of YouTube videos can better be indexed, searched, and ultimately monetized, as we recently illustrated here.” I will readily admit to being an extreme Google “Fanboy”. Not because I believe they can do no wrong, but because the right that they do is planned further ahead than even they know. Anyone remember Grandcentral? Let’s start back at Goog-411 and question why the free voice service. People talk to it and it starts to acquire voice recognition. “Waste of money” some say. Google buys Grandcentral (a free VOIP service) and quickly stops signups leaving only the group who were lucky enough to be members able to play with their Grandcentral accounts. (Hmmm, maybe that was so they could use the VOIP technology for their Gtalk client?) Google then starts dabbling in offering FREE wireless access to cities. Ouch!! That can’t be good for the bottom line can it?! Then we introduce Android. Open source phone OS. Waste of time and money? After all, “iPhone iz da best” blah blah blah. Missteps all along the way by themselves it may seem, but then the latest news comes out and brings to light things that people (M$’s Ballmer for one) may not have seen coming. The latest update on Grandcentral is that it is to be renamed Google Voice (https://www.google.com/voice/about). With its many options (being able to record conversations, speech to text voice mail alerts) we begin to see where all of these ideas join as one. Jump ahead a few years and the G6 (hypothetical name, don’t start doing searches for it) will be an Open source phone without a carrier on the Google wireless network with some outrageous speed using VOIP for phone service, and directly connecting you to your email, voice mail, contacts and social networking while feeding your cat and walking your dog.

    It’s very easy to find fault in a company’s plan when you only see page 4 of the document. No one but Google knows what they are going to do (sometimes even they don’t) but if we look at what they HAVE done and what technology is capable of, it’s pretty safe to assume that they aren’t going to stop at “offering a service that ends up costing them” without it benefiting them in the long run. I am a loyal Sprint customer, currently own a Windows Mobile phone (HTC Touch Pro), and unless something drastic happens in the next few months, I will continue on with those statistics. BUT when HTC comes up with a version of Android on the Sprint network I know that I will hop on the bandwagon simply because I know that it will be moving forward. It isn’t about being better than the iPhone, or being a Fan Boy… it’s about seeing where Google is going to head next! Think about it, most households have at least one computer that can be shared with all in the house… but each person wants to have their own cell phone. When you see them making moves, think “How does this translate into the cell phone world.” Cell phones are simply portable ways to access the Internet. And the Internet (not just search) is where Google strives to achieve dominance.

  6. Pingback: startupi » TvMoob: Monetizing Videos

  7. Pingback: Google Dethroned (on Brandan Lennox's Blog)

  8. Pingback: They Just Don’t ‘Get It’ « Not The Droids You Are Looking For

  9. Pingback: Book review: “Closing the Innovation Gap” « counternotions

  10. Pingback: startupi » TvMoob: monetizando os vídeos

  11. Pingback: del.icio.us bookmarks - 2008-12-09

  12. Pingback: GOOG and 411 « /var/blog/messages

  13. Pingback: . Connecting the Dots . « All my drivel

  14. Wash Echte: “…how searchable YouTube videos could make lots of more money for Google.”

    As you know, Google’s main revenue generator is advertising via search. So Google’s ability to classify, index, rank, relate and find content is supremely important. Google offers many apps (mostly free) that not only provide services for free but at the same time give the company the means to classify, index, rank, relate and find content.

    YouTube, on the other hand, is largely outside of this tightly integrated search infrastructure due to video being opaque. In order for video to become a revenue generator, it likely has to become part of the Google ecosystem and thus searchable.

  15. I don’t get how searchable YouTube videos could make lots of more money for Google. You’re “feeling” that’s the case, or what’s the story here?

  16. The main driver for improving speech recognition has always been military. If you look around the research you’ll find that much of it is funded by DARPA. Basically it’s important stuff for scanning all those phone calls that they listen in to.

    If they can produce better speech recognition this way, then I’m sure they’ll find a buyer with deep pockets.

  17. It’s more like a game of chess: if you take into consideration a larger number of moves, you can make a better decision.

  18. Pingback: YouTube Monetization: Connecting the Dots | Andrew MacKenzie

  19. May I point out that simply gaining a monopoly is not illegal. Partnering with a rival to shut out the competition (i.e. what Google and Yahoo tried to do) is however. There are no, repeat NO anti-monopoly laws in the United States, there are however many, many anti-TRUST laws. Maybe, everyone should spend some time looking at the difference.

  20. someone_who_knows: “when you say that Tellme collects only a limited amount of speech recognition data…”

    Nowhere did I say that.

    I simply referred to the extremely limited number of mobile devices Tellme supports, namely BlackBerries, as you can see it at their site.

    Once again, the point here is not what Tellme does as a service in and of itself, as it’s clearly not a money-making powerhouse for MSFT. But whether or not the company leverages the dataset for some other, greater, more ambitious strategic purpose. Hence the “connecting the dots” graphic at the end of the article above.

  21. microsoft didn’t make money legally… they did it as a monopoly…. that translates to cheat, steal, lie and keep the money. The company needs to be broken up…. baby bells style… micro-monopolies.

  22. Kontra, I don’t think you know what you’re talking about when you say that Tellme collects only a limited amount of speech recognition data. They do speech recognition on literally billions of phone calls per year. 1-800-Goog-411 is a nice little service, but Tellme answers 40% of all calls to 411 in America through their contracts with Verizon, AT&T, etc. So in terms of who is collecting more speech recognition utterances, Google or Microsoft’s Tellme, there is really no comparison. Tellme also provides speech recognition by answering every call to major companies that get a lot of phone calls like Merrill Lynch, UPS and Fedex. And, they operate a free service called 1-800-555-TELL that has been doing what 800-GOOG-411 does since it launched eight years ago!

  23. We have a fad these days. Praise anything that has “Google” in it. And world will go gaga.

    It doesn’t always work. Let Google show the money. Soon.

    Microsoft made money, legally, and Ballmer has a right to his opinion.

  24. Pingback: steev’s thoughts » Blog Archive » How Google connects the dots for $$$

  25. mclaren, you start with good premises but fall into the same trap of algorithmic analysis that large-dataset approaches complement or exceed in some cases. You seem to have an Aristotelian judgment of what are the ‘right’ or ‘wrong’ search results, versus focusing on relevance – relevance informed by subjective judgment and algorithmic analysis.

    “Wreck a nice beach” is solvable if you do statistical analysis of the context of the utterance, again: massive empirical data renders a ‘good enough’ solution to something algorithms with zero data have had challenges with.

    And you sound like a Microsoft employee with your debate techniques and argumentative fallacies :)

  26. Buchstabenbuch: “Anyone can download for free apps like Photosynth and Deep Zoom Composer which, despite being PC-only, offer some innovative technology.”

    But the argument here isn’t about technology (innovation). It’s about having a clearcut focus on offering free end-user products in one area as a value-feeder to another.

  27. Our company, Nexidia, phonetically indexes audio at high speed rates and is used in several applications to make finding the spoken phrase in audio or video possible.

  28. Ballmer is a marketer. Businesses where the decisions are made by marketers and bankers are tanking. Look at Microsoft and the big 3 auto makers.

    Businesses where decisions are made by production/engineering/design are not tanking. Look at Google, Honda, and Toyota.

    Stop worrying about monetizing everything and just make stuff people want, and don’t restrict the bejeezus out of how they use it. It really is that simple.

    Pennywise and pound foolish.

  29. I would dismiss Google vs. Microsoft, and instead look at how this strategy places Google up against Amazon.

    Amazon is creating distributed tools (Mechanical Turk and AWS to store/analyze/serve) that allow 3rd parties to basically compete (though on a relatively smaller scale) with the Google monolith. So while Amazon isn’t directly competing with Google in some sense (Google doesn’t seem to have completely drunk the cloud computing kool-aid), Amazon is lowering the barrier to entry for start-ups to begin competing directly with Google in the R&D department.

    Of course, Google is just as likely to purchase said startup (and we all know the VC business model is to get purchased by Google), but I think Amazon sees the value in ensuring that Google doesn’t become an *unstoppable* juggernaut by ensuring alternative R&D infrastructure.

  30. I’m not a Microsoft apologist; I have as much of a problem with their buggy software releases and bloat as anyone. However, they are making changes: your fundamental argument is that they don’t release apps unless they can immediately monetize them. This is simply untrue. Anyone can download for free apps like Photosynth and Deep Zoom Composer which, despite being PC-only, offer some innovative technology. You might not agree with their strategy, which often involves pulling people into the Microsoft “garden”, but to say they’re just interested in the fast money is shortsighted.

  31. A company attracts the kind of people that has relation to the kind of company. I mean, dumb companies just interested in money attract dumb shareholders just interested in money. That is why M$ cannot see beyond: because they are just interested in fast money. That’s why M$ is going down each day. Just watch and see M$ crumbling.

  32. Microsoft = thieving monopoly. Ballmer is little different from say, Paulsen or Rove. People need to get over Microsoft and return to computing. Support and apply open source software that is sourced from foundations.

  33. This is absurd nonsense. PageRank remains worthless. Typical top Google search results no longer even lead to the pages described in the search: for example, when you search for “Joe Blow” you no longer even get “www.joeblow.com” or “www.joeblow.org” among the search results, Google’s PageRank is so badly broken. The first several pages of search results on any subject now return primarily hack “parked” spam pages. Searching by date is impossible and always has been — when you search by a specific date, say “Bill Clinton April 22 1996,” you always get nothing but yesterday’s 2008 article on Bill Clinton, _never_ an article from the year and month you searched for. So Google’s search is completely broken, as John Dvorak points out. Cue crackpots erupting in demented fury to shriek that John Dvorak is a hack, Dvorak is ignorant, etc., etc., blah blah, fabulation after insult after baseless ad hominem attack devoid of documented facts. Finished with the irrelevant smears and character assassination? Good, let’s move on.

    The article continues to pile absurdity upon ridiculousness by claiming that the main problem in speech recognition is getting “a large enough phoneme set.” That’s not the problem in speech recognition at all. The big problem in speech recognition is the “wreck a nice beach problem,” namely, the fact that different strings of words sound identical and analyze identically when we apply signal processing methods to their acoustic sets. “Master painter” sounds identical to “masturbator,” “wreck a nice beach” sounds identical to “recognize speech,” and so on. The only way to tell these different sentences apart is to analyze the context in which they occur. Unfortunately, this limits the accuracy of speech recognition because it requires computer programs to parse the meaning of language, something no computer program has ever been able to do very well. The frame problem is what’s hanging up speech recognition, not getting a bigger phoneme set. Alas, the only way to really solve the frame problem is to narrow down the context so much that there’s no possible ambiguity. This is why speech recognition works extremely well in certain narrow applications, medical transcription for instance, but in general speech it works poorly.

    Naturally, since I’ve specifically cited peer-reviewed scientific journal articles, these must also be ridiculed and dismissed. Standard stuff. As always, the crackpots without college degrees shower the world’s most knowledgeable expert PhDs with acid contempt whenever their scholarly work is cited. Standard, typical, usual and quotidian for the worldwide web.

    The article goes on to claim “It wouldn’t be far fetched, for example, to imagine Google could one day correlate location data from the iPhone GPS to dialect and pronunciation characteristics of the user or to any detectable variation in speech articulation between morning and late night timeframes, and so on.” Of course it would be far-fetched to imagine Google could one day correlate location data from an iPhone or Android GPS to dialect and pronunciation characteristics of the user, because we live in a highly mobile polyglot society. It’s absurd to imagine, for example, that you’ll be able to predict the speech inflections of people speaking on their cellphones in Times Square. You’re just as likely to get a Japanese tourist as a guy from New Jersey. This is nonsense.

    These wild claims about Google persist, growing more absurd as time passes, even as Google’s actual performance on simple basic tasks like search continues to collapse and degenerate. 2 years ago we were regaled with the ridiculous claim that Google would produce true AI. Since that time, their PageRank searches have degenerated so badly they can’t even keep spam parking pages out of their top 10 results. Google has hired some of the smartest people in the world and they’re getting exactly the same results JFK did when he put the best and brightest into his administration and launched us into Viet Nam. Big advances don’t come from the smartest and most influential and most prestigious people in the world; they come from a couple of unknown shlubs in a garage, like Jobs and Wozniak. All the “smartest people in the world,” all the world-class superstars, at Google are losing the search battle against the spammers and the hacks who park spam pages and make money off fraudulent adsense clicks.

  34. I don’t think Google came up with the idea initially, because I recall hearing about the p0rn industry using it years ago (and the rise of the idiotic captchas which I so much hate)

    It makes sense, but I must admit that I don’t like Google benefitting from users’ data so readily …

  35. Deanston: “Didn’t MSFT spent years on voice recognition R&D?”

    Yep, one of Gates’ pet projects that was supposed to change how we operate our PCs…any day now. But it was targeted mostly for desktop apps and OS interaction.

    What Microsoft still hasn’t understood is the fact that when you amass massive quantities of data, like Google does, you can mine it for insight that doesn’t happen at the individual user or algorithmic level. Google’s “Did you mean this?” feature for “spell checking” is not some clever linguistic construct at all. It simply looks at frequency of next click after a word is misspelled millions of times by 70% of all online searchers.

    Likewise, if you want better phoneme detection, collect a very large number of samples from a wide spectrum of people. Refine. Rinse and repeat.
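
The frequency idea in the comment above can be sketched in a few lines of Python. This is only an illustration of "suggest whatever most people did next", not Google's actual method: the query log, its contents, and the name build_suggestions are all made up, and a real system would need far more data and filtering.

```python
from collections import Counter, defaultdict

# Hypothetical query log: (what the user typed, what the user searched for next).
query_log = [
    ("balmer", "ballmer"),
    ("balmer", "ballmer"),
    ("balmer", "balmer alabama"),
    ("youtub", "youtube"),
]


def build_suggestions(log):
    """For each typed query, keep the most frequent follow-up query."""
    followups = defaultdict(Counter)
    for typed, next_query in log:
        followups[typed][next_query] += 1
    return {
        typed: counts.most_common(1)[0][0]
        for typed, counts in followups.items()
    }


suggestions = build_suggestions(query_log)
print(suggestions.get("balmer"))  # ballmer
```

No linguistic rule appears anywhere in the sketch; the "correction" falls out of counting what many users did after the same misspelling.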

  36. “Tellme is currently a content delivery portal for a few specific mobile devices, as opposed to a phoneme variation collection service.”
    This is not really true. Tellme fields several billion voice requests per year, of various types:

    http://www.tellme.com/about

    and this mass of acoustic data, which is by far the most valuable part of a good speech recognition engine (the tech used by everybody is pretty much the same everywhere you go), and the ability to keep collecting it in future via their user base, is almost certainly the largest reason why Microsoft acquired them in the first place.

    This business about collecting ‘phonemes’ is also a bit off. In order to train a high quality ASR system what you need is *acoustic data* that reflects the characteristics of the speakers who will be using the system, and the acoustic environmental characteristics that users will be using the system in. Once you’ve collected a large corpus of acoustic data, you then need to transcribe it, using some standard phonetic alphabet, and usually a series of additional tokens used to indicate non-speech events, environmental sounds and other characteristics of the recordings. These transcriptions are then used in tandem with the recorded audio data in order to automatically train statistically driven acoustic models which are then used in combination with a language model to decode new test speech.
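
As a rough illustration of the last step described above, combining an acoustic score with a language model to pick a transcription, here is a toy Python sketch. Everything in it is an assumption made for the example: the four-sentence corpus stands in for a large transcribed dataset, the acoustic scores are invented and deliberately equal, and the bigram model with add-one smoothing is a textbook simplification rather than what any production recognizer uses.

```python
import math
from collections import Counter

# Tiny hypothetical corpus standing in for a large set of transcriptions.
corpus = [
    "we want to recognize speech",
    "computers recognize speech poorly",
    "it is hard to recognize speech",
    "storms wreck a beach house",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

vocab_size = len(unigrams)


def log_prob(sentence):
    """Average add-one-smoothed bigram log-probability per word transition.

    Averaging keeps shorter candidates from winning on length alone.
    """
    words = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(words, words[1:]))
    total = 0.0
    for prev, word in pairs:
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total / len(pairs)


def decode(candidates):
    """Pick the candidate with the best acoustic score plus language-model score."""
    return max(candidates, key=lambda c: c[1] + log_prob(c[0]))[0]


# Two candidates that sound alike, so their made-up acoustic scores are equal.
candidates = [("wreck a nice beach", -10.0), ("recognize speech", -10.0)]
print(decode(candidates))  # recognize speech
```

Because the acoustic scores tie, the choice is made entirely by which word sequence the training text makes more likely, which is why the volume and variety of collected data matter so much.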

  37. Didn’t MSFT spent years on voice recognition R&D? What happened to that? I think MSFT still believes that they can always catch up to whatever others pioneer with their resources, and then dominate with their market spread. After all their product line moved from desktop OS and Office to the back-end Windows and SQL Server very successfully.

    But technology and culture are evolving so fast it’s hard to see how companies that lag behind in certain critical technology can catch up, let alone surpass a dominant rival unless that rival makes a dreadful mistake. Look at Palm, RIM, and others trying to mimic iPhone OS and SDK. SQL Server is well known for always catching up to the newest features and benchmarks set by the latest Oracle DBMS in its *next* version, but so far MS SQL has yet to eclipse Oracle in capabilities. The same will be true with search. Some of these newer technologies are very difficult to leapfrog. You have to build the foundation first. Phone companies laughed at Jobs thinking he didn’t have the experience or the technology to build a smartphone but in fact Apple had – OS X. Without being able to keep building on ever accumulating search usage, will even the hiring of Dr. Qi Lu offer MSFT the *fundamental research data* you need to help build the next, better search engine?

    One thing that Apple and Google have but MSFT may never have is *style*. Just look at their logos, artwork, and website design. To me that says a lot about how they think.

  38. Bob: “Then again, why bother giving a text index to a video or a picture?”

    Because, given our current technology, we can do much more with text than video. Once video text is transcribed and parsed, you can do a lot: run concordance, index, extract entities (people, places, companies, data…), auto-link, decouple text-processing from video distribution, integrate with ranking, networking, semantic or other structural processes, and so on. And you can do these today, not in some future date where all cancer is cured.

    There’s a good reason why some 70% of search runs on Google and nobody cares what Hitachi does with video, no matter how cool it otherwise might be.

  39. Then again, why bother giving a text index to a video or a picture? Hitachi automatically indexes picture/video images.

    Further, if video is the ‘new’ text for this millennium, then maybe Google should consider branching out. – Just spitballing!

  40. Bob: “You can index video frames / objects…”

    As I indicated above, video indexing isn’t new. What’s new for Google is the notion of indexing spoken words within videos so that they can then be integrated with the rest of the text-based Google ecosystem, including PageRank.

  41. … err… You can index video frames / objects appearing in video – Hitachi has a search engine for video and pictures. It even lets you upload a photo to find an image with a similar object, or a video with a similar object.

    Even Google has its betters.

  42. Pingback: Matt O’ Rama » Google is smarter than your company

  43. Minor correction: Google didn’t ‘come up’ with the image labeling game. They got it from a Carnegie Mellon researcher who developed the ESP game. (Incidentally I think the original version is much more fun as a game than the Google version, and the Goog-version certainly needs a less-lame name!)

  44. Being the fanboy that I am, my initial temptation would be to say that the folks at MS are just “stupid,” or “shortsighted,” or any one of a number of similar adjectives. However, I believe it goes much deeper than that. “Corporate culture” as a term is overused to the point of cliché, but it is quite descriptive in this instance. The culture fostered by Bill Gates in the early days of the industry, which stipulated that “in order for us to win, you must lose,” does not serve them well anymore. Ballmer’s statements concerning companies like Apple and Google only serve to indicate that they have great difficulty engaging in the kind of thinking that has made those companies successful since the dotcom bust.

    And I don’t really blame them much. They have a business model which has been wildly successful for them in the past. Their first instinct is to apply it to all of their future endeavors as well. But a slew of well-documented difficulties (Vista, Xbox, Zune, search, Windows Mobile) has perhaps finally shown them that this may not be the right approach. But then again, the rhetoric continuing to filter out of Redmond has demonstrated that they are being dragged toward this realization kicking and screaming.

    Up until recently, it has been quite fashionable to ridicule Google for pouring resources into something “unmonetizable” (YouTube). How unfathomable to Ballmer and his cohorts. But, ah ha, we see that they had a plan all along. Does Microsoft have the ability to think that far ahead? It remains to be seen, but I would venture to say that their future depends upon it.

  45. Pingback: timlauer.org » Recently found items… - December 3, 2008

  46. Jared: “How does Microsoft’s Tellme Networks play into this”

    Hard to say what Microsoft is planning to do with Tellme strategically. While it does provide toll-free directory assistance, Tellme is currently a content delivery portal for a few specific mobile devices, as opposed to a phoneme variation collection service.
