Flaws in Google’s “Exposé”

I am unimpressed by Google’s “sting” operation and their accusations (link) against Bing. All Google did was expose a vulnerability in Bing’s algorithm and then cry wolf. Google should have conducted two more experiments before coming to their current conclusion. First, I will explain why the current operation does not mean anything (partly explained here). Second, I will discuss other experiments that Google needed to do before making their accusation.

Going back a few years, the search term “miserable failure” linked to George W. Bush. This “Google Bomb” was achieved in the following way. Suppose a link reads ‘apples’ but points to a webpage on oranges; Google then makes an association between ‘apples’ and the webpage on oranges. If 500 webpages had links reading ‘apples’ that pointed to the oranges page, a Google search on ‘apples’ would have led to the webpage on oranges. Similar tactics were used on the search term “miserable failure” to lead it to George W. Bush’s site. Today, Google’s algorithm is robust enough not to fall for the same tactic.
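To make the mechanism concrete, here is a toy sketch of the anchor-text association described above. It is not Google’s actual algorithm: the function names, the simple vote counting, and the example.com-style URLs are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_anchor_index(links):
    """Count, for each anchor text, how many links point at each target.

    `links` is an iterable of (anchor_text, target_url) pairs.
    This is a hypothetical toy model, not a real search engine's index.
    """
    index = defaultdict(Counter)
    for anchor_text, target_url in links:
        index[anchor_text.lower()][target_url] += 1
    return index

def top_result(index, query):
    """Return the URL with the most anchor-text votes for `query`, or None."""
    votes = index.get(query.lower())
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# 500 pages say 'apples' but link to a page about oranges;
# only 20 pages link 'apples' to a page that is really about apples.
links = [("apples", "oranges.example/page")] * 500
links += [("apples", "apples.example/home")] * 20

index = build_anchor_index(links)
print(top_result(index, "apples"))  # prints oranges.example/page
```

In this naive model, whoever controls the most links controls the result, which is exactly what a Google Bomb exploits.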

Bing also makes similar associations based on search strings (used in Google, Amazon, eBay, etc.) and the websites clicked by the users. Google hard-coded some synthetic, gibberish search strings to lead to specific webpages. After fifteen days of using IE to search for the same synthetic queries and then clicking the link, Bing made the association between the search strings and the webpages. This happened for fewer than 10% of the synthetic queries. Google then accused Bing of copying their results. Aren’t Google’s experimental results similar to Google Bombs? The only difference is the source of the data. This just exposed a vulnerability in Bing, and nothing more. Yes, the data comes from the publicly available results of a rival company. It was not intentional (benefit of the doubt). In the next update, Bing should try and reduce its dependence on that to be fair.
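A rough sketch of why a gibberish query is such an easy target is below. It assumes a ranker that simply adds click counts to whatever other evidence it has for a query. The functions, the weights, the query string “xqzvbn”, and the URLs are all hypothetical; nothing here is taken from Bing’s actual implementation.

```python
from collections import Counter

def record_click(click_log, query, clicked_url):
    """Log that a user searched for `query` somewhere and clicked `clicked_url`."""
    click_log[(query, clicked_url)] += 1

def rank(query, click_log, other_signals, click_weight=1.0):
    """Rank URLs for `query` by other signals plus weighted click counts.

    `other_signals` maps query -> {url: score} and stands in for every
    non-click signal a real engine would have (an assumption of this sketch).
    """
    scores = Counter(other_signals.get(query, {}))
    for (logged_query, url), clicks in click_log.items():
        if logged_query == query:
            scores[url] += click_weight * clicks
    return [url for url, _ in scores.most_common()]

click_log = Counter()
other_signals = {"common query": {"a.example": 100.0, "b.example": 40.0}}

# Fifteen planted clicks for a gibberish query and for a common one.
for _ in range(15):
    record_click(click_log, "xqzvbn", "unrelated.example")
    record_click(click_log, "common query", "unrelated.example")

print(rank("xqzvbn", click_log, other_signals))
print(rank("common query", click_log, other_signals))
```

For the gibberish query the planted clicks are the only evidence, so the planted page is the only result. For the common query the same fifteen clicks are outweighed by everything else the ranker knows.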

Two more experiments are needed to conclude that Bing copies Google. The first experiment is to look for common queries that give different results in Google and Bing. Google should freeze the results for those queries and then observe Bing to see if its results change over time to match Google’s. The second experiment is to look for commonly used queries that give the same results in both Google and Bing. Google should manually change their results and then wait for Bing to reflect the same changes. If Google had designed these experiments and disclosed the results, I would have given them some credit.
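One way either experiment could be scored is to track, day by day, how much of Bing’s top results overlap with the frozen (or manually changed) Google results. The sketch below assumes each result list is just an ordered list of URLs; the function names and the sample data are made up for illustration.

```python
def top_k_overlap(reference, observed, k=10):
    """Fraction of the reference top-k URLs that also appear in the observed top-k."""
    ref = set(reference[:k])
    obs = set(observed[:k])
    if not ref:
        return 0.0
    return len(ref & obs) / len(ref)

def overlap_trend(reference, daily_observations, k=10):
    """Overlap with the reference list for each day's observed results."""
    return [top_k_overlap(reference, day, k) for day in daily_observations]

# Hypothetical data: frozen Google results and three days of Bing results.
frozen = ["a", "b", "c", "d"]
days = [
    ["w", "x", "y", "z"],
    ["a", "x", "y", "z"],
    ["a", "b", "c", "z"],
]

print(overlap_trend(frozen, days, k=4))  # prints [0.0, 0.25, 0.75]
```

A steadily rising overlap on common queries, where Bing has plenty of other input signals, would be far more telling than a handful of synthetic queries.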

I am sure Google knows how to derive conclusions from a set of experiments. They should know better than to conclude that Bing is copying Google based on their synthetic-query experiment. Google’s PageRank algorithm and search results (by Google and others) rely on statistical aggregates to work efficiently. Synthetic queries are not statistically significant. Google should not be crying wolf.

Update: Don’t you think that Google looked for examples other than “torsoraphy” and did not find any? If they did, don’t you think they would have reported that?

3 Replies to “Flaws in Google’s “Exposé””

  1. I would beg to differ. Google said that Bing copies its results, which is different from saying that Bing is a ripoff of Google. It is obvious that all search engines will perform well on frequent queries. Google’s success has been due to its performance in the “long tail” of search terms. This experiment highlights that for less frequent queries, Bing is heavily relying on Google, something that they have partly acknowledged (officially).

  2. “heavily”???? 9 of 100 synthetic queries is not “heavily.” Also, in research, it is common to show that your algorithm performs better than others by fine-tuning the constants in your formula but not doing the same for the other algorithms.

    In this case, Google was the only input signal. For less-than-common queries, there are other input signals and Bing, presumably, will not rely on Google. I did say, “In the next update, Bing should try and reduce its dependence on that to be fair.”

    Google found Bing’s vulnerability to cases where Google is the only input signal. It’s a super-specific case. Google will be vulnerable, too. It is just a question of finding out what that vulnerability is.

