Friday, January 4, 2013

Inside Bing's Spell Checker

According to Bing, its speller handles "tens of thousands" of queries every second and returns algorithmically generated corrections within "tens of milliseconds."

Spell checking your queries has been a part of the search experience for a while, and auto-correction of spelling mistakes is almost expected nowadays. Recently Bing gave a sneak peek at what goes into making a great spell checking engine and how it makes search feel like magic to its users.

In a recent post, Dr. Jim Kleban, Bing R&D Program Manager, explained that Bing's speller processes tens of millions of data points mined from search queries, clickstream data, user actions and indexed web pages when it tries to guess what you're typing into Bing. Processing all those data points and correcting misspelled words or entire queries needs to happen in milliseconds so that the corrected query can be passed on to the actual search engine.

When attempting to unravel the mystery of a misspelling, Bing's speller works from context clues in the query. Using data models, an algorithm looks at all the words in a search query to figure out what the misspelled word should be.
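To make the idea of context clues concrete, here is a minimal sketch in Python. It is not Bing's algorithm, which the post does not describe in detail. It assumes a tiny, made-up table of word-pair counts (BIGRAMS) and a hypothetical helper (pick_by_context) that chooses among candidate corrections by how often each one follows the previous word in the query.

```python
from collections import Counter

# Hypothetical word-pair counts, invented for illustration only.
# A real system would mine these from query logs and web pages.
BIGRAMS = Counter({
    ("new", "york"): 900,
    ("new", "work"): 30,
    ("new", "fork"): 2,
    ("salad", "fork"): 40,
    ("hard", "work"): 500,
})

def pick_by_context(prev_word, candidates):
    """Return the candidate that most often follows prev_word.

    Pairs missing from BIGRAMS count as zero (Counter's default).
    """
    return max(candidates, key=lambda word: BIGRAMS[(prev_word, word)])

# "vork" is one letter away from "york", "fork" and "work";
# the preceding word decides which correction is most plausible.
candidates = ["york", "fork", "work"]
for prev in ("new", "salad", "hard"):
    print(prev, "vork ->", prev, pick_by_context(prev, candidates))
```

The same misspelling resolves to a different word depending on its neighbor, which is the behavior the paragraph above describes. A production speller would presumably weigh far more signals than a single preceding word.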

Bing's algorithm also takes into account edit distance, which Kleban describes as the difference in individual letters between two distinct search queries. The most common errors of this kind occur when searchers spell a word phonetically, typing it the way they would pronounce it.
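For readers who want to see edit distance in action, the sketch below computes the classic Levenshtein distance, the minimum number of single-letter insertions, deletions and substitutions needed to turn one string into another. This is a standard textbook definition offered as an illustration; the post does not say which exact variant Bing uses, and the function name edit_distance is our own.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i letters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]

print(edit_distance("restarant", "restaurant"))  # one missing letter
print(edit_distance("fone", "phone"))            # phonetic spelling
```

A small distance suggests the typed word is a plausible misspelling of the candidate: "restarant" is one edit from "restaurant", and the phonetic "fone" is two edits from "phone".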

