Si Si Blog - Culture

Wired's Autocorrect article
by TheEtruscan at 17:13 September 28, 2014

To Wired Magazine Editors,

Sorry I am late, but I spend some summer months at the beach house, where they don't deliver mail, and Wired magazine goes to my mailing address. Sorry, I also don't do smartphones, Facebook, Twitter or other such time-wasting devilries.

Anyway, I was dismayed that Gideon Lewis-Kraus, in his article "The History of Autocorrect" in issue 22.08, didn't bother to mention Soundex, Metaphone and other similar phonetic algorithms.

The article gives the impression that Microsoft's Dean Hachamovitch invented autocorrect. I wrote a spellchecker myself that doesn't require a database. It uses a randomly accessed dictionary and makes suggestions based on matching Metaphone encodings. There!

Soundex was first. It is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms. Improvements to Soundex are the basis for many modern phonetic algorithms.
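To make the description concrete, here is a minimal Python sketch of the commonly described American Soundex variant. It is an illustration under stated assumptions, not the code from my spellchecker: the function name soundex and the CODES table are mine, non-ASCII letters are simply dropped, and the rule that H and W do not separate same-coded consonants follows the usual textbook description.

```python
# Digit classes of American Soundex: 1=BFPV, 2=CGJKQSXZ, 3=DT, 4=L, 5=MN, 6=R.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, or "" if name has no ASCII letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = [first]                # the first letter is kept as-is
    prev = CODES.get(first, "")     # its code still suppresses a following duplicate
    for ch in letters[1:]:
        if ch in "HW":
            continue                # H and W do not break a run of same-coded consonants
        code = CODES.get(ch, "")    # vowels (and Y) have no code
        if code and code != prev:
            result.append(code)
        prev = code                 # a vowel resets prev, so a repeated code counts again
    return ("".join(result) + "000")[:4]   # pad with zeros, truncate to 4


if __name__ == "__main__":
    for n in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
        print(n, soundex(n))
```

Names that sound alike, such as Robert and Rupert, end up with the same code, which is exactly what makes the code usable as an index key.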

Soundex was developed by Robert Russell and Margaret Odell and patented in 1918; a variant was later used to index U.S. Census records. Soundex became well known in the 1960s, when articles appeared in the Communications and the Journal of the Association for Computing Machinery, and especially when it was described in Donald Knuth's The Art of Computer Programming.

In 1990 Lawrence Philips published Metaphone. Metaphone fundamentally improves on Soundex by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, one that does a better job of matching words and names that sound similar.

That said, given enough computing power and memory, it should be possible to do away with the encoding altogether and simply store every plausible misspelling of each word in a given dictionary.
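One way to read the idea above is as a precomputed lookup table from misspellings to dictionary words. The sketch below is only an illustration under my own assumptions: "varieties" is taken to mean every string one deletion, transposition, substitution or insertion away from a word, and WORDS is a tiny made-up dictionary. The names variants, build_table and suggest are hypothetical.

```python
import string
from collections import defaultdict


def variants(word):
    """Every string exactly one edit away from word (lowercase ASCII alphabet)."""
    alphabet = string.ascii_lowercase
    out = set()
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            out.add(left + right[1:])                              # deletion
            out.update(left + c + right[1:] for c in alphabet)     # substitution
        if len(right) > 1:
            out.add(left + right[1] + right[0] + right[2:])        # transposition
        out.update(left + c + right for c in alphabet)             # insertion
    out.discard(word)  # substituting a letter with itself reproduces the word
    return out


def build_table(dictionary):
    """Map each stored misspelling to the set of words it could stand for."""
    table = defaultdict(set)
    for word in dictionary:
        for v in variants(word):
            table[v].add(word)
    return table


WORDS = ["their", "there", "three"]   # hypothetical toy dictionary
TABLE = build_table(WORDS)


def suggest(typed):
    """Return the word itself if correct, else the words stored for this misspelling."""
    if typed in WORDS:
        return [typed]
    return sorted(TABLE.get(typed, ()))


if __name__ == "__main__":
    print(suggest("thier"))
    print(suggest("ther"))
```

Even at a single edit, each word of length n produces 54n + 25 candidate strings before duplicates are removed, so the table grows quickly; that is the memory cost the encoding approach avoids.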

This blurt (= blog article) is also being posted on my blog.