May 28, 2018

Issues with Google's Spellchecker and Spotlight

A problem with machine-learning/predictive algorithms is that you don't really know what's going on behind the scenes (I use the term machine-learning loosely). When a computer program runs on simple rules, you can predict the mistakes it may make and preemptively correct for them. If a computer program uses black-box rules that may not even stay consistent over time, you no longer feel in control. You are at the mercy of whatever behaviour the program decides on at that instant.

An outstanding example of this type of problem is the Google Docs spellchecker. The spellchecker has been a standard word-processor feature since almost prehistoric times, but it has gone through some evolution recently. As a baseline, I know that Microsoft Word's spellchecker is consistent. I know that typing my name, "Yoonsik", into Word will result in the classic squiggly red underline. Right-click, and it will offer to AutoCorrect it to "Coonskin". Google Docs also used to run on a simple dictionary for its spellcheck features. However, at some point during the past few years, Google Docs switched to a predictive spellchecker, meaning that words like "Yoonsik" and "Haploinsufficiency", which show up as misspelt in Microsoft Word, are now considered to be spelled correctly. This information is presumably coming from Google's own web-crawling sources. That's great, right?

However, this is where the inconsistent behaviour begins. Here are a select few of the many, many spelling errors that I have encountered. In the following images, why is "haplo-insufficiency" considered misspelt on its own, but correct once the word next to it also receives a hyphen?

[Image: before]
[Image: after]

Also, why is the string "ot" sometimes considered misspelt, but sometimes not?
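
For contrast, here is a minimal sketch of the kind of simple, dictionary-based checker I described earlier. The tiny word list and the function name `misspelt_words` are made up for illustration, and this is certainly not how Word or Google Docs is actually implemented. The only point is that each word is looked up on its own, so the verdict for one word cannot change when its neighbour gains a hyphen.

```python
import re

# Hypothetical stand-in for a real dictionary file.
DICTIONARY = {"dominant", "negative", "the", "gene", "shows"}

def misspelt_words(text, dictionary=DICTIONARY):
    """Return the words in `text` that are not in `dictionary`, in order.

    Hyphenated words are kept whole, and every word is looked up
    independently of the words around it.
    """
    words = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    return [w for w in words if w.lower() not in dictionary]

# Made-up "before" and "after" phrases: only the neighbouring word changes.
before = misspelt_words("dominant negative haplo-insufficiency")
after = misspelt_words("dominant-negative haplo-insufficiency")

print(before)
print(after)
```

With this rule, "haplo-insufficiency" is flagged in both phrases. You may not like that result, but you can predict it every single time.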

The Google Docs spellchecker is an example of a machine learning model that has an incomplete understanding of a domain but fails to err on the side of caution. Spellcheckers are supposed to catch mistakes, not catch spelling errors only when they feel like it, or change behaviour because search results are starting to use a phrase. This is worrying to me because I've already been hindered in the past when the Google Docs spellchecker failed to catch spelling mistakes. These are spelling errors that Microsoft Word would clearly show. With something like a spellchecker, the model should be erring on the side of Type I errors (false positives) and never Type II errors (false negatives). If this is Google's attitude towards something so obvious, then maybe it is indicative of their attitude towards other domains where Type II errors can have more significant consequences. If Google ventured into health systems and developed machine learning models for a common cancer, then perhaps Google's model would fail to flag the possibility of cancer in a patient. Hopefully this is just hyperbole, but I think you understand my point.
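
To make the Type I / Type II distinction concrete, here is a small sketch that counts both kinds of error for two hypothetical checkers. The sample words, the "ground truth" labels, and both sets of flags are invented for illustration; they are not measurements of Word or Google Docs.

```python
def count_errors(flagged, truly_misspelt, all_words):
    """Return (false_positives, false_negatives) for a spellchecker.

    false positive (Type I):  flagged, but actually spelled correctly
    false negative (Type II): not flagged, but actually misspelt
    """
    false_positives = sum(
        1 for w in all_words if w in flagged and w not in truly_misspelt
    )
    false_negatives = sum(
        1 for w in all_words if w not in flagged and w in truly_misspelt
    )
    return false_positives, false_negatives

words = ["Yoonsik", "teh", "ot", "spellchecker"]
truly_misspelt = {"teh", "ot"}  # assumed ground truth for this example

cautious_flags = {"Yoonsik", "teh", "ot"}  # also flags an unfamiliar name
lenient_flags = {"teh"}                    # lets "ot" slip through

print(count_errors(cautious_flags, truly_misspelt, words))  # (1, 0)
print(count_errors(lenient_flags, truly_misspelt, words))   # (0, 1)
```

The cautious checker annoys me once with a red squiggle under my own name. The lenient one lets a real typo through without telling me, and for a spellchecker that is the worse failure.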

This trend causes similar confusion whenever I use Spotlight on macOS. Spotlight has some predictive autocomplete features that learn from your previous typing history. Take a look at what happens as I attempt to open Adobe Photoshop. I start with the letter 'p'.

[Screenshot: Screen-Shot-2018-05-28-at-10.52.16-AM]

At a typing speed of 6.5 characters/second and a reaction time of 200 ms, I will have already typed about one more character (6.5 × 0.2 = 1.3) after the Photoshop logo flashes across the screen, before I can react. It's really difficult to predict when to stop typing, and I end up typing and pausing very often. Should I just be memorizing the inconsistent predictions provided by macOS for every application? Finally, the UI makes me quite angry. The Spotlight prompt implies it has correctly matched the application by showing the rest of the application name in gray. It makes me expect that if I type another letter, Spotlight will continue to match it. However, it switches to another application as soon as I type another character. I don't think Apple will be able to fix this in the near future...
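
For anyone who wants to plug in their own numbers, the timing estimate above is just typing speed multiplied by reaction time. The function name below is made up for this sketch, and the 6.5 characters/second and 200 ms figures are the ones quoted above.

```python
def chars_typed_during_reaction(chars_per_second, reaction_time_ms):
    """Characters typed before you can react to what appeared on screen."""
    return chars_per_second * reaction_time_ms / 1000

extra = chars_typed_during_reaction(6.5, 200)
print(f"{extra:.1f} extra characters")
```

Anything above 1.0 means at least one keystroke lands after the correct match has already appeared, which is exactly what keeps happening to me.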

Yoon's Blog
