# WordSmith – Building a Program that Plays Scrabble

Last fall, I finished the MIT Challenge. While the challenge was exciting and educational, the tight deadline didn’t give me any time for extracurricular projects. When I finished, I wanted to work on a small and fun project that would test some of the things I learned.

The result was WordSmith, a program that lets you play the game Scrabble against a computer. I’ve made the project free and open-source, so anyone can play it and also see how it works. Visit the download page to try it out.

For the rest of this article, I’m going to explain how the program works. While it’s not tremendously complex, it does show a little of the computer science knowledge I picked up through the MIT Challenge. If CS or Scrabble don’t interest you at all, feel free to skip this article, as I’ll be back next week with my normal writing.

## Building an AI that Plays Scrabble

I’ve always enjoyed the game Scrabble. The game requires a mixture of vocabulary, strategy, pattern-recognition and luck. Unlike chess, AI-researchers’ favorite game, luck and outside knowledge play a role. Even a perfect Scrabble player may still lose due to unfortunate letters or an unlikely countermove.

Because of this, I wanted to see if I could make a computer that plays Scrabble. Not only because then I could play a game I enjoy without needing a ready opponent, but also because a strong computer opponent would teach me more about the game.

So, I built WordSmith. It was a fun project, not only because I enjoy Scrabble, but because the problem was interesting enough to require clever algorithmic tricks, but not so tricky I couldn’t do it part-time over three weeks while on vacation in Paris.

## How WordSmith Plays Scrabble

WordSmith plays by searching the board for all possible tile placements and then chooses the best one. It evaluates this not only based on how many points a particular move gives, but also by using heuristics to avoid technically higher-scoring moves which are probably detrimental in the long-run.

When playing the AI, you’ll notice a blue progress bar as the AI “thinks”. What it is actually doing is trying every combination of letters it can play and keeping track of which moves are both valid and score enough points. I’ve built the AI to give up after 15 seconds (which is why the blue bar doesn’t always finish), but at least on my computer it does retrieve the highest possible scoring word over 95% of the time.

This doesn’t mean WordSmith is impossible to beat, however. I’ve beaten it roughly 10% of the time in the matches I’ve played. Scrabble is a lot of luck and often the technically best word isn’t considerably higher scoring than one an advanced player would pick.

WordSmith also uses heuristics. A heuristic is a fancy CS way of saying “rule of thumb”, meaning a simplified rule that will tend to increase performance when applied. I had started developing this capacity in the program, but there’s always the possibility of better heuristics, so this can give human players a strategic advantage.

The only heuristic I’ve tested that has made it into the final build of WordSmith is one that is conservative when playing good tiles. Blanks and high-point letters (Q, Z, X and J) are incredibly valuable, so if WordSmith can’t find a good use for it in the first turn it finds one, it will wait.

This omits some of the obvious rules of thumbs that human players actually use. Leaving a triple-word score spot open is dangerous, but WordSmith is oblivious to this misstep. Similarly, playing long words opens up more possible countermoves than defensive, compact playing.

In technical terms, ignoring the timeout feature, WordSmith is statically-perfect, but not dynamically-perfect. This means a clever player can outmaneuver WordSmith even though WordSmith never fails to recognize the highest-scoring word.

## How I Built the Program

Now that I’ve explained how WordSmith works, I’m going to explain how the algorithm was actually designed. I’m going to try to be as non-technical as possible, but I want to also try to cover some interesting details.

The first step was to build a program that allows humans to play Scrabble. This was not too difficult, although the scoring algorithm was more complex than I had anticipated.

Once I could play Scrabble by myself, and ensure the rules were upheld, it was time to build a computer that could do the playing automatically. The basic algorithm is fairly simple:

1. Make a set of “slots” where all possible moves can be placed. Don’t consider the individual tiles yet, just look at where it is possible to place tiles in a valid move and make a list of these.
2. For each slot, try placing all permutations of your tiles onto the board. For blanks, that also means trying every single one of the 26 possible letters.
3. For each possible play, check whether it forms all valid words (not only in the principle direction, but in all new crosswords) and calculate the points. Keep track of the highest scoring play while you proceed.

This uses a very common AI strategy called searching. If you can visualize this algorithm, it is as if the computer is scanning through the space of all possible moves and looking for the best one.

## Speeding up the AI

There’s just one problem with this approach—it’s incredibly slow. Mid-game plays may consider a few thousand slots, for each of these there are over 5000 permutations and over 3 million if we consider blanks. The scoring algorithm is not trivial either, so this means that considering all options in a brute-force way can take several hours or more on a modern computer. I wanted an AI that could play in around a dozen seconds.

From here a lot of tricks were used to prune the search space. I’ll briefly describe a couple:

1. Scoring is complex, testing whether the main word is valid is easy (O(1) hash-table lookup, for the CS geeks). Only words that were valid along the principle direction would be scored thoroughly.
2. Before placing a blank, figure out if there are any words which match that blank configuration. X_YGYK doesn’t mean anything with any letter, so we can skip trying all 26 times.
3. Since blanks have zero points, we only need to figure out if one letter assignment is valid, since all others will result in the same score. WI_ is the same whether you played WIZ or WIN (provided all crosswords are also valid in both cases).
4. The algorithm for calculating possible slots can produce duplicates, so making sure every slot is unique can cut the processing by 30-50%.

All of these algorithmic optimizations are still correct—they will still always produce the best scoring word, no exceptions. Even so, because the algorithm depends on the number of possible tiles (and blanks), considering every possible option can still take a long time in extreme cases.

I didn’t want to ever have to wait a few minutes to play my turn, even in an unusual case. To fix this, I set a time limit that the computer cannot exceed when deciding its move.

This also made use of another trick. By sorting the tile slots by length, I could make sure the small words (which are much faster to compute all possibilities for) are tested before long words. Using this method, I could then calculate that the algorithm is perfect 95% of the time with a time limit of 15 seconds on my machine.

This type of algorithm is also a common CS trick, very similar to the idea of lossy compression. The idea is that if you can still preserve most of the information (or in this case, computation) but you drastically reduce size or processing time, the tradeoff can be worth it. This is one reason why videos load so much faster than GIFs—one is intelligently compressed and the other is not.

## Statistics and Other Fun Stuff

Once that was complete, the program was essentially finished. I expanded it to include a few other cool tricks that can give some insights into how to win at Scrabble.

One question I considered was, what are the most important words to know in Scrabble?

Now that I had a program that played each Scrabble move (statically) perfectly, I could actually test this. So I had the AI play against itself for nearly a sixty thousand moves and compiled the results. Some interesting factoids:

1. The most useful word in Scrabble is ‘QI’. (An accepted spelling of chi, the Chinese life force). This is followed by RE, ER, YE, IN, IT, TI, ZA. In fact, ‘QIS’, ‘QAT’ and ‘AYE’ are the only three-letter words to break the top 100.
2. Short words are far more useful to know than long ones. The most-used 5-letter word was 50x less common than the most-used 2-letter word.
3. The blank is the most useful tile, worth on average 12.9 more points. The letter ‘G’ is the least useful.

These statistics helped me train the algorithm. By collecting tile worth, I developed a heuristic which discounts high-value tiles so the computer won’t waste them in play. I then ran this modified AI against its predecessor a few hundred games and could show that its win-loss ratio was statistically significant.

I had hoped to build other heuristics, such as one that would avoid playing words that enabled high-point countermoves (such as leaving a precious triple-word score vulnerable). However testing these heuristics showed the AI performed worse, meaning either my formulation of the heuristic was misinformed, or that rule of thumb is less useful to my AI than I had previously believed.

One unfinished feature in the program was going to be a sliding difficulty scale. I had considered trying to limit the computer’s vocabulary to use only commonly known words. My strategy involved using Google’s n-gram library to set a threshold vocabulary so the computer wouldn’t use unusual words.

## Scrabble and Having Fun

I didn’t make this program for any reason other than for the fun of making and playing it. As such, I’m releasing it free and open source.

The game isn’t as polished as many commercial applications. Mostly because UI design isn’t my specialty and I did all the graphics and sound myself. Also because working on the AI was what interested me for the application, so I didn’t spend much time making it pretty.

The code is also a lot messier than I had hoped. This was partially some poor planning, and partially because the algorithm grew more complex than I had expected as I added heuristics, statistics and other details.

The program is open-source, so if anyone wants to make improvements or modifications to it, feel free to send them to me. If they are worthwhile I’ll be happy to showcase them along with my own edition. In particular, I’m interested to see if anyone can demonstrate if different heuristics can reliably outperform my own WordSmith without changing the timeout on the algorithm.

However, if you just want to play the program, that’s fine too. I’ve even included a training mode that lets you check what the computer thinks you should play, so don’t worry if you’re not the best Scrabble player. Just go to the download page to get a copy.

• http://rev-development.com Rick

It would be really cool if you uploaded the project to gihub

• http://thepianoandthefish.wordpress.com J. Wibble

Thanks for posting this, it’s a great game and very challenging. Only one problem I’ve had is that I’ve tried to change the word list and the game won’t acknowledge the new words. I tried replacing scrabblewords.txt with a new file with the same name, copying and pasting a complete list and adding words individually (always making sure they are in capitals and one line for each word as per the instructions) and none of them work. I’m used to playing with the international SOWPODS list and it’s frustrating when the game won’t accept very useful words like CH or DI.

• http://www.scotthyoung.com Scott Young

J. Wibble,

You need to change scrabblewords_usage.txt, I miswrote the instructions. There is a number that goes next to the word, but it isn’t being used and the parser should read it regardless of whether a number is included. Let me know if you have problems!

-Scott

• http://thepianoandthefish.wordpress.com J. Wibble

That worked, thanks!

• http://evansendra.com E

That’s a really awesome use of the knowledge from the challenge Scott. This must have been really fun/challenging to write!

• http://-- Alex

Hi, amazing program. Is possible use this for other languages?, how do you would do?.

• http://www.scotthyoung.com Scott Young

Alex,

Yes–but the letters and their score values are hard-coded, so you’d need to modify the source code to either add letters, change the distribution or change their points value (I don’t know their setup for international versions of Scrabble). If you want to simply use a foreign dictionary (provided it doesn’t use any non-English characters like ç or é you just need to replace scrabblewords_usage.txt with the contents of a different word list)

-Scott

• Jose miguel

Hey scott, i find this much interesting and challenging that the Mit challenge you did..

Congrats. Excelente.

• http://www.incogdai.com Ashim

This is pretty awesome, helped me a lot in my assignment.

• http://logicmason.com/ Mark

“The program is open-source, so if anyone wants to make improvements or modifications to it, feel free to send them to me.”

Dude. This is exactly what github is for. If you’re serious about wanting to share the code and being interested in contributions, put it in a github repo. It’s where the kinds of people who would read your source code hang out and it will drastically lower the friction for both readers and contributors. It also streamlines the process of dealing with pull requests (i.e. people requesting that you take their improvements into your project).

• http://www.scotthyoung.com Scott Young

Mark,

There is a GitHub repo for the project!

-Scott

• dave

I believe you could speed up the search process a lot by avoiding checking all of the permutations of your rack letters. Here’s how…

Find a list of all of the words you can make with the tiles on your rack using a prime number factorization trick

Assign a unique prime number to every letter of the alphabet

A – 3
B – 5
C – 7

Then for every word, you get a (not unique) word code formed by taking the product of the letters

CAB = (3*5*7) = 105

If the word code formed with all of the letters on your rack divide evenly (without remainder) into the word code of a word in your dictionary, then you can form that word.

This lets you quickly generate a list of words that you can form with all of the tiles on your rack in only O(N) time where N is the size of your dictionary, avoiding the combinatorial problem of forming permutations. I believe this list will generally never exceed a few hundred words

Now as you proceed through your scoring algorithm you only ever need to consider plays that involve forming one of these words. I would guess that the whole process should be achievable in under a second on most modern CPU’s

Having fun playing with your game, thanks!

• http://www.scotthyoung.com Scott Young

Dave,

Really interesting. Most word-checks aren’t based on the tiles you hold, but combinations with the board. Playing words beside each other is quite common, so you may need to check 5-6 words or more for validity.

I found the biggest help was pruning the total combinations and, then, when addressing permutations, doing something like you listed.

Technically my algorithm is O(n!) because it does need all permutations. However, Scrabble is limited to 7 tiles per play, so pruning the tree and eliminating extraneous possibilities is faster than trying to simplify the order of the algorithm.

I could probably get the thing running a lot faster, but it would require switching to a different language and running the code through fewer abstractions. Python just isn’t very fast…

-Scott

• Jeff Axelrod

Thanks for the great article! I was thinking it would be fun to have an AI scrabble competition and happened upon your project. I didn’t read any discussion of turning in tiles. Does the AI consider this?

• http://www.scotthyoung.com Scott Young

Jeff,

I hadn’t included the possibility that the AI would return its tiles. It seemed a rather complex problem for what seems to be a rather costly, and therefore rarely used, maneuver.

-Scott