Min-Maxing Your First Wordle Guess

February 6, 2022 wordle python

My pal showed me Wordle for the first time the other day and I instantly wanted to know what the objectively best starting guess was. Some may argue that that’s not in the spirit of the game and that it should be a daily challenge, but those people probably don’t spend 45 minutes on their first game (that’d be super awkward right…?).

TL;DR: oreas, arose, seora, aries, arise, raise, serai, aesir

First we need a heuristic to measure how good a word is. The most intuitive is to pick the word with the most frequently used letters within the set of 5 letter words. Thus, we can measure a word's score using

\[p(\textbf w) = \sum_{i=1}^{5} s(w_i),\]

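where \(w_i\) is the i-th letter of the word and \(s(w_i)\) is that letter's frequency. In other words, a word's score is the sum of the frequencies of its individual letters.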
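Written out for a single word, say arose (picked here only as an illustration), the sum expands to one frequency term per letter:

\[p(\text{arose}) = s(\text{a}) + s(\text{r}) + s(\text{o}) + s(\text{s}) + s(\text{e}).\]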

Enough maths. Let’s get into the code.

Next, let’s get a word list. I chose this one because it was the first one I found. Simple. We’ll read this into memory, but only the 5 letter words.

words = []
with open('words_alpha.txt') as f:
    for line in f:  # iterate over the file line by line
        word = line.strip()
        if len(word) == 5:  # keep only the 5 letter words
            words.append(word)

This results in 15918 possible words. My guess is that the Wordle backend uses a word list similar, if not identical, to this one.

We then need to count all the letters and divide by the total amount of letters to get each letter’s frequency.

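import numpy as np
import pandas as pd
big_word = list("".join(words))
letter, count = np.unique(big_word, return_counts=True)
freq = count/len(big_word)
freq_df = pd.Series(freq, index=letter)  # assumption: the letter-indexed lookup used when scoring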
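The same per-letter frequencies can also be computed with only the standard library. The sketch below is an alternative to the NumPy version, not the code behind the graph; `letter_frequencies` and `toy_words` are made-up names, and the three-word list is purely illustrative.

```python
from collections import Counter

def letter_frequencies(words):
    """Return {letter: share of all letters} for an iterable of words."""
    counts = Counter("".join(words))
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

# Toy word list, purely illustrative (not the real 5 letter word list).
toy_words = ["arose", "raise", "skill"]
toy_freq = letter_frequencies(toy_words)
```

Because every count is divided by the total number of letters, the values in the returned dictionary always sum to 1.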

This results in a nice graph with a surprising result.

Figure: Most frequent letters in the set of all 5 letter words

a seems to be the most common letter, even though it's commonly known that e is the most used letter in the English language! This is (probably) because we're limited to the set of 5 letter words, and not all words.

Another optimisation on the quest for the best starting word is to exclude words with repeating letters. Guessing skiing, for example, would waste a letter on the second i, so let's employ a nifty hack using set.

print(f"{'e'*9}, {len('e'*9)}, {len(set('e'*9))}")
eeeeeeeee, 9, 1

As you can see above, a set keeps only the distinct letters, so we can easily spot words with repeated letters: the set built from such a word has fewer than 5 elements.
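The check can be wrapped in a tiny helper. This is a minimal sketch, and `has_unique_letters` is a hypothetical name rather than something from the original notebook.

```python
def has_unique_letters(word):
    """True when no letter appears more than once in the word."""
    return len(set(word)) == len(word)

print(has_unique_letters("arose"))   # True
print(has_unique_letters("skiing"))  # False
```

Comparing against `len(word)` instead of a hard-coded 5 keeps the helper usable for words of any length.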

filtered_words = []
scores = []

for word in words:
    # Keep only words whose letters are all distinct
    if len(set(word)) == len(word):
        # Sum the frequency of each letter in the word
        score = np.sum([freq_df.loc[letter] for letter in word])
        scores.append(score)
        filtered_words.append(word)
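The loop leaves two parallel lists that still need to be ranked. One way to do that is sketched below on a toy word list using only the standard library; the names `toy_words`, `toy_freq`, `toy_scores` and `ranking` are made up for this example, and the real ranking may have been produced differently.

```python
from collections import Counter

# Toy stand-in for the real word list (illustrative only).
toy_words = ["arose", "raise", "arise", "skill", "eerie"]

# Letter frequencies over the toy list.
counts = Counter("".join(toy_words))
total = sum(counts.values())
toy_freq = {letter: n / total for letter, n in counts.items()}

# Score every word that has no repeated letters.
toy_scores = {
    word: sum(toy_freq[letter] for letter in word)
    for word in toy_words
    if len(set(word)) == len(word)
}

# Sort by score, highest first, and keep the best three.
ranking = sorted(toy_scores.items(), key=lambda item: item[1], reverse=True)
top_words = [word for word, _ in ranking[:3]]
```

Here skill and eerie are dropped by the repeated-letter filter, so only three words are left to rank.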

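And just like that we've got a tie for the top word. Anagrams always score the same under this heuristic, so the winners come in groups: oreas, arose and seora use the 5 most frequent letters (a, e, s, o, r), while aries, arise, raise, serai and aesir swap the o for an i.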
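Words made of the same letters necessarily get the same score from a sum of letter frequencies (up to floating-point rounding). A quick sketch to see which of the TL;DR words are anagrams of one another; `tldr_words` and `groups` are names invented for this example.

```python
from collections import defaultdict

# The eight words from the TL;DR.
tldr_words = ["oreas", "arose", "seora", "aries", "arise", "raise", "serai", "aesir"]

# Words with the same sorted letters are anagrams of each other.
groups = defaultdict(list)
for word in tldr_words:
    groups["".join(sorted(word))].append(word)

for letters, members in groups.items():
    print(letters, members)
# aeors ['oreas', 'arose', 'seora']
# aeirs ['aries', 'arise', 'raise', 'serai', 'aesir']
```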

Figure: The top 10 objectively best words to use for your first guess in Wordle

❗ Disclaimer - I am terrible at this game, and these words might be terrible too for various reasons. Use at your own risk.
