One of the biggest factors in SEO is keywords. This makes sense, because users usually find websites by searching for keywords. Let’s work through an example. Say we have ten possible websites and we want to find the one most relevant to our query. We are looking for “The Lion King”, so we type it into the search bar. If we rank the websites by how many times those three keywords are mentioned, we should find the right one, right? Wrong.
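To see why plain counting goes wrong, here is a minimal Python sketch of that naive approach. The two sample documents and the function names are made up for illustration; no real search engine is claimed to work this way.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def raw_count_score(document, query):
    """Count every occurrence of every query word, all weighted equally."""
    words = tokenize(document)
    return sum(words.count(term) for term in tokenize(query))

# Hypothetical sample documents, invented for this example.
docs = {
    "movie": "The Lion King is a film about a lion who becomes king",
    "filler": "The cat and the dog and the bird and the fish saw the sun over the hill",
}

scores = {name: raw_count_score(text, "The Lion King") for name, text in docs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The "filler" document never mentions lions or kings, yet it comes out on top simply because it repeats "the" so often.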

The reason is that this method counts all the keywords as being equal. The answer to this problem is to correct for term frequency. The way we do this is by giving priority to the words that appear in fewer of the ten documents. Out of the ten documents, let’s say all ten mention the word “The”, four mention the word “King” and only two mention the word “Lion”. What we do now is divide the number of times a word appears in a document by the number of documents it appears in. So if one document mentions “the” 100 times, its score ends up being only 10 (100 divided by 10), because “the” is such a common word. A document about lions that mentions “Lion” 40 times would score 20 (40 divided by 2) and outrank that page, because it contains a much scarcer keyword.
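The division described above can be sketched in a few lines of Python. This follows the simplified scheme in this chapter, using the same numbers; the function and variable names are assumptions made for the example. Real-world systems commonly use a logarithmic variant of the same idea, usually called TF-IDF.

```python
def rarity_weighted_score(term_counts, document_frequency):
    """Divide each term's count in one document by the number of
    documents that contain that term, then add the results up."""
    total = 0.0
    for term, count in term_counts.items():
        containing = document_frequency.get(term, 0)
        if containing > 0:  # skip terms that appear in no document
            total += count / containing
    return total

# How many of the ten documents mention each word (numbers from the text).
document_frequency = {"the": 10, "king": 4, "lion": 2}

common_word_page = {"the": 100}  # mentions "the" 100 times
lion_page = {"lion": 40}         # mentions "lion" 40 times

print(rarity_weighted_score(common_word_page, document_frequency))  # 100 / 10
print(rarity_weighted_score(lion_page, document_frequency))         # 40 / 2
```

Even with far fewer mentions, the lion page scores higher because its keyword is found in only two of the ten documents.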

Note two things here. The first is that “the”, “Lion” and “King” are all individual words that a computer can recognize on their own. Together, they bring to mind a very popular movie that everyone is familiar with. The search algorithm doesn’t understand “The Lion King” the way we do, so it has to figure this out using the words it does know. The second thing to note is that there are two common strategies with keywords. You could aim to capture a small share of the traffic for a broad keyword like “pets”, or you could aim to capture a large share of the traffic for a narrower keyword like “shih tzu”.

Experts say that the latter is more effective, because it is better to give people information that is relevant to them. No one will click on your shih tzu website if they are looking for canaries, so it is better to focus on your target market.

So once the search engine corrects for term frequency, are we done? No. Some irrelevant documents will still outrank relevant ones, and the reason is document size. A 400-page book about anything is bound to outrank a 1-page article about The Lion King, simply because it contains so many more words. What we need to do is determine the relative frequency of the term, that is, how many times it appears per line of text. Once the algorithm checks for that, it will return the document that mentions the keywords most frequently relative to its length.
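Here is a rough Python sketch of that length correction. As a simplifying assumption it measures frequency per word rather than per line of text, and both sample documents are invented for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequency(document, term):
    """Share of the document's words that are the given term."""
    words = tokenize(document)
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)

# Hypothetical documents: a short article and a long, mostly unrelated book.
short_article = "The Lion King follows a young lion"
long_book = ("lion " * 5) + ("filler " * 995)

raw_article = tokenize(short_article).count("lion")
raw_book = tokenize(long_book).count("lion")
print(raw_article, raw_book)
print(relative_frequency(short_article, "lion"))
print(relative_frequency(long_book, "lion"))
```

On raw counts the long book wins, but once each count is divided by the document's length, the short article comes out well ahead.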

So let’s say we are down to two documents competing for the top spot. One is about Disney’s The Lion King, and the other is about famous monarchs who kept lions as pets. Both will mention the search terms “Lion” and “King” with equal frequency, but one is clearly more relevant. The way the algorithm finds this out is by checking a few different variables. One important variable is how close together the terms are mentioned. Another is the order in which they are mentioned. The sentence “The Lion King was a very popular movie” clearly outranks “The king of Persia used to keep a Lion as his pet” because in the first sentence the terms are all together and in the correct order.

So to review, we count how many times the keywords are mentioned in the document, giving priority to documents that contain rarer keywords as well as documents that mention them very frequently for their length. We also give priority to documents that mention these keywords close together and in the correct order. Seems simple enough, right?
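The closeness and order checks can be illustrated with a short Python sketch run on the two example sentences. The function names are hypothetical, and the brute-force search over positions is only suitable for short texts like these.

```python
import re
from itertools import product

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def smallest_span(words, terms):
    """Width, in words, of the tightest window containing every term.
    Returns None if any term is missing."""
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    if any(not p for p in positions):
        return None
    # Try every combination of one position per term (brute force).
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))

def in_query_order(words, terms):
    """True if the terms appear as one consecutive phrase, in order."""
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))

query = tokenize("The Lion King")
movie = tokenize("The Lion King was a very popular movie")
persia = tokenize("The king of Persia used to keep a Lion as his pet")

print(smallest_span(movie, query), in_query_order(movie, query))
print(smallest_span(persia, query), in_query_order(persia, query))
```

In the movie sentence the three terms fit in a window of three words and appear as an exact phrase. In the Persia sentence they are spread across nine words and never appear in order, so a ranking that rewards closeness and order would prefer the first.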

The last variable we will include is where the words appear in the document. Picture the raw score we found before, and now imagine that it gets multiplied if the keywords appear in the title. So a document called “The Lion King” would easily outrank a document called “Popular Disney Movies”. This leads us into our next chapter, which is about content.
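As a final illustration, the title multiplier can be sketched in Python like this. The boost factor of 2.0 is an arbitrary value chosen for the example, not a figure taken from any real search engine, and the function name is hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def apply_title_boost(raw_score, title, query, boost=2.0):
    """Multiply the raw score when every query word appears in the title.
    The default boost of 2.0 is an assumed value for illustration."""
    title_words = set(tokenize(title))
    query_words = tokenize(query)
    if query_words and all(word in title_words for word in query_words):
        return raw_score * boost
    return raw_score

# Two documents that start with the same raw score of 10.
print(apply_title_boost(10.0, "The Lion King", "The Lion King"))
print(apply_title_boost(10.0, "Popular Disney Movies", "The Lion King"))
```

Both documents start from the same raw score, but only the one with the keywords in its title gets the multiplier.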

