Understanding searchers and search engines

Ash Nallawalla, 14 November 2008, 5:55 PM

Understanding how people use search engines, and how the engines themselves work, is critical to applying search engine optimization.


Search engines have been with us for over 10 years and they keep improving, Google in particular. We can search not only for Web pages, but also for images, news, blogs, movie times, weather, and a host of other specialised results. This broader view of search behaviour is important for SEO.

How do novice computer users write a search term? Their searches are typically unsophisticated, and it can take two or more attempts to get useful results. For example, if a novice searcher wanted to find an Indian restaurant in Melbourne, Australia, they might try:
  • Indian restaurant
  • Indian restaurant in Melbourne (might include results for Melbourne, Florida)


Figure 1. An unqualified search for “Indian restaurant in Melbourne”

Next, they might restrict the results to pages from Australia, then realise that Melbourne is a large city and that they had better search a specific suburb or two. At this point a novice searcher might type, in turn:

  • Indian restaurant Malvern
  • Indian restaurant Collingwood
An experienced searcher might try something like this for their first search, with quote marks:
  • “Indian restaurant” Malvern OR Collingwood
You can see that searchers are unpredictable.
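To see why the experienced searcher's query works better, here is a toy Python sketch of how such a query could filter pages: the quoted phrase must appear word for word, and at least one of the OR terms must appear. The function names and the simple word-matching rules are assumptions for illustration; real engines also handle stemming, synonyms and much more.

```python
import re


def tokens(text: str) -> list[str]:
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(words: list[str], phrase: str) -> bool:
    """True if the words of the quoted phrase appear consecutively."""
    target = tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches(text: str, phrase: str, any_of: list[str]) -> bool:
    """Toy model of: "phrase" term1 OR term2 (single-word OR terms only)."""
    words = tokens(text)
    has_or_term = any(term.lower() in words for term in any_of)
    return contains_phrase(words, phrase) and has_or_term


suburbs = ["Malvern", "Collingwood"]
print(matches("Best Indian restaurant in Malvern", "Indian restaurant", suburbs))        # True
print(matches("Indian restaurant in Melbourne, Florida", "Indian restaurant", suburbs))  # False
```

A page that merely mentions "Indian" and "restaurant" far apart, or sits in the wrong Melbourne, drops out in one step instead of several.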


Figure 2. Sometimes even Google gets confused about some Australian place names.

SEO customers sometimes have unrealistic expectations, for example when they want to rank for a single word such as “camera”. Such a term is unfocused because it could mean a film camera or a digital camera. Does the customer want to sell cameras or buy cameras?

You, the SEO, have to help them define the purpose of the website clearly, say, “Kodak digital cameras for sale”, so that you can check that the text contains related language.

HOW SEOS SEARCH

SEOs can use advanced operators to get useful information from Google. Details are at http://tinyurl.com/4tcdg. Here are some of them (type them without the quote marks).

Allintitle:
If you search for “allintitle:Chrysler cars”, Google will only show URLs that have both the words “Chrysler” and “cars” in the title of a page.

Intitle:
If you search for “intitle:Chrysler cars”, Google will only show URLs that have the word “Chrysler” in the title and “cars” in the body or title of the document.

Allinurl:
If you search for “allinurl:Chrysler cars”, Google will only show URLs that have both the words “Chrysler” and “cars” in the URL.

Inurl:
If you search for “inurl:Chrysler cars”, Google will only show URLs that have the word “Chrysler” in the URL and “cars” in the body or URL.

You will want to use such queries to understand why certain sites show up in the results and others do not. They are also handy for narrowing down everyday searches.
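The differences between these four operators are easy to mix up. This toy Python model encodes the matching rules exactly as described above for a single made-up page; the Page class and the function names are hypothetical, and Google's real matching (stemming, synonyms and so on) is more forgiving.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    body: str


def _words(text: str) -> set[str]:
    """Lower-case words; URLs split on '/', '.', '-' and other punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def allintitle(page: Page, terms: list[str]) -> bool:
    """Every term must be in the title."""
    title = _words(page.title)
    return all(t.lower() in title for t in terms)


def intitle(page: Page, term: str, others: list[str]) -> bool:
    """Only the operator's own term must be in the title; the rest may be in title or body."""
    title = _words(page.title)
    anywhere = title | _words(page.body)
    return term.lower() in title and all(t.lower() in anywhere for t in others)


def allinurl(page: Page, terms: list[str]) -> bool:
    """Every term must be in the URL."""
    url = _words(page.url)
    return all(t.lower() in url for t in terms)


def inurl(page: Page, term: str, others: list[str]) -> bool:
    """Only the operator's own term must be in the URL; the rest may be in URL or body."""
    url = _words(page.url)
    anywhere = url | _words(page.body)
    return term.lower() in url and all(t.lower() in anywhere for t in others)


page = Page("http://www.example.com/chrysler/reviews", "Chrysler news", "New cars announced")
print(allintitle(page, ["Chrysler", "cars"]))  # False: "cars" is not in the title
print(intitle(page, "Chrysler", ["cars"]))     # True: "cars" is in the body
```

Running the same page through all four functions makes it obvious why it turns up for intitle: and inurl: but not for their allin… counterparts.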

Link:
The link: operator gives a limited list of sites that link to you, including your own internal links. Because the list is limited, it cannot be used to reverse engineer the Google algo and is best regarded as entertainment value. Example:
 link:www.ibm.com

Site:
The site: operator lists a site's pages that are indexed in the search engine. You can only see the first 1000 results. Example:
 site:apcmag.com
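If you run the same operator checks regularly, say a site: query for each client, it can help to build the queries in a script. A minimal Python sketch, assuming Google's familiar https://www.google.com/search?q= results URL; the helper names are hypothetical, and it is worth checking that the engine still honours any operator you rely on.

```python
from urllib.parse import urlencode


def operator_query(terms: str = "", **operators: str) -> str:
    """Combine operators and plain terms into one query string,
    e.g. operator_query("SEO", site="apcmag.com") -> "site:apcmag.com SEO"."""
    parts = [f"{name}:{value}" for name, value in operators.items()]
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Turn a query into a results-page URL; ':' is percent-encoded and spaces become '+'."""
    return f"{base}?{urlencode({'q': query})}"


print(search_url(operator_query(site="apcmag.com")))
print(search_url(operator_query("SEO", site="apcmag.com")))
```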

GUESSING THE ALGORITHMS

At online forums, SEOs love to guess various factors that make up the “secret sauce”, otherwise known as a search engine algorithm. Since Google is the predominant search engine, most of the focus has been on trying to understand its algorithm.
There isn’t one algorithm – there are many that come into play depending on the type of search. There are search ranking algos, image ranking algos, blog ranking algos, and so on. There isn’t much point in guessing, unless it is informed guesswork based on several observations. Another approach is to read Google’s patent applications.

Most patents are not easy to understand, so by reading other people’s posts about them, you can try to guess what is in the Google algorithm. For example, the age of a domain name or the duration of its registration are believed to be ranking factors because they are in a patent.

SEARCH ENGINES

Understanding how a search engine works makes learning SEO easier. Read Google’s official explanation for how Google finds and ranks sites, and its technology overview.

At the heart of Google’s search engine software is PageRank, a value derived from the number and quality of links pointing to a page; the Google toolbar displays it as a number between zero and ten. Here is Google’s description:

“PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
Important, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don't match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query.”

In reality, it is a lot more complicated and most of it is a closely guarded secret. PageRank is a relatively small ranking factor these days.
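The “votes” idea in Google’s description matches the PageRank iteration published by Google’s founders: each page splits its current score evenly among the pages it links to, and every page also gets a small baseline controlled by a damping factor (0.85 is the value usually quoted). The Python below is a classroom sketch of that published idea on a made-up three-page graph, not Google’s production ranking, and its scores are fractions summing to 1 rather than the toolbar’s zero-to-ten figure.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook power-iteration PageRank over a small link graph."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a vote: split this page's score among its targets,
                # so votes from high-scoring pages weigh more.
                for target in targets:
                    new[target] += damping * rank[page] / len(targets)
            else:
                # A page with no outgoing links spreads its score over all pages.
                for other in pages:
                    new[other] += damping * rank[page] / n
        rank = new
    return rank


# A and B both vote for C; C votes for A, so A inherits some of C's importance.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))
```

Note that A outranks B even though each has exactly one inbound link: A’s vote comes from the “important” page C.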

CRAWLERS, ROBOTS, SPIDERS

Search engines use programs known as crawlers, spiders or robots that follow the links on a web page and thus “discover” other pages on the same website, and then pages on other websites that it links to.
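At its core, this link-following is a breadth-first traversal: visit a page, queue any links not seen before, repeat. This Python sketch uses an in-memory dictionary in place of real page fetching and HTML parsing; the get_links parameter and the page names are hypothetical, and a real spider also has to cope with errors and pace its requests.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str, get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from start; returns URLs in the order they were visited."""
    seen = {start}          # every URL ever queued, so nothing is visited twice
    queue = deque([start])  # pages discovered but not yet visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A tiny made-up "web": each URL maps to the links found on that page.
web = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}
print(crawl("a.example/", lambda url: web.get(url, [])))
```

The crawler finishes the pages it found on a.example before moving on to b.example and then c.example, which is the “this website, then linked websites” pattern described above.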

A robot isn’t necessarily a search engine spider. It can be malicious code that mimics a search engine spider, but actually collects, for example, email addresses for spamming purposes.


SEARCH ENGINE INDEXING

Indexing is the proper classification and placement of website content in the search engine databases. For the user, this means that a search should result in the most relevant pages being listed in the search results.

A definitive description of Google’s indexing process comes from a post on Google engineer Matt Cutts’ blog. It is a rather long article but should be read early in your SEO studies.
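Search engines do not publish their internal data structures, but the textbook way to make pages findable by word is an inverted index: a map from each word to the pages that contain it. A minimal Python sketch with made-up page contents; note that a lookup only tells you which indexed pages contain the words, not which should be listed first, which is the ranking step.

```python
import re
from collections import defaultdict


def words(text: str) -> list[str]:
    """Lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in words(text):
            index[word].add(url)
    return index


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Pages containing every query word (no ranking yet)."""
    terms = words(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


pages = {
    "example.com/malvern": "Indian restaurant in Malvern",
    "example.com/collingwood": "Thai restaurant in Collingwood",
    "example.com/grocer": "Indian grocery and spices",
}
index = build_index(pages)
print(lookup(index, "Indian restaurant"))  # {'example.com/malvern'}
```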

RANKING

While indexing merely means that a given web page is “somewhere” inside the search engine’s document database, it does not mean that the page will appear on the first page of results for a particular search. Indexing is a precursor to ranking. Indexing begins when the search engine finds a page, either by following links or through submission of the website’s home page address or its sitemap.

So, what happens when a searcher submits a search term? The algo determines which indexed pages are most relevant to that search term and this determines which URLs appear on the first page, second page and so on.

Today, search engine ranking algorithms pay particular attention to:

  • The age of the domain name
  • The “authority” value of the website
  • The link anchor text
  • Usage data
  • The page content, including the presence of keyphrases in the title, heading, and body text.
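Nobody outside the search engines knows how these factors are combined. One simple way to picture the idea of paying attention to several signals at once is a weighted score for each candidate page. In this Python sketch the signal names mirror the list above, but the 0-to-1 scores, the weights and the linear combination itself are invented purely for illustration; real ranking algorithms are far more complex.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-page scores, each normalised to the range 0..1."""
    domain_age: float
    authority: float
    anchor_text: float
    usage: float
    content: float


# Invented weights for illustration only; the real weights are secret.
WEIGHTS = {
    "domain_age": 0.10,
    "authority": 0.25,
    "anchor_text": 0.20,
    "usage": 0.15,
    "content": 0.30,
}


def score(s: Signals) -> float:
    """Weighted sum of the page's signals."""
    return sum(weight * getattr(s, name) for name, weight in WEIGHTS.items())


def rank(candidates: dict[str, Signals]) -> list[str]:
    """Order candidate URLs (already indexed and relevant) by score, best first."""
    return sorted(candidates, key=lambda url: score(candidates[url]), reverse=True)


results = rank({
    "old-authority.example": Signals(0.9, 0.8, 0.7, 0.6, 0.5),
    "new-page.example": Signals(0.1, 0.2, 0.1, 0.2, 0.9),
})
print(results)  # ['old-authority.example', 'new-page.example']
```

In this toy model, strong on-page content alone is not enough to beat an older, well-linked site, which echoes why the list above goes well beyond the page text.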

PAGERANK

I am deliberately avoiding saying much about Google PageRank, the numeric value Google displays for a web page in the Google toolbar. Worrying about that number, particularly when it goes down, is a distraction from proper SEO. I only get concerned when the green and white bar turns grey, that is, when it is not even zero, because this might indicate a penalty. A brand-new site that is not yet known to Google also shows a grey bar. The number has very little significance for whether a page ranks highly for a particular search phrase.


Figure 3. apcmagpro.com’s home page shows a PageRank of 4 out of 10.

TRUSTRANK

TrustRank is a link analysis technique that starts from a set of seed pages evaluated by a human expert and then looks for other pages with the same characteristics. It is also described in a scholarly paper.

It is designed to show quality pages to searchers while devaluing spammy pages. Google’s algorithm is believed to use a similar technique, which would explain why a link from the BBC to some site carries more value than one from someone’s personal blog.
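The published TrustRank approach propagates trust outward from the hand-checked seed pages using a PageRank-style iteration in which the random jump lands only on seeds, so pages far from any trusted seed end up with little or no trust. This Python sketch of that idea uses a made-up four-page graph; the page names, damping value and iteration count are assumptions, and how Google applies anything similar is not public.

```python
def trustrank(links: dict[str, list[str]], seeds: list[str],
              damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Propagate trust from human-checked seed pages along outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    seed_set = set(seeds) & pages
    if not seed_set:
        raise ValueError("need at least one seed page in the graph")
    # Unlike plain PageRank, the random jump only lands on seed pages.
    jump = {p: (1.0 / len(seed_set) if p in seed_set else 0.0) for p in pages}
    trust = dict(jump)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * jump[p] for p in pages}
        # Each page passes its trust on in equal shares to the pages it links to;
        # trust held by pages with no outgoing links is not passed on in this sketch.
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                new[target] += damping * trust[page] / len(targets)
        trust = new
    return trust


graph = {
    "bbc": ["shop"],      # trusted seed links to the shop
    "blog": ["spam"],     # unvetted blog links to a spam page
    "spam": ["shop"],
    "shop": [],
}
trust = trustrank(graph, seeds=["bbc"])
for page in sorted(trust, key=trust.get, reverse=True):
    print(page, round(trust[page], 4))
```

The shop earns trust only through the seed’s link; the spam page’s link to it adds nothing because no trust ever reaches the blog or the spam page.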
