Ash Nallawalla, 21 February 2009, 12:03 PM
Understanding how people search with search engines, and how the engines work, is critical to applying search engine optimization.
Search engines have been with us for over 10 years and they keep
improving, particularly Google. We can search not only for Web pages,
but also images, news, blogs, movie times, weather, and a host of other
specialised searches. This broader view of search behaviour is
important for SEO.
How do novice computer users write a search
term? Typically their searches are not sophisticated, and it can take
two or more attempts to get a useful result. For example, if a
novice searcher wanted to find an Indian restaurant in Melbourne,
Australia, they might try:
- Indian restaurant
- Indian restaurant in Melbourne (might include results for Melbourne, Florida)
Figure 1. An unqualified search for “Indian restaurant in Melbourne”
Next, they might restrict the results to “Pages from Australia”
and realise that Melbourne is a large city and they had better search a
specific suburb or two. A novice searcher at this point might type in
turn:
- Indian restaurant Malvern
- Indian restaurant Collingwood
An experienced searcher might try something like this for their first search, with quote marks:
- “Indian restaurant” Malvern OR Collingwood
You can see that searchers are unpredictable.
Figure 2. Sometimes even Google gets confused about some Australian place names.
SEO customers sometimes have unrealistic expectations, say, when they
want to rank for a single word such as “camera”. Such a term is
unfocused because the searcher could mean a film camera or a digital camera. Does
the customer want to sell cameras or buy cameras?
You, the SEO,
have to help them clearly define the purpose of the website, say,
“Kodak digital cameras for sale”, so that you can check that the site’s text
contains related language.
HOW SEOS SEARCH
SEOs can use advanced operators to get useful information from Google. Details are at
http://tinyurl.com/4tcdg. Here are some of them (omit the quote marks when you type them).
Allintitle: If
you search for “allintitle:Chrysler cars”, Google will only show URLs
that have both the words “Chrysler” and “cars” in the title of a page.
Intitle: If
you search for “intitle:Chrysler cars”, Google will only show URLs that
have the word “Chrysler” in the title and “cars” in the body or title
of the document.
Allinurl: If
you search for “allinurl:Chrysler cars”, Google will only show URLs
that have both the words “Chrysler” and “cars” in the URL.
Inurl: If
you search for “inurl:Chrysler cars”, Google will only show URLs that
have the word “Chrysler” in the URL and “cars” in the body or URL.
You
will want to use such queries to understand why certain sites show up
in the results and others do not. They are also handy for narrowing down
everyday searches.
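To make the difference between the “allin” operators and their single-term versions concrete, here is a toy Python model of the four operators exactly as described above, run over a few made-up pages. It is only a sketch under simplifying assumptions: each page is a dictionary with hypothetical `url`, `title` and `body` keys, and words are matched case-insensitively as whole tokens. Google’s real matching is not public and is certainly more involved.

```python
import re


def words(text):
    """Lower-cased word tokens; splitting on non-alphanumerics also breaks up URLs."""
    return {w for w in re.split(r"[^a-z0-9]+", text.lower()) if w}


def allin_field(pages, field, terms):
    """Model of allintitle: / allinurl: every term must appear in the given field."""
    wanted = {t.lower() for t in terms}
    return [p["url"] for p in pages if wanted <= words(p[field])]


def in_field(pages, field, first, rest):
    """Model of intitle: / inurl: the first term must be in the field; the
    remaining terms may appear either in that field or in the body."""
    hits = []
    for p in pages:
        in_target = words(p[field])
        anywhere = in_target | words(p["body"])
        if first.lower() in in_target and all(t.lower() in anywhere for t in rest):
            hits.append(p["url"])
    return hits


# Hypothetical sample pages.
pages = [
    {"url": "http://example.com/chrysler-cars", "title": "Chrysler cars",
     "body": "New and used models."},
    {"url": "http://example.com/chrysler", "title": "Chrysler history",
     "body": "The company has built cars for decades."},
    {"url": "http://example.com/sedans", "title": "Family sedans",
     "body": "Chrysler cars compared."},
]

print(allin_field(pages, "title", ["Chrysler", "cars"]))  # allintitle:Chrysler cars
print(in_field(pages, "title", "Chrysler", ["cars"]))     # intitle:Chrysler cars
print(allin_field(pages, "url", ["Chrysler", "cars"]))    # allinurl:Chrysler cars
print(in_field(pages, "url", "Chrysler", ["cars"]))       # inurl:Chrysler cars
```

Only the first page satisfies the “allin” versions; the second page also qualifies for the single-term versions because “cars” appears in its body.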
Link: The
link: operator gives a limited list of sites that link to you,
including your own internal links. Because the list is limited, it cannot be used
to reverse engineer the Google algo, and it is best regarded as having entertainment value only. Example:
link:www.ibm.com
Site: The site: operator lists the pages indexed in the search engine. You can only see the first 1000. Example:
site:apcmag.com
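The link: and site: operators can be modelled the same way over a small, invented link graph. In this sketch the `links` dictionary, the example URLs and the `limit` parameter are all hypothetical, and unlike Google, which shows only a sample for link:, the toy version returns every linking page it knows about.

```python
from urllib.parse import urlparse

# Hypothetical link graph: each page URL maps to the URLs it links to.
links = {
    "http://www.ibm.com/": ["http://www.ibm.com/products"],
    "http://www.ibm.com/products": ["http://www.ibm.com/"],
    "http://blog.example.net/post": ["http://www.ibm.com/"],
    "http://apcmag.com/news": ["http://apcmag.com/reviews", "http://www.ibm.com/"],
    "http://apcmag.com/reviews": [],
}


def link_query(target, graph):
    """Model of link:target. Every known page linking to target, internal links included."""
    return sorted(src for src, outs in graph.items() if target in outs)


def site_query(domain, graph, limit=1000):
    """Model of site:domain. Known pages on that host or its subdomains, capped at limit."""
    hits = []
    for url in sorted(graph):
        host = urlparse(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            hits.append(url)
    return hits[:limit]


print(link_query("http://www.ibm.com/", links))
print(site_query("apcmag.com", links))
```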
GUESSING THE ALGORITHMS
In
online forums, SEOs love to guess the various factors that make up the
“secret sauce”, otherwise known as a search engine algorithm. Since
Google is the predominant search engine, most of the focus has been on
trying to understand its algorithm.
There isn’t one algorithm:
there are many that come into play depending on the type of search.
There are search ranking algos, image ranking algos, blog ranking
algos, and so on. There isn’t much point in guessing, unless it is
informed guesswork based on several observations. Another approach is
to read Google’s patent applications.
Most patents are not easy to
understand, but reading other people’s posts about them can help you
guess what is in the Google algorithm. For example, the age of a
domain name and the length of its registration are believed to be
ranking factors because they appear in a patent.
SEARCH ENGINES
Understanding how a search engine works makes learning SEO easier. Read Google’s official explanation for how Google
finds and ranks sites, and its
technology overview.
At
the heart of Google’s search engine software is PageRank, a value
derived from the number and quality of the links pointing to a page
(the Google toolbar shows it as a number between zero and ten). Here is Google’s description:
“PageRank
relies on the uniquely democratic nature of the web by using its vast
link structure as an indicator of an individual page's value. In
essence, Google interprets a link from page A to page B as a vote, by
page A, for page B. But, Google looks at more than the sheer volume of
votes, or links a page receives; it also analyzes the page that casts
the vote. Votes cast by pages that are themselves "important" weigh
more heavily and help to make other pages "important." Important,
high-quality sites receive a higher PageRank, which Google remembers
each time it conducts a search. Of course, important pages mean nothing
to you if they don't match your query. So, Google combines PageRank
with sophisticated text-matching techniques to find pages that are both
important and relevant to your search. Google goes far beyond the
number of times a term appears on a page and examines all aspects of
the page's content (and the content of the pages linking to it) to
determine if it's a good match for your query.”
In reality, it is a lot more complicated and most of it is a closely
guarded secret. PageRank is a relatively small ranking factor these
days.
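The “votes” idea in the quotation can be illustrated with the textbook PageRank calculation from Brin and Page’s original paper, computed by power iteration. This is a minimal sketch, not Google’s production system: the damping factor of 0.85 is the value commonly quoted for the original formulation, the four-page graph is invented, and pages with no outgoing links have their rank spread evenly across all pages. A page’s vote is split among all the pages it links to, so a vote from a high-ranking page carries more weight.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank. graph maps each page to the pages it links to."""
    pages = set(graph) | {p for outs in graph.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank sitting on pages with no outgoing links is shared by every page.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, outs in graph.items():
            if outs:
                # Each link is a "vote" worth an equal share of the voter's own rank.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
        rank = new
    return rank


# Hypothetical four-page web: C receives the most votes.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page D, which nobody links to, keeps only the baseline share, while A ranks second largely because it receives the single vote of the highest-ranked page, C.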
CRAWLERS, ROBOTS, SPIDERS
Search engines use
programs known as crawlers/spiders/robots that follow links on a web
page and thus “discover” other pages on a given website and then on
other websites that are linked from this website.
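The link-following behaviour described above is essentially a breadth-first traversal. The Python sketch below assumes a hypothetical `fetch(url)` function that returns a page’s HTML (or `None`); to keep it self-contained, `fetch` here is just a dictionary lookup over invented pages rather than a real HTTP request. A production spider would also obey robots.txt, throttle its requests and handle errors, none of which is shown.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url.

    fetch(url) is assumed to return the page's HTML, or None if unavailable.
    Returns the URLs of successfully fetched pages in the order visited.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        extractor.close()
        for href in extractor.hrefs:
            link = urljoin(url, href)  # resolve relative links such as "/about"
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Invented mini-web: fetch is a dictionary lookup instead of an HTTP request.
fake_web = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://c.example/page">C</a>',
    "http://c.example/page": "<p>No links here.</p>",
}
print(crawl("http://a.example/", fake_web.get))
```

Starting from one home page, the crawler discovers the site’s internal page and then moves on to the other sites it links to, which mirrors the discovery process described above.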
A robot isn’t
necessarily a search engine spider. It can be malicious code that
mimics a search engine spider, but actually collects, for example,
email addresses for spamming purposes.
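One practical way to tell a genuine Googlebot visit from an impostor that merely copies its user-agent string is the reverse-then-forward DNS check Google recommends: look up the host name for the visiting IP address, confirm it ends in googlebot.com or google.com, then confirm that host name resolves back to the same IP. In the sketch below the resolver functions default to Python’s `socket` module (IPv4 only) but can be swapped out; the fake resolvers, IP addresses and host names in the usage example are made up, and the addresses come from the ranges reserved for documentation.

```python
import socket


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse DNS, domain check, then forward DNS back to the same IP."""
    try:
        host = reverse(ip)[0]          # result is (hostname, aliases, addresses)
    except OSError:                    # socket.herror and gaierror subclass OSError
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward(host)[2]   # IPv4 addresses the host name resolves to
    except OSError:
        return False
    return ip in addresses


# Made-up DNS data for the usage example (documentation-only IP ranges).
PTR = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com",
       "198.51.100.7": "spoof.googlebot.com",
       "203.0.113.5": "scraper.example.net"}
A = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"],
     "spoof.googlebot.com": ["192.0.2.99"]}


def fake_reverse(ip):
    if ip not in PTR:
        raise socket.herror("host not found")
    return (PTR[ip], [], [ip])


def fake_forward(host):
    if host not in A:
        raise socket.gaierror("name not known")
    return (host, [], A[host])


for visitor in ["192.0.2.10", "198.51.100.7", "203.0.113.5"]:
    print(visitor, is_real_googlebot(visitor, fake_reverse, fake_forward))
```

The second visitor shows why the forward lookup matters: anyone who controls an IP address’s reverse DNS can make it claim a googlebot.com name, but they cannot make Google’s own DNS point that name back at their address.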
SEARCH ENGINE INDEXING
Indexing
is the proper classification and placement of website content in the
search engine databases. For the user, this means that a search should
result in the most relevant pages being listed in the search results.
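A common way to organise indexed content for fast lookup is an inverted index, which maps each word to the pages containing it. The toy version below illustrates that general data structure and is not a description of Google’s index; real search engine indexes typically also record word positions and much more. The page URLs and text are invented.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the pages that contain every word of the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not modified
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


pages = {
    "/malvern": "Indian restaurant in Malvern",
    "/collingwood": "Indian restaurant in Collingwood",
    "/cameras": "Kodak digital cameras for sale",
}
index = build_index(pages)
print(sorted(lookup(index, "Indian restaurant")))
print(sorted(lookup(index, "restaurant Malvern")))
```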
A definitive description of Google’s indexing process comes from Google engineer
Matt Cutts’ blog. It is a rather long article but should be read at an early point in your SEO studies.
RANKING
While
indexing means that a given web page is “somewhere” inside the
search engine’s document database, it does not mean that the page will
appear on the first page of results for a particular search. Indexing
is a precursor to ranking. Indexing begins when the search engine
finds a page by following links, or when the website’s home page address
or its sitemap is submitted.
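Sitemap submission usually means an XML file in the format defined at sitemaps.org. The sketch below builds a minimal one with only the required `<loc>` element for each URL; optional tags such as `<lastmod>` are left out, and the example URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document with one <url><loc>...</loc></url> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and < for us
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/menu?lang=en&page=2",
]))
```

The declaration is written by hand because ElementTree’s own `xml_declaration` option, combined with `encoding="unicode"`, would name the platform’s default encoding rather than UTF-8.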
So, what happens when a searcher
submits a search term? The algo determines which indexed pages are most
relevant to that search term and this determines which URLs appear on
the first page, second page and so on.
Today, search engine ranking algorithms pay particular attention to:
- The age of the domain name
- The “authority” value of the website
- The link anchor text
- Usage data
- The page content, including the presence of keyphrases in the title, heading, and body text.
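To show how several signals might be combined into one ordering and then split into a first page, second page and so on, here is a deliberately simplified scoring function. Everything numeric in it is hypothetical: the field weights, the `authority` value attached to each page and the `authority_weight` parameter are invented for illustration, and it ignores domain age, anchor text and usage data entirely. It is not how Google or any other engine actually scores pages.

```python
import re

# Hypothetical weights: the real factors and their values are not public.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def phrase_count(text, phrase):
    """Whole-word, case-insensitive occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def score(page, phrase, authority_weight=0.5):
    """Toy score: weighted keyphrase matches plus a bonus for site 'authority'."""
    relevance = sum(weight * phrase_count(page[field], phrase)
                    for field, weight in FIELD_WEIGHTS.items())
    return relevance + authority_weight * page["authority"]


def rank(pages, phrase, per_page=10):
    """Order pages by score, then split the URLs into results pages."""
    ordered = sorted(pages, key=lambda p: score(p, phrase), reverse=True)
    urls = [p["url"] for p in ordered]
    return [urls[i:i + per_page] for i in range(0, len(urls), per_page)]


pages = [  # invented pages and authority values
    {"url": "/a", "title": "Indian restaurant Malvern", "heading": "Menu",
     "body": "Our Indian restaurant serves curries.", "authority": 2},
    {"url": "/b", "title": "Melbourne dining guide", "heading": "Indian restaurant reviews",
     "body": "", "authority": 8},
    {"url": "/c", "title": "Kodak digital cameras", "heading": "", "body": "",
     "authority": 1},
]
print(rank(pages, "Indian restaurant", per_page=2))
```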
PAGERANK
I
am deliberately avoiding saying much about Google PageRank, the numeric value
Google displays for a web page when you use the Google toolbar. Worrying
about that numeric value, particularly when it goes down, is a
distraction from proper SEO. I only get concerned when the green and
white bar turns grey, that is, when it shows no value at all, not even zero, as this might
indicate a penalty. A brand-new site that is not known to Google also
shows a grey bar. This numeric value has very little bearing on whether a
page ranks highly for a particular search phrase.
TRUSTRANK
TrustRank is a link analysis technique that starts from a
set of seed pages that have been evaluated by a human expert and then
propagates trust to the pages they link to, with trust diminishing as it
moves further away from the seeds. It is described in a
scholarly paper.
It
is designed to show quality pages to a searcher while devaluing spammy
pages. Google’s algorithm is also believed to use a similar technique.
This would explain why a link from the BBC to some site has more value than
one from someone’s personal blog.
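The seed-and-propagate idea can be sketched as a biased PageRank in which the random jump always lands on a human-approved seed page, which is how the TrustRank paper formulates it. In this minimal Python sketch the graph, the seed list and the damping value are assumptions, and trust held by pages with no outgoing links is simply dropped. A page that no trusted page links to, like the hypothetical personal blog below, ends up with zero trust, so its links pass on nothing.

```python
def trustrank(graph, seeds, damping=0.85, iterations=50):
    """Biased PageRank: random jumps land only on seed pages, so trust flows
    outwards from the seeds along links and shrinks at every step."""
    pages = set(graph) | {p for outs in graph.values() for p in outs}
    good = [s for s in seeds if s in pages]
    if not good:
        raise ValueError("need at least one seed page that appears in the graph")
    jump = {p: (1.0 / len(good) if p in good else 0.0) for p in pages}
    trust = dict(jump)
    for _ in range(iterations):
        new = {p: (1 - damping) * jump[p] for p in pages}
        for page, outs in graph.items():
            # Trust on pages without outgoing links is simply dropped here.
            for target in outs:
                new[target] += damping * trust[page] / len(outs)
        trust = new
    return trust


# Hypothetical web: the BBC page is the only human-approved seed.
web = {
    "bbc": ["news-site", "shop"],
    "news-site": ["shop"],
    "personal-blog": ["spam-shop"],
    "spam-farm": ["spam-shop"],
    "shop": [],
    "spam-shop": [],
}
for page, t in sorted(trustrank(web, ["bbc"]).items(), key=lambda kv: -kv[1]):
    print(page, round(t, 4))
```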