Beneath The Peaceful Waters, There Runs A Current
All Rights Reserved © 2009 Michelle St. Clair by way of The Current News Magazine, LLC
540 East Osage, Pacific, MO 63049
Ph: 636-271-0990 Fax: 636-271-0901

The Current News Magazine
WebSolutions
SEO: WHAT IS IT AND WHY DO I NEED IT?

SEO = Search Engine Optimization

WHAT IS IT?
SEO is the active practice of optimizing a web site by improving internal and external aspects in order
to increase the traffic the site receives from search engines.

WHY IT IS IMPORTANT!
Why does my company/organization/website need SEO?

The majority of web traffic is driven by the major commercial
search engines - Yahoo!, MSN, Google & AskJeeves (although
AOL gets nearly 10% of searches, their engine is powered by
Google's results). If your site cannot be found by search
engines or your content cannot be put into their databases,
you miss out on the incredible opportunity that search
provides to websites: people who want what you have,
visiting your site. Whether your site provides content,
services, products, or information, search engines are a
primary method of navigation for almost all Internet users.
Search queries, the words that users type into the search box which contain terms and phrases best
suited to your site, carry extraordinary value. Experience has shown that search engine traffic can
make (or break) an organization's success. Targeted visitors to a website can provide publicity,
revenue, and exposure like no other. Investing in SEO, whether through time or finances, can have an
exceptional rate of return.

Why can't the search engines figure out my site without SEO help?
Search engines are always working towards improving their technology to crawl the web more deeply
and return increasingly relevant results to users. However, there is and will always be a limit to how
search engines can operate. Whereas the right moves can net you thousands of visitors and
attention, the wrong moves can hide or bury your site deep in the search results where visibility is
minimal. In addition to making content available to search engines, SEO can also help boost rankings
so that content that has been found will be placed where searchers will more readily see it. The online
environment is becoming increasingly competitive, and those companies who perform SEO will have a
decided advantage in visitors and customers.
HOW SEARCH ENGINES OPERATE
Search engines have a short list of critical operations that allow them to provide relevant web
results when searchers use their system to find information.

Crawling the Web
Search engines run automated programs, called "bots" or "spiders", that use the hyperlink structure of
the web to "crawl" the pages and documents that make up the World Wide Web. Estimates are that
of the approximately 20 billion existing pages, search engines have crawled between 8 and 10 billion.
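The crawl described above can be sketched as a breadth-first walk over a link graph. This is only an illustration, not how any particular engine works: the LINKS dictionary is a hypothetical in-memory stand-in for real pages, and the page names are made up.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["widget", "home"],
    "widget": [],
    "orphan": ["home"],  # nothing links here, so a crawl from "home" never finds it
}

def crawl(start, links):
    """Return the pages reachable from start, in breadth-first discovery order."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

print(crawl("home", LINKS))  # ['home', 'about', 'products', 'widget']
```

Note that "orphan" is never discovered: a spider that follows hyperlinks can only find pages that some other page links to.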
Indexing Documents
Once a page has been crawled, its contents can be "indexed" - stored in a giant database of
documents that makes up a search engine's "index". This index needs to be tightly managed so that
requests which must search and sort billions of documents can be completed in fractions of a second.
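One common way to make such lookups fast is an inverted index, which maps each word to the documents that contain it. The sketch below is an assumption about the general technique, not a description of any engine's actual storage; the DOCS mini-corpus is invented for illustration.

```python
from collections import defaultdict

# Hypothetical mini-corpus: document id -> text.
DOCS = {
    1: "car and driver magazine reviews",
    2: "driver updates for your printer",
    3: "classic car magazine",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_index(DOCS)
print(sorted(index["car"]))     # [1, 3]
print(sorted(index["driver"]))  # [1, 2]
```

With the index built once, answering "which documents mention this word?" is a single dictionary lookup rather than a scan of every document.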
Processing Queries
When a request for information comes into the search engine (hundreds of millions do each day), the
engine retrieves from its index all the documents that match the query. A match is determined if the
terms or phrase is found on the page in the manner specified by the user. For example, a search for
car and driver magazine at Google returns 8.25 million results, but a search for the same phrase in
quotes ("car and driver magazine") returns only 166 thousand results. In the first system, commonly
called "Findall" mode, Google returned all documents which had the terms "car", "driver", and
"magazine" (they ignore the term "and" because it's not useful to narrowing the results), while in the
second search, only those pages with the exact phrase "car and driver magazine" were returned.
Other advanced operators (Google has a list of 11) can change which results a search engine will
consider a match for a given query.
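The difference between the two searches above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the DOCS texts are invented, and the one-word STOP_WORDS set stands in for whatever list of ignored terms a real engine uses.

```python
# Hypothetical documents: id -> text.
DOCS = {
    1: "car and driver magazine reviews",
    2: "the driver left a magazine in the car",
    3: "classic car magazine",
}
STOP_WORDS = {"and"}  # assumption: a tiny stop-word list for illustration

def match_all_terms(query, docs):
    """Unquoted search: every non-stop-word term must appear somewhere in the text."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    return [doc_id for doc_id, text in docs.items()
            if all(t in text.lower().split() for t in terms)]

def match_phrase(phrase, docs):
    """Quoted search: the exact phrase must appear, on whole-word boundaries."""
    needle = " " + phrase.lower() + " "
    return [doc_id for doc_id, text in docs.items()
            if needle in " " + text.lower() + " "]

print(match_all_terms("car and driver magazine", DOCS))  # [1, 2]
print(match_phrase("car and driver magazine", DOCS))     # [1]
```

As in the Google example, the exact-phrase search returns fewer results than the all-terms search.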
Ranking Results
Once the search engine has determined which results are a match for the query, the engine's
algorithm (a mathematical equation commonly used for sorting) runs calculations on each of the
results to determine which is most relevant to the given query. They sort these on the results pages
in order from most relevant to least so that users can make a choice about which to select.
Although a search engine's operations are not particularly lengthy, systems like Google, Yahoo!,
AskJeeves, and MSN are among the most complex, processing-intensive computers in the world,
managing millions of calculations each second and funneling demands for information to an enormous
group of users.

Speed Bumps & Walls
Certain types of navigation may hinder or entirely prevent search engines from reaching your
website's content. As search engine spiders crawl the web, they rely on the architecture of
hyperlinks to find new documents and revisit those that may have changed. In the analogy of speed
bumps and walls, complex links and deep site structures with little unique content may serve as
"bumps." Data that cannot be accessed by spiderable links qualify as "walls."

Possible "Speed Bumps" for SE Spiders:
URLs with 2+ dynamic parameters; e.g. http://www.url.com/page.php?id=4&CK=34rr&User=%Tom%
(spiders may be reluctant to crawl complex URLs like this because they often result in errors with
non-human visitors)
Pages with more than 100 unique links to other pages on the site (spiders may not follow each one)
Pages buried more than 3 clicks/links from the home page of a website (unless there are many other
external links pointing to the site, spiders will often ignore deep pages)
Pages requiring a "Session ID" or Cookie to enable navigation (spiders may not be able to retain these
elements as a browser user can)
Pages that are split into "frames" can hinder crawling and cause confusion about which pages to rank
in the results.
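Three of the speed bumps above are easy to check mechanically. The sketch below uses the thresholds from the list (2+ dynamic parameters, more than 100 links, more than 3 clicks from the home page); the function name and its link_count and click_depth arguments are hypothetical, and you would need to supply those two numbers from your own site audit.

```python
from urllib.parse import urlsplit, parse_qsl

def speed_bumps(url, link_count, click_depth):
    """Return a list of warnings for a page, using the thresholds listed above."""
    warnings = []
    # Count the name=value pairs in the query string of the URL.
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    if len(params) >= 2:
        warnings.append(f"{len(params)} dynamic parameters in the URL")
    if link_count > 100:
        warnings.append(f"{link_count} links on the page (more than 100)")
    if click_depth > 3:
        warnings.append(f"{click_depth} clicks from the home page (more than 3)")
    return warnings

print(speed_bumps("http://www.url.com/page.php?id=4&CK=34rr&User=%Tom%", 40, 2))
print(speed_bumps("http://www.url.com/about.html", 250, 5))
```

The first call flags only the URL parameters; the second flags the link count and the click depth.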

Possible "Walls" for SE Spiders:
Pages accessible only via a select form and submit button
Pages requiring a drop down menu (an HTML form element) to access them
Documents accessible only via a search box
Documents blocked purposefully (via a robots meta tag or robots.txt file)
Pages requiring a login
Pages that re-direct before showing content (search engines call this cloaking or bait-and-switch and
may actually ban sites that use this tactic)
The key to ensuring that a site's contents are fully crawlable is to provide direct, HTML links to each
page you want the search engine spiders to index. Remember that if a page cannot be accessed from
the home page (where most spiders are likely to start their crawl), it is likely that it will not be
indexed by the search engines. A sitemap can be of tremendous help for this purpose.
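One of the walls listed above, a robots.txt block, can be checked with Python's standard urllib.robotparser module. In this sketch the robots.txt contents, the "ExampleBot" user agent, and the example.com URLs are all made up for illustration; a real check would load your own site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every spider is kept out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# True: nothing blocks the home page.
print(robots.can_fetch("ExampleBot", "http://www.example.com/index.html"))
# False: this page sits behind a deliberate "wall".
print(robots.can_fetch("ExampleBot", "http://www.example.com/private/report.html"))
```

If a page you want indexed comes back as blocked, the rule in robots.txt is the place to look.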

Measuring Relevance and Popularity
Modern commercial search engines rely on the science of information retrieval (IR). That science has
existed since the middle of the 20th century, when retrieval systems powered computers in libraries,
research facilities, and government labs. Early in the development of search systems, IR scientists
realized that two critical components made up the majority of search functionality:

Relevance -
the degree to which the content of the documents returned in a search matched the user's query
intention and terms. The relevance of a document increases if the terms or phrase queried by the
user occurs multiple times and shows up in the title of the work or in important headlines or
subheaders.

Popularity -
the relative importance, measured via citation (the act of one work referencing another, as often
occurs in academic and business documents) of a given document that matches the user's query.
The popularity of a given document increases with every other document that references it.

These two items were translated to web search 40 years later and manifest themselves in the form of
document analysis and link analysis.

In document analysis, search engines look at whether the search terms are found in important areas
of the document - the title, the meta data, the heading tags, and the body of text content. They
also attempt to automatically measure the quality of the document (through complex systems beyond
the scope of this guide).
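A toy version of document analysis might count how often the query terms appear in each area of a page and weight the important areas more heavily. The weights below are invented for illustration and are not taken from any real search engine; the two sample documents are hypothetical as well.

```python
import re

# Hypothetical weights: a match in the title counts more than one in the body.
WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}

def words(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance(query, doc):
    """Sum weighted counts of the query terms across the fields of a document."""
    terms = words(query)
    score = 0.0
    for field, weight in WEIGHTS.items():
        field_words = words(doc.get(field, ""))
        score += weight * sum(field_words.count(term) for term in terms)
    return score

DOCS = {
    "a": {"title": "Car Magazine", "headings": "Latest car reviews",
          "body": "We review every new car."},
    "b": {"title": "Gardening Tips", "headings": "Spring planting",
          "body": "Park the car before you dig."},
}

ranked = sorted(DOCS, key=lambda name: relevance("car", DOCS[name]), reverse=True)
print(ranked)  # ['a', 'b']
```

Page "a" scores 6.0 (title, heading, and body matches) while page "b" scores 1.0 (a single body match), so "a" is listed first.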

In link analysis, search engines measure not only who is linking to a site or page, but what they are
saying about that page/site. They also have a good grasp on who is affiliated with whom (through
historical link data, the site's registration records, and other sources), who is worthy of being trusted
(links from .edu and .gov pages are generally more valuable for this reason), and contextual data
about the site the page is hosted on (who links to that site, what they say about the site, etc.).

Link and document analysis combine and overlap hundreds of factors that can be individually
measured and filtered through the search engine algorithms (the set of instructions that tells the
engines what importance to assign to each factor). The algorithm then determines scoring for the
documents and (ideally) lists results in decreasing order of importance (rankings).

Information Search Engines Can Trust
As search engines index the web's link structure and page contents, they find two distinct kinds of
information about a given site or page - attributes of the page/site itself and descriptions of that
site/page from other pages. Since the web is such a commercial place, with so many parties
interested in ranking well for particular searches, the engines have learned that they cannot always
rely on websites to be honest about their importance. Thus, the days when artificially stuffed meta
tags and keyword-rich pages dominated search results (pre-1998) have vanished and given way to
search engines that measure trust via links and content.

The theory goes that if hundreds or thousands of other websites link to you, your site must be
popular, and thus, have value. If those links come from very popular and important (and thus,
trustworthy) websites, their power is multiplied to even greater degrees. Links from sites like
NYTimes.com, Yale.edu, Whitehouse.gov, and others carry with them inherent trust that search
engines then use to boost your ranking position. If, on the other hand, the links that point to you are
from low-quality, interlinked sites or automated garbage domains (aka link farms), search engines
have systems in place to discount the value of those links.

The most well-known system for ranking sites based on link data is the simplistic formula developed
by Google's founders - PageRank. PageRank, which relies on a mathematical formula (based on the
likelihood of reaching a given document by randomly clicking on links), is described by Google in their
technology section:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an
indicator of an individual page's value. In essence, Google interprets a link from page A to page B as
a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a
page receives; it also analyzes the page that casts the vote. Votes cast by pages that are
themselves "important" weigh more heavily and help to make other pages "important."
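The "voting" idea in the passage above can be sketched with a simple iterative calculation. This is a textbook-style simplification and not Google's production system: the damping value of 0.85 and the iteration count are assumptions chosen for the example, and the four-page LINKS graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate a rank for each page; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A, B, and D all link to C; C links only to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C ranks first because three pages vote for it. Page A has only one inbound link, but because that link comes from the "important" page C, A ends up well ahead of B and D, which receive no links at all.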

Google uses a PageRank "proxy" value, which logarithmically translates the actual PageRank of a
document to a value between 0 and 10, to rank Web sites listed in its directory (which offers a
PageRank order or an Alphabetical order for listings) and in its toolbar (below).

Google's toolbar includes an icon that shows a PageRank value from 0-10
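To see what a logarithmic translation implies, consider the sketch below. Google has not published the actual scale, so the base of 10 and the raw scores are purely hypothetical; the point is only that on a logarithmic scale each step up requires a multiple of the previous score, not a fixed increment.

```python
def toolbar_value(raw_score, base=10.0, max_value=10):
    """Map a raw score to an integer from 0 to max_value on a logarithmic scale.

    The base is a hypothetical choice for illustration only.
    """
    value = 0
    threshold = base
    # Each additional point requires the raw score to be `base` times larger.
    while value < max_value and raw_score >= threshold:
        value += 1
        threshold *= base
    return value

for raw in (1, 10, 99, 100, 1000, 10 ** 12):
    print(raw, toolbar_value(raw))
```

With these assumed numbers, raw scores of 10 and 99 both map to 1, while 100 maps to 2 and 1000 maps to 3; anything at or above the top threshold is capped at 10.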

PageRank is, in essence, a rough system for estimating the value of a given link based on the links
that point to the host page. Since PageRank's inception in the late '90s, more subtle and
sophisticated link analysis systems have taken the place of PageRank. Thus, in the modern era of
SEO, the PageRank measurement in Google's toolbar, directory, or through sites that query the
service is of limited value. Pages with PR8 can be found ranked 20-30 positions below pages with a
PR3 or PR4. In addition, the toolbar numbers are updated only every 3-6 months by Google, making
the values even less useful. Rather than focusing on PageRank, it's important to think holistically
about a link's worth.

Here's a small list of the most important factors search engines look at when attempting to value a
link:

The Anchor Text of Link -
Anchor text describes the visible characters and words that hyperlink to another document or
location on the web. For example, in the phrase "CNN is a good source of news, but I actually prefer
the BBC's take on events," two unique pieces of anchor text exist - "CNN" is the anchor text pointing
to http://www.cnn.com, while "the BBC's take on events" points to http://news.bbc.co.uk. Search
engines use this text to help them determine the subject matter of the linked-to document. In the
example above, the links would tell the search engine that when users search for "CNN", the linking
site (SEOmoz.org, where this example originally appeared) thinks that http://www.cnn.com is a
relevant site for the term "CNN" and that http://news.bbc.co.uk
is relevant to "the BBC's take on events". If hundreds or thousands of sites think that a particular
page is relevant for a given set of terms, that page can manage to rank well even if the terms NEVER
appear in the text itself (for example, see the BBC's explanation of why Google ranks certain pages for
the term "Miserable Failure").
Global Popularity of the Site -
More popular sites, as denoted by the number and power of the links pointing to them, provide more
powerful links. Thus, while a link from SEOmoz may be a valuable vote for a site, a link from bbc.co.uk
or cnn.com carries far more weight. This is one area where PageRank (assuming it were accurate)
could be a good measure, as it's designed to calculate global popularity.
Popularity of Site in Relevant Communities -
In the example above, the weight or power of a site's vote is based on its raw popularity across the
web. As search engines became more sophisticated and granular in their approach to link data, they
acknowledged the existence of "topical communities"; sites on the same subject that often interlink
with one another, referencing documents and providing unique data on a particular topic. Sites in
these communities provide more value when they link to a site/page on a relevant subject rather than
a site that is largely irrelevant to their topic.
Text Directly Surrounding the Link -
Search engines have been noted to treat the text directly surrounding a link as more important
and relevant than the other text on the page. Thus, a link from inside an on-topic paragraph may
carry greater weight than a link in the sidebar or footer.
Subject Matter of the Linking Page -
The topical relationship between the subject of a given page and the sites/pages linked to on it may
also factor into the value a search engine assigns to that link. Thus, it will be more valuable to have
links from pages that are related to the site/page's subject matter than those that have little to do
with the topic.
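The first factor above, anchor text, is straightforward to extract from a page. The sketch below uses Python's standard html.parser module on the CNN/BBC sentence from the anchor text example; the AnchorTextParser class name is our own, and real pages would need more careful handling (nested tags, images used as links, and so on).

```python
from html.parser import HTMLParser

class AnchorTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from the links in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

HTML = ('<p><a href="http://www.cnn.com">CNN</a> is a good source of news, '
        'but I actually prefer <a href="http://news.bbc.co.uk">the BBC\'s take on events</a>.</p>')

anchor_parser = AnchorTextParser()
anchor_parser.feed(HTML)
for href, text in anchor_parser.links:
    print(href, "<-", text)
```

The two pairs it prints are exactly what a search engine would associate with each destination: "CNN" with http://www.cnn.com and "the BBC's take on events" with http://news.bbc.co.uk.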

Information contained on this page was adapted from two articles on www.seomoz.org.