Wednesday, September 14, 2005

Back to the Basics of SEO #2 - Black Hat SEO

Black-Hat Techniques


When you look at how the search engines go about finding relevant websites when a certain word or phrase is typed in, it can be fairly simple to trick that system into finding your website before anyone else's.

These techniques are generally called “Black-Hat Techniques”. They can range anywhere from doorway pages to hidden text, and more.

Some Black-Hat Techniques are:

Artificial Traffic – Artificial traffic systems are set up to hit your website from different IP addresses several times a day. The theory is that the more traffic that comes to your site, the more popular it is. So, by hitting your website several times a day from different IPs, these systems make the search engines see many people visiting your website every day, which in the end drives up your rankings with the search engines.

Why is this bad?

In the constant battle to weed out the good from the bad, search engines have been developed to recognize the programs used to generate artificial traffic. So, while it may work in the meantime in generating better rankings, in the long run you may end up getting penalized, or even banned by one or more of the search engines.

Cloaking Scripts – Cloaking scripts essentially outwit the search engines to increase your listings and your traffic. Search engine spiders go to each site, follow every link, and index what they find in the engine's results. Cloaking scripts automatically generate thousands of pages just for the spiders. These pages are usually dynamically built from keyword lists not available to the user.

Why is this bad?

The pages produced by cloaking scripts are generally unreadable, or even contain hidden text (text that is the same color as the webpage background, or hidden behind some other feature such as an image, table or div) or redirecting scripts (scripts that send the visitor from the cloaking page to the actual website when the visitor runs their mouse over something, or clicks a button or link within the cloaking page). The whole point of the search engines is to find legitimate content and quality websites for visitors to see. If visitors can't see the content on the page, or are redirected to a different page, that webpage is essentially thrown out as a quality website.

Doorway Pages – A doorway page is usually a spider-friendly page built to generate rankings, which then redirects the visitor to the actual website. Doorway pages are created to do well for particular phrases. They are also known as portal pages, jump pages, gateway pages, entry pages and by other names. Doorway pages can be used for sites that aren't getting indexed (usually a site developed with frames, or dynamically driven), or they can be completely different domains that direct traffic to the actual website. For example, a lawyer might create one website with a doorway page for divorce law, another for criminal law, and another for personal injury law. Each of the three websites is optimized for its specific term, and then redirects to the original website that holds more information about that law firm.

Why is this bad?

Doorway pages do not have any significant content, and the site they sit on has only one or possibly two pages. While they may link to the main site, their content may be exactly the same as the original site's, and all of them will link to one another. Search engines view this as spamming (or tricking) them, and will eventually either penalize all of the websites or ban them altogether.

Duplicate or Similar Content – In an effort to generate more content quickly, webmasters will sometimes create multiple pages, or even multiple websites, and then simply place the exact same content on each page with different keywords worked in.

Why is this bad?

Search engines can see this as spam. When a website has duplicate content, it is viewed either as sheer laziness or as a breach of another's copyright. In the end, the website that was created first may not get penalized (search engines generally look at which website is older, or how long a website has been live), though the newer websites will start to drop in rankings as a result.

While the search robots may simply be computers, and computers are only as smart as the people who program them, search engines such as Google, MSN, and Yahoo employ staff holding PhDs in computer science, mathematics, and more. Those staff spend all of their time and energy optimizing the search tools to deliver “the most comprehensive search” on the web, and outsmarting those who create websites and those who work on optimizing them. In the end, the PhDs are most likely to win.

While the search engines may not always be able to pick up on these black-hat techniques themselves, they may have a little help from users, or even from your competitors. Each of the major search engines (Yahoo, Google, and MSN) has a form to fill out for reporting websites that spam. Each reported website is then researched thoroughly, and all the websites involved may potentially be penalized as well. So, be wary of those that you link to, and of the techniques that search marketing companies may use.

Link Farms – The more links you have pointing to your website, the better chance you have of getting a ranking. Although they aren't around as much as they were in the past, link farms promise to place your link on other websites for a fee.

Why is this bad?

Link Farms often link your page with Web sites that have nothing to do with your content. The repercussions of this action are that the major search engines penalize sites that participate in link farming, thereby reversing their intended effect. A Link Farm usually places your link on a Web page that is nothing more than a page of links to other sites.

Where to report spammers:

Google - http://www.google.com/contact/spamreport.html

Yahoo - http://add.yahoo.com/fast/help/us/ysearch/cgi_reportsearchspam

Alta Vista - http://www.altavista.com/help/contact/search



Yahoo copyright reporting - http://docs.yahoo.com/info/copyright/copyright.html

Wednesday, September 10, 2003

What are Meta Tags?



The common question on the SEO circuit is: meta tags or no meta tags? Some argue that meta tags are useless. But are they really useless? The answer is no. In fact, back in the golden days of SEO (the 1990s), meta tags were the only way a website would ever be seen. If you wanted your website to show up on MSN, Yahoo, Excite, Lycos, or any other search engine popular at the time and you didn't have the proper meta tags, your website was virtually hidden from the world.

As search engines became the more popular way to find what you were looking for on the internet, and the internet itself grew into an immense “pot of knowledge” and resources, search engines had to find a way to locate all of those websites and gather them into an organized, easy-to-access source where users could find what they needed. You typed in a word or phrase that described what you were looking for, and the more results you found, the more likely you were to return to that search engine.

So, the search engines found a way to find websites, and pull the keywords from those websites if there were no meta tags available. The more a word or phrase was found within the content of a website, the higher a website would show up for that word or phrase. The more another site would have the same word, the higher that site would appear, and so on.

Along came Google, with its page ranking system, which changed the direction of SEO completely. Search engine optimizers began focusing on getting links to their sites, so much so that they forgot about meta tags altogether. Some SEO individuals are so obsessed with the Google page ranking system and with obtaining links to their sites that they get involved with link farms and cross linking, which will inevitably get a website banned from Google or Yahoo.


So, what happened to the meta tags?


They are still there, and are as important as the day they were first thought up.

In fact, Google and Yahoo both place emphasis on how important it is to have proper meta tags. Yahoo even states that proper meta tags are important when optimizing a website, and details how to create them. Does this mean that your site won't be seen if you don't have meta tags? Not necessarily. But your site won't be highlighted for the words or phrases that you would like searchers to find you by. In fact, you will find that some phrases are so popular they are searched several hundred, if not several thousand, times a day.

So, by choosing the right keywords, placing them properly in your meta tags, and wording your description efficiently, you will find not only that you show up ahead of your competition for the words that are being searched, but also that the person who searched that phrase may be drawn to your link in the results more than to your competition's because of your title and meta tag description.

Meta tags should be placed in the head of the HTML document, between the <HEAD> and </HEAD> tags.
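As a rough sketch of that placement, a bare-bones page might look like the following. The title, description, and keyword values here are placeholders made up for illustration, not taken from any real site:

```html
<HTML>
<HEAD>
<!-- Meta tags sit alongside the title, inside the head -->
<TITLE>Your page title here</TITLE>
<META NAME="description" CONTENT="describe your website or page here">
<META NAME="keywords" CONTENT="oranges, lemons, limes">
</HEAD>
<BODY>
<!-- Visible page content goes here -->
</BODY>
</HTML>
```

Nothing inside the head other than the title is displayed on the page itself; the tags are there for browsers and search robots to read.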

Expires


The date and time after which the document should be considered expired - such as a news article, or marketing promotional landing page.

Controls caching, as web robots may delete expired documents from a search engine, or schedule a revisit.

Dates must be given in GMT. E.g. (META tag):

<META HTTP-EQUIV="expires" CONTENT="Wed, 26 Feb 1997 08:21:57 GMT">
or (HTTP header):




Expires: Wed, 26 Feb 1997 08:21:57 GMT

Pragma


Controls caching - the value must be "no-cache".
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">


Content-Type




The meta content type may be extended to give the character set.
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-2022-JP">

It is recommended to use this tag; failure to do so may cause display problems.

Content-Script-Type


Specifies the default scripting language in a document.
e.g.
<META HTTP-EQUIV="Content-Script-Type" CONTENT="text/javascript">

Content-Style-Type


Specifies the default style sheet language for a document.
e.g.
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">


Default-Style



Sets the document's preferred style sheet, taken from a style sheet specified elsewhere, e.g. in a LINK element.
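A minimal sketch of how the two pieces fit together is shown here. The style sheet title "compact" and the file name compact.css are invented for this example:

```html
<!-- CONTENT names the preferred style sheet by its title -->
<META HTTP-EQUIV="Default-Style" CONTENT="compact">
<!-- The style sheet itself, given a matching TITLE -->
<LINK REL="stylesheet" TYPE="text/css" TITLE="compact" HREF="compact.css">
```

The CONTENT value is meant to match the TITLE of one of the style sheets linked from the page.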



Content-Language



Used to set the natural language of the document. Used
by robots to categorize by language. The corresponding Accept-Language
header (sent by a browser) causes a server to select an appropriate
natural language document.
e.g.
<META HTTP-EQUIV="Content-Language" CONTENT="en-US">



or (HTTP header)

Content-language: en-US

Languages are specified as the pair (language-dialect); here, English (United States).

Set-Cookie



Sets a "cookie" along with a value. Cookies with an expiry date are
considered "permanent" and will be saved to disk on exit.

e.g.
<META HTTP-EQUIV="Set-Cookie" CONTENT="cookievalue=xxx;expires=Friday, 31-Dec-09 23:59:59 GMT; path=/">




PICS-Label



Platform for Internet Content Selection (PICS), a content rating scheme. Typically used to
set a document's rating in terms of adult content (sex, violence, etc.),
although the scheme is very flexible and may be used for other purposes.

Cache-Control



Specifies the action of cache agents. Possible values:

  • public - may be cached in public shared caches
  • private - may only be cached in a private cache
  • no-cache - may not be served from cache without first checking back with the server
  • no-store - may not be stored at all, whether in a cache or an archive
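
Following the pattern of the other HTTP-EQUIV tags, one of these values might be set as sketched here. The choice of no-cache is only an illustration:

```html
<!-- Ask caches not to reuse this page without checking back with the server -->
<META HTTP-EQUIV="Cache-Control" CONTENT="no-cache">
```

Keep in mind that intermediate caches generally read only the real HTTP header (Cache-Control: no-cache) and never look inside the HTML, so the header sent by the server is the more dependable of the two.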


Robots



Controls web robots on a per-page basis.
e.g.
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">

Robots may follow the links on this page but not index the page itself.



Most search bots support:


  • NOINDEX prevents anything on the page from being indexed and ranked

  • NOFOLLOW prevents the crawler from following the links on the page and indexing the linked pages

  • NOIMAGEINDEX prevents the images on the page from being indexed but the text on the page can still be indexed

  • NOARCHIVE is an extension that requests that the search engine not cache the page's content



Description



A short, plain language description of the document. Used by search
engines to describe your document. Particularly important if
your document has very little text, is a frameset, or has extensive scripts
at the top.
e.g.
<META NAME="description" CONTENT="describe your website or page here">


Keywords




Keywords used by search engines to index your document in addition to
words from the title and document body. Typically used for synonyms
and alternates of title words.
e.g.
<META NAME="keywords" CONTENT="oranges, lemons, limes">


Author



Typically the author's name.
e.g.

<META NAME="author" CONTENT="Your Name">


Copyright



Typically a copyright date and a company or website name.
e.g.
<META NAME="copyright" CONTENT="2009 company name">




Google Specific


Tags specific to Google only.
e.g.
<META NAME="googlebot" CONTENT="noarchive">


  • googlebot: noarchive - do not allow Google to display cached content


  • googlebot: nosnippet - do not allow Google to display an excerpt or cached content


  • googlebot: noindex - similar to the robots meta element

  • googlebot: nofollow - do not allow Google to follow the links on the page