
Monday, January 22, 2024

Robots Tags Explained

So, you're diving into the world of making your website shine on search engines, right? It's quite a journey! Now, here's the thing – there's a nifty trick that beginners sometimes miss out on, and that's using robot tags. These little meta tags are like secret agents for your website. They play a big role in telling search engines, especially Google, how to organize and show off your awesome content.

Curious to know more?

This beginner-friendly guide is all about the different robot tag settings, why they're a big deal, and when you might want to sprinkle some of that magic on your website.

What are Robot Tags?

Robot tags are snippets of code embedded in the HTML of your web pages to communicate instructions to search engine bots. These instructions guide the bots on how to treat your content in terms of indexing, following links, displaying snippets, and more. Let's dive into some common robot tags and their meanings (you'll find a combined example right after the list):

1. all

This is the default setting, indicating that there are no restrictions on indexing or serving. It's what search engines assume anyway, so listing it explicitly has no effect.

2. noindex

Use this tag when you don't want a particular page, media, or resource to appear in search results. It prevents indexing and displaying in search results.

3. nofollow

By using this tag, you instruct search engines not to follow the links on the page. It's useful when you want to keep search engines from discovering linked pages.

4. none

Equivalent to combining noindex and nofollow, it prevents both indexing and following links.

5. noarchive

This tag stops search engines from showing a cached link in search results. It prevents the generation of a cached page.

6. nosnippet

Use this tag if you don't want a text snippet or video preview in the search results. It prevents Google from generating a snippet based on the page content.

7. indexifembedded

Allows Google to index the content of a page if it's embedded in another page through iframes, despite a noindex rule.

8. max-snippet: [number]

Specifies the maximum length of a textual snippet for search results. You can limit the snippet length or allow Google to choose.

9. max-image-preview: [setting]

Sets the maximum size of an image preview in search results. You can choose between 'none,' 'standard,' or 'large.'

10. max-video-preview: [number]

Limits the duration of video snippets in search results. You can set a specific duration or allow Google to decide.

11. notranslate

Prevents the translation of the page in search results. Useful if you want to keep user interaction in the original language.

12. noimageindex

Stops the indexing of images on the page. If not specified, images may be indexed and shown in search results.

13. unavailable_after: [date/time]

Specifies a date/time after which the page should not appear in search results.
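
Before we get into why these matter, here's an illustrative sketch of how a few of these directives might be combined in a page's <head> (the values are placeholders you would tune for your own site, not recommendations).

A page you want indexed, but with a trimmed-down appearance in search results:

<meta name="robots" content="noarchive, max-snippet:50, max-image-preview:standard, max-video-preview:15">

A page you want kept out of search results entirely:

<meta name="robots" content="noindex, nofollow">

That last one could also be written as content="none", since none is shorthand for noindex plus nofollow.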

Why Use Robot Tags?

Using robot tags is essential for controlling how your content is treated by search engines. It allows you to tailor the indexing, linking, and display settings based on your specific needs. Let's look at an example scenario to illustrate when you might use these tags.

Example Scenario:

Imagine you have a temporary promotion page on your website that you want to drop out of search results after a specific date. In this case, you would use the unavailable_after rule to tell Google the date after which the page should no longer appear in search results (adding noindex as well would pull the page out of results immediately, which isn't what we want here).

<meta name="robots" content="unavailable_after: 2024-02-01">

This ensures that the promotional page keeps showing until the promotion ends, and won't appear in search results after February 1, 2024.
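
One last note: if you ever need the same behavior on something that isn't an HTML page (a PDF, for example), the same robots directives can also be sent as an X-Robots-Tag HTTP header instead of a meta tag. A minimal sketch, reusing the promotion example above:

X-Robots-Tag: unavailable_after: 2024-02-01

The header is set in your web server configuration rather than in the page itself, but search engines treat the directives the same way.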

In conclusion, understanding and correctly implementing robot tags is a valuable skill for any website owner or developer. It gives you the power to control how your content is presented in search results, ultimately influencing the visibility and accessibility of your website.

Tuesday, February 21, 2023

Back to the Basics of SEO #2 - Black Hat SEO

Black-Hat Techniques


When you look at how search engines go about finding relevant websites when a certain word or phrase is typed in, it can be fairly simple to trick that system into finding your website before anyone else's.

These techniques are generally called “Black-Hat Techniques”. They can range anywhere from doorway pages to hidden text, and more.

Some Black-Hat Techniques are:

Artificial Traffic – Artificial traffic systems are set up to hit your website from different IP addresses several times a day. The theory is that the more traffic that comes to your site, the more popular it must be. By hitting your website several times a day from different IPs, you make the search engines see it as many people visiting your website every day – in the end driving up your rankings with the search engines.

Why is this bad?

In the constant battle to weed out the good from the bad, search engines have learned to recognize the programs used to generate artificial traffic. So, while it may work for a while in generating better rankings, in the long run you may end up getting penalized, or even banned, by one or more of the search engines.

Cloaking Scripts – Cloaking essentially tries to outwit the search engines to increase your listings and your traffic. Search engine spiders go to each site, follow every link, and index what they find in the engine's results. Cloaking scripts automatically generate thousands of pages just for the spiders. These pages are usually built dynamically from keyword lists that are never shown to the user.

Why is this bad?

Cloaked pages are generally unreadable. They often rely on hidden text (text that is the same color as the page background, or hidden behind some other element such as an image, table, or div) or redirecting scripts (scripts that send the visitor from the cloaking page to the actual website when the visitor mouses over something, or clicks a button or link within the cloaking page). The whole point of the search engines is to find legitimate content and quality websites for visitors to see. If they can't see the content on the page, or the visitor is redirected to a different page, that webpage is essentially thrown out as a quality website.

Doorway Pages - Doorway pages are usually built as spider-friendly pages that earn rankings and then redirect the visitor to the actual website. They are created to do well for particular phrases, and are also known as portal pages, jump pages, gateway pages, entry pages, and by other names. Doorway pages can be used for sites that aren't getting indexed (usually a site built with frames, or dynamically driven), or they can be completely different domains that funnel traffic to the actual website. For example, a lawyer might create one website with a doorway page for divorce law, another for criminal law, and another for personal injury law. All three websites are optimized for their specific term, and then redirect to the original website that holds more information about that law firm.

Why is this bad?

Doorway pages do not have any significant content, and usually consist of only one or two pages. While they may link to the main site, their content may be exactly the same as the original site's, and they all link to one another. Search engines view this as spamming (or tricking) them, and will eventually either penalize all of the websites involved or ban them altogether.

Duplicate or Similar Content – In an effort to generate more content quickly, webmasters will sometimes create multiple pages, or even multiple websites, and then simply place the exact same content on each page with different keywords worked in.

Why is this bad?

Search engines can see this as spam. When a website has duplicate content, it is viewed either as sheer laziness or as a breach of someone else's copyright. In the end, the website that was created first may not get penalized (search engines generally look at which site is older, or how long a site has been live), but the newer websites will start to drop in rankings as a result.

While the search robots may simply be computers, and computers are only as smart as the people who created them, search engines such as Google, MSN, and Yahoo devote the time and energy of staff holding PhDs in computer science, mathematics, and more to optimizing their search tools and outsmarting those who build and optimize websites, all in pursuit of "the most comprehensive search" on the web. In the end, the PhDs are most likely to win.

Even when the search engines can't pick up on these black hat techniques on their own, they may get a little help from users, or even from your competitors. Each of the major search engines (Yahoo, Google, and MSN) has a form for reporting websites that spam. Every reported website is then researched thoroughly, and all sites involved may potentially be penalized as well. So, be wary of the sites you link to, and of the techniques that search marketing companies may use.

Link Farms – The more links you have pointing to your website, the better chance you have of ranking. Although they aren't around as much as they were in the past, link farms promise to place your link on other websites for a fee.

Why is this bad?

A link farm usually places your link on a web page that is nothing more than a page of links to other sites, often sites that have nothing to do with your content. The major search engines penalize sites that participate in link farming, thereby reversing the intended effect.

Where to report spammers:

Google - http://www.google.com/contact/spamreport.html

Yahoo - http://add.yahoo.com/fast/help/us/ysearch/cgi_reportsearchspam

Alta Vista - http://www.altavista.com/help/contact/search

Yahoo Copyright reporting – http://docs.yahoo.com/info/copyright/copyright.html

Thursday, January 10, 2019

Making Decision When to Block in Robots.txt, 301, 404 or Ignore Errors and Warnings

Countless times I have been asked by a boss, a CEO, or a client why we see so many errors and warnings in Google Search Console, Moz, Conductor, Brightedge, or Botify reports, and what we should do about them. Often the solution to the issue can be more damaging than the issue itself.

When Do We Address Errors and Warnings, and How Best to Deal with Them?

The answer isn't very straightforward, really. As any SEO will tell you: "It depends." It depends on so many factors, from the cause of the issue, to the number of pages affected, to the impact those pages have on traffic and revenue, and so on.

I'll walk you through a decision-making process that might help you make the best call in these situations.

What are Errors/Warnings?

First, take a look at the issues being brought up. Are they Warnings or Errors?

If you are seeing Errors, then there is a strong chance the issue should receive priority, but don't go running up and down the hallways of the office screaming "Fire" just yet. We need to look at what exactly is going on before sounding the alarms.

Using one of my own sites as an example: Google Search Console sent me an email saying there was an increase in Crawl Errors. I logged in, clicked on "Coverage", and found that the errors had increased substantially. At some point every SEO has been through this same scenario. The key is to dig through and understand exactly what the error is, and start to identify the cause.

A few of the most common errors you will find in Google Search Console are:
  • Server error (5xx)
  • Submitted URL not found (404)
  • Submitted URL blocked by robots.txt
  • Submitted URL has crawl issue
  • Submitted URL seems to be a Soft 404
  • Submitted URL marked ‘noindex’
It's good to familiarize yourself with your GSC Errors before any such notifications are sent. URLs marked "noindex", blocked by robots.txt, or not found (404) are usually known issues that you can ignore. If you are going through your errors for the first time, it's good to spot check directories or sets of dynamically generated pages to understand what might be causing the errors. If they are something to be alarmed by, then it's a good time to discuss this with engineering, or, if you develop the site yourself, to understand the cause and come up with the best solution to fix it.
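
To make that concrete, "blocked by robots.txt" and "marked noindex" reports often just reflect rules that were put in place on purpose. A hypothetical illustration (the paths and rules here are made up for the example):

User-agent: *
Disallow: /checkout/
Disallow: /internal-search/

and, on a page deliberately kept out of the index:

<meta name="robots" content="noindex">

If the flagged URLs match intentional rules like these, the "error" is expected behavior; if they don't, that's the cue to dig deeper.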

Server error (5xx), "seems to be a Soft 404", and "has a crawl issue" are all errors that should be addressed immediately. However, you should fully understand how many pages are causing the errors, what directory those pages are in, and what impact those pages have before bringing them up with engineering, your boss, CEO, or client.

Understanding the Impact

Let's say we have a set of pages in a directory that are causing 500 or soft 404 errors. In this case we see one real estate listing and a set of pages under the "events" directory. That tells us the system building out those events is having connectivity issues (server errors usually point to a failed connection between the web server and the database).

Looking at the server logs for these pages, there is an issue with a JavaScript call on those pages that is causing the 500 error. An easy fix, but what is the impact for the company?

Logging into Google Analytics and pulling up all pages that include "events", the event pages account for 18% of total sessions over a three-month span, versus 23% for the same three months last year, with an average revenue value and conversion rate higher than any other set of pages on the site. These numbers show that the server error is potentially affecting traffic negatively, and the value of those pages is too great to let the error go. If the value of the pages were lower and the YoY numbers were the same, the fix could be given a lower priority.

Soft 404s can have the same impact as a server error, with a different cause and a more involved fix. Soft 404s are usually caused by pages that return a 200 (page is OK) status code but have little to no content on them. Google interprets this as a page in error and reports it as a soft 404, rather than saying it doesn't exist.

Understanding server codes and being able to read your Google Analytics data is extremely important in these cases. Understanding web log data and being able to identify issues is equally important in determining what the issue is, the level of effort needed to resolve it, and the impact on the business, both negative and positive (once resolved).

With this knowledge, you can go into any meeting with your engineering team, your boss, CEO, or client and articulate what you're seeing, demonstrating your knowledge of the issue, its impact on the business, and the level of effort it will take to resolve.