Wednesday, August 15, 2007

Official Google Webmaster Central Blog: New robots.txt feature and REP Meta Tags

Great for Promotions!

Google now offers a meta tag that tells Googlebot when your page should expire from the index. At Classmates.com we would launch promotions on a regular basis in which landing pages and SEO were passed up, because PPC was too expensive to run for such a short time and we didn't want to run the risk of the page ranking after the promotion was finished.
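As a sketch of what the tag looks like, based on Google's announcement of the unavailable_after directive (the date here is just a placeholder for your promotion's end date):

```html
<!-- Tells Googlebot to drop this page from results after the given date -->
<meta name="googlebot" content="unavailable_after: 25-Aug-2007 15:00:00 EST">
```

For non-HTML files (PDFs, images, etc.), the same announcement described an X-Robots-Tag HTTP header that carries the same directives in the server response instead of the page markup.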

SEO Similar Content case study

In April 2007 I attended SES in New York and took part in a duplicate content discussion panel. The panel talked about the usual content issues such as shingles, duplicate content, boilerplate, etc.
At Classmates I was working on a project with a single page design that had dynamically fed content, resulting in over 40 million dynamically generated pages. The goal was to individually optimize these pages so that they would act as landing pages when a person searched for a specific word in our database.
I spoke with the people on the panel and asked them how they would resolve the duplicate content issues on so many dynamically generated pages. They suggested setting up a system in which we could add content to the pages individually, taking on a few each week until they all had unique content. The problem was that we didn't have the manpower to do this for over 40 million pages. The pages were user generated (meaning the user filled out the information for each specific page), and exposing that user content to the search engines in order to make the pages unique was against both our privacy policy and our business model.

So back to square one - when the experts can't help you, what do you do?

So I went back to work after the conference and presented the dilemma to my copywriter. We started talking about duplicate content issues and how we could avoid the content getting filtered out as boilerplate. Then I thought: "what breaks up the content for each shingle? Is it the code, the punctuation, or all of the words counted without code or punctuation?"

So I decided to run a test case against the question...

We took one block of copy that didn't make sense and added a three-word phrase not common in the English language, optimizing the copy for that phrase. Then we applied it to three different basic HTML pages (no CSS, no skin, no JavaScript of any kind, etc.).

  1. Page 1 - the copy with every bit of punctuation removed (not a single quote, period, or comma), as one solid paragraph surrounded by paragraph tags.
  2. Page 2 - the same paragraph, still without its original punctuation, broken up with heading, paragraph, and break tags.
  3. Page 3 - the paragraph with its punctuation intact, as originally written, surrounded as a whole by paragraph tags.
Links to all three pages were added to the home page of a site that has been in existence for 5 years. They were titled "page 1", "page 2" and "page 3" so that they weren't swayed by the anchor text of the link.
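As a sketch of the three variants (the filenames and copy here are placeholders, not the actual test content):

```html
<!-- page1.html: punctuation stripped, one solid paragraph -->
<p>green marble telescope sings quietly under the winter stairs</p>

<!-- page2.html: still no punctuation, broken up with heading/paragraph/break tags -->
<h1>green marble telescope</h1>
<p>sings quietly<br>under the winter stairs</p>

<!-- page3.html: original punctuation intact, wrapped as a whole in paragraph tags -->
<p>Green marble telescope sings quietly, under the winter stairs.</p>
```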

It took a little over a week for the pages to show up in the Google index with Yahoo and MSN trailing by 4-6 weeks. Page 3 showed up first while page 1 showed up in the index shortly thereafter. When searching for the term that the pages were optimized for, Page 3 received rankings while page 1 did not show up. Page 2 was never indexed on Google or MSN, but did show up on Yahoo eventually.

After 3 months of page 2 not showing up in the index, I changed it: I added punctuation in random places (different places than page 3) and removed the heading, paragraph, and break tags.

Just a few days after making the changes, page 2 started showing up in the search results for the same term, and not as a supplemental result. It was a unique ranking all its own.

The result suggests that the search engines use punctuation to break up the content into shingles. So when optimizing a template that will create many dynamically generated pages, it's good to use bullet points, keep your sentences short, and mix in the dynamic content as much as possible (without being too repetitive).
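Applied to a dynamic template, that takeaway looks something like this (the {{keyword}} placeholder and the copy are purely illustrative, not the Classmates template):

```html
<!-- Short, punctuated sentences plus bullets, with the dynamic term mixed in -->
<p>Find {{keyword}} classmates fast. Search by school, year, or name.</p>
<ul>
  <li>Browse {{keyword}} profiles.</li>
  <li>Reconnect with old {{keyword}} friends.</li>
</ul>
```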

Monday, August 13, 2007

Official Google Webmaster Central Blog: Malware reviews via Webmaster Tools


What is malware?

Malware is malicious software. A common form of it tracks your moves online and feeds that information back to marketing groups so that they can send you targeted ads.

Malware can be downloaded without you even knowing: playing online games, or installing software that bundles it without informing you, can quietly install programs that track your every move.

Most malware will show you random pop-up ads, but it can also slow down or even crash people's computers. The malicious software will take people's personal information and abuse it.

Google will now show warnings in the search engine results page (SERP) when a website hosts malware. Some site owners don't even realize that something on their website is serving malware, so Google is letting webmasters know through Google Webmaster Tools when malicious software is detected on their site so that they can remove it.

I always love to see the user come first in websites and search engine optimization, and this is another great way Google is helping webmasters optimize their sites while paying attention to the user.

Problems with multiple domains

There are a lot of websites and companies that buy up multiple domains in order to try to corner the market and weed out the competition. While at Classmates.com I found a long list of domains that the company owned, some of which were parked, and others were resolving to the same DNS as the www.classmates.com domain. The problem with multiple domains pointing to the same DNS, for SEO, is that the search engines see an exact match right down to the same directory, filename, and content on the page. The end result is that one or more of the domains could be penalized or even banned for being duplicates.

I ran into the same issue with the professional women's organization - the original domain was www.pwoo.org, which was later changed to www.professionalwomenonline.com. The www.pwoo.org domain was used when I was quoted in articles and marketing publications, so I wanted those who came from there to still arrive at the same site even though the domain changed. With the two domains pointing to the same site, I struggled with ranking issues due to split links and duplicate content. So I removed the www.pwoo.org domain from the Google index so that it won't obtain rankings anymore. The www.professionalwomenonline.com domain is now getting great rankings, and users who type in www.pwoo.org can still see the same site.

How did I resolve the issue?
Google has a great addition to their Webmaster Tools in which you can remove files, directories, and even whole domains. By removing one of the domains from the index, the other can be left to obtain rankings without being blacklisted. The only catch is that the pages have to return a 404 (page not found) in order for the domain to be removed.
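Another common way to consolidate duplicate domains is a permanent (301) redirect, which sends both users and search engines from the old domain to the new one. A sketch of what that could look like in an Apache .htaccess file, assuming the old domain is served by Apache with mod_rewrite enabled (the domain names are the ones from this post):

```apache
# Illustrative sketch: 301-redirect every request on the old
# domain to the same path on the new domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?pwoo\.org$ [NC]
RewriteRule ^(.*)$ http://www.professionalwomenonline.com/$1 [R=301,L]
```

Unlike the removal-tool approach, a 301 keeps the old URLs working for visitors while telling the engines which domain is the one that should rank.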

  1. Back up your files on a computer or a separate server, and remove them all (just temporarily).
  2. Go to your Google Webmaster Tools and select the site you wish to remove (if you don't have a site set up in Google Webmaster Tools, I would advise creating an XML sitemap and setting up an account today).
  3. From your Webmaster Tools site dashboard, select "URL Removals" for your website.
  4. Click "New Removal request".
  5. In this case you want to select "Remove your site from appearing in Google search results."
  6. Confirm by clicking "Yes, I want to remove this entire site."
  7. The site is then added for removal.
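If I remember the removal tool's requirements correctly, blocking the whole domain in robots.txt was an accepted alternative to making every page 404, which avoids taking the files down even temporarily - worth verifying against the tool's current help pages before relying on it:

```
# robots.txt on the domain being removed: block all crawlers site-wide
User-agent: *
Disallow: /
```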