Wednesday, August 15, 2007

SEO Similar Content case study

In April 2007 I attended SES in New York and took part in a duplicate content discussion panel. The panel covered the usual content issues, such as shingles, duplicate content, and boilerplate.
At Classmates I was working on a project built on a single page design fed with dynamic content, which resulted in over 40 million dynamically generated pages. The goal was to optimize these pages individually so that each would act as a landing page when a person searched for a specific word located in our database.
I spoke with the people on the panel and asked them how they would resolve the duplicate content issues on so many dynamically generated pages. They suggested setting up a system in which we could add content to each page individually, taking on a few each week until they all had unique content. The problem was that we didn't have the manpower to do this for over 40 million pages. The pages were user generated (meaning the user filled out the information for each specific page), but exposing the user content was against the privacy policy and also against our business model, which ruled out making the user generated content visible to the search engines in order to make the pages unique.

So back to square one - when the experts can't help you, what do you do?

So I went back to work after the conference and presented the dilemma to my copywriter. We started talking about duplicate content issues and how we could keep our content from being filtered out as boilerplate. Then I thought - "what breaks up the content for each shingle? Is it the code, punctuation, or all of the words counted without code or punctuation?"
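To make the question concrete, here is a minimal sketch of word shingling in Python. It is only an illustration of the general technique, not how any search engine actually works: the shingle size of 4, the `word_shingles` and `jaccard` helper names, and the sample sentences are all assumptions made for this example. It compares the same copy with and without punctuation, once counting words only and once treating each punctuation mark as its own token.

```python
import re

def word_shingles(text, size=4, keep_punctuation=False):
    """Return the set of `size`-token shingles found in `text`."""
    if keep_punctuation:
        # Assumption: every punctuation mark counts as its own token.
        tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    else:
        # Assumption: punctuation is ignored and only words count.
        tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical sample copy: same words, with and without punctuation.
original = "Short sentences help. Bullet points help, too."
stripped = "Short sentences help Bullet points help too"

words_only = jaccard(word_shingles(original), word_shingles(stripped))
with_punct = jaccard(
    word_shingles(original, keep_punctuation=True),
    word_shingles(stripped, keep_punctuation=True),
)
print(words_only, with_punct)
```

If only words are counted, the two versions look identical. If punctuation counts as a token, they share no shingles at all. Which of these is closer to what the engines do is exactly what the test was meant to probe.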

So I decided to run a test case against the question...

We took one block of copy that didn't make sense and added a 3 word phrase not common in the English language, so that the copy was optimized for that phrase. Then we applied it to 3 different basic HTML pages (no CSS, no skin, no JavaScript of any kind, etc.).

  1. Page 1 - the copy with all punctuation removed (not a single quote, period, or comma), as one solid paragraph surrounded by paragraph tags.
  2. Page 2 - the same paragraph, also without its original punctuation, but broken up with heading, paragraph, and break tags.
  3. Page 3 - the paragraph with its punctuation intact, as originally written, surrounded as a whole by paragraph tags.
Links to all three pages were added to the home page of a site that has been in existence for 5 years. The links were labeled "page 1", "page 2" and "page 3" so that the results weren't swayed by the anchor text of the link.
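For anyone who wants to repeat the test, the three variants could be generated from one block of copy with something like the sketch below. The `build_variants` and `strip_punctuation` names are hypothetical, only ASCII punctuation is stripped, and splitting page 2 into rough thirds is an assumption, since the exact placement of the heading, paragraph, and break tags in the original pages isn't recorded here.

```python
import html
import string

def strip_punctuation(text):
    """Remove ASCII punctuation, then collapse runs of whitespace."""
    bare = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(bare.split())

def build_variants(copy):
    """Return the three test page bodies, keyed by their link labels."""
    bare = strip_punctuation(copy)
    words = bare.split()
    # Assumption: page 2 is cut into rough thirds; the original split is unknown.
    third = max(1, len(words) // 3)
    heading = " ".join(words[:third])
    middle = " ".join(words[third:2 * third])
    rest = " ".join(words[2 * third:])
    return {
        # Page 1: no punctuation, one solid paragraph.
        "page 1": f"<p>{bare}</p>",
        # Page 2: no punctuation, broken up with heading, paragraph, and break tags.
        "page 2": f"<h1>{heading}</h1>\n<p>{middle}<br>\n{rest}</p>",
        # Page 3: punctuation as written, one paragraph.
        "page 3": f"<p>{html.escape(copy, quote=False)}</p>",
    }

# Hypothetical nonsense copy, standing in for the real test copy.
pages = build_variants("Blue zebras, oddly, hum. They don't stop!")
for label, body in pages.items():
    print(label, body, sep="\n", end="\n\n")
```

Keeping the words identical across all three pages means the only variables left are the punctuation and the markup.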

It took a little over a week for the pages to show up in the Google index with Yahoo and MSN trailing by 4-6 weeks. Page 3 showed up first while page 1 showed up in the index shortly thereafter. When searching for the term that the pages were optimized for, Page 3 received rankings while page 1 did not show up. Page 2 was never indexed on Google or MSN, but did show up on Yahoo eventually.

After 3 months of page 2 not showing up in the index I changed page 2 and added punctuation in random places. The punctuation was in different places than page 3, and I removed the heading, paragraph, and break tags.

Just a few days after making the changes, page 2 started showing up in the search results for the same term, and not as a supplemental result. It was a unique ranking all its own.

The result suggests that the search engines use punctuation to break up the content. So when optimizing a page template that will produce many dynamically generated pages, it's good to use bullet points, keep your sentences short, and add the dynamic content as much as possible (without being too repetitive).
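As a rough illustration of that advice, a dynamic page body could be built as a bullet list of short, punctuated sentences, each carrying one dynamic value. This is only a sketch: the `render_landing_page` name and the "School" and "City" fields are hypothetical, and real field choices would have to respect the privacy policy mentioned earlier.

```python
import html

def render_landing_page(fields):
    """Render each dynamic value as its own short, punctuated bullet point."""
    items = "\n".join(
        f"  <li>{html.escape(label)}: {html.escape(value)}.</li>"
        for label, value in fields.items()
    )
    return f"<ul>\n{items}\n</ul>"

# Hypothetical, non-private fields for one generated page.
page = render_landing_page({"School": "Example High", "City": "Seattle"})
print(page)
```

Each bullet ends in a period, so the dynamic values sit in short sentences of their own rather than inside one long block of shared copy.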