Tuesday, July 30, 2013

Anatomy of the URL and Stuff

I'm sure you are looking at the URL above and thinking to yourself: "Wow, I never realized that all that stuff meant something." Oddly enough, it actually does. As the World Wide Web has changed into a search-friendly, interactive playground, the form and meaning of the URL have evolved into a significant factor not only in search engine compliance but also in how people use websites. Lately I have been helping clients understand how their websites are structured and how servers, browsers, and users work together. It's something we search optimizers see as simple, yet it can seem very complex to someone who doesn't know how it all works. So here is the URL broken down piece by piece and explained.

First - What is a URL? 
A Uniform Resource Locator (URL) is a website address that holds important information between each "." and "/", much like the address of your home contains a house number, street, city, state, country, etc. This allows the browser to connect to a specific website, directory/path, and/or file so that the user sees the page you want them to see. A URL consists of the following:


Hypertext Transfer Protocol (HTTP)
Established by English physicist Tim Berners-Lee in 1990, the Hypertext Transfer Protocol is a request/response standard in which the client is the application (a user on a web browser such as Internet Explorer, Firefox, Safari, etc.) and the server is the machine that hosts the website itself. The client submitting HTTP requests is typically referred to as the user-agent (or user), while the responding server, which stores or creates resources such as files (.html, .asp, .php, .css, etc.) and images, is referred to as the origin server.*
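To make the request/response idea concrete, here is a minimal Python sketch of the text a user-agent sends when asking for a page, and of how the first line of the server's reply can be read. It only builds and parses strings (no network connection is made), and the user-agent name "ExampleBrowser/1.0" and the host www.domain.com are placeholders, not real values.

```python
def build_get_request(host, path, user_agent="ExampleBrowser/1.0"):
    """Return the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",        # request line: method, resource, version
        f"Host: {host}",               # which website the origin server should serve
        f"User-Agent: {user_agent}",   # placeholder name identifying the client
        "Connection: close",           # ask the server to close after responding
        "",                            # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(line):
    """Split a response status line such as 'HTTP/1.1 200 OK'."""
    parts = line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase is optional
    return parts[0], int(parts[1]), reason


print(build_get_request("www.domain.com", "/index.html"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```

The server answers with a status line (for example "HTTP/1.1 200 OK"), its own headers, and then the file or image that was requested.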

WWW (World Wide Web) or "sub-domain"
The WWW is typically placed before the main domain in your website's URL and refers to the World Wide Web. Remember the game you played in elementary school where you would write your home address starting with your house number, street, city, and state, and then go as far out as your country, your continent, and even Earth? The WWW is the "Earth" part of the address. In some cases, what we call a "sub-domain" can replace the WWW in your URL (technically, www is itself just a conventional sub-domain); a sub-domain references a whole new website within your existing domain. Search optimizers can use this as a way to target certain key terms. For example, a real estate agent targeting a specific city might use http://city.domain.com and thus get a leg up when ranking for anything within that city. In most cases the sub-domains link to the main domain and, since most search engines treat a sub-domain as a domain of its own, those links can count as external link credit, boosting the rankings of the main domain they point to. It is highly recommended that you avoid using sub-domains this way: it only tricks the search engines and in the end will hurt your rankings rather than help them.
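Since the sub-domain is just the label (or labels) in front of the main domain, a short Python sketch can pull a host name apart. It assumes the main domain is always the last two labels, which holds for http://city.domain.com but not for two-part extensions like .co.uk; split_host is a hypothetical helper written for this post, not a standard function.

```python
def split_host(host):
    """Split a host name into (sub-domain, domain, extension).

    Simplifying assumption: the registered domain is the last two labels,
    so two-part extensions such as .co.uk are NOT handled correctly.
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"not a domain name: {host!r}")
    subdomain = ".".join(labels[:-2])  # empty string when there is no sub-domain
    return subdomain, labels[-2], labels[-1]


for host in ["www.domain.com", "city.domain.com", "domain.com"]:
    print(host, "->", split_host(host))
```

Note that "www" and "city" come out in exactly the same position, which is why a search engine can treat each sub-domain as its own site.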

Domain Name System (DNS)
The Domain Name System was established so that the average user can identify the location of a website in simple terms. A website's files are usually stored on a server that is reached at a specific IP address (much like a phone number directs someone's call to your phone). So that the general public can find a website and its files without memorizing numbers, the domain name resolves to that particular IP address. The Domain Name System also stores other types of information, such as the list of mail servers that accept email for a given domain (for addresses like you@yourdomain.com).
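Conceptually, DNS is a lookup table from names to records. The Python sketch below models that with a plain dictionary; the records are hypothetical (192.0.2.10 comes from an address range reserved for documentation), and a real resolver asks DNS servers over the network instead.

```python
# Hypothetical zone data for illustration only. 192.0.2.0/24 is reserved for
# documentation, so this address does not belong to a real website.
ZONE = {
    "yourdomain.com": {
        "A": ["192.0.2.10"],              # IP address of the web server
        "MX": ["mail.yourdomain.com"],    # server that accepts email for the domain
    },
    "www.yourdomain.com": {"A": ["192.0.2.10"]},
}


def resolve(name, record_type="A"):
    """Return the records of the given type for a name, or [] if none exist."""
    records = ZONE.get(name.lower().rstrip("."), {})
    return records.get(record_type, [])


print(resolve("www.yourdomain.com"))     # where the website lives
print(resolve("yourdomain.com", "MX"))   # where mail for you@yourdomain.com goes
```

On a machine with network access, Python's socket.gethostbyname("example.com") asks the operating system's resolver for a real address in much the same spirit.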

Top-level Domain Extension 
The domain extensions originally consisted of the generic .gov, .edu, .com, .mil, .net, and .org. As the internet grew, country code extensions (such as .uk or .de) and other categories were added. The most recognized extension is, of course, .com. If you are optimizing for a specific country and language, the best route is to register your domain with that country's extension. This helps the search engines recognize that you are targeting that particular audience so they can rank the site accordingly. Make sure your country-specific site is written in that country's native language to avoid duplicate content issues. Also be careful about linking from that domain to your main domain, as, once again, the site can be penalized.
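A rough way to tell a country extension from a generic one: every two-letter top-level domain is a country code (.de, .uk, .jp, etc.). The Python sketch below relies on that rule plus a hard-coded set of the original generic extensions; it is a simplification for illustration, not a complete list of extensions, and classify_tld is a hypothetical helper.

```python
# The original generic extensions, written out by hand for illustration.
ORIGINAL_GENERIC = {"com", "edu", "gov", "mil", "net", "org"}


def classify_tld(host):
    """Return (extension, category) for the last label of a host name."""
    tld = host.lower().rstrip(".").rsplit(".", 1)[-1]
    if tld in ORIGINAL_GENERIC:
        return tld, "original generic"
    if len(tld) == 2 and tld.isascii() and tld.isalpha():
        return tld, "country code"  # all two-letter extensions are country codes
    return tld, "other"


for host in ["www.domain.com", "www.domain.de", "www.domain.info"]:
    print(host, "->", classify_tld(host))
```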

Directories and Files 
Here's where the fun stuff comes into play. Just as your computer organizes your Word documents, Excel spreadsheets, and other files into folders, a server structures your website's files the same way. A "directory" or "path" is much like a "folder" on your computer. In standard (old school) HTML development, before the days of dynamic websites powered by databases and user interactivity, a file would be created, named "index.html" or "default.html", and placed either in the main domain folder (the folder on the server that the domain resolves to) or in a named folder to help the webmaster organize the site's files. As technology advanced and database-driven, interactive websites became common, this structure pretty much stayed the same, with the addition of "parameters" that reference part of the database and return content on a page based on those parameters. (Have I lost you yet?) Let's go back to the basic structure of static HTML files and go from there...
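The folder analogy can be sketched in Python: a static web server maps the URL path onto a folder under its web root and falls back to a default file such as index.html when the path names a folder, while anything after "?" is split off as parameters. The web root /var/www/html, the example URL, and the helper names are hypothetical; real servers such as Apache or IIS are configured to do this rather than coded like this.

```python
from pathlib import PurePosixPath
from posixpath import normpath
from urllib.parse import parse_qs, urlsplit

DOCUMENT_ROOT = PurePosixPath("/var/www/html")  # hypothetical web root folder
DEFAULT_FILE = "index.html"                     # served when a folder is requested


def path_to_file(url_path):
    """Map a URL path to a file under DOCUMENT_ROOT."""
    # normpath collapses "." and ".." so a request cannot climb above the root.
    clean = normpath("/" + url_path.lstrip("/")).lstrip("/")
    target = DOCUMENT_ROOT / clean
    if url_path.endswith("/") or clean == "":
        target = target / DEFAULT_FILE  # a folder was requested
    return str(target)


def file_and_parameters(url):
    """Return the file a URL points to plus its parsed query parameters."""
    parts = urlsplit(url)
    return path_to_file(parts.path), parse_qs(parts.query)


print(path_to_file("/"))
print(path_to_file("/about/"))
print(file_and_parameters("http://www.domain.com/products/item.php?id=42&color=red"))
```

The parameters come back as lists (for example {'id': ['42'], 'color': ['red']}) because the same name is allowed to appear more than once in a URL.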

A dynamic website is one that has a few static pages (in other words, pages that are coded and editable only by a developer) with parameters that pull in content or trigger specific actions from a database. A basic dynamic page pulls words, images, etc. from a database and can use them to create multiple pages with different content from one basic page. A more complex dynamic page (or site) is something like Facebook or Twitter, which recognize whether or not you are signed in with a username and password and show you either your profile page (if you are signed in) or a "please sign up" page (if you are not signed in or don't have a username yet).
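The "one basic page, many pages of content" idea, plus the signed-in check, can be sketched with Python's string.Template. The page markup and field names are hypothetical; a real site would use a full templating system and pull the data from its database.

```python
from string import Template

# One page "file" reused for every user; the $-placeholders are filled per user.
PROFILE_PAGE = Template("<h1>$first_name $last_name</h1>\n<p>Email: $email</p>")
SIGNUP_PAGE = "<h1>Please sign up</h1>"


def render_page(user):
    """Show the profile page to a signed-in user, otherwise the sign-up page."""
    if user is None:  # not signed in, or no account yet
        return SIGNUP_PAGE
    return PROFILE_PAGE.substitute(user)


bob = {"first_name": "Bob", "last_name": "Sujo", "email": "bob@bobsemail.com"}
jill = {"first_name": "Jill", "last_name": "Forman", "email": "jill@jillsemail.com"}
print(render_page(bob))
print(render_page(jill))
print(render_page(None))
```

Bob and Jill get two different pages, yet only one page layout was ever written.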
To help understand this, let's talk about how a database works. A database is similar to an Excel spreadsheet or a table in a Word document: each line (or row) has a unique identifier and holds different content. Example:
Username     Email                  First Name   Last Name
Sujo234      bob@bobsemail.com      Bob          Sujo
Forjill23    jill@jillsemail.com    Jill         Forman
In this example the username is the unique identifier with the email, first name, and last name as different parameters for that username.
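The same table can be built in a real (if tiny) database. This Python sketch uses the built-in sqlite3 module with an in-memory database; the table and column names are hypothetical. Declaring username as the PRIMARY KEY is what makes it the unique identifier: the database refuses a second row with the same username.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database that lives in memory
conn.execute(
    """CREATE TABLE users (
           username   TEXT PRIMARY KEY,  -- the unique identifier for each row
           email      TEXT,
           first_name TEXT,
           last_name  TEXT
       )"""
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("Sujo234", "bob@bobsemail.com", "Bob", "Sujo"),
        ("Forjill23", "jill@jillsemail.com", "Jill", "Forman"),
    ],
)


def find_user(username):
    """Return (email, first_name, last_name) for a username, or None."""
    return conn.execute(
        "SELECT email, first_name, last_name FROM users WHERE username = ?",
        (username,),
    ).fetchone()


print(find_user("Forjill23"))

try:
    conn.execute("INSERT INTO users VALUES ('Sujo234', 'x@example.com', 'X', 'Y')")
except sqlite3.IntegrityError as err:
    print("Rejected duplicate username:", err)
```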

Because the content comes from the database, it will be different on each page. With dynamic content the possibilities are endless as far as how many pages you can create from developing and designing just one file. A great example of a dynamic page created for search optimization purposes is on usedcars.com. If you search for "used cars in oslo mn", you see the "UsedCars.com Oslo MN" page in the results. Look at the URL in the address bar when you go to that page: http://www.usedcars.com/browse/mn-24/oslo-163.aspx. In this case the page is pulling in the unique IDs "mn-24" and "oslo-163", just as the username is the unique ID in the table above.
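Here is a rough Python sketch of how a server could read those IDs back out of the path. It assumes each folder or file name follows a "name-number" pattern, which is only a guess from the example URL and not usedcars.com's actual code; ids_from_path is a hypothetical helper.

```python
import re

# Assumed pattern: "<name>-<numeric id>", optionally ending in ".aspx".
SEGMENT = re.compile(r"(?P<name>[a-z]+(?:-[a-z]+)*)-(?P<id>\d+)(?:\.aspx)?")


def ids_from_path(path):
    """Collect {name: id} from path segments such as 'mn-24' or 'oslo-163.aspx'."""
    ids = {}
    for segment in path.strip("/").split("/"):
        match = SEGMENT.fullmatch(segment.lower())
        if match:  # segments like "browse" carry no ID and are skipped
            ids[match.group("name")] = int(match.group("id"))
    return ids


print(ids_from_path("/browse/mn-24/oslo-163.aspx"))
```

With the IDs in hand, the page can look up the matching rows in its database by ID, just as a row in the user table is found by its username.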

SEO Friendly URL 
To make a dynamic URL friendly for search engines, you need to rewrite it. A great resource for rewriting URLs is the Apache Rewriting Guide. Some open source content management systems (such as WordPress, Drupal, etc.) already do the rewriting for you, and all you have to do is enter what you want the URL to be. Be sure to include your key terms separated with dashes ("-") rather than underscores ("_") for search happiness. Who would have thought a URL could be so complicated? But when it comes to search optimization and basic website development, it is very important to understand how the URL works, how it is structured, and how to make sure your site is URL and search engine compliant.

*http://en.wikipedia.org/wiki/Http_protocol
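URL rewriting boils down to matching the friendly address against a pattern and handing the captured pieces to the real dynamic page. The Python sketch below shows both halves: turning a page title into a dash-separated slug, and rewriting a friendly path to a hypothetical internal URL (/browse.aspx?state=...&city=...). On Apache the matching is done by mod_rewrite's RewriteRule directive instead; this code only illustrates the idea, and slugify and rewrite are hypothetical helpers.

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, dash-separated URL slug."""
    # Drop accents (e.g. "é" -> "e") so only plain ASCII letters remain.
    plain = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Spaces, underscores, and punctuation all act as word breaks.
    words = re.findall(r"[a-z0-9]+", plain.lower())
    return "-".join(words)


# Hypothetical rule: /browse/<state>-<id>/<city>-<id>  ->  internal dynamic page
RULE = re.compile(r"/browse/[a-z-]+-(\d+)/[a-z-]+-(\d+)")


def rewrite(friendly_path):
    """Map a friendly path to the (hypothetical) internal URL, if the rule matches."""
    match = RULE.fullmatch(friendly_path)
    if match is None:
        return friendly_path  # no rule applies; serve the path unchanged
    state_id, city_id = match.groups()
    return f"/browse.aspx?state={state_id}&city={city_id}"


print(slugify("Used Cars in Oslo, MN"))
print(rewrite("/browse/mn-24/oslo-163"))
```

Visitors and search engines only ever see the friendly, keyword-rich address; the parameter-style URL stays internal to the server.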