Monday, 10 May 2010 13:32

The Swiss Army knife of URL manipulation


Ever since our information started to become digital and everyone got online, a new kind of competition has emerged. The first prize: the highest position in every list of search results. The battle for awareness has gone virtual, and everyone has to choose their weapon by implementing the best solutions offered by SEO (Search Engine Optimization).

Since the mid-1990s, SEO has accumulated more and more rules and best practices aimed at propelling websites to the top of the search results. Today, SEO is considered a subcategory of online marketing, and many companies use it to attract more clients.


Besides HTML tags, special attributes for images, structured links, correct sitemaps, titles, synonyms and word frequency, friendly URLs have become a very important aspect of SEO. Friendly URLs are considered a “White Hat” technique. “White Hat” techniques are the opposite of “Black Hat” techniques, which consist of practices that violate search engine guidelines, such as cloaking or spam. Cloaking means that a search engine “sees” something different in a website's content than a human visitor does. “Black Hat” practices can lead to the exclusion of the website from the search index.

Coming back to friendly URLs, Apache is known as the unofficial leader in URL manipulation, and its best contribution in that field may well be the mod_rewrite module (also known as “the Swiss Army knife”). The mod_rewrite module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs.

 


In practice, this technology lets a site expose static-looking URLs while serving dynamic ones underneath, so that search engines can index the pages and show them as search results. The main reasons for using static URLs (and having dynamic ones “beneath”): they are easy to guess, they illustrate the site structure, they are easy to communicate verbally, they are short enough to paste into an email without wrapping, they look good in a catalog, brochure, or other document, they are easy to remember, and they are easy to type.

To use mod_rewrite, add the line RewriteEngine On to your .htaccess file (or to the VirtualHost section of the server configuration), followed by the actual rules, like the ones below. Note that the examples below are written for the VirtualHost context; inside .htaccess, the pattern is matched against the path without its leading slash (news/... rather than /news/...).
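As a minimal sketch of what such a file might look like, here is a small .htaccess. It assumes that mod_rewrite is loaded, that the server allows overrides for this directory, and that the file sits in the site's document root; it borrows the news.cfm script from the example discussed further down.

```apache
# .htaccess in the site's document root (assumed location)
<IfModule mod_rewrite.c>
    # Switch the rewriting engine on for this directory
    RewriteEngine On

    # Map /news/123 to /news.cfm?id=123.
    # In .htaccess the pattern is matched without the leading slash.
    RewriteRule ^news/([0-9]+)$ /news.cfm?id=$1 [L]
</IfModule>
```

The IfModule wrapper is optional; it simply keeps the server from failing on an unknown directive if mod_rewrite is not available.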

In order to illustrate how mod_rewrite works, here are some examples that I found online (http://www.petefreitag.com/item/503.cfm):

Suppose you have a script called news.cfm that takes an id in the URL; a typical URL might be /news.cfm?id=123. Now suppose we want URLs like /news/123. You can do that like this inside your VirtualHost:

RewriteRule /news/([0-9]+) /news.cfm?id=$1 [PT,L]

The first part of the RewriteRule is the pattern to match, which is a regular expression. The main thing we are looking for is the id that appears after /news/. This is simply an integer, so we can use [0-9]+ to match it. The + means one or more digits (you could use * to mean zero or more).

You will notice that in our RewriteRule we have put this pattern in parentheses. This allows us to use it in the URL we are rewriting to (the second part of the RewriteRule); it's called a backreference. We can refer to the backreference using $1; if you have more than one backreference, use $2, and so on.

The third part of the RewriteRule is a set of options that you may or may not need. PT stands for pass through: it tells Apache to pass the new URL on to other modules. This is usually needed if you're depending on other modules; for instance, with ColdFusion you need to pass the request through to the ColdFusion Apache module.

Finally, L means that if the rule matches, no further rules are checked. It is usually a good idea to include this to save processing.
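To illustrate a rule with more than one backreference, here is a small sketch. The archive.cfm script and its year and month parameters are hypothetical and are not part of the original example; the rule is written for the VirtualHost context, like the one above.

```apache
# Hypothetical: /archive/2010/05 -> /archive.cfm?year=2010&month=05
# $1 is the first parenthesised group (four digits),
# $2 is the second one (two digits).
# ^ and $ tie the pattern to the whole path.
RewriteRule ^/archive/([0-9]{4})/([0-9]{2})$ /archive.cfm?year=$1&month=$2 [PT,L]
```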

It is a good idea to rewrite URLs that users may try to guess. Suppose you are Apple and you have just released Tiger; some people may try going to apple.com/tiger instead of apple.com/macosx. In this case you want a permanent redirect, using the 301 HTTP status code.

RewriteRule /tiger.* /macosx/ [R=301,L]

We simply use the option R=301, which tells your browser and other clients that /tiger has moved to /macosx/. Your browser then requests /macosx/ instead, and that is the address you will see in the location bar.

One final note: it is usually good practice to use the ^ character to match the beginning of the URL pattern and $ to match the end. Back to our first rule, we might use the following pattern instead: ^/news/([0-9]+)$. With the first, unanchored pattern, the rule may also match something like /foo/news/123 instead of just /news/123.
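Put together, the two rules from this article might look as follows with anchors applied. This is only a sketch, assuming the VirtualHost context used in the original examples.

```apache
# Anchored versions of the two example rules (VirtualHost context assumed)
RewriteEngine On

# Matches /news/123 only; /foo/news/123 and /news/123/extra no longer match.
RewriteRule ^/news/([0-9]+)$ /news.cfm?id=$1 [PT,L]

# Matches any path beginning with /tiger, but not /foo/tiger.
RewriteRule ^/tiger.*$ /macosx/ [R=301,L]
```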

Sources:

  1. http://thinkingandmaking.com/entries/130
  2. http://www.alistapart.com/articles/succeed/
  3. http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html
  4. http://www.petefreitag.com/item/503.cfm

Images:

  1. http://www.globalicon.com/images/seo-landing-img.jpg
  2. http://4.bp.blogspot.com/_vxIG3ZqOjGQ/SiaVpZ7W7PI/AAAAAAAAAOE/aSSbzj3oXOs/s320/seo+friendly+URL.png

Last modified on Monday, 10 May 2010 13:59
