THE AMAZING CRAWLERS!


Nope, I’m not talking about Spider-Man, but if the internet had superheroes, they would probably be the “crawlers” I’m talking about! Crawlers, also known as “spiders” or “robots”, form an important part of search engine technology. These invisible programs are sent out constantly by search engines such as Google and Bing to crawl billions of web pages. They collect information from the web to build the searchable index that we query every time we look something up.

Here’s a useful video by Google that talks about crawlers (or spiders, as they call them):

Quoting Matt, basically they “start by crawling a few web pages, then they follow the links on those pages and fetch the pages they point to, and follow all the links on those pages and fetch the pages they link to, and so on, until they’ve indexed a big chunk of the web.”
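To make that a little more concrete, here is a tiny Python sketch of the "follow the links, then follow their links" idea. It is only an illustration: the `LINKS` dictionary is a made-up miniature web and `crawl` is a hypothetical helper, not how Google's crawler actually works. A real crawler would fetch pages over HTTP and parse the links out of the HTML.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["post-1", "contact"],
    "contact": [],
}

def crawl(seeds, links, limit=100):
    """Visit pages breadth-first by following links; return them in visit order."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)            # stand-in for "fetch and index this page"
        for target in links.get(page, []):
            if target not in seen:    # skip pages already queued or visited
                seen.add(target)
                queue.append(target)
    return order

print(crawl(["home"], LINKS))
```

Notice that a page only gets visited if some other page links to it. That is the whole reason internal linking matters so much to crawlers.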

(so…what?)

BASICALLY, CRAWLERS DETERMINE WHAT SHOWS UP ON SEARCH RESULTS

These programs determine whether the information you publish on your webpage gets indexed by the search engines. So the smart thing to do as a marketeer is to 1) make sure crawlers get to see what you want them to see, and 2) avoid leading them to places where you do not need them to be!

How do you optimise your website to be “crawler-friendly”?

A simple search on Google turns up many websites that teach us how to make a website crawler-friendly. Some basic tips include:

2) Simplifying URLs: Crawlers, like us humans, like URLs that look logical. URLs that have many numbers and weird symbols don’t make much sense to crawlers and may be less likely to be crawled and indexed properly. URLs that are named properly, on the other hand, make navigation easy, benefiting both the crawlers AND the readers.

3) Using alt text: Sadly, crawlers are blind to images. So in order for them to know what your images are about, they read the alternative text you set for the images. Hence it is important to give the images on your site alt text that is relevant to their content!

4) Being careful with rich media: Text still plays a big role in reaching out to crawlers. They may have some trouble reading rich media such as JavaScript, Flash or Silverlight. However, if removing them completely would make your site less interesting, then it might be a bad idea to do so. Furthermore, Google has improved its methods of indexing Flash files.
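The URL and alt text tips are easy to check with a few lines of Python. The sketch below is just one possible approach using the standard library: `slugify`, `AltTextAudit` and `images_missing_alt` are names I made up for illustration, and the sample title and HTML are invented.

```python
import re
from html.parser import HTMLParser

def slugify(title):
    """Turn a page title into a short, readable URL segment."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

class AltTextAudit(HTMLParser):
    """Collect the src of every <img> whose alt text is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # A missing alt, or one that is blank, is flagged for review.
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src"))

def images_missing_alt(html):
    """Return the src values of images that a crawler could not 'read'."""
    audit = AltTextAudit()
    audit.feed(html)
    return audit.missing

print(slugify("The Amazing Crawlers!"))
print(images_missing_alt('<img src="a.png" alt="A spider"><img src="b.png">'))
```

Treat the list of flagged images as things to look at rather than errors: an empty alt can be fine for a purely decorative image, but anything that carries meaning deserves a description.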

On the other hand, if you have a site that includes many different (unnecessary) pages, you might want to block crawlers from reading them! That way, crawlers can spend their time on the pages you actually want featured in the search engines. You can do this with a robots.txt file, which tells crawlers which pages to stay away from. Here’s a great guide that teaches us how to create such files!
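To give a feel for what such a file looks like, here is a small sketch that uses Python's built-in `urllib.robotparser` to read a robots.txt and check which URLs a well-behaved crawler would be allowed to visit. The rules, the paths and the `example.com` address are all made up for illustration, and `allowed` is just a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="*"):
    """Return True if the rules above let the given crawler fetch the URL."""
    return parser.can_fetch(agent, url)

print(allowed("https://example.com/blog/amazing-crawlers"))
print(allowed("https://example.com/drafts/half-finished-post"))
```

The `User-agent: *` line means the rules apply to every crawler, and each `Disallow` line names a path prefix that crawlers are asked to skip.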

From what I see, crawlers are hard-working creatures that follow links and absorb almost everything you present on the web. Making our websites crawler-friendly, by directing crawlers to the pages we want indexed and keeping them away from the pages we don’t, definitely helps in rising up the search rankings!

Was this post helpful? Did I miss anything out? Comment and let me know!
