Showing posts from March 21, 2008

How Search Engines Work

The Spider software ‘crawls the web seeking new pages to gather and add to the search engine indices’. This is a figure of speech. In truth, the spider doesn’t do any ‘crawling’ and doesn’t ‘visit’ any web pages. It requests pages from a website in much the same way that Internet Explorer, Firefox or whatever browser you use requests pages to display on your screen. The difference is that the spider doesn’t gather images or render designs. It is only interested in text, links and the URLs they come from; it displays nothing and gathers as much information as it can in the shortest time possible. A spider loves links because they guide it to other web pages that have the things it loves: more text, links and URLs.

The Index software catches all the Spider can throw at it. The index makes sense of the mass of text, links and URLs using what is called an algorithm, a complex set of mathematical rules that indexes the words, the pairs of words and so on. Fundamentally,