Thursday, September 18, 2008

How AltaVista Works

AltaVista has an index that is built by sending out a crawler (a robot program) that captures text and brings it back.



The main crawler is called "Scooter." Scooter sends out thousands of threads simultaneously. Twenty-four hours a day, seven days a week, Scooter and its cousins access thousands of pages at a time, like thousands of blind users grabbing text, pulling it back, and throwing it into the indexing machines so that the text can be in the index the next day. At the same time, they extract every hyperlink they find on those pages and add it to a list of where to go next.
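The capture-text-and-queue-links loop described above can be sketched as a breadth-first crawl. This is a minimal, single-threaded illustration, not Scooter's actual code: the real crawler ran thousands of threads in parallel, while here `crawl`, `LinkExtractor`, and the `fetch` callable (URL in, HTML string or `None` out) are hypothetical names chosen for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the text on a page and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: capture each page's text, queue its links.

    `fetch` is a hypothetical callable: url -> HTML string, or None if missing.
    Returns a dict url -> captured text (what would be handed to the indexer).
    """
    frontier = deque(seed_urls)   # the "list of where to go next"
    seen = set(seed_urls)         # never queue the same URL twice
    captured = {}
    while frontier and len(captured) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:          # nothing at that address; skip it
            continue
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()            # flush any text buffered after the last tag
        captured[url] = " ".join(parser.text)
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return captured
```

With a tiny in-memory "Web", `crawl(["http://example.com/"], site.get)` starts at the home page, follows its links, and returns the text of every page it could reach.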





On a typical day, Scooter and its cousins visit over 10 million pages. The more hyperlinks there are from other pages to yours, the better your chances of being found. But if this is your own personal site, or a brand-new Web page, it is unlikely to have many inbound links yet.



AltaVista has an incredibly large database of Web sites, so searches often return hundreds of thousands of matches. AltaVista's spider follows links only about three pages deep into your site. This is important to remember if you have topical pages that can't be reached within three clicks of the main page: you will have to submit them separately.
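To find out which of your pages fall outside that three-click range, you can compute each page's click depth from the main page over your own site map. The sketch below assumes a hypothetical site map given as a dict of page -> pages it links to; the cutoff of 3 mirrors the article's "about three pages" and is approximate, not a documented AltaVista limit.

```python
from collections import deque


def click_depths(links, home):
    """Minimum number of clicks from `home` to every reachable page (BFS)."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def pages_to_submit(links, home, max_depth=3):
    """Pages deeper than max_depth, or unreachable from home, sorted by URL.

    These are the pages a depth-limited spider would likely miss, so they
    are candidates for submitting by hand.
    """
    depth = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in all_pages if depth.get(p, max_depth + 1) > max_depth)
```

For a chain `/ -> /a -> /b -> /c -> /d` plus an unlinked `/orphan` page, `pages_to_submit` reports `/d` (four clicks deep) and `/orphan` (not linked at all).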



You cannot tell AltaVista how to index your site; it is all done by their spider. You can, however, go to their site and give the spider a nudge by submitting specific pages, so that it knows to visit and index them. Once you have done that, it's all up to your META tags and your page's content! AltaVista's spider may revisit your site each month after its initial visit.



AltaVista's ranking algorithms reward keywords in the title tag. If a keyword is not in the title tag, the page will likely not appear anywhere near the top of the search results! AltaVista also rewards keywords that appear near one another and keywords near the beginning of a page.
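To make those three signals concrete (keyword in the title, keywords close together, keywords early in the page), here is a toy scoring function. It is purely illustrative: AltaVista's real algorithm was never published, and the function name `toy_score` and every weight in it are made up for this example.

```python
import re


def toy_score(query, title, body):
    """Illustrative relevance score; the weights are invented, not AltaVista's."""
    terms = [t.lower() for t in query.split()]
    title_words = re.findall(r"\w+", title.lower())
    body_words = re.findall(r"\w+", body.lower())
    score = 0.0
    for term in terms:
        if term in title_words:
            score += 10.0                      # keyword appears in the title
        if term in body_words:
            first = body_words.index(term)
            score += 5.0 / (1 + first)         # earlier first occurrence scores higher
    # Proximity: the closer together the terms' first occurrences, the higher.
    positions = [body_words.index(t) for t in terms if t in body_words]
    if len(positions) > 1:
        span = max(positions) - min(positions)
        score += 5.0 / (1 + span)
    return score
```

A page titled "Vintage Guitars for Sale" whose text opens with "Vintage guitars..." scores far higher for the query `vintage guitars` than a "Welcome" page that mentions the phrase only at the end of its first paragraph.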



Add a Page



Adding a page through AltaVista's Add URL form doesn't guarantee that the page will be listed, and it usually takes around 4 to 6 weeks to show up. You don't need any special authority to "add a page." AltaVista is not a directory like Yahoo!, where the information provider has to submit information and prove they are who they say they are. AltaVista simply goes to the address, checks it, and brings back whatever text it finds there.



If you give it a URL for a page that doesn't exist, it will come back with Error 404, which means there is no such page. If that page was in the index, it will remove that page from the index the next day.
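The add-or-remove behavior described above can be modeled as a small index update keyed on the HTTP status code. This is a sketch under stated assumptions: `refresh_page` and the `fetch` callable (URL in, `(status_code, text)` out) are hypothetical, and the update here happens immediately rather than the next day as the article describes.

```python
def refresh_page(index, url, fetch):
    """Re-fetch one submitted URL and update a toy index (dict) in place.

    `fetch` is a hypothetical callable returning (status_code, text).
    404 removes the URL from the index; 200 stores the page's text;
    any other status leaves the index unchanged.
    """
    status, text = fetch(url)
    if status == 404:
        index.pop(url, None)   # no such page: drop it if it was indexed
    elif status == 200:
        index[url] = text      # page exists: keep whatever text was found
    return status
```

Submitting a dead URL through this function removes its stale entry, which is exactly the effect the next paragraph relies on.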



This is important for several reasons. Say you have changed the directory structure of your Web site. First, go to AltaVista and use Add a Page for all the old addresses, which removes the old information from the index. Then use Add a Page for all the new addresses. Likewise, if you made an embarrassing typo or posted a document you shouldn't have, and have since removed that page from the Web, you can submit its URL through Add URL to make sure the information does not linger in the index.
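For a restructured site, the two-step procedure (old addresses first, then new ones) can be turned into a submission list by comparing the old and new URL sets. The sketch below assumes a variation on the article's advice: it submits only addresses that actually changed, skipping URLs present in both sets. The function name `submission_order` is hypothetical.

```python
def submission_order(old_urls, new_urls):
    """Order Add URL submissions after a site restructure.

    Old addresses that no longer exist come first, so the spider gets a 404
    and drops them; new addresses follow so they get indexed. URLs present
    in both sets are assumed unchanged and are not resubmitted.
    """
    old, new = set(old_urls), set(new_urls)
    removed = sorted(old - new)
    added = sorted(new - old)
    return removed + added
```

Moving `/a` and `/b` into a new `/x/` directory while leaving `/c` in place yields `["/a", "/b", "/x/a", "/x/b"]`.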
