Spider spotting
You can monitor and evaluate how well your search engine submissions are working by two
methods: spider spotting and URL checking.
Search engine spiders that visit your site and crawl its pages leave distinctive traces
in your access log. These traces tell you whether a spider has visited, which pages it
requested, and how often and for how long it visited.
The best way to identify spider visits is to find out which visitors requested the file
robots.txt from your site. In practice only spiders request this file, because it tells
them which parts of the site they should not crawl, so a well-behaved crawler checks for
it before fetching anything else. If you analyze your access log with a log analysis
tool, you can spot every visit that began with this request.
You can then look at the host name of each such visitor and relate it to the major search
engines. Host names usually contain the search engine company’s name, since the host is
the site that runs the spider. The agent (browser) name that each search engine's spider
reports is another way to identify these visits. Get a list of host names and agent names
from available resources (these names tend to change often), and build your own list as
well by searching your access logs for all occurrences of known engine, host or agent
names. Concentrate on the top engines, even though you may find several smaller and
lesser-known search engines visiting your site.
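
As a rough illustration of spotting robots.txt requests, here is a minimal Python sketch. It assumes the access log uses the common Apache/NCSA "combined" format, which may not match your server's configuration. The function name robots_requesters and the two sample log lines are invented for this example and are not taken from a real log.

```python
import re

# Assumption: Apache/NCSA "combined" log format. Adjust the pattern
# if your server writes a different layout.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def robots_requesters(lines):
    """Return {host: agent} for every visitor that requested /robots.txt."""
    found = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").split("?")[0] == "/robots.txt":
            # The agent is missing when the log has no user-agent field.
            found[m.group("host")] = m.group("agent") or ""
    return found


# Hypothetical sample lines, made up for illustration only.
sample = [
    'crawler1.altavista.com - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" "Scooter/2.0"',
    'dialup-42.example.net - - [10/Oct/2000:13:56:01 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.08"',
]

print(robots_requesters(sample))
```

The hosts and agent names it returns are the candidates to compare against your list of known search engine names.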
Pay attention not only to the total number of visits but also to the activity pattern of
each recent visit, so that you can judge how many pages the spider actually covered. This
is a very good way of checking whether your submissions have worked, or whether other
inducements such as links from other sites have worked. It also helps you evaluate
separately the effectiveness of your site's submission, indexing and page ranking.
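
One way to judge how many pages a spider covered is to count the distinct paths each known spider host requested. The sketch below is only an approximation under stated assumptions: it expects each log line to start with the host name and to contain a quoted GET or HEAD request, and the names pages_per_spider, spider_hosts and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: the line begins with the host name and contains a quoted
# request such as "GET /page.html HTTP/1.0".
REQUEST = re.compile(r'^(\S+) .*?"(?:GET|HEAD) (\S+)')


def pages_per_spider(lines, spider_hosts):
    """Count the distinct paths requested by each host in spider_hosts."""
    pages = defaultdict(set)
    for line in lines:
        m = REQUEST.match(line)
        if m:
            host, path = m.groups()
            if host in spider_hosts:
                pages[host].add(path)
    return {host: len(paths) for host, paths in pages.items()}


# Hypothetical sample lines, made up for illustration only.
sample = [
    'spider.example.com - - [10/Oct/2000:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 200 68',
    'spider.example.com - - [10/Oct/2000:13:55:40 -0700] "GET /a.html HTTP/1.0" 200 512',
    'spider.example.com - - [10/Oct/2000:13:55:44 -0700] "GET /a.html HTTP/1.0" 200 512',
    'spider.example.com - - [10/Oct/2000:13:55:48 -0700] "GET /b.html HTTP/1.0" 200 734',
    'visitor.example.net - - [10/Oct/2000:13:56:01 -0700] "GET /a.html HTTP/1.0" 200 512',
]

print(pages_per_spider(sample, {"spider.example.com"}))
```

Repeat requests for the same page are counted once, so the result reflects coverage rather than raw hit counts.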
Some examples of host names and agent names are given below:
• AltaVista: the host name may contain altavista.com; the agent is often called Scooter.
• Excite: the host name may contain atex or excite.com; the agent name is Architextspider.
• Inktomi: the host and agent names contain inktomi.com; Slurp is often used as the
agent name.
• Lycos: the host name contains lycos.com; Lycos Spider is often part of the agent name.
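
The examples above can be turned into a simple lookup. The following Python sketch assumes case-insensitive substring matching is good enough, and it treats underscores in agent names as spaces, which is a guess about how agent strings may be written. The names ENGINE_HINTS and identify_engine are hypothetical, and the hint list should be updated as engines change their names.

```python
# Hints taken from the list above; all matching is by lowercase substring.
ENGINE_HINTS = {
    "AltaVista": {"hosts": ["altavista.com"], "agents": ["scooter"]},
    "Excite": {"hosts": ["atex", "excite.com"], "agents": ["architextspider"]},
    "Inktomi": {"hosts": ["inktomi.com"], "agents": ["inktomi.com", "slurp"]},
    "Lycos": {"hosts": ["lycos.com"], "agents": ["lycos spider"]},
}


def identify_engine(host, agent):
    """Return the engine whose hints match the host or agent, else None."""
    host = host.lower()
    # Assumption: treat "_" as a space so "Lycos_Spider" matches "lycos spider".
    agent = agent.lower().replace("_", " ")
    for engine, hints in ENGINE_HINTS.items():
        host_hit = any(h in host for h in hints["hosts"])
        agent_hit = any(a in agent for a in hints["agents"])
        if host_hit or agent_hit:
            return engine
    return None


print(identify_engine("crawler1.altavista.com", "Scooter/2.0"))
print(identify_engine("dialup-42.example.net", "Mozilla/4.08"))
```

Short hints such as atex can also match unrelated host names, so treat a match as a lead to confirm rather than proof.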