| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 3934.1 | | RANGER::WASSER | John A. Wasser | Tue Aug 06 1996 12:34 | 11 |
| 3934.2 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Tue Aug 06 1996 12:49 | 21 |
| 3934.3 | | RANGER::WASSER | John A. Wasser | Wed Aug 07 1996 09:54 | 20 |
| 3934.4 | Typos -- Corrected in current build | NETRIX::"chris.lord@ljo.dec.com" | Chris Lord | Wed Aug 07 1996 17:12 | 8 |
| 3934.5 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Wed Aug 07 1996 19:28 | 52 |
| 3934.6 | additional notes | TUXEDO::ROSENBAUM | Rich Rosenbaum | Thu Aug 08 1996 23:31 | 14 |
| 3934.7 | | MR1MI1::VILCANS | | Fri Aug 09 1996 13:18 | 9 |
| 3934.8 | Warning: Crawler on the Intranet is following links w/query arguments! | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Mon Mar 24 1997 16:30 | 27 |
| > - at some point the crawler will have the (optional) ability to
> crawl link URLs that contain "?" query arguments. This is necessary
> to support Lotus Domino servers which use query syntax for
> perfectly ordinary pages (well not _completely_ ordinary, they are
> dynamically generated from Lotus databases).
Looks like this support is now there. Someone's trying to index
my site, including the generated URLs that contain query
arguments. It's going to take them probably 9 million hits
(or more, to get all the combinations of authors and notesfiles
and personal names) to complete.
The user agent that I'm getting hit by is:
User-Agent: AltaVista Intranet V1.0 sbu_antony van-trung.truong@aty.mts.dec.com
and I've sent mail to them letting them know that they'd better
have a lot of disk space (and time; at the rate they are going,
I compute it's going to take them 5+ years to finish :-).
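(Rough arithmetic, assuming an average of one fetch every 15-20
seconds, which is only a guess: 9,000,000 fetches comes to about
1.4-1.8 hundred million seconds, i.e. roughly 4.3 to 5.7 years,
so 5+ years is the right ballpark.)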
Can't the crawler follow links with query arguments only if
the server *is* a "Lotus Domino" server?
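For illustration only: the check could key off the Server: response
header, which Domino servers typically report as "Lotus-Domino". A
rough sketch in modern Python follows; the helper names and probing
the root page are my assumptions, not the crawler's actual logic.

```python
import http.client
from urllib.parse import urlsplit

def host_is_domino(url):
    """Probe a host's root page with HEAD and inspect the Server:
    header; Domino servers typically report 'Lotus-Domino/...'."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=10)
    try:
        conn.request("HEAD", "/")
        server = conn.getresponse().getheader("Server", "")
    finally:
        conn.close()
    return server.startswith("Lotus-Domino")

def should_follow(url):
    # Follow '?' links only on hosts that look like Domino servers.
    if "?" not in url:
        return True
    return host_is_domino(url)
```

In practice you'd cache the answer per host rather than probing on
every link, but the gating logic would be the same.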
I'd add support for .txt files to my http server so I can support
a robots.txt, but it's too late now as it appears the crawler
only tries to fetch robots.txt once, no matter how many links or
how much time has passed ...
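For what it's worth, a periodic re-read of robots.txt is cheap. A
sketch of the behavior I'd expect, using modern Python's
urllib.robotparser (the one-day TTL is an arbitrary choice, not
anything the crawler documents):

```python
import time
import urllib.robotparser

ROBOTS_TTL = 24 * 60 * 60  # re-read at most daily; an arbitrary choice

class RobotsCache:
    def __init__(self, robots_url):
        self.rp = urllib.robotparser.RobotFileParser(robots_url)
        self._refresh()

    def _refresh(self):
        self.rp.read()       # fetch and parse robots.txt
        self.rp.modified()   # record the fetch time, even on a 404

    def can_fetch(self, useragent, url):
        # Re-fetch when the cached copy is stale, so a robots.txt
        # added after the crawl began is eventually honored.
        if time.time() - self.rp.mtime() > ROBOTS_TTL:
            self._refresh()
        return self.rp.can_fetch(useragent, url)
```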
|
| 3934.9 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Tue Mar 25 1997 08:49 | 8 |
| > I'd add support for .txt files to my http server so I can support
> a robots.txt, but it's too late now as it appears the crawler
> only tries to fetch robots.txt once, no matter how many links or
> how much time has passed ...
How about cutting the connection so that it has to start again?
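Something like this at the server end would do it; a toy sketch in
modern Python's http.server, matching on the User-Agent substring
reported in .8 (not tested against the real crawler):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CRAWLER_UA = "AltaVista Intranet"  # substring from the User-Agent in .8

class DropCrawler(BaseHTTPRequestHandler):
    def do_GET(self):
        if CRAWLER_UA in self.headers.get("User-Agent", ""):
            # Send no response at all; the framework closes the socket
            # after the handler returns, so the crawler sees a dropped
            # connection and has to start over.
            self.close_connection = True
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"normal page handling goes here\n")

if __name__ == "__main__":
    HTTPServer(("", 8080), DropCrawler).serve_forever()
```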
Danny
|