| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 3934.1 | | RANGER::WASSER | John A. Wasser | Tue Aug 06 1996 12:34 | 11 |
| 3934.2 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Tue Aug 06 1996 12:49 | 21 |
| 3934.3 | | RANGER::WASSER | John A. Wasser | Wed Aug 07 1996 09:54 | 20 |
| 3934.4 | Typos -- Corrected in current build | NETRIX::"chris.lord@ljo.dec.com" | Chris Lord | Wed Aug 07 1996 17:12 | 8 |
| 3934.5 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Wed Aug 07 1996 19:28 | 52 |
| 3934.6 | additional notes | TUXEDO::ROSENBAUM | Rich Rosenbaum | Thu Aug 08 1996 23:31 | 14 |
| 3934.7 | | MR1MI1::VILCANS | | Fri Aug 09 1996 13:18 | 9 |
| 3934.8 | Warning: Crawler on the Intranet is following links w/query arguments! | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Mon Mar 24 1997 16:30 | 27 |
| > - at some point the crawler will have the (optional) ability to
> crawl link URLs that contain "?" query arguments. This is necessary
> to support Lotus Domino servers which use query syntax for
> perfectly ordinary pages (well not _completely_ ordinary, they are
> dynamically generated from Lotus databases).
Looks like this support is now there. Someone's trying to index
my site, including the generated URLs that contain query
arguments. It's going to take them probably 9 million hits
(or more, to get all the combinations of authors and notesfiles
and personal names) to complete.
The user agent that I'm getting hit by is:
User-Agent: AltaVista Intranet V1.0 sbu_antony van-trung.truong@aty.mts.dec.com
and I've sent mail to them letting them know that they'd better
have a lot of disk space (and time; at the rate they are going,
I compute it's going to take them 5+ years to finish :-).
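(Rough arithmetic, assuming an average of one fetch every 15-20
seconds, which is only a guess: 9,000,000 fetches comes to about
1.4-1.8 hundred million seconds, i.e. roughly 4.3 to 5.7 years,
so 5+ years is the right ballpark.)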
Can't the crawler follow links with query arguments only if
the server *is* a "Lotus Domino" server?
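For illustration only: the check could key off the Server: response
header, which Domino servers typically report as "Lotus-Domino". A
rough sketch in modern Python follows; the helper names and probing
the root page are my assumptions, not the crawler's actual logic.

```python
import http.client
from urllib.parse import urlsplit

def host_is_domino(url):
    """Probe a host's root page with HEAD and inspect the Server:
    header; Domino servers typically report 'Lotus-Domino/...'."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=10)
    try:
        conn.request("HEAD", "/")
        server = conn.getresponse().getheader("Server", "")
    finally:
        conn.close()
    return server.startswith("Lotus-Domino")

def should_follow(url):
    # Follow '?' links only on hosts that look like Domino servers.
    if "?" not in url:
        return True
    return host_is_domino(url)
```

In practice you'd cache the answer per host rather than probing on
every link, but the gating logic would be the same.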
I'd add support for .txt files to my http server so I can support
a robots.txt, but it's too late now as it appears the crawler
only tries to fetch robots.txt once, no matter how many links or
how much time has passed ...
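For what it's worth, a periodic re-read of robots.txt is cheap. A
sketch of the behavior I'd expect, using modern Python's
urllib.robotparser (the one-day TTL is an arbitrary choice, not
anything the crawler documents):

```python
import time
import urllib.robotparser

ROBOTS_TTL = 24 * 60 * 60  # re-read at most daily; an arbitrary choice

class RobotsCache:
    def __init__(self, robots_url):
        self.rp = urllib.robotparser.RobotFileParser(robots_url)
        self._refresh()

    def _refresh(self):
        self.rp.read()       # fetch and parse robots.txt
        self.rp.modified()   # record the fetch time, even on a 404

    def can_fetch(self, useragent, url):
        # Re-fetch when the cached copy is stale, so a robots.txt
        # added after the crawl began is eventually honored.
        if time.time() - self.rp.mtime() > ROBOTS_TTL:
            self._refresh()
        return self.rp.can_fetch(useragent, url)
```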
|
| 3934.9 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Tue Mar 25 1997 08:49 | 8 |
| > I'd add support for .txt files to my http server so I can support
> a robots.txt, but it's too late now as it appears the crawler
> only tries to fetch robots.txt once, no matter how many links or
> how much time has passed ...
How about cutting the connection so that it has to start again?
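Something like this at the server end would do it; a toy sketch in
modern Python's http.server, matching on the User-Agent substring
reported in .8 (not tested against the real crawler):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CRAWLER_UA = "AltaVista Intranet"  # substring from the User-Agent in .8

class DropCrawler(BaseHTTPRequestHandler):
    def do_GET(self):
        if CRAWLER_UA in self.headers.get("User-Agent", ""):
            # Send no response at all; the framework closes the socket
            # after the handler returns, so the crawler sees a dropped
            # connection and has to start over.
            self.close_connection = True
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"normal page handling goes here\n")

if __name__ == "__main__":
    HTTPServer(("", 8080), DropCrawler).serve_forever()
```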
Danny
|