| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 4475.1 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Fri Feb 14 1997 17:10 | 5 |
| I think you may have a problem here. AltaVista Search does not deal
with databases. You also need a web server running on the VAX with the
database, even assuming you can do this.
Danny
|
| 4475.2 | Web will run | 33102::JAUNG | | Fri Feb 14 1997 18:05 | 16 |
| ref .1
Thanks for responding.
Yes, we plan to run a Web server (NT or UNIX) on the middleware to
submit inquiries to VAX/Rdb. One of those inquiries is to read
a table full of text information (not indexed). We "hope" the
AltaVista search engine can help us to search the text contents.
The problem, as indicated in the previous note, is that since
AltaVista Search does not deal with databases, (we think) we may
need to build some kind of index files for specified tables
on the middleware for AltaVista to search through (if we can).
When users want more detailed information, based on the pointers,
we'll retrieve data from Rdb. The question is: will this work, or
are there any better solutions for this?
|
| 4475.3 | | netrix.lkg.dec.com::thomas | The Code Warrior | Sat Feb 15 1997 12:05 | 14 |
| Having done something similar, maybe I can give some advice.
First of all, you will need a way to access the Rdb data via
the Web. This will probably involve custom CGI work (I wrote
my own HTTP server).
To fill the AltaVista index, you can use Scooter to traverse
your Rdb data if your pages allow it (i.e. no forms, a real
hierarchical URL scheme).
If not, you may consider writing your own program to create
the index files directly (that's what I did).
|
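The second option in .3 (creating the index files directly) can be sketched roughly as follows. This is only an illustration, not the actual program from .3: sqlite3 stands in for the Rdb table so the sketch is self-contained, and the table and file names are made up. Each row becomes a static HTML page, plus an index page of plain HREF links that a crawler or indexer can walk.

```python
import sqlite3

# Stand-in for the Rdb table; against Rdb you would submit the same
# SELECT through your CGI/SQL gateway instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(1, "first document text"), (2, "second document text")])

def export_pages(conn):
    """Turn each row into a static HTML page and build an index of
    plain HREF links, so an indexer can reach every row."""
    pages = {}
    links = []
    for row_id, body in conn.execute("SELECT id, body FROM docs ORDER BY id"):
        name = "doc%d.html" % row_id
        pages[name] = "<html><body><p>%s</p></body></html>" % body
        links.append('<a href="%s">Document %d</a>' % (name, row_id))
    pages["index.html"] = "<html><body>%s</body></html>" % "<br>".join(links)
    return pages

pages = export_pages(conn)
```

In practice you would write the pages to disk on the middleware box and point the indexer at the directory; the dictionary here just keeps the sketch testable.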
| 4475.4 | | LGP30::FLEISCHER | without vision the people perish (DTN 381-0426 ZKO1-1) | Sun Feb 16 1997 11:27 | 10 |
| Matt's advice in Note 4475.3 is right on -- I just consulted
with a delivery team on a similar customer project. If you
can put a crawlable web interface on the database, it is then
really easy to have it searched by AltaVista (and in many
cases you would want a web interface to the data anyway).
Otherwise you can use the AltaVista Search SDK to index the
data (again, as 4475.3 suggested).
Bob
|
| 4475.5 | Any document available? | 33102::JAUNG | | Sun Feb 16 1997 15:50 | 7 |
| ref .3 and .4
Thanks for the advice. Are there any documents available that explain
how to put a crawlable web interface on the Rdb (we do plan to use
CGI to submit SQL calls to the Rdb)?
|
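The crawlable interface asked about in .5 amounts to a CGI program that runs a SELECT and emits pages whose navigation is nothing but plain HREF links (no forms, no query strings), so a crawler like Scooter can follow them. A minimal sketch of such a handler; sqlite3 stands in for the Rdb connection, and the `/docs/<id>` URL scheme is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])

def render(conn, row_id):
    """CGI-style handler for a hierarchical URL such as /docs/2:
    emit the row's text plus plain HREF links to the other rows,
    so a crawler can reach every row by following links alone."""
    (body,) = conn.execute(
        "SELECT body FROM docs WHERE id = ?", (row_id,)).fetchone()
    links = ['<a href="/docs/%d">row %d</a>' % (n, n)
             for (n,) in conn.execute(
                 "SELECT id FROM docs WHERE id != ? ORDER BY id", (row_id,))]
    return ("Content-Type: text/html\n\n"
            "<html><body><p>%s</p>%s</body></html>" % (body, " ".join(links)))

page = render(conn, 2)
```

The key design point is the one from .3: every row is reachable through an ordinary link, never through a form submission, so the site looks hierarchical to the crawler.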
| 4475.6 | Consider MS Index Server ? | XSTACY::imladris.ilo.dec.com::grainne | | Mon Feb 17 1997 05:19 | 24 |
|
If for some reason you decide that the AltaVista approach isn't
feasible, it might be worth looking at the Microsoft Index
Server for IIS v3.0 (formerly codenamed Tripoli). The SDK
required to develop filters for custom data formats is
pretty well documented. The IFilter SDK also includes
source code for a sample filter. The documentation, SDK,
and sample filter source code are available from:
http://www.microsoft.com/iis/default.asp
(select 'Using IIS' from the left navigation pane, then
'Developing for IIS', then 'IFilter'.)
A friend of mine is using this approach to index a large
collection of AutoCAD drawings, stored as .DXF files. I would
also expect that either MS themselves or third parties will
in the future provide ready-made filters for all of the major data
types, including ODBC data sources.
|
| 4475.7 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Mon Feb 17 1997 10:05 | 7 |
| > If for some reason you decide that the AltaVista approach isn't
> feasible, it might be worth looking at the Microsoft Index
> Server for IIS v3.0 (formerly codenamed Tripoli.) The SDK
This doesn't run on a VAX.
Danny
|
| 4475.8 | | XSTACY::imladris.ilo.dec.com::grainne | | Mon Feb 17 1997 15:53 | 23 |
| >>> If for some reason you decide that the AltaVista approach isn't
>>> feasible, it might be worth looking at the Microsoft Index
>>> Server for IIS v3.0 (formerly codenamed Tripoli.) The SDK
>> This doesn't run on a VAX.
>> Danny
In .2, the basenoter stated that the HTTP server could be on
NT.
Using the MS IFilter SDK, I don't think you require that the index
server run on the same node as the database server. You could
run both the index server and HTTP server on NT and use the ODBC
driver for Rdb and associated underpinnings to access the
Rdb database.
I know of at least one project (outside of Digital) that's
considering this approach, using IIS v3.0 on NT and an Oracle
database on UNIX. They have the problem that their application
consists almost entirely of ASP pages, generated dynamically
from the contents of their database using the IIS v3.0 ADODB
Active Server Component and the ODBC drivers for Oracle. Therefore,
it cannot be indexed by conventional web search engines.
|
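The split architecture in .8 (index server and HTTP server on NT, reaching the database over ODBC) reduces to opening a connection and pulling each row's text out for the indexer or filter to consume. A rough sketch; sqlite3 stands in here so the snippet is self-contained, and the DSN shown in the comment is hypothetical:

```python
import sqlite3

# sqlite3 stands in so the sketch runs anywhere; with the Rdb ODBC
# driver on NT you would open the connection from a DSN instead,
# e.g. something like connect("DSN=RdbSource") (DSN name invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(1, "press release text"), (2, "product note text")])

def fetch_documents(conn):
    """Yield (doc_id, text) pairs for the indexer to consume -- the
    shape of work a custom filter or index-builder does over ODBC."""
    for row_id, body in conn.execute("SELECT id, body FROM docs ORDER BY id"):
        yield row_id, body

docs = dict(fetch_documents(conn))
```

Because only SELECT traffic crosses the wire, the index server never needs to run on the same node as the database, which is the point being made in .8.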
| 4475.9 | | LGP30::FLEISCHER | without vision the people perish (DTN 381-0426 ZKO1-1) | Tue Feb 18 1997 10:04 | 37 |
| re Note 4475.8 by XSTACY::imladris.ilo.dec.com::grainne:
> They have the problem that their application
> consists almost entirely of ASP pages, generated dynamically
> from the contents of their database using the IIS v3.0 ADODB
> Active Server Component and the ODBC drivers for Oracle. Therefore,
> it cannot be indexed by conventional web search engines.
Dynamic pages per se are not inherently non-indexable by
crawlers.
A web client really can't tell that a given page was the
output of a program as opposed to a file.
What it can tell is whether the page was (or would be)
generated in response to form input, a query, or your basic
HREF link to a URL. It can also be told just to stay away
from certain URLs by the use of the robots.txt file.
Note that "your basic HREF link to a URL" traditionally
returns a (static) file, but there is no guarantee of that.
What makes a web site (or portion thereof) non-indexable are
forms, queries, passwords, and required cookies. (Obviously,
with applets, there are many new opportunities for content
that can't be crawled.) If there is a way to get to all the
content just by clicking on links, a crawler should be able to
find it. (Most crawlers have logic to detect and cut
recursive paths, also.) It doesn't really matter how or when
the page was created.
(Obviously, for crawling to produce useful results, a given
URL when used again should generally return the same page, or
a later version thereof, and not some totally unrelated page
or nothing.)
Bob
|
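The robots.txt mechanism mentioned in .9 can be exercised directly: a well-behaved crawler fetches the file and checks every URL against it before requesting the page. A small sketch using Python's standard urllib.robotparser; the disallowed paths are invented to mirror the forms-and-queries case from the note:

```python
import urllib.robotparser

# A robots.txt that keeps crawlers away from query-style URLs while
# leaving the plain hierarchical pages crawlable (paths invented).
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

allowed = parser.can_fetch("*", "http://example.com/docs/doc1.html")
blocked = parser.can_fetch("*", "http://example.com/cgi-bin/query?id=1")
```

This is the complement of making the site crawlable: the link-reachable pages stay open to the crawler, while the form- and query-driven URLs that would confuse it are explicitly fenced off.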