| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 4475.1 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Fri Feb 14 1997 17:10 | 5 |
| I think you may have a problem here. AltaVista Search does not deal
with databases. You also need a web server running on the VAX with the
database, even assuming you can do this.
Danny
|
| 4475.2 | Web will run | 33102::JAUNG | | Fri Feb 14 1997 18:05 | 16 |
| ref .1
Thanks for responding.
Yes, we plan to run a Web server (NT or UNIX) on the middleware to
submit inquiries to VAX/Rdb. One of those inquiries is to read
a table full of text information (not indexed). We "hope" the
AltaVista search engine can help us to search the text contents.
The problem, as indicated in the previous note, is that since
AltaVista Search does not deal with databases, (we think) we may
need to build some kind of index files for specified tables
on the middleware for AltaVista to search through (if we can).
When users want more detailed information, based on the pointers,
we'll retrieve data from Rdb. The question is: will this work, or
are there any better solutions for this?
|
| 4475.3 | | netrix.lkg.dec.com::thomas | The Code Warrior | Sat Feb 15 1997 12:05 | 14 |
| Having done something similar, maybe I can give some advice.
First of all, you will need a way to access the Rdb data via
the Web. This will probably involve custom CGI work (I wrote
my own HTTP server).
To fill the AltaVista index, you can use Scooter to traverse
your Rdb data if your pages allow it (i.e. no forms, a real
hierarchical URL scheme).
If not, you may consider writing your own program to create
the index files directly (that's what I did).
|
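The second option in .3 (creating the index files directly) can be sketched roughly as follows. This is only an illustration, not the actual program from .3: sqlite3 stands in for the Rdb table so the sketch is self-contained, and the table and file names are made up. Each row becomes a static HTML page, plus an index page of plain HREF links that a crawler or indexer can walk.

```python
import sqlite3

# Stand-in for the Rdb table; against Rdb you would submit the same
# SELECT through your CGI/SQL gateway instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(1, "first document text"), (2, "second document text")])

def export_pages(conn):
    """Turn each row into a static HTML page and build an index of
    plain HREF links, so an indexer can reach every row."""
    pages = {}
    links = []
    for row_id, body in conn.execute("SELECT id, body FROM docs ORDER BY id"):
        name = "doc%d.html" % row_id
        pages[name] = "<html><body><p>%s</p></body></html>" % body
        links.append('<a href="%s">Document %d</a>' % (name, row_id))
    pages["index.html"] = "<html><body>%s</body></html>" % "<br>".join(links)
    return pages

pages = export_pages(conn)
```

In practice you would write the pages to disk on the middleware box and point the indexer at the directory; the dictionary here just keeps the sketch testable.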
| 4475.4 | | LGP30::FLEISCHER | without vision the people perish (DTN 381-0426 ZKO1-1) | Sun Feb 16 1997 11:27 | 10 |
| Matt's advice in Note 4475.3 is right on -- I just consulted
with a delivery team on a similar customer project. If you
can put a crawlable web interface on the database, it is then
really easy to have it searched by AltaVista (and in many
cases you would want a web interface to the data anyway).
Otherwise you can use the AltaVista Search SDK to index the
data (again, as 4475.3 suggested).
Bob
|
| 4475.5 | Any document available? | 33102::JAUNG | | Sun Feb 16 1997 15:50 | 7 |
| ref .3 and .4
Thanks for the advice. Are there any documents available that explain
how to put a crawlable web interface on the Rdb (we do plan to use
CGI to submit SQL calls to the Rdb)?
|
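The crawlable interface asked about in .5 amounts to a CGI program that runs a SELECT and emits pages whose navigation is nothing but plain HREF links (no forms, no query strings), so a crawler like Scooter can follow them. A minimal sketch of such a handler; sqlite3 stands in for the Rdb connection, and the `/docs/<id>` URL scheme is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])

def render(conn, row_id):
    """CGI-style handler for a hierarchical URL such as /docs/2:
    emit the row's text plus plain HREF links to the other rows,
    so a crawler can reach every row by following links alone."""
    (body,) = conn.execute(
        "SELECT body FROM docs WHERE id = ?", (row_id,)).fetchone()
    links = ['<a href="/docs/%d">row %d</a>' % (n, n)
             for (n,) in conn.execute(
                 "SELECT id FROM docs WHERE id != ? ORDER BY id", (row_id,))]
    return ("Content-Type: text/html\n\n"
            "<html><body><p>%s</p>%s</body></html>" % (body, " ".join(links)))

page = render(conn, 2)
```

The key design point is the one from .3: every row is reachable through an ordinary link, never through a form submission, so the site looks hierarchical to the crawler.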
| 4475.6 | Consider MS Index Server ? | XSTACY::imladris.ilo.dec.com::grainne | | Mon Feb 17 1997 05:19 | 24 |
|
If for some reason you decide that the AltaVista approach isn't
feasible, it might be worth looking at the Microsoft Index
Server for IIS v3.0 (formerly codenamed Tripoli). The SDK
required to develop filters for custom data formats is
pretty well documented. The IFilter SDK also includes
source code for a sample filter. The documentation, SDK,
and sample filter source code are available from:
http://www.microsoft.com/iis/default.asp
(select 'Using IIS' from the left navigation pane, then
'Developing for IIS', then 'IFilter'.)
A friend of mine is using this approach to index a large
collection of AutoCAD drawings, stored as .DXF files. I would
also expect that either MS themselves or third parties will
in the future provide ready-made filters for all of the major data
types, including ODBC data sources.
|
| 4475.7 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Mon Feb 17 1997 10:05 | 7 |
| > If for some reason you decide that the AltaVista approach isn't
> feasible, it might be worth looking at the Microsoft Index
> Server for IIS v3.0 (formerly codenamed Tripoli.) The SDK
This doesn't run on a VAX.
Danny
|
| 4475.8 | | XSTACY::imladris.ilo.dec.com::grainne | | Mon Feb 17 1997 15:53 | 23 |
| >>> If for some reason you decide that the AltaVista approach isn't
>>> feasible, it might be worth looking at the Microsoft Index
>>> Server for IIS v3.0 (formerly codenamed Tripoli.) The SDK
>> This doesn't run on a VAX.
>> Danny
In .2, the basenoter stated that the HTTP server could be on
NT.
Using the MS IFilter SDK, I don't think you require that the index
server run on the same node as the database server. You could
run both the index server and HTTP server on NT and use the ODBC
driver for Rdb and associated underpinnings to access the
Rdb database.
I know of at least one project (outside of Digital) that's
considering this approach, using IIS v3.0 on NT and an Oracle
database on UNIX. They have the problem that their application
consists almost entirely of ASP pages, generated dynamically
from the contents of their database using the IIS v3.0 ADODB
Active Server Component and the ODBC drivers for Oracle. Therefore,
it cannot be indexed by conventional web search engines.
|
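The split architecture in .8 (index server and HTTP server on NT, reaching the database over ODBC) reduces to opening a connection and pulling each row's text out for the indexer or filter to consume. A rough sketch; sqlite3 stands in here so the snippet is self-contained, and the DSN shown in the comment is hypothetical:

```python
import sqlite3

# sqlite3 stands in so the sketch runs anywhere; with the Rdb ODBC
# driver on NT you would open the connection from a DSN instead,
# e.g. something like connect("DSN=RdbSource") (DSN name invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(1, "press release text"), (2, "product note text")])

def fetch_documents(conn):
    """Yield (doc_id, text) pairs for the indexer to consume -- the
    shape of work a custom filter or index-builder does over ODBC."""
    for row_id, body in conn.execute("SELECT id, body FROM docs ORDER BY id"):
        yield row_id, body

docs = dict(fetch_documents(conn))
```

Because only SELECT traffic crosses the wire, the index server never needs to run on the same node as the database, which is the point being made in .8.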
| 4475.9 | | LGP30::FLEISCHER | without vision the people perish (DTN 381-0426 ZKO1-1) | Tue Feb 18 1997 10:04 | 37 |
| re Note 4475.8 by XSTACY::imladris.ilo.dec.com::grainne:
> They have the problem that their application
> consists almost entirely of ASP pages, generated dynamically
> from the contents of their database using the IIS v3.0 ADODB
> Active Server Component and the ODBC drivers for Oracle. Therefore,
> it cannot be indexed by conventional web search engines.
Dynamic pages per se are not inherently non-indexable by
crawlers.
A web client really can't tell that a given page was the
output of a program as opposed to a file.
What it can tell is whether the page was (or would be)
generated in response to form input, a query, or your basic
HREF link to a URL. It can also be told just to stay away
from certain URLs by the use of the robots.txt file.
Note that "your basic HREF link to a URL" traditionally
returns a (static) file, but there is no guarantee of that.
What makes a web site (or portion thereof) non-indexable are
forms, queries, passwords, and required cookies. (Obviously,
with applets, there are many new opportunities for content
that can't be crawled.) If there is a way to get to all the
content just by clicking on links, a crawler should be able to
find it. (Most crawlers have logic to detect and cut
recursive paths, also.) It doesn't really matter how or when
the page was created.
(Obviously, for crawling to produce useful results, a given
URL when used again should generally return the same page, or
a later version thereof, and not some totally unrelated page
or nothing.)
Bob
|
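The robots.txt mechanism mentioned in .9 can be exercised directly: a well-behaved crawler fetches the file and checks every URL against it before requesting the page. A small sketch using Python's standard urllib.robotparser; the disallowed paths are invented to mirror the forms-and-queries case from the note:

```python
import urllib.robotparser

# A robots.txt that keeps crawlers away from query-style URLs while
# leaving the plain hierarchical pages crawlable (paths invented).
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

allowed = parser.can_fetch("*", "http://example.com/docs/doc1.html")
blocked = parser.can_fetch("*", "http://example.com/cgi-bin/query?id=1")
```

This is the complement of making the site crawlable: the link-reachable pages stay open to the crawler, while the form- and query-driven URLs that would confuse it are explicitly fenced off.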