| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 4569.1 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Tue Mar 25 1997 11:59 | 7 |
| I seem to remember that Jeff Mogul had done some analysis of caching
on the proxy server in Palo Alto and that the general conclusion was that
it wasn't really worth it, the variety of requests was just too broad to
get much efficiency by using caching. I believe that Palo Alto no longer
caches pages in the proxy server.
Danny
|
| 4569.2 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Tue Mar 25 1997 12:27 | 11 |
| > I seem to remember that Jeff Mogul had done some analysis of caching
> on the proxy server in Palo Alto and that the general conclusion was that
> it wasn't really worth it, the variety of requests was just too broad to
> get much efficiency by using caching.
	Did he do that analysis on the proxy servers as well? I
remember reading in here that someone (Jeff?) had done an
analysis on the requests coming into the AltaVista search
engine and reached the conclusion that it was not worthwhile
to cache search results for the same reasons (an AltaVista
NOTES search could probably find that string of notes :-)
|
| 4569.3 | Re: The proxy cache efficiency discussion topic | QUABBI::"ongbh@zpo.dec.com" | Ong Beng Hui | Tue Mar 25 1997 23:38 | 62 |
|
Hi,
> I am looking for figures, recommendations etc. concerning caching in
> these proxy servers.
>
> My largest customer has a 10 Mb link to the Internet, about 300000
> requests per day for a total of about 2 Gb of traffic.
> They use 140 Netscape threads (version 2.5) on Digital UNIX with a 2
> drive cache (RZ28) with a maximum size of 2 Gb. Only http is being cached.
	I have a customer (an ISP) here with 2 x 4100 and 2 x 8200 as proxy
caches. The total storage of the four machines is around 100 GB and they
are running Harvest from Network Appliance (www.netapp.com).
	Requests run around 1.5 million per day per proxy. Their cache efficiency
is around 40-50%.
	There are two reasons for the deployment of a proxy cache: content
filtering and caching. A single international E1 costs around as much as a
loaded 4100 per month, so a 50% bandwidth saving can easily be translated
into 1 new 4100 every month.
> I have done some analysis on the data, but can not find an obvious way
> to increase the efficiency.
>
> Questions:
> 1) What is the general opinion on caching, that is at which point
> should one cache. At 5% efficiency, at 10%, at 40%. Depending
> on linespeed ?
Sorry, I didn't get you on this...
> 2) What are recommendations for the cache size ? If I increase it
> further, file lookups will increase and the disks might well
> become saturated.
	Squid and Harvest store their index in memory. I am not sure if
Netscape does that. From experience with Squid and Harvest, the
cache size you can run is directly proportional to the amount of memory you have.
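	A minimal sketch of why that is: the in-memory index needs a fixed chunk
of metadata per cached object, so RAM grows with the number of objects the
disk cache can hold. The per-object and mean-object-size figures below are
hypothetical round numbers for illustration, not measurements from this thread.

```python
# Rough RAM estimate for the in-memory index of a Squid/Harvest-style cache.
# Assumed figures (hypothetical): ~100 bytes of metadata per cached object,
# mean object size ~13 KB.

def index_memory_bytes(cache_size_bytes, mean_object_bytes=13 * 1024,
                       metadata_bytes_per_object=100):
    """Estimate RAM needed to index a disk cache of the given size."""
    n_objects = cache_size_bytes // mean_object_bytes
    return n_objects * metadata_bytes_per_object

# A 2 GB cache (the size mentioned in the base note) needs on the order
# of 15 MB of index RAM under these assumptions:
print(index_memory_bytes(2 * 1024**3) / 1024**2)
```

	Doubling the disk cache doubles the object count and hence the index RAM,
which is the proportionality noted above.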
> 3) What steps can be taken to increase cache efficiency. I am
> thinking of:
> - caching just .gif and .jpg files (they have the best hit ratio)
> - no caching but just preloading the cache based on yesterday's
> access data (only useful for the Netscape server)
> - using different cache scenarios, such as the distributed
> SQUID/Harvest cache.
	You might want to tune your cache expiry factor. Squid and Harvest are
pretty good proxy caches.
> 4) What are normal figures for cache efficiency ?
	I guess cache efficiency differs from one situation to another.
> I know that there is/has been a study on caching, but could not find
> the report. I welcome all remarks, recommendations, thoughts etc.
Check out http://www.nlanr.net/Cache and follow the links.
[posted by Notes-News gateway]
|
| 4569.4 | Re: The proxy cache efficiency discussion topic | QUABBI::"mogul@actitis.pa.dec.com" | Jeffrey Mogul | Wed Mar 26 1997 21:38 | 56 |
|
Danny Mayer wrote:
|> I seem to remember that Jeff Mogul had done some analysis of caching
|> on the proxy server in Palo Alto and that the general conclusion was that
|> it wasn't really worth it, the variety of requests was just too broad to
|> get much efficiency by using caching.
Not quite right. I think you are confusing several things that I
said, perhaps not all in this newsgroup:
In article <4569.2-970325-122624@networking.internet_tools>,
michaud@vaxcpu.enet.dec.com (Jeff Michaud - ObjectBroker) writes:
|>
|> Did he also do that analysis on the proxy servers as well? I
|> remember reading in here that someone (Jeff?) had done an
|> analysis on the requests coming into the AltaVista search
|> engine and reached the conclusion that it was not worthwhile
|> to cache search results for the same reasons (an AltaVista
|> NOTES search could probably find that string of notes :-)
(1) Caching AltaVista responses (or other search-engine responses)
would not yield much benefit. I simulated a perfect cache over
    a 24-hour trace of AltaVista URLs from the middle of last year,
and it would probably get at most a 15% hit rate. A more reasonably
sized cache would get closer to 10%, if my memory is right. (Note
that AltaVista's internal cache holds query results, which is not
the same as a URL-keyed cache.)
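    A "perfect cache" simulation of the kind described here can be sketched
in a few lines: with unbounded storage and no expiry, every request after
the first for a given URL is a hit, so the hit rate depends only on how
often URLs repeat in the trace. The sample trace below is made up for
illustration.

```python
from collections import Counter

def perfect_cache_hit_rate(url_trace):
    """Hit rate of an unbounded, never-expiring URL-keyed cache:
    every request after the first for a distinct URL is a hit."""
    counts = Counter(url_trace)
    requests = sum(counts.values())
    misses = len(counts)  # one compulsory miss per distinct URL
    return (requests - misses) / requests

# Hypothetical query trace: 6 requests, 3 distinct URLs -> 50% hit rate.
trace = ["/q?x=a", "/q?x=b", "/q?x=a", "/q?x=c", "/q?x=a", "/q?x=b"]
print(perfect_cache_hit_rate(trace))
```

    A search-engine workload with mostly one-off queries has few repeats,
which is why even this upper bound came out so low in the trace above.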
(2) For many months, the Palo Alto proxies were run without any
caching, because the disk I/O requirements for handling such
large caches seemed to slow things down. Apparently, the people
who run these proxies have now re-enabled caching. I'm not sure
they have really solved the disk I/O delays, though.
(3) My analyses of other traces suggest that the best that a Web
    cache can do is probably around a 60%-70% hit rate. This assumes
a very large cache, and that our traces are "representative" of
other pools of clients. A 70% hit rate sounds good, but if (for
example) the average hit takes 1 second, and the average miss
takes 10 seconds, then the overall average retrieval time will
be almost four seconds; i.e., a 70% hit rate does not necessarily
correspond to a 70% improvement in perceived performance. (The
situation is complicated by the possibility of reduced congestion.)
If the ratio of hit/miss costs is closer (which it seems to be,
in reality) then caching might be even less impressive to the
actual user.
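    The arithmetic in that example is easy to check: with the figures given
in the note (a 70% hit rate, 1-second hits, 10-second misses), the weighted
average retrieval time comes out near 3.7 seconds, not a 70% speedup.

```python
def mean_retrieval_time(hit_rate, hit_cost_s, miss_cost_s):
    """Overall average retrieval time for a given cache hit rate."""
    return hit_rate * hit_cost_s + (1 - hit_rate) * miss_cost_s

# The example from the note: ~3.7 s, i.e. "almost four seconds".
print(mean_retrieval_time(0.7, 1.0, 10.0))

# If hit and miss costs are closer, the perceived win shrinks further:
print(mean_retrieval_time(0.7, 1.0, 2.0))  # vs 2.0 s with no cache at all
```

    The second call shows the point about hit/miss cost ratios: the closer
the two costs, the less a high hit rate buys the user.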
One of the things I need to spend some time on (when I can
find some free time to spend!) is to figure out a way of
reliably benchmarking the response times provided by proxy
caches. Looking at hit rates is not enough, because if the
cost of a "hit" is too high, it can actually make matters worse.
-Jeff
[posted by Notes-News gateway]
|
| 4569.5 | Re: The proxy cache efficiency discussion topic | QUABBI::"steveg@pa.dec.com" | Steve Glassman | Wed Mar 26 1997 22:18 | 18 |
| Back when I set up the first caching proxy in Digital, I did a
bunch of measurement and wrote it up for the first WWW conference
in 1994. You can read the paper:
http://www.research.digital.com/SRC/personal/Steve_Glassman/CachingTheWeb/CachingTheWeb.html
    The rough numbers I found (and they seem to be supported by most other
studies) give a cache hit rate of roughly 33%. If your company really
has good connectivity and a hit rate of only 10%, I would be tempted
either to turn off the cache or just make the cache smaller.
Most of the cache hits come from a small fraction of the files.
Making the cache larger only very marginally improves the hit rate.
In my mind, the places that can justify caching are those with
bad network connectivity and/or per-byte charges for network
access.
Steve
[posted by Notes-News gateway]
|
| 4569.6 | Information gathering, please providers papers | UTRUST::KUIJPER | Caught in a World-Wide-Web ! | Fri Mar 28 1997 16:48 | 9 |
| :RE .4
Jeff,
Can you point me to the information you have gathered on this subject
(a technical report, a white-paper, a public submission ?)
Thanks,
Frank
|
| 4569.7 | Re: The proxy cache efficiency discussion topic | QUABBI::"steveg@pa.dec.com" | Steve Glassman | Fri Mar 28 1997 20:38 | 21 |
| My number for a cache hit rate - 1/3 (33%) - and Jeff Mogul's
number - 70% - are not as contradictory as they first seem. A
cache hit rate of 1/3 comes from a fairly lazy caching scheme
that simply caches the replies to actual requests and uses that
result if another request for the same item comes along "soon
enough". My analysis showed a potential hit rate of about 2/3
if all of the items in the cache are "fresh enough".
This means that if the cache does automatic freshening
of cached pages it could get the hit rate up to 2/3. Note,
that keeping cached pages fresh takes extra bandwidth from
the cache to the servers because some of the freshened pages
won't be requested from the cache.
So, if the goal is maximum browser-perceived hit rate on the
cache then a 2/3 hit rate is possible. If the goal is minimum
bandwidth between the proxy and servers, then the hit rate is
closer to 1/3.
Steve
[posted by Notes-News gateway]
|