| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 4569.1 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Tue Mar 25 1997 11:59 | 7 |
| I seem to remember that Jeff Mogul had done some analysis of caching
on the proxy server in Palo Alto and that the general conclusion was that
it wasn't really worth it, the variety of requests was just too broad to
get much efficiency by using caching. I believe that Palo Alto no longer
caches pages in the proxy server.
Danny
|
| 4569.2 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Tue Mar 25 1997 12:27 | 11 |
| > I seem to remember that Jeff Mogul had done some analysis of caching
> on the proxy server in Palo Alto and that the general conclusion was that
> it wasn't really worth it, the variety of requests was just too broad to
> get much efficiency by using caching.
	Did he do that analysis on the proxy servers as well? I
remember reading in here that someone (Jeff?) had done an
analysis on the requests coming into the AltaVista search
engine and reached the conclusion that it was not worthwhile
to cache search results for the same reasons (an AltaVista
NOTES search could probably find that string of notes :-)
|
| 4569.3 | Re: The proxy cache efficiency discussion topic | QUABBI::"ongbh@zpo.dec.com" | Ong Beng Hui | Tue Mar 25 1997 23:38 | 62 |
|
Hi,
> I am looking for figures, recommendations etc. concerning caching in
> these proxy servers.
>
> My largest customer has a 10 Mb link to the Internet, about 300000
> requests per day for a total of about 2 Gb of traffic.
> They use 140 Netscape threads (version 2.5) on Digital UNIX with a 2
> drive cache (RZ28) with a maximum size of 2 Gb. Only http is being cached.
	I have a customer (an ISP) here with 2 x 4100 and 2 x 8200 as proxy
caches. The total storage of the four machines is around 100 GB and they
are running Harvest from Network Appliance (www.netapp.com).
	Requests run around 1.5 million per day per proxy. Their cache efficiency
is around 40-50%.
	There are two reasons for the deployment of a proxy cache: content
filtering and caching. A single international E1 costs around as much as a
loaded 4100 per month, so a 50% bandwidth saving can easily be translated
into 1 new 4100 every month.
> I have done some analysis on the data, but can not find an obvious way
> to increase the efficiency.
>
> Questions:
> 1) What is the general opinion on caching, that is at which point
> should one cache. At 5% efficiency, at 10%, at 40%. Depending
> on linespeed ?
Sorry, I didn't get you on this...
> 2) What are recommendations for the cache size ? If I increase it
> further, file lookups will increase and the disks might well
> become saturated.
	Squid and Harvest store their index in memory. I am not sure if
Netscape does that. From experience with Squid and Harvest, the
cache size you can run is directly proportional to the amount of memory you have.
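	A minimal sketch of why that is: the in-memory index needs a fixed chunk
of metadata per cached object, so RAM grows with the number of objects the
disk cache can hold. The per-object and mean-object-size figures below are
hypothetical round numbers for illustration, not measurements from this thread.

```python
# Rough RAM estimate for the in-memory index of a Squid/Harvest-style cache.
# Assumed figures (hypothetical): ~100 bytes of metadata per cached object,
# mean object size ~13 KB.

def index_memory_bytes(cache_size_bytes, mean_object_bytes=13 * 1024,
                       metadata_bytes_per_object=100):
    """Estimate RAM needed to index a disk cache of the given size."""
    n_objects = cache_size_bytes // mean_object_bytes
    return n_objects * metadata_bytes_per_object

# A 2 GB cache (the size mentioned in the base note) needs on the order
# of 15 MB of index RAM under these assumptions:
print(index_memory_bytes(2 * 1024**3) / 1024**2)
```

	Doubling the disk cache doubles the object count and hence the index RAM,
which is the proportionality noted above.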
> 3) What steps can be taken to increase cache efficiency. I am
> thinking of:
> - caching just .gif and .jpg files (they have the best hit ratio)
> - no caching but just preloading the cache based on yesterday's
> access data (only useful for the Netscape server)
> - using different cache scenarios, such as the distributed
> SQUID/Harvest cache.
	You might want to tune your cache expiry factor. Squid and Harvest are
pretty good proxy caches.
> 4) What are normal figures for cache efficiency ?
	I guess cache efficiency differs from one situation to another.
> I know that there is/has been a study on caching, but could not find
> the report. I welcome all remarks, recommendations, thoughts etc.
Check out http://www.nlanr.net/Cache and follow the links.
[posted by Notes-News gateway]
|
| 4569.4 | Re: The proxy cache efficiency discussion topic | QUABBI::"mogul@actitis.pa.dec.com" | Jeffrey Mogul | Wed Mar 26 1997 21:38 | 56 |
|
Danny Mayer wrote:
|> I seem to remember that Jeff Mogul had done some analysis of caching
|> on the proxy server in Palo Alto and that the general conclusion was that
|> it wasn't really worth it, the variety of requests was just too broad to
|> get much efficiency by using caching.
Not quite right. I think you are confusing several things that I
said, perhaps not all in this newsgroup:
In article <4569.2-970325-122624@networking.internet_tools>,
michaud@vaxcpu.enet.dec.com (Jeff Michaud - ObjectBroker) writes:
|>
|> Did he also do that analysis on the proxy servers as well? I
|> remember reading in here that someone (Jeff?) had done an
|> analysis on the requests coming into the AltaVista search
|> engine and reached the conclusion that it was not worthwhile
|> to cache search results for the same reasons (an AltaVista
|> NOTES search could probably find that string of notes :-)
(1) Caching AltaVista responses (or other search-engine responses)
would not yield much benefit. I simulated a perfect cache over
    a 24-hour trace of AltaVista URLs from the middle of last year,
and it would probably get at most a 15% hit rate. A more reasonably
sized cache would get closer to 10%, if my memory is right. (Note
that AltaVista's internal cache holds query results, which is not
the same as a URL-keyed cache.)
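    A "perfect cache" simulation of the kind described here can be sketched
in a few lines: with unbounded storage and no expiry, every request after
the first for a given URL is a hit, so the hit rate depends only on how
often URLs repeat in the trace. The sample trace below is made up for
illustration.

```python
from collections import Counter

def perfect_cache_hit_rate(url_trace):
    """Hit rate of an unbounded, never-expiring URL-keyed cache:
    every request after the first for a distinct URL is a hit."""
    counts = Counter(url_trace)
    requests = sum(counts.values())
    misses = len(counts)  # one compulsory miss per distinct URL
    return (requests - misses) / requests

# Hypothetical query trace: 6 requests, 3 distinct URLs -> 50% hit rate.
trace = ["/q?x=a", "/q?x=b", "/q?x=a", "/q?x=c", "/q?x=a", "/q?x=b"]
print(perfect_cache_hit_rate(trace))
```

    A search-engine workload with mostly one-off queries has few repeats,
which is why even this upper bound came out so low in the trace above.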
(2) For many months, the Palo Alto proxies were run without any
caching, because the disk I/O requirements for handling such
large caches seemed to slow things down. Apparently, the people
who run these proxies have now re-enabled caching. I'm not sure
they have really solved the disk I/O delays, though.
(3) My analyses of other traces suggest that the best that a Web
    cache can do is probably around a 60%-70% hit rate. This assumes
a very large cache, and that our traces are "representative" of
other pools of clients. A 70% hit rate sounds good, but if (for
example) the average hit takes 1 second, and the average miss
takes 10 seconds, then the overall average retrieval time will
be almost four seconds; i.e., a 70% hit rate does not necessarily
correspond to a 70% improvement in perceived performance. (The
situation is complicated by the possibility of reduced congestion.)
If the ratio of hit/miss costs is closer (which it seems to be,
in reality) then caching might be even less impressive to the
actual user.
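    The arithmetic in that example is easy to check: with the figures given
in the note (a 70% hit rate, 1-second hits, 10-second misses), the weighted
average retrieval time comes out near 3.7 seconds, not a 70% speedup.

```python
def mean_retrieval_time(hit_rate, hit_cost_s, miss_cost_s):
    """Overall average retrieval time for a given cache hit rate."""
    return hit_rate * hit_cost_s + (1 - hit_rate) * miss_cost_s

# The example from the note: ~3.7 s, i.e. "almost four seconds".
print(mean_retrieval_time(0.7, 1.0, 10.0))

# If hit and miss costs are closer, the perceived win shrinks further:
print(mean_retrieval_time(0.7, 1.0, 2.0))  # vs 2.0 s with no cache at all
```

    The second call shows the point about hit/miss cost ratios: the closer
the two costs, the less a high hit rate buys the user.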
One of the things I need to spend some time on (when I can
find some free time to spend!) is to figure out a way of
reliably benchmarking the response times provided by proxy
caches. Looking at hit rates is not enough, because if the
cost of a "hit" is too high, it can actually make matters worse.
-Jeff
[posted by Notes-News gateway]
|
| 4569.5 | Re: The proxy cache efficiency discussion topic | QUABBI::"steveg@pa.dec.com" | Steve Glassman | Wed Mar 26 1997 22:18 | 18 |
| Back when I set up the first caching proxy in Digital, I did a
bunch of measurement and wrote it up for the first WWW conference
in 1994. You can read the paper:
http://www.research.digital.com/SRC/personal/Steve_Glassman/CachingTheWeb/CachingTheWeb.html
    The rough numbers I found (and they seem to be supported by most other
studies) give a cache hit rate of roughly 33%. If your company really
has good connectivity and a hit rate of only 10%, I would be tempted
either to turn off the cache or just make the cache smaller.
Most of the cache hits come from a small fraction of the files.
Making the cache larger only very marginally improves the hit rate.
In my mind, the places that can justify caching are those with
bad network connectivity and/or per-byte charges for network
access.
Steve
[posted by Notes-News gateway]
|
| 4569.6 | Information gathering, please providers papers | UTRUST::KUIJPER | Caught in a World-Wide-Web ! | Fri Mar 28 1997 16:48 | 9 |
| :RE .4
Jeff,
Can you point me to the information you have gathered on this subject
(a technical report, a white-paper, a public submission ?)
Thanks,
Frank
|
| 4569.7 | Re: The proxy cache efficiency discussion topic | QUABBI::"steveg@pa.dec.com" | Steve Glassman | Fri Mar 28 1997 20:38 | 21 |
| My number for a cache hit rate - 1/3 (33%) - and Jeff Mogul's
number - 70% - are not as contradictory as they first seem. A
cache hit rate of 1/3 comes from a fairly lazy caching scheme
that simply caches the replies to actual requests and uses that
result if another request for the same item comes along "soon
enough". My analysis showed a potential hit rate of about 2/3
if all of the items in the cache are "fresh enough".
This means that if the cache does automatic freshening
of cached pages it could get the hit rate up to 2/3. Note,
that keeping cached pages fresh takes extra bandwidth from
the cache to the servers because some of the freshened pages
won't be requested from the cache.
So, if the goal is maximum browser-perceived hit rate on the
cache then a 2/3 hit rate is possible. If the goal is minimum
bandwidth between the proxy and servers, then the hit rate is
closer to 1/3.
Steve
[posted by Notes-News gateway]
|