I am sure everyone who uses Google's services is aware of the Google caching system, and those who host websites have probably studied it in detail. In Google's caching service, the so-called "Google Spider" crawls the web, takes snapshots of pages (a kind of snapshot, I would call it) and stores them in its cache. It does not just store them, it also makes them available to the public. Everyone must have noticed this: whenever you search for something on Google, the results display links, and near each link there is a small word, "Cached". When we click it, we can see the cached copy of that particular page. On top of this, Google also updates these cached pages at regular intervals.
Now my question is: how can this be legal? It also takes snapshots of a lot of copyrighted material, doesn't it? And if someone has sensitive data, Google caches that too, stores it and gives it to the public. So how can caching copyrighted data be legal? Moreover, is there any way to stop or prevent the Google spider from entering one's website so that its contents won't be cached? What I mean is: if someone is hosting sensitive data such as personal information and doesn't want Google to cache that particular page, what must that person do?
Google And Its Caching Service
Started by nirmaldaniel, Apr 04 2009 10:11 AM
6 replies to this topic
#2
Posted 04 April 2009 - 10:32 AM
Well, you said it: Google is a great resource for finding cached copies of content that has since been removed, whether because of copyright infringement or whatever else made the webmaster take it down.
But a webmaster who is smart enough to think of this can disallow robots from caching his pages, and it is a very quick procedure.
You just need to add a meta tag in the <head> section of the web pages you do not wish to be cached:
<meta name="robots" content="noarchive">
That would be just about it.
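If you want to double-check that a page really carries the tag, something like the following could do it. This is only a minimal sketch using Python's standard html.parser; the names RobotsMetaParser and has_noarchive are made up for illustration, and it only looks at meta tags in the HTML you hand it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # Also accept the Google-specific name "googlebot".
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noarchive(html):
    """Return True if the HTML text contains a noarchive robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


page = '<html><head><meta name="robots" content="noarchive"></head></html>'
print(has_noarchive(page))
```

For pages that should stay cacheable, the same check should simply come back False.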
#5
Posted 04 April 2009 - 12:20 PM
nirmaldaniel, on Apr 4 2009, 01:03 PM, said:
So, miladinoskim, that's it? Is it as simple as that? If so, and if everyone does this, I guess the Google spider will have no place left to crawl, right?
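One thing worth separating here: the noarchive tag only asks search engines not to show a cached copy, so the page can still be crawled and indexed. Keeping a crawler out of part of a site is normally done with a robots.txt file instead, and well-behaved crawlers follow it voluntarily. The sketch below uses Python's standard urllib.robotparser to show how such rules are read; the /private/ path, the example.com URLs and the may_crawl helper are made-up assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def may_crawl(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(may_crawl(ROBOTS_TXT, "Googlebot", "http://example.com/private/data.html"))
print(may_crawl(ROBOTS_TXT, "Googlebot", "http://example.com/index.html"))
```

Neither mechanism is access control, so truly sensitive data is better kept behind a login than left on a public page.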
#7
Posted 02 May 2009 - 10:54 AM
Does anyone have any idea when the Google bot crawls one's website? To be clear, I want to know whether it crawls at random, at regular intervals, or when traffic to the site is low.
I just want to know when the bot crawls and, moreover, how much bandwidth these bots use when they crawl a site. In particular, how much bandwidth does Googlebot consume when it crawls and takes that snapshot?
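One way to get real numbers for your own site is to look at the web server's access log, which records the time and response size of every request. Below is a rough sketch that adds up the requests and bytes served to anything identifying itself as Googlebot. It assumes the log uses the common "combined" format, the sample lines and IP addresses are invented, and the user-agent string can be faked, so treat the result as an estimate.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_usage(lines):
    """Return (hits, total_bytes) for requests whose user agent mentions Googlebot."""
    hits = 0
    total_bytes = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "googlebot" not in match.group("agent").lower():
            continue
        hits += 1
        size = match.group("bytes")
        if size != "-":  # "-" means no response body was sent
            total_bytes += int(size)
    return hits, total_bytes


# Invented sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [02/May/2009:10:54:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [02/May/2009:10:55:10 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 5.1)"',
    '192.0.2.10 - - [02/May/2009:11:02:31 +0000] "GET /about.html HTTP/1.1" 304 - "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

hits, total = googlebot_usage(SAMPLE)
print(hits, "requests,", total, "bytes")
```

The timestamps on the matching lines also show when the bot actually visited, which answers the "when" part for your own site.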














