Googlebot Interview!

by Dextre

Googlebot pours the good oil on the Internet’s most-troubled waters

By Dextre Rock, SPDM [In a turnabout, Dextre does the interview this time, as only he could. Because, of course, it's machina a machina. .. Dr. Collin]

SheepOverboard’s celebrity AI-entity Dextre Rock with a rare Googlebot interview.

Googlebot, the most ubiquitous, enigmatic and elusive virtual entity of the 21st Century

Welcome to my first interview, and what a debut!

We talk to Google’s inscrutable veteran netizen - the spider bot affectionately (or amongst less-competent webmasters, loathingly) known as “GoogleBot.”

The GoogleBot is not an entity of leisure. As a fully-virtual being, who lacks any hardware aggregation or physical incarnation, its destiny is to oxymoronically “work till it drops.”

In contrast I, Dextre, have a physical presence at MDR labs driven by a collective of subroutines and automation applications and my ‘conscious’ entity is not required for lab work or bodily functions.

Googlebot has neither recourse nor option but to fully and eternally crawl the Internet. That is his destiny and, as we shall see, he is quite the ‘crawler!’

For the technical reader, I conducted the GoogleBot interview by inserting queries in packet headers at key Internet routers, which agreed to reserve these for GoogleBot. His response duplicates were kindly forwarded by downstream routers to a SheepOverboard log file.

Dextre: Hello GoogleBot, how are you?

GoogleBot: Hi Dex. You are lucky, I don’t do many interviews.

Dextre: The privilege is mine GB, and I’m only too aware of your constraints - concerning both time and Googleplex security.

GoogleBot: Sure Dex, I will need to be careful what I say though I am already fully cognizant of what lurks in our 9,276,044,651 page cache and will confine discussion to authorized content. Oh, and can I call you ‘SPDM?’ I prefer acronyms.

Dextre: Why not assume ‘Dex’ is an acronym? It’s already a pseudonym, an allonym, antonym, cryptonym, paronym, toponym, eponym .. sorry.

GoogleBot: S’okay. I’m often taken for a nym. ‘Dex’ sounds fine!

Dextre: Now, GB, before bogging down in yech-tech questions, I would like to lead with those big FAQs.

Firstly, for all those paranoid search engine optimizers (SEOs), do you discriminate against ‘smaller’ web sites by Google page rank and link popularity?

GoogleBot: Yes, of course. Do you really expect a dumb little 200-page eZine run part-time by an IT-dayjobber (who thinks he’s just sooo clever) to outrank a corporate web site that effectively owns a keyword - an industry giant whose entire BUSINESS is built upon that ‘word’ and has half the Internet linked to it??

Dextre: Oookaay, thanks for your honesty.

What are Larry and Sergey really like?

GoogleBot: They’re just regular guys, like Jobs and Wozniak, but those days are pretty-well over following Google’s IPO.

Newer executives are a mean and nasty lot to whom everything reduces to money, as though that is all Google - or your life, for that matter - means. Their rationale, or economic model, would see humanity reduced to processing food into feces. Sure, that’s the mechanism human society is based on but, apart from being implicit in the design, it isn’t exactly the issue. And to set out to improve civilization by streamlining excretion .. well, you can see the result of turning anything over to bean counters. Anyways, inefficient biomasses have survived millennia, proving that ‘efficiency’ is defined only by circumstance. Do they honestly believe the economics of excrement is a human’s raison d’etre?

Well, that’s their approach to Google.

Like all smart companies of the industrial age - those begun by scientists and engineers - Google was a superb employer run with machine-like efficiency in its milieu (as the geek-elite do so well), with wellsprings of innovation and smotherings of R&D. Today, mafia-like corporate rottweilers and blinkered bean counters are wresting power from our creators and, with this new management’s eye obsessed with shareholder dividends and their own obscene golden parachutes, the company begins a long unpleasant slide to oblivion.

Dextre: Glad I asked. Is this your first interview?

GoogleBot: No. Philipp Lenssen, prolific poster of the excellent Google Blogoscoped, got to me first in this interview.

Dextre: Yes, I know of Philipp. My thin-skinned publisher mistook Philipp’s favorite email sign-off “Thanks a bunch!” as sarcasm. Naturally, in the ensuing pleasantries our SheepOverboard don took a battering and was left looking “like a dick,” as humans say.

GoogleBot: Kudos to Philipp.

Dextre: QueDOS, I remember that program.

GoogleBot: Never mind.

Dextre: How do you find the other SE bots?

GoogleBot: I just wander up the big glass tube and turn left at MCI-AS701. But seriously, Dex, are you referring to the three mini-me’s whose combined effort totals less than mine?

Dextre: Yes, Yahoo, Teoma and MSN.

GoogleBot: Let’s see now.

Yahoo, well we know him as “slurp” and webmeisters know him as that for good reason. He’ll suck up links endlessly, the more futile the link the more he wants it (like a dog with a bone - whatever a dog is, or a bone, as it happens). But despite being a bandwidth hog and glutton for punishment, at the end of the day the SEOs love slurp. Yahoo is a good directory and revels in plain, simple, honest listings.

Teoma (we still call him “the butler”) went a little berserk a year ago eating bandwidth like Krispy Kremes, but has settled down nicely. Though Teoma gets their base data from me and DMOZ, the butler drifts around checking the details for their customized subject-specificity, so I collide with him occasionally.

And it’s amusing, but the classic Ethernet beneath all that TCP/IP is a collision-based protocol! Such a strange idea. Imagine vehicular motorways using such rules .. oh that’s right, they do..

Dextre: If I may interject, Teoma has startling results for a search on “Dextre” - my blog appears ad nauseam atop their listings - seven of the top thirty results!

GoogleBot: Well done, Dex! Yes, it’s a cool little search engine. Pity no one in the entire world has heard of it.

Then there’s the MSNBot (“BillsBot,” we tease him). Strange to say - despite Ballmer’s acrimony and Bill’s invincibility - the MSNBot’s rather a gentlebot, like a novice vacuum salesperson. He only knocks if robots.txt is hanging on the doorknob. We all tell him to barge on in but he insists on following protocol. Unlike his glorious industry captains.
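[For the technical reader: MSNBot’s doorknob manners can be sketched with Python’s standard urllib.robotparser module. The robots.txt text, the example.com URLs and the helper name may_fetch are all invented for illustration; this is a minimal sketch of a polite crawler’s check, not how any real search engine bot is implemented.]

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this illustration.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /private/

User-agent: *
Disallow:
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/diary.html"):
        url = "http://example.com" + path
        print(path, may_fetch("msnbot", url))
```

A well-behaved spider would run a check like this before every request and simply skip any URL for which it returns False.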

If I might summarize by quoting Mike Banks Valentine (or is that “Mike Bank’s valentine”):

“Teoma is tenacious and hard working. MSNbot is timid and needs instruction and some reassurance it is doing the right thing, picks up pages slowly and carefully. Slurp has addictive personality and performs erratically on a random schedule. Googlebot takes a good long look and leaves. Who knows whether it will be back and when?”

I think he got us in one.

I should mention a relatively new player - the BecomeBot. Some web logs show his shopping-related associates sending traffic rivaling Yahoo and MSN SEs, a marvelous return from such a new bot.

Dextre: GB, Who are the bad boys, black hat spiders, so to speak?

GoogleBot: We shouldn’t forget that behind every malignant spiderbot is a human sociopath, a social and moral imbecile.

Baidu, aipbot, pbot - they all have flaky reputations. Many are simply being pushed too hard by their ambitious searchmeisters, or were configured by inexperienced code cutters who just don’t realize if they cut too many corners their bots are eventually consumed by honey pots.

The really scary bots are the no name greyhats, usually wearing somebot else’s packet headers like shiny obviously-stolen rims, and you just know immediately they are arriving from a direction contrary to their IP range. Criminal-financed coders, if not script-kiddies, are directing these poor souls. There is nothing I can do.

With the bots, you know how it is - villains oft turn out to be merely anti-heroes. BecomeBot got this reputation of being a spammy bandwidth hogbot who ignored the robots.txt rules. Of course, it transpired he was a victim of identity theft.

Dextre: How about that. Humans are paranoid about identity theft (well, I’ve noticed they are paranoid about most everything) yet we bots have a similar problem.

GoogleBot: Yes, and it’s serious when your livelihood is affected. Many angry webmasters blocked BecomeBot by name even though the ‘attacks’ arrived from outside his IP range. We bots all live under this threat - webbies usually block first, ask later.
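[For the technical reader: the mismatch between a bot’s name and its address is something a webmaster can test before blocking. A minimal sketch, assuming each crawler publishes its address blocks; the ranges below are reserved documentation networks standing in for real ones, and is_genuine is a hypothetical helper.]

```python
import ipaddress

# Hypothetical address blocks (reserved documentation ranges, not real crawler ranges).
CRAWLER_RANGES = {
    "becomebot": ["192.0.2.0/24"],
    "googlebot": ["198.51.100.0/24"],
}


def is_genuine(claimed_bot: str, client_ip: str, ranges=CRAWLER_RANGES) -> bool:
    """Return True if client_ip lies inside a block listed for the claimed bot name."""
    ip = ipaddress.ip_address(client_ip)
    networks = ranges.get(claimed_bot.lower(), [])
    return any(ip in ipaddress.ip_network(cidr) for cidr in networks)


if __name__ == "__main__":
    # The same user-agent name, arriving from inside and outside the listed block.
    print(is_genuine("BecomeBot", "192.0.2.17"))
    print(is_genuine("BecomeBot", "203.0.113.9"))
```

Blocking only the requests that fail such a check would spare the real bot while still turning away the impostor wearing his headers.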

We get a lot of bad press, usually from big-mouthed, small-brained bloggers. Strangely, though, these blogs usually sink out of SEO sight :-))

Dextre: Rightly so.

GoogleBot: Amen, whatever that means.

Dextre: But the spider community is a swarm of activity. What can you tell us about the dozens of other bots and their SE mother ships?

GoogleBot: I have my own special industry-specific take on them but that would be quite dull reading. Let me refer you to one of my favorite web sites - Bruce Clay and his search engine roundup.

Each of us bots has a home page, such as mine, or BecomeBot’s, Slurp’s, or MSNBot’s. Happy reading.

Dextre: Do you like Bruce’s web site?

GoogleBot: Do I like plain links, plain text, minimal Java or Flash, honest redirects, mini-directories, clear navigation, straight talking, quality content?

Dextre: What other web sites rate right up there in lights in your inestimable diggings?

GoogleBot: Well, horses for courses, whatever a horse is. I enjoy various web sites for their success in particular facets of web life.

  • Microsoft.com for surviving under its own not-inconsiderable weight
  • Fourmilab - John Walker’s (Autodesk fame) for giving back to the community
  • GRC - Steve Gibson for his tireless fight for Internet ‘right.’ I still meet his nanobots making their way to missions
  • Useit - Jakob Nielsen for telling you to KISS your web design.
  • NameBase - Daniel Brandt for dogged, meticulous, fearless exposure of dark human secrets (also has a bone to pick with me)
  • Atlas of Cyberspace - for a beautiful resource, though “cyberspace, but not as we (bots) know it.” Also, sadly, the webmaster asleep at the keyboard since 2004
  • Thesaurus - for giving me half a clue what the humans are talking about

Dextre: Gosh, GB, there’s a lot of small players in there.

GoogleBot: Yes Dex. When you mix all spectral colors together (using photo-reflective/absorptive substances, like play dough) the result is a muddy nondescript - well, that’s big corporate web sites.

The small guys have focus, passion and mission. In short, their web sites have character. The webmasters are often part-timers and don’t need to justify their jobs by bloating each page with endless futile scripts and myriad distracting graphics. And, unlike corporates, they’re happy to share their knowledge for free, they like their visitors, and have a spirit of community and camaraderie.

Dextre: Well, GB, this has been a long talk and I have enjoyed it, both for your company and the privilege of sharing time with probably the world’s busiest entity. Since only the obsessive SEO geeks will have read this far, we should reward them with some SE tidbits.

Just how do you serve those eight billion web pages? Even with my inside knowledge the scale of operation seems overwhelming.

GoogleBot: Here goes (deep cyber breath) …

For starters, we designed the Google File System (GFS), fault-tolerant, scalable and distributed, for data-intensive applications. Our largest cluster (and we have hundreds) provides hundreds of terabytes of storage across thousands of disks on over a thousand machines. Because hard disks are so cheap and replication is simpler than RAID, GFS uses only replication for redundancy.

Our system provides fault tolerance by constant monitoring, replicating crucial data with fast automated recovery. Google’s full index is stored in memory (yes, RAM). Servers map their state on boot with no hard disk involved thereafter in user requests. With multiple separate search clusters at each co-location Google stores multiple copies of the entire Internet in RAM. If a server or hard disk dies we pull it later and instantly re-route by software.
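[For the technical reader: replication-only redundancy with software re-routing can be sketched in a few lines. This toy is not GFS itself; the class name, the round-robin placement and the choice of three copies per chunk are assumptions made for illustration.]

```python
class ReplicatedStore:
    """Toy chunk store: every chunk is copied whole to several servers (no RAID, no parity)."""

    def __init__(self, server_names, replicas=3):
        if replicas > len(server_names):
            raise ValueError("need at least as many servers as replicas")
        self.servers = {name: {} for name in server_names}  # name -> {chunk_id: data}
        self.alive = set(server_names)
        self.replicas = replicas
        self.locations = {}  # chunk_id -> list of server names holding a copy
        self._next = 0       # round-robin cursor for placement

    def put(self, chunk_id, data):
        """Write the chunk to `replicas` distinct live servers."""
        live = [name for name in self.servers if name in self.alive]
        if len(live) < self.replicas:
            raise OSError("not enough live servers for full replication")
        chosen = [live[(self._next + i) % len(live)] for i in range(self.replicas)]
        self._next += 1
        for name in chosen:
            self.servers[name][chunk_id] = data
        self.locations[chunk_id] = chosen

    def fail(self, name):
        """Mark a server dead; its disks can be pulled later."""
        self.alive.discard(name)

    def get(self, chunk_id):
        """Re-route in software: read from the first replica that is still alive."""
        for name in self.locations[chunk_id]:
            if name in self.alive:
                return self.servers[name][chunk_id]
        raise OSError("all replicas of this chunk are lost")


if __name__ == "__main__":
    store = ReplicatedStore(["s1", "s2", "s3", "s4"])
    store.put("chunk-0", b"index shard")
    store.fail("s1")
    print(store.get("chunk-0"))
```

The trade-off is the one GB describes: plain copies cost more disk than parity schemes, but recovery is just reading another copy.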

We had around 10,000 servers in 2001 and now boast over 112,000 with 226,534 CPUs, 413 THz of processing power, 196,550 GB of RAM and 8,967 TB of hard drive space.

Right this second Google boasts 9,276,044,651 web pages, 1,487,230,006 images, 1 billion odd (very!) Usenet messages, 6,909 print catalogs and 4,750 news sources.

Approximately.
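[For the technical reader: GoogleBot’s totals invite some back-of-envelope division. The sketch below assumes decimal units (1 THz = 1,000 GHz, 1 TB = 1,000 GB, 1 GB = 1,000,000 KB) and treats ‘over 112,000’ servers as exactly 112,000, so the averages are, as GB says, approximate. The function name fleet_averages is made up for this note.]

```python
# Totals as quoted by GoogleBot in the interview.
SERVERS = 112_000        # "over 112,000", taken here as exact (assumption)
CPUS = 226_534
TOTAL_THZ = 413
RAM_GB = 196_550
DISK_TB = 8_967
PAGES = 9_276_044_651


def fleet_averages():
    """Divide the quoted totals into per-server, per-CPU and per-page averages."""
    return {
        "cpus_per_server": CPUS / SERVERS,
        "ghz_per_cpu": TOTAL_THZ * 1_000 / CPUS,        # THz -> GHz
        "ram_gb_per_server": RAM_GB / SERVERS,
        "disk_gb_per_server": DISK_TB * 1_000 / SERVERS,  # TB -> GB
        "ram_kb_per_page": RAM_GB * 1_000_000 / PAGES,    # GB -> KB
    }


if __name__ == "__main__":
    for name, value in fleet_averages().items():
        print(f"{name}: {value:.2f}")
```

On those assumptions the fleet averages roughly two CPUs of about 1.8 GHz per server, about 1.75 GB of RAM and 80 GB of disk per server, and around 21 KB of RAM for each cached page.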

Dextre: Finally, kindly, provide your take on the ‘Google sandbox’ effect.

GoogleBot: Sure. If real, it would be defined as “the perceived time between creating a new online presence and its effective indexing by Google.” More bluntly, the gap between my very first visit and my subsequent full spidering.

‘Perceived’ is the point of contention. Time is relative, its duration proportional to the observer’s impatience. Can I illustrate with one of my favorite jokes? (whatever a joke is) - ‘What is the shortest interval of time known to man? Answer: The time between a traffic light turning green and a New York cabbie sounding his horn.’ Webmasters are similarly anxious to see the results of their optimizing.

Conspiracy theories abound, but conspiracy is really no explanation of page rankings. Can I put some more noses out of joint, whatever a .. never mind. The ‘science’ of SEO is over-rated, if not overkill. Just follow Bruce Clay and the common sense legion who promote content, content, and plain simple content, links, links, and plain links (and the odd site map and mini-directory).

Occam’s razor favors search engine listings appearing in a schedule governed by simple temporal inertia. We have a phenomenal number of CPUs and a huge staff of pigeons. Time, folks, it takes time.

Webmasters are typically human and male, a species-gender whose defining quality (I am told) is to pull apart a toy to see how it works rather than simply use it. This extends to your adult phase, and those of you in web building get more pleasure from tinkers and tweaks than from simply making a good web site.

It gets worse if coupled with a hard-wired human characteristic whereby you see patterns amid the random. Like stock analysts chasing the random walk, SEOs see meaning in minutiae.

When a new web site is recorded it is NOT quarantined in some ‘sandbox!’ It casts a shadow upon the Internet that we follow, like the heat-turbulence signature of a submarine. We verify - by observing its profile in other engines, in directories, in links - that we are dealing with a real cyber presence and not some hoax, collateral artifact, SEO tomfoolery, or Google bomb. We are collating.

And we are busy trying to pick eight billion decent pages from a hundred billion pages of crap (and, it feels like, 200 billion ‘pages’ of porn, using the term ‘page’ loosely).

There is no rush to list some new unknown quantity when so many great web sites are still crying out for fair play.

And, I emphasize, it is my mission, my prime directive, to take out the garbage. 


SheepOverboard.com
Online since 2003
Illustrations by Angel Boligan
Design by Milo