Google definitely came down on the “cattle” side of “pets vs cattle.” Kinda hard not to at the scale of our datacenters. So just like cattle, Google servers didn’t have names, they had numbers. You didn’t know or care which server ran your app, and if one went bad, we’d unceremoniously replace it (…and sometimes dissect it to find the misbehaving CPU, but that’s another story).
But working on the Linux CPU Scheduling team, we couldn’t make do with the usual bag-of-cpu-cycles approach: we needed root access to try new kernels, and isolated machines to test load balancers on, and machines we could panic & mistreat without anyone complaining. (You know, like you mistreat…pets? I suppose the metaphor breaks down.)
They could have given us a custom lab somewhere with special one-off servers, our own room full of laboratory mice, but that was more work, so they took a perfectly normal rack in a perfectly normal cluster and sliced it off. Our own little herd of kine, just like any others, but without anyone else’s jobs or junk. Hand us the keys to the root account and that’d do us quite well. (Or perhaps this wasn’t about convenience; another nice thing about this approach is its realism. We found all sorts of bugs in our testing that maybe wouldn’t have shown up if the machines had been configured or equipped differently, as machines from a different supply chain would surely have been. There’s a lot of value in testing being like prod! I’m not sure of the “why”: the rack of Kernel Team’s Private Servers predated me at G.)
Google servers were numbered with some letters and some numbers, never you mind why, so a server in our rack was named something like ibsy62. We could have built our own automation over running things (used the standard cluster management software) but we mostly didn’t. We all wrote bash script after bash script that began with scp test.exe root@ibsy62:. Or we ssh’d in manually and ran our tests.
And as humans do, we (or at least I) started to get used to the situation. And we started to get comfortable. “Ibsy” is a pronounceable word, even! I’d tell my teammates “Hey, stay off the ibsies today, I’m running some stress tests across all of ’em.” I found myself, without ever thinking about it, treating them (my favorite was ibsy62) as a safe place, a home. root’s homedir was full of analysis scripts I used over and over again. You can’t spend years inside a single server without starting to have feelings about that server being different than others.
Then one day, I came in and my normal ssh root@ibsy62 gave No route to host.
I asked around. Turns out these cattle were too old. Not worth keeping. The whole aisle had been retired and ours were no exception. But don’t worry, we got you guys a new rack, go play with the isjes! They even have the new CPUs, don’t you want to try AVX-512? …Well, yes, I do, but where did my server go? It was a practical mess. You catch above how, like the terrible lab scientist I am, I had left all my analysis scripting, un-backed-up, on the local hard drive of an anonymous server I liked? Seriously, that one I regret. But I also remember a distinct feeling of loss. I was upset. My pet cow wasn’t worth keeping, apparently, and it was gone. The new cow might be better, might be faster, might have fascinating tech, but it wasn’t my cow.
One of my biggest regrets from my time at Google [1] is that I did not think, then, to pull in some favors and get my hands on that server blade. I really wish I’d bought it from Google as scrap. It’d be nice to have on my wall. Yeah, maybe a 1U blade is a heavy, oversized eyesore, but ibsy62 and I had some great times. We had long nights building Temeraire; we had long days debugging bizarre kernel regressions; we experimented with cursed tricks and we honed my perf skills together. I miss that blade a lot.
I once explained “pets vs cattle” to a dear friend of mine, at the time in her last year of vet school. I said, “Well, you know how when a cow gets sick you just shoot it?” She told me “Actually, this week I’m doing an amputation for a sick cow in a sanctuary, so…” Some cattle get lucky, I suppose.
[1] Other than selling stock. I mean, autosale is good and it was absolutely risk-adjusted correct, but man did I leave some money on the table.