True confessions of the world's busiest websites

Do not want fail? Why then, can has win, say the folks behind the curtains at Flickr, Digg, Media Temple, and StumbleUpon. Six of them showed up at a panel organized by Kevin Rose to explain how to make websites that stay online, more or less. Being a not very clever gossip, I just listened in for the quips. Oh, and the drama. Automattic founder Matt Mullenweg almost didn't make it. Check out how his fellow panelists updated the lineup right before he showed up.

Strikeout! Mullenweg showed up at the last minute. One wonders: Was the recently minted millionaire dealing with fallout from his nasty Twitter fight with Six Apart's Anil Dash? Or was he just calling his broker? (Mullenweg later told me he just went to the wrong green room.)

Flickr's Cal Henderson says "fuck" a lot, which would seem to come with his job. "I'm Cal Henderson from Flickr, the kitten-sharing website" is how he introduces himself. He admits that one Flickr breakdown came about when he failed to use a basic Linux utility, df, to check whether he had enough storage available, which is a problem when you serve terabytes of photos. Still, it's a good problem to have. "A lot of people can ignore scale forever," he notes, because they never get enough users to bring their site down. "We serve 32,000 photos a second," says Henderson.
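For the curious, the kind of check Henderson skipped can be sketched in a few lines of Python. This is only an illustration, not anything Flickr is known to run: the function names and the 10 percent reserve are assumptions made up for the example, and `shutil.disk_usage` stands in for running df by hand.

```python
import shutil


def free_fraction(path="/"):
    """Return the share of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def has_room_for(path, incoming_bytes, reserve_fraction=0.10):
    """Hypothetical guard: True if `incoming_bytes` fits while keeping a reserve.

    The 10% default reserve is an assumption for illustration only.
    """
    usage = shutil.disk_usage(path)
    reserve = usage.total * reserve_fraction
    return usage.free - incoming_bytes >= reserve


if __name__ == "__main__":
    print(f"free: {free_fraction('.'):.1%}")
    print("room for 1 GB upload batch:", has_room_for(".", 10**9))
```

Run something like this before a big batch of uploads, or on a timer, and you hear about the full disk before your users do.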

"One of the things I don't like about Web 2.0 is you as users want your data to be available, to stay up forever," says Digg architect Joe Stump. "As an engineer, I hate that."

Most of the panelists favor open-source software and cheap hardware. "Buying enterprise means they don't put their prices on the Web," says Mullenweg, the creator of blogging software WordPress. "It means you have to talk to someone with slick-backed hair for 30 minutes. It's uncomfortable."

If you can get over that, says Henderson, "the easiest way to solve scaling problems is to throw money at it. When you're a startup and you don't pay your engineers, then engineering is cheap and hardware is expensive. If you're paying for engineering time, that's expensive."

Stump takes a question from Pownce creator Leah Culver: "Where do you find your bottlenecks?" Stump's answer: "Bottlenecks never have to do with your [programming] language." Henderson instantly retorts: "Unless you're using Ruby." (Ruby is the language used by Twitter, among others, and some blame it for Twitter's outages.) Stump's comeback: "It's always your database or your file system."

StumbleUpon's Garrett Camp suggests testing new features on a small subset of your audience rather than on everyone at once, so you test under real conditions without inflicting buggy code on all of your users.
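Camp didn't say how StumbleUpon picks its guinea pigs, so the following is a guess at one common approach rather than his method: hash each user ID into one of 100 buckets and switch the feature on only for the lowest few. The function name, the feature label, and the user IDs are all hypothetical.

```python
import hashlib


def in_rollout(user_id, feature, percent):
    """Return True if this user falls inside the first `percent` of 100 buckets.

    Hashing the feature name together with the user ID keeps each user's
    answer stable across requests, while different features get different
    test groups.
    """
    key = f"{feature}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    # Hypothetical feature name and user IDs, for illustration only.
    exposed = sum(in_rollout(uid, "new-toolbar", 5) for uid in range(10_000))
    print(f"{exposed} of 10000 users see the new feature")
```

Raising `percent` widens the audience without reshuffling anyone: a user who was already in the test group stays in it.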

"When we look at the site, we ask, 'What don't we have to do right now?'" says Digg's Stump. Avoiding real-time updates helps avoid bottlenecks. Henderson says Flickr sometimes shows photo pages that are a minute old — again, to minimize load on the site.
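The minute-old photo page is easy to picture in code. What follows is a minimal sketch of the general trick, a cache that serves a stored copy until it is 60 seconds old, and not a description of Flickr's actual setup. The class name, the `render` callback, and the page key are assumptions.

```python
import time


class StaleCache:
    """Serve a cached value until it is older than `ttl_seconds`."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # key -> (time stored, value)

    def get(self, key, render):
        """Return the cached value for `key`, calling `render()` only when stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh enough: skip the expensive work
        value = render()
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = StaleCache(ttl_seconds=60.0)
    # Hypothetical page key and renderer, for illustration only.
    page = cache.get("photo/123", lambda: "<html>kitten</html>")
    print(page)
```

However many visitors hit the page inside that minute, the database sees one request, which is the whole point of not doing it "right now."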

That's a rare moment of agreement between Henderson and Stump. The two are back to sparring in minutes. Henderson's comeback to an obscure point Stump makes: "I don't want to work at Digg." Stump then ribs him: "So, Cal, you're moving over to Microsoft technology soon, right?" "Yes, we're moving over to .NET and SQL Server," is Henderson's deadpan response. That's the last zinger before the show wraps up.