
Apologies and postmortem for Silicon Florist downtime this morning

You know how those awesome Web services always do some sort of statement or postmortem after their sites are down? It’s awesome, right? And insightful.

Well, this won’t be that. Nor is it a site that’s considered “mission critical.”

But you have to understand—as silly as it may be—I feel as if you and I have an unwritten and unspoken SLA. That my bad prose and crappy headlines will be here whenever you need them. And when they’re not? I feel as if I’ve let you down.

Long story short, I’m sorry that this happened.

So let me dissect this morning’s downtime. If only for cathartic reasons. It wasn’t a happy time.

That said, I’m not going to throw my host under the bus. This stuff happens. Screws fall out all of the time. The world is an imperfect place.

Precursor: On Friday evenings, I send out the Silicon Florist email newsletter—via Mailchimp (<3)—which recaps stories from the previous week and directs users to the site for additional details. Apparently, a good number of folks read this email on Saturday morning. As some form of torture or something.

So here we go…

6:55AM PDT: Darren Stowell kindly alerts me that none of the links in the Silicon Florist newsletter are working. I immediately try the site and am unable to connect to anything.

That’s odd, I think. I haven’t been mucking with WordPress. Or breaking it like I usually do. And, honestly, this error appears to be happening even before the site is hit. What happened?

My first thought? My original host was acquired a few years back. I suddenly wonder if maybe some domain name server stuff might be out of whack.

But it’s early in the morning. So I fall back on refreshing the page. In the hopes that some sort of Internet reverse entropy will take hold. And fix the whole thing. So I press that little circling arrow. Like every 30 seconds. From my phone.

7:18AM PDT: After a series of refreshes and increasing nervousness and inbound emails, I submit a ticket to my host. With a sort of “WTF?”

8:39AM PDT: No response from host. But they’re a small local shop. And it’s a Saturday morning. So I press on my assumption, “It’s looking like it might be a DNS issue…? Are you seeing anything on your end?”

8:54AM PDT: I send a tweet to my host asking about response times.

9:00AM PDT: Still no response from my host. So I run a traceroute to test my assumptions. Sure enough. The traceroute to siliconflorist.com shows the connection stalling out before it hits my host. I upload my traceroute to the support ticket to help with diagnosing the problem.
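For the curious, here is roughly the question I was trying to answer, sketched in Python: is the request dying at the name lookup, or at the connection to whatever address the name resolves to? This is a minimal sketch and not what I actually ran (I used traceroute). The function name `classify_failure` and its `resolve` and `connect` parameters are my own inventions for illustration; the defaults are the standard library's `socket.gethostbyname` and `socket.create_connection`.

```python
import socket
from typing import Callable


def classify_failure(
    hostname: str,
    port: int = 80,
    timeout: float = 5.0,
    resolve: Callable[[str], str] = socket.gethostbyname,
    connect: Callable[..., object] = socket.create_connection,
) -> str:
    """Return 'dns', 'connect', or 'ok' depending on where the request fails.

    `resolve` and `connect` are injectable (a hypothetical convenience) so the
    logic can be exercised without touching the network.
    """
    # Step 1: can the name be turned into an address at all?
    try:
        address = resolve(hostname)
    except socket.gaierror:
        return "dns"

    # Step 2: does anything answer at that address?
    try:
        conn = connect((address, port), timeout)
    except OSError:  # covers timeouts, refused and unreachable connections
        return "connect"

    # Close the socket if the connect function handed one back.
    close = getattr(conn, "close", None)
    if close is not None:
        close()
    return "ok"
```

Calling `classify_failure("siliconflorist.com")` from a shell would tell you which of the two steps is failing from where you sit. A result of "connect" would not prove DNS is innocent, though: a name that still resolves to a stale address also ends up there.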

9:10AM PDT: I solicit help on Twitter. Asking if any other West Coasters are experiencing issues.

Andrew Hyde lends a little moral support. Aaron Hockley confirms that he is not experiencing any issues.

9:15AM PDT: I jump back into email to discover that Adam Boettiger has been kind enough to run his own traceroute and provide some DNS insights. Basically confirming my suspicions.

DNS checks

Delegation

Superfluous name server listed at parent: ns1.taproothosting.com

A name server listed at the parent, but not at the child, was found. This is most likely an administrative error. You should update the parent to match the name servers at the child as soon as possible.

Superfluous name server listed at parent: ns2.taproothosting.com

A name server listed at the parent, but not at the child, was found. This is most likely an administrative error. You should update the parent to match the name servers at the child as soon as possible.

Total parent/child glue mismatch.

The parent lists name servers that the child doesnt know about, see details in advanced. This configuration could actually work but breaks very easily if one of these zones change slightly.

Nameserver

granite.canvasdreams.com.

Everything is fine.

All tests successful in this part, no errors or warnings.

granite2.canvasdreams.com.

Everything is fine.

All tests successful in this part, no errors or warnings.

Consistency

Everything is fine.
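What that report boils down to is a set comparison: the name servers the parent zone hands out versus the ones the zone itself claims. Here is a minimal Python sketch of that comparison. It assumes, as I read the report, that ns1/ns2.taproothosting.com are what the parent lists and granite/granite2.canvasdreams.com are what the child lists. The function `compare_delegation` is hypothetical and is not how the checking tool works internally.

```python
def compare_delegation(parent_ns, child_ns):
    """Compare name servers listed at the parent with those listed at the child."""

    def normalize(names):
        # Treat "NS1.Example.com." and "ns1.example.com" as the same server.
        return {name.strip().rstrip(".").lower() for name in names}

    parent = normalize(parent_ns)
    child = normalize(child_ns)
    return {
        # Listed at the parent, unknown to the child ("superfluous").
        "superfluous_at_parent": sorted(parent - child),
        # Known to the child, never advertised by the parent.
        "missing_at_parent": sorted(child - parent),
        # No overlap at all between the two lists.
        "total_mismatch": bool(parent) and bool(child) and parent.isdisjoint(child),
    }


# Names taken from the report above; which side lists which is my assumption.
report = compare_delegation(
    ["ns1.taproothosting.com", "ns2.taproothosting.com"],
    ["granite.canvasdreams.com.", "granite2.canvasdreams.com."],
)
```

With those inputs, both taproothosting.com servers land in `superfluous_at_parent` and `total_mismatch` is true, which matches the two warnings and the "total parent/child glue mismatch" line.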

9:16AM PDT: I submit a snarky “*crickets* Hello?” message in an attempt to get a response from my host’s tech support.

9:19AM PDT: My host’s tech support finally confirms that they’re getting my messages and looking into the problem.

9:24AM PDT: I respond to the ticket. Realizing that their support for these time periods is in the UK, I apologize for mucking with a Saturday evening.

9:44AM PDT: With no additional communication, I upload Adam’s traceroute and DNS assessment. I postulate that my initial Occam’s Razor assumption—that something from the acquisition days has failed, mainly the DNS records—is still the best assumption and ask if I should change the DNS records.

9:49AM PDT: My host’s support crew sends another “looking into it” message.

10:14AM PDT: Out of frustration, I change the nameservers to those highlighted by Adam and advise the support team that I am doing so. But encourage them to look into the issue, in case other people are still on the same old DNS.

10:20AM PDT: Frustration levels with my technical ineptitude are running high.

10:31AM PDT: Early propagation points are correctly directing folks to my site. I confirm with Twitter that other folks are seeing the same thing.

10:50AM PDT: Enough confirmations have rolled in that I’m feeling a little better that the problem may be fixed.

Currently: My ticket remains open. Still “waiting on tech.” Awaiting final confirmation from the host that my workaround was appropriate.

Again, I apologize for the downtime

I realize that Silicon Florist isn’t critical to your business. But I feel like we’ve got a good thing going here. You know, with me writing. And with you overlooking my shitty writing to extract some value out of the content.

It’s symbiotic.

So I thought I owed you an explanation.

Sorry for the downtime. My sincere apologies. Seriously. But I think we’ve got it fixed.

We now return you to your regularly scheduled barely intelligible gibberish.

  1. “What a new amazing post you have made. I just stopped directly into show you I honestly enjoyed the actual read and will probably be dropping by every so often from today on.”

  2. PS listed under “Dad”
    Wise words. WP Engine (from A Smart Bear, Jason, CEO of WP Engine and many other enterprises now) makes the best managed WordPress hosting on the web. You are right to name Pair because of its uncanny uptime, around the same as Datapipe, which starts at $1,000 a month; Pair Networks’ reliability is in the top 5 in the world. Neither is inexpensive, but each is a fantastic value, and uptime matters a lot. On the subject of your DNS, I believe you will agree with me that DynECT is the best DNS provider available. 11 years of uptime is amazing.
    Tom

  3. A client has their DNS set up just like this now, and I am in the process of moving it over to a managed anycast DNS provider.

    To CTB,
    The reason people prefer a hosted WordPress.org website or blog over WordPress.com is the ability to do so much more with it. WordPress.com is very limited and designed for people who want to keep a journal about their life on the web. That’s all fine, but it has none of the abilities to blend with an HTML site or whatever else is being used. The plug-ins are nonexistent. The difference is comparable to a bike (WordPress.com) versus a car (hosted WordPress.org).

  4. Sounds like you need to switch providers; that’s not good enough service. Perhaps wp engine (by @asmartbear) might be worth considering? Or host with pair.com whose support has been outstanding for > 12 years for me so far. They even have a phone number to call 🙂 or maybe appfog? 🙂

  5. Thank you Rick! Enjoy reading the site and appreciate everything you do!! Keep it up!

  6. Good thing going? How about great? Or freaking fantastic? Thanks for doing what you do.

  7. Just curious, why not host the blog on wordpress.com? (That is, unless you have content/features that have outside dependencies.)

  8. It’s a blessing and a curse 🙂

  9. Why were you awake before 7am on a Saturday morning? You crazy hustler, you!

Comments are closed.
