DDoS? Didn’t Notice
One of the best feelings when you’ve got a SaaS business is when the complicated architecture shit you’ve done actually works.
Admittedly, it working so well cost us £1000, but we were repaid that in warm fuzzies because it proved that our tech can pass a serious stress test.
A lot of Convertri could be built by any competent developer with enough time on their hands. (LOTS of time. Like, fucking years.)
But there’s some parts that are just beyond the average code monkey. No matter how hard they try, they’re not going to be able to pull it off because they just don’t have the experience.
Your SaaS app doesn’t necessarily need complex elements like this to be commercially viable. Many don’t have them.
But without this, it’s a lot easier for someone to come along and build the exact same app as you. Then you’re just competing on who’s a better marketer.
Maybe you’re an amazing marketer and that’s your turf, so bring it on.
But we’re not – as any reading of this blog will make painfully obvious. Nor are most SaaS founders.
This extra layer of complexity, this thing that makes your app just BETTER and is really hard to copy – that’s your moat. Even if someone tries to copy you, you’ve still got an advantage. It’s the reason having a CTO with serious technical chops can pay off.
One of these things with Convertri is the CDN.
It’s one of the things that make our pages load so fast and keep them ridiculously stable. They’re completely detached from the app, so even if the editor goes down the pages you’ve published stay live. Almost 3 years in and we’ve still – touch wood – got a 100% uptime record.
Originally we used Fastly for this.
They were great, but stopped being viable as SSL became more and more important. Their bill for SSL was $250/mo/domain. With our highest-paying customers charged $199/mo and unlimited domains on the account (in retrospect, this was dumb, but eh, we’re here now), the answer to that was clearly ‘ahahahahahaha no’.
So we built our own CDN. With blackjack, and hookers.
And this… this is NOT something that any competent developer can do, given time.
This is something that requires complicated architecture. It requires university-grade learning. A plucky attitude and Stack Overflow aren’t going to cut it.
And what this has given us:
- Extra tracking and analytics capabilities (admittedly these are yet to be realised in the app, but the foundation is there when we have the dev cycles to build it)
- Incredible page stability
- An enormous traffic capacity
- Load balancing, able to handle massive spikes without affecting other pages on the network
- A greatly reduced monthly bill
The thing is, often when people say stuff like this works, what they mean is ‘it works in theory’. We knew we should have a huge traffic capacity, but we hadn’t actually thrown vast quantities of traffic at it to find out.
But then, someone did.
Around the 3rd of December 2018, we started effectively getting DDoSed.
It wasn’t malicious. One of our users deployed a script that was inadvertently pinging the wrong address, which resolved to a 404 on a domain we were hosting – and it was hitting that address several hundred times a minute.
(Tech heads are currently yelling ‘that’s not a DDoS’. They’re right – a DDoS is an attack deliberately designed to overload a network, and this was just an accident. But the effect on our network was much the same.)
It worked out to about 100M ‘views’ per day.
And it kept doing it.
For a month.
We only noticed when our bandwidth bill came in with a traffic component that was hundreds of times higher than normal volumes. Our network had delivered 11 terabytes of data. As text.
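The numbers roughly hang together, too. A quick back-of-envelope check (assuming the traffic ran for about 30 days, which the post implies but doesn’t state exactly):

```python
# Sanity-check the figures from the bill: ~100M requests/day for ~a month,
# 11 TB of text delivered. What does that imply per response?
requests_per_day = 100_000_000   # ~100M 'views' per day
days = 30                        # assumed: roughly a month of sustained traffic
total_bytes = 11 * 10**12        # 11 terabytes delivered, as text

total_requests = requests_per_day * days
bytes_per_request = total_bytes / total_requests

print(f"{total_requests:,} total requests")                  # 3,000,000,000 total requests
print(f"~{bytes_per_request / 1024:.1f} KiB per response")   # ~3.6 KiB per response
```

A few KiB per response is entirely plausible for a text 404 page, so the bill figures are internally consistent.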
And we didn’t know.
Which, frankly, is pretty awesome. A network that hadn’t been designed to this standard would have fallen apart under the strain.
The Convertri CDN hadn’t blinked. It absorbed the massive sustained traffic increase with no performance issues. It didn’t mitigate the DDoS so much as fail to acknowledge it shared the planet.
And that is fucking cool.
It revealed some stupid we’d done as well. While being able to absorb a massive traffic increase is pretty vindicating, we really should have had some checks built in to tell us it was happening.
If we had, we would have been able to get in touch with the user before we ran up a £1000 bandwidth bill.
Of course, we’ve patched the holes. We’ve added extra measures into the CDN to make sure this kind of thing can’t happen again.
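The post doesn’t say what those extra measures are, but a standard edge defence against runaway clients is per-client rate limiting. Here’s a minimal token-bucket sketch – names and limits are hypothetical, not Convertri’s actual code:

```python
import time

class TokenBucket:
    """Allow `rate` requests/sec per client, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last request, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit – caller would respond with HTTP 429

# One bucket per client IP: 10 req/s sustained, bursts of up to 20
buckets: dict[str, TokenBucket] = {}

def check(ip: str) -> bool:
    bucket = buckets.setdefault(ip, TokenBucket(rate=10, capacity=20))
    return bucket.allow()
```

A script hammering one URL hundreds of times a minute would burn through its bucket almost immediately and get 429s instead of 3.6 KiB of 404 page, which caps both the load and the bandwidth bill.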
We’ve also added alerts to our bandwidth billing. This is something we really didn’t have an excuse for not having in place before – when your $/day on ANY service is a fairly standard size and it suddenly gets ten times bigger, you should have an alert in place to tell you that’s happened. We’ve fixed that too.
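This kind of alert doesn’t need to be clever. A sketch of the idea – compare today’s spend against a trailing baseline and flag big multiples (the figures and threshold here are made up for illustration):

```python
from statistics import median

def spend_alert(daily_spend: list[float], today: float, multiplier: float = 10.0) -> bool:
    """Flag when today's bill is `multiplier`x the recent baseline (median of trailing days)."""
    baseline = median(daily_spend)
    return today >= baseline * multiplier

normal_days = [3.2, 3.5, 3.1, 3.4, 3.3]   # hypothetical £/day bandwidth cost
assert not spend_alert(normal_days, today=3.6)   # normal day: no alert
assert spend_alert(normal_days, today=33.0)      # 10x spike: alert fires
```

The median baseline means one weird day doesn’t skew the comparison, and a check like this running daily would have caught a month-long 100x traffic anomaly on day one instead of on the invoice.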
It definitely feels ironic that we knocked it out of the park on the hard stuff, and then fell at an extremely basic hurdle – but that did at least mean the fix was easier.
And even if this was an unintentional stress test, I’m still taking the pass as a win.