It’s 2:00 a.m. A pager goes off in a suburban home. A bleary-eyed woman makes her way to a terminal hooked up to a 1200-baud modem. She works until 6:00 a.m. to fix the system problem remotely; this time it’s just a matter of editing and restarting a data-load job, then monitoring the log to confirm the fix.
If you’re thinking that’s so 1980s, you’re right. However, substitute “smartphone” for “pager” and “laptop with Wi-Fi” for “1200-baud terminal,” and you’re not that far off from how things are done today on the cloud.
It’s called cloud operations (aka cloudops): keeping cloud-based systems in good working order. It is typically a mix of proactive and reactive work, with data-load job failures, missing directory entries, and user- or developer-caused problems filling your 10- to 12-hour workday.
If this sounds unglamorous, it is. It’s operations, and while you’ve traded platforms that sit in your datacenter for platforms that sit in the public cloud, the tasks and patterns of work around operating migrated applications and data sets are largely the same.
Some people might argue that ops automation, including tools such as cloud-enabled monitoring and management, makes ops just a matter of glancing at your smartphone once or twice during the day. So what’s the big deal?
But the reality is a bit more primitive: most application workloads did not get an upgrade in operational procedures when they moved to the cloud, nor were they modified to make ops easier. Most enterprises simply did not spend the money and time to make the workloads proactive and self-healing, which is now a best practice.
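To see what "self-healing" means for the data-load job from the opening scenario, consider a job that retries transient failures on its own and pages a human only when retries are exhausted. Here is a minimal sketch in Python; the `run_with_retries` helper and the flaky `load_batch` job are hypothetical, for illustration only:

```python
import time

def run_with_retries(job, max_attempts=3, base_delay=1.0, alert=print):
    """Run a job, retrying transient failures with exponential backoff.

    This is the self-healing pattern in miniature: the system tries to
    fix itself first, and only alerts a human (via `alert`) after all
    retries are exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_attempts:
                alert(f"Job failed after {max_attempts} attempts: {exc}")
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # back off, then retry

# Hypothetical flaky data-load job: fails twice, then succeeds.
calls = {"n": 0}
def load_batch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient load error")
    return "loaded"

result = run_with_retries(load_batch, base_delay=0)  # no real sleeping in this demo
print(result)  # → loaded, with no 2:00 a.m. page
```

A workload written this way turns most of those overnight pages into log entries reviewed the next morning; only failures that survive the retries wake anyone up.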
As a result, the people doing traditional ops end up doing cloudops as they always have. Most enterprises don’t think about what needs to change.
If you don’t think about what needs to change, not much changes. And you lose the opportunity to do cloudops better than datacenter ops.