In IT, we’ve traditionally had different tools for collecting information and for acting on that information. Monitoring systems, log aggregators, and configuration management databases (CMDBs) collect information about your IT estate, while remote execution frameworks and configuration management systems help you take action on that estate.
IT professionals have been the glue between these systems, manually bridging the gap between information and action. But we’re now in the middle of a quiet revolution: these systems are becoming connected to each other, and the insights derived from all of that information are increasingly driving automated action across the enterprise.
Insight-driven automation isn’t just for cloud-native
Cloud-native architectures typically offer the promise of “self-healing” systems—the ability of the runtime to drive corrective action from the telemetry information it generates and listens to. This is insight-driven automation at work.
A few examples include Cloud Foundry’s ability to detect failure, restart services, and scale containers based on load; the Kubernetes Operator framework, which promises the ability to automate many operational tasks based on telemetry; and OpenStack’s Monasca project, which offers thresholds, alarms, and event-driven automation of common cloud infrastructure failure modes.
Of course, closed-loop systems aren’t the sole purview of cloud-native architectures. For example, systemd has been able to detect failed services and restart them for years. What is new is that we are starting to see the same opportunity in bridging previously disparate distributed systems.
Bringing this type of insight-driven automation to traditional or hybrid IT systems is an immense opportunity for the IT industry. Doing so means tying together the more traditional IT systems across the enterprise without requiring a wholesale move to cloud-native architectures. Three domains where this is happening today are IT service intelligence; vulnerability detection and remediation; and continuous infrastructure discovery and automation.
IT service intelligence: deriving signal from noise at big data scale
Log aggregation systems like Splunk and the Elasticsearch/Logstash/Kibana (ELK) stack have focused on aggregating all of the machine-generated “exhaust” (data, not CO2) from the software that runs on your estate, making it searchable, and extracting signal from all of that noise through analytics. The signal has been getting steadily better through machine learning-based correlation across an ever-increasing set of data sources, but driving automated action from that insight has proven elusive. The next evolution of these products will offer far deeper integration with “action engines” that close the loop and automate the resolution of issues, sometimes before they even occur.
A canonical example is the detection of a web server that is emitting an increasing number of failures in its error log. A closed-loop system would automate the steps to remove this node from the load-balancer group and provision a new node with an identical software stack to take its place, through an orchestrated series of tasks executed by a remote task execution system or desired-state configuration framework. Today, high-scale web services like Netflix and AWS accomplish this using home-grown logic, but very soon this kind of capability will be democratized across many such scenarios, using common off-the-shelf tools.
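As a rough illustration of the web-server scenario, the following Python sketch runs one pass of such a loop. Everything in it is hypothetical: `LoadBalancerPool` is an in-memory stand-in for a real load balancer, the "steadily rising error count" rule in `is_degrading` is a deliberately naive detection heuristic, and `provision` is a placeholder for whatever remote execution or desired-state tool would actually build the replacement node.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LoadBalancerPool:
    """Hypothetical in-memory stand-in for a real load-balancer group."""
    members: List[str] = field(default_factory=list)

    def remove(self, node: str) -> None:
        self.members.remove(node)

    def add(self, node: str) -> None:
        self.members.append(node)


def is_degrading(error_counts: List[int], window: int = 3) -> bool:
    """Treat a node as degrading if its last `window` error counts strictly increase."""
    recent = error_counts[-window:]
    if len(recent) < window:
        return False  # not enough samples to judge a trend
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))


def close_the_loop(pool: LoadBalancerPool,
                   error_history: Dict[str, List[int]],
                   provision: Callable[[str], str]) -> List[str]:
    """Run one pass of the loop; return the nodes that were replaced."""
    replaced = []
    for node in list(pool.members):  # iterate over a copy; the pool is mutated below
        if is_degrading(error_history.get(node, [])):
            pool.remove(node)            # take the failing node out of rotation
            new_node = provision(node)   # build a replacement with the same stack
            pool.add(new_node)           # put the replacement into rotation
            replaced.append(node)
    return replaced


# Usage: web-2's error counts keep rising, so it is swapped out.
pool = LoadBalancerPool(["web-1", "web-2"])
history = {"web-1": [0, 1, 0], "web-2": [2, 5, 9]}
replaced = close_the_loop(pool, history, provision=lambda old: old + "-replacement")
print(replaced)      # ['web-2']
print(pool.members)  # ['web-1', 'web-2-replacement']
```

In practice the detection rule would come from the log-analytics side and the `provision` step from the automation side; the point of the sketch is only that one function ties the two together without a human in between.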
Vulnerability detection and remediation
In today’s IT environment, every organization must have the capability to quickly and continuously assess its applications and their dependencies against known vulnerability databases. The detection and prioritization of vulnerabilities across the IT estate is performed by security operations (secops) teams using specialized tools, and is largely disconnected from the remediation actions, which are performed by ops using a different set of tools.
Today, the most commonly employed collaboration tool across these functions is a spreadsheet. But IT is on the verge of being able to marry these vulnerability insights with the specific actions necessary to remediate those vulnerabilities. In the near future, we will see this entire workflow being automated via a deeper integration of vulnerability scanning tools with configuration management automation systems that can upgrade package versions across the enterprise.
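A minimal sketch of that integration, assuming scanner output can be reduced to a list of advisories (affected package, first fixed version, severity score) and the inventory to installed package versions per host. `Advisory`, `plan_remediation`, the package names and the `ADV-*` identifiers are all hypothetical, and `parse_version` only handles dotted numeric versions, unlike the richer comparison rules real package managers use. The output is a prioritized list of upgrade tasks that a configuration management system could then execute; that hand-off is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # hypothetical identifier
    package: str      # affected package name
    fixed_in: str     # first version that contains the fix
    severity: float   # score used to prioritize remediation


def parse_version(version: str) -> Tuple[int, ...]:
    """Simplified comparison key: dotted numeric versions only (e.g. "1.20.1")."""
    return tuple(int(part) for part in version.split("."))


def plan_remediation(inventory: Dict[str, Dict[str, str]],
                     advisories: List[Advisory]) -> List[Tuple[str, str, str, str]]:
    """Return (host, package, target_version, advisory_id) tasks, most severe first."""
    tasks = []
    for adv in sorted(advisories, key=lambda a: a.severity, reverse=True):
        for host, packages in sorted(inventory.items()):
            installed = packages.get(adv.package)
            # Only hosts with the package at a version older than the fix need work.
            if installed is not None and parse_version(installed) < parse_version(adv.fixed_in):
                tasks.append((host, adv.package, adv.fixed_in, adv.advisory_id))
    return tasks


# Usage with made-up hosts, packages and advisories.
inventory = {
    "db-1": {"libfoo": "1.1.1", "libbar": "1.20.0"},
    "web-1": {"libfoo": "3.0.8", "libbar": "1.18.0"},
}
advisories = [
    Advisory("ADV-1", "libbar", "1.20.1", severity=5.3),
    Advisory("ADV-2", "libfoo", "3.0.7", severity=9.8),
]
tasks = plan_remediation(inventory, advisories)
for task in tasks:
    print(task)
# ('db-1', 'libfoo', '3.0.7', 'ADV-2')
# ('db-1', 'libbar', '1.20.1', 'ADV-1')
# ('web-1', 'libbar', '1.20.1', 'ADV-1')
```

Emitting plain task tuples rather than running upgrades directly keeps detection and action decoupled, mirroring the secops/ops split while replacing the spreadsheet in between with a machine-readable hand-off.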
Continuous infrastructure discovery and automation
Automation tools offer the promise of replacing manual, error-prone, repetitive, soul-crushing tasks with automated ones, so that your teams can focus on higher-level activities that are more valuable to the business. But most organizations tasked with automation lack an accurate and real-time inventory of the infrastructure and applications they need to manage. This problem only gets worse with the move to cloud infrastructure, when the lifetime of resources is often measured in hours, not months.
Today, the task of cataloging and classifying infrastructure across your IT estate falls on CMDBs, while the automation of infrastructure and applications is performed by configuration management systems and remote execution frameworks. These worlds are about to merge: Traditional CMDBs that neither continuously refresh their inventory nor offer the ability to automate the desired state of those systems will become obsolete. Likewise, configuration management systems that don’t offer the ability to natively discover, or at least integrate with continuous discovery systems, will not keep pace with ones that do.
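The merge can be sketched as a reconciliation pass run on every discovery cycle. In this hypothetical Python example, the CMDB and the discovery scan are both plain dictionaries keyed by resource ID, and `diff_inventory` and `refresh` are illustrative names rather than any product's API.

```python
from typing import Dict, Set, Tuple

Record = Dict[str, str]  # attributes of one resource, e.g. {"os": ..., "role": ...}


def diff_inventory(cmdb: Dict[str, Record],
                   discovered: Dict[str, Record]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare stored records against a fresh discovery scan, keyed by resource ID."""
    new = set(discovered) - set(cmdb)   # running, but unknown to the CMDB
    gone = set(cmdb) - set(discovered)  # recorded, but no longer found
    changed = {rid for rid in set(cmdb) & set(discovered) if cmdb[rid] != discovered[rid]}
    return new, gone, changed


def refresh(cmdb: Dict[str, Record],
            discovered: Dict[str, Record]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Bring the CMDB in line with the scan (mutating it) and report what changed."""
    new, gone, changed = diff_inventory(cmdb, discovered)
    for rid in gone:
        del cmdb[rid]
    for rid in new | changed:
        cmdb[rid] = dict(discovered[rid])
    return new, gone, changed


# Usage: vm-2 was deleted, vm-3 was launched, and vm-1 was upgraded since the last scan.
cmdb = {"vm-1": {"os": "rhel8", "role": "web"}, "vm-2": {"os": "rhel8", "role": "db"}}
discovered = {"vm-1": {"os": "rhel9", "role": "web"}, "vm-3": {"os": "rhel9", "role": "web"}}
new, gone, changed = refresh(cmdb, discovered)
print(sorted(new), sorted(gone), sorted(changed))  # ['vm-3'] ['vm-2'] ['vm-1']
```

The new and changed IDs returned by `refresh` are the natural input for a configuration management run that brings those resources to their desired state, which is where discovery and automation meet.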
Data + insight = action (and better automation)
The next stage of IT automation is about bridging the silos of tools and data across your enterprise, particularly the tools you use to discover and aggregate information and the systems you use to take action. Three scenarios that will lead the way are IT service intelligence, vulnerability detection and remediation, and continuous discovery and automation. Every day your systems aren’t talking to each other is a day your teams spend manual effort discovering and addressing conditions that automation could handle more easily and reliably.
This article is published as part of the IDG Contributor Network.
