One of the big issues facing anyone building a data-driven devops practice is, quite simply, the scale of the data you’re collecting. Logs from millions of users quickly add up, and the same is true of the internet of things or any other large source of data. It’s a world where you’re generating terabytes of data and need to understand quickly what it’s telling you.
Traditional databases aren’t much help, because you have to run that data through an extract, transform, load (ETL) process before you can start to explore it, even if you’re planning to use data warehouse-style analytics tools. Tools that handle massive amounts of data are becoming increasingly important, not only for analytical systems, but also to provide the training data needed to build machine learning models.
Introducing Azure Data Explorer
That’s where Azure’s Data Explorer comes in. It’s a tool for delving into your data, running ad-hoc queries while quickly bringing that data into a central store. Microsoft claims import speeds of up to 200MB/sec per node, and queries across a billion records that return in less than a second. Data can be analyzed using conventional techniques or across time series, on a fully managed platform where you only need to think about your data and your queries.
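To make that concrete, here’s a minimal sketch of an ad-hoc query from Python, assuming the azure-kusto-data package and Microsoft’s publicly documented help cluster with its Samples database and StormEvents table; swap in your own cluster, database, and credentials.

```python
# Minimal ad-hoc query against an Azure Data Explorer cluster, using the
# azure-kusto-data Python package (pip install azure-kusto-data).
# The cluster URL, database, and table below are Microsoft's public sample
# data; replace them with your own cluster and tables.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://help.kusto.windows.net"  # public sample cluster
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(cluster)
client = KustoClient(kcsb)

# A Kusto query: count storm events per day, a simple time-bucketed
# aggregation over a large table.
query = """
StormEvents
| summarize EventCount = count() by bin(StartTime, 1d)
| order by StartTime asc
| take 10
"""

response = client.execute("Samples", query)
for row in response.primary_results[0]:
    print(row["StartTime"], row["EventCount"])
```

The query itself is written in the Kusto query language; the client simply hands the query string to the cluster and gets tabular results back.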
Working at cloud scale can mean generating large amounts of data that’s hard to analyze using traditional tools. Like Cosmos DB, Azure Data Explorer is another example of Microsoft giving its own internal tools to its customers. Running a public cloud at scale has meant building new tools to handle terabytes of data and to manage massive data centers. Azure Data Explorer brings those elements together, turning them into a tool that can work with your log files and your streaming data. That makes it an essential tool for anyone building massive distributed applications, on-premises or in the cloud.
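Getting log files into the service goes through a separate ingestion endpoint; the sketch below queues a local log file for ingestion using the azure-kusto-ingest Python package. The cluster URI, database, table, and file name are hypothetical placeholders here, and real streaming pipelines would normally use a managed connection rather than one-off file uploads.

```python
# Sketch of queuing a log file for ingestion into Azure Data Explorer,
# assuming the azure-kusto-ingest package (pip install azure-kusto-ingest).
# The cluster URI, database, table, and file name are hypothetical.
from azure.kusto.data import KustoConnectionStringBuilder
from azure.kusto.data.data_format import DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties

# Ingestion goes through the cluster's "ingest-" endpoint.
ingest_uri = "https://ingest-mycluster.westus.kusto.windows.net"  # hypothetical
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(ingest_uri)
ingest_client = QueuedIngestClient(kcsb)

props = IngestionProperties(
    database="MyLogs",    # hypothetical database
    table="AppTraces",    # hypothetical table
    data_format=DataFormat.CSV,
)

# Queue a local log file; the service batches and indexes it in the background.
ingest_client.ingest_from_file("app-traces.csv", ingestion_properties=props)
```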
Originally code-named Kusto, Azure Data Explorer is the commercial version of the tools Microsoft uses to manage its own logging data across Azure. Back in 2016, Microsoft was handling more than a trillion events and more than 600TB of data daily, enough to well and truly stress test the underlying system. Unless you’re running all the IoT systems for BP or another large oil company, you’re unlikely to need to process that much data, but it’s good to know that the option is there.