What to do when data and computing exceed human scale

Danny Lieberman
Jan 6, 2021

Hint: Automate

Gartner visualization of an AIOps platform

I’m always curious to learn new ideas from other disciplines and apply them for my own selfish purposes.

Use ML and AI to automate IT operations

Gartner is a leading research and advisory company for IT, and it pretty much invented the space. Gartner regularly coins new models, and AIOps is one of them. The idea is to use machine learning to drive IT operations.

Gartner’s AIOps model integrates 3 disciplines:

  • Service management (“Engage”)
  • Performance management (“Observe”)
  • Automation (“Act”)

By the way, I love the 3-part thing: Ready, Set, Go; Collect, Detect and Act (flaskdata.io); and Engage, Observe and Act.

There is something very powerful about 3-part slogans, isn't there?

What’s AIOps?

Besides combining AI with a short catchy word like Ops (which sounds like Black Ops, IT Ops, Clinical Ops), AIOps is the evolution of IT operational analytics (ITOA). It evolved out of the extreme growth in cloud computing, and several trends are driving it:

  • IT environments exceeding human scale. Traditional approaches to managing IT complexity — offline, manual efforts that require human intervention — don’t work in dynamic, elastic environments. Tracking and managing this complexity through manual, human oversight is no longer possible. ITOps has been exceeding human scale for years and it continues to get worse.
  • The amount of data that we need to retain is exponentially increasing. Performance monitoring is generating exponentially larger numbers of events and alerts. Service ticket volumes experience step-function increases with the introduction of IoT devices, APIs, mobile applications, and digital or machine users. Again, it is simply becoming too complex for manual reporting and analysis.
  • Infrastructure problems must be addressed at ever-increasing speeds. As organizations digitize their business, IT becomes the business. The “consumerization” of technology has changed user expectations for all industries. Reactions to IT events — whether real or perceived — need to occur immediately, particularly when an issue impacts user experience.
  • More computing power is moving to the edges of the network. The ease with which cloud infrastructure and third-party services can be adopted has empowered line of business (LOB) functions to build their own IT solutions and applications. Control and budget have shifted from the core of IT to the edge. And more computing power (that can be leveraged) is being added from outside core IT.
  • Developers have more power and influence but accountability still sits with core IT. As I talk about in my post on application-centric infrastructure, DevOps and Agile are forcing programmers to take on more monitoring responsibility at the application level, but accountability for the overall health of the IT ecosystem and the interaction between applications, services, and infrastructure remains the province of core IT. ITOps is taking on more responsibility just as its networks are getting more complex.

Automation does not replace humans

Acknowledging that cloud computing exceeds human scale does not mean that machines are replacing humans. It means we need automation to deal with the new reality. Humans aren’t replaced, but operations people will need to develop new skills, and new roles will emerge.

In the clinical operations space, this means that human study monitors can now become data scientists.

The elements of AI-driven operations

I’m going to take a moment here to go through the elements of AIOps as represented in the Gartner diagram above.

  • Extensive and diverse IT data. Enumerated in the black and blue chevrons, AIOps is predicated on bringing together diverse data from both IT operations management (ITOM) (metrics, events, etc.) and IT service management (ITSM) (incidents, changes, etc.). This is often referred to as “breaking down data silos” — bringing data together from disparate tools so they can “speak” to each other and accelerate root cause identification or enable automation.
  • Aggregated big data platform. At the heart of the platform, the center of the above graphic, is big data. As the data is liberated from siloed tools, it needs to be brought together to support next-level analytics. This needs to occur not just offline — as in a forensic investigation using historical data — but also in real-time as data is ingested. See my other post for more on AIOps and big data.
  • Machine learning. Big data enables the application of ML to analyze vast quantities of diverse data. This is possible neither before the data is brought together nor by manual human effort. ML automates existing manual analytics and enables new analytics on new data, all at a scale and speed unavailable without AIOps.
  • Observe. This is the evolution of the traditional ITOM domain that integrates development (traces) and other non-ITOM data (topology, business metrics) to enable new modalities of correlation and contextualization. In combination with real-time processing, probable-cause identification becomes simultaneous with issue generation.
  • Engage. The evolution of the traditional ITSM domain includes bi-directional communication with ITOM data to support the above analyses and auto-create documentation for audit and compliance/regulatory requirements. AI/ML expresses itself here in cognitive classification plus routing and intelligence at the user touchpoint, e.g., chatbots.
  • Act. This is the “final mile” of the AIOps value chain. Automating analysis, workflow, and documentation is for naught if responsibility for action is put back in human hands. Act encompasses the codification of human domain knowledge into the automation and orchestration of remediation and response.
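To make the Observe, Engage and Act stages a bit more concrete, here is a minimal Python sketch. It is my own toy illustration, not Gartner's model or any vendor's implementation. The `Metric` and `Incident` shapes, the `RUNBOOK` mapping, the service names and the 3-sigma threshold are all assumptions made up for the example, and a simple z-score check stands in for real machine learning.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Metric:
    """ITOM-style data point (hypothetical shape)."""
    service: str
    latency_ms: float


@dataclass
class Incident:
    """ITSM-style ticket (hypothetical shape)."""
    service: str
    summary: str


# Hypothetical runbook: human domain knowledge codified as service -> remediation.
RUNBOOK = {
    "checkout": "restart checkout pods",
    "search": "scale out search tier",
}


def observe(history, latest, threshold=3.0):
    """Observe: flag `latest` if it is more than `threshold` standard
    deviations away from the mean of `history` (a z-score check)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def engage(metric, incidents):
    """Engage: correlate the anomalous metric with tickets for the same service."""
    return [i for i in incidents if i.service == metric.service]


def act(metric):
    """Act: look up a codified remediation, falling back to a human escalation."""
    return RUNBOOK.get(metric.service, "escalate to on-call engineer")


def run_pipeline(history, metric, incidents):
    """Run Observe -> Engage -> Act; return None when nothing looks anomalous."""
    if not observe(history, metric.latency_ms):
        return None
    related = engage(metric, incidents)
    return {
        "service": metric.service,
        "related_incidents": len(related),
        "action": act(metric),
    }


if __name__ == "__main__":
    history = [100, 102, 98, 101, 99]  # recent latency samples in ms
    tickets = [Incident("checkout", "Customers report slow payment page")]
    print(run_pipeline(history, Metric("checkout", 250.0), tickets))
```

The point of the sketch is the shape of the flow rather than the math: metrics and tickets from separate tools meet in one place, detection happens as the data arrives, and the last step is an action instead of yet another alert for a human to triage.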

The future of AI-driven operations

As IT moves beyond human scale, we need to automate even more. But simply reacting defensively is not enough.

Driving operations with AI and ML is an opportunity to grow, evolve, innovate, and disrupt.

Here are some ways that AIOps-enabled organizations will transform their business in the next five years.

  • Technology becomes more human: Analytics and orchestration enable frictionless experiences, allowing ubiquitous self-service.
  • The automation of technology, and, hence, business processes: Costs fall, speed increases, and errors decrease, freeing up human capital for higher-level work.
  • Enterprise ITOps gains DevOps agility: Continuous delivery extends to operations and the business.
  • Data becomes currency: The vast wealth of untapped business data is capitalized, unleashing high-value use cases and monetization opportunities.

