The Rise of Intelligent Observability in Software Development

Giulia Di Pietro

Aug 18

Just monitoring production is not enough for complex, cloud-native systems

We are being observed, but not in the way you might think.

In August 2021, Dynatrace celebrated the 2nd anniversary of its Initial Public Offering (IPO). The anniversary coincided with the company reaching both 3,000 employees and 3,000 customers, increases of 50% and 85%, respectively, over those two years.

Dynatrace has been growing rapidly, and part of its success is driven by the adoption of observability within software development.

And in this blog post, I want to tell the story of why observability has grown, what it brings to software development, and what the future holds for it.

Let’s start with: what is observability exactly?

According to the “textbook” definition, observability is the ability to understand what is happening inside a system based on its external outputs. These outputs include metrics, distributed traces, and logs coming from the backend and the frontend, which show how your system is performing in the “real world” (UX). The key is knowing how to analyze the data and use it to your advantage, both to fix issues after they’ve happened and to learn about unknown unknowns before they occur. And by adding Artificial Intelligence and Machine Learning to the mix, you create “Intelligent Observability,” which can automate the remediation of issues, learn from patterns to prevent recurring incidents, and much more.
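To make the three outputs a bit more concrete, here is a minimal sketch of one request emitting a metric, a log line, and a trace span. It uses only the Python standard library; the service name, the `add_to_basket` handler, and the metric names are all made up for illustration, and a real setup would ship this data to an observability backend rather than keep it in memory.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")  # hypothetical service name

metrics = Counter()  # metrics: numeric counts aggregated over time
spans = []           # traces: timed records of the steps in a request


@contextmanager
def span(name, trace_id):
    """Record how long a named step took as part of one trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def add_to_basket(item):
    """Hypothetical request handler that emits all three signals."""
    trace_id = uuid.uuid4().hex
    with span("add_to_basket", trace_id):
        metrics["basket.add.requests"] += 1
        if not item:
            metrics["basket.add.errors"] += 1
            # logs: discrete events, tagged with the trace they belong to
            log.error("add_to_basket failed: empty item trace_id=%s", trace_id)
            return False
        log.info("add_to_basket ok: item=%s trace_id=%s", item, trace_id)
        return True


add_to_basket("pizza")
add_to_basket("")
```

Sharing the same `trace_id` between the log line and the span is what lets you jump from "the error rate went up" to "this is the exact request that failed."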

This is incredibly useful for software developers and operations engineers because it automates some of the most tedious parts of their job, like debugging and log analysis, leaving more time for innovation and development. It also takes a weight off their shoulders: you can sleep well at night, knowing that an AI will take care of failures in your system.

Monitoring your applications in production is nothing groundbreaking. It has been done since the dawn of software development, but it used to be the responsibility of operations teams. Once developers were done coding, they would commit and push their changes and then go home, leaving production issues to operations engineers. Nowadays, thanks to current digitalization trends and (let’s not forget) the advent of DevOps, everything is shifting further and further left towards development. Observability helps developers take responsibility for what they develop, and we are all for it.

There are plenty of articles out there talking about the role of observability in the success of software companies. Still, not much is said about why software development teams are adopting it in their software development life cycle (SDLC). So, I thought I’d summarize some of the critical factors that are contributing to this trend.

Digital transformation has started and is here to stay

It all starts with digital transformation.

The digitalization of services and products is old news, but it has accelerated, especially since the pandemic. Before then, it was hard for many industries to start a digital transformation. They did not have the resources and could not stop everything to focus on something new (potentially affecting their business).

But with nobody allowed out on the streets to buy non-essentials, companies with a typical brick-and-mortar business model had to switch to digital so as not to lose their clientele. Other companies that were already digitalized, like food delivery services, needed to improve their infrastructure and UX to cope with the spike in demand. (A hungry person trying to order a pizza would become hAngry if the “add to basket” button didn’t work, and that would steer them away from your platform.)

Cloud infrastructures have become increasingly popular

We are a digitalized society that wants everything to happen smoothly, efficiently, and on-demand. And to allow for faster, always available, and lightweight software, many companies either migrated to the cloud or started cloud-native. This movement also goes hand in hand with the transition to 5G networks, allowing faster data transfer and collection.

This is all well and good for users, but for developers and software architects, cloud infrastructures are much more complex to build than monoliths. (Let’s just not go into the discussion of whether it always makes sense to build cloud-native apps or if it’s done just because it’s trendy.)

Cloud-native software is built upon hundreds of interdependent services, making it impossible for normal humans (or operations engineers) to manage them all. When software is a monolith and releases happen once every few months, the operations team can take care of everything. Nowadays, we update services and apps multiple times a day, and the whole delivery infrastructure that allows for this is incredibly complex.

More complexity = more possibility for errors.

The long-awaited shift to DevOps

Operations engineers are overwhelmed because of the complexity of what they need to maintain. And developers often don’t have the correct knowledge to support them fully.

Introducing DevOps: combining development and operations to ensure smoother deployments and more stable production.

Developers nowadays do not just write code, throw it over the fence to ops, wash their hands and forget about it. No, they need to be more aware of what they are building and how it affects other parts of a system. This also encourages the shift-left mentality, where code is tested at every stage, long before production.

In comes intelligent observability

With intelligent observability, developers and operations engineers can see the whole picture of their system and how its parts interact with each other. The overview helps them recognize where there might be infrastructural issues. Machine learning can suggest ways to improve your existing structure. Artificial intelligence can auto-remediate issues.

Why repeat the same, time-consuming manual tasks multiple times when an AI can just automate them for you? Why lose sleep over possible bugs in production when they can be remediated autoMAGICally? (see what I did there?) Why maintain the same architecture when an all-encompassing AI can suggest ways to improve it to give users a better UX?

At Dynatrace, we don’t have a traditional ops team anymore, thanks to the built-in observability in our systems that helps us achieve an always-on system. An AWS data center outage recently affected many companies in Europe, but our clients remained unscathed thanks to the built-in Dynatrace observability. As the host became unavailable and nodes went down, the load balancer automatically redirected traffic to the healthy nodes hosted in other data centers. Users could still use our clients’ platforms without noticing any difference, and our clients suffered no financial loss because of it.
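The failover behavior described above can be sketched in a few lines. This is only an illustration of the general idea (skip nodes that fail their health check and route to the next healthy one), not Dynatrace’s or AWS’s actual mechanism; the `Node` and `LoadBalancer` classes and the data center names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Node:
    name: str
    data_center: str
    healthy: bool = True  # in practice, set by periodic health checks


class LoadBalancer:
    """Round-robin balancer that skips nodes marked unhealthy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._rotation = cycle(self.nodes)

    def route(self):
        # Try each node at most once per request before giving up.
        for _ in range(len(self.nodes)):
            node = next(self._rotation)
            if node.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


nodes = [
    Node("n1", "dc-a"), Node("n2", "dc-a"),
    Node("n3", "dc-b"), Node("n4", "dc-b"),
]
lb = LoadBalancer(nodes)

# Simulate an outage of one data center.
for node in nodes:
    if node.data_center == "dc-a":
        node.healthy = False

routed = [lb.route().name for _ in range(4)]
print(routed)
```

With `dc-a` down, every request lands on a node in `dc-b`, so users keep being served while the unhealthy nodes are skipped.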

Furthermore, as already mentioned, increasing amounts of data are collected every minute. To give you an idea of the magnitude of this increase: in December 2019, Dynatrace Davis analyzed 6 trillion dependencies per minute. Today, this number has grown more than sixfold, to 40 trillion per minute. It’s clear that this amount just adds to the strain on the infrastructure for data processing and analysis.

As a developer or data engineer, you need to know which data you’re telling your software to collect and how this data is then analyzed to give you answers to specific business cases. Intelligent observability helps you along the way by showing you patterns and information that you don’t have to look for manually.

What’s in the future of observability in software development

Though observability has gained more and more traction, it’s still a topic that, in many ways, is a work-in-progress due to its complexity. Though many standardization efforts exist, new technologies are continuously created (look at how new Kubernetes is). Some technologies may become successful, some may not, but whichever you decide to adopt needs the correct outputs to make it observable. And the more complex the technology, the higher the need for intelligent observability.

Thankfully, this is a problem that’s well-known in the community at large, and many engineers are putting in the effort to make observability more understandable to those working in tech. Check out Henrik’s YouTube channel, Is it Observable, which provides tutorials on collecting metrics for observability with various technologies.

With the advent of BizDevSecOps, which aims to combine DevOps with Security and Business (security for, well, the protection of the product; business to enhance the end user’s experience), we will see a further rise in the adoption of observability. Business teams will need to be more involved in development to ensure that products have a seamless user experience to increase profits further. Observability is a crucial component to make this happen and enable successful BizDevSecOps.


The Rise of Intelligent Observability in Software Development was originally published in Dynatrace Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.
