It’s now widely accepted that monitoring is only a subset of observability. Monitoring shows you when something is wrong with your IT infrastructure and applications, while observability helps you understand why, typically by analyzing logs, metrics, and traces. In today’s environment, a variety of data streams are needed to determine the “root cause” of performance issues, the holy grail of observability: availability data, performance metrics, custom metrics, events, logs and traces, and incidents. The observability framework is built from these data sources, and it allows operations teams to explore the data with confidence. Observability can also determine what prescriptive actions to take, with or without human intervention, to respond to or even prevent critical business disruptions. Reaching advanced levels of observability requires monitoring to evolve from reactive to proactive (or predictive) and finally to prescriptive. Let’s discuss what this evolution involves.

It’s not an easy thing

First, a look at the current state of federated IT operations reveals the challenges. Infrastructure and applications are scattered across staging, pre-production, and production environments, both on-premises and in the cloud, and IT operations teams are constantly engaged to keep these environments available and meeting business needs. Operations teams must juggle multiple tools, teams, and processes. There is often confusion about how many data flows are required to implement an observability platform, and about how to align business and IT operations teams within the enterprise behind a framework that will improve operational optimization over time. For monitoring efforts to mature beyond indicator dashboards into this observable posture, they typically develop through three phases: reactive, proactive (predictive), and prescriptive. Let’s look at each.
Phase 1: Reactive monitoring

These are monitoring platforms, tools, or frameworks that set performance baselines or norms, detect when those thresholds are breached, and raise the corresponding alerts. They help determine the configuration changes needed to keep performance within those thresholds. Over time, as more hybrid infrastructure is deployed to support a growing number of business services and an expanding enterprise scope, the pre-defined baselines may drift. Poor performance can then become normalized, failing to trigger alerts until the system breaks down entirely. Enterprises then look to proactive and predictive monitoring to warn them in advance of performance anomalies that may indicate an impending incident.

Phase 2: Proactive/predictive monitoring

Although the two terms sound different, predictive monitoring can be considered a subset of proactive monitoring. Proactive monitoring enables enterprises to examine signals from the environment that may or may not be the cause of a business service disruption, so they can prepare remediation plans or standard operating procedures (SOPs) to overcome priority-zero incidents. A common way to implement proactive monitoring is a unified “manager of managers” interface, where operations teams can access all alerts from multiple monitoring domains and learn both the “normal” and the “performance bottleneck” behavior of their systems. When a pattern of behavior matches an existing machine learning model, indicating a potential problem, the monitoring system triggers an alert. Predictive monitoring uses dynamic thresholds for technologies that are newer to the market, where there is no first-hand experience of how they should perform.
These tools then learn the behavior of indicators over time and send alerts when they notice deviations from the norm that could result in outages or performance degradation visible to end users. Appropriate actions can then be taken based on these alerts to prevent business-impacting incidents from occurring.

Phase 3: Prescriptive monitoring

This is the final stage of the observability framework, where the monitoring system can learn from the events and remediation/automation packages in the environment and understand the following.
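The contrast between the first two phases can be sketched in a few lines of code. This is a minimal, hypothetical illustration, not taken from any particular monitoring product: the metric, the fixed threshold, and the window size are all assumptions. Reactive monitoring compares a sample against a pre-defined static threshold; predictive monitoring learns a dynamic threshold (here, a rolling mean plus or minus three standard deviations) from recent samples and flags deviations from that learned norm.

```python
from collections import deque
from statistics import mean, stdev

# Assumption for illustration: a fixed CPU-utilization baseline.
STATIC_THRESHOLD = 90.0


def reactive_check(cpu_percent: float) -> bool:
    """Phase 1: alert only after a pre-defined threshold is breached."""
    return cpu_percent > STATIC_THRESHOLD


class DynamicThreshold:
    """Phase 2 sketch: learn 'normal' from recent samples.

    The dynamic threshold is a rolling mean +/- 3 standard deviations,
    so the baseline adapts as workload behavior changes over time.
    """

    def __init__(self, window: int = 60):
        self.samples: deque = deque(maxlen=window)

    def predictive_check(self, value: float) -> bool:
        anomalous = False
        if len(self.samples) >= 10:  # need some history before judging
            mu, sigma = mean(self.samples), stdev(self.samples)
            anomalous = abs(value - mu) > 3 * max(sigma, 1e-9)
        self.samples.append(value)
        return anomalous


detector = DynamicThreshold()
for v in [50, 52, 51, 49, 50, 51, 50, 52, 49, 51, 50, 95]:
    spike = detector.predictive_check(v)
print(spike)  # the final sample (95) deviates far from the learned norm
```

Note that the spike to 95 never breaches the static threshold of 90 by much margin in isolation, but the dynamic detector flags it because it sits far outside the behavior learned from the preceding samples; that adaptivity is what lets predictive monitoring catch anomalies that a fixed baseline would normalize away.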
Looking ahead

Monitoring is not observability, but it is a key part of it, starting with reactive monitoring that tells you when pre-defined performance thresholds are breached. As you bring more infrastructure and application services online, monitoring needs to move toward proactive and predictive models that analyze larger monitoring data sets and detect anomalies that could indicate potential problems before service levels and user experience are impacted. The observability framework then needs to analyze a series of data points to determine the most likely cause of a performance issue or outage within the first few minutes of detecting an anomaly, and start remediating that issue before it reaches a war room or situation-analysis call. The end result is a better user experience, an always-available system, and improved business operations.