Domain 1 · Understand Usage and Cost
Caught the day it happens, not month-end.
Anomaly Management is the difference between a five-figure surprise and a Teams thread that closes by lunch. CloudMonitor detects anomalies daily at the resource level, baselines them with a Bayesian model, and routes each one to the owner already recorded in its cost group.
The problem
Anomalies that show up at month-end.
A monthly review catches it too late: a misconfigured test cluster running for twenty-eight days is a budget hit, not an incident. By the time anyone notices, the money is gone.
Threshold alerts on every line.
A 20% threshold fires on every weekend batch job and every quarter-end backup. Practitioners learn to mute the channel.
Alerts with no owner.
The platform team gets paged for a spike on someone else's workload. Triage costs more than the anomaly.
How CloudMonitor answers
Daily detection, owner-routed, signal not noise.
Daily resource-level detection.
Every resource is checked daily against its own history. A spike on a single VM surfaces before it compounds into a weekly report line.
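A minimal sketch of that daily sweep, assuming a spend feed keyed by resource ID with one cost value per day. The z-score cutoff here is a simple stand-in for the Bayesian baseline described next; the function and field names are illustrative, not CloudMonitor's API.

```python
from statistics import mean, stdev

def daily_sweep(spend: dict[str, list[float]]) -> list[str]:
    """Return resource IDs whose latest day sits far outside their own history."""
    flagged = []
    for resource_id, daily in spend.items():
        *history, today = daily                # all prior days, then today
        if len(history) < 14:                  # too little history to judge
            continue
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (today - mu) / sigma > 4.0:   # illustrative cutoff
            flagged.append(resource_id)
    return flagged
```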
Bayesian baselining.
Weekly cycles, monthly closes, and quarter-end batch are learned, not flagged. The alert that fires is the one worth investigating.
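A minimal sketch of one way such a baseline can work, assuming weekly seasonality is the dominant cycle: a conjugate Normal posterior per weekday, alerting only when spend falls outside that weekday's posterior predictive interval, so a big Saturday batch run never trips the Monday baseline. The priors, cutoff, and class shape are illustrative, not CloudMonitor's actual model.

```python
import math

class WeekdayBaseline:
    def __init__(self, prior_mean=0.0, prior_var=1e6, obs_var=1.0):
        # One (mean, variance) posterior per weekday, 0=Monday .. 6=Sunday.
        self.post = [(prior_mean, prior_var)] * 7
        self.obs_var = obs_var                       # assumed known noise

    def is_anomaly(self, weekday: int, spend: float, z: float = 3.0) -> bool:
        mu, var = self.post[weekday]
        pred_sd = math.sqrt(var + self.obs_var)      # posterior predictive sd
        return abs(spend - mu) > z * pred_sd

    def update(self, weekday: int, spend: float) -> None:
        # Standard conjugate Normal update of this weekday's mean.
        mu, var = self.post[weekday]
        new_var = 1 / (1 / var + 1 / self.obs_var)
        new_mu = new_var * (mu / var + spend / self.obs_var)
        self.post[weekday] = (new_mu, new_var)
```

Checking before updating matters: a genuine spike should be judged against the old baseline, not absorbed into it.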
Owner-routed via cost group.
The same allocation tree that runs invoices also routes anomalies. Each one lands with the team that owns the workload, not a central queue.
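A minimal sketch of that routing, assuming a cost-group tree where any node can declare an owning team and the nearest ancestor with an owner wins. The node shape and field names are hypothetical, not CloudMonitor's schema.

```python
from dataclasses import dataclass

@dataclass
class CostGroup:
    name: str
    owner: str | None = None              # e.g. a team alias or Teams channel
    parent: "CostGroup | None" = None

def route(group: CostGroup) -> str:
    """Walk up the allocation tree until a node declares an owner."""
    node = group
    while node is not None:
        if node.owner:
            return node.owner
        node = node.parent
    return "finops-central"               # fallback only if nothing owns it

# Example: the leaf has no owner, so routing climbs to its cost group.
org = CostGroup("org", owner="finops-central")
payments = CostGroup("payments", owner="team-payments", parent=org)
leaf = CostGroup("payments/test-cluster-7", parent=payments)
assert route(leaf) == "team-payments"
```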
Teams plus webhook fabric.
Post to a Teams channel, raise a Jira issue, open a ServiceNow ticket — all from the same alert, no glue code to maintain.
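A minimal sketch of the fan-out, assuming one normalized alert dict and pre-configured endpoint URLs. The payloads follow the public Teams incoming-webhook, Jira, and ServiceNow REST shapes but are simplified here: no auth headers, retries, or required Jira project/issue-type fields.

```python
import json
from urllib.request import Request, urlopen

def post_json(url: str, payload: dict) -> None:
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    urlopen(req)                           # fire-and-forget for the sketch

def fan_out(alert: dict, teams_url: str, jira_url: str, snow_url: str) -> None:
    summary = f"Cost anomaly: {alert['resource']} ({alert['owner']})"
    post_json(teams_url, {"text": summary})                       # Teams message
    post_json(jira_url, {"fields": {"summary": summary,           # Jira issue
                                    "description": alert["detail"]}})
    post_json(snow_url, {"short_description": summary})           # ServiceNow
```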
Outcomes
Anomalies caught when they're cheap.
Daily: resource-level detection.
Owner: routed via cost group, not a central queue.
Signal: Bayesian baseline, not threshold spam.
See an anomaly land in Teams, owner-attached.
The demo tenant has a seeded anomaly and an end-to-end routing trail.