
Domain 1 · Understand Usage and Cost

Caught the day it happens, not at month-end.

Anomaly Management is the difference between a five-figure surprise and a Teams thread that closes by lunch. CloudMonitor detects anomalies daily at the resource level, baselines them with a Bayesian model, and routes each one to the owner already named in its cost group.

The problem

Anomalies that show up at month-end.

Monthly review, too late.

A misconfigured test cluster running for twenty-eight days is a budget hit, not an incident. By the time anyone notices, the money is gone.

Threshold alerts on every line.

A 20% threshold fires on every weekend batch job and every quarter-end backup. Practitioners learn to mute the channel.

Alerts with no owner.

The platform team gets paged for a spike on someone else's workload. Triage costs more than the anomaly.

How CloudMonitor answers

Daily detection, owner-routed, signal, not noise.

Daily resource-level detection.

Every resource is checked daily against its own history. A spike on a single VM surfaces before it compounds into a weekly report line.
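Under the hood this amounts to a per-resource check, not a bill-level one. A minimal Python sketch of the idea, assuming you already have each resource's recent daily spend; the z-score rule and cutoff are illustrative stand-ins, not CloudMonitor's actual detector:

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], today: float, z_cutoff: float = 3.0) -> bool:
    """Flag today's spend if it sits far above this resource's own history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:                            # flat history: any increase is a spike
        return today > mu
    return (today - mu) / sigma > z_cutoff    # one-sided: only cost increases alert

# Example: a VM that normally costs ~$12/day suddenly costs $95.
history = [11.8, 12.1, 12.0, 11.9, 12.3, 12.2, 12.0]
print(is_anomalous(history, today=95.0))      # True
```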

Bayesian baselining.

Weekly cycles, monthly closes, and quarter-end batch runs are learned, not flagged. The alert that fires is the one worth investigating.
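What "learned, not flagged" means in practice: the baseline carries seasonality, so a Sunday batch run is judged against Sundays. A toy conjugate Normal-Normal sketch of that idea; the production model is richer, and the noise variance, flat prior, and 3-sigma rule here are assumptions:

```python
import math
from collections import defaultdict

def weekday_baselines(history: list[tuple[int, float]],
                      prior_mu: float = 0.0, prior_var: float = 1e6,
                      noise_var: float = 4.0) -> dict[int, tuple[float, float]]:
    """Posterior predictive (mean, std) per weekday (0=Mon .. 6=Sun),
    via a conjugate Normal update with known noise variance."""
    by_day = defaultdict(list)
    for weekday, cost in history:
        by_day[weekday].append(cost)
    out = {}
    for day, xs in by_day.items():
        n, xbar = len(xs), sum(xs) / len(xs)
        post_var = 1.0 / (1.0 / prior_var + n / noise_var)
        post_mu = post_var * (prior_mu / prior_var + n * xbar / noise_var)
        pred_std = math.sqrt(noise_var + post_var)   # predictive: noise + uncertainty
        out[day] = (post_mu, pred_std)
    return out

def worth_investigating(baselines, weekday: int, today: float, k: float = 3.0) -> bool:
    mu, std = baselines[weekday]
    return today > mu + k * std

# Sundays run the weekly batch; a high Sunday is normal, a high Tuesday is not.
hist = [(6, 40.0), (6, 42.0), (6, 41.0), (1, 12.0), (1, 11.5), (1, 12.2)]
b = weekday_baselines(hist)
print(worth_investigating(b, weekday=6, today=43.0))   # False: batch cycle learned
print(worth_investigating(b, weekday=1, today=43.0))   # True: real spike
```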

Owner-routed via cost group.

The same allocation tree that runs invoices also routes anomalies. Each one lands with the team that owns the workload, not a central queue.
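Routing is a walk up that tree, not a lookup table someone forgot to update. A sketch with an illustrative allocation tree; the group names and owner addresses are placeholders, not CloudMonitor's schema:

```python
# Children without an explicit owner inherit from the nearest ancestor.
COST_GROUPS = {
    "org":      {"parent": None,       "owner": "finops@example.com"},
    "platform": {"parent": "org",      "owner": "platform-team@example.com"},
    "data-eng": {"parent": "org",      "owner": "data-eng@example.com"},
    "airflow":  {"parent": "data-eng", "owner": None},   # inherits owner
}

def route(cost_group: str) -> str:
    """Walk up the allocation tree until a node with an explicit owner is found."""
    node = cost_group
    while node is not None:
        owner = COST_GROUPS[node]["owner"]
        if owner:
            return owner
        node = COST_GROUPS[node]["parent"]
    raise LookupError(f"no owner anywhere above {cost_group!r}")

print(route("airflow"))   # data-eng@example.com, not a central queue
```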

Teams plus webhook fabric.

Post to a Teams channel, raise a Jira issue, open a ServiceNow ticket — all from the same alert, no glue code to maintain.
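The fan-out itself is plain HTTP. A sketch assuming placeholder endpoints: the {"text": ...} payload matches Teams incoming webhooks, while the Jira call is simplified and omits authentication.

```python
import json
import urllib.request

def post_json(url: str, payload: dict) -> None:
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

def fan_out(alert: dict) -> None:
    text = (f"Cost anomaly: {alert['resource']} "
            f"+${alert['delta']:.0f}/day, owner {alert['owner']}")
    # Teams incoming webhook: a simple text card.
    post_json("https://example.webhook.office.com/abc123", {"text": text})
    # Jira issue via the REST API (auth header omitted in this sketch).
    post_json("https://example.atlassian.net/rest/api/2/issue", {
        "fields": {"project": {"key": "COST"},
                   "summary": text,
                   "issuetype": {"name": "Task"}}})
```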

Outcomes

Anomalies caught when they're cheap.

Daily: Resource-level detection

Owner: Routed via cost group, not a central queue

Signal: Bayesian baseline, not threshold spam

See an anomaly land in Teams, owner-attached.

The demo tenant has a seeded anomaly and an end-to-end routing trail.