TL;DR: Our spend estimates are based on a machine learning model that relies on a growing collection of signals. The three signals below are among the most influential.

  1. Product Traffic - how much application traffic the product delivers.

  2. Product Adoption - how widely deployed the product is across a company’s applications or infrastructure.

  3. Product Cost - how expensive the product is relative to other products in its category.

Let's look at an example

Say we want to estimate a company’s spend with Amazon EC2. Our spend model will look at which applications are deployed across EC2 and the amount of application traffic EC2 is supporting (in addition to many other signals). The model will then estimate a cost for Amazon EC2 based on actual spend data we have from cloud hosting customers with similar signals.

Applications, traffic, and product cost are some of the more influential signals in our spend model.
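To make the "similar signals" idea concrete, here is a minimal sketch. It assumes a simple nearest-neighbor lookup stands in for the real model, which this article doesn't detail. The signal values, the reference records, and the names REFERENCE and estimate_spend are all hypothetical and exist only for illustration.

```python
import math

# Hypothetical reference records with known spend (all numbers are made up):
# (traffic, adoption, relative_cost, known_monthly_spend)
REFERENCE = [
    (0.2, 0.1, 1.0, 1_000.0),
    (0.5, 0.4, 1.0, 5_000.0),
    (0.9, 0.8, 1.0, 20_000.0),
]


def estimate_spend(signals, reference=REFERENCE, k=2):
    """Average the known spend of the k reference records closest to `signals`.

    `signals` is a (traffic, adoption, relative_cost) tuple. This is an
    illustrative stand-in, not the actual spend model.
    """
    # Rank reference records by Euclidean distance in signal space.
    ranked = sorted(reference, key=lambda rec: math.dist(signals, rec[:3]))
    nearest = ranked[:k]
    return sum(rec[3] for rec in nearest) / len(nearest)


if __name__ == "__main__":
    # A company whose signals sit close to the largest reference record.
    print(estimate_spend((0.85, 0.75, 1.0)))
```

The point of the sketch is only that an estimate is anchored to real spend data from companies that look similar, so it is only as good as the reference data nearby.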

Improving the model with data, data, and more data!

There are two investments we continuously make to improve our spend estimates.

  1. We're always collecting real-world product spend data from partners, customers, and even companies themselves who either openly publish their costs or share them with us privately.

  2. We're on a never-ending journey to expand our signal library by building new features into our sensor network.

Both of these ongoing data collection activities feed our spend model and make it better over time.

Note: When you share feedback with us about a product spend being too high or too low, we incorporate that into our dataset which improves our spend estimates! ❤️

No model is perfect: here are a few common challenges

Our spend model is quite good at estimating product spends for over 90% of businesses worldwide, but we do run into challenges when trying to estimate spends in certain situations.

Outliers and massive companies

When it comes to estimating product spends for Netflix, Facebook, Apple and similar-sized businesses, our spend model will always underestimate real-world spend, sometimes by a lot. This is because these companies operate at a scale unlike that of other businesses, so our model doesn't have a lot of real-world data to rely on. The good news is that we all know these huge companies spend a lot, so use our spend estimates as a relative measure of scale rather than an absolute one.

Private and unreachable infrastructure

Another challenge our sensor network is always up against is getting comprehensive visibility into a company's traffic and application infrastructure. If a company's applications are entirely hidden from view, inside a VPC (virtual private cloud) or behind a firewall that's blocked off from our sensors, we won't have a direct view into the applications and workloads running there. And while our spend model does a darned good job of estimating spend we can't see directly, it can be particularly challenging when we don't have a lot of data to work with. This is another reason why our spend estimates may be low at times.

How to use spend estimates correctly

It's a common misconception that our spend estimates are intended to communicate actual spend. That's not quite right and you can avoid this common pitfall by reading further.

Our spend estimates are designed to accomplish two things:

  1. Communicate a directional sense of product usage.

  2. Allow you to perform math and relative comparisons of one spend with another.

It's important to keep this in mind when using our spend estimates. We provide an unprecedented view into relative product usage, but we don't actually know how much a company is spending on its infrastructure (no one does except the company itself and the provider it's using). That said, we're confident we've developed a methodology that delivers directionally accurate results for the majority of the millions of businesses out there.
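As an example of the relative comparisons described above, the sketch below turns a set of estimates into shares and ratios instead of reading them as dollar amounts. The product names, the figures, and the helper names spend_shares and spend_ratio are hypothetical.

```python
def spend_shares(estimates):
    """Express each estimate as a fraction of the total estimated spend."""
    total = sum(estimates.values())
    if total == 0:
        # Nothing to compare against; report every share as zero.
        return {name: 0.0 for name in estimates}
    return {name: value / total for name, value in estimates.items()}


def spend_ratio(estimate_a, estimate_b):
    """How many times larger estimate_a is than estimate_b."""
    return estimate_a / estimate_b


if __name__ == "__main__":
    # Made-up estimates for two products at one company.
    estimates = {"Product A": 30_000, "Product B": 10_000}
    print(spend_shares(estimates))
    print(spend_ratio(estimates["Product A"], estimates["Product B"]))
```

Read this way, the useful takeaway is that Product A looks roughly three times as heavily used as Product B, not that the company pays exactly those amounts.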

A deeper dive into data collection

To address the challenges of estimating infrastructure costs, our sensor network was purposely designed to collect detail on product deployments, application configurations, application traffic, product adoption/usage and more.

A few examples of the types of infrastructure and data our sensor network monitors are:

  1. The web, mobile, and back-end applications a company operates

  2. The operating environments a company manages

  3. The cloud and infrastructure products each application and environment relies on

  4. The usage and application traffic of each product

Discover, measure, monitor and repeat

There are some core concepts we follow:

  • Application & environment discovery: We’re always on the hunt for new applications and hosting environments. Companies launch these regularly and our goal is to be the first to know about them.

  • Service change detection: Our monitoring agents see when an application starts or stops using a cloud service. These events typically signify a migration from one provider to another and will impact spend, sometimes significantly.

  • Traffic estimation: Traffic is a key ingredient in our spend estimates. We have developed a platform-agnostic view into application traffic. Combined with third-party log and audience measurement data, it gives us the most complete and unbiased view of global traffic and application demand.
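Service change detection, for instance, can be pictured as comparing two snapshots of the services an application uses. This is a minimal sketch assuming each snapshot is just a collection of service names. The function name detect_service_changes and the provider names are hypothetical, and the real monitoring agents are not described in this article.

```python
def detect_service_changes(previous, current):
    """Compare two snapshots of service names for one application.

    Returns the services that appeared and the ones that disappeared
    between the earlier snapshot and the later one.
    """
    prev, curr = set(previous), set(current)
    return {
        "started": sorted(curr - prev),  # in use now, not before
        "stopped": sorted(prev - curr),  # in use before, not now
    }


if __name__ == "__main__":
    last_week = ["Amazon EC2", "Provider A CDN"]
    this_week = ["Amazon EC2", "Provider B CDN"]
    # One service stopped and another started: a likely migration.
    print(detect_service_changes(last_week, this_week))
```

A paired "stopped" and "started" event in the same category is the kind of pattern that would suggest a migration from one provider to another.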

We perform substantial data collection, machine learning, and model refinements to continuously improve the accuracy of our spend estimates. By establishing a consistent method for understanding spend, we're able to provide valuable insight into the cloud and data center ecosystem.

If you’d like to learn more about how we estimate spends, please contact us.
