Machine Learning Workloads: On-Premises vs. the Cloud

Written by Paul Welch | May 21, 2020 11:11:57 PM

Unlocking the very real benefits of machine learning (ML) comes at a cost, with high-performance GPUs and storage being at the top of the list of expenses.

While there’s no question experimenting with ML in the cloud is a great way to start, most organizations will eventually face the question of whether using a cloud provider or hosting ML workloads on-premises makes more sense.

The answer to this question often depends on some key business factors, including:

The resources you have available
Where your data currently resides
Your number of data science engineers and whether that number is expected to grow

Most of all, whether you go on-premises or with the cloud depends on the almighty dollar.


That includes how much you’re willing to invest both upfront and on an ongoing basis, the revenue you expect ML to generate, how much your investment will need to grow as you scale, and so on.

To help you better understand the differences in cost when it comes to ML workloads on-premises versus the cloud, let’s look at an estimated investment breakdown for each platform over three years.

Before we get into the numbers, a quick word on how we made our calculations. For both the on-premises and cloud platforms, we:

Compared GPU server instances only
Excluded operating expenses (power, space, cooling, administration, etc.)
Used a target requirement of 8 x Nvidia V100 GPUs, plus high memory and local SSD storage
Compared the lowest-cost option from each public cloud
Compared the public cloud server configuration closest to the hardware spec

With those boundaries in place, we estimate that over a three-year lifecycle, average GPU server instance costs would break down as follows:

Physical Server Hardware (On-Premises): $194,444

vs.

Public Clouds (AWS, Azure, GCP): $324,087 (about 67% more expensive)
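The size of the gap depends on which figure you use as the baseline, so it is worth checking the arithmetic. The short sketch below uses only the two three-year estimates above; the function and variable names are illustrative. It shows that on-premises comes to roughly 60% of the cloud figure, which is the same as the cloud costing roughly 67% more than on-premises.

```python
# Three-year GPU server instance estimates taken from the comparison above (USD).
ON_PREM_3YR = 194_444
CLOUD_3YR = 324_087  # average across AWS, Azure, and GCP


def pct_more(higher: float, lower: float) -> float:
    """Percent by which `higher` exceeds `lower`, using `lower` as the baseline."""
    return (higher - lower) / lower * 100


def pct_of(part: float, whole: float) -> float:
    """`part` expressed as a percentage of `whole`."""
    return part / whole * 100


difference = CLOUD_3YR - ON_PREM_3YR
cloud_premium = pct_more(CLOUD_3YR, ON_PREM_3YR)
on_prem_share = pct_of(ON_PREM_3YR, CLOUD_3YR)

print(f"Difference over three years: ${difference:,}")
print(f"Cloud premium over on-premises: {cloud_premium:.1f}%")
print(f"On-premises as a share of cloud cost: {on_prem_share:.1f}%")
```

Put another way, choosing on-premises saves about 40% relative to the cloud estimate, before any operating expenses are added back in.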

To sum up

While the above numbers are not precise, and the gap between on-premises and the cloud certainly narrows when you add in operational costs, they are at least in the ballpark for what you can expect to pay for GPU server instances over three years.

For organizations looking for options, this calculation should at least give you an idea of the level of investment you’ll be looking at.

Certainly, on-premises seems like the better bargain in our exercise, but as with anything that scales, the growth of your business will dictate the price tag you end up facing.
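To relate these figures to the question of upfront versus ongoing spend, a rough break-even point can be sketched. This is only an illustration under two assumptions that are not part of our estimate: that the on-premises hardware is paid for entirely upfront, and that the cloud total is billed evenly across 36 months. Operating expenses remain excluded, as in the comparison itself.

```python
import math

# Three-year estimates from the comparison above (USD).
ON_PREM_3YR = 194_444
CLOUD_3YR = 324_087
MONTHS = 36

# Assumption: cloud spend is spread evenly over the three years.
cloud_per_month = CLOUD_3YR / MONTHS

# Assumption: on-premises hardware is purchased upfront in month 0.
# The break-even month is the first month in which cumulative cloud
# spend reaches or exceeds the on-premises purchase price.
break_even_month = math.ceil(ON_PREM_3YR / cloud_per_month)

print(f"Assumed cloud spend per month: ${cloud_per_month:,.0f}")
print(f"Cumulative cloud spend passes on-premises cost in month {break_even_month}")
```

Under those assumptions, a workload expected to run for less than about 22 months would cost less in the cloud, while a longer-lived one would favor on-premises hardware. Adding operating costs to the on-premises side would push that point later.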

