Scaling Horizontally to Accelerate Your ML Efforts

Written by Paul Welch | Jul 20, 2021 8:56:17 PM

When it comes to taking the plunge with machine learning (ML), it’s not uncommon for enterprises to feel skittish about costs.

Beyond the investment in data scientists, the sheer horsepower needed to create and train ML models can mean spending hundreds of thousands of dollars on high-end workstations.

While there are benefits to making that sort of upfront investment, many organizations adopting ML quickly realize that as their models become more complex, the math stops working in their favor.

When that happens, enterprises are faced with two options: scale back their ML efforts or find a way to scale differently.

Divide and conquer

One of the most effective ways to rein in ML costs is to scale horizontally rather than vertically.

In other words, instead of investing in increasingly powerful workstations, architect a solution that allows you to spread ML workloads across a battery of servers running in parallel.

This approach can provide benefits beyond simple hardware costs, including:

Resource-intensive ML workloads can be completed much more quickly
Storage and compute can be added and reduced as needed
The barrier to entry for ML can be easier to break through
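To make the divide-and-conquer idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production design: worker threads stand in for separate servers, and `score_shard` is a hypothetical placeholder for whatever expensive step your ML workload performs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(items, num_workers):
    """Split items into num_workers roughly equal, contiguous shards."""
    base, extra = divmod(len(items), num_workers)
    shards, start = [], 0
    for i in range(num_workers):
        # The first `extra` shards each take one additional item.
        size = base + (1 if i < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards


def score_shard(shard):
    """Hypothetical stand-in for an expensive per-record ML step."""
    return [x * x for x in shard]


def run_horizontally(items, num_workers):
    """Process each shard on its own worker and merge results in input order."""
    shards = split_into_shards(items, num_workers)
    # Assumption: threads stand in for servers; in a real cluster each shard
    # would be sent to a separate machine.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(score_shard, shards))  # map preserves shard order
    return [value for shard_result in results for value in shard_result]
```

Adding capacity in this picture means raising `num_workers` rather than buying a bigger single machine, which is the essence of scaling horizontally.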

Additionally, horizontal scaling can help you avoid one of the most common pitfalls when it comes to taking ML models into production: the disconnect between data science and IT.

This disconnect, which can undermine even the most sophisticated ML operations, occurs when a model created on a dedicated workstation beneath a data scientist’s desk is handed over to IT. IT is then forced to reverse-engineer how the model was created in order to run it in production on the organization’s infrastructure.

Not only does this waste IT’s time and resources, it can keep ML models from ever seeing the light of day once they leave the realm of data science.

In contrast to this scenario, taking a horizontal scaling approach essentially allows IT to be a part of the ML process from the start, since the workloads are spread out among clusters in the cloud managed by IT operations.

Controlling clusters

With a cluster approach, you’re also able to pool servers for specific steps in the ML modeling process.

This means you can dedicate one set of servers to experimentation and development work and another set to your production workloads, with all of them running in parallel. In essence, it is a DevOps approach to ML.
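As a rough illustration of pooling servers by stage, the sketch below routes jobs to separate development and production pools. Everything in it is hypothetical (the `ServerPool` class, the `assign` helper, the pool names, and the capacities), and it is not tied to any particular cluster manager.

```python
from dataclasses import dataclass, field


@dataclass
class ServerPool:
    """A named group of servers with a fixed number of job slots (hypothetical)."""
    name: str
    capacity: int
    jobs: list = field(default_factory=list)

    def has_room(self):
        return len(self.jobs) < self.capacity


def assign(job_name, stage, pools):
    """Place a job on the pool reserved for its stage, or fail if that pool is full."""
    pool = pools[stage]
    if not pool.has_room():
        raise RuntimeError(f"pool '{pool.name}' is at capacity")
    pool.jobs.append(job_name)
    return pool.name


# Example pools: the names and capacities are assumptions for illustration.
pools = {
    "dev": ServerPool(name="experimentation", capacity=2),
    "prod": ServerPool(name="production", capacity=4),
}
```

Because each stage draws on its own pool, a burst of experiments can fill the development slots without taking capacity away from production jobs.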

The key to the cluster approach is effectively managing the pools of servers being used. That’s where a tool like SUSE Rancher comes in.

With SUSE Rancher, you’re able to manage all the servers employed for ML workloads from a single place, with consistent governance and data security across clusters.

SUSE Rancher is so effective at managing large numbers of clusters that we’ve used it as the foundation for our ML Accelerator, which is designed to jump-start enterprise ML adoption.

ML on your terms

Regardless of where your organization currently stands with ML, scaling horizontally can be an effective way to accelerate the path of your models from creation to production.

To learn more about how you can accelerate your success with ML, AI, and other advanced analytics solutions, check out our free guide. You can also watch our webinar on the Redapt ML Accelerator.
