The Development Life Cycle of Machine Learning (Video)

Written by Paul Welch | Sep 9, 2021 2:46:19 PM

The rise in popularity of machine learning (ML) is leading to an influx of newcomers to the technology.

Many of these newcomers are under the assumption that an ML project is fairly straightforward once they have the data and computing resources necessary to begin training models.

This couldn’t be more wrong.

In this discussion, Redapt Senior VP of Product Engineering Paul Welch covers some of the common challenges in adopting ML, as well as industry best practices for addressing technical debt and development lifecycles.

David Self: Welcome everyone to a discussion on the development life cycle of machine learning. My name is David Self and I'm a strategist with A Brave New. In this discussion, Paul Welch, Redapt's Senior Vice President of Product Engineering, and I will be covering lessons learned, technical debt, development life cycles, industry best practices, and more. Welcome, Paul, and thank you for being here.

Paul Welch: Thanks, David. Happy to be here.

David Self: So I know you shared with me that Redapt has been working with customers to deploy machine learning projects both in the cloud and on-premises. What are some of the best practices that drive success in this space?

Paul Welch: Yeah, that's a great question. So I think one of the best practice categories is how you go about developing your machine learning models and really applying the lessons learned over the past several decades of software engineering to this slightly different newer machine learning model development.

David Self: Okay. So we know that snowflake environments build technical debt, and that debt also slows your ability to release new and improved production models. In fact, technical debt is estimated to cost businesses $5 trillion in the next 10 years, but it can be managed. I'm curious, Paul, what are some of the technical debt drivers that you've seen?

Paul Welch: Okay. So technical debt in building machine learning models can often come from one-off processes or dedicated environments where the data science and data engineering team is not sharing how they do things, how they build and develop the model as far as the environments and dependencies they use to build them. So, for example, in experimentation and development, in some cases you have to go back and reverse engineer what you did, and what libraries and what versions of things are compatible with other things. Take that to a larger-scale organization and that technical debt can really slow you down and eat up a lot of your time and budget.
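Editor's note: one small way to avoid reverse engineering an environment later is to record interpreter and library versions alongside each experiment. The following is a minimal sketch, not a description of Redapt's tooling; the function name `snapshot_environment` and the package names passed to it are hypothetical and chosen only for illustration.

```python
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Return interpreter and package versions for the named packages.

    `packages` is a list of distribution names the caller cares about
    (an assumption for this sketch; a real project might list every
    installed distribution instead).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


# Example: capture versions for two commonly used libraries.
snapshot = snapshot_environment(["numpy", "scikit-learn"])
```

Storing a snapshot like this next to the trained model gives the rest of the team a record of which versions were used together.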

David Self: What are some of the ways that Redapt has addressed these concerns for some of their clients?

Paul Welch: Sure. Redapt has a long history of helping customers adopt cloud architectures and deploy applications to the cloud using DevOps and SRE principles, as well as software engineering best practices. And what we like to do is build on the things we're good at. And so, we take what we've been doing for a long time and we apply that to the slightly newer machine learning model development processes.

David Self: So now that we've discussed the hurdles a company can face, let's talk about how to apply these best practices during the development life cycle. Can you take us through the ML development life cycle, Paul?

Paul Welch: Sure. At a high level, developing machine learning models starts with data and the prediction that you want out of the model. So sourcing data, cleaning that data, and making it ready to use in the ML training process is a big part, as well as the feature engineering stuff, where you are really focused on which data attributes are most important to be able to make that prediction. Then the development stage of building the model looks a lot like traditional software engineering, where you're going to write some code that is reading the data, creating the structure of the model, and being able to execute iterations of predictions, which is done over and over in training cycles. That's slightly newer stuff compared to what you do with building traditional software. Then once you're happy with the outcome of the trained model, you can package that and deploy it to a production environment just like you would with other traditional software.
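Editor's note: the stages described above (clean the data, train over repeated cycles, check the outcome) can be sketched in a few lines. This is a toy illustration under stated assumptions: the data is synthetic (y = 2x + 1), there is a single feature, the model is a plain linear fit trained by gradient descent, and the function names are hypothetical. Real projects would typically use an ML framework instead.

```python
def clean(rows):
    """Data preparation: drop records with missing values."""
    return [r for r in rows if r["x"] is not None and r["y"] is not None]


def train(rows, epochs=2000, lr=0.05):
    """Fit y = w * x + b by repeating predict-and-adjust training cycles."""
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for r in rows:
            err = (w * r["x"] + b) - r["y"]  # prediction error for this row
            grad_w += 2 * err * r["x"] / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}


def mean_squared_error(model, rows):
    """Evaluate the trained model before deciding to package it."""
    return sum((model["w"] * r["x"] + model["b"] - r["y"]) ** 2 for r in rows) / len(rows)


# Synthetic data following y = 2x + 1, plus one incomplete record.
raw = [{"x": float(x), "y": 2.0 * x + 1.0} for x in range(5)]
raw.append({"x": None, "y": 3.0})

data = clean(raw)
model = train(data)
error = mean_squared_error(model, data)
```

Once the error is acceptable, the learned values in `model` are what would be packaged and deployed.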

David Self: What is the difference between an ML model versus traditional software?

Paul Welch: Yeah. Good question. So an ML model has some similarities to other software in that there's some code and it's packaged up in a way to be deployed. But an ML model is not a standalone application like many other software apps. An ML model is really made up of the structure of the model, maybe a neural network or maybe some other structure, as well as a set of weights and biases that represent the state of the model. The model structure and the state of the model, which is the output of the training cycles, are the core of the model that goes along with the code. All of that has to be packaged up together. And then with traditional software, you normally package it in a way that's ready to run the entire business logic in production. An ML model is really deployed into a framework or a tool that knows how to deserialize that model and make new predictions in production.
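Editor's note: to make the "structure plus state" idea concrete, here is a minimal sketch of bundling both into a single artifact. The JSON layout, the field names, and the `package_model` function are assumptions made for illustration; actual frameworks each use their own serialization formats.

```python
import json


def package_model(structure, weights, bias):
    """Bundle model structure and learned state into one JSON artifact."""
    artifact = {
        # Structure: what kind of model this is and which inputs it expects.
        "structure": structure,
        # State: the weights and bias produced by the training cycles.
        "state": {"weights": weights, "bias": bias},
    }
    return json.dumps(artifact, sort_keys=True)


# Hypothetical two-input linear model with already-trained values.
artifact_json = package_model(
    structure={"type": "linear", "inputs": ["x1", "x2"]},
    weights=[0.5, -1.0],
    bias=0.25,
)
```

The resulting string contains no business logic of its own; it only becomes useful when a tool that understands the format loads it.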

David Self: Great. Thank you. How do you run an ML model?

Paul Welch: To run an ML model, it depends a lot on which toolset and frameworks you're using. There are different methods, but at a very high level, most of them take the model, that is, the state and structure of the model, and use that to create an instance of that model runtime using the toolset that it was built with.
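Editor's note: the step of turning a stored model back into something that can make predictions might look like the sketch below. It assumes a hypothetical JSON artifact with `structure` and `state` fields and a single supported model type; `LinearRuntime` and `load_model` are illustrative names, not any framework's API.

```python
import json

# A serialized model: structure and state only (hypothetical format).
ARTIFACT = (
    '{"structure": {"type": "linear", "inputs": ["x1", "x2"]}, '
    '"state": {"weights": [0.5, -1.0], "bias": 0.25}}'
)


class LinearRuntime:
    """In-memory instance of a linear model that can serve predictions."""

    def __init__(self, inputs, weights, bias):
        self.inputs = inputs
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        total = sum(w * features[name] for name, w in zip(self.inputs, self.weights))
        return total + self.bias


def load_model(artifact_json):
    """Deserialize the artifact and build the matching runtime instance."""
    artifact = json.loads(artifact_json)
    structure, state = artifact["structure"], artifact["state"]
    if structure["type"] != "linear":
        raise ValueError("unsupported model type: " + structure["type"])
    return LinearRuntime(structure["inputs"], state["weights"], state["bias"])


model = load_model(ARTIFACT)
prediction = model.predict({"x1": 2.0, "x2": 1.0})
```

The loader dispatches on the recorded structure type, which mirrors the point that a model is run by the toolset it was built with.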

David Self: Before we wrap things up, let's dig a little more into model life cycle practices. Model development is not traditional software, but there seem to be more similarities than differences. Can you help clarify?

Paul Welch: Yeah. I would say that model development does have more similarities than differences. I think in the stages of developing code as well as how you package that model and run it and operate it in production, I think that it is very similar to traditional software, and can get some of the same benefits out of using traditional software best practices.

David Self: Great. That's super helpful. Well, thank you so much, Paul, for taking the time out to join us for this discussion. We really appreciate it.

Paul Welch: Thank you, David.

David Self: You can visit redapt.com to learn how to successfully adopt AI and ML capabilities in the cloud or on-premises with ready-to-use solutions tailored for advanced analytics workloads.
