My MLOps Adventure: Exploring Software Architecture in MLOps.

Gabriel Simon Tagbor
9 min read · Jun 23, 2024


Introduction

Hey there! Welcome to the first post in my series on MLOps projects. In this article, we’ll explore the concept of software architecture and why it matters in machine learning operations (MLOps) through a hands-on project.

Just as you need a solid blueprint to build a house, you need a well-thought-out software architecture to build reliable and scalable ML solutions. A good software architecture is useful when navigating the complexities of machine learning models and data pipelines.

We’ll explore key patterns and principles for effective design. I will discuss the thought process and steps involved in creating a software architecture diagram for a real-world application. By the end of this post, you should have a better understanding of software architecture and its critical role in the success of MLOps projects. Buckle up, let’s dive right in!

What is Software Architecture?

Let’s think for a second about a building. Before it is constructed, an architect creates a blueprint that shows what the building will look like when it is completed. The blueprint shows the layout of the building: the rooms, the doors, the windows, the stairs, the roof, and so on. It is a high-level view. It does not show the details of how the building will be constructed, but it gives an idea of the finished result.

Photo by Parrish Freeman on Unsplash

Software architecture is similar to a building blueprint. It is a high-level view of a software system. It shows the layout of the software system, the components of the software system, how the components interact with each other, and how the software system will work when it is completed.

However, unlike a building blueprint, a software architecture diagram is not a static document. It is a living document that evolves as the needs of software systems evolve.

Software Architecture Patterns

Software architecture patterns are reusable solutions to common problems that developers face when designing software systems. There are several patterns to consider when developing a software solution. Some of the most popular include:

Monolithic Architecture: Best for simpler applications with tightly coupled components where deployment and scaling requirements are minimal.

Microservices Architecture: Ideal for complex applications requiring independent deployment, scaling, and development of components.

Event-Driven Architecture: Suitable for applications that need to respond to a high number of events in real time.

Service-Oriented Architecture: Useful for enterprise systems where different services need to be integrated and reused across the organization.

Each of these patterns has its unique strengths and weaknesses.

The choice of software architecture pattern depends on the requirements of the software system, the needs of the users, and the constraints of the software development team.

In the coming sections, we will design a software architecture based on a given application specification. We will then present a software architecture diagram that shows how the components of the proposed software system interact with each other.

Buckle up, let’s get started!

The Problem Statement

ValleyJuice operates one of the biggest orange plantations in Ghana. The company is currently experiencing production challenges caused by the frequent and random breakdown of orange harvesters in its fleet. ValleyJuice decided to invest in building a predictive maintenance tool for improved servicing of orange harvesters.

The MLOps team (that’s us) just received our copy of the specification document for the predictive maintenance tool. A close look at the document reveals the following requirements:

  1. Data Streaming
  2. Raw Data Storage
  3. Real-time Data Processing Pipeline
  4. Model Inference Pipeline
  5. Inference Post-Processing
  6. User Interface
  7. Employee Maintenance Support Chatbot

The Deliverable

Present a software architecture diagram that shows how the components of the predictive maintenance software interact with each other.

The Solution

Step 1: Understanding the Requirements

I gathered the ultimate duo of MLOps ninjas: you (the reader) and Ducky (the rubber duck). We read through the specification document and discussed the requirements, taking into account the constraints and the team’s capabilities.

We discovered that we will be building a complex, cloud-native system in which several components must work together to deliver the desired functionality. We also noted that the system will process data in real time and use machine learning models to make predictions.

Step 2: Choose an Appropriate Software Architecture Pattern

Based on our initial understanding of the requirements, we settled on the Microservices architecture pattern for delivering the predictive maintenance software system.

The Microservices architecture is an excellent fit for our needs because it allows us to break down the system into smaller, more manageable components that can be developed, deployed, and scaled independently. This makes development and maintenance easier and enables us to scale the system as user needs evolve.

Why Not Other Patterns?

Monolithic Architecture: While simpler and easier to implement initially, a monolithic architecture can become difficult to manage as the system grows. It is not as flexible or scalable as microservices, making it less suitable for a complex system like ours that requires frequent updates and scaling.

Event-Driven Architecture: This pattern is excellent for systems that need to respond to a high number of events in real time. However, our project’s primary focus is on maintaining a structured flow of data processing and model inference, where the modularity and scalability of microservices provide a better fit.

Service-Oriented Architecture (SOA): SOA is somewhat similar to microservices but typically involves more complex middleware and less flexibility in independent deployment. Given our need for rapid development and deployment cycles, microservices offer a more lightweight and adaptable approach.

Having settled on the appropriate architecture pattern, it’s time to design the software architecture for ValleyJuice.

Step 3: Design the Software Architecture

We started designing the software architecture by identifying the components of the software system. We identified the following components:

1. Real-time Data Processing Pipeline

This component will be responsible for streaming data from the orange harvesters into the software system: it collects the data from the harvesters and sends it to the raw data storage service.

We will use AWS IoT Core together with AWS Kinesis Data Streams to implement the data streaming service. AWS IoT Core will connect the orange harvesters to the cloud, and AWS Kinesis Data Streams will stream their data to the raw data storage service. Additionally, for real-time predictive analytics, the streamed data will be sent to `AWS SageMaker` for processing.
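To make the streaming step a little more concrete, here is a minimal Python sketch of how a single telemetry reading could be shaped for a Kinesis stream. It is an illustration under assumptions, not the ValleyJuice implementation: the stream name, the payload fields, and the helper names are all hypothetical, and in practice an AWS IoT Core rule might forward messages to the stream without any custom code. The `client` argument is expected to behave like a boto3 Kinesis client, whose `put_record` call takes `StreamName`, `Data`, and `PartitionKey`.

```python
import json
from datetime import datetime, timezone

# Hypothetical stream name, used only for illustration.
STREAM_NAME = "harvester-telemetry"


def build_record(harvester_id, readings, ts):
    """Build the keyword arguments for a Kinesis put_record call."""
    payload = {
        "harvester_id": harvester_id,
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
        "readings": readings,
    }
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(payload).encode("utf-8"),
        # Using the harvester ID as the partition key keeps each
        # machine's readings ordered within a single shard.
        "PartitionKey": harvester_id,
    }


def publish(client, harvester_id, readings, ts=None):
    """Send one reading using a Kinesis-like client (for example boto3)."""
    ts = ts or datetime.now(timezone.utc)
    return client.put_record(**build_record(harvester_id, readings, ts))
```

With boto3 installed and credentials configured, `publish(boto3.client("kinesis"), "harvester-007", {"engine_temp_c": 92.5})` would be one way to call it. Passing the client in as an argument also makes the function easy to exercise with a stand-in object during testing.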

2. Raw Data Storage Service

This component will be responsible for storing the raw data collected from the orange harvesters. Its sole purpose is to keep that data in a format that can be easily accessed for periodic model retraining.

We will use AWS S3 to implement the raw data storage service. AWS S3 is a highly scalable, secure, and durable object storage service suited to storing large amounts of data.
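One way to keep raw data "easily accessed for periodic model retraining" is to choose a predictable object key layout up front. The sketch below shows one possible scheme, partitioned by date and harvester. The prefix, the key format, and the function name are my own assumptions for illustration and are not part of the specification.

```python
from datetime import datetime, timezone


def raw_object_key(harvester_id, ts, prefix="raw"):
    """Return a date-partitioned S3 object key for one raw reading.

    Assumes `ts` is a timezone-aware datetime; it is normalized to UTC.
    """
    ts = ts.astimezone(timezone.utc)
    return (
        f"{prefix}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"harvester_id={harvester_id}/{ts:%Y%m%dT%H%M%SZ}.json"
    )
```

A retraining job that only needs a given day's data can then list objects under a prefix such as `raw/year=2024/month=06/day=23/` instead of scanning the whole bucket.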

3. Model Inference Pipeline

This component will be responsible for using the preprocessed data to train the appropriate machine learning model for the predictive maintenance tool and for serving its predictions. Data scientists will use AWS SageMaker to train, deploy, and scale the machine learning model, providing endpoints for making predictions on new data. Using AWS SageMaker allows us to focus on the model development process without worrying about the underlying infrastructure.
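As a rough sketch of what "providing endpoints for making predictions" could look like from the caller's side, the function below sends one feature row to a deployed endpoint. The endpoint name and the feature order are hypothetical, and I am assuming the model container accepts a CSV row and replies with a single number as text, which depends entirely on how the model is packaged. The `runtime_client` argument is expected to behave like boto3's `sagemaker-runtime` client, whose `invoke_endpoint` call returns a response with a readable `Body`.

```python
from typing import Sequence

# Hypothetical endpoint name, used only for illustration.
ENDPOINT_NAME = "harvester-failure-model"


def predict_failure_probability(runtime_client, features: Sequence[float]) -> float:
    """Send one feature row to the endpoint and parse the score it returns.

    Assumes the container accepts text/csv and replies with one number.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=",".join(str(value) for value in features),
    )
    return float(response["Body"].read().decode("utf-8"))
```

In a real deployment, the client would come from `boto3.client("sagemaker-runtime")`.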

Monitoring and Retraining

An essential aspect of our proposed model inference pipeline is the continuous monitoring of model performance. Over time, the accuracy of the model can degrade due to changes in the data, known as data drift. To maintain high performance and accuracy, it is essential to:

Monitor Model Performance: Implement automated tools and dashboards to track key performance metrics such as accuracy, precision, recall, and F1 score. Regularly review these metrics to detect any decline in model performance.

Schedule Retraining: Establish a retraining schedule based on the monitored metrics. For instance, if the model’s accuracy drops below a certain threshold, trigger a retraining process using the latest data. This ensures the model remains up-to-date and effective in making predictions.

By integrating these practices into our model inference pipeline, we can ensure the predictive maintenance tool continues to provide accurate and reliable predictions, adapting to new data and evolving requirements.
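The monitoring and retraining loop described above can be sketched in a few lines of plain Python. This is a simplified illustration: it assumes binary labels (1 means the harvester failed, 0 means it did not), and the 0.90 accuracy threshold is a placeholder I picked for the example, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def compute_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


def should_retrain(metrics, min_accuracy=0.90):
    """Flag retraining when accuracy drops below the chosen threshold."""
    return metrics.accuracy < min_accuracy
```

A scheduled job could call `compute_metrics` on recent predictions once the true outcomes are known, then start a retraining run whenever `should_retrain` returns `True`.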

4. Serverless Backend Service (Inference Post-Processing)

We need a component that orchestrates the data flow between the different components of the software system. The serverless backend service will fill that role and will also retrieve insights from the inference pipeline.

We opted for a serverless backend to reduce the complexity of managing infrastructure and to let the software development team focus on building the software system. We will implement the service with `AWS Lambda`, a serverless computing service that allows developers to run code without provisioning or managing servers.

One potential downside of the serverless approach is that Lambda functions are stateless, which becomes a limitation if the predictive maintenance tool needs to cache data for faster access or manage stateful operations. In such cases, we will need to support the serverless backend with a managed service like AWS DynamoDB to store the data.
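Here is a minimal sketch of what the inference post-processing step could look like as a Lambda function. The `(event, context)` handler signature is the standard one for Python Lambda functions, but everything else is an assumption made for illustration: the event shape, the probability cut-offs, and the recommendation labels are hypothetical.

```python
import json


def recommend(probability):
    """Map a failure probability to a recommendation (hypothetical cut-offs)."""
    if probability >= 0.8:
        return "service_now"
    if probability >= 0.5:
        return "schedule_inspection"
    return "no_action"


def lambda_handler(event, context):
    """Turn raw model scores into maintenance recommendations.

    Assumes an event shaped like:
    {"predictions": [{"harvester_id": "...", "failure_probability": 0.9}]}
    """
    results = [
        {
            "harvester_id": item["harvester_id"],
            "failure_probability": round(item["failure_probability"], 3),
            "recommendation": recommend(item["failure_probability"]),
        }
        for item in event["predictions"]
    ]
    # Put the most at-risk harvesters first so the UI can show them on top.
    results.sort(key=lambda r: r["failure_probability"], reverse=True)
    return {"statusCode": 200, "body": json.dumps(results)}
```

Because the handler keeps no state between invocations, anything that must persist, such as cached results, would live in a store like DynamoDB.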

5. User Interface Service

This component will be responsible for displaying the predictions made by the model inference pipeline service through a user-friendly interface that lets users interact with them.

We will leave the choice of the technology stack for the user interface service to the front-end developers. However, we recommend a microservices architecture for the user interface service so that the front-end developers can develop, deploy, and scale the user interface independently.

6. Employee Maintenance Support Chatbot

Finally, we need a component that provides maintenance support to employees who need help with the orange harvesters. The chatbot will be integrated with the user interface service to provide a seamless experience for those employees.

We will use AWS Lex to implement the employee maintenance support chatbot. AWS Lex is a service for building conversational interfaces into any application using voice and text.
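To give a feel for how the chatbot could answer a question, below is a sketch of a Lambda function that fulfils a Lex intent. Several things here are assumptions on my part: that the bot is built on Lex V2, that fulfilment is delegated to a Lambda function, and that there is a slot called `Symptom`. The slot name and the tips are placeholders, not real maintenance guidance.

```python
# Placeholder content; real tips would come from ValleyJuice's maintenance manuals.
TROUBLESHOOTING_TIPS = {
    "overheating": "Placeholder tip for an overheating harvester.",
    "vibration": "Placeholder tip for unusual vibration.",
}
DEFAULT_MESSAGE = "I could not find a tip for that. Please contact the maintenance team."


def lambda_handler(event, context):
    """Fulfil a Lex V2 intent by looking up a tip for the reported symptom."""
    intent = event["sessionState"]["intent"]
    slot = (intent.get("slots") or {}).get("Symptom") or {}
    symptom = (slot.get("value") or {}).get("interpretedValue", "").lower()
    message = TROUBLESHOOTING_TIPS.get(symptom, DEFAULT_MESSAGE)
    return {
        "sessionState": {
            # "Close" tells Lex that the conversation turn is finished.
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent["name"], "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }
```

The lookup table is deliberately simple. A fuller version might query a knowledge base or the latest predictions for the harvester in question.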

We then created a software architecture diagram: a high-level view that shows the components of the software system and how they interact with each other.

Here is the software architecture diagram we created for the predictive maintenance tool at ValleyJuice:

Software architecture diagram, designed using Draw.io
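A diagram can also be written down as data, which makes it easy to check that every component is reachable. The sketch below encodes my reading of the component descriptions in this post as a directed graph and traces how a reading travels from a harvester to the user interface. The node names and edges are an interpretation for illustration and may not match the diagram arrow for arrow.

```python
from collections import deque

# Each key sends data to the components in its list (an interpretation
# of the component descriptions, not an export of the actual diagram).
ARCHITECTURE = {
    "orange_harvesters": ["iot_core"],
    "iot_core": ["kinesis_data_streams"],
    "kinesis_data_streams": ["s3_raw_storage", "sagemaker_endpoint"],
    "s3_raw_storage": ["sagemaker_training"],
    "sagemaker_training": ["sagemaker_endpoint"],
    "sagemaker_endpoint": ["lambda_backend"],
    "lambda_backend": ["dynamodb_cache", "user_interface"],
    "dynamodb_cache": [],
    "lex_chatbot": ["user_interface"],
    "user_interface": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest route between two components."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route exists
```

Calling `shortest_path(ARCHITECTURE, "orange_harvesters", "user_interface")` walks the real-time route through IoT Core, Kinesis, the SageMaker endpoint, and the Lambda backend. The same function returns `None` for a pair of components with no connection, which is a quick way to spot a missing arrow.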

Reflections

In this post, we explored the essentials of software architecture in MLOps, focusing on understanding requirements and selecting the appropriate design. We then reinforced our understanding by designing a predictive maintenance tool for ValleyJuice.

Working on this project gave me a high-level understanding of the importance of careful planning and architecture design in MLOps.

Key takeaways:

Communication is the Driver of Interdisciplinary Collaboration: Successful MLOps solutions require collaboration between data scientists, engineers, and developers. Learning to communicate effectively using techniques such as diagramming is key to project success.

Having a Holistic View is Helpful: MLOps engineers should consider the entire lifecycle of machine learning models, ensuring all components work together seamlessly.

Prioritize Scalability and Flexibility: Choosing the right architecture, like Microservices, allows for independent scaling and flexibility.

Design for Continuous Improvement: Continuous performance monitoring and automated retraining schedules are crucial for maintaining model accuracy.

This project highlighted the need for thoughtful design and planning in MLOps, reinforcing that careful consideration is key to building robust and scalable systems. I hope you found this post insightful and that you are now better equipped to design software architecture diagrams for your MLOps projects. Stay tuned for more hands-on projects in the MLOps journey series!

Further Reading Resources

Fundamentals of Software Architecture Design

AWS Architecture Center

Google Cloud Architecture Framework

Microservices Patterns

