Kubeflow

24.04.2023 Frederik Möllers
Tech Artificial Intelligence Machine Learning

What is Kubeflow?

When working with machine learning, it is often necessary to use multiple tools to develop and deploy a model. At the same time, depending on the application, a relatively large amount of computing power and special hardware such as GPUs with high memory capacity or even TPUs are required to train the models in an acceptable time.

One way to overcome these challenges is to train the models directly in the cloud, for example with SageMaker from AWS. In addition, there are open-source platforms such as Kubeflow and MLflow, which can be deployed on your own hardware or consumed as a managed service. In this TechUp, we will focus on Kubeflow, which we will use as a service from Civo (https://www.civo.com/machine-learning), currently available in beta.

Kubeflow is an open-source platform for machine learning (ML) built on Kubernetes that brings these tools and resources together in one place, simplifying the entire workflow. It offers a wide range of tools and features that reduce the complexity of the ML workflow and accelerate the development process, from data preparation and training to deploying models in production.

Installation and Deployment

Kubeflow, as the name suggests, can be deployed on Kubernetes clusters. Whether you use a self-managed cluster or cloud offerings such as AWS EKS, Google GKE, or Azure AKS does not matter. Since ML workflows can be very computationally intensive and are often greatly accelerated by GPUs, it is important that the cluster is appropriately sized.

In addition, there are now also providers that offer Kubeflow as a service (such as Civo, which is used in this TechUp), which greatly simplifies the setup and management of Kubeflow.

Advantages of Kubeflow

One of the important features of Kubeflow is its flexibility in tooling, which allows developers to use their preferred machine learning framework and build models in various ways. By integrating TensorFlow, PyTorch, XGBoost, and many other frameworks, Kubeflow offers a wide range of options to meet the needs of different projects.

Furthermore, Kubeflow offers seamless collaboration within teams. Developers can work together on machine learning projects by accessing the same resources and tools. Kubeflow provides unified workflows and tools that enable integration of code and data repositories as well as infrastructure. This facilitates collaboration between teams and increases development efficiency.

Among the most important tools that Kubeflow provides are:

Katib: A framework for hyperparameter optimization that automatically finds the best parameters for a specific model.

KServe: A model-serving component that automates the deployment of machine learning models and simplifies their scaling.

Kubeflow Pipelines: A framework for creating end-to-end machine learning pipelines that enables the integration of various tools, steps, and algorithms.

TensorBoard: A tool for visualizing model training progress and metrics.

Jupyter Notebook: A web-based environment for interactive data analysis that facilitates the development of machine learning models.
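To illustrate what Katib automates, the core idea of hyperparameter search can be sketched as a naive grid search in plain Python. The objective function, parameter names, and search space below are invented stand-ins for a real training run; in Katib, the search space is declared in an Experiment resource and each trial runs as its own workload on the cluster.

```python
from itertools import product

def objective(lr: float, batch_size: int) -> float:
    """Hypothetical validation loss; a real trial would train a model."""
    return (lr - 0.01) ** 2 + abs(batch_size - 64) / 1000

# Invented search space for illustration.
search_space = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

# Exhaustively evaluate every combination and keep the best one --
# exactly the loop that Katib runs for you, in parallel, on the cluster.
best_params, best_loss = None, float("inf")
for lr, batch_size in product(search_space["lr"], search_space["batch_size"]):
    loss = objective(lr, batch_size)
    if loss < best_loss:
        best_params, best_loss = {"lr": lr, "batch_size": batch_size}, loss

print(best_params)  # the combination with the lowest simulated loss
```

Katib also supports smarter strategies than grid search, such as random search and Bayesian optimization, which become important once the search space grows.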

Another important advantage of Kubeflow is the effective management of the resources required for model development and deployment. Certain workloads can be enormously accelerated by GPUs or even TPUs, while others benefit far less. A presentation by Diogo Guerra and Diana Gaponcic from CERN at this year's KubeCon in Amsterdam gave us an interesting insight into how difficult it can be to provision GPUs efficiently in large organizations with many teams and heterogeneous workloads.

Kubeflow Pipelines

Kubeflow Pipelines is the central and most widely used component of Kubeflow, giving developers the ability to define, implement, and manage machine learning workflows. With Kubeflow Pipelines, users can create complex pipelines that prepare data, train models, and deploy them, automating the entire ML workflow.

An example of a Kubeflow pipeline could be a pipeline for predicting house prices. This pipeline would include the following steps:

Data collection: Collecting data from various sources such as real estate portals or publicly available databases.

Data preparation: Data cleaning and preprocessing, e.g., removing missing values or converting text to numerical data.

Model training: Selecting a machine learning model and training and tuning it on the prepared data.

Model evaluation: Evaluating the model on new data to ensure that it is suitable for predictions.

Model deployment: Implementing the model in production so that it can be used for predictions.
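As a rough sketch of how these steps fit together, the workflow can be written as plain Python functions chained in sequence. In Kubeflow Pipelines, each function would become a containerized component with its inputs and outputs tracked by the platform; all data, names, and the trivial "model" here are invented for illustration.

```python
def collect_data() -> list[dict]:
    # Stand-in for scraping real estate portals or querying public databases.
    return [
        {"size_m2": 80, "rooms": 3, "price": 320_000},
        {"size_m2": 120, "rooms": 4, "price": 450_000},
        {"size_m2": None, "rooms": 2, "price": 210_000},
    ]

def prepare_data(rows: list[dict]) -> list[dict]:
    # Data cleaning: drop records with missing values.
    return [r for r in rows if all(v is not None for v in r.values())]

def train_model(rows: list[dict]) -> float:
    # Trivial stand-in model: average price per square meter.
    return sum(r["price"] / r["size_m2"] for r in rows) / len(rows)

def evaluate_model(price_per_m2: float) -> bool:
    # Sanity check before the model is allowed into production.
    return 1_000 < price_per_m2 < 10_000

def deploy_model(price_per_m2: float):
    # Return a prediction function; in Kubeflow, KServe would serve the model.
    return lambda size_m2: size_m2 * price_per_m2

# The pipeline: each call corresponds to one step in the list above.
rows = prepare_data(collect_data())
model = train_model(rows)
assert evaluate_model(model)
predict = deploy_model(model)
print(round(predict(100)))  # predicted price for a 100 m² house
```

The value of Kubeflow Pipelines over such a script is that each step runs in its own container, intermediate artifacts are versioned, and the whole pipeline can be scheduled, rerun, and compared across executions.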

In the next section, we will implement a pipeline as an example to classify images of dogs and cats.

Practical Kubeflow Pipeline Example

This TechUp has been translated automatically by Gemini.