Introduction
We hear a lot about MLOps for deploying Machine Learning models. Data Scientists face many challenges with Software Engineering and DevOps processes. MLOps addresses these challenges and provides a methodology for scalable deployment by following Software Engineering and DevOps best practices.
We will explore one such MLOps tool, Kubeflow, and get familiar with it.
Machine Learning Project Workflow
A Data Science (DS) project has two main phases:
- Experimentation
- Inference
Based on these phases, a DS project can be broken down into the following steps:
- Data Prep
- Model Training
- Prediction
- Service Management
The number of steps can grow or shrink depending on the specific needs of a DS project.
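To make the hand-off between these steps concrete, here is a toy, framework-free Python sketch. The names (`data_prep`, `model_training`, `prediction`, `ModelService`) are hypothetical, and the "model" is just a least-squares line; the only point is that each step consumes the previous step's output, which is roughly the structure a Kubeflow pipeline later formalizes as a graph of components.

```python
"""Toy sketch of the four DS project steps (hypothetical names, not Kubeflow APIs)."""
from dataclasses import dataclass
from typing import List, Optional, Tuple


def data_prep(raw):
    """Data Prep: drop rows where x or y is missing."""
    return [(x, y) for x, y in raw if x is not None and y is not None]


def model_training(rows) -> Tuple[float, float]:
    """Model Training: fit y = a*x + b with ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def prediction(model: Tuple[float, float], xs) -> List[float]:
    """Prediction: apply the trained (a, b) model to new inputs."""
    a, b = model
    return [a * x + b for x in xs]


@dataclass
class ModelService:
    """Service Management: track which model version is currently serving."""
    version: int = 0
    model: Optional[Tuple[float, float]] = None

    def deploy(self, model: Tuple[float, float]) -> None:
        self.version += 1
        self.model = model

    def predict(self, xs) -> List[float]:
        return prediction(self.model, xs)


if __name__ == "__main__":
    raw = [(1, 2.0), (2, 4.0), (None, 5.0), (3, 6.0)]
    service = ModelService()
    service.deploy(model_training(data_prep(raw)))
    print(service.version, service.predict([4]))  # 1 [8.0]
```

In a real project each function would be a separate, independently deployable step, which is exactly what tools like Kubeflow Pipelines help orchestrate.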
Kubeflow
The Kubeflow project is an open-source MLOps tool dedicated to deploying ML models on Kubernetes in a simple, portable and scalable manner.
You can read more about its architecture in the Kubeflow documentation.
Deploy Kubeflow on Minikube
Minikube is a tool for creating a local Kubernetes cluster on your computer. You can start minikube using the command below:
$ minikube start --cpus 4 --memory 8096 --disk-size=40g
😄 minikube v1.23.2 on Darwin 11.7.2
✨ Using the docker driver based on existing profile
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image ...
🎉 minikube 1.28.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.28.0
💡 To disable this notice, run: 'minikube config set WantUpdateNotification false'
🔄 Restarting existing docker container for "minikube" ...
🐳 Preparing Kubernetes v1.22.2 on Docker 20.10.8 ...
🔎 Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
    ▪ Using image kubernetesui/dashboard:v2.3.1
    ▪ Using image kubernetesui/metrics-scraper:v1.0.7
🌟 Enabled addons: storage-provisioner, default-storageclass, dashboard
❗ /usr/local/bin/kubectl is version 1.24.0, which may have incompatibilites with Kubernetes 1.22.2.
    ▪ Want kubectl v1.22.2? Try 'minikube kubectl -- get pods -A'
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
Check the minikube dashboard using:
$ minikube dashboard --url
🤔 Verifying dashboard health ...
🚀 Launching proxy ...
🤔 Verifying proxy health ...
http://127.0.0.1:60665/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
Once minikube is running, install Kubeflow Pipelines (KFP):
$ export PIPELINE_VERSION=1.8.5
$ kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
$ kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
$ kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"
This will take some time, as it creates a lot of components needed for Kubeflow Pipelines.
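Before opening the UI it helps to know when the installation has actually finished. The Python sketch below is one way to check, assuming `kubectl` is on your PATH and already pointed at the minikube cluster, and that the pods live in the `kubeflow` namespace (as in the port-forward command used for the UI). It shells out to `kubectl get pods -n kubeflow -o json` and inspects the standard Pod status fields; the function names are hypothetical.

```python
"""Wait until every pod in the kubeflow namespace is ready (a sketch, not part of KFP)."""
import json
import subprocess
import time


def pods_ready(pod_list: dict) -> bool:
    """Return True only if at least one pod exists and every pod has either
    Succeeded or is Running with all of its containers reporting ready."""
    items = pod_list.get("items", [])
    if not items:
        return False
    for pod in items:
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # e.g. a one-off job pod that already finished
        if phase != "Running":
            return False
        containers = status.get("containerStatuses", [])
        if not containers or not all(c.get("ready") for c in containers):
            return False
    return True


def wait_for_kubeflow(namespace="kubeflow", timeout_s=1200, interval_s=15) -> bool:
    """Poll kubectl until all pods are ready; return False if timeout_s expires.
    Assumes kubectl is installed and configured (raises if the call fails)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
            capture_output=True, text=True, check=True,
        ).stdout
        if pods_ready(json.loads(out)):
            return True
        time.sleep(interval_s)
    return False

# Usage (requires a running cluster): wait_for_kubeflow()
```

A similar one-liner is `kubectl wait --for=condition=Ready pod --all -n kubeflow --timeout=600s`; unlike the sketch, it does not special-case pods that have already completed.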
You can access the ML Pipelines user interface using port forwarding:
$ kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
Forwarding from 127.0.0.1:8080 -> 3000
Forwarding from [::1]:8080 -> 3000
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Then, open the Kubeflow Pipelines UI at
http://localhost:8080/
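If the page does not load right away, the port-forward may still be starting. As a small convenience, this standard-library Python sketch polls the URL until it answers. It assumes the `kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80` command is running in another terminal; `ui_is_up` is a hypothetical helper name.

```python
"""Poll the port-forwarded Kubeflow Pipelines UI until it responds (a sketch)."""
import time
import urllib.request


def ui_is_up(url="http://localhost:8080/", timeout_s=30.0, interval_s=1.0) -> bool:
    """Return True as soon as `url` answers successfully, or False once
    timeout_s seconds have passed without a successful response."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # urlopen raises on connection errors and on HTTP status >= 400.
            with urllib.request.urlopen(url, timeout=5):
                return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Usage (with the port-forward running): ui_is_up()
```

Catching `OSError` covers refused connections, resets, and timeouts alike, which is what you typically see while the port-forward or the UI pod is still coming up.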
Kubeflow Pipelines comes with some pre-packaged sample pipelines.
Start by clicking the "Data passing in python components" pipeline. As you will notice, it is quite a simple pipeline that runs some Python commands. Create an experiment by clicking "Create an Experiment" in the UI and giving it a name; you should then end up on a page where you can start a run.
You can dig deeper into the pipeline's execution by clicking the links in the screenshot above.
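Kubeflow Pipelines runs each component in its own container, so larger values are typically exchanged as files: one step writes an output file and the system passes its path to the next step (in the v1 Python SDK, this is what the `InputPath`/`OutputPath` annotations in `kfp.components` are for). The sketch below is a plain-Python analogy of that pattern, not KFP code; the function names are hypothetical and a temporary directory stands in for the storage the orchestrator would manage.

```python
"""Plain-Python analogy of file-based data passing between pipeline steps."""
import os
import tempfile


def produce_text(lines, output_path):
    """Producer step: writes its output to a file instead of returning it."""
    with open(output_path, "w") as f:
        for line in lines:
            f.write(line + "\n")


def count_lines(input_path):
    """Consumer step: reads the upstream output from the path it is given."""
    with open(input_path) as f:
        return sum(1 for _ in f)


def run_pipeline(lines=("Hello", "world")):
    """Wire the steps like an orchestrator would: allocate a path for the
    producer's output, then hand that same path to the consumer."""
    with tempfile.TemporaryDirectory() as workdir:
        artifact = os.path.join(workdir, "text.txt")
        produce_text(lines, artifact)
        return count_lines(artifact)


if __name__ == "__main__":
    print(run_pipeline())  # 2
```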
Next Steps
In our next blog post, we will learn how to create, compile and run custom Kubeflow pipelines.
Happy Coding