Big Data: January 2023

Monday, January 9, 2023

Running Machine Learning Pipelines on Vertex AI

Introduction

We can run a machine learning pipeline on Kubernetes locally with Minikube. The same Kubeflow Pipelines (KFP) pipeline can also be compiled and executed on Vertex AI.

Write Kubeflow Pipeline

We will create a basic pipeline and compile it using the KFP SDK.

import os

import kfp
from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import component


@component()
def hello_world(text: str) -> str:
    print(text)
    return text


@dsl.pipeline(name='hello-world', description='A simple intro pipeline')
def pipeline_hello_world(text: str = 'hi there'):
    """Pipeline that passes small pipeline parameter string to consumer op."""

    consume_task = hello_world(
        text=text)  # Passing pipeline parameter as argument to consumer op


if __name__ == "__main__":
    # execute only if run as a script
    print("Compiling Pipeline")
    compiler.Compiler().compile(
        pipeline_func=pipeline_hello_world,
        package_path='hello_world_pipeline.json')

Save the pipeline code to hello_world_pipeline.py. Running the script compiles the pipeline to hello_world_pipeline.json.

$ python hello_world_pipeline.py
Compiling Pipeline
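Before uploading the file, we can take a quick look at what the compiler produced. The snippet below is a minimal sketch using only the standard library. The key names it reads (pipelineSpec, pipelineInfo, root.inputDefinitions.parameters) are assumptions about the layout the KFP 1.8 v2 compiler emits, and summarize_pipeline is a hypothetical helper, not part of the KFP SDK.

```python
import json
import os


def summarize_pipeline(path):
    """Return (pipeline name, sorted parameter names) from a compiled file.

    Assumed layout: the spec sits under "pipelineSpec", or at the top
    level if that key is absent. Missing keys yield None / an empty list.
    """
    with open(path) as f:
        doc = json.load(f)
    spec = doc.get("pipelineSpec", doc)
    name = spec.get("pipelineInfo", {}).get("name")
    params = (
        spec.get("root", {})
        .get("inputDefinitions", {})
        .get("parameters", {})
    )
    return name, sorted(params)


if __name__ == "__main__":
    # Only inspect the file if the compile step has already been run.
    if os.path.exists("hello_world_pipeline.json"):
        print(summarize_pipeline("hello_world_pipeline.json"))
```

If the assumptions hold, the name should match the one given to @dsl.pipeline and the parameter list should contain text.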

Run Pipeline on Vertex AI

We will run the compiled pipeline on Vertex AI Pipelines.

Click on Create Run

Choose Upload File and navigate to the hello_world_pipeline.json path. Choose the default service account (SA) from Advanced Options.

Create a GCS bucket and grant the default SA access to it. Click Submit.

After a few minutes, the pipeline run will finish with the status Succeeded, shown with a green tick symbol ✅

The output will also be available on GCS.

Open or download the file to see the output message.
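The console steps above can also be scripted. The following is a hedged sketch that uses the google-cloud-aiplatform package (installed separately with pip). The project ID, region, bucket name and the pipeline_root folder are placeholders I made up for illustration, and pipeline_job_kwargs and submit are hypothetical helpers, not part of any SDK.

```python
def pipeline_job_kwargs(template_path, bucket, text):
    """Build keyword arguments for aiplatform.PipelineJob.

    The "pipeline_root" folder name inside the bucket is an arbitrary choice.
    """
    return {
        "display_name": "hello-world",
        "template_path": template_path,
        "pipeline_root": f"gs://{bucket}/pipeline_root",
        "parameter_values": {"text": text},
    }


def submit(project, location, bucket):
    """Submit hello_world_pipeline.json to Vertex AI Pipelines."""
    # Imported lazily so this file still loads without the package installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    job = aiplatform.PipelineJob(
        **pipeline_job_kwargs("hello_world_pipeline.json", bucket, "hi there")
    )
    job.run()  # waits for the run to finish


# Example call with placeholder values (needs GCP credentials):
# submit("my-project", "us-central1", "my-bucket")
```

As in the console flow, the service account used for the run needs access to the bucket.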

Please post your queries below.

Happy Coding!

Kubeflow Pipeline Using Python Based Function Components

Introduction

In our last blog we learned about running Kubeflow on Kubernetes in Minikube. In this blog we will learn more about creating Kubeflow pipelines, compiling them, and running them on Minikube.

Kubeflow Platform

The KFP platform consists of:

A UI for managing and tracking pipelines and their execution
An engine for scheduling a pipeline’s execution
An SDK for defining, building, and deploying pipelines in Python

Kubeflow SDK

We will create a virtual environment named kubeflow and install the KFP SDK in it.

$ mkvirtualenv kubeflow
$ pip install kfp

Check that KFP is installed correctly:

$ python -c "import kfp; print('KFP SDK version: {}'.format(kfp.__version__))"
KFP SDK version: 1.8.16
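The code in this post uses func_to_container_op, which belongs to the 1.x SDK, so it can help to check the major version before going further. This is a small sketch under that assumption; major_version and installed_kfp_major are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the major component of a version string such as '1.8.16'."""
    return int(version_string.split(".")[0])


def installed_kfp_major():
    """Return the major version of the installed kfp package."""
    return major_version(version("kfp"))


if __name__ == "__main__":
    try:
        major = installed_kfp_major()
    except PackageNotFoundError:
        print("kfp is not installed in this environment")
    else:
        if major == 1:
            print("kfp 1.x found")
        else:
            print(f"kfp {major}.x found; the examples here target 1.x")
```

If a newer major version is installed, pinning with pip install "kfp<2" inside the virtual environment is one way to match the version shown above.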

Custom Kubeflow Pipeline

We will create a custom KFP pipeline using Python function-based components. Read more about KFP here.

import os
import kfp
from kfp import compiler
from kfp.components import func_to_container_op

@func_to_container_op
def hello_world(text: str) -> str:
    print(text)
    return text

def pipeline_hello_world(text: str = 'hi there'):
    """Pipeline that passes small pipeline parameter string to consumer op."""

    consume_task = hello_world(
        text=text)  # Passing pipeline parameter as argument to consumer op

def compile():
    print("Compiling Pipeline")
    compiler.Compiler().compile(
      pipeline_func=pipeline_hello_world,
      package_path='hello_world_pipeline.yaml')

def run():
    print("Running Pipeline")
    client = kfp.Client(host="http://localhost:8080")
    client.create_run_from_pipeline_func(pipeline_hello_world, arguments={})

def main():
    # compile()
    # run()
    pass

if __name__ == "__main__":
    main()

Save the above code to hello_world_pipeline.py. In this code, the main function can call two functions:

compile - Compiles the pipeline code and creates hello_world_pipeline.yaml
run - Runs the pipeline directly on the local KFP platform at http://localhost:8080/

Un-comment whichever function call you need.
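As an alternative to editing the file each time, the choice could be passed on the command line. This is only a sketch of that idea using argparse. parse_action and dispatch are hypothetical helpers, and the lambdas stand in for the compile() and run() functions defined above.

```python
import argparse


def parse_action(argv):
    """Parse a single positional argument: 'compile' or 'run'."""
    parser = argparse.ArgumentParser(description="Compile or run the pipeline")
    parser.add_argument("action", choices=["compile", "run"])
    return parser.parse_args(argv).action


def dispatch(argv, actions):
    """Call the function registered for the requested action."""
    return actions[parse_action(argv)]()


if __name__ == "__main__":
    # Stand-ins for compile() and run(); in hello_world_pipeline.py the dict
    # would be {"compile": compile, "run": run} and argv would be sys.argv[1:].
    demo_actions = {"compile": lambda: "compiled", "run": lambda: "ran"}
    print(dispatch(["compile"], demo_actions))
```

With this in place the script would be invoked as python hello_world_pipeline.py compile or python hello_world_pipeline.py run.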

Compile Kubeflow Pipeline

I will first un-comment the compile function call.

$ python hello_world_pipeline.py
Compiling Pipeline

This creates hello_world_pipeline.yaml in the current directory. The file can be imported in the KFP UI as follows:

Fill in the Name and Description, choose Upload File, and navigate to the file path.

Once the file is uploaded, click Create.

This will create a new pipeline. To run it, create an experiment by clicking "Create an Experiment" in the UI and give it a name; you should then end up on a page to start a run.

Click Start on the run page.

After a few minutes, the pipeline will complete with a green tick symbol ✅

Pipeline Executed Successfully.
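The experiment and run steps from the UI can also be sketched with the KFP 1.x client. The function below assumes a client created with kfp.Client(host="http://localhost:8080"), as in the script above. start_run is a hypothetical helper, and the experiment and job names in the usage comment are made up.

```python
def start_run(client, package_path, experiment_name, job_name):
    """Create an experiment and start a run of a compiled pipeline package.

    `client` is expected to be a kfp.Client (KFP SDK 1.x).
    Returns the ID of the new run.
    """
    experiment = client.create_experiment(name=experiment_name)
    run = client.run_pipeline(
        experiment_id=experiment.id,
        job_name=job_name,
        pipeline_package_path=package_path,
    )
    return run.id


# Example usage (needs the port-forwarded KFP platform to be reachable):
# import kfp
# client = kfp.Client(host="http://localhost:8080")
# print(start_run(client, "hello_world_pipeline.yaml",
#                 "hello-world-experiment", "hello-world-run"))
```

The run then shows up under the experiment in the KFP UI, the same as one started by hand.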

Running Kubeflow Pipeline

Next, I will un-comment only the run function call. This launches the pipeline directly on the KFP platform.

$ python hello_world_pipeline.py
Running Pipeline

We can check the status in the KFP UI. After a few minutes, the pipeline will complete with a green tick symbol ✅

Pipeline Executed Successfully.
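Instead of watching the UI, the script could wait for the run and report its final status. This is a sketch against the KFP 1.x client, where create_run_from_pipeline_func returns a result object with a wait_for_run_completion method. run_and_wait and the 600 second timeout are my own choices for illustration.

```python
def run_and_wait(client, pipeline_func, timeout_seconds=600):
    """Launch the pipeline and block until it finishes or the timeout expires.

    `client` is expected to be a kfp.Client (KFP SDK 1.x).
    Returns the final status string reported by the platform.
    """
    result = client.create_run_from_pipeline_func(pipeline_func, arguments={})
    detail = result.wait_for_run_completion(timeout=timeout_seconds)
    return detail.run.status


# Example usage inside hello_world_pipeline.py:
# client = kfp.Client(host="http://localhost:8080")
# print(run_and_wait(client, pipeline_hello_world))
```

A status other than a successful one is a cue to open the run in the UI and read the step logs.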

Please post your queries below.

Happy Coding

Running MLOps on Kubernetes (Minikube)

Introduction

We hear a lot about MLOps for deploying machine learning models. Data scientists face many challenges with software engineering and DevOps processes. MLOps addresses these challenges and provides a methodology for scalable deployment that follows software engineering and DevOps best practices.

We will explore one such MLOps tool, Kubeflow, and familiarize ourselves with it.

Machine Learning Project Workflow

A data science project has two main phases:

Experimentation
Inference

During experimentation, data scientists do data prep, exploratory data analysis, feature engineering, model building, training, hyper-parameter tuning, and evaluation. Once the model is ready and meets the acceptance criteria, it is ready for deployment so that inferences can be made in batch or real-time mode.

Based on the above, a DS project can be divided into the following steps:

Data Prep
Model Training
Prediction
Service Management

The number of steps can grow or shrink based on the specific needs of the DS project.

Kubeflow

Kubeflow is an open source MLOps tool dedicated to deploying ML models on Kubernetes in a simple, portable, and scalable manner.

You can read more about the architecture on the Kubeflow site.

Deploy Kubeflow on Minikube

Minikube is a tool for creating a local Kubernetes cluster on your computer. You can start Minikube using the command below:

$ minikube start --cpus 4 --memory 8096 --disk-size=40g
😄  minikube v1.23.2 on Darwin 11.7.2
✨  Using the docker driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🎉  minikube 1.28.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.28.0
💡  To disable this notice, run: 'minikube config set WantUpdateNotification false'

🔄  Restarting existing docker container for "minikube" ...
🐳  Preparing Kubernetes v1.22.2 on Docker 20.10.8 ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
    ▪ Using image kubernetesui/dashboard:v2.3.1
    ▪ Using image kubernetesui/metrics-scraper:v1.0.7
🌟  Enabled addons: storage-provisioner, default-storageclass, dashboard

❗  /usr/local/bin/kubectl is version 1.24.0, which may have incompatibilites with Kubernetes 1.22.2.
    ▪ Want kubectl v1.22.2? Try 'minikube kubectl -- get pods -A'
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

Check minikube dashboard using:

$ minikube dashboard --url
🤔  Verifying dashboard health ...
🚀  Launching proxy ...
🤔  Verifying proxy health ...
http://127.0.0.1:60665/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/

Once Minikube is running, install KFP:

$ export PIPELINE_VERSION=1.8.5
$ kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
$ kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
$ kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

This will take some time, as it creates many components needed for Kubeflow Pipelines.
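To see when the installation has settled, we can list the pods in the kubeflow namespace and look for any that are not ready yet. The sketch below shells out to kubectl get pods with the --no-headers flag and parses the NAME, READY and STATUS columns. not_ready and kubeflow_pods are hypothetical helpers, and treating "Running with all containers ready" or "Completed" as done is my own simplification.

```python
import subprocess


def not_ready(pods_output):
    """Return names of pods that are neither fully ready nor completed.

    Expects the text of `kubectl get pods --no-headers`, where the first
    three columns are NAME, READY (e.g. 1/1) and STATUS.
    """
    pending = []
    for line in pods_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, ready, status = fields[:3]
        have, _, want = ready.partition("/")
        if status == "Completed" or (status == "Running" and have == want):
            continue
        pending.append(name)
    return pending


def kubeflow_pods():
    """Run kubectl and return its output (needs kubectl and a live cluster)."""
    return subprocess.run(
        ["kubectl", "get", "pods", "-n", "kubeflow", "--no-headers"],
        check=True, capture_output=True, text=True,
    ).stdout


# Example usage once Minikube is up:
# print(not_ready(kubeflow_pods()))
```

An empty list suggests the components are up and the port forwarding step can follow.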

You can access ML Pipeline User Interface using port forwarding:

$ kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
Forwarding from 127.0.0.1:8080 -> 3000
Forwarding from [::1]:8080 -> 3000
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080

Then, open the Kubeflow Pipelines UI at http://localhost:8080/

Kubeflow comes with some pre-packaged pipelines.

Start by clicking on the "Data passing in python components" pipeline. As you will notice, it is quite a simple pipeline that runs some Python commands. Create an experiment by clicking "Create an Experiment" in the UI and give it a name; you should then end up on a page to start a run.

After a few minutes, the pipeline will complete with a green tick symbol ✅

You can dig deeper into the pipeline's execution by clicking the links on the run page.
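The pre-packaged pipelines can also be listed from Python once the KFP SDK is installed. This is a minimal sketch against the KFP 1.x client and the port-forwarded address used above. pipeline_names is a hypothetical helper, and it only reads the first page of results.

```python
def pipeline_names(client):
    """Return the names of pipelines on the first page of results.

    `client` is expected to be a kfp.Client (KFP SDK 1.x). The response's
    `pipelines` attribute can be None when nothing is registered.
    """
    response = client.list_pipelines()
    return [p.name for p in (response.pipelines or [])]


# Example usage with the port forward from above still running:
# import kfp
# client = kfp.Client(host="http://localhost:8080")
# for name in pipeline_names(client):
#     print(name)
```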

Next Steps

In our next blog we will learn about creating custom Kubeflow pipelines, and compiling and running them.

Happy Coding