Install Kubeflow Plugins
This page describes how to deploy Kubeflow-related plugins in Alauda AI 2.0 and later.
Supported plugins:
- kfbase: Kubeflow base components, including authentication and authorization, the central dashboard, Notebooks, PVC Viewer, TensorBoards, Volumes, Model Registry UI, KServe Endpoints UI, and the Model Catalog API service.
- model-registry-operator: Kubeflow Model Registry Operator.
- kfp: Kubeflow Pipelines.
- kftraining: Kubeflow Training Operator. This plugin is deprecated.
- kubeflow-trainer: Kubeflow Trainer v2 for training job management. This plugin replaces kftraining.
Environment Preparation
Before you begin, make sure the following prerequisites are met:
- An ACP environment is available and running.
- Alauda AI is already deployed. Alauda AI 2.0 or later is required.
- Alauda Build of KServe is installed.
- ASM is deployed in the business cluster where Kubeflow will run. If ASM is not already installed, deploy it before continuing. ASM v1 is deprecated. Use ASM v2 whenever possible.
- The LWS plugin, Alauda Build of LeaderWorkerSet, is installed if you plan to deploy kubeflow-trainer.
- The oauth2-proxy plugin is configured as described below.
Configure Dex Redirection
Note: Configure the platform access URL for Dex redirection before installing the kfbase plugin. This step may update the platform CA certificate. If the certificate changes after you configure oauth2-proxy, the oauth2-proxy configuration may fail.
In Administrator > System Settings > Platform Parameters, click Edit next to Platform Access URLs and add a redirect URL in the format https://<your-kubeflow-domain>, for example https://kubeflow.example.com.
<your-kubeflow-domain> must match the kubeflowDomain value configured for the kfbase plugin.
Configure the oauth2-proxy Plugin
Get the platform Dex CA certificate for use later in the Global cluster:
Configure ASM v1 (Deprecated)
In the global cluster, or in ACP Platform Management > Resource Management, update the ServiceMesh resource and add the following content under spec.
Note: If spec.values.pilot.jwksResolverExtraRootCA is already configured, update only spec.meshConfig.extensionProviders. Add new entries without deleting the existing ones.
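The two fields named in the note above can be laid out roughly as in the following sketch. It only writes a YAML fragment to a local file for review and applies nothing to the cluster. The provider name, the oauth2-proxy service address and port, and the header lists are placeholders and assumptions, not values taken from this page. The envoyExtAuthzHttp provider type is the standard Istio external-authorization provider and is assumed here because oauth2-proxy handles authentication.

```shell
#!/usr/bin/env bash
# Sketch only: writes a ServiceMesh spec fragment to a local file for review.
# Every <...> value is a placeholder (assumption); replace it before merging
# the fragment into the ServiceMesh resource under spec.
cat > servicemesh-spec-fragment.yaml <<'EOF'
spec:
  meshConfig:
    extensionProviders:
      # Append this entry; keep any providers that already exist.
      - name: oauth2-proxy   # assumed provider name
        envoyExtAuthzHttp:
          service: <oauth2-proxy-service>.<namespace>.svc.cluster.local
          port: <oauth2-proxy-port>
          includeRequestHeadersInCheck:
            - authorization
            - cookie
          headersToUpstreamOnAllow:
            - authorization
          headersToDownstreamOnDeny:
            - content-type
            - set-cookie
  values:
    pilot:
      # Leave this key untouched if it is already configured.
      jwksResolverExtraRootCA: |
        <PEM-encoded platform Dex CA certificate>
EOF
echo "Wrote servicemesh-spec-fragment.yaml"
```

The fragment is meant to be merged by hand into the existing resource, so that entries already present under extensionProviders are preserved.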
Configure ASM v2
Note: If any ASM v1 webhooks are still present, delete them first. Otherwise Kubeflow authentication may fail.
In ACP, go to Administrator > MarketPlace > OperatorHub, find Alauda Service Mesh v2, open the All Instances tab, locate the instance of type Istio such as default, click Update, and add the following content under spec:
Component Onboarding
Download the installation packages for the following plugins and upload them with violet:
- kfbase: Kubeflow base functionality.
- model-registry-operator: Kubeflow Model Registry Operator.
- kfp: Kubeflow Pipelines.
- kftraining: Kubeflow Training Operator. This plugin is deprecated.
- kubeflow-trainer: Kubeflow Trainer v2. This plugin replaces kftraining.
Note: If you want to enable Volcano scheduler support for kftraining, deploy Volcano before installing kftraining.
Deployment Steps
1. Deploy kfbase (Kubeflow Base)
In Cluster Plugins, find the kfbase plugin, complete the configuration on the page, and wait for the deployment to finish.
After deployment:
- In Administrator > System Settings > Platform Parameters, verify that Platform Access URLs contains an address in the format https://<your-kubeflow-domain>, where <your-kubeflow-domain> is the kubeflowDomain configured for the kfbase plugin.
- Configure DNS resolution, or add a local hosts entry, so that <your-kubeflow-domain> resolves to the IP address assigned to the kubeflow-external-gateway gateway. To find the address, run kubectl -n istio-system get gateway kubeflow-external-gateway.
After deployment, the Kubeflow entry appears under Tools in Alauda AI.
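For the DNS or hosts entry described in the post-deployment checks above, a local hosts line can be produced with a small helper such as the one sketched here. The helper name hosts_entry and the sample address 192.0.2.10 are illustrative. The commented kubectl call assumes that kubeflow-external-gateway reports its address in .status.addresses, which this page does not confirm. If that field is empty in your environment, read the address from the plain command output instead.

```shell
#!/usr/bin/env bash
# Sketch: format an /etc/hosts line for the Kubeflow domain.
# hosts_entry is a hypothetical helper, not part of any plugin.
hosts_entry() {
  local ip="$1" domain="$2"
  printf '%s %s\n' "$ip" "$domain"
}

# Assumption: the gateway exposes its address in .status.addresses.
# Uncomment on a machine with cluster access:
# ip="$(kubectl -n istio-system get gateway kubeflow-external-gateway \
#       -o jsonpath='{.status.addresses[0].value}')"
ip="192.0.2.10"   # documentation-only sample address

# Prints the line to append to the local hosts file.
hosts_entry "$ip" "kubeflow.example.com"
```

Replace kubeflow.example.com with the kubeflowDomain value configured for the kfbase plugin.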
For upgrade-specific actions, see Upgrade Kubeflow Plugins.
2. Create a Kubeflow User Namespace and Bind a User
Before a user signs in to Kubeflow for the first time, bind the ACP user to a namespace. The following example creates namespace kubeflow-admin-cpaas-io and assigns admin@cpaas.io as the owner.
Note: If this Profile resource was already created during Alauda AI deployment, you can skip this step.
Note: You may need to lower the Pod Security Admission level of the user namespace before creating Notebook instances and similar workloads.
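A Profile for the example namespace and owner could look like the following sketch, which uses the upstream Kubeflow Profile API (kubeflow.org/v1). The script only writes the manifest to a local file named profile.yaml, a file name chosen here for illustration. Fields beyond the owner, such as resource quotas, are omitted because this page does not describe them.

```shell
#!/usr/bin/env bash
# Sketch: write a Kubeflow Profile manifest for the example namespace and owner.
cat > profile.yaml <<'EOF'
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: kubeflow-admin-cpaas-io   # the Profile name is also the namespace name
spec:
  owner:
    kind: User
    name: admin@cpaas.io          # ACP user that owns the namespace
EOF
echo "Review profile.yaml, then apply it with: kubectl apply -f profile.yaml"
```

If Notebook Pods are then rejected by Pod Security Admission, the enforce level can be lowered with the standard namespace label, for example kubectl label namespace kubeflow-admin-cpaas-io pod-security.kubernetes.io/enforce=baseline --overwrite. The level baseline is only an example. The level your workloads need is not specified on this page.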
3. Bind a User to an Existing Namespace
If Alauda AI was already deployed and the namespace kubeflow-admin-cpaas-io already exists, the Profile may also already exist. If the namespace still does not appear in Kubeflow, create the following resources to bind the account to the namespace:
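In upstream Kubeflow, a user is bound to an existing profile namespace with a RoleBinding plus an Istio AuthorizationPolicy, and the sketch below follows that pattern. Treat it as an assumption about what applies here: the resource names, the kubeflow-userid header, and the plain email used as the header value are taken from upstream defaults and may differ in your environment. Use kubeflow-edit with role edit instead of kubeflow-admin if the user should be a contributor rather than an administrator.

```shell
#!/usr/bin/env bash
# Sketch: write a RoleBinding and an Istio AuthorizationPolicy that bind
# admin@cpaas.io to the existing namespace kubeflow-admin-cpaas-io.
# Resource names and the identity header are assumptions from upstream Kubeflow.
cat > user-binding.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: user-admin-cpaas-io-clusterrole-admin   # assumed name
  namespace: kubeflow-admin-cpaas-io
  annotations:
    role: admin
    user: admin@cpaas.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: admin@cpaas.io
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: user-admin-cpaas-io-clusterrole-admin   # assumed name
  namespace: kubeflow-admin-cpaas-io
  annotations:
    role: admin
    user: admin@cpaas.io
spec:
  action: ALLOW
  rules:
    - when:
        - key: request.headers[kubeflow-userid]   # assumed identity header
          values:
            - admin@cpaas.io
EOF
echo "Review user-binding.yaml, then apply it with: kubectl apply -f user-binding.yaml"
```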
4. Deploy kfp and kftraining (Deprecated)
In Cluster Plugins, find kfp and kftraining and deploy them as needed.
Note: After kfp is deployed, pipeline-related features become available in the Kubeflow UI.
Note: kftraining is a background controller. It does not appear as a menu item in the Kubeflow UI.
5. Deploy Kubeflow Model Registry
In Administrator > MarketPlace > OperatorHub, find Model Registry Operator and click Install.
After the operator is installed, open the All Instances tab and create a ModelRegistry instance in the user's namespace.
Note: Create the instance in a namespace that is already bound to a Kubeflow Profile. Otherwise the Model Registry UI is not displayed.
When creating the instance, configure the following fields as needed:
- Name: Name of the Model Registry instance.
- Namespace: Namespace where the instance will run. This must be a namespace that is already bound to a Kubeflow Profile.
- MySQL Storage Class: Storage class used for Model Registry metadata, for example standard.
- MySQL Storage Size: Storage size for the metadata database. The default is 10Gi.
- DisplayName: Display name of the Model Registry instance.
- Description: Short description of the instance.
Note: After the instance starts, refresh the Model Registry entry in the Kubeflow left navigation to see the new instance. Before the first instance is created, the Model Registry page is empty.
Note: The Model Registry instance restricts network requests from other namespaces. To allow additional namespaces, edit the authorizationpolicy for the instance, for example kubectl -n <your-namespace> edit authorizationpolicy <model-registry-name>, and update the policy according to the Istio documentation.
Note: You can deploy multiple Model Registry instances in different namespaces. Each instance is independent.
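For the cross-namespace access note above, Istio expresses "allow requests from this namespace" as a source.namespaces match in an AuthorizationPolicy rule. The sketch below writes such a rule to a local file so it can be copied into the policy while editing. The existing contents of the policy are not shown on this page, so where the rule fits under spec.rules is an assumption, and <additional-namespace> is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: write an Istio rule fragment that admits one extra namespace.
# Add it as a new item under spec.rules of the instance's AuthorizationPolicy.
# <additional-namespace> is a placeholder; the file name is illustrative.
cat > allow-namespace-rule.yaml <<'EOF'
- from:
    - source:
        namespaces:
          - <additional-namespace>
EOF
echo "Wrote allow-namespace-rule.yaml"
```

Istio derives the source namespace from the peer identity, so this match relies on mutual TLS being enabled between the workloads.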
6. Deploy kubeflow-trainer (Kubeflow Trainer v2)
Note: If kftraining is already deployed, uninstall it before deploying kubeflow-trainer.
Note: Install the LWS plugin before deploying kubeflow-trainer, because LWS is a dependency of kubeflow-trainer.
Note: Kubeflow Trainer v2 requires Kubernetes 1.32.3 or later. Older Kubernetes versions may lead to unexpected behavior.
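The version requirement in the note above can be checked before installation with a comparison like the one sketched here. The helper name version_ge is hypothetical, and the comparison relies on the version sort of GNU sort (sort -V). The script takes the cluster version as its first argument and falls back to a sample value, since how you obtain the version (for example from the Server Version line of kubectl version) depends on your environment.

```shell
#!/usr/bin/env bash
# Sketch: compare a Kubernetes version against the 1.32.3 minimum.
# version_ge is a hypothetical helper; it relies on GNU `sort -V`.
version_ge() {
  # Succeeds when $1 >= $2 in version order.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.32.3"
current="${1:-1.32.3}"   # sample default; pass the real server version instead
current="${current#v}"   # drop a leading "v", as in v1.32.3

if version_ge "$current" "$required"; then
  echo "Kubernetes $current meets the $required minimum"
else
  echo "Kubernetes $current is older than $required"
fi
```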
In Cluster Plugins, find kubeflow-trainer, click Install, choose whether to enable JobSet, and complete the installation.