

Setting Up K8s + GPU Sharing on CentOS 7

Configuration

Master: 4 OCPU (8 vCPU), 16 GB RAM, 100 GB disk

GPU Node: BM.GPU.A10.4 (a bare-metal server, not a VM), 100 GB disk

Setting Up K8s

Step 1. Grow the disk to 100G
sudo yum -y install cloud-utils-growpart gdisk
sudo growpart /dev/sda 3
sudo xfs_growfs /dev/sda3
df -h

Do not run the second command from FinalShell, otherwise it can fail due to a Chinese-encoding problem.

Step 2. Disable swap, SELinux, and the firewall

The commands below are intended for a test environment; apply them with caution in production.

sudo swapoff -a
sudo sed -i '/swap/s/^\(.*\)$/#\1/g' /etc/fstab
sudo setenforce 0 

sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
sudo systemctl disable firewalld
sudo systemctl stop firewalld
Step 3. Enable IP forwarding
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system

lsmod | grep br_netfilter
lsmod | grep overlay

sudo sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
Step 4. Install the container runtime
sudo yum install -y yum-utils
sudo yum-config-manager  --add-repo  https://download.docker.com/linux/centos/docker-ce.repo
#yum list docker-ce --showduplicates | sort -r
sudo yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
sudo systemctl start docker
sudo systemctl enable docker
#sudo docker run hello-world
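
Step 6 later configures the kubelet with cgroupDriver: systemd, so it can be worth checking which cgroup driver Docker reports here; a mismatch between the kubelet and the runtime is a common cause of kubeadm init failures. A quick check:

sudo docker info --format '{{.CgroupDriver}}'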

The containerd + docker-ce combination still had some issues on the Master node, so the Master was switched to the cri-dockerd runtime instead. The GPU node does not need to run the commands below.

wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.0/cri-dockerd-0.3.0-3.el7.x86_64.rpm
sudo rpm -ivh cri-dockerd-0.3.0-3.el7.x86_64.rpm

sudo systemctl start cri-docker
sudo systemctl enable cri-docker

ll /var/run/cri-dockerd.sock

#关闭Containerd运行时
sudo systemctl disable containerd
sudo systemctl stop containerd
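
With containerd disabled there is only one CRI socket left, so kubeadm should pick up cri-dockerd automatically. If it ever complains about multiple CRI endpoints, the socket can be passed explicitly to kubeadm init / kubeadm join with this flag (available in 1.24):

--cri-socket unix:///var/run/cri-dockerd.sock
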
Step 5. Install the Kubernetes tools

The version is pinned to 1.24.3.

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

sudo yum install -y kubelet-1.24.3 kubeadm-1.24.3 kubectl-1.24.3 --disableexcludes=kubernetes

sudo systemctl enable --now kubelet
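
A quick confirmation that the pinned packages were installed (all of them should report 1.24.3):

kubeadm version -o short
kubelet --version
kubectl version --client
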
Step 6. Create the K8s cluster
cat <<EOF | sudo tee kubeadm-config.yaml
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.24.3
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
EOF

sudo kubeadm init --config kubeadm-config.yaml

Note down the output below; it will be needed later.

(screenshot: kubeadm init output containing the kubeadm join command)
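
If the join command (token plus CA cert hash) from the output above is lost, it can be regenerated on the Master at any time:

sudo kubeadm token create --print-join-command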

Continue:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl get node -o wide

(screenshot: kubectl get node output)

Step 7. GPU Node

Create a GPU node with the BM.GPU.A10.4 shape and run Steps 1 - 5 on it.

Step 8. Install the NVIDIA driver
sudo yum update -y
sudo yum install -y gcc kernel-devel
wget https://us.download.nvidia.com/tesla/515.65.01/NVIDIA-Linux-x86_64-515.65.01.run
chmod +x ./NVIDIA-Linux-x86_64-515.65.01.run
# Disable nouveau
sudo vim /etc/default/grub
# add modprobe.blacklist=nouveau to the kernel parameters (GRUB_CMDLINE_LINUX)
sudo grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

sudo ./NVIDIA-Linux-x86_64-515.65.01.run
# keep pressing Enter to accept the defaults
nvidia-smi

(screenshot: nvidia-smi output)
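
As an alternative to editing /etc/default/grub by hand in the nouveau step above, grubby (shipped with CentOS 7) can append the same kernel argument; a sketch, verify with cat /proc/cmdline after the reboot:

sudo grubby --update-kernel=ALL --args="modprobe.blacklist=nouveau"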

Step 9. Install the NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
   
sudo yum-config-manager --enable libnvidia-container-experimental
sudo yum clean expire-cache
sudo yum install -y nvidia-docker2
sudo systemctl restart docker
#sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

Configure containerd to use the NVIDIA runtime as the default:

cat <<EOF | sudo tee /etc/containerd/config.toml
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".registry]
       config_path = "/etc/containerd/certs.d"
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
EOF

# runtime_type = "io.containerd.runtime.v1.linux"

sudo systemctl restart containerd
sudo systemctl enable containerd
sudo systemctl enable kubelet
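
A quick sanity check that containerd picked up nvidia as the default runtime (output formatting may differ between containerd versions):

sudo containerd config dump | grep -A 3 'runtimes.nvidia'
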
Step 10. Join the GPU node to the cluster

Run the join command on the GPU node:

sudo kubeadm join 10.0.10.116:6443 --token sg9835.ouqey7wquuc6kugb \
        --discovery-token-ca-cert-hash sha256:0ae5c99780ba7c41861b4d032a4c462e873f4500cf9ef1dfdcb64b202548570e

(screenshot: kubeadm join output)

Check from the Master:

kubectl get node -o wide

(screenshot: kubectl get node showing both nodes)

Step 11. Install the container network (CNI) plugin

The Pod network CIDR was not set during kubeadm init, so add it now:

sudo vim /etc/kubernetes/manifests/kube-controller-manager.yaml
--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16

(screenshot: kube-controller-manager.yaml after the edit)

The CNI plugin handles Pod-to-Pod traffic across nodes. Run on the Master:

sudo systemctl restart kubelet

curl -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f  kube-flannel.yml

# wait a moment
kubectl get pod -A -o wide

(screenshot: kube-flannel pods running)
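
Once flannel is up, it is worth confirming that every node was assigned a Pod CIDR from 10.244.0.0/16 (a jsonpath sketch):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'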

Alibaba GPU Sharing Tools

Step 1. Install the scheduler extender
cd /etc/kubernetes/
sudo curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.yaml


kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
cd /etc/kubernetes/

# back up the original
sudo cp manifests/kube-scheduler.yaml ./kube-scheduler.yaml.bak

cat <<EOF | sudo tee manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --config=/etc/kubernetes/scheduler-policy-config.yaml
    image: k8s.gcr.io/kube-scheduler:v1.24.3
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler-policy-config.yaml
      name: scheduler-policy-config
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/scheduler-policy-config.yaml
      type: FileOrCreate
    name: scheduler-policy-config
status: {}
EOF

The diff against the original file:

(screenshot: diff of kube-scheduler.yaml)

The kube-scheduler restarts automatically after its manifest is modified.
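
To confirm the restart picked up the new configuration, check that the scheduler Pod is running again and that the extender service exists (names assume the defaults in the upstream gpushare-schd-extender.yaml):

kubectl -n kube-system get pod -l component=kube-scheduler
kubectl -n kube-system get svc | grep gpushare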

Step 2. Deploy the Device Plugin
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml

Label the GPU node; the Master is also given the master role label here:

kubectl get node
kubectl label node gpu-node gpushare=true
kubectl label node master-28170 node-role.kubernetes.io/master=""
kubectl get node --show-labels

(screenshot: node labels)

Now check whether GPU share resources show up on the GPU node:

kubectl describe node gpu-node

(screenshot: aliyun.com/gpu-mem resources on the GPU node)
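
The same information can also be read directly from the node object; the device plugin advertises the extended resource aliyun.com/gpu-mem (and, depending on the version, aliyun.com/gpu-count). The node name gpu-node matches the label command above:

kubectl get node gpu-node -o jsonpath='{.status.allocatable}'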

If the Alibaba Cloud image cannot be pulled from outside China, swap in another registry address. The ocir address below is my own Oracle Cloud Infrastructure Registry, where I uploaded k8s-gpushare-plugin:v2-1.11-aff8a23.

wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
sudo sed -i 's/registry.cn-hangzhou.aliyuncs.com\/acs/nrt.ocir.io\/sehubjapacprod/g' device-plugin-ds.yaml
kubectl apply -f device-plugin-ds.yaml
Step 3. Install the kubectl extension
cd /usr/bin/
sudo wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
sudo chmod u+x /usr/bin/kubectl-inspect-gpushare
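
kubectl discovers any executable named kubectl-<name> on the PATH as a plugin, so the download can be verified with:

kubectl plugin list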

# the tool has permission restrictions, so run it as root
sudo su
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl inspect gpushare

su opc
Step 4. Test
vim gpu-share-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 2
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-1

  template: # define the pod specification
    metadata:
      labels:
        app: binpack-1

    spec:
      containers:
      - name: binpack-1
        image: cheyang/gpu-player:v2
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            # GiB
            aliyun.com/gpu-mem: 2
kubectl apply -f gpu-share-test.yaml
sudo kubectl inspect gpushare

(screenshot: kubectl inspect gpushare output after the deployment)

(screenshot: GPU memory allocation of the binpack-1 pods)
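
Besides the inspect output, the Pods themselves are a quick cross-check: the gpu-player test image should log how much GPU memory it was given (the label app=binpack-1 comes from the Deployment above):

kubectl get pod -l app=binpack-1 -o wide
kubectl logs -l app=binpack-1 --tail=20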