Setting Up K8s + GPU Sharing on CentOS 7
Configuration
Master: 4 OCPU (8 vCPU), 16 GB RAM, 100 GB disk
GPU Node: BM.GPU.A10.4 (a bare-metal shape, not a VM), 100 GB disk
Setting Up K8s
Step 1. Grow the disk to 100 GB
sudo yum -y install cloud-utils-growpart gdisk
sudo growpart /dev/sda 3
sudo xfs_growfs /dev/sda3
df -h
Do not run the second command from FinalShell; it fails there because of a Chinese-locale issue.
Step 2. Disable swap, SELinux, and the firewall
The commands below are meant for a test environment; use them with caution in production.
sudo swapoff -a
sudo sed -i '/swap/s/^\(.*\)$/#\1/g' /etc/fstab
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
sudo systemctl disable firewalld
sudo systemctl stop firewalld
Step 3. Enable bridged traffic and IP forwarding
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
lsmod | grep br_netfilter
lsmod | grep overlay
sudo sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
Step 4. Install the container runtime
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
#yum list docker-ce --showduplicates | sort -r
sudo yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
sudo systemctl start docker
sudo systemctl enable docker
#sudo docker run hello-world
The containerd + docker-ce combination on the Master still has configuration problems, so the Master is switched to the cri-dockerd runtime instead. The GPU node does not need the steps below.
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.0/cri-dockerd-0.3.0-3.el7.x86_64.rpm
sudo rpm -ivh cri-dockerd-0.3.0-3.el7.x86_64.rpm
sudo systemctl start cri-docker
sudo systemctl enable cri-docker
ll /var/run/cri-dockerd.sock
# Disable the containerd runtime
sudo systemctl disable containerd
sudo systemctl stop containerd
Step 5. Install the K8s tools
Pin the version to 1.24.3.
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
sudo yum install -y kubelet-1.24.3 kubeadm-1.24.3 kubectl-1.24.3 --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
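Before creating the cluster, it is worth confirming that all three packages are actually pinned to 1.24.3:
rpm -q kubelet kubeadm kubectl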
Step 6. Create the K8s cluster
cat <<EOF | sudo tee kubeadm-config.yaml
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.24.3
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
EOF
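One caveat: this Master uses cri-dockerd, but the containerd socket may still be present on disk, and kubeadm 1.24 refuses to pick a CRI endpoint automatically when it finds more than one. If init fails with that error, the socket can be pinned by appending an InitConfiguration section to the same kubeadm-config.yaml (a sketch, not part of the original config):
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock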
sudo kubeadm init --config kubeadm-config.yaml
Write down the kubeadm join command printed at the end of the output; it will be needed later.
Then continue:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl get node -o wide
Step 7. GPU Node
Create a GPU node of shape BM.GPU.A10.4 and run Steps 1 through 5 on it.
Step 8. Install the NVIDIA driver
sudo yum update -y
sudo yum install -y gcc kernel-devel
wget https://us.download.nvidia.com/tesla/515.65.01/NVIDIA-Linux-x86_64-515.65.01.run
chmod +x ./NVIDIA-Linux-x86_64-515.65.01.run
# Disable nouveau
sudo vim /etc/default/grub
# Add the kernel parameter modprobe.blacklist=nouveau to GRUB_CMDLINE_LINUX
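# A non-interactive alternative (just a sketch: it prepends the parameter inside GRUB_CMDLINE_LINUX; check the file before regenerating GRUB):
#sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&modprobe.blacklist=nouveau /' /etc/default/grub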
sudo grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
sudo ./NVIDIA-Linux-x86_64-515.65.01.run
# Keep pressing Enter to accept the defaults
nvidia-smi
Step 9. Install the NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum-config-manager --enable libnvidia-container-experimental
sudo yum clean expire-cache
sudo yum install -y nvidia-docker2
sudo systemctl restart docker
#sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Switch containerd's default runtime to the NVIDIA runtime:
cat <<EOF | sudo tee /etc/containerd/config.toml
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
EOF
# (older containerd releases would use runtime_type = "io.containerd.runtime.v1.linux" instead)
sudo systemctl restart containerd
sudo systemctl enable containerd
sudo systemctl enable kubelet
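As a quick sanity check that containerd picked up the nvidia runtime, the CRI config dump can be inspected (crictl ships as a dependency of the kubeadm packages; the socket path below is the containerd default):
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock info | grep -i nvidia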
Step 10. Join the GPU node to the cluster
Run the join command on the GPU node:
sudo kubeadm join 10.0.10.116:6443 --token sg9835.ouqey7wquuc6kugb \
--discovery-token-ca-cert-hash sha256:0ae5c99780ba7c41861b4d032a4c462e873f4500cf9ef1dfdcb64b202548570e
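If the token from kubeadm init has expired or the output was lost, a fresh join command can be printed on the Master first:
sudo kubeadm token create --print-join-command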
Then verify on the Master:
kubectl get node -o wide
Step 11. Install the container network plugin
The Pod network CIDR was not set during kubeadm init; add it first:
sudo vim /etc/kubernetes/manifests/kube-controller-manager.yaml
--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16
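For reference, both flags go into the kube-controller-manager container's command list in that static pod manifest, roughly like this (a sketch; all existing flags stay as they are):
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --cluster-cidr=10.244.0.0/16
    # ...the rest of the original flags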
The CNI plugin provides Pod-to-Pod networking across nodes. Run the following on the Master:
sudo systemctl restart kubelet
curl -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml
# Wait a while
kubectl get pod -A -o wide
Alibaba GPU Sharing Tools
Step 1. Install the scheduler extender
cd /etc/kubernetes/
sudo curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
cd /etc/kubernetes/
# Back up the original manifest
sudo cp manifests/kube-scheduler.yaml ./kube-scheduler.yaml.bak
cat <<EOF | sudo tee manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --config=/etc/kubernetes/scheduler-policy-config.yaml
    image: k8s.gcr.io/kube-scheduler:v1.24.3
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler-policy-config.yaml
      name: scheduler-policy-config
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/scheduler-policy-config.yaml
      type: FileOrCreate
    name: scheduler-policy-config
status: {}
EOF
Compared with the backed-up kube-scheduler.yaml, the changes are the added --config=/etc/kubernetes/scheduler-policy-config.yaml flag plus the matching scheduler-policy-config volume and volumeMount.
The scheduler restarts automatically once its static pod manifest is changed.
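To confirm the restarted scheduler actually picked up the new flag, something like:
kubectl -n kube-system get pod -l component=kube-scheduler
kubectl -n kube-system get pod -l component=kube-scheduler -o yaml | grep policy-config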
Step 2. Deploy the Device Plugin
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
Label the GPU node:
kubectl get node
kubectl label node gpu-node gpushare=true
kubectl label node master-28170 node-role.kubernetes.io/master=""
kubectl get node --show-labels
Now check whether the GPU node exposes the GPU share resource:
kubectl describe node gpu-node
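Under Capacity and Allocatable you should now see the aliyun.com/gpu-mem extended resource (newer plugin builds may also report aliyun.com/gpu-count); a quick filter:
kubectl describe node gpu-node | grep aliyun.com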
If you are outside China and cannot pull the image from the Aliyun registry, swap in another registry. The ocir address below is my own Oracle Cloud registry (OCIR), to which I uploaded k8s-gpushare-plugin:v2-1.11-aff8a23.
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
sudo sed -i 's/registry.cn-hangzhou.aliyuncs.com\/acs/nrt.ocir.io\/sehubjapacprod/g' device-plugin-ds.yaml
kubectl apply -f device-plugin-ds.yaml
Step 3. Install the inspection plugin
cd /usr/bin/
sudo wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
sudo chmod u+x /usr/bin/kubectl-inspect-gpushare
# The tool needs permissions a regular user does not have, so run it as root
sudo su
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl inspect gpushare
su opc
Step 4. Test
vim gpu-share-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 2
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-1
  template: # define the pod specifications
    metadata:
      labels:
        app: binpack-1
    spec:
      containers:
      - name: binpack-1
        image: cheyang/gpu-player:v2
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            # GiB
            aliyun.com/gpu-mem: 2
kubectl apply -f gpu-share-test.yaml
sudo kubectl inspect gpushare
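To see that both replicas landed on the GPU node and how the GPU memory was carved up, check the pod placement and the inspect plugin's detail view (the -d flag, if the installed plugin version supports it):
kubectl get pod -l app=binpack-1 -o wide
sudo kubectl inspect gpushare -d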