1. Prepare the installation environment
Installation requirements:
- Each host must have at least 2 CPU cores and at least 2 GB of RAM;
- All hosts must be able to reach each other over the network;
- Each host must have a unique hostname, MAC address, and product_uuid;
- kubelet requires swap to be disabled in order to work properly.
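Before provisioning, these requirements can be checked quickly on every host; a minimal sketch (the uniqueness checks are repeated for master01 further below):
nproc                                  # CPU cores, should be at least 2
free -h                                # memory, should be at least 2G
hostname                               # must be unique per host
ip link | grep ether                   # MAC addresses must be unique
cat /sys/class/dmi/id/product_uuid     # must be unique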
Here Vagrant is used to create three CentOS 7 virtual machines, each with 2 CPU cores and 4 GB of RAM: one master node and two worker nodes:
[wedot@dx142 kubeadm]$ cat > Vagrantfile <<EOF
Vagrant.configure("2") do |config|
  config.vm.define "master01" do |node|
    node.vm.box = "centos/7"
    node.vm.box_check_update = false
    node.vm.hostname = "master01"
    node.vm.network "private_network", ip: "192.168.81.101"
    node.vm.provision "shell", path: "post-deploy.sh", run: "always"
    node.vm.provider "virtualbox" do |vbox|
      vbox.cpus = 2
      vbox.memory = 4096
    end
  end
  (1..2).each do |i|
    config.vm.define "node0#{i}" do |node|
      node.vm.box = "centos/7"
      node.vm.box_check_update = false
      node.vm.hostname = "node#{i}"
      node.vm.network "private_network", ip: "192.168.81.20#{i}"
      node.vm.provision "shell", path: "post-deploy.sh", run: "always"
      node.vm.provider "virtualbox" do |vbox|
        vbox.cpus = 2
        vbox.memory = 4096
      end
    end
  end
end
EOF
[wedot@dx142 kubeadm]$ cat > post-deploy.sh <<"EOF"
#!/bin/bash
value=$( grep -ic "entry" /etc/hosts )
if [ $value -eq 0 ]
then
echo "
################ kubernetes host entry ############
192.168.31.101 master-001
192.168.31.102 master-002
192.168.31.103 master-003
######################################################
" >> /etc/hosts
fi
if [ -e /etc/redhat-release ]
then
nmcli connection up System\ enp0s8
fi
EOF
[wedot@dx142 kubeadm]$ vagrant up
- Because the private_network connection "System enp0s8" configured by Vagrant is inexplicably replaced by CentOS 7 (the cause has not been found yet), the provisioning script simply brings it up again manually.
Check that the installation requirements are met: every host must have a unique hostname, MAC address, and product_uuid. Taking master01 as an example:
[wedot@dx142 kubeadm]$ vagrant ssh master01
[vagrant@master01 ~]$ su - root
Password:
Last login: Sun Apr 5 19:05:17 EEST 2015 on tty1
[root@master01 ~]# hostname
master01
[root@master01 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:c5:46:4e brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
valid_lft 86064sec preferred_lft 86064sec
inet6 fe80::a00:27ff:fec5:464e/64 scope link
valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:ac:d6:08 brd ff:ff:ff:ff:ff:ff
inet 192.168.81.101/24 brd 192.168.81.255 scope global enp0s8
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:feac:d608/64 scope link
valid_lft forever preferred_lft forever
[root@master01 ~]# cat /sys/class/dmi/id/product_uuid
0F1AF63D-DBAD-4EAC-A600-BDC7F5955267
Disable swap; this is required on every host:
[root@master01 ~]# swapoff -a
[root@master01 ~]# sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
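A quick check that swap is really off; this should list no active swap devices and report 0 bytes of swap in use:
cat /proc/swaps
free -h | grep -i swap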
2. Install the container runtime (Docker)
The container runtime must be installed on every host; master01 is used as the example here.
Download and install docker-ce
Configure a Yum repository for Docker; the Aliyun Docker mirror is used here:
[root@master01 ~]# cat > /etc/yum.repos.d/docker.repo <<EOF
[docker-main]
name=Docker Repository
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/7/x86_64/stable/
enabled=1
gpgcheck=0
EOF
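Optionally, confirm that the new repository works and see which docker-ce builds it provides before installing (a quick check, not part of the original walkthrough):
yum list docker-ce --showduplicates | sort -r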
Install Docker 18.06, the version recommended for Kubernetes v1.14:
[root@master01 ~]# yum install docker-ce-18.06.3.ce-3.el7.x86_64 -y
Start docker.service:
[root@master01 ~]# systemctl start docker
Problem encountered: "Error while creating filesystem xfs on device docker-253:1-17291963-base: exit status 1"
[root@master01 ~]# systemctl start docker
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[root@master01 ~]# journalctl -u docker -n 20|more
-- Logs begin at Sun 2019-05-19 16:10:43 EEST, end at Sun 2019-05-19 17:00:07 EEST. --
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.092825521+03:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=g
rpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.092883548+03:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/do
cker/containerd/docker-containerd.sock 0 <nil>}]" module=grpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.092899441+03:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.092935503+03:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4201f37e0, CONNECTIN
G" module=grpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.093091109+03:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4201f37e0, READY" mo
dule=grpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.101868083+03:00" level=warning msg="Usage of loopback devices is strongly discouraged for production
use. Please use `--storage-opt dm.thinpooldev` or use `man dockerd` to refer to dm.thinpooldev section." storage-driver=devicemapper
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.338689606+03:00" level=info msg="Creating filesystem xfs on device docker-253:1-17291963-base, mkfs a
rgs: [-m crc=0,finobt=0 /dev/mapper/docker-253:1-17291963-base]" storage-driver=devicemapper
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.339831716+03:00" level=info msg="Error while creating filesystem xfs on device docker-253:1-17291963-
base: exit status 1" storage-driver=devicemapper
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.339856098+03:00" level=error msg="[graphdriver] prior storage driver devicemapper failed: exit status
1"
May 19 17:00:07 master01 dockerd[7057]: Error starting daemon: error initializing graphdriver: exit status 1
May 19 17:00:07 master01 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
May 19 17:00:07 master01 systemd[1]: Failed to start Docker Application Container Engine.
May 19 17:00:07 master01 systemd[1]: Unit docker.service entered failed state.
May 19 17:00:07 master01 systemd[1]: docker.service failed.
May 19 17:00:07 master01 systemd[1]: docker.service holdoff time over, scheduling restart.
May 19 17:00:07 master01 systemd[1]: Stopped Docker Application Container Engine.
May 19 17:00:07 master01 systemd[1]: start request repeated too quickly for docker.service
May 19 17:00:07 master01 systemd[1]: Failed to start Docker Application Container Engine.
May 19 17:00:07 master01 systemd[1]: Unit docker.service entered failed state.
May 19 17:00:07 master01 systemd[1]: docker.service failed.
Run mkfs.xfs manually to create the filesystem and see what the actual error is:
[root@master01 ~]# mkfs.xfs -m crc=0,finobt=0 /dev/mapper/docker-253:1-17291963-base
unknown option -m finobt=0
Usage: mkfs.xfs
/* blocksize */ [-b log=n|size=num]
/* metadata */ [-m crc=[0|1]
/* data subvol */ [-d agcount=n,agsize=n,file,name=xxx,size=num,
(sunit=value,swidth=value|su=num,sw=num|noalign),
sectlog=n|sectsize=num
/* force overwrite */ [-f]
/* inode size */ [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
projid32bit=0|1]
/* no discard */ [-K]
/* log subvol */ [-l agnum=n,internal,size=num,logdev=xxx,version=n
sunit=value|su=num,sectlog=n|sectsize=num,
lazy-count=0|1]
/* label */ [-L label (maximum 12 characters)]
/* naming */ [-n log=n|size=num,version=2|ci,ftype=0|1]
/* no-op info only */ [-N]
/* prototype file */ [-p fname]
/* quiet */ [-q]
/* realtime subvol */ [-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */ [-s log=n|size=num]
/* version */ [-V]
devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
<value> is xxx (512 byte blocks).
mkfs.xfs does not recognize the -m finobt=0 option, which suggests the xfsprogs package is too old. Upgrade the xfsprogs package:
[root@master01 ~]# rpm -qa|grep -i xfs
xfsprogs-3.2.0-0.10.alpha2.el7.x86_64
[root@master01 ~]# yum update xfsprogs -y
[root@master01 ~]# rpm -qa|grep -i xfs
xfsprogs-4.5.0-19.el7_6.x86_64
After upgrading xfsprogs, restart docker.service and the problem is gone:
[root@master01 ~]# systemctl restart docker
[root@master01 ~]# systemctl status docker -l
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2019-05-19 17:03:25 EEST; 6s ago
Docs: https://docs.docker.com
Main PID: 7139 (dockerd)
Memory: 46.0M
CGroup: /system.slice/docker.service
├─7139 /usr/bin/dockerd
└─7146 docker-containerd --config /var/run/docker/containerd/containerd.toml
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.217188797+03:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4202090d0, CONNECTING" module=grpc
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.217999669+03:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4202090d0, READY" module=grpc
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.218018674+03:00" level=info msg="Loading containers: start."
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.590251077+03:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.871944746+03:00" level=info msg="Loading containers: done."
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.885618616+03:00" level=info msg="Docker daemon" commit=d7080c1 graphdriver(s)=devicemapper version=18.06.3-ce
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.885797435+03:00" level=info msg="Daemon has completed initialization"
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.894076881+03:00" level=warning msg="Could not register builder git source: failed to find git binary: exec: \"git\": executable file not found in $PATH"
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.908769784+03:00" level=info msg="API listen on /var/run/docker.sock"
May 19 17:03:25 master01 systemd[1]: Started Docker Application Container Engine.
Check Docker details with docker info:
[root@master01 ~]# docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 18.06.3-ce
Storage Driver: devicemapper
Pool Name: docker-253:1-17291963-pool
Pool Blocksize: 65.54kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Udev Sync Supported: true
Data file: /dev/loop0
Metadata file: /dev/loop1
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 10.94MB
Data Space Total: 107.4GB
Data Space Available: 5.577GB
Metadata Space Used: 581.6kB
Metadata Space Total: 2.147GB
Metadata Space Available: 2.147GB
Thin Pool Minimum Free Space: 10.74GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.149-RHEL7 (2018-07-20)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-123.20.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.704GiB
Name: master01
ID: P4CM:J5T3:Z7X7:EUN4:J3MW:QPGL:SWY2:KGCZ:EXMD:ACOE:HDHF:4VAW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
3. Install the Kubernetes packages
The Kubernetes packages must be installed on every host; again, master01 is used as the example.
Configure a Yum repository for Kubernetes; the Aliyun Kubernetes mirror is used here:
[root@master01 ~]# cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Install the kubeadm, kubectl, kubelet, and kubernetes-cni packages with Yum:
[root@master01 ~]# yum install kubeadm kubectl kubelet kubernetes-cni -y
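If you prefer the packages to match a known cluster version rather than whatever is newest in the repository, the versions can be pinned explicitly; a sketch, assuming v1.14.2 (the version this walkthrough ends up with) is available in the repo:
yum install -y kubeadm-1.14.2 kubelet-1.14.2 kubectl-1.14.2 kubernetes-cni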
4. Initialize the master node with kubeadm init
Preparation before kubeadm init
1) Check which container images kubeadm init needs to download
[root@master01 ~]# kubeadm config images list
I0519 17:24:16.082322 17456 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0519 17:24:16.082637 17456 version.go:97] falling back to the local client version: v1.14.2
k8s.gcr.io/kube-apiserver:v1.14.2
k8s.gcr.io/kube-controller-manager:v1.14.2
k8s.gcr.io/kube-scheduler:v1.14.2
k8s.gcr.io/kube-proxy:v1.14.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1
- k8s.gcr.io is not reachable from this environment. In Kubernetes v1.14 the --image-repository flag lets kubeadm pull from a custom image registry, so there is no need to prepare the images manually with docker pull and docker tag; a very handy flag. (The images can also be pre-pulled; see the sketch after this list.)
2) If the flannel network plugin will be used, the --pod-network-cidr=10.244.0.0/16 flag must be specified; otherwise deploying flannel fails with: Error registering network: failed to acquire lease: node "master01" pod cidr not assigned.
3) If a host has multiple network interfaces, kubeadm deploys the cluster on the interface used by the default route:
[root@master01 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 100 0 0 enp0s3
10.0.2.0 0.0.0.0 255.255.255.0 U 100 0 0 enp0s3
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.81.0 0.0.0.0 255.255.255.0 U 101 0 0 enp0s8
- Here kubeadm would pick enp0s3. The --apiserver-advertise-address=192.168.81.101 flag makes kubeadm use the VM's private_network interface enp0s8 instead.
4) If kubeadm init fails to initialize the master node, the cluster can be reset with kubeadm reset once the root cause has been found, and then kubeadm init can be run again.
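The images listed in point 1) can also be pre-pulled from the mirror before running kubeadm init; a sketch, assuming the installed kubeadm accepts --image-repository on this subcommand just as it does on kubeadm init:
kubeadm config images pull --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers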
Initialize the master node with kubeadm init:
[root@master01 ~]# kubeadm init --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.81.101
I0520 02:15:19.678613 13717 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0520 02:15:19.678845 13717 version.go:97] falling back to the local client version: v1.14.2
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.81.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master01 localhost] and IPs [192.168.81.101 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master01 localhost] and IPs [192.168.81.101 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 16.503470 seconds
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --experimental-upload-certs
[mark-control-plane] Marking the node master01 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 2yt3kq.ewm19tnly4rrym1a
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.81.101:6443 --token 2yt3kq.ewm19tnly4rrym1a \
--discovery-token-ca-cert-hash sha256:d644d599cd27612c1a2fbd62aa686ad00a6fe298946f229a0668c5ef637176a4
Problems encountered and how they were solved
kubeadm init WARNINGs
[root@master01 ~]# kubeadm init --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
I0519 17:24:56.462966 17462 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0519 17:24:56.463116 17462 version.go:97] falling back to the local client version: v1.14.2
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Solution:
[root@master01 ~]# systemctl stop firewalld && systemctl disable firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
[root@master01 ~]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@master01 ~]# systemctl enable kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
[root@master01 ~]# cat > /etc/sysctl.d/k8s.conf <<EOF
vm.swappiness = 0
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
[root@master01 ~]# sysctl --system
* Applying /usr/lib/sysctl.d/00-system.conf ...
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
* Applying /usr/lib/sysctl.d/50-default.conf ...
kernel.sysrq = 16
kernel.core_uses_pid = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.d/k8s.conf ...
vm.swappiness = 0
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
- These steps are required on every node.
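The remaining IsDockerSystemdCheck warning about the cgroupfs driver is simply tolerated in this walkthrough. If you want to follow the recommendation instead, one commonly documented option (not done here) is to switch Docker to the systemd cgroup driver before running kubeadm init; kubeadm then detects the Docker cgroup driver and writes matching kubelet flags:
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker
# should now report: Cgroup Driver: systemd
docker info | grep -i cgroup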
kubeadm init fails with "error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster"
[root@master01 ~]# kubeadm init --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
I0519 17:29:30.515120 17679 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0519 17:29:30.515257 17679 version.go:97] falling back to the local client version: v1.14.2
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master01 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master01 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15]
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
Check the kubelet logs:
[root@master01 ~]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sun 2019-05-19 17:30:13 EEST; 1min 43s ago
Docs: https://kubernetes.io/docs/
Main PID: 18200 (kubelet)
Memory: 35.6M
CGroup: /system.slice/kubelet.service
└─18200 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
May 19 17:31:56 master01 kubelet[18200]: E0519 17:31:56.775342 18200 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://10.0.2.15:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster01&limit=500&resourceVersion=0: dial tcp 10.0.2.15:6443: connect: connection refused
May 19 17:31:56 master01 kubelet[18200]: E0519 17:31:56.872568 18200 kubelet.go:2244] node "master01" not found
May 19 17:31:56 master01 kubelet[18200]: E0519 17:31:56.978461 18200 kubelet.go:2244] node "master01" not found
May 19 17:31:56 master01 kubelet[18200]: E0519 17:31:56.985968 18200 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.2.15:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster01&limit=500&resourceVersion=0: dial tcp 10.0.2.15:6443: connect: connection refused
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.079359 18200 kubelet.go:2244] node "master01" not found
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.176252 18200 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://10.0.2.15:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.2.15:6443: connect: connection refused
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.179819 18200 kubelet.go:2244] node "master01" not found
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.286579 18200 kubelet.go:2244] node "master01" not found
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.398565 18200 kubelet.go:2244] node "master01" not found
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.398945 18200 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Get https://10.0.2.15:6443/apis/storage.k8s.io/v1beta1/csidrivers?limit=500&resourceVersion=0: dial tcp 10.0.2.15:6443: connect: connection refused
Check the Docker containers: no kube-apiserver container was ever created:
[root@master01 ~]# docker ps -a|grep apiserver|grep -v pause
The multiple network interfaces were suspected to be the cause. Run kubeadm reset to wipe all of kubeadm's changes, then rerun kubeadm init with the --apiserver-advertise-address flag to pin the interface:
[root@master01 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0519 17:37:43.865009 960 reset.go:73] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 10.0.2.15:6443: connect: connection refused
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0519 17:37:48.273587 960 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
E0519 17:38:15.554250 960 reset.go:192] [reset] Failed to remove containers: failed to remove running container a11fe6947c5e: output: Error: No such container: a11fe6947c5e
, error: exit status 1
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
[root@master01 ~]# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
kubelet error "Failed to start ContainerManager failed to initialize top level QOS containers"
The error output from kubeadm init:
[root@master01 ~]# kubeadm init --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers --apiserver-advertise-address 192.168.81.101
...
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
The kubelet logs:
[root@master01 ~]# journalctl -u kubelet -n 10|more
-- Logs begin at Sun 2019-05-19 16:10:43 EEST, end at Sun 2019-05-19 17:45:16 EEST. --
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.865421 5240 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.866543 5240 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
May 19 17:45:16 master01 kubelet[5240]: E0519 17:45:16.876942 5240 kubelet.go:2244] node "master01" not found
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.877319 5240 cpu_manager.go:155] [cpumanager] starting with none policy
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.877331 5240 cpu_manager.go:156] [cpumanager] reconciling every 10s
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.877344 5240 policy_none.go:42] [cpumanager] none policy: Start
May 19 17:45:16 master01 kubelet[5240]: F0519 17:45:16.877925 5240 kubelet.go:1359] Failed to start ContainerManager failed to initialize top level QOS containers:
failed to update top level Burstable QOS cgroup : failed to set supported cgroup subsystems for cgroup [kubepods burstable]: Failed to find subsystem mount for require
d subsystem: pids
May 19 17:45:16 master01 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
May 19 17:45:16 master01 systemd[1]: Unit kubelet.service entered failed state.
May 19 17:45:16 master01 systemd[1]: kubelet.service failed.
Root cause: state left behind by the previous kubelet run was not fully cleaned up. A GitHub issue suggests that systemctl stop kubepods-burstable.slice fixes this, but it made no difference in testing. Running kubeadm reset and then rebooting the system resolved the problem; a minimal sketch of that recovery sequence is shown below.
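A sketch of the recovery steps used here, assuming it is acceptable to wipe all kubeadm state on this node:
# --force/-f skips the interactive confirmation
kubeadm reset -f
reboot
# after the reboot, rerun kubeadm init with the same flags as before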
kubelet error "Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container"
kubeadm init fails again with the same output as above, but this time kubelet reports an "Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container" error:
[root@master01 ~]# journalctl -u kubelet
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.185184 3532 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get http
s://192.168.81.101:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster01&limit=500&resourceVersion=0: dial tcp 192.168.81.101:6443: connect: connection refused
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.272912 3532 kubelet.go:2244] node "master01" not found
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.373281 3532 kubelet.go:2244] node "master01" not found
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.384604 3532 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: G
et https://192.168.81.101:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster01&limit=500&resourceVersion=0: dial tcp 192.168.81.101:6443: connect: connection refuse
d
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.421942 3532 remote_runtime.go:109] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc
= failed to start sandbox container for pod "kube-apiserver-master01": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting containe
r process caused "process_linux.go:301: running exec setns process for init caused \"exit status 23\"": unknown
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.422032 3532 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "kube-apiserver-master01_kube-system(49ec77e4
78f52e3dbb19dec81a7aab04)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-master01": Error response from daemon: OC
I runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 23\"
": unknown
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.422057 3532 kuberuntime_manager.go:693] createPodSandbox for pod "kube-apiserver-master01_kube-system(49ec77e
478f52e3dbb19dec81a7aab04)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-master01": Error response from daemon: O
CI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 23\
"": unknown
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.422121 3532 pod_workers.go:190] Error syncing pod 49ec77e478f52e3dbb19dec81a7aab04 ("kube-apiserver-master01_
kube-system(49ec77e478f52e3dbb19dec81a7aab04)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-master01_kube-system(49ec77e478f52e3dbb19dec81a7aab04)" wit
h CreatePodSandboxError: "CreatePodSandbox for pod \"kube-apiserver-master01_kube-system(49ec77e478f52e3dbb19dec81a7aab04)\" failed: rpc error: code = Unknown desc = f
ailed to start sandbox container for pod \"kube-apiserver-master01\": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container
process caused \"process_linux.go:301: running exec setns process for init caused \\\"exit status 23\\\"\": unknown"
May 19 18:01:07 master01 kubelet[3532]: W0519 18:01:07.422465 3532 container.go:409] Failed to create summary reader for "/kubepods/burstable/pod49ec77e478f52e3dbb1
9dec81a7aab04/be115e01e71406d455c7e050251e9f2dc4f1517d989a3848c1c23900c1eb1573": none of the resources are being tracked.
May 19 18:01:07 master01 kubelet[3532]: I0519 18:01:07.427259 3532 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
May 19 18:01:07 master01 kubelet[3532]: I0519 18:01:07.427730 3532 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
May 19 18:01:07 master01 kubelet[3532]: W0519 18:01:07.430606 3532 pod_container_deletor.go:75] Container "db801ac4e304a3bbd160b8c15db0d09e0fe036b7b32cfbc958b802aec
f7c8064" not found in pod's containers
May 19 18:01:07 master01 kubelet[3532]: W0519 18:01:07.450680 3532 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.473438 3532 kubelet.go:2244] node "master01" not found
May 19 18:01:07 master01 kubelet[3532]: W0519 18:01:07.552663 3532 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/kubepods/burstable/pod1353086c450
cf89683ca588f417f9971/a51c38366ae5007b72bedb0b974cd2109d8defd158ccf07faf84a32e126dc3ed": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent: no such file or directory
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.573647 3532 kubelet.go:2244] node "master01" not found
- So Docker itself is failing to create the containers.
Check the docker.service logs:
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04+03:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/2737cf3ce004839f0e630ea2e1cc1492d342ccb78d38f7f0ef122f691e8600a6/shim.sock" debug=false pid=31559
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04+03:00" level=info msg="shim reaped" id=2737cf3ce004839f0e630ea2e1cc1492d342ccb78d38f7f0ef122f691e8600a6
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04.225092029+03:00" level=error msg="stream copy error: reading from a closed fifo"
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04.226160661+03:00" level=error msg="stream copy error: reading from a closed fifo"
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04.655337215+03:00" level=error msg="2737cf3ce004839f0e630ea2e1cc1492d342ccb78d38f7f0ef122f691e8600a6 cleanup: failed to delete container from containerd: no such container"
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04.655390191+03:00" level=error msg="Handler for POST /v1.38/containers/2737cf3ce004839f0e630ea2e1cc1492d342ccb78d38f7f0ef122f691e8600a6/start returned error: OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:297: copying bootstrap data to pipe caused \\\"write init-p: broken pipe\\\"\": unknown"
Check the Docker and kernel versions:
[root@master01 ~]# docker --version
Docker version 18.06.3-ce, build d7080c1
[root@master01 ~]# docker-runc --version
runc version 1.0.0-rc5+dev.docker-18.06
commit: a592beb5bc4c4092b1b1bac971afed27687340c5
spec: 1.0.0
[root@master01 ~]# uname -r
3.10.0-123.20.1.el7.x86_64
The kernel is probably too old for this relatively new Docker release. Upgrading the kernel to the latest version with yum update fixed the problem:
[root@master01 ~]# yum update -y && reboot
[root@master01 ~]# uname -r
3.10.0-957.12.2.el7.x86_64
Copy the kubectl configuration file
Copy the kubectl configuration file /etc/kubernetes/admin.conf to $HOME/.kube/config:
[root@master01 ~]# mkdir -p $HOME/.kube
[root@master01 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master01 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
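Since everything here is being run as root anyway, an alternative is simply to point kubectl at the admin kubeconfig through an environment variable (a side note, not used in the rest of this walkthrough):
export KUBECONFIG=/etc/kubernetes/admin.conf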
Now kubectl commands work:
[root@master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 NotReady master 3m4s v1.14.2
[root@master01 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-d5947d4b-2s9fm 0/1 Pending 0 2m53s
coredns-d5947d4b-hjbcx 0/1 Pending 0 2m53s
etcd-master01 1/1 Running 0 110s
kube-apiserver-master01 1/1 Running 0 2m13s
kube-controller-manager-master01 1/1 Running 0 2m14s
kube-proxy-jmdnj 1/1 Running 0 2m53s
kube-scheduler-master01 1/1 Running 0 105s
If kubectl reports Unable to connect to the server: x509: certificate signed by unknown authority, it usually means a stale admin.conf generated by a previous kubeadm init is still in use after a kubeadm reset; simply rerun the commands above to overwrite ~/.kube/config.
[root@master01 ~]# kubectl get pods -n kube-system
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
5. Install a CNI network plugin
The flannel network plugin is used here:
[root@master01 ~]# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
[root@master01 ~]# kubectl apply -f kube-flannel.yml
podsecuritypolicy.extensions/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created
Check that flannel started successfully; once flannel is running, the CoreDNS Pods move from Pending to Running:
[root@master01 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-d5947d4b-2s9fm 1/1 Running 0 11m
coredns-d5947d4b-hjbcx 1/1 Running 0 11m
etcd-master01 1/1 Running 0 10m
kube-apiserver-master01 1/1 Running 0 10m
kube-controller-manager-master01 1/1 Running 0 10m
kube-flannel-ds-amd64-w2cvr 1/1 Running 0 3m17s
kube-proxy-jmdnj 1/1 Running 0 11m
kube-scheduler-master01 1/1 Running 0 9m58s
If the --pod-network-cidr=10.244.0.0/16 flag was not specified at kubeadm init time, the flannel Pod fails to start with the following error:
[root@master01 ~]# kubectl logs -n kube-system kube-flannel-ds-amd64-tj4wq
I0519 15:58:36.994408 1 main.go:514] Determining IP address of default interface
I0519 15:58:36.995315 1 main.go:527] Using interface with name enp0s3 and address 10.0.2.15
I0519 15:58:36.995395 1 main.go:544] Defaulting external address to interface address (10.0.2.15)
I0519 15:58:37.095446 1 kube.go:126] Waiting 10m0s for node controller to sync
I0519 15:58:37.095869 1 kube.go:309] Starting kube subnet manager
I0519 15:58:38.095922 1 kube.go:133] Node controller sync successful
I0519 15:58:38.095967 1 main.go:244] Created subnet manager: Kubernetes Subnet Manager - master01
I0519 15:58:38.095975 1 main.go:247] Installing signal handlers
I0519 15:58:38.097679 1 main.go:386] Found network config - Backend type: vxlan
I0519 15:58:38.097722 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E0519 15:58:38.099008 1 main.go:289] Error registering network: failed to acquire lease: node "master01" pod cidr not assigned
I0519 15:58:38.099040 1 main.go:366] Stopping shutdownHandler...
In that case, inspect the kubeadm-config ConfigMap created during kubeadm init:
[root@master01 ~]# kubectl get configmaps -n kube-system kubeadm-config -o yaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta1
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: ""
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
    kind: ClusterConfiguration
    kubernetesVersion: v1.14.2
    networking:
      dnsDomain: cluster.local
      podSubnet: ""
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      master01:
        advertiseAddress: 192.168.81.101
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2019-05-19T15:34:57Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "157"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: a7f40841-7a4b-11e9-b726-080027c5464e
- Note that podSubnet is empty here. The fix: run kubeadm reset, then rerun kubeadm init on the master with the --pod-network-cidr=10.244.0.0/16 flag added; a sketch of that sequence follows.
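A sketch of that corrective sequence; it wipes and re-creates the control plane, so it is only appropriate on a freshly initialized cluster:
kubeadm reset -f
kubeadm init --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.81.101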
6. Join the worker nodes to the cluster
Joining a worker node is simple: just run kubeadm join. The full command with its arguments is printed at the end of kubeadm init on the master node; copy it to each worker node and run it:
[root@node1 ~]# kubeadm join 192.168.81.101:6443 --token 2yt3kq.ewm19tnly4rrym1a \
--discovery-token-ca-cert-hash sha256:d644d599cd27612c1a2fbd62aa686ad00a6fe298946f229a0668c5ef637176a4
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
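If the join command has been lost, or the bootstrap token has expired (kubeadm tokens are valid for 24 hours by default), a fresh join command can be printed on the master node:
kubeadm token create --print-join-command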
Join all the worker nodes in the same way, then run kubectl get nodes on the master node to check that they have joined the cluster successfully:
[root@master01 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master01 Ready master 18m v1.14.2 10.0.2.15 <none> CentOS Linux 7 (Core) 3.10.0-957.12.2.el7.x86_64 docker://18.9.6
node1 Ready <none> 3m43s v1.14.2 10.0.2.15 <none> CentOS Linux 7 (Core) 3.10.0-957.12.2.el7.x86_64 docker://18.9.6
node2 NotReady <none> 10s v1.14.2 10.0.2.15 <none> CentOS Linux 7 (Core) 3.10.0-957.12.2.el7.x86_64 docker://18.9.6
7. Summary
With that, a complete Kubernetes cluster is up and running! To recap, there were six main steps:
- Prepare the operating system environment;
- Install the container runtime, Docker;
- Install the Kubernetes packages;
- Initialize the master node with kubeadm init;
- Install the flannel CNI network plugin;
- Join the worker nodes with kubeadm join.
Creating a cluster with kubeadm really is that simple. However, this cluster has only one master node and therefore a single point of failure; a later article will cover deploying a highly available Kubernetes cluster with multiple master nodes.