2 месяца назад я создал кластер kubernetes 1 главный и 2 рабочих узла, сегодня один рабочий узел начал выходить из строя, и я не знаю почему. Я думаю, что ничего необычного не случилось с моим работником.
Я использовал фланель и kubeadm для создания кластера, и он работал очень хорошо.
Если я опишу узел:
tommy@bxybackend:~$ kubectl describe node bxybackend-node01
Name: bxybackend-node01
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=bxybackend-node01
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"06:ca:97:82:50:10"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.168.10.4
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 03 Nov 2019 09:41:48 -0600
Taints: node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 11 Dec 2019 11:17:05 -0600 Wed, 11 Dec 2019 10:37:19 -0600 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 11 Dec 2019 11:17:05 -0600 Wed, 11 Dec 2019 10:37:19 -0600 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 11 Dec 2019 11:17:05 -0600 Wed, 11 Dec 2019 10:37:19 -0600 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 11 Dec 2019 11:17:05 -0600 Wed, 11 Dec 2019 10:37:19 -0600 KubeletNotReady Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Addresses:
InternalIP: 10.168.10.4
Hostname: bxybackend-node01
Capacity:
cpu: 12
ephemeral-storage: 102684600Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 14359964Ki
pods: 110
Allocatable:
cpu: 12
ephemeral-storage: 94634127204
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 14257564Ki
pods: 110
System Info:
Machine ID: 3afa24bb05994ceaaf00e7f22b9322ab
System UUID: 80951742-F69F-6487-F2F7-BE2FB7FEFBF8
Boot ID: 115fbacc-143d-4007-90e4-7fdcb5462680
Kernel Version: 4.15.0-72-generic
OS Image: Ubuntu 18.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.9.7
Kubelet Version: v1.17.0
Kube-Proxy Version: v1.17.0
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system kube-flannel-ds-amd64-sslbg 100m (0%) 100m (0%) 50Mi (0%) 50Mi (0%) 8m31s
kube-system kube-proxy-c5gxc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 8m52s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (0%) 100m (0%)
memory 50Mi (0%) 50Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning SystemOOM 52m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 12170
Normal NodeHasNoDiskPressure 52m (x12 over 38d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 52m (x12 over 38d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasSufficientPID
Normal NodeNotReady 52m (x6 over 23d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeNotReady
Normal NodeHasSufficientMemory 52m (x12 over 38d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasSufficientMemory
Warning ContainerGCFailed 52m (x3 over 6d23h) kubelet, bxybackend-node01 rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal NodeReady 52m (x13 over 38d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeReady
Normal NodeAllocatableEnforced 43m kubelet, bxybackend-node01 Updated Node Allocatable limit across pods
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 9699
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 12639
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 16194
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 19618
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 12170
Normal Starting 43m kubelet, bxybackend-node01 Starting kubelet.
Normal NodeHasSufficientMemory 43m (x2 over 43m) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 43m (x2 over 43m) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasSufficientPID
Normal NodeNotReady 43m kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeNotReady
Normal NodeHasNoDiskPressure 43m (x2 over 43m) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasNoDiskPressure
Normal Starting 42m kubelet, bxybackend-node01 Starting kubelet.
Если я смотрю системный журнал в рабочем:
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552152 19331 kuberuntime_manager.go:981] updating runtime config through cri with podcidr 10.244.1.0/24
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552162 19331 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552352 19331 docker_service.go:355] docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.244.1.0/24,},}
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552600 19331 kubelet_network.go:77] Setting Pod CIDR: -> 10.244.1.0/24
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.555142 19331 kubelet_node_status.go:70] Attempting to register node bxybackend-node01
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.652843 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy" (UniqueName: "kubernetes.io/configmap/d6b534db-c32c-491b-a665-cf1ccd6cd089-kube-proxy") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753179 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/d6b534db-c32c-491b-a665-cf1ccd6cd089-xtables-lock") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753249 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/d6b534db-c32c-491b-a665-cf1ccd6cd089-lib-modules") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753285 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy-token-ztrh4" (UniqueName: "kubernetes.io/secret/d6b534db-c32c-491b-a665-cf1ccd6cd089-kube-proxy-token-ztrh4") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753316 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "run" (UniqueName: "kubernetes.io/host-path/6a2299cf-63a4-4e96-8b3b-acd373de12c2-run") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753342 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "cni" (UniqueName: "kubernetes.io/host-path/6a2299cf-63a4-4e96-8b3b-acd373de12c2-cni") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753461 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-cfg" (UniqueName: "kubernetes.io/configmap/6a2299cf-63a4-4e96-8b3b-acd373de12c2-flannel-cfg") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753516 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-token-ts2qt" (UniqueName: "kubernetes.io/secret/6a2299cf-63a4-4e96-8b3b-acd373de12c2-flannel-token-ts2qt") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753531 19331 reconciler.go:156] Reconciler: start to sync state
Dec 11 11:20:12 bxybackend-node01 kubelet[19331]: I1211 11:20:12.052813 19331 kubelet_node_status.go:112] Node bxybackend-node01 was previously registered
Dec 11 11:20:12 bxybackend-node01 kubelet[19331]: I1211 11:20:12.052921 19331 kubelet_node_status.go:73] Successfully registered node bxybackend-node01
Dec 11 11:20:13 bxybackend-node01 kubelet[19331]: E1211 11:20:13.051159 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:16 bxybackend-node01 kubelet[19331]: E1211 11:20:16.051264 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:18 bxybackend-node01 kubelet[19331]: E1211 11:20:18.451166 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:21 bxybackend-node01 kubelet[19331]: E1211 11:20:21.251289 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:25 bxybackend-node01 kubelet[19331]: E1211 11:20:25.019276 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:46 bxybackend-node01 kubelet[19331]: E1211 11:20:46.772862 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:46 bxybackend-node01 kubelet[19331]: F1211 11:20:46.772895 19331 csi_plugin.go:281] Failed to initialize CSINodeInfo after retrying
Dec 11 11:20:46 bxybackend-node01 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Dec 11 11:20:46 bxybackend-node01 systemd[1]: kubelet.service: Failed with result 'exit-code'.