Notes on batch-upgrading the kernel on CentOS Linux 7 nodes in a K8s cluster
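Preface
While installing an observability tool on the k8s cluster, its checks reported that the kernel version was too old to be supported, so I decided to upgrade.
This is a lab environment, so there was little to worry about.
For a production upgrade, work out an error budget first; ideally back up with Velero and be prepared to migrate the cluster.
Newer kernels support cgroup v2, which is worth considering when deploying a new cluster.
If I have misunderstood anything, corrections are welcome.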
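Kernel support for cgroup v2 does not by itself mean a node is using it. A minimal sketch for checking which hierarchy is mounted, assuming the usual convention that `/sys/fs/cgroup` reports `cgroup2fs` under v2 and `tmpfs` under v1; the helper name `cgroup_version` is made up for illustration:

```shell
#!/bin/bash
# Map the filesystem type of /sys/fs/cgroup to a cgroup version.
# cgroup2fs: unified cgroup v2 hierarchy; tmpfs: cgroup v1 (assumed convention).
cgroup_version() {
    local fstype="$1"
    case "$fstype" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# On a real node the filesystem type would come from stat:
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup/)"
echo "cgroup2fs -> $(cgroup_version cgroup2fs)"
echo "tmpfs     -> $(cgroup_version tmpfs)"
```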
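For each person, there is only one true duty: to find oneself, and then to hold fast to it for a lifetime, wholeheartedly and without pause. All other paths are incomplete: they are ways of escape, a cowardly retreat to the ideals of the crowd, drifting with the current, a fear of one's own inner self. (Hermann Hesse, Demian)
The local k8s cluster runs on CentOS Linux 7 (Core).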
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get nodes
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 6d4h v1.25.1
vms101.liruilongs.github.io Ready control-plane 6d4h v1.25.1
vms102.liruilongs.github.io Ready control-plane 6d4h v1.25.1
vms103.liruilongs.github.io Ready <none> 6d4h v1.25.1
vms105.liruilongs.github.io Ready <none> 6d4h v1.25.1
vms106.liruilongs.github.io Ready <none> 6d4h v1.25.1
vms107.liruilongs.github.io Ready <none> 6d4h v1.25.1
vms108.liruilongs.github.io Ready <none> 6d4h v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
The kernel version is Linux 3.10.0-693.el7.x86_64.
┌──[root@vms100.liruilongs.github.io]-[~/ansible/pixie]
└─$hostnamectl
Static hostname: vms100.liruilongs.github.io
Icon name: computer-vm
Chassis: vm
Machine ID: e93ae3f6cb354f3ba509eeb73568087e
Boot ID: 5ed408a863df48ae80b51f1b6c4be85f
Virtualization: vmware
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-693.el7.x86_64
Architecture: x86-64
┌──[root@vms100.liruilongs.github.io]-[~/ansible/pixie]
└─$
While installing an observability tool, its pre-check reported that the kernel version was too low:
┌──[root@vms100.liruilongs.github.io]-[~/ansible/pixie]
└─$px deploy --check_only
Pixie CLI
Running Cluster Checks:
✕ Kernel version > 4.14.0 ERR: kernel version for node (vms100.liruilongs.github.io) not supported. Must have minimum kernel version of (4.14.0)
Check pre-check has failed. To bypass pass in --check=false. error=kernel version for node (vms100.liruilongs.github.io) not supported. Must have minimum kernel version of (4.14.0)
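So I decided to upgrade the kernel.
The plan: upgrade one machine first and confirm it works, run some simple tests against the cluster, and if the cluster is still healthy after half an hour, upgrade the remaining nodes in batch with Ansible.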
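The check that failed can be approximated locally before and after the upgrade. The following is only a sketch: the function name `kernel_at_least` is hypothetical, it treats the 4.14.0 figure from the error message as an inclusive minimum, and it relies on GNU `sort -V` for version ordering.

```shell
#!/bin/bash
# Return 0 if kernel release $1 is at least the minimum version $2.
kernel_at_least() {
    local current="${1%%-*}"   # drop the release suffix, e.g. "-693.el7.x86_64"
    local minimum="$2"
    # sort -V orders version strings; if the minimum sorts first (or ties),
    # the current version is greater than or equal to it.
    [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n1)" = "$minimum" ]
}

if kernel_at_least "$(uname -r)" "4.14.0"; then
    echo "kernel $(uname -r) meets the 4.14.0 minimum"
else
    echo "kernel $(uname -r) is below the 4.14.0 minimum"
fi
```

Run on the node being upgraded, this gives a quick answer without needing the observability tool's CLI.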
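The official Linux kernel has to be downloaded from https://www.kernel.org/ and compiled before it can be installed.
Most Linux distributions ship kernels they maintain themselves, which can be upgraded through a package manager such as yum, dnf, or rpm.
ELRepo is an RPM repository for Enterprise Linux packages that provides drivers and kernel images. It supports Red Hat Enterprise Linux (RHEL) and its rebuilds.
The ELRepo project focuses on hardware-related packages to enhance your experience with Enterprise Linux. This includes filesystem drivers, graphics drivers, network drivers, sound drivers, and webcam and video drivers.
ELRepo website: http://elrepo.org/tiki/tiki-index.php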
# List the kernel versions available from the configured yum repositories
yum list kernel --showduplicates
# If the version you need is listed, upgrade directly with yum update; usually it is not, so follow the steps below
# Import the public key of the ELRepo repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# Install ELRepo on CentOS 7
yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
# Install ELRepo on CentOS 8
yum install https://www.elrepo.org/elrepo-release-8.el8.elrepo.noarch.rpm
# List the kernel versions provided by ELRepo
yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
Kernel upgrade
Upgrade a single machine first.
Install ELRepo on CentOS 7 and import the public key of the ELRepo repository:
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$yum -y install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
List the kernel versions provided by ELRepo:
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
Loaded plugins: fastestmirror
elrepo-kernel | 3.0 kB 00:00:00
elrepo-kernel/primary_db | 2.1 MB 00:01:40
Loading mirror speeds from cached hostfile
* elrepo-kernel: ftp.yz.yamagata-u.ac.jp
Available Packages
kernel-lt.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
kernel-lt-devel.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
kernel-lt-doc.noarch 5.4.230-1.el7.elrepo elrepo-kernel
kernel-lt-headers.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
kernel-lt-tools.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
kernel-lt-tools-libs.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
kernel-lt-tools-libs-devel.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
kernel-ml.x86_64 6.1.8-1.el7.elrepo elrepo-kernel
kernel-ml-devel.x86_64 6.1.8-1.el7.elrepo elrepo-kernel
kernel-ml-doc.noarch 6.1.8-1.el7.elrepo elrepo-kernel
kernel-ml-headers.x86_64 6.1.8-1.el7.elrepo elrepo-kernel
kernel-ml-tools.x86_64 6.1.8-1.el7.elrepo elrepo-kernel
kernel-ml-tools-libs.x86_64 6.1.8-1.el7.elrepo elrepo-kernel
kernel-ml-tools-libs-devel.x86_64 6.1.8-1.el7.elrepo elrepo-kernel
perf.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
python-perf.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$
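kernel-lt: longterm, the long-term support kernel; currently 5.4.
kernel-ml: mainline, the current mainline kernel; currently 6.1, per the listing above.
Here we go with the long-term support version and upgrade directly.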
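To compare the two kernel lines at a glance, the version column can be pulled out of the listing. This is a small sketch, assuming the three-column layout shown above (real yum output may wrap long lines); `package_version` is a hypothetical helper and the sample lines are copied from the listing.

```shell
#!/bin/bash
# Print the version column for a package name, reading
# `yum list available` style output from stdin.
package_version() {
    local pkg="$1"
    awk -v pkg="$pkg" '$1 == pkg { print $2 }'
}

# Sample lines copied from the listing above
listing='kernel-lt.x86_64 5.4.230-1.el7.elrepo elrepo-kernel
kernel-ml.x86_64 6.1.8-1.el7.elrepo elrepo-kernel'

echo "longterm: $(printf '%s\n' "$listing" | package_version kernel-lt.x86_64)"
echo "mainline: $(printf '%s\n' "$listing" | package_version kernel-ml.x86_64)"
```

On a live system the function would read from the `yum --disablerepo="*" --enablerepo="elrepo-kernel" list available` command instead of the sample string.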
# Long-term support kernel
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$yum -y --enablerepo=elrepo-kernel install kernel-lt.x86_64
List the kernels available on the system and set the boot entry:
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$sudo awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
0 : CentOS Linux (5.4.230-1.el7.elrepo.x86_64) 7 (Core)
1 : CentOS Linux 7 Rescue e93ae3f6cb354f3ba509eeb73568087e (3.10.0-1160.83.1.el7.x86_64)
2 : CentOS Linux (3.10.0-1160.83.1.el7.x86_64) 7 (Core)
3 : CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
4 : CentOS Linux (0-rescue-80c608ceab5342779ba1adc2ac29c213) 7 (Core)
Set the kernel to boot by default:
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$grub2-set-default 0
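Index 0 works here because the newly installed kernel happens to be listed first. As a hedged alternative, the index can be looked up by kernel version using the same awk approach as the listing command above; `menuentry_index` is a made-up helper name, and the sample file below only imitates the `menuentry '...'` lines of a real grub.cfg.

```shell
#!/bin/bash
# Print the zero-based index of the first top-level menuentry whose
# title contains the given string; exit non-zero if none matches.
menuentry_index() {
    local needle="$1" cfg="$2"
    awk -F\' -v needle="$needle" '
        BEGIN { i = 0 }
        $1 == "menuentry " {
            if (index($2, needle)) { print i; found = 1; exit }
            i++
        }
        END { if (!found) exit 1 }
    ' "$cfg"
}

# Illustrative sample standing in for /etc/grub2.cfg
cfg="$(mktemp)"
trap 'rm -f "$cfg"' EXIT
cat > "$cfg" <<'EOF'
menuentry 'CentOS Linux (5.4.230-1.el7.elrepo.x86_64) 7 (Core)' --class centos {
}
menuentry 'CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)' --class centos {
}
EOF

echo "5.4.230 is entry $(menuentry_index "5.4.230" "$cfg")"
echo "3.10.0-693 is entry $(menuentry_index "3.10.0-693" "$cfg")"
```

On a node, something like `grub2-set-default "$(menuentry_index 5.4.230 /etc/grub2.cfg)"` would then avoid hard-coding the position. Note that `grub2-set-default` only takes effect when `GRUB_DEFAULT=saved` is set in /etc/default/grub, which I believe is the CentOS 7 default but is worth checking.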
Regenerate the grub configuration file:
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.4.230-1.el7.elrepo.x86_64
Found initrd image: /boot/initramfs-5.4.230-1.el7.elrepo.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-1160.83.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1160.83.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-693.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-80c608ceab5342779ba1adc2ac29c213
Found initrd image: /boot/initramfs-0-rescue-80c608ceab5342779ba1adc2ac29c213.img
Found linux image: /boot/vmlinuz-0-rescue-e93ae3f6cb354f3ba509eeb73568087e
Found initrd image: /boot/initramfs-0-rescue-e93ae3f6cb354f3ba509eeb73568087e.img
done
Reboot the system and verify:
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$reboot
Connection to 192.168.26.100 closed by remote host.
Connection to 192.168.26.100 closed.
....
┌──[root@vms100.liruilongs.github.io]-[~]
└─$hostnamectl
Static hostname: vms100.liruilongs.github.io
Icon name: computer-vm
Chassis: vm
Machine ID: e93ae3f6cb354f3ba509eeb73568087e
Boot ID: a1150b6d97dc4afbb81dae58f131a487
Virtualization: vmware
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
Architecture: x86-64
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
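Once it is confirmed to be working, run some simple tests against the cluster, wait half an hour, and then upgrade the rest in batch.
Write the upgrade script: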
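#!/bin/bash
#@File : update_kernel
#@Time : 2023/02/01 23:58:23
#@Author : Li Ruilong
#@Version : 1.0
#@Desc : Batch kernel upgrade script for CentOS 7
#@Contact : liruilonger@gmail.com
# Install the ELRepo repository and import its public key
yum -y install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# Install the long-term support kernel from the elrepo-kernel repository
yum -y --enablerepo=elrepo-kernel install kernel-lt.x86_64
# Make menu entry 0 (the newly installed kernel) the default, then regenerate grub.cfg
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
# Reboot into the new kernel
reboot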
Copy the script to the nodes to be upgraded:
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible ansible_node -m copy -a "src=./update_kernel/update_kernel.sh dest=/tmp/" -i host.yaml
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible ansible_node -m shell -a "cat /tmp/update_kernel.sh" -i host.yaml
Run the upgrade script:
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible ansible_node -m shell -a "/usr/bin/bash /tmp/update_kernel.sh" -i host.yaml -f 7 -vvv
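The command above upgrades and reboots every node at once, which is fine in a lab. Where workloads matter, one option is to go node by node. The sketch below only prints a per-node plan rather than executing anything; `print_upgrade_plan` is a hypothetical helper, and the pairing of node name and inventory address (vms103 with 192.168.26.103) is an assumption based on the host names in this environment.

```shell
#!/bin/bash
# Print the commands for upgrading one node: drain it, run the upgrade
# script on that host only, confirm the kernel after reboot, then uncordon.
# $1 = Kubernetes node name, $2 = Ansible inventory host (assumed mapping)
print_upgrade_plan() {
    local node="$1" host="$2"
    echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
    echo "ansible ansible_node -i host.yaml --limit $host -m shell -a '/usr/bin/bash /tmp/update_kernel.sh'"
    echo "ansible ansible_node -i host.yaml --limit $host -m shell -a 'uname -r'  # once the node is back"
    echo "kubectl uncordon $node"
}

print_upgrade_plan vms103.liruilongs.github.io 192.168.26.103
```

Printing the plan keeps the operator in the loop between steps, since the reboot drops the Ansible connection and the node needs time to rejoin before it is uncordoned.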
After the upgrade completes, check the kernel version on each node to confirm:
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible ansible_node -m shell -a 'hostnamectl | grep Kernel' -i host.yaml
192.168.26.106 | CHANGED | rc=0 >>
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
192.168.26.105 | CHANGED | rc=0 >>
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
192.168.26.102 | CHANGED | rc=0 >>
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
192.168.26.103 | CHANGED | rc=0 >>
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
192.168.26.101 | CHANGED | rc=0 >>
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
192.168.26.107 | CHANGED | rc=0 >>
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
192.168.26.108 | CHANGED | rc=0 >>
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
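With more nodes, scanning this output by eye gets tedious. A minimal sketch that reads the same Ansible output and reports only hosts whose kernel differs from the expected release; `check_kernels` is a made-up name, and the second host in the sample is deliberately altered to show a mismatch, it is not real output.

```shell
#!/bin/bash
# Read `ansible ... -a 'hostnamectl | grep Kernel'` output on stdin and
# print every host whose kernel release differs from the expected one.
# Exit status is 0 only when all hosts match.
check_kernels() {
    local expected="$1"
    awk -v expected="$expected" '
        / \| / { host = $1; next }     # header line, e.g. "192.168.26.106 | CHANGED | rc=0 >>"
        $1 == "Kernel:" {
            if ($3 != expected) { print host ": " $3; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Illustrative input: the second host is a made-up, not-yet-upgraded node
sample='192.168.26.106 | CHANGED | rc=0 >>
Kernel: Linux 5.4.230-1.el7.elrepo.x86_64
192.168.26.105 | CHANGED | rc=0 >>
Kernel: Linux 3.10.0-693.el7.x86_64'

printf '%s\n' "$sample" | check_kernels "5.4.230-1.el7.elrepo.x86_64" \
    || echo "some nodes are not on the expected kernel"
```

The cluster's own view is also available: `kubectl get nodes -o wide` includes a KERNEL-VERSION column.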
Check the cluster status to confirm:
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$kubectl get nodes
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 6d5h v1.25.1
vms101.liruilongs.github.io Ready control-plane 6d5h v1.25.1
vms102.liruilongs.github.io Ready control-plane 6d5h v1.25.1
vms103.liruilongs.github.io Ready <none> 6d4h v1.25.1
vms105.liruilongs.github.io Ready <none> 6d4h v1.25.1
vms106.liruilongs.github.io Ready <none> 6d4h v1.25.1
vms107.liruilongs.github.io Ready <none> 6d4h v1.25.1
vms108.liruilongs.github.io Ready <none> 6d4h v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$
Run the original tool's check again:
┌──[root@vms100.liruilongs.github.io]-[~/ansible/pixie]
└─$px deploy --check_only
Pixie CLI
Running Cluster Checks:
✔ Kernel version > 4.14.0
✔ Cluster type is supported
✔ K8s version > 1.16.0
✔ Kubectl > 1.10.0 is present
✔ User can create namespace
INFO[0002] All Required Checks Passed!
┌──[root@vms100.liruilongs.github.io]-[~/ansible/pixie]
└─$
Author: 山河已无恙