# Building a Highly Available Kubernetes Cluster from Scratch: A Hands-On Guide to Automated Deployment with Ansible

## Why Automate Kubernetes Cluster Deployment?

In the cloud-native era, Kubernetes has become the de facto standard for container orchestration. Yet manually deploying a highly available Kubernetes cluster remains a complex, error-prone task. Imagine repeating dozens of steps across multiple servers: tuning system parameters, installing dependencies, deploying components. That is not only time-consuming, it makes environment consistency nearly impossible to guarantee.

This is exactly where automation tools like Ansible shine. With Ansible we can codify the entire deployment process, enabling one-command deployment, version control, and repeatable runs. Even better, when we later need to scale the cluster or rebuild an environment, automation saves enormous amounts of time and effort.

## 1. Environment Preparation and Ansible Basics

### 1.1 Infrastructure Planning

Before starting, we need a clear cluster architecture. A typical highly available Kubernetes cluster contains:

- 3 master nodes running the control-plane components (API Server, Controller Manager, Scheduler, etc.)
- N worker nodes running business workloads
- A load-balancing layer using Haproxy + Keepalived to make the API Server highly available
- A network plugin (Calico, Flannel, etc.) providing Pod-to-Pod communication

Here is an example host inventory:

| Hostname | IP Address | Role | Notes |
| --- | --- | --- | --- |
| master01 | 192.168.1.1 | Master + LB | Also runs Haproxy |
| master02 | 192.168.1.2 | Master + LB | Also runs Haproxy |
| master03 | 192.168.1.3 | Master + LB | Also runs Haproxy |
| worker01 | 192.168.1.4 | Worker | Runs business Pods |
| worker02 | 192.168.1.5 | Worker | Runs business Pods |
| vip | 192.168.1.100 | Virtual IP | Managed by Keepalived |

### 1.2 Setting Up Ansible

First, install Ansible on the control node (your local workstation or one of the master nodes):

```bash
# On Ubuntu/Debian
sudo apt update
sudo apt install -y ansible

# On CentOS/RHEL
sudo yum install -y epel-release
sudo yum install -y ansible
```

Create the Ansible project directory structure:

```
k8s-cluster/
├── inventories/
│   ├── production/
│   │   ├── group_vars/
│   │   ├── host_vars/
│   │   └── hosts
│   └── staging/
├── roles/
│   ├── common/
│   ├── docker/
│   ├── haproxy/
│   ├── keepalived/
│   ├── kubernetes/
│   └── calico/
└── playbooks/
    ├── site.yml
    ├── master.yml
    └── worker.yml
```

Configure the `inventories/production/hosts` file:

```ini
[masters]
master01 ansible_host=192.168.1.1
master02 ansible_host=192.168.1.2
master03 ansible_host=192.168.1.3

[workers]
worker01 ansible_host=192.168.1.4
worker02 ansible_host=192.168.1.5

[load_balancers:children]
masters

[kube_cluster:children]
masters
workers
```
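Before writing any playbooks, it is worth confirming that Ansible can actually reach every node. A minimal `ansible.cfg` at the project root plus an ad-hoc ping will do; the file below is a sketch of an assumed setup, not part of the original project layout:

```ini
# k8s-cluster/ansible.cfg — assumed minimal configuration (not in the original tree)
[defaults]
inventory = inventories/production/hosts
host_key_checking = False
forks = 10
```

With this in place, `ansible kube_cluster -m ping` should return `pong` from all five nodes before you proceed.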
## 2. Automating Base System Configuration

### 2.1 Common OS Configuration

Create `roles/common/tasks/main.yml` with the base configuration every node needs:

```yaml
- name: Disable SELinux
  selinux:
    state: disabled

- name: Disable swap
  shell: |
    swapoff -a
    sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

- name: Configure sysctl parameters
  sysctl:
    name: "{{ item.key }}"
    value: "{{ item.value }}"
    state: present
    reload: yes
  with_items:
    - { key: net.bridge.bridge-nf-call-iptables, value: 1 }
    - { key: net.ipv4.ip_forward, value: 1 }
    - { key: vm.swappiness, value: 0 }

- name: Install base packages
  yum:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
      - conntrack
      - ipvsadm
      - ipset
      - iptables
      - curl
      - sysstat
      - libseccomp
```

### 2.2 Loading Kernel Modules

To support Kubernetes' IPVS mode, we need to load the required kernel modules. Create `roles/common/tasks/ipvs.yml`:

```yaml
- name: Ensure ipvs modules are loaded
  modprobe:
    name: "{{ item }}"
    state: present
  with_items:
    - ip_vs
    - ip_vs_rr
    - ip_vs_wrr
    - ip_vs_sh
    - nf_conntrack_ipv4

- name: Persist ipvs modules
  copy:
    content: |
      #!/bin/bash
      modprobe -- ip_vs
      modprobe -- ip_vs_rr
      modprobe -- ip_vs_wrr
      modprobe -- ip_vs_sh
      modprobe -- nf_conntrack_ipv4
    dest: /etc/sysconfig/modules/ipvs.modules
    mode: 0755
```

## 3. Installing and Configuring the Container Runtime

### 3.1 Installing Docker

Create `roles/docker/tasks/main.yml`:

```yaml
- name: Add Docker repository
  yum_repository:
    name: docker-ce
    description: Docker CE Repository
    baseurl: https://download.docker.com/linux/centos/$releasever/$basearch/stable
    gpgcheck: yes
    gpgkey: https://download.docker.com/linux/centos/gpg
    enabled: yes

- name: Install Docker
  yum:
    name: docker-ce-18.09.7
    state: present

- name: Configure Docker daemon
  copy:
    content: |
      {
        "exec-opts": ["native.cgroupdriver=systemd"],
        "log-driver": "json-file",
        "log-opts": { "max-size": "100m" }
      }
    dest: /etc/docker/daemon.json

- name: Start and enable Docker
  service:
    name: docker
    state: started
    enabled: yes
```

Note: starting with Kubernetes 1.20, Docker support is being phased out; you can choose containerd as the container runtime instead. The configuration is similar, but some parameters need to be adjusted.
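If you do opt for containerd, the role looks much like the Docker one. The following is only a sketch, under the assumption of a CentOS host with the Docker CE repository already configured (the `containerd.io` package ships from that repo); the role path is hypothetical:

```yaml
# Hypothetical roles/containerd/tasks/main.yml — a sketch, not the original role
- name: Install containerd
  yum:
    name: containerd.io
    state: present

- name: Generate default containerd config
  shell: containerd config default > /etc/containerd/config.toml
  args:
    creates: /etc/containerd/config.toml

- name: Switch to the systemd cgroup driver (matches the kubelet setting)
  replace:
    path: /etc/containerd/config.toml
    regexp: 'SystemdCgroup = false'
    replace: 'SystemdCgroup = true'

- name: Start and enable containerd
  service:
    name: containerd
    state: restarted
    enabled: yes
```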
## 4. Deploying the Highly Available Load Balancer

### 4.1 Configuring Haproxy

Create `roles/haproxy/tasks/main.yml`:

```yaml
- name: Install Haproxy
  yum:
    name: haproxy
    state: present

- name: Configure Haproxy
  template:
    src: haproxy.cfg.j2
    dest: /etc/haproxy/haproxy.cfg

- name: Start Haproxy
  service:
    name: haproxy
    state: restarted
    enabled: yes
```

The corresponding template, `roles/haproxy/templates/haproxy.cfg.j2`:

```
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

defaults
    log global
    mode tcp
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend k8s-api
    bind *:6443
    default_backend k8s-api

backend k8s-api
    balance roundrobin
    option tcp-check
{% for host in groups['masters'] %}
    server {{ hostvars[host].ansible_hostname }} {{ hostvars[host].ansible_host }}:6443 check
{% endfor %}
```

### 4.2 Configuring Keepalived

Create `roles/keepalived/tasks/main.yml`:

```yaml
- name: Install Keepalived
  yum:
    name: keepalived
    state: present

- name: Configure Keepalived
  template:
    src: keepalived.conf.j2
    dest: /etc/keepalived/keepalived.conf

- name: Start Keepalived
  service:
    name: keepalived
    state: restarted
    enabled: yes
```

The template, `roles/keepalived/templates/keepalived.conf.j2`:

```
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface {{ ansible_default_ipv4.interface }}
    state {{ 'MASTER' if inventory_hostname == 'master01' else 'BACKUP' }}
    virtual_router_id 51
    priority {{ 100 if inventory_hostname == 'master01' else (90 if inventory_hostname == 'master02' else 80) }}
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 42
    }
    virtual_ipaddress {
        {{ k8s_vip }}
    }
    track_script {
        chk_haproxy
    }
}
```

Define the variable in `group_vars/all.yml`:

```yaml
k8s_vip: 192.168.1.100
```
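Before moving on to the control plane, it helps to confirm the VIP actually answers. A small assumed verification play (not part of the original roles) can gate the rest of the deployment; even before `kubeadm init`, Haproxy already listens on 6443, so a TCP connect to the VIP proves VRRP and Haproxy are working:

```yaml
# Assumed check: the Keepalived VIP must accept TCP connections on the API port
- hosts: master01
  become: yes
  tasks:
    - name: Wait for the VIP to accept connections on 6443
      wait_for:
        host: "{{ k8s_vip }}"
        port: 6443
        timeout: 30
```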
## 5. Deploying the Kubernetes Control Plane

### 5.1 Installing the Kubernetes Components

Create `roles/kubernetes/tasks/main.yml`:

```yaml
- name: Add Kubernetes repository
  yum_repository:
    name: kubernetes
    description: Kubernetes Repository
    baseurl: https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
    gpgcheck: no
    enabled: yes

- name: Install kubeadm, kubelet and kubectl
  yum:
    name: "{{ packages }}"
    state: present
    disable_gpg_check: yes
  vars:
    packages:
      - kubelet-1.19.0
      - kubeadm-1.19.0
      - kubectl-1.19.0

- name: Enable kubelet
  service:
    name: kubelet
    enabled: yes
```

### 5.2 Initializing the First Master Node

Create `playbooks/master.yml`:

```yaml
- hosts: master01
  become: yes
  roles:
    - common
    - docker
    - kubernetes
  tasks:
    - name: Initialize Kubernetes cluster
      command: kubeadm init --config=/tmp/kubeadm-config.yaml
      args:
        creates: /etc/kubernetes/admin.conf
      register: kubeadm_init

    - name: Copy admin config to local
      fetch:
        src: /etc/kubernetes/admin.conf
        dest: /tmp/admin.conf
        flat: yes

    - name: Deploy Calico network
      command: kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
      when: kubeadm_init is changed
```

### 5.3 Joining the Other Master Nodes

Once the first master has been initialized, we can obtain the join command:

```bash
kubeadm token create --print-join-command
```

Register it as a fact on master01 so that later plays can reference it, then join the remaining masters:

```yaml
- hosts: master01
  become: yes
  tasks:
    - name: Get join command
      command: kubeadm token create --print-join-command
      register: join_cmd

    - name: Save join command
      set_fact:
        join_command: "{{ join_cmd.stdout }}"

- name: Join other masters
  hosts: masters[1:]
  become: yes
  tasks:
    - name: Join master to cluster
      command: "{{ hostvars['master01'].join_command }} --control-plane"
      when: inventory_hostname != 'master01'
```

## 6. Network Plugin and Worker Node Configuration

### 6.1 Deploying the Calico Network

Create `roles/calico/tasks/main.yml`:

```yaml
- name: Download Calico manifest
  get_url:
    url: https://docs.projectcalico.org/manifests/calico.yaml
    dest: /tmp/calico.yaml

- name: Apply Calico network
  command: kubectl apply -f /tmp/calico.yaml
  when: inventory_hostname == 'master01'
```

### 6.2 Joining Worker Nodes to the Cluster

Create `playbooks/worker.yml`:

```yaml
- hosts: workers
  become: yes
  roles:
    - common
    - docker
    - kubernetes
  tasks:
    - name: Join worker to cluster
      command: "{{ hostvars['master01'].join_command }}"
```
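The initialization playbook above references `/tmp/kubeadm-config.yaml` without showing it. Here is a minimal sketch of what that file could contain for this topology; the pod subnet is an assumption (it must match the CIDR in your Calico manifest and must not overlap the 192.168.1.x node network), and `controlPlaneEndpoint` must point at the Keepalived VIP so all masters share one API endpoint:

```yaml
# Assumed /tmp/kubeadm-config.yaml — adjust values to your environment
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
controlPlaneEndpoint: "192.168.1.100:6443"   # the Keepalived VIP
networking:
  podSubnet: "10.244.0.0/16"                 # assumed; must match the CNI config
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs                                   # matches the ipvs modules loaded earlier
```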
## 7. Verifying Cluster Status

After all nodes are deployed, verify the cluster:

```bash
kubectl get nodes
kubectl get pods -n kube-system
kubectl get svc
```

A healthy cluster should show the following core component states:

| Component | Expected State | Replicas |
| --- | --- | --- |
| kube-apiserver | Running | 3 |
| kube-controller-manager | Running | 3 |
| kube-scheduler | Running | 3 |
| etcd | Running | 3 |
| calico-node | Running | N+3 |
| coredns | Running | 2 |
| haproxy | Running | 3 |
| keepalived | Running | 3 |

## 8. Advanced Configuration and Optimization

### 8.1 Automatic Certificate Renewal

Kubernetes cluster certificates are valid for one year by default. We can enable automatic rotation:

```yaml
- name: Enable kubelet certificate rotation
  lineinfile:
    path: /var/lib/kubelet/config.yaml
    regexp: '^rotateCertificates:'
    line: 'rotateCertificates: true'
    state: present

- name: Restart kubelet
  service:
    name: kubelet
    state: restarted
```

### 8.2 Cluster Backup and Restore

Use etcdctl to back up the cluster state:

```bash
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save snapshot.db
```

### 8.3 Security Hardening Recommendations

- Enable Pod security policies
- Configure network policies to restrict Pod-to-Pod traffic
- Use RBAC to strictly control access
- Rotate certificates and keys regularly
- Enable audit logging

## 9. Troubleshooting Common Problems

### 9.1 Nodes in NotReady State

Possible causes and fixes:

- Network plugin not installed correctly: check the Calico/kube-proxy logs
- Container runtime problems: verify Docker/containerd status
- kubelet misconfiguration: check /var/log/messages and the kubelet logs

### 9.2 Pods Cannot Be Scheduled

Where to look:

```bash
kubectl describe pod <pod-name>
kubectl get events --sort-by='.metadata.creationTimestamp'
```

### 9.3 API Server Unavailable

Steps to investigate:

1. Verify the Haproxy status
2. Check whether Keepalived still holds the VIP
3. Inspect the API Server logs on each master node

## 10. Scaling and Upgrade Strategy

### 10.1 Scaling the Cluster

Add a new worker node:

```yaml
- name: Add new worker
  hosts: new_worker
  become: yes
  roles:
    - common
    - docker
    - kubernetes
  tasks:
    - name: Join new worker
      command: "{{ hostvars['master01'].join_command }}"
```

### 10.2 Upgrading the Cluster

Kubernetes version upgrade steps:

1. Upgrade kubeadm
2. Drain the node
3. Upgrade the control plane
4. Upgrade kubelet and kubectl
5. Upgrade the worker nodes

The corresponding Ansible tasks:

```yaml
- name: Upgrade kubeadm
  yum:
    name: kubeadm-{{ target_version }}
    state: present

- name: Drain node
  command: kubectl drain {{ inventory_hostname }} --ignore-daemonsets

- name: Upgrade control plane
  command: kubeadm upgrade apply v{{ target_version }}

- name: Upgrade kubelet and kubectl
  yum:
    name: "{{ item }}"
    state: present
  with_items:
    - kubelet-{{ target_version }}
    - kubectl-{{ target_version }}

- name: Uncordon node
  command: kubectl uncordon {{ inventory_hostname }}
```
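The upgrade tasks above should never run on all nodes at once. A sketch of an assumed play wrapper that upgrades one node at a time (`target_version` and the delegation target are illustrative, not from the original):

```yaml
# Assumed wrapper for the upgrade tasks — serial: 1 keeps the cluster available
- hosts: kube_cluster
  become: yes
  serial: 1                        # upgrade one node at a time
  vars:
    target_version: 1.19.2         # hypothetical example version
  tasks:
    - name: Drain node
      command: kubectl drain {{ inventory_hostname }} --ignore-daemonsets
      delegate_to: master01        # kubectl runs where admin.conf lives
    # ... the upgrade tasks from section 10.2 go here ...
    - name: Uncordon node
      command: kubectl uncordon {{ inventory_hostname }}
      delegate_to: master01
```

Running with `serial: 1` means a failed upgrade stops after a single node, leaving the other masters serving the API through the VIP.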