别再手动敲命令了!用RKE一键部署Kubernetes高可用集群(附完整YAML配置)
告别手动部署RKE自动化构建Kubernetes高可用集群实战指南为什么选择RKE进行Kubernetes集群部署在云原生技术快速发展的今天Kubernetes已经成为容器编排的事实标准。然而传统的手动部署Kubernetes集群方式存在诸多痛点配置复杂需要手动配置etcd、kube-apiserver、kube-controller-manager等多个组件易出错人工操作容易遗漏关键配置项导致集群不稳定维护困难升级和扩展集群时需要重复大量手动操作一致性差不同环境部署的集群配置难以保持一致RKERancher Kubernetes Engine作为一款轻量级的Kubernetes安装工具通过声明式配置文件解决了这些问题。它能够快速部署生产级Kubernetes集群确保集群配置的一致性和可重复性简化集群的维护和升级流程提供灵活的插件系统扩展集群功能RKE核心架构解析RKE工作原理RKE采用基础设施即代码的理念通过YAML配置文件定义整个Kubernetes集群的拓扑结构和组件配置。其核心工作流程包括节点发现与验证通过SSH连接到目标节点验证Docker环境组件部署根据配置部署Kubernetes各组件容器证书生成自动创建集群所需的TLS证书网络配置安装选择的CNI插件默认Canal附加组件按需部署Ingress Controller、Metrics Server等关键配置文件解析RKE的核心是cluster.yml配置文件主要包含以下关键部分nodes: - address: 192.168.1.101 # 节点IP user: rke-user # SSH用户名 role: # 节点角色 - controlplane - etcd - worker services: etcd: snapshot: true # 启用etcd快照 retention: 7d # 快照保留7天 kube-api: service_cluster_ip_range: 10.43.0.0/16 kube-controller: cluster_cidr: 10.42.0.0/16 kubelet: fail_swap_on: false # 不强制禁用swap network: plugin: canal # 网络插件选择 options: canal_flannel_backend_type: vxlan实战使用RKE部署生产级Kubernetes集群环境准备硬件要求节点类型CPU内存磁盘数量Control Plane2核4GB50GB3Worker4核8GB100GB2软件要求操作系统CentOS 7.7/Ubuntu 18.04Docker18.09.x/19.03.x/20.10.xSSH访问所有节点间SSH互通时间同步确保所有节点时间一致基础环境配置禁用Swap所有节点执行swapoff -a sed -i / swap / s/^\(.*\)$/#\1/g /etc/fstab内核参数调优cat /etc/sysctl.d/k8s.conf EOF net.bridge.bridge-nf-call-ip6tables 1 net.bridge.bridge-nf-call-iptables 1 net.ipv4.ip_forward 1 EOF sysctl --system加载内核模块modprobe br_netfilter modprobe ip_vs modprobe ip_vs_rr modprobe ip_vs_wrr modprobe ip_vs_sh modprobe nf_conntrack_ipv4RKE安装与配置下载RKE二进制文件wget https://github.com/rancher/rke/releases/download/v1.3.2/rke_linux-amd64 chmod x rke_linux-amd64 mv rke_linux-amd64 /usr/local/bin/rke创建cluster.yml配置文件nodes: - address: 192.168.1.101 user: rke-user role: [controlplane, etcd, worker] ssh_key_path: ~/.ssh/id_rsa - address: 192.168.1.102 user: rke-user role: [controlplane, etcd, worker] - address: 192.168.1.103 user: rke-user role: [controlplane, etcd, worker] services: etcd: snapshot: true retention: 168h # 7天 creation: 24h # 每天创建快照 kube-api: service_cluster_ip_range: 10.43.0.0/16 extra_args: audit-log-path: /var/log/kube-audit/audit.log audit-log-maxage: 30 audit-log-maxbackup: 10 audit-log-maxsize: 100 kube-controller: cluster_cidr: 10.42.0.0/16 service_cluster_ip_range: 10.43.0.0/16 kubelet: fail_swap_on: false extra_args: max-pods: 250部署集群rke up --config cluster.yml成功部署后会生成kube_config_cluster.ymlkubectl配置文件cluster.rkestate集群状态文件集群验证检查节点状态kubectl --kubeconfig kube_config_cluster.yml get nodes预期输出NAME STATUS ROLES AGE VERSION 192.168.1.101 Ready controlplane,etcd 5m v1.20.6 192.168.1.102 Ready controlplane,etcd 5m v1.20.6 192.168.1.103 Ready controlplane,etcd 5m v1.20.6检查系统Pod状态kubectl --kubeconfig kube_config_cluster.yml get pods -A关键Pod应全部为Running状态corednscanal/flannelingress-nginxmetrics-server高级配置与优化网络插件选择与调优RKE支持多种CNI网络插件默认使用CanalFlannel Calico策略network: plugin: canal options: canal_flannel_backend_type: vxlan # 或 host-gw canal_iface: eth1 # 指定网络接口 mtu: 1450 # 根据网络环境调整不同后端类型比较类型性能跨子网配置复杂度VXLAN中支持低Host-GW高不支持中IPsec低支持高存储配置配置本地存储类apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: local-storage provisioner: kubernetes.io/no-provisioner volumeBindingMode: WaitForFirstConsumer配置NFS存储类helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \ --set nfs.server192.168.1.200 \ --set nfs.path/data/nfs监控与日志部署Prometheus Stackhelm repo add prometheus-community https://prometheus-community.github.io/helm-charts helm install kube-prometheus prometheus-community/kube-prometheus-stack配置集群日志收集# fluent-bit配置示例 config: inputs: | [INPUT] Name tail Path /var/log/containers/*.log Parser docker Tag kube.* Mem_Buf_Limit 5MB Skip_Long_Lines On outputs: | [OUTPUT] Name es Match * Host elasticsearch Port 9200 Logstash_Format On Replace_Dots On Retry_Limit False集群运维最佳实践备份与恢复定期备份etcd数据rke etcd snapshot-save --name pre-upgrade-snapshot \ --config cluster.yml从快照恢复集群rke etcd snapshot-restore --name pre-upgrade-snapshot \ --config cluster.yml升级策略检查可升级版本rke config --list-version --all执行滚动升级rke up --config cluster.yml \ --kubernetes-version v1.21.5升级路径建议当前版本可升级版本v1.18.xv1.19.xv1.19.xv1.20.xv1.20.xv1.21.x节点管理添加新节点编辑cluster.yml添加新节点配置执行rke up更新集群安全下线节点kubectl drain node-name --ignore-daemonsets --delete-emptydir-data常见问题排查部署失败排查检查RKE日志tail -f /var/log/rke.log常见错误及解决方案错误信息可能原因解决方案Failed to connect to nodeSSH配置错误检查SSH密钥和防火墙设置Port already in use端口冲突检查端口占用情况Image pull failed镜像拉取失败配置镜像仓库或手动拉取镜像性能优化建议内核参数调优echo vm.swappiness 0 /etc/sysctl.conf echo net.ipv4.tcp_tw_reuse 1 /etc/sysctl.conf sysctl -pKubelet资源配置kubelet: extra_args: kube-reserved: cpu500m,memory1Gi system-reserved: cpu500m,memory1Gi eviction-hard: memory.available500Mi,nodefs.available10%集成Rancher管理平台Rancher安装准备准备证书kubectl -n cattle-system create secret tls tls-rancher-ingress \ --certtls.crt \ --keytls.key添加Helm仓库helm repo add rancher-stable https://releases.rancher.com/server-charts/stable安装Rancherhelm install rancher rancher-stable/rancher \ --namespace cattle-system \ --set hostnamerancher.example.com \ --set ingress.tls.sourcesecret \ --set replicas3Rancher高可用架构推荐架构----------------- | Load Balancer | ---------------- | -------------------------------- | | | ----------- ----------- ----------- | Rancher | | Rancher | | Rancher | | Server 1 | | Server 2 | | Server 3 | ------------ ------------ ------------安全加固指南集群安全配置启用Pod安全策略services: kube-api: pod_security_policy: true secrets_encryption_config: enabled: true配置网络策略apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: default-deny-all spec: podSelector: {} policyTypes: - Ingress - Egress认证与授权集成LDAP/ADapiVersion: v1 kind: ConfigMap metadata: name: rancher-config namespace: cattle-system data: AD_URL: ldap://ad.example.com AD_DOMAIN: example.com配置RBACapiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: dev-role rules: - apiGroups: [] resources: [pods, pods/log] verbs: [get, list, watch]成本优化策略资源利用率提升配置HPAapiVersion: autoscaling/v2beta2 kind: HorizontalPodAutoscaler metadata: name: myapp-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: myapp minReplicas: 2 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 50使用Spot实例apiVersion: apps/v1 kind: Deployment metadata: name: spot-worker spec: template: spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node-role.kubernetes.io/spot operator: Exists tolerations: - key: spot operator: Exists effect: NoSchedule未来演进方向混合云管理graph TD A[Rancher中央管理平台] -- B[本地数据中心集群] A -- C[公有云集群] A -- D[边缘计算节点] A -- E[异构基础设施]GitOps实践Argo CD集成helm repo add argo https://argoproj.github.io/argo-helm helm install argocd argo/argo-cd \ --namespace argocd \ --set server.service.typeLoadBalancerFlux CD配置apiVersion: source.toolkit.fluxcd.io/v1beta1 kind: GitRepository metadata: name: myapp namespace: flux-system spec: interval: 1m0s url: https://github.com/myorg/myapp ref: branch: main性能基准测试数据3节点集群性能测试结果测试场景请求数平均延迟吞吐量100 Pods部署-12s20 Pods/s1000次API请求100045ms850 req/s网络吞吐量--5 Gbps存储IOPS-2ms15,000扩展阅读与资源推荐工具k9s终端Kubernetes管理工具LensKubernetes IDEOctantKubernetes可视化面板kube-benchCIS基准测试工具学习资源Kubernetes官方文档Rancher官方文档CNCF培训课程Kubernetes the Hard Way