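Cluster Autoscaler in Practice: Cluster Scaling from Nodes to Cost Optimization

Preface

Folks, let's skip the fancy theory. Today it's straight to the main course: a summary of my real-world experience using Cluster Autoscaler for automatic cluster scaling on the front line at a big tech company. As a hardcore engineer who writes front-end code by day and plays drums by night, I'm as strict about cost optimization as I am about keeping a beat.

Background

Recently our team's Kubernetes cluster had two problems: it could not scale up in time when resources ran short, and nodes sat idle during quiet periods. After a week of working with Cluster Autoscaler we had elastic scaling in place, costs were down 40%, and resource utilization was up 50%. Here is what we learned.

Cluster Autoscaler Basic Configuration

1. AWS Configuration

Problem: how to configure Cluster Autoscaler on AWS.

Solution: straight to the code.

    # Cluster Autoscaler Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
      labels:
        app: cluster-autoscaler
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: cluster-autoscaler
      template:
        metadata:
          labels:
            app: cluster-autoscaler
        spec:
          serviceAccountName: cluster-autoscaler
          containers:
          - name: cluster-autoscaler
            # k8s.gcr.io is frozen; registry.k8s.io is the current image registry
            image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
            command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            # Discover ASGs carrying both tags; replace "cluster-name" with your cluster's name
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster-name
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
            - --skip-nodes-with-local-storage=false
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --scale-down-utilization-threshold=0.5
            env:
            - name: AWS_REGION
              value: us-east-1
            volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          volumes:
          - name: ssl-certs
            hostPath:
              path: /etc/ssl/certs/ca-certificates.crt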
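The Deployment above runs under the `cluster-autoscaler` ServiceAccount, and the autoscaler needs AWS permissions to read and resize Auto Scaling Groups. A minimal sketch of one way to wire that up, assuming an EKS cluster with IAM Roles for Service Accounts (IRSA) enabled. The account ID and role name are hypothetical placeholders, and the role's IAM policy (not shown) would need at least `autoscaling:DescribeAutoScalingGroups`, `autoscaling:SetDesiredCapacity`, and `autoscaling:TerminateInstanceInAutoScalingGroup`:

```yaml
# Assumption: EKS with IRSA. The role ARN is a placeholder, not a real role.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/cluster-autoscaler-example
```

The ServiceAccount also needs in-cluster RBAC (a ClusterRole and binding for nodes, pods, and related objects), which is omitted here for brevity.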
2. Alibaba Cloud Configuration

Problem: how to configure Cluster Autoscaler on Alibaba Cloud.

Solution:

    # Cluster Autoscaler on Alibaba Cloud
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: cluster-autoscaler
      template:
        metadata:
          labels:
            app: cluster-autoscaler
        spec:
          serviceAccountName: cluster-autoscaler
          containers:
          - name: cluster-autoscaler
            image: registry.aliyuncs.com/acs/cluster-autoscaler:v1.27.0
            command:
            - ./cluster-autoscaler
            - --cloud-provider=alicloud
            # Tag-based discovery mirrors the AWS setup; confirm your alicloud provider build
            # supports it, or register groups explicitly with --nodes=<min>:<max>:<group-id>
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster-name
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --scale-down-utilization-threshold=0.5
            env:
            - name: ALICLOUD_REGION_ID
              value: cn-hangzhou

Node Group Configuration

1. Multi-Node-Group Strategy

Problem: how to configure node groups of different types.

Solution:

    # AWS Auto Scaling Group tags

    # Compute-optimized node group
    aws autoscaling create-or-update-tags \
      --tags ResourceId=asg-compute-optimized,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true
    aws autoscaling create-or-update-tags \
      --tags ResourceId=asg-compute-optimized,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/node-type,Value=compute-optimized,PropagateAtLaunch=true

    # Memory-optimized node group
    aws autoscaling create-or-update-tags \
      --tags ResourceId=asg-memory-optimized,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true
    aws autoscaling create-or-update-tags \
      --tags ResourceId=asg-memory-optimized,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/node-type,Value=memory-optimized,PropagateAtLaunch=true
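Tagging the groups only pays off once workloads ask for them. Here is a sketch of a workload pinned to the compute-optimized group; the Deployment name, image, and resource requests are hypothetical. It also assumes the nodes themselves register with the `node-type` label (for example through the kubelet's `--node-labels` option), since the `node-template` tag only tells Cluster Autoscaler what a not-yet-existing node would look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-encoder                 # hypothetical workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-encoder
  template:
    metadata:
      labels:
        app: batch-encoder
    spec:
      # Only nodes labeled node-type=compute-optimized are eligible
      nodeSelector:
        node-type: compute-optimized
      containers:
      - name: worker
        image: example.com/batch-encoder:1.0   # placeholder image
        resources:
          requests:                   # the autoscaler sizes scale-ups from requests
            cpu: "2"
            memory: 2Gi
```

When these pods go Pending, the autoscaler can match the `nodeSelector` against the tagged label and grow the right group, even if that group currently has zero nodes.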
2. Node Template Labels

Problem: how to configure labels and taints for a node group.

Solution:

    # Node template label
    aws autoscaling create-or-update-tags \
      --tags ResourceId=asg-gpu,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/nvidia.com/gpu,Value=true,PropagateAtLaunch=true

    # Node template taint: dedicated=gpu:NoSchedule
    # (the tag value is <taint-value>:<effect>; taint values cannot contain "/")
    aws autoscaling create-or-update-tags \
      --tags ResourceId=asg-gpu,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/taint/dedicated,Value=gpu:NoSchedule,PropagateAtLaunch=true

    # Resource capacity: 4 GPUs per node
    aws autoscaling create-or-update-tags \
      --tags ResourceId=asg-gpu,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu,Value=4,PropagateAtLaunch=true

Scaling Strategy

1. Scale-Up Strategy

Problem: how to tune scale-up behavior.

Solution:

    # Cluster Autoscaler scale-up settings (fragment: only the relevant flags)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      template:
        spec:
          containers:
          - name: cluster-autoscaler
            command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster-name
            - --max-node-provision-time=15m   # give up on a node that has not joined after 15m
            - --max-nodes-total=100           # hard cap across all node groups
            - --cores-total=0:1000            # min:max CPU cores in the cluster
            - --memory-total=0:4000           # min:max memory in GiB
            # - --cloud-provider-gce-local-ssd-count=1   # GCE-only; check that your version accepts it before enabling
            - --balance-similar-node-groups
            - --expander=least-waste

2. Scale-Down Strategy

Problem: how to tune scale-down behavior.

Solution:

    # Cluster Autoscaler scale-down settings (fragment: only the relevant flags)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      template:
        spec:
          containers:
          - name: cluster-autoscaler
            command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster-name
            - --scale-down-enabled=true
            - --scale-down-delay-after-add=10m
            - --scale-down-delay-after-delete=10s
            - --scale-down-delay-after-failure=3m
            - --scale-down-unneeded-time=10m
            - --scale-down-unready-time=20m
            - --scale-down-utilization-threshold=0.5
            - --skip-nodes-with-system-pods=false
            - --skip-nodes-with-local-storage=false
            - --ignore-daemonsets-utilization=true

Cost Optimization
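Savings depend on scale-down actually happening, and Cluster Autoscaler only removes a node when every pod on it can be rescheduled without violating a PodDisruptionBudget. A minimal sketch of a budget that protects availability while still letting nodes drain; the name and the `app: web` label are hypothetical:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # hypothetical name
  namespace: default
spec:
  maxUnavailable: 1        # at most one matching pod may be evicted at a time
  selector:
    matchLabels:
      app: web             # hypothetical workload label
```

A budget that allows zero disruptions (for example `minAvailable` equal to the replica count) pins the nodes those pods run on, so they are never scaled down and the idle capacity keeps costing money.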
1. Spot Instance Configuration

Problem: how to use Spot instances to cut costs.

Solution:

    # Spot instance node groups (fragment: only the relevant flags)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      template:
        spec:
          containers:
          - name: cluster-autoscaler
            command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster-name
            # Check that your cloud provider implements the price expander;
            # if it does not, use the priority expander from the next section
            - --expander=price
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false

2. Priority Expander

Problem: how to choose a node group based on priority.

Solution:

    # Priority expander configuration
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-autoscaler-priority-expander
      namespace: kube-system
    data:
      # Higher number = higher priority, so Spot groups are tried first
      priorities: |-
        20:
          - .*spot.*
        10:
          - .*ondemand.*

Then enable the priority expander on the autoscaler:

    # Cluster Autoscaler using the priority expander
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      template:
        spec:
          containers:
          - name: cluster-autoscaler
            command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster-name
            - --expander=priority
            - --balance-similar-node-groups

Monitoring and Alerting
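Prometheus Operator finds scrape targets through Service objects, so the autoscaler needs a Service in front of its metrics endpoint. A minimal sketch, assuming the autoscaler listens on its default address `:8085` and its pods carry the `app: cluster-autoscaler` label from the Deployment earlier. The port name `http` is an assumption chosen to match a ServiceMonitor endpoint of the same name:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler   # the label a ServiceMonitor selector would match
spec:
  selector:
    app: cluster-autoscaler   # routes to the autoscaler pods
  ports:
  - name: http                # referenced by name from the ServiceMonitor endpoint
    port: 8085
    targetPort: 8085          # assumes the default --address=:8085
```

Without a Service carrying the matching label and port name, the ServiceMonitor selects nothing and the alerts never fire.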
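1. Prometheus Monitoring

Problem: how to monitor Cluster Autoscaler.

Solution:

    # ServiceMonitor
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: cluster-autoscaler
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: cluster-autoscaler
      namespaceSelector:
        matchNames:
        - kube-system
      endpoints:
      - port: http
        interval: 30s
        path: /metrics
    ---
    # Alert rules
    # Confirm the metric names against your autoscaler's /metrics output before relying on them
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: cluster-autoscaler-alerts
      namespace: monitoring
    spec:
      groups:
      - name: cluster-autoscaler
        rules:
        - alert: ClusterAutoscalerNotSafeToEvict
          expr: cluster_autoscaler_nodes_not_safe_to_evict_count > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Nodes not safe to evict"
            description: "There are {{ $value }} nodes that are not safe to evict"
        - alert: ClusterAutoscalerUnschedulablePods
          expr: cluster_autoscaler_unschedulable_pods_count > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Unschedulable pods detected"
            description: "There are {{ $value }} unschedulable pods"

Best Practices

Node group design:
- Split node groups by workload type
- Configure sensible node template labels
- Use taints to isolate special-purpose nodes

Scaling strategy:
- Set a sensible scale-down delay
- Configure the utilization threshold
- Account for the impact of system Pods

Cost control:
- Use Spot instances to cut costs
- Configure the priority expander
- Monitor node usage

Monitoring and alerting:
- Monitor scaling events
- Set up alerts for anomalies
- Analyze cost trends

Common Problems and Solutions

1. Scale-Up Does Not Happen

Problem: Cluster Autoscaler does not trigger a scale-up.

Solution:
- Check whether any Pods are in the Pending state
- Verify the node group configuration
- Check the Cluster Autoscaler logs

2. Scale-Down Does Not Happen

Problem: nodes are not scaled down as expected.

Solution:
- Check node utilization
- Look at how Pods are distributed across nodes
- Verify the scale-down settings

3. Scale-Up Is Too Slow

Problem: adding nodes takes too long.

Solution:
- Reduce image startup time
- Use pre-provisioned nodes
- Adjust the maximum node provision time

4. Costs Are Too High

Problem: cluster costs exceed expectations.

Solution:
- Use Spot instances
- Tune the scale-down strategy
- Configure the priority expander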
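For the slow scale-up case, one common reading of "pre-provisioned nodes" is the overprovisioning pattern: low-priority placeholder pods hold spare capacity, get preempted the moment real workloads need room, and then go Pending themselves, which triggers a scale-up in the background. A sketch under that assumption; the names, replica count, and request sizes are hypothetical and would need tuning for a real cluster:

```yaml
# Placeholder priority: lower than any regular workload (default priority is 0)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: Placeholder pods that any regular workload may preempt.
---
# Pause pods that reserve headroom; size the requests to the buffer you want
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
```

The trade-off is explicit: the buffer costs money while idle, so it works against the cost goals in item 4 and is worth sizing to the burst you actually see.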