1. Main body of the Prometheus configuration file

    # Global configuration for Prometheus, such as the scrape interval and the scrape timeout.
    global:
      # Scrape interval.
      [ scrape_interval: <duration> | default = 1m ]

      # Scrape timeout.
      [ scrape_timeout: <duration> | default = 10s ]

      # Interval at which rules are evaluated.
      [ evaluation_interval: <duration> | default = 1m ]

      # External label settings.
      external_labels:
        [ <labelname>: <labelvalue> ... ]

      # File to which PromQL queries are logged.
      # Reloading the configuration will reopen the file.
      [ query_log_file: <string> ]

    # Alerting rule files. Prometheus evaluates these rules and pushes the resulting alerts to Alertmanager.
    rule_files:
      [ - <filepath_glob> ... ]

    # Scrape configuration. All data collection in Prometheus is configured in this section.
    scrape_configs:
      [ - <scrape_config> ... ]

    # Alerting configuration. It mainly specifies the Alertmanager instances that Prometheus pushes alerts to.
    alerting:
      alert_relabel_configs:
        [ - <relabel_config> ... ]
      alertmanagers:
        [ - <alertmanager_config> ... ]

    # Write API endpoints of the remote storage backends.
    remote_write:
      [ - <remote_write> ... ]

    # Read API endpoints of the remote storage backends.
    remote_read:
      [ - <remote_read> ... ]

2. scrape_configs in detail

A scrape_config section specifies a set of targets and parameters. The targets are the endpoints to scrape, and the parameters describe how to scrape them. The format is as follows:

    # The job name assigned to scraped metrics by default.
    job_name: <job_name>

    # Scrape interval. Inherits the global value by default.
    [ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

    # Scrape timeout. Inherits the global value by default.
    [ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

    # Scrape path. The default is /metrics.
    [ metrics_path: <path> | default = /metrics ]

    # honor_labels controls how Prometheus handles conflicts between labels that are
    # already present in scraped data and labels that Prometheus would attach
    # server-side (job and instance labels, manually configured target
    # labels, and labels generated by service discovery implementations).
    #
    # If honor_labels is set to true, label conflicts are resolved by keeping label
    # values from the scraped data and ignoring the conflicting server-side labels.
    #
    # If honor_labels is set to false, label conflicts are resolved by renaming
    # conflicting labels in the scraped data to exported_<original-label> (for
    # example exported_instance, exported_job) and then attaching server-side
    # labels.
    #
    # Setting honor_labels to true is useful for use cases such as federation and
    # scraping the Pushgateway, where all labels specified in the target should be
    # preserved.
    #
    # Note that any globally configured external_labels are unaffected by this
    # setting. In communication with external systems, they are always applied only
    # when a time series does not have a given label yet and are ignored otherwise.
    [ honor_labels: <boolean> | default = false ]

    # honor_timestamps controls whether Prometheus respects the timestamps present
    # in scraped data.
    #
    # If honor_timestamps is set to true, the timestamps of the metrics exposed
    # by the target will be used.
    #
    # If honor_timestamps is set to false, the timestamps of the metrics exposed
    # by the target will be ignored.
    [ honor_timestamps: <boolean> | default = true ]

    # Protocol used for scraping: http or https.
    [ scheme: <scheme> | default = http ]

    # URL parameters.
    params:
      [ <string>: [<string>, ...] ]

    # Authentication information.
    basic_auth:
      [ username: <string> ]
      [ password: <secret> ]
      [ password_file: <string> ]

    # Bearer token used to authenticate the request that fetches the metrics.
    [ bearer_token: <secret> ]

    # File from which the bearer token is read, used to authenticate the request that fetches the metrics.
    [ bearer_token_file: /path/to/bearer/token/file ]

    # TLS settings required when fetching the metrics.
    tls_config:
      [ <tls_config> ]

    # Optional proxy URL.
    [ proxy_url: <string> ]

    # List of Azure service discovery configurations.
    azure_sd_configs:
      [ - <azure_sd_config> ... ]

    # List of Consul service discovery configurations.
    consul_sd_configs:
      [ - <consul_sd_config> ... ]

    # List of DNS service discovery configurations.
    dns_sd_configs:
      [ - <dns_sd_config> ... ]

    # List of EC2 service discovery configurations.
    ec2_sd_configs:
      [ - <ec2_sd_config> ... ]

    # List of OpenStack service discovery configurations.
    openstack_sd_configs:
      [ - <openstack_sd_config> ... ]

    # List of file service discovery configurations.
    file_sd_configs:
      [ - <file_sd_config> ... ]

    # List of GCE service discovery configurations.
    gce_sd_configs:
      [ - <gce_sd_config> ... ]

    # List of Kubernetes service discovery configurations.
    kubernetes_sd_configs:
      [ - <kubernetes_sd_config> ... ]

    # List of Marathon service discovery configurations.
    marathon_sd_configs:
      [ - <marathon_sd_config> ... ]

    # List of AirBnB's Nerve service discovery configurations.
    nerve_sd_configs:
      [ - <nerve_sd_config> ... ]

    # List of Zookeeper Serverset service discovery configurations.
    serverset_sd_configs:
      [ - <serverset_sd_config> ... ]

    # List of Triton service discovery configurations.
    triton_sd_configs:
      [ - <triton_sd_config> ... ]

    # Statically configured targets for this job.
    static_configs:
      [ - <static_config> ... ]

    # Controls which targets are scraped and which labels they carry. Unnecessary labels can be removed here.
    relabel_configs:
      [ - <relabel_config> ... ]

    # Adds, edits, or modifies the label values or label format of scraped metrics.
    metric_relabel_configs:
      [ - <relabel_config> ... ]

    # Per-scrape limit on number of scraped samples that will be accepted.
    # If more than this number of samples are present after metric relabelling
    # the entire scrape will be treated as failed. 0 means no limit.
    [ sample_limit: <int> | default = 0 ]

Because my deployment runs on Kubernetes, I only care about service discovery based on kubernetes_sd_configs and static target definitions via static_configs.

2.1 relabel_configs

relabel_configs is a powerful tool. Relabeling lets Prometheus dynamically rewrite label values from a target's metadata before the target is scraped. In addition, the target's metadata can be used to decide whether to scrape the target or to ignore it.

The relabel_configs format is as follows:

    # The source labels select values from existing labels. Their content is concatenated
    # using the configured separator and matched against the configured regular expression
    # for the replace, keep, and drop actions.
    [ source_labels: '[' <labelname> [, ...] ']' ]

    # Default separator.
    [ separator: <string> | default = ; ]

    # Label to which the resulting value is written in a replace action.
    # It is mandatory for replace actions. Regex capture groups are available.
    [ target_label: <labelname> ]

    # Regular expression against which the extracted value is matched.
    [ regex: <regex> | default = (.*) ]

    # Modulus to take of the hash of the source label values.
    [ modulus: <uint64> ]

    # Replacement value against which a regex replace is performed if the
    # regular expression matches. Regex capture groups are available.
    [ replacement: <string> | default = $1 ]

    # Action to perform based on regex matching.
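    [ action: <relabel_action> | default = replace ]

The available actions are replace, keep, drop, hashmod, labelmap, labeldrop, and labelkeep:

- replace: the default. Matches regex against the concatenated source_labels and writes replacement to target_label, where replacement may reference the groups captured by the expression.
- keep: drops targets for which regex does not match the concatenated source_labels.
- drop: drops targets for which regex matches the concatenated source_labels.
- labeldrop: removes the labels whose names match regex.
- labelkeep: removes the labels whose names do not match regex.
- hashmod: sets target_label to the hash of the concatenated source_labels, modulo modulus.
- labelmap: matches regex against all label names, then copies the values of the matching labels to the label names given by replacement, with the group references (${1}, ${2}, …) in replacement substituted by their values.

Every label in Prometheus is a key:value pair. replace, keep, and drop operate on the value, while labelmap, labeldrop, and labelkeep operate on the key.

2.1.1 replace usage

replace is the default action. It matches regex against the value of the source_labels and uses replacement to reference the groups captured by the expression.

    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
      target_label: __address__

In the example above, __address__ is set to $1:$2. Here $1 is what ([^:]+) captures from __address__ (the optional (?::\d+)? consumes an existing port), and $2 is what (\d+) captures from __meta_kubernetes_service_annotation_prometheus_io_port. The final value of __address__ is, for example, 192.168.1.1:9100.

2.1.2 keep usage

    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true

In the example above, a target is kept only if __meta_kubernetes_service_annotation_prometheus_io_probe=true. Conversely, a target whose source_labels value does not match regex is dropped.

2.1.3 drop usage

drop is the exact opposite of keep. Taking the keep example and changing only the action:

    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: drop
        regex: true

In the example above, a target is dropped if the value of its __meta_kubernetes_service_annotation_prometheus_io_probe label is true. Conversely, targets with __meta_kubernetes_service_annotation_prometheus_io_probe != true are kept.

2.1.4 labelmap usage

labelmap differs from replace, keep, and drop: labelmap matches label names, whereas replace, keep, and drop match label values.

    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)

In the example above, for every label whose name matches the regular expression __meta_kubernetes_service_label_(.+), the value is copied to a new label named after the content captured by (.+). The effect is:

    original label   __meta_kubernetes_service_label_test111
    rewritten label  test111

2.1.5 hashmod usage

To be continued.

2.1.6 labeldrop usage

labeldrop filters the labels of a target and removes the labels that satisfy the filter condition. For example:

    relabel_configs:
      - action: labeldrop
        regex: __meta_kubernetes_service_label_(.+)

This configuration matches regex against all labels of the current target, removes the labels that match, and keeps the labels that do not.

2.1.7 labelkeep usage

labelkeep filters the labels of a target and keeps only the labels that satisfy the filter condition. For example:

    relabel_configs:
      - action: labelkeep
        regex: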
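          __meta_kubernetes_service_label_(.+)

This configuration matches regex against all labels of the current target, keeps the labels that match, and removes the ones that do not.

2.2 metric_relabel_configs

As described above, relabel_configs rewrites labels before the metrics are fetched. Correspondingly, metric_relabel_configs operates on labels after the metrics have been fetched. metric_relabel_configs determines which metrics are stored, which are dropped, and what the stored metrics look like.

The configuration of metric_relabel_configs is essentially the same as that of relabel_configs. For the available parameters, see the relabel_config format in section 2.1.

2.3 static_configs

static_configs is mainly used to specify the targets from which metrics are fetched, typically exporters. The targets can be Prometheus itself, MySQL, nginx, and so on.

    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
              - localhost:9090

This rule scrapes Prometheus's own data. The targets list holds the address and port from which Prometheus fetches the metrics. Because metrics_path is not specified, the default /metrics is used. Put simply, Prometheus requests http://localhost:9090/metrics to obtain the monitoring data.

The targets can also be the addresses of other exporters, for example node_exporter:

    scrape_configs:
      - job_name: node
        static_configs:
          - targets:
              - 10.40.58.153:9100
              - 10.40.61.116:9100
              - 10.40.58.154:9100

Put simply, Prometheus requests each of the following URLs to obtain the metrics:

    http://10.40.58.153:9100/metrics
    http://10.40.61.116:9100/metrics
    http://10.40.58.154:9100/metrics

2.4 kubernetes_sd_configs

Kubernetes service discovery can scrape the following kinds of objects:

- node
- service
- pod
- endpoints
- ingress

When the role of a kubernetes_sd_config is set to endpoints, Prometheus automatically discovers all endpoints in Kubernetes and uses them as the target instances monitored by the current job:

    kubernetes_sd_configs:
      - role: endpoints

Configuration example 1

This configuration uses the Kubernetes discovery mechanism to discover the kube-apiservers.

    scrape_configs:
      - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        job_name: kubernetes-apiservers
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - action: keep
            regex: default;kubernetes;https
            source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true

The scrape configuration above defines the following:

- The job name is kubernetes-apiservers (job_name: kubernetes-apiservers).
- Information about the endpoints in Kubernetes is retrieved (role: endpoints).
- The data is fetched over https (scheme: https).
- A target must be in the default namespace, belong to the service named kubernetes, and use the port named https:
  - __meta_kubernetes_namespace=default
  - __meta_kubernetes_service_name=kubernetes
  - __meta_kubernetes_endpoint_port_name=https

Configuration example 2

This configuration automatically discovers the endpoints in Kubernetes.

    -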
      job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
        - source_labels: [__meta_kubernetes_pod_node_name]
          action: replace
          target_label: kubernetes_node

As you can see, relabel_configs contains many rules. In detail:

- The job name is kubernetes-service-endpoints (job_name: kubernetes-service-endpoints).
- Information about the endpoints in Kubernetes is retrieved (role: endpoints).
- The data is fetched over http (scheme is not configured, so the default http applies).
- The relabel configuration:
  - The service annotations must contain prometheus.io/scrape: true for the endpoint to be discovered by Prometheus.
  - __scheme__ takes the value of __meta_kubernetes_service_annotation_prometheus_io_scheme, which must match the regular expression (https?).
  - __metrics_path__ takes the value of __meta_kubernetes_service_annotation_prometheus_io_path, which must match the regular expression (.+).
  - The value of __address__ is rewritten to the form IP:port.
  - kubernetes_namespace is replaced with the value of __meta_kubernetes_namespace.
  - kubernetes_name is replaced with the value of __meta_kubernetes_service_name.
  - kubernetes_node: its value is
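    replaced with the value of __meta_kubernetes_pod_node_name.

The resulting metrics look like this:

    up{app="prometheus",app_kubernetes_io_managed_by="Helm",chart="prometheus-11.3.0",component="node-exporter",heritage="Helm",instance="10.40.61.116:9100",job="kubernetes-service-endpoints",kubernetes_name="prometheus-node-exporter",kubernetes_namespace="devops",kubernetes_node="py-modelo2o08cn-p005.pek3.example.com",release="prometheus"}

Relabel rewrites the labels of a target.

- Each target can be configured with multiple relabel actions, which are applied in the order they appear in the configuration file.
- A target carries some built-in labels that start with __, all of which can be used in relabeling.
- Built-in labels that are not preserved during relabeling are removed.

Relabel flow

    Target [source_label, …] -> relabel -> Target [target_label, …]

Relabel configuration

    [ source_labels: '[' <labelname> [, ...] ']' ]
    [ separator: <string> | default = ; ]
    [ target_label: <labelname> ]
    [ regex: <regex> | default = (.*) ]
    [ modulus: <uint64> ]
    [ replacement: <string> | default = $1 ]
    [ action: <relabel_action> | default = replace ]

Relabel actions

    ACTION    | Regex matches | Operates on          | Key parameters                                   | Description
    ----------|---------------|----------------------|--------------------------------------------------|------------
    keep      | label value   | target               | source labels, regex                             | Drops targets whose source label values do not match regex
    drop      | label value   | target               | source labels, regex                             | Drops targets whose source label values match regex
    labeldrop | label name    | label                | regex                                            | Drops labels that match regex
    labelkeep | label name    | label                | regex                                            | Drops labels that do not match regex
    replace   | label value   | label name and value | source labels, target label, replacement, regex  | Renames labels, changes label values, merges labels
    hashmod   | none          | label name and value | source labels, modulus, target label             | Hashes the values of several source labels and uses the result as the value of the target label
    labelmap  | label name    | label name           | regex, replacement                               | Matches label names against regex; replacement builds the new name from parts of the original name

- replace is the default action, so action may be omitted.
- After using labeldrop or labelkeep, make sure the metric labels are still unique.
- replacement relies on regular expression capture groups; read up on them if necessary.

How to view the source labels

In the Prometheus web UI, go to Status -> Service Discovery.

Filtering targets

Use keep to retain the targets whose label value matches regex:

    scrape_configs:
      - …
      - job_name: cephs
        relabel_configs:
          - action: keep
            source_labels:
              - __address__
            regex: ceph01.*

The relabel result can be checked in the Prometheus web UI under Status / Service Discovery.

Use drop to discard the targets that match regex:

    scrape_configs:
      - …
      - job_name: cephs
        relabel_configs:
          - action: drop
            source_labels:
              - __address__
            regex: ceph01.*

Removing labels

Remove the label named job:

    scrape_configs:
      - …
      - job_name: cephs
        relabel_configs:
          - regex: job
            action: labeldrop

labelkeep and labeldrop do not operate on labels that start with '__'. To operate on such a label, rename it first.

Renaming a label

Use replace to rename the scheme label to protocol:

    scrape_configs:
      - …
      - job_name: cephs
        relabel_configs:
          - source_labels:
              - __scheme__
            target_label: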
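              protocol

Several source_labels may be listed here. The replacement only takes place when their value matches regex.

Use labelmap to turn part of an original label name into the target label. This cannot be done with replace.

    scrape_configs:
      - …
      - job_name: sd_file_mysql
        file_sd_configs:
          - files:
              - mysql.yml
            refresh_interval: 1m
        relabel_configs:
          - action: labelmap
            regex: (.*)(address)(.*)
            replacement: ${2}

Changing a label value

Configure Kubernetes service discovery:

    scrape_configs:
      - …
      - job_name: sd_k8s_nodes
        kubernetes_sd_configs:
          - role: node
            bearer_token_file: bearer_token
            tls_config:
              ca_file: ca.crt
            namespaces:
              names:
                - default
            api_server: https://master01:6443

After service discovery completes, the default port of a node is 10250, from which no data can be fetched. Change the label through relabeling:

    relabel_configs:
      - source_labels:
          - __address__
        regex: (.*)\:10250
        replacement: ${1}:10255
        target_label: __address__

Merging multiple labels

Label merging combines several source labels into one target label. The target label can take the values of the source labels, or a hash of them, and can be used to group targets.

With file service discovery, merge the labels filename="mysql.yml" and sd_type="file" into sd="file;mysql.yml". The label values are joined with a semicolon.

    scrape_configs:
      - …
      - job_name: sd_file_mysql
        file_sd_configs:
          - files:
              - mysql.yml
            refresh_interval: 1m
        relabel_configs:
          - source_labels:
              - sd_type
              - filename
            separator: ";"
            target_label: sd

Hash the values of several labels into one target label. Targets with identical source label values always receive the same target label value, which can be used to balance load across Prometheus servers.

    scrape_configs:
      - …
      - job_name: sd_file_mysql
        file_sd_configs:
          - files:
              - mysql.yml
            refresh_interval: 1m
        relabel_configs:
          - action: hashmod
            source_labels:
              - __scheme__
              - __metrics_path__
            modulus: 64
            target_label: hash_id

Complete example

The following is a complete relabel example. It covers:

- filtering targets by label value
- merging label values and matching them against a regular expression
- renaming a label
- adding a label directly

The example also shows that a source label can be used more than once.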
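
To make the relabel semantics described in this article easier to experiment with, here is a minimal Python sketch that models how a list of relabel rules is applied to one target's labels. It is a simplified illustration, not Prometheus's implementation: the names relabel and _expand, the sample target, and the env=demo label are all hypothetical, Python's re module stands in for the RE2 engine Prometheus uses, and the hashmod formula (MD5 of the joined value, last 8 bytes, modulo modulus) reflects my understanding of Prometheus's behaviour rather than a guarantee. The rule chain mirrors the four steps listed for the complete example: filter, merge with a regex, rename, and add a label.

```python
import hashlib
import re


def _expand(match, template):
    """Expand $1 / ${1} style group references in template using match."""
    def substitute(ref_match):
        ref = ref_match.group(1) or ref_match.group(2)
        try:
            value = match.group(int(ref)) if ref.isdigit() else match.group(ref)
        except IndexError:  # unknown group: expands to the empty string
            value = None
        return value or ""
    return re.sub(r"\$(?:\{(\w+)\}|(\w+))", substitute, template)


def relabel(labels, rules):
    """Apply rules in order. Returns the new labels, or None if the target is dropped."""
    labels = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")          # replace is the default action
        regex = re.compile(rule.get("regex", "(.*)"))   # matched against the whole value
        replacement = rule.get("replacement", "$1")
        separator = rule.get("separator", ";")
        value = separator.join(labels.get(n, "") for n in rule.get("source_labels", []))

        if action == "keep":
            if not regex.fullmatch(value):
                return None
        elif action == "drop":
            if regex.fullmatch(value):
                return None
        elif action == "replace":
            m = regex.fullmatch(value)
            if m:
                new_value = _expand(m, replacement)
                if new_value:
                    labels[rule["target_label"]] = new_value
                else:  # an empty result removes the label
                    labels.pop(rule["target_label"], None)
        elif action == "hashmod":
            digest = hashlib.md5(value.encode()).digest()
            bucket = int.from_bytes(digest[8:], "big") % rule["modulus"]
            labels[rule["target_label"]] = str(bucket)
        elif action == "labelmap":
            for name, val in list(labels.items()):
                m = regex.fullmatch(name)
                if m:
                    labels[_expand(m, replacement)] = val
        elif action == "labeldrop":
            labels = {k: v for k, v in labels.items() if not regex.fullmatch(k)}
        elif action == "labelkeep":
            labels = {k: v for k, v in labels.items() if regex.fullmatch(k)}
        else:
            raise ValueError(f"unknown action: {action}")
    return labels


# Hypothetical target as it might look right after service discovery.
target = {
    "__address__": "192.168.1.1:8080",
    "__scheme__": "http",
    "__meta_kubernetes_service_annotation_prometheus_io_port": "9100",
    "__meta_kubernetes_service_label_app": "node-exporter",
}

rules = [
    # 1. Filter targets by label value.
    {"action": "keep", "source_labels": ["__address__"], "regex": r"192\.168\..*"},
    # 2. Merge two label values and match them against a regex (__address__ is reused).
    {"source_labels": ["__address__", "__meta_kubernetes_service_annotation_prometheus_io_port"],
     "regex": r"([^:]+)(?::\d+)?;(\d+)", "replacement": "$1:$2", "target_label": "__address__"},
    # 3. Rename: copy __scheme__ into a label called protocol.
    {"source_labels": ["__scheme__"], "target_label": "protocol"},
    # 4. Add a fixed label: no source labels, constant replacement.
    {"target_label": "env", "replacement": "demo"},
    # Copy service labels to plain label names, as in the labelmap section.
    {"action": "labelmap", "regex": r"__meta_kubernetes_service_label_(.+)"},
]

result = relabel(target, rules)
print(result["__address__"])  # 192.168.1.1:9100
print(result["protocol"], result["env"], result["app"])
```

The sketch leaves the labels that start with __ in the result, whereas Prometheus removes them once relabeling has finished. Rules are plain dictionaries using the same keys as the YAML fields, so any rule from this article can be tried by transcribing it directly.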