
Monitoring processes with Prometheus and adding alert rules

Seven
2023-05-11

Monitoring processes with Prometheus

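process-exporter is used for process-level monitoring, for example how many processes a service is running and how much CPU, memory and other resources they consume.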

Installing process-exporter

1. Download process-exporter

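GitHub project: https://github.com/ncabatoff/process-exporter
Download: the release tarball from the project's Releases page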

2. Unpack process-exporter

tar -zxvf process-exporter-0.7.5.linux-amd64.tar.gz
mv process-exporter-0.7.5.linux-amd64 /usr/local/process-exporter

3. Add the monitoring configuration file

vim /usr/local/process-exporter/process-exporter.yaml  # config file; the name must match -config.path in the systemd unit below

process_names:
#  - name: "{{.Comm}}"
#    cmdline:
#    - '.+'

  - name: "{{.Matches}}"
    cmdline:
    - 'nginx' # unique identifier: a regex matched against the process command line

  - name: "{{.Matches}}"
    cmdline:
    - '/opt/atlassian/confluence/bin/tomcat-juli.jar'

  - name: "{{.Matches}}"
    cmdline:
    - 'mysqld'

  - name: "{{.Matches}}"
    cmdline:
    - 'redis-server'

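Explanation:

cmdline: a regular expression that identifies the process; it is matched against the command line shown by ps -ef. If no running process matches it, no data is collected for that process.
For example:
> ps -ef | grep redis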
redis 4287 4127 0 Oct31 ? 00:58:12 redis-server *:6379

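How each name template renders groupname for this redis process:

{{.Comm}}      groupname="redis-server"                   name of the executable or script (comm)
{{.ExeBase}}   groupname="redis-server *:6379"            base name of the executable
{{.ExeFull}}   groupname="/usr/bin/redis-server *:6379"   full process information as shown by ps
{{.Username}}  groupname="redis"                          groups processes by the user that owns them
{{.Matches}}   groupname="map[:redis]"                    the text matched by the cmdline regex (here the keyword "redis")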
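To make the {{.Matches}} row concrete, here is a small Python model of how that groupname appears to be built: the process arguments are joined with spaces, every cmdline regex must match, the whole match is stored under the empty key (named groups under their own names), and the result is printed the way Go prints a map, with keys sorted. This is a simplified sketch of the behavior described above, not process-exporter's own code; matches_groupname is a hypothetical helper, and Python's re stands in for Go's regexp (they agree for plain patterns like the ones in this config).

```python
import re
from typing import Optional


def matches_groupname(cmdline_args: list[str], patterns: list[str]) -> Optional[str]:
    """Hypothetical helper: approximate the groupname produced by {{.Matches}}.

    Returns None when the process would not belong to the group.
    """
    # the regexes are applied to the arguments joined with single spaces
    cmdline = " ".join(cmdline_args)
    matched: dict[str, str] = {}
    for pattern in patterns:
        m = re.search(pattern, cmdline)
        if m is None:
            # every cmdline regex must match, otherwise no group membership
            return None
        index_to_name = {idx: name for name, idx in m.re.groupindex.items()}
        # group 0 (whole match) and unnamed groups go under the empty key
        for i in range(m.re.groups + 1):
            matched[index_to_name.get(i, "")] = m.group(i) or ""
    # Go formats a map as map[k1:v1 k2:v2] with keys in sorted order
    body = " ".join(f"{key}:{value}" for key, value in sorted(matched.items()))
    return f"map[{body}]"
```

With the redis process from the ps example, matches_groupname(["redis-server *:6379"], ["redis-server"]) returns "map[:redis-server]". This is why the alert rules later select on exact values such as groupname="map[:redis-server]": the label holds the matched text, not the full command line.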

4. Manage the process-exporter process with systemctl

vim /usr/lib/systemd/system/process_exporter.service
 
[Unit]
Description=Prometheus exporter for processors metrics, written in Go with pluggable metric collectors.
Documentation=https://github.com/ncabatoff/process-exporter
After=network.target
  
[Service]
Type=simple
User=prometheus
WorkingDirectory=/usr/local/process-exporter
ExecStart=/usr/local/process-exporter/process-exporter -config.path=/usr/local/process-exporter/process-exporter.yaml
Restart=on-failure
  
[Install]
WantedBy=multi-user.target

5. Enable process-exporter at boot and start it

systemctl daemon-reload
systemctl enable --now process_exporter
systemctl status process_exporter

● process_exporter.service - Prometheus exporter for processors metrics, written in Go with pluggable metric collectors.
   Loaded: loaded (/usr/lib/systemd/system/process_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-05-11 09:27:15 CST; 1h 53min ago
     Docs: https://github.com/ncabatoff/process-exporter
 Main PID: 3601 (process-exporte)
    Tasks: 20
   Memory: 9.6M
   CGroup: /system.slice/process_exporter.service
           └─3601 /usr/local/process-exporter/process-exporter -config.path=/usr/local/process-exporter/process-exporter.yaml

May 11 09:27:15 iZ2zegyhr62d4cf14c3v28Z systemd[1]: Started Prometheus exporter for processors metrics, written in Go with pluggable metric collectors..
May 11 09:27:15 iZ2zegyhr62d4cf14c3v28Z process-exporter[3601]: 2023/05/11 09:27:15 Reading metrics from /proc based on "/usr/local/process-exporter/p...r.yaml"
May 11 09:27:16 iZ2zegyhr62d4cf14c3v28Z process-exporter[3601]: level=info ts=2023-05-11T01:27:16.137Z caller=tls_config.go:195 msg="TLS is disabled."...2=false
Hint: Some lines were ellipsized, use -l to show in full.

Verify the metrics

curl http://localhost:9256/metrics
# excerpt of the output; these group names come from a host with a different process_names config
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} 2988
http_response_size_bytes{handler="prometheus",quantile="0.9"} 2996
http_response_size_bytes{handler="prometheus",quantile="0.99"} 3006
http_response_size_bytes_sum{handler="prometheus"} 1.34205181e+08
http_response_size_bytes_count{handler="prometheus"} 45188
# HELP namedprocess_namegroup_context_switches_total Context switches
# TYPE namedprocess_namegroup_context_switches_total counter
namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary",groupname="map[:bladebit]"} 7.7977455e+07
namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary",groupname="map[:pw_python.py]"} 2.02666e+06
namedprocess_namegroup_context_switches_total{ctxswitchtype="voluntary",groupname="map[:bladebit]"} 3.335109e+06
namedprocess_namegroup_context_switches_total{ctxswitchtype="voluntary",groupname="map[:pw_python.py]"} 8.22652233e+08
# HELP namedprocess_namegroup_cpu_system_seconds_total Cpu system usage in seconds
# TYPE namedprocess_namegroup_cpu_system_seconds_total counter
namedprocess_namegroup_cpu_system_seconds_total{groupname="map[:bladebit]"} 94275.01000000017
namedprocess_namegroup_cpu_system_seconds_total{groupname="map[:pw_python.py]"} 64818.93000000004
# HELP namedprocess_namegroup_cpu_user_seconds_total Cpu user usage in seconds
# TYPE namedprocess_namegroup_cpu_user_seconds_total counter
namedprocess_namegroup_cpu_user_seconds_total{groupname="map[:bladebit]"} 2.42621264299998e+07
namedprocess_namegroup_cpu_user_seconds_total{groupname="map[:pw_python.py]"} 85.29000000000613
# HELP namedprocess_namegroup_major_page_faults_total Major page faults
# TYPE namedprocess_namegroup_major_page_faults_total counter
namedprocess_namegroup_major_page_faults_total{groupname="map[:bladebit]"} 18261
namedprocess_namegroup_major_page_faults_total{groupname="map[:pw_python.py]"} 1236
# HELP namedprocess_namegroup_memory_bytes number of bytes of memory in use
# TYPE namedprocess_namegroup_memory_bytes gauge
namedprocess_namegroup_memory_bytes{groupname="map[:bladebit]",memtype="resident"} 4.46810939392e+11
namedprocess_namegroup_memory_bytes{groupname="map[:bladebit]",memtype="swapped"} 0
namedprocess_namegroup_memory_bytes{groupname="map[:bladebit]",memtype="virtual"} 4.47847292928e+11
namedprocess_namegroup_memory_bytes{groupname="map[:pw_python.py]",memtype="resident"} 1.2959744e+07
namedprocess_namegroup_memory_bytes{groupname="map[:pw_python.py]",memtype="swapped"} 0
namedprocess_namegroup_memory_bytes{groupname="map[:pw_python.py]",memtype="virtual"} 2.4733696e+08
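The CPU counters above are cumulative, so CPU usage has to be derived from the difference between two scrapes, which is what PromQL's rate() does. The sketch below parses the text returned by /metrics and computes the average number of CPU cores each group used between two scrapes by summing the user and system counters (metric names as they appear in the sample output). It is a simplified illustration: label values are not unescaped, timestamps are ignored, and parse_samples, cpu_seconds_by_group and cpu_cores_used are hypothetical names.

```python
import re

# "name{labels} value [timestamp]" lines of the Prometheus text exposition format
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# metric names as they appear in the sample output above
CPU_METRICS = {
    "namedprocess_namegroup_cpu_user_seconds_total",
    "namedprocess_namegroup_cpu_system_seconds_total",
}


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (metric, labels, value) for each sample line; comments are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or # HELP / # TYPE comment
        m = SAMPLE_RE.match(line)
        if m is None:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def cpu_seconds_by_group(text: str) -> dict[str, float]:
    """Sum user + system CPU seconds per groupname for one scrape."""
    totals: dict[str, float] = {}
    for name, labels, value in parse_samples(text):
        if name in CPU_METRICS:
            group = labels.get("groupname", "")
            totals[group] = totals.get(group, 0.0) + value
    return totals


def cpu_cores_used(before: str, after: str, elapsed_seconds: float) -> dict[str, float]:
    """Average CPU cores used per group between two scrapes, similar to rate()."""
    old, new = cpu_seconds_by_group(before), cpu_seconds_by_group(after)
    usage = {}
    for group in sorted(old.keys() & new.keys()):
        delta = new[group] - old[group]
        if delta < 0:
            # counter went backwards (reset): count from zero, as rate() does
            delta = new[group]
        usage[group] = delta / elapsed_seconds
    return usage
```

For example, a group whose user + system counters grew by 15 s across a 30 s interval used on average 0.5 of a CPU core. In practice Prometheus does this for you, e.g. with rate(namedprocess_namegroup_cpu_user_seconds_total[5m]).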

Add the scrape job to the Prometheus configuration (under scrape_configs)

  - job_name: "process_exporter"
    static_configs:
      - targets: ["172.17.224.16:9256"]
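After reloading Prometheus, one way to confirm that the new target is being scraped is an instant query through the Prometheus HTTP API (/api/v1/query). The sketch below uses only the standard library and assumes Prometheus listens on http://localhost:9090 (its default port); PROMETHEUS_URL, query_url, fetch_query and num_procs_by_instance_group are hypothetical names. Results are keyed by (instance, groupname) so several monitored hosts do not overwrite each other.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus runs locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch_query(base: str, promql: str) -> dict:
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(query_url(base, promql), timeout=10) as resp:
        return json.load(resp)


def num_procs_by_instance_group(response: dict) -> dict[tuple[str, str], float]:
    """Map (instance, groupname) -> process count from an instant-vector result."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    counts = {}
    for item in response["data"]["result"]:
        labels = item["metric"]
        # each vector element carries its value as [timestamp, "value as string"]
        key = (labels.get("instance", ""), labels.get("groupname", ""))
        counts[key] = float(item["value"][1])
    return counts


# Usage against a live server:
# resp = fetch_query(PROMETHEUS_URL, 'namedprocess_namegroup_num_procs{job="process_exporter"}')
# for (instance, group), n in sorted(num_procs_by_instance_group(resp).items()):
#     print(instance, group, n)
```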

Configure Alertmanager alerting rules

Alertmanager must already be set up and connected to Prometheus before this step.

1. Check the rule_files entry in the Prometheus configuration file to find the rules directory:

[Screenshot: rules directory setting in the Prometheus configuration file]

2. Create the alert rules:

vim ./rules/process_exporter.yml
groups:
- name: process monitoring
  rules:
  - alert: redis process down
    expr: namedprocess_namegroup_num_procs{groupname="map[:redis-server]"} < 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: redis process exited"
      description: "{{ $labels.instance }}: the redis service has been down for more than 1m, please check it as soon as possible."
      value: "{{ $value }}"

  - alert: mysql process down
    expr: namedprocess_namegroup_num_procs{groupname="map[:mysqld]"} < 1
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: mysql process exited"
      description: "{{ $labels.instance }}: the mysql service has been down for more than 30s, please check it as soon as possible."
      value: "{{ $value }}"

  - alert: confluence process down
    expr: namedprocess_namegroup_num_procs{groupname="map[:/opt/atlassian/confluence/bin/tomcat-juli.jar]"} < 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: confluence exited"
      description: "{{ $labels.instance }}: confluence has been down for more than 1m, please check it as soon as possible."
      value: "{{ $value }}"
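The for: 1m and for: 30s settings above control how long the condition must hold before an alert is sent. On each rule evaluation, an alert whose condition has held for less than the for duration is pending; once it has held for at least that long it becomes firing; and whenever the expression returns no result (value back above the threshold, or no series at all) it is reset. A rule like num_procs < 1 therefore only fires while the exporter still reports the group. The Python sketch below models that state machine for a single series. It is a simplified model, not Prometheus code; alert_states is a hypothetical name and the 15 s evaluation interval in the example is an assumption.

```python
from typing import Optional


def alert_states(
    samples: list[tuple[float, Optional[float]]],
    threshold: float,
    hold_seconds: float,
) -> list[str]:
    """Model a rule like `num_procs < threshold` with `for: hold_seconds`.

    samples: (evaluation time in seconds, series value); None means the series
    is absent, so the expression returns no result at that evaluation.
    Returns the alert state after each evaluation.
    """
    states = []
    active_since = None  # evaluation time when the condition first became true
    for ts, value in samples:
        if value is not None and value < threshold:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= hold_seconds else "pending")
        else:
            # condition false or series missing: the alert is reset
            active_since = None
            states.append("inactive")
    return states
```

For the mysql rule (for: 30s) evaluated every 15 s, alert_states([(0, 0), (15, 0), (30, 0)], 1, 30) gives pending, pending, firing: the alert is only sent once the process has been missing for the full 30 s.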

Restart Prometheus

systemctl restart prometheus

Graphs in Grafana

The Grafana dashboard ID for process-exporter is 249.

The result:
[Screenshot: process-exporter dashboard in Grafana]
