Alerting Policies (Node Level)

Objective

KubeSphere provides alerting policies for nodes and workloads. This guide demonstrates how to create alerting policies for nodes in a cluster and configure email notifications. To configure alerting policies for workloads, see Alerting Policy (Workload Level).

Prerequisites

Hands-on Lab

Task 1: Create an alerting policy

  1. Log in to the console with an account that has the platform-admin role.

  2. Click Platform in the top left corner and select Clusters Management.

    alerting_policy_node_level_guide

  3. Select a cluster from the list to enter it. (If the multi-cluster feature is not enabled, you go directly to the Overview page.)

  4. Navigate to Alerting Policies under Monitoring & Alerting, and click Create.

    alerting_policy_node_level_create

Task 2: Provide basic information

In the dialog that appears, fill in the basic information as follows. Click Next after you finish.

  • Name: a concise and clear name as its unique identifier, such as alert-demo.
  • Alias: to help you distinguish alerting policies better.
  • Description: a brief introduction to the alerting policy.

alerting_policy_node_level_basic_info

Task 3: Select monitoring targets

Select one or more nodes in the node list, or use Node Selector to choose a group of nodes, as the monitoring targets. A single node is selected here for demonstration. Click Next when you finish.

alerting_policy_node_level_monitoring_target

Note

You can sort the node list from the drop-down menu in three ways: Sort By CPU, Sort By Memory, and Sort By Pod Utilization.
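Choosing a group of nodes with a selector can be pictured as label matching. The following is a minimal sketch, assuming Node Selector matches nodes by Kubernetes-style key/value labels; the function name `select_nodes`, the node names, and the `role` label are hypothetical and are not part of KubeSphere.

```python
from typing import Dict, List


def select_nodes(nodes: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Return the names of nodes whose labels contain every selector pair."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical nodes and labels, for illustration only.
nodes = {
    "node1": {"kubernetes.io/os": "linux", "role": "worker"},
    "node2": {"kubernetes.io/os": "linux", "role": "master"},
    "node3": {"kubernetes.io/os": "linux", "role": "worker"},
}

print(select_nodes(nodes, {"role": "worker"}))  # ['node1', 'node3']
```

A node is selected only if it carries every key/value pair in the selector, so adding more pairs narrows the group.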

Task 4: Add alerting rules

  1. Click Add Rule to create an alerting rule. A rule defines the metric type, check period, consecutive times, metric threshold, and alert level.

    • Check period (the second field under Rule): the time interval between two consecutive checks of the metric. For example, 2 minutes/period means the metric is checked every two minutes.
    • Consecutive times (the third field under Rule): the number of consecutive checks in which the metric must meet the threshold. An alert is triggered only when the actual number of consecutive checks that meet the threshold is equal to or greater than the number set in the alerting policy.

    alerting_policy_node_level_alerting_rule

  2. In this example, set the parameters to memory utilization rate, 1 minute/period, 2 consecutive times, >, 50%, and Major Alert, in that order. KubeSphere then checks the memory utilization rate every minute and triggers a major alert if the rate is higher than 50% on 2 consecutive checks.

  3. Save the rule when you finish, and then click Next to continue.
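The interaction between check period, consecutive times, and threshold can be sketched in a few lines of Python. This is an illustrative model only, not KubeSphere's implementation; the `AlertRule` class, its field names, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AlertRule:
    """Hypothetical model of one rule; field names are illustrative only."""
    metric: str
    period_minutes: int      # check period: minutes between two checks
    consecutive_times: int   # checks in a row that must meet the threshold
    threshold: float         # metric threshold, in percent here
    level: str               # alert level, for example "Major Alert"


def first_trigger_check(rule: AlertRule, samples: Iterable[float]) -> Optional[int]:
    """Return the 1-based index of the check at which the alert triggers.

    `samples` holds one metric value per check. Returns None if the
    threshold is never met for enough consecutive checks.
    """
    streak = 0
    for index, value in enumerate(samples, start=1):
        # The example rule uses the ">" comparison; a miss resets the streak.
        streak = streak + 1 if value > rule.threshold else 0
        if streak >= rule.consecutive_times:
            return index
    return None


rule = AlertRule("memory utilization rate", 1, 2, 50.0, "Major Alert")
# One made-up sample per minute: 48%, 55%, 49%, 61%, 63%
print(first_trigger_check(rule, [48, 55, 49, 61, 63]))  # 5
```

In this run the second check exceeds 50%, but the third does not, so the streak resets. The alert triggers only at the fifth check, after the fourth and fifth both exceed the threshold.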

Note

You can create node-level alerting policies for the following metrics:

  • CPU: cpu utilization rate, cpu load average 1 minute, cpu load average 5 minutes, cpu load average 15 minutes
  • Memory: memory utilization rate, memory available
  • Disk: inode utilization rate, disk space available, local disk space utilization rate, disk write throughput, disk read throughput, disk read iops, disk write iops
  • Network: network data transmitting rate, network data receiving rate
  • Pod: pod abnormal ratio, pod utilization rate

Task 5: Set notification rules

  1. Effective Notification Time Range sets the time window in which notification emails are sent, such as 09:00 ~ 19:00. Notification Channel currently supports Email only. Add the email addresses of the members to be notified to Notification List.

  2. Customize Repetition Rules defines the sending period and the retransmission times of notification emails. If an alert has not been resolved, the notification is sent again after the sending period elapses. You can set different repetition rules for different alert levels. Because the alert level set in the previous step is Major Alert, select Alert once every 5 minutes (sending period) in the second field for Major Alert and Resend up to 3 times (retransmission times) in the third field. Refer to the following image to set notification rules:

    alerting_policy_node_level_notification_rule

  3. Click Create, and you can see that the alerting policy is successfully created.
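The notification settings above can be pictured as a small schedule calculation. The sketch below rests on two assumptions that the guide does not confirm: that "Resend up to 3 times" means one initial email plus up to three repeats, and that emails falling outside the effective time range are skipped. The function name and the date are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import List


def notification_times(first_alert: datetime, period_minutes: int, max_resends: int,
                       window_start: time, window_end: time) -> List[datetime]:
    """List the send times for an alert that stays unresolved (assumed semantics)."""
    # Assumption: one initial notification plus up to `max_resends` repeats.
    candidates = [
        first_alert + timedelta(minutes=period_minutes * i)
        for i in range(max_resends + 1)
    ]
    # Assumption: notifications outside the effective time range are skipped.
    return [t for t in candidates if window_start <= t.time() <= window_end]


# Alert first fires at 18:50; sending period 5 minutes; up to 3 resends;
# effective notification time range 09:00 ~ 19:00.
times = notification_times(datetime(2021, 1, 1, 18, 50), 5, 3, time(9, 0), time(19, 0))
print([t.strftime("%H:%M") for t in times])  # ['18:50', '18:55', '19:00']
```

Under these assumptions the fourth email, which would go out at 19:05, is not sent because it falls outside the time range.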

Note

Waiting Time for Alerting = Check Period × Consecutive Times. For example, if the check period is 1 minute/period and the number of consecutive times is 2, you need to wait 2 minutes before the alerting message appears.
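As a quick check of the formula, the calculation can be written as a one-line helper. The function name is hypothetical, and the second call uses made-up values for comparison.

```python
def waiting_time_minutes(check_period_minutes: int, consecutive_times: int) -> int:
    """Waiting Time for Alerting = Check Period x Consecutive Times."""
    return check_period_minutes * consecutive_times


print(waiting_time_minutes(1, 2))  # 2, the example from the note
print(waiting_time_minutes(2, 3))  # 6, a 2-minute period checked 3 times
```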

Task 6: View alerting policies

After an alerting policy is created, you can open its details page to view its status, alerting rules, monitoring targets, notification rules, and alert history. To enable or disable the alerting policy, click More and select Change Status from the drop-down menu.

alerting-policy-node-level-detail-page