Monitoring Application Metrics Using Prometheus and Grafana

Mar 11, 2024

Published by Jin

Metrics Monitoring in Spring with Prometheus and Grafana

Prometheus is an open-source monitoring tool widely used to collect metrics from various services and systems. Grafana is a visualization platform that displays these metrics through flexible and expressive dashboards. When used together, they offer real-time visibility into the state and performance of running systems, making it easier to interpret operational conditions as they evolve.

In this post, we will set up a monitoring environment for a Spring Boot application using Prometheus and Grafana.

1. Collecting Metrics in Spring

Spring Boot Actuator and Micrometer provide robust support for monitoring the performance and health of Spring Boot applications. Actuator exposes runtime metrics and operational data, giving visibility into the internal state of the application. Micrometer serves as a metrics facade, allowing seamless integration with various monitoring systems.

1-1. Configuring Spring Boot Actuator

Spring Boot Actuator provides built-in support for exposing various operational metrics and runtime information about a Spring Boot application. It enables monitoring of system health, environment configurations, logging, HTTP traces, and more through HTTP endpoints or JMX beans. These metrics offer valuable insights for identifying issues and analyzing system behavior in production.

To enable Actuator, add the following dependency to your build.gradle:

build.gradle

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
}

Then, expose the necessary endpoints by updating your application.yml. To allow all endpoints during development:

application.yml

management:
  endpoints:
    web:
      exposure:
        include: '*'

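This wildcard is convenient for local development. In production, list only the endpoints you actually need (for example health and prometheus), because endpoints such as env and heapdump can reveal sensitive details.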

After the application starts, a GET request to /actuator/metrics will return a list of available metrics collected by the application.

The JSON response includes a wide range of metric names related to the application and system:

/actuator/metrics

{
  "names": [
    "application.ready.time",
    "application.started.time",
    "cache.gets",
    "cache.lock.duration",
    "cache.puts",
    "cache.removals",
    "disk.free",
    "disk.total",
    "executor.active",
    "executor.completed",
    "executor.pool.core",
    "executor.pool.max",
    "executor.pool.size",
    "executor.queue.remaining",
    "executor.queued",
    "hikaricp.connections",
    "hikaricp.connections.acquire",
    "hikaricp.connections.active",
    "hikaricp.connections.creation",
    "hikaricp.connections.idle",
    "hikaricp.connections.max",
    "hikaricp.connections.min",
    "hikaricp.connections.pending",
    "hikaricp.connections.timeout",
    "hikaricp.connections.usage",
    "http.server.requests",
    "http.server.requests.active",
    "jdbc.connections.active",
    "jdbc.connections.idle",
    "jdbc.connections.max",
    "jdbc.connections.min",
    "jvm.buffer.count",
    "jvm.buffer.memory.used",
    "jvm.buffer.total.capacity",
    "jvm.classes.loaded",
    "jvm.classes.unloaded",
    "jvm.compilation.time",
    "jvm.gc.live.data.size",
    "jvm.gc.max.data.size",
    "jvm.gc.memory.allocated",
    "jvm.gc.memory.promoted",
    "jvm.gc.overhead",
    "jvm.gc.pause",
    "jvm.info",
    "jvm.memory.committed",
    "jvm.memory.max",
    "jvm.memory.usage.after.gc",
    "jvm.memory.used",
    "jvm.threads.daemon",
    "jvm.threads.live",
    "jvm.threads.peak",
    "jvm.threads.started",
    "jvm.threads.states",
    "logback.events",
    "process.cpu.usage",
    "process.files.max",
    "process.files.open",
    "process.start.time",
    "process.uptime",
    "spring.data.repository.invocations",
    "spring.security.authorizations",
    "spring.security.authorizations.active",
    "spring.security.filterchains",
    "spring.security.filterchains.AuthFilter.after",
    "spring.security.filterchains.AuthFilter.before",
    "spring.security.filterchains.access.exceptions.after",
    "spring.security.filterchains.access.exceptions.before",
    "spring.security.filterchains.active",
    "spring.security.filterchains.authentication.anonymous.after",
    "spring.security.filterchains.authentication.anonymous.before",
    "spring.security.filterchains.authorization.after",
    "spring.security.filterchains.authorization.before",
    "spring.security.filterchains.context.async.after",
    "spring.security.filterchains.context.async.before",
    "spring.security.filterchains.context.holder.after",
    "spring.security.filterchains.context.holder.before",
    "spring.security.filterchains.context.servlet.after",
    "spring.security.filterchains.context.servlet.before",
    "spring.security.filterchains.cors.after",
    "spring.security.filterchains.cors.before",
    "spring.security.filterchains.header.after",
    "spring.security.filterchains.header.before",
    "spring.security.filterchains.logout.after",
    "spring.security.filterchains.logout.before",
    "spring.security.filterchains.requestcache.after",
    "spring.security.filterchains.requestcache.before",
    "spring.security.filterchains.session.management.after",
    "spring.security.filterchains.session.management.before",
    "spring.security.filterchains.session.url-encoding.after",
    "spring.security.filterchains.session.url-encoding.before",
    "spring.security.http.secured.requests",
    "spring.security.http.secured.requests.active",
    "system.cpu.count",
    "system.cpu.usage",
    "system.load.average.1m",
    "tomcat.cache.access",
    "tomcat.cache.hit",
    "tomcat.connections.config.max",
    "tomcat.connections.current",
    "tomcat.connections.keepalive.current",
    "tomcat.global.error",
    "tomcat.global.received",
    "tomcat.global.request",
    "tomcat.global.request.max",
    "tomcat.global.sent",
    "tomcat.servlet.error",
    "tomcat.servlet.request",
    "tomcat.servlet.request.max",
    "tomcat.sessions.active.current",
    "tomcat.sessions.active.max",
    "tomcat.sessions.alive.max",
    "tomcat.sessions.created",
    "tomcat.sessions.expired",
    "tomcat.sessions.rejected",
    "tomcat.threads.busy",
    "tomcat.threads.config.max",
    "tomcat.threads.current"
  ]
}

The following is a categorized list of frequently used metrics along with their descriptions:

HikariCP: Database connection pool
- hikaricp.connections: Total number of active and idle connections in the pool.
- hikaricp.connections.acquire: Time taken to acquire a connection from the pool.
- hikaricp.connections.active: Number of connections currently in use.
- hikaricp.connections.idle: Number of connections currently idle in the pool.
- hikaricp.connections.max: Maximum number of connections allowed in the pool.
- hikaricp.connections.min: Minimum number of connections to maintain.
- hikaricp.connections.pending: Number of threads waiting for a connection.
- hikaricp.connections.timeout: Number of times acquiring a connection timed out.
- hikaricp.connections.usage: Time spent using each connection before returning it to the pool.

JVM: Java Virtual Machine
- jvm.buffer.count: Number of buffer instances in use.
- jvm.buffer.memory.used: Total memory currently used by buffers.
- jvm.buffer.total.capacity: Total capacity available across all buffers.
- jvm.classes.loaded: Number of classes currently loaded into memory.
- jvm.classes.unloaded: Number of classes unloaded from memory.
- jvm.compilation.time: Total time spent by the JIT compiler.
- jvm.gc.live.data.size: Size of live objects after the last GC cycle.
- jvm.gc.max.data.size: Maximum size of the long-lived (old generation) heap memory pool.
- jvm.memory.committed: Memory guaranteed to be available to the JVM.
- jvm.memory.max: Maximum memory that can be used by the JVM.
- jvm.memory.used: Actual memory currently in use by the JVM.
- jvm.threads.daemon: Number of daemon threads.
- jvm.threads.live: Total number of live threads.
- jvm.threads.peak: Peak number of live threads since application start.
- jvm.threads.states: Number of threads in each state (e.g., RUNNABLE, BLOCKED, WAITING).

HTTP: Web server request handling
- http.server.requests: Timer that records the count and duration of handled HTTP requests, tagged by method, status, and URI.
- http.server.requests.active: Number of requests currently being processed.

Tomcat: Embedded web server
- tomcat.connections.current: Current number of open connections.
- tomcat.global.request: Total number of requests handled globally.
- tomcat.global.request.max: Maximum time spent handling a single request.
- tomcat.sessions.active.current: Number of currently active HTTP sessions.
- tomcat.sessions.active.max: Highest number of concurrent active sessions.
- tomcat.sessions.created: Total number of sessions created since startup.
- tomcat.sessions.expired: Number of sessions that have expired.
- tomcat.threads.busy: Number of threads actively handling requests.
- tomcat.threads.config.max: Configured maximum number of request-handling threads.
- tomcat.threads.current: Current total number of threads in the Tomcat thread pool.

You can append a specific metric name to the /actuator/metrics path to query detailed information about it.

For example, to inspect the amount of heap memory retained after the most recent garbage collection, request the following endpoint:

GET /actuator/metrics/jvm.gc.live.data.size HTTP/1.1
Host: localhost:8080
Connection: close
User-Agent: RapidAPI/4.2.8 (Macintosh; OS X/15.4.0) GCDHTTPRequest

The response would be:

{
  "name": "jvm.gc.live.data.size",
  "description": "Size of long-lived heap memory pool after reclamation",
  "baseUnit": "bytes",
  "measurements": [
    {
      "statistic": "VALUE",
      "value": 0
    }
  ],
  "availableTags": []
}
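
The same drill-down request can be issued from code. The following is a minimal sketch using the JDK's built-in HttpClient (Java 11 or later), assuming the application runs on localhost:8080 as in the request above. The class and method names (ActuatorMetricClient, metricUri, fetchMetric) are illustrative and not part of Actuator.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ActuatorMetricClient {

    // Builds the drill-down URL for one metric,
    // e.g. http://localhost:8080/actuator/metrics/jvm.gc.live.data.size
    public static URI metricUri(String baseUrl, String metricName) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(base + "/actuator/metrics/" + metricName);
    }

    // Sends the GET request and returns the raw JSON body.
    public static String fetchMetric(HttpClient client, String baseUrl, String metricName)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(metricUri(baseUrl, metricName))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // Assumption: the application from this post is running locally on port 8080.
            System.out.println(fetchMetric(HttpClient.newHttpClient(),
                    "http://localhost:8080", "jvm.gc.live.data.size"));
        } catch (IOException e) {
            System.err.println("Could not read the metric: " + e.getMessage());
        }
    }
}
```

Keeping the URL construction in its own method makes it easy to point the same client at another host or metric name.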

By enabling Spring Boot Actuator, detailed runtime metrics can be accessed easily and integrated into a monitoring pipeline. These metrics are essential for observing system behavior in real time and reacting to performance trends or anomalies.

1-2. Integrating Micrometer

Micrometer is often described as the SLF4J of application metrics. It provides a standardized facade for collecting and publishing metrics to a wide range of monitoring systems such as Prometheus, Datadog, Elastic, New Relic, and Graphite.

Micrometer is tightly integrated with Spring Boot and can be easily added using Spring Initializr.

To use Micrometer with Prometheus, begin by adding the required dependency to your project:

build.gradle

dependencies {
    runtimeOnly 'io.micrometer:micrometer-registry-prometheus'
}

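Then, expose the prometheus endpoint in application.yml. Note that with this setting prometheus is the only Actuator endpoint exposed over HTTP, so add others (for example health) to the list if you still need them: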

application.yml

management:
  endpoints:
    web:
      exposure:
        include: 'prometheus'
  prometheus:
    metrics:
      export:
        enabled: true # Enabled by default. Set to false to disable.

This configuration allows Micrometer to expose metrics in a Prometheus-compatible format via the /actuator/prometheus endpoint. Prometheus servers can scrape this endpoint periodically to collect application metrics.

When accessing /actuator/prometheus, the same metrics collected through Micrometer are rendered in the Prometheus text format, with the dots in metric names replaced by underscores:

/actuator/prometheus

# HELP tomcat_servlet_request_seconds  
# TYPE tomcat_servlet_request_seconds summary
tomcat_servlet_request_seconds_count{name="dispatcherServlet",} 3.0
tomcat_servlet_request_seconds_sum{name="dispatcherServlet",} 0.118
# HELP spring_security_filterchains_authorization_before_total  
# TYPE spring_security_filterchains_authorization_before_total counter
spring_security_filterchains_authorization_before_total{spring_security_reached_filter_section="before",spring_security_filterchain_position="0",spring_security_filterchain_size="0",spring_security_reached_filter_name="none",} 3.0
# HELP executor_active_threads The approximate number of threads that are actively executing tasks
# TYPE executor_active_threads gauge
executor_active_threads{name="applicationTaskExecutor",} 0.0
# HELP hikaricp_connections_idle Idle connections
# TYPE hikaricp_connections_idle gauge
hikaricp_connections_idle{pool="HikariPool-1",} 10.0
hikaricp_connections_idle{pool="HikariPool-2",} 10.0
# HELP tomcat_cache_hit_total  
# TYPE tomcat_cache_hit_total counter
tomcat_cache_hit_total 0.0
...
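
The names above differ from the dotted names that /actuator/metrics lists: dots become underscores, counters gain a _total suffix, timers gain _seconds, and a base unit such as threads may be appended. The sketch below approximates that mapping in plain Java. It is a simplified illustration written for this post, not Micrometer's actual naming code, and the class, enum, and method names are made up for the example.

```java
public class PrometheusNameSketch {

    // Hypothetical, reduced set of meter types used only by this sketch.
    public enum MeterType { COUNTER, GAUGE, TIMER }

    // Approximates how a dotted meter name turns into a Prometheus metric name.
    public static String toPrometheusName(String meterName, MeterType type, String baseUnit) {
        // Dots become underscores in the exposed name.
        String name = meterName.replace('.', '_');
        switch (type) {
            case COUNTER:
                name = withUnit(name, baseUnit);
                if (!name.endsWith("_total")) {
                    name += "_total"; // counters conventionally end in _total
                }
                break;
            case GAUGE:
                name = withUnit(name, baseUnit);
                break;
            case TIMER:
                if (!name.endsWith("_seconds")) {
                    name += "_seconds"; // timers are reported in seconds
                }
                break;
        }
        return name;
    }

    // Appends the base unit unless it is absent or already the last name segment.
    private static String withUnit(String name, String baseUnit) {
        if (baseUnit == null || baseUnit.isEmpty() || name.endsWith("_" + baseUnit)) {
            return name;
        }
        return name + "_" + baseUnit;
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("tomcat.cache.hit", MeterType.COUNTER, null));
        System.out.println(toPrometheusName("executor.active", MeterType.GAUGE, "threads"));
        System.out.println(toPrometheusName("tomcat.servlet.request", MeterType.TIMER, null));
    }
}
```

Knowing this mapping helps when searching Prometheus or Grafana for a metric first seen under its dotted name in /actuator/metrics.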

At this point, Prometheus is ready to begin collecting data from the Spring Boot application, and Grafana can be configured to visualize that data using the exposed metrics.

2. Pulling Metrics with Prometheus

Prometheus is an open-source monitoring system designed specifically for handling time series data. It is widely adopted due to its efficient data model and strong integration capabilities. Key features include:

Pull-based metric collection:
Prometheus collects metrics by regularly scraping predefined HTTP endpoints. This pull-based approach allows Prometheus to control the schedule and frequency of data collection, offering centralized control and predictable network usage.
Optimized time series database:
Prometheus includes its own time series database, purpose-built for storing and querying high-volume metrics efficiently. It supports fast lookups and real-time analysis of historical trends.
Powerful visualization support:
Prometheus integrates seamlessly with visualization tools such as Grafana, enabling users to build dynamic dashboards and explore complex queries through intuitive interfaces.
Multi-dimensional data model:
Each metric in Prometheus is uniquely identified by a metric name and a set of key-value pairs called labels. This model allows flexible filtering and aggregation of data across various dimensions such as instance, application, or region.
PromQL query language:
Prometheus offers PromQL, a dedicated query language for selecting and aggregating time series data. PromQL supports complex expressions and is commonly used to define alerting rules and derive operational insights.
High availability and scalability:
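A single Prometheus server runs standalone and does not cluster natively. High availability is typically achieved by running two or more identically configured servers that scrape the same targets, and larger setups scale out through federation, sharding targets across servers, or remote storage integrations.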

Thanks to these capabilities, Prometheus has become a trusted monitoring solution for infrastructures of all sizes, from small-scale applications to complex, globally distributed systems.
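
To make the multi-dimensional data model concrete, here is a small sketch that splits one line of the Prometheus text format into its metric name, labels, and value. It assumes simple lines like the ones shown in the /actuator/prometheus output (no timestamps, no commas or escaped quotes inside label values). SampleLineParser and Sample are hypothetical names, not a Prometheus client API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SampleLineParser {

    /** One parsed sample: metric name, label set, and numeric value. */
    public static final class Sample {
        public final String name;
        public final Map<String, String> labels;
        public final double value;

        Sample(String name, Map<String, String> labels, double value) {
            this.name = name;
            this.labels = labels;
            this.value = value;
        }
    }

    public static Sample parse(String line) {
        String trimmed = line.trim();
        // The value is the last whitespace-separated token on the line.
        int lastSpace = trimmed.lastIndexOf(' ');
        double value = Double.parseDouble(trimmed.substring(lastSpace + 1));
        String series = trimmed.substring(0, lastSpace).trim();

        Map<String, String> labels = new LinkedHashMap<>();
        int open = series.indexOf('{');
        if (open < 0) {
            return new Sample(series, labels, value); // no labels on this line
        }
        String name = series.substring(0, open);
        String body = series.substring(open + 1, series.lastIndexOf('}'));
        for (String pair : body.split(",")) {
            if (pair.isBlank()) {
                continue; // tolerates a trailing comma inside the braces
            }
            int eq = pair.indexOf('=');
            String key = pair.substring(0, eq).trim();
            String quoted = pair.substring(eq + 1).trim();
            labels.put(key, quoted.substring(1, quoted.length() - 1)); // drop the quotes
        }
        return new Sample(name, labels, value);
    }

    public static void main(String[] args) {
        Sample a = parse("hikaricp_connections_idle{pool=\"HikariPool-1\",} 10.0");
        Sample b = parse("hikaricp_connections_idle{pool=\"HikariPool-2\",} 10.0");
        // Same metric name, different label values: two separate time series.
        System.out.println(a.name + " " + a.labels + " " + a.value);
        System.out.println(b.name + " " + b.labels + " " + b.value);
    }
}
```

The two hikaricp_connections_idle lines share a metric name but carry different pool labels, so Prometheus stores them as two distinct series that can be filtered or aggregated by pool.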

Time Series Data
Time series monitoring involves tracking and analyzing data points collected over time. These data points often originate from sources such as sensors, application logs, network traffic, or financial transactions. Monitoring time series data enables users to detect spikes or anomalies, identify patterns, and make predictions based on historical trends. It plays a critical role in performance optimization, predictive maintenance, and informed operational decision-making.
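
As a rough illustration of spotting a spike in time series data, the sketch below computes the per-second increase between consecutive counter samples and reports where it exceeds a threshold. It assumes samples with strictly increasing timestamps and a counter that never resets. CounterRateSketch, Sample, perSecondRates, and spikes are hypothetical names, and the logic is far simpler than PromQL's rate(), which also accounts for counter resets.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterRateSketch {

    /** One scraped sample: a timestamp in seconds and the counter value at that time. */
    public static final class Sample {
        final long epochSeconds;
        final double value;

        public Sample(long epochSeconds, double value) {
            this.epochSeconds = epochSeconds;
            this.value = value;
        }
    }

    // Per-second increase between each pair of consecutive samples.
    public static List<Double> perSecondRates(List<Sample> samples) {
        List<Double> rates = new ArrayList<>();
        for (int i = 1; i < samples.size(); i++) {
            Sample previous = samples.get(i - 1);
            Sample current = samples.get(i);
            double seconds = current.epochSeconds - previous.epochSeconds;
            rates.add((current.value - previous.value) / seconds);
        }
        return rates;
    }

    // Indexes into the rate list where the rate is above the threshold.
    public static List<Integer> spikes(List<Double> rates, double threshold) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < rates.size(); i++) {
            if (rates.get(i) > threshold) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Made-up request counter sampled every 5 seconds.
        List<Sample> samples = List.of(
                new Sample(0, 0), new Sample(5, 10), new Sample(10, 20), new Sample(15, 120));
        List<Double> rates = perSecondRates(samples);
        System.out.println("rates per second: " + rates);
        System.out.println("spikes above 10/s at: " + spikes(rates, 10.0));
    }
}
```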

2-1. Installing Prometheus

Prometheus ships as precompiled binaries for multiple platforms. This post uses macOS, so download the latest stable darwin release from the official Prometheus website.

Once downloaded, extract the archive using the following command:

$ tar xvfz prometheus-*.tar.gz

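Navigate into the extracted directory, which contains the main executable and configuration files. The listing below uses lt, which appears to be a shell alias for a tree-style listing; plain ls works as well: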

console

$ cd prometheus-*
$ lt
Permissions Size User       Date Modified Name
drwxr-xr-x@    - catsriding  4 May 03:02   .
drwxr-xr-x@    - catsriding  4 May 03:00  ├──  console_libraries
.rw-r--r--@ 2.9k catsriding  4 May 03:00  │  ├──  menu.lib
.rw-r--r--@ 6.2k catsriding  4 May 03:00  │  └──  prom.lib
drwxr-xr-x@    - catsriding  4 May 03:00  ├──  consoles
.rw-r--r--@  616 catsriding  4 May 03:00  │  ├──  index.html.example
.rw-r--r--@ 2.7k catsriding  4 May 03:00  │  ├──  node-cpu.html
.rw-r--r--@ 3.5k catsriding  4 May 03:00  │  ├──  node-disk.html
.rw-r--r--@ 5.8k catsriding  4 May 03:00  │  ├──  node-overview.html
.rw-r--r--@ 1.5k catsriding  4 May 03:00  │  ├──  node.html
.rw-r--r--@ 4.1k catsriding  4 May 03:00  │  ├──  prometheus-overview.html
.rw-r--r--@ 1.3k catsriding  4 May 03:00  │  └──  prometheus.html
.rw-r--r--@  11k catsriding  4 May 03:00  ├──  LICENSE
.rw-r--r--@ 3.8k catsriding  4 May 03:00  ├──  NOTICE
.rwxr-xr-x@ 139M catsriding  4 May 02:46  ├──  prometheus
.rw-r--r--@  934 catsriding  4 May 03:00  ├──  prometheus.yml
.rwxr-xr-x@ 133M catsriding  4 May 02:46  └──  promtool

prometheus: The Prometheus server binary.
prometheus.yml: The primary configuration file used to define scrape targets and intervals.
promtool: A utility for validating configuration and testing rules.
consoles and console_libraries: Example console templates and the template libraries they use, for serving custom dashboard pages from Prometheus itself.

This directory is self-contained and allows Prometheus to run without installation. You can start the server directly using the prometheus binary, and configure data collection targets in the prometheus.yml file.

2-2. Setting Up Prometheus Metrics

To collect metrics from a Spring Boot application, Prometheus needs to be configured to scrape the endpoint exposed by Micrometer. This requires editing the prometheus.yml configuration file to define the scrape targets.

Here is a basic structure of the prometheus.yml file:

prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

To monitor a Spring Boot application, append an additional scrape configuration:

prometheus.yml

...
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'waves-server'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080']

job_name: Used as a label in Prometheus to identify the source.
metrics_path: The path where Prometheus scrapes metrics from the target.
scrape_interval: How frequently Prometheus pulls metrics for this job (here every 5 seconds, overriding the global 15 seconds).
targets: Host and port of the service exposing metrics.

After saving the updated configuration, start the Prometheus server with:

$ ./prometheus --config.file=prometheus.yml

macOS users may encounter a security warning:

"prometheus" cannot be opened because the developer cannot be verified.

In this case, go to System Settings ▸ Privacy & Security, find the warning message, and allow the app to run.

Once permission is granted, restart the server:

$ ./prometheus --config.file=./prometheus.yml
ts=2024-05-06T06:12:20.401Z caller=main.go:573 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2024-05-06T06:12:20.401Z caller=main.go:617 level=info msg="Starting Prometheus Server" mode=server version="(version=2.52.0-rc.1, branch=HEAD, revision=48e6e169435e934abea41db3d37b80cd5066c10f)"
ts=2024-05-06T06:12:20.401Z caller=main.go:622 level=info build_context="(go=go1.22.2, platform=darwin/amd64, user=root@66286a36f6af, date=20240503-17:44:34, tags=netgo,builtinassets,stringlabels)"
ts=2024-05-06T06:12:20.401Z caller=main.go:623 level=info host_details=(darwin)
ts=2024-05-06T06:12:20.401Z caller=main.go:624 level=info fd_limits="(soft=122880, hard=unlimited)"
ts=2024-05-06T06:12:20.401Z caller=main.go:625 level=info vm_limits="(soft=unlimited, hard=unlimited)"
...

To run Prometheus on a different port, use the --web.listen-address option:

$ ./prometheus --config.file=./prometheus.yml --web.listen-address=:9080

When Prometheus is up and running, open http://localhost:9090 in your browser (or the port you passed to --web.listen-address).

Navigate to Status ▸ Targets to verify that the Spring Boot application is being scraped successfully.
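
The same check can be scripted against the Prometheus HTTP API. The sketch below builds an instant query for the built-in up metric of the waves-server job and sends it to /api/v1/query, assuming Prometheus listens on localhost:9090. PrometheusInstantQuery and its methods are illustrative names. In the JSON response, a sample value of 1 means the most recent scrape succeeded and 0 means it failed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class PrometheusInstantQuery {

    // Builds the instant-query URL; the PromQL expression must be URL-encoded.
    public static URI queryUri(String prometheusUrl, String promQl) {
        String encoded = URLEncoder.encode(promQl, StandardCharsets.UTF_8);
        return URI.create(prometheusUrl + "/api/v1/query?query=" + encoded);
    }

    // Runs the query and returns the raw JSON response body.
    public static String runQuery(HttpClient client, String prometheusUrl, String promQl)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(queryUri(prometheusUrl, promQl))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // Assumption: Prometheus runs locally on its default port 9090.
            String json = runQuery(HttpClient.newHttpClient(),
                    "http://localhost:9090", "up{job=\"waves-server\"}");
            System.out.println(json);
        } catch (IOException e) {
            System.err.println("Could not reach Prometheus: " + e.getMessage());
        }
    }
}
```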

At this point, Prometheus is collecting metrics from your application. For visualizing this data over time, the next step is to integrate with Grafana.

3. Visualizing Metrics with Grafana

Grafana is a powerful open-source analytics and visualization platform that transforms complex time-series data into rich, meaningful dashboards. When combined with Prometheus, it enables developers to monitor system performance and health in real time through intuitive visualizations. This plays a crucial role in identifying issues early and maintaining the stability of system infrastructure.

Grafana’s user-friendly interface accelerates access to important insights and simplifies data exploration. Users can connect multiple data sources and build custom dashboards tailored to specific monitoring needs. These capabilities make Grafana a popular choice across various domains, including network monitoring, server management, and performance analysis.

In this setup, Prometheus collects and stores the metrics, while Grafana queries Prometheus and renders the results as dashboards.

3-1. Installing Grafana

To install Grafana, visit the official Grafana download page and choose the appropriate package for your operating system. Grafana supports a wide range of platforms, including Linux, Windows, and macOS. It also offers a Docker image, which provides a quick and convenient installation method. Unlike Prometheus, Grafana needs no scrape configuration: it starts with its defaults and only has to be pointed at a data source afterwards.

In this example, we will use Docker to install and run Grafana.

First, pull the Grafana image from Docker Hub:

$ docker pull grafana/grafana

Then, run the container with the following command. By default, Grafana listens on port 3000:

$ docker run -d \
--name waves-grafana \
-p 3000:3000 \
grafana/grafana

Once the container is up and running, open a web browser and navigate to http://localhost:3000. You should see the Grafana login screen.

3-2. Designing Grafana Dashboards

When logging into Grafana for the first time, the default credentials are admin for both the username and password. Although it is recommended to change these credentials immediately, our focus here will be on connecting Prometheus and visualizing the collected metrics.

To visualize data, Grafana first needs to connect to an external source of metrics. These sources are referred to as data sources in Grafana, and Prometheus is one of the supported types.

To add Prometheus as a data source, go to the sidebar menu and navigate to Connections ▸ Add new connection. Then, search for Prometheus and select it.

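Enter the URL of the Prometheus server. Because Grafana runs inside a Docker container here, localhost would refer to the container itself rather than the host machine. With Docker Desktop on macOS or Windows, use http://host.docker.internal:9090 instead. On Linux, host.docker.internal is not defined by default, so it has to be mapped when starting the container (for example with --add-host=host.docker.internal:host-gateway).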

After entering the server address, click Save & test to confirm that Grafana can successfully connect to Prometheus. This step verifies that metrics can be retrieved from the source without issue.

Once the data source is connected, you can begin visualizing metrics. Instead of building panels from scratch, consider using prebuilt dashboards available through the Grafana community. These templates help you get started quickly and offer professionally structured visualizations.

You can browse and search for dashboards by category on Grafana Labs, then import one that fits your monitoring use case.

For example, the popular JVM (Micrometer) dashboard is a great starting point for Java applications. Click the Copy ID to clipboard button to copy the dashboard ID.

Return to Grafana and open the Import dashboard page from the Dashboards menu. Paste the copied ID and proceed with the import.

If the ID is valid, you will be prompted to choose a data source. Select the Prometheus data source you added earlier.

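After completing the import, the dashboard will display real-time metrics from your application, such as JVM memory usage and request performance. If the panels stay empty, check the dashboard's description for labels it expects on the metrics; the JVM (Micrometer) dashboard, for instance, filters by an application label.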

4. No measure, no control

With Spring Boot Actuator, Micrometer, Prometheus, and Grafana, we have successfully established a foundational metrics monitoring environment for Spring Boot applications. Although not covered in this article, these system metrics are sensitive information and should be protected with appropriate security measures.

For example, on this blog's own server, Prometheus and Grafana are running as Docker containers. Prometheus is configured to communicate only within the EC2 instance, and access to Grafana is controlled via AWS Security Groups.

While building application features is important, designing a resilient infrastructure is equally critical. A robust monitoring environment plays a central role in this. It enables cost-effective infrastructure management and allows for proactive actions, such as scaling the server immediately when performance degrades. This approach lays the foundation for maintaining a stable production system.