Prometheus with Grafana from Basic to Advanced Level: Complete Prometheus Guide to Master DevOps Infra Monitoring

Introduction

Monitoring is the backbone of modern DevOps and SRE (Site Reliability Engineering) practices. Without proper monitoring, even the most robust infrastructure can fail silently. This is where Prometheus and Grafana come in — one collects and processes metrics, the other visualizes them beautifully. Together, they form one of the most powerful monitoring stacks in the DevOps ecosystem.

In this complete guide, we’ll cover everything about Prometheus and Grafana, from basic setup to advanced scaling, and how you can use them to master infrastructure monitoring like a pro.

Getting Started with Prometheus

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. Originally developed at SoundCloud, it is now a graduated CNCF (Cloud Native Computing Foundation) project.

Core Features of Prometheus

Multi-dimensional data model
Powerful query language (PromQL)
Pull-based metric collection
Integrated alerting with Alertmanager
Easy integration with Grafana

Prometheus Architecture

The main components are:

Prometheus server (scrapes and stores metrics)
Exporters (expose application/system metrics)
Alertmanager (manages alerts)
Grafana (a separate project, commonly paired with Prometheus for visualization)

Installing and Setting Up Prometheus

Install Prometheus on Linux


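# Download and unpack the release tarball (v2.45.0 shown; check the releases page for the current version)
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvf prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64
# Start the server with the bundled sample config; the web UI listens on port 9090 by default
./prometheus --config.file=prometheus.yml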

Configuring prometheus.yml

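A simple configuration that scrapes a Node Exporter running on the same host (Node Exporter listens on port 9100 by default) might look like: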


scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

Understanding Prometheus Data Model

Prometheus stores time-series metrics identified by:

Metric name
Labels (key-value pairs)

Metric types:

Counter: Only increases (e.g., requests served)
Gauge: Goes up or down (e.g., memory in use)
Histogram: Counts observations in configurable buckets (e.g., request latency)
Summary: Similar to a histogram, but computes quantiles on the client side
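
To make the data model more concrete, here is a minimal Python sketch of how a series can be identified by its metric name plus labels, and how a histogram keeps cumulative bucket counts. It is an illustration only, not how Prometheus is implemented internally; the names series_key and Histogram and the bucket bounds are assumptions made for this example.

```python
import math


def series_key(name, labels):
    """Identify a time series by metric name plus its sorted label pairs."""
    pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}}"


class Histogram:
    """Toy histogram with cumulative buckets, similar to the `le` convention."""

    def __init__(self, bounds):
        # Upper bounds are inclusive; +Inf catches every observation.
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0  # running sum of all observed values

    def observe(self, value):
        self.total += value
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                # Cumulative: every bucket whose bound is >= value counts it.
                self.counts[i] += 1


# Two different label sets on the same metric name are two different series.
print(series_key("http_requests_total", {"status": "500", "method": "GET"}))
print(series_key("http_requests_total", {"status": "200", "method": "GET"}))

h = Histogram([0.1, 0.5, 1.0])  # hypothetical latency buckets in seconds
for latency in (0.05, 0.3, 0.7, 2.0):
    h.observe(latency)
print(h.counts)  # [1, 2, 3, 4]
```

Because the buckets are cumulative, the last (+Inf) bucket always equals the total number of observations.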

PromQL – Prometheus Query Language

PromQL is the heart of Prometheus.

Basic Queries

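up → 1 if the last scrape of a target succeeded, 0 if it failed
node_cpu_seconds_total → Cumulative CPU seconds per core and mode (a counter; wrap it in rate() to get usage)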

Advanced Queries

CPU usage %:


100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

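Rate of 5xx responses per second (assuming your application exposes an http_requests_total counter with a status label):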


rate(http_requests_total{status=~"5.."}[1m])
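
To show roughly what rate() does with a counter, here is a simplified Python sketch. It computes the per-second increase over a window and treats a drop in value as a counter reset. The real rate() also extrapolates to the window boundaries, which this sketch deliberately omits; simple_rate and cpu_usage_percent are hypothetical helper names.

```python
def simple_rate(samples):
    """Approximate per-second increase of a counter over a window.

    `samples` is a list of (timestamp_seconds, value) pairs, oldest first.
    A drop in value is treated as a counter reset (e.g. a process restart).
    """
    if len(samples) < 2:
        return None  # not enough points in the window
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so count `curr` itself.
        increase += (curr - prev) if curr >= prev else curr
    return increase / elapsed


def cpu_usage_percent(idle_rate):
    """Mirror the `100 - idle_rate * 100` shape of the CPU query above."""
    return 100 - idle_rate * 100


# Idle-mode CPU seconds sampled every 30 s: 45 idle seconds in 60 s of wall time.
idle_samples = [(0, 100.0), (30, 122.5), (60, 145.0)]
idle_rate = simple_rate(idle_samples)
print(idle_rate)                     # 0.75
print(cpu_usage_percent(idle_rate))  # 25.0
```

This is also why a raw counter such as node_cpu_seconds_total is rarely useful on its own: the meaningful signal is how fast it grows.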

Exporters in Prometheus

Exporters are small programs that expose metrics.

Popular Exporters

Node Exporter: System metrics (CPU, memory, disk)
Blackbox Exporter: Endpoint probing (HTTP, DNS, TCP)
Database Exporters: MySQL, PostgreSQL, Redis
Custom Exporters: Write your own with Python/Go
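
As a rough illustration of what a custom exporter does, the sketch below renders a metric in the Prometheus text exposition format and serves it on /metrics using only the Python standard library. In practice you would more likely use the official client library (prometheus_client for Python), which also handles label escaping; the metric demo_queue_depth, its label, and the helper names are all made up for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    Label values are not escaped here, to keep the sketch short.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        selector = f"{{{pairs}}}" if pairs else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Hypothetical metric; a real exporter would read live values here.
        body = render_metric(
            "demo_queue_depth", "Jobs waiting in the demo queue.", "gauge",
            [({"queue": "default"}, 3)],
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port):
    """Start the exporter; blocks until interrupted."""
    HTTPServer(("", port), MetricsHandler).serve_forever()

# serve(8000) would expose the metric at http://localhost:8000/metrics
# (port 8000 is an arbitrary choice for this sketch).
```

Prometheus would then scrape this endpoint like any other target listed under static_configs.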

Alerting with Prometheus and Alertmanager

Alerts are defined in rule files and evaluated by the Prometheus server, while Alertmanager handles grouping, deduplication, routing, and notifications.

Example Alert Rule


groups:
- name: example
  rules:
  - alert: HighCPUUsage
    expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      description: "CPU usage high on {{ $labels.instance }}"
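
The for: 5m clause is easy to misread, so here is a small Python sketch of the state transitions it implies: an alert is pending as soon as the expression is true, and fires only once it has stayed true for the full duration. This is a simplified model under the assumption of evenly spaced evaluations; alert_states is a hypothetical helper, not part of Prometheus.

```python
def alert_states(condition_by_eval, eval_interval_s, for_s):
    """Return the alert state at each rule evaluation.

    `condition_by_eval` holds one boolean per evaluation: did the
    expression return a result at that point in time?
    """
    states = []
    active_since = None  # evaluation time when the condition first became true
    for i, is_true in enumerate(condition_by_eval):
        now = i * eval_interval_s
        if not is_true:
            active_since = None  # a single false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Evaluated every 60 s with `for: 5m` (300 s): fires at the sixth evaluation.
print(alert_states([True] * 7, 60, 300))
# ['pending', 'pending', 'pending', 'pending', 'pending', 'firing', 'firing']
```

A brief dip below the threshold resets the timer, which is what keeps short spikes from paging anyone.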

Getting Started with Grafana

Grafana is an open-source visualization tool that integrates seamlessly with Prometheus.

Connect Grafana to Prometheus

Go to Connections → Data sources (Configuration → Data sources in older Grafana versions)
Select Prometheus
Add the Prometheus URL (e.g., http://localhost:9090)
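
The same steps can also be scripted against Grafana's HTTP API. The sketch below only builds the request for POST /api/datasources and does not send it; the Grafana address (port 3000 is Grafana's default), the data source name, and the token are placeholders you would replace with your own values.

```python
import json
import urllib.request


def build_datasource_request(grafana_url, token, prometheus_url):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",   # display name, chosen for this example
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",      # Grafana's backend queries Prometheus
        "isDefault": True,
    }
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# "YOUR_TOKEN" is a placeholder for a Grafana service account token.
req = build_datasource_request(
    "http://localhost:3000", "YOUR_TOKEN", "http://localhost:9090"
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Scripting this step is mostly useful when Grafana is set up automatically; for a one-off setup the UI is simpler.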

Visualizing Metrics with Grafana Dashboards

Grafana allows you to create beautiful, interactive dashboards.

Panels: Time series (the successor to the older Graph panel), Gauge, Table, Heatmap
Variables: Create dynamic dashboards
Community Dashboards: Import from Grafana Labs

Advanced Grafana Features

Annotations: Mark important events on graphs
Alerting: Built-in alert system
Plugins: Extend Grafana functionality
Loki + Tempo: Logs and Traces in Grafana

Scaling Prometheus

A single Prometheus server stores data on local disk and is not clustered, so it runs into limits with very large metric volumes and with long-term storage.

Solutions

Federation: Hierarchical Prometheus servers
Remote Write/Read: Send data to external storage
Thanos & Cortex: Long-term, scalable solutions
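
Federation works by having one Prometheus server scrape the /federate endpoint of another, passing one or more match[] selectors that choose which series to pull. The Python sketch below just builds such a URL so you can see its shape; the host name and the selectors are hypothetical.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Pull only node_exporter series and pre-aggregated "job:" recording rules.
url = federate_url(
    "http://prometheus-dc1:9090",  # hypothetical lower-level server
    ['{job="node_exporter"}', '{__name__=~"job:.*"}'],
)
print(url)
```

Keeping the selectors narrow matters: federating every series from every server simply moves the scaling problem to the top-level instance.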

Prometheus in Kubernetes

Kubernetes and Prometheus are a natural fit.

kube-prometheus stack: Preconfigured Prometheus, Alertmanager, and Grafana setup
Service Discovery: Automatically discovers pods/nodes
Grafana Dashboards: Prebuilt dashboards for Kubernetes

CI/CD and Automation

Prometheus can both monitor CI/CD pipelines and be deployed by them.

Jenkins/GitHub Actions: Monitor build pipelines
Ansible/Terraform: Automate Prometheus deployments
Continuous Monitoring: Integrate into DevOps lifecycle

Best Practices

Keep dashboards clean and minimal
Use alert thresholds wisely (avoid alert fatigue)
Secure Grafana with auth & SSL
Limit high-cardinality metrics

Challenges and Limitations

High cardinality can overload Prometheus
Scaling requires federation or external solutions
Local retention is limited (15 days by default), so older data is deleted unless you raise the retention or use remote storage
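
To see why high cardinality is a problem, it helps to estimate how many series a single metric can produce. In the worst case it is the product of the number of distinct values of each label. The sketch below does that arithmetic; the label names and counts are invented for illustration.

```python
from math import prod


def max_series(label_value_counts):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())


labels = {"instance": 50, "method": 5, "status": 10}
print(max_series(labels))  # 2500

# Adding an unbounded label such as a user ID multiplies everything.
labels["user_id"] = 10_000
print(max_series(labels))  # 25000000
```

This is why values like user IDs, request IDs, or full URLs are usually kept out of labels.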

Future of Prometheus and Grafana

With the rise of observability, monitoring is evolving:

AI/ML-based anomaly detection
Unified metrics, logs, and traces
Cloud-native observability platforms

Conclusion

Prometheus and Grafana together provide a complete monitoring solution for modern DevOps teams. From basic installation to advanced scaling with Thanos, they offer unmatched flexibility, reliability, and power. Mastering them means mastering infrastructure observability, a skill every DevOps engineer needs today.

FAQs

Q1. Is Prometheus better than other monitoring tools?
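It depends on the use case. Prometheus is a particularly strong fit for cloud-native and containerized environments, where its pull model, labels, and service discovery work well.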

Q2. Can Grafana work without Prometheus?
Yes, Grafana supports multiple data sources like InfluxDB, Elasticsearch, and Loki.

Q3. What is the difference between PromQL and SQL?
PromQL is designed for time-series data, while SQL is for relational databases.

Q4. How do I monitor Kubernetes with Prometheus?
Use the kube-prometheus stack, which comes with Prometheus, Alertmanager, and Grafana preconfigured.

Q5. Can Prometheus store metrics long-term?
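Not by default. Local storage keeps 15 days of data unless you raise the retention setting, and it is not replicated. For durable long-term storage, use a solution such as Thanos or Cortex.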

Prabhat Korshub Blogs
