
Prometheus with Grafana from BASIC to ADVANCED Level: Complete Prometheus Guide to Master DevOps Infra Monitoring

 

Introduction

Monitoring is the backbone of modern DevOps and SRE (Site Reliability Engineering) practices. Without proper monitoring, even the most robust infrastructure can fail silently. This is where Prometheus and Grafana come in — one collects and processes metrics, the other visualizes them beautifully. Together, they form one of the most powerful monitoring stacks in the DevOps ecosystem.

In this complete guide, we’ll walk through Prometheus and Grafana from basic setup to advanced scaling, and show how you can use them to monitor infrastructure like a pro.


Getting Started with Prometheus

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. Originally developed at SoundCloud, it’s now a CNCF (Cloud Native Computing Foundation) project.

Core Features of Prometheus

  • Multi-dimensional data model

  • Powerful query language (PromQL)

  • Pull-based metric collection

  • Integrated alerting with Alertmanager

  • Easy integration with Grafana

Prometheus Architecture

The main components are:

  • Prometheus server (scrapes and stores metrics)

  • Exporters (expose application/system metrics)

  • Alertmanager (manages alerts)

  • Grafana (for visualization)


Installing and Setting Up Prometheus

Install Prometheus on Linux

    wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
    tar xvf prometheus-2.45.0.linux-amd64.tar.gz
    cd prometheus-2.45.0.linux-amd64
    ./prometheus --config.file=prometheus.yml

Configuring prometheus.yml

A simple configuration might look like:

    scrape_configs:
      - job_name: 'node_exporter'
        static_configs:
          - targets: ['localhost:9100']

Understanding Prometheus Data Model

Prometheus stores time-series metrics identified by:

  • Metric name

  • Labels (key-value pairs)
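To make that identity concrete, the Python snippet below splits one line of Prometheus's text exposition format (the format exporters serve) into metric name, labels, and value. It is a simplified sketch: parse_sample is a hypothetical helper, not part of any Prometheus library, and it assumes label values contain no escaped quotes or closing braces and ignores the optional timestamp.

```python
import re

# name{label="value",...} value [timestamp]
# Simplified: escaped quotes or "}" inside label values are not supported.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one exposition-format sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample('node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67')
print(name, labels, value)  # node_cpu_seconds_total {'cpu': '0', 'mode': 'idle'} 12345.67
```

The same metric name with a different label set, for example mode="user" instead of mode="idle", is a separate time series.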

Metric types:

  • Counter: Only increases, resetting to zero when the process restarts (e.g., requests served)

  • Gauge: Goes up or down (e.g., memory in use)

  • Histogram: Counts observations in configurable buckets (e.g., request latency)

  • Summary: Similar to a histogram, but calculates quantiles on the client side
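A Prometheus histogram is exposed as cumulative bucket counters (name_bucket with an le, "less than or equal", label) plus name_sum and name_count. The sketch below is a hypothetical pure-Python stand-in, not the official client library, that shows how a handful of observations map onto those series; the bucket bounds and observed values are illustrative assumptions.

```python
import bisect


class SimpleHistogram:
    """Minimal stand-in for a Prometheus histogram (not the client library)."""

    def __init__(self, name: str, bounds: list[float]):
        self.name = name
        self.bounds = sorted(bounds)                  # finite upper bounds
        self.counts = [0] * (len(self.bounds) + 1)    # last slot is the +Inf bucket
        self.total = 0.0
        self.n = 0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value (le is inclusive).
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value
        self.n += 1

    def expose(self) -> list[str]:
        """Render cumulative buckets, _sum and _count as text-format lines."""
        lines, running = [], 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count                           # buckets are cumulative
            le = "+Inf" if bound == float("inf") else repr(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.n}")
        return lines


hist = SimpleHistogram("http_request_duration_seconds", [0.1, 0.5, 1.0])
for latency in (0.05, 0.3, 0.3, 2.0):
    hist.observe(latency)
print("\n".join(hist.expose()))
```

Because the buckets are cumulative, the +Inf bucket always equals _count. PromQL's histogram_quantile() estimates quantiles from these bucket counters on the server, whereas a summary ships quantiles already computed by the client.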


PromQL – Prometheus Query Language

PromQL is the heart of Prometheus: dashboards, ad-hoc queries, and alerting rules are all written in it.

Basic Queries

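  • up → 1 if the last scrape of a target succeeded, 0 if it failed

  • node_cpu_seconds_total → Cumulative seconds each CPU has spent in each mode (a counter, so wrap it in rate() to get usage)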

Advanced Queries

  • CPU usage %:

    100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
  • Rate of HTTP 5xx errors (responses per second):

    rate(http_requests_total{status=~"5.."}[1m])
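Both queries rely on rate(), which takes the counter samples inside the range window, adds up the increases (treating any drop in value as a counter reset), and divides by the elapsed time. The sketch below is a simplified model with made-up sample values: the real PromQL implementation also extrapolates to the window boundaries, so its results can differ slightly. simple_rate is a hypothetical helper name.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp_s, value) samples.

    Simplified model of PromQL rate(): handles counter resets but does not
    extrapolate to the edges of the range window.
    """
    if len(samples) < 2:
        raise ValueError("rate needs at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new increase.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical idle-seconds counter for one CPU, scraped every 15 s.
idle = [(0, 1000.0), (15, 1003.0), (30, 1006.0), (45, 1009.0), (60, 1012.0)]
idle_ratio = simple_rate(idle)            # seconds idle per second
cpu_usage_pct = 100 - idle_ratio * 100    # same arithmetic as the CPU usage query
print(f"{cpu_usage_pct:.1f}%")            # 80.0%
```

With several CPUs, avg by(instance) averages the per-CPU idle ratios before the subtraction, giving one usage figure per machine.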

Exporters in Prometheus

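Exporters are small programs that collect metrics from a system or application and expose them over HTTP (usually at /metrics) in a format Prometheus can scrape.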

Popular Exporters

  • Node Exporter: System metrics (CPU, memory, disk)

  • Blackbox Exporter: Endpoint probing (HTTP, DNS, TCP)

  • Database Exporters: MySQL, PostgreSQL, Redis

  • Custom Exporters: Write your own with Python/Go
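At its core, an exporter is an HTTP server that returns current metric values in the text exposition format whenever Prometheus scrapes it. The sketch below shows that contract using only Python's standard library; the metric demo_uptime_seconds and the serve() helper are hypothetical, and for real exporters the official prometheus_client library is the usual choice.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.time()


def render_metrics() -> str:
    """Build the response body in the Prometheus text exposition format."""
    uptime = time.time() - START
    return (
        "# HELP demo_uptime_seconds Seconds since the exporter started.\n"
        "# TYPE demo_uptime_seconds gauge\n"
        f"demo_uptime_seconds {uptime:.3f}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        # Content type used for the text format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted (hypothetical default port)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve(8000) and adding 'localhost:8000' to a job's targets list in prometheus.yml would let Prometheus scrape it, exactly like the node_exporter job above.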


Alerting with Prometheus and Alertmanager

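Alerting rules are defined in Prometheus, which evaluates them on a schedule and sends firing alerts to Alertmanager; Alertmanager then deduplicates, groups, and routes them to notification channels such as email or Slack.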

Example Alert Rule

    groups:
      - name: example
        rules:
          - alert: HighCPUUsage
            expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
            for: 5m
            labels:
              severity: warning
            annotations:
              description: "CPU usage above 80% on {{ $labels.instance }}"
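The for: 5m clause means the expression must keep returning results on every evaluation for five minutes before the alert moves from pending to firing and is sent to Alertmanager. The sketch below is a simplified model of that state machine, assuming a fixed evaluation interval; alert_states and AlertState are hypothetical names, not Prometheus APIs.

```python
from enum import Enum


class AlertState(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    FIRING = "firing"


def alert_states(results: list[bool], interval_s: float, for_s: float) -> list[AlertState]:
    """Replay rule evaluations and return the alert state after each one.

    results[i] says whether the alert expression returned a value at step i.
    """
    states = []
    active_since = None  # time the expression first became true
    for i, is_true in enumerate(results):
        now = i * interval_s
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append(AlertState.INACTIVE)
            continue
        if active_since is None:
            active_since = now
        if now - active_since >= for_s:
            states.append(AlertState.FIRING)
        else:
            states.append(AlertState.PENDING)
    return states


# 1-minute evaluation interval (Prometheus's default evaluation_interval), for: 5m.
print([s.value for s in alert_states([True] * 7, 60, 300)])
```

A single evaluation where CPU usage dips below 80% resets the timer, which is what keeps short spikes from paging anyone.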

Getting Started with Grafana

Grafana is an open-source visualization tool that integrates seamlessly with Prometheus.

Connect Grafana to Prometheus

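  1. Go to Connections → Data sources (Configuration → Data sources in older Grafana versions) and click Add data source

  2. Select Prometheus

  3. Add the Prometheus URL (e.g., http://localhost:9090)

  4. Click Save & test to confirm Grafana can reach Prometheus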
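Once connected, Grafana sends PromQL to Prometheus over its HTTP API (for example /api/v1/query for instant queries and /api/v1/query_range for graphs). The sketch below issues the same kind of instant query with Python's standard library, assuming Prometheus is reachable at the URL from step 3; PROMETHEUS_URL, instant_query_url, and instant_query are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: same URL as the data source


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for Prometheus's instant-query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def instant_query(base_url: str, promql: str) -> list[dict]:
    """Run a PromQL query and return the 'result' list from the JSON response."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=5) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

For a query such as up, each entry in the returned list has a metric field (the label set) and a value field holding a [timestamp, "value"] pair. If Save & test fails, running instant_query(PROMETHEUS_URL, "up") from the Grafana host is a quick way to check connectivity.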


Visualizing Metrics with Grafana Dashboards

Grafana allows you to create beautiful, interactive dashboards.

  • Panels: Graph, Gauge, Table, Heatmap

  • Variables: Create dynamic dashboards

  • Community Dashboards: Import from Grafana Labs


Advanced Grafana Features

  • Annotations: Mark important events on graphs

  • Alerting: Built-in alert system

  • Plugins: Extend Grafana functionality

  • Loki + Tempo: Logs and Traces in Grafana


Scaling Prometheus

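Prometheus has scaling challenges, especially with long-term storage: a single server keeps data on its local disk and has no built-in clustering, so very large numbers of series or long retention periods can outgrow one machine.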

Solutions

  • Federation: Hierarchical Prometheus servers

  • Remote Write/Read: Send data to external storage

  • Thanos & Cortex: Long-term, scalable solutions


Prometheus in Kubernetes

Kubernetes and Prometheus are a natural fit.

  • kube-prometheus stack: Deploys Prometheus, Alertmanager, Grafana, and common exporters with preconfigured rules and dashboards

  • Service Discovery: Automatically discovers pods/nodes

  • Grafana Dashboards: Prebuilt dashboards for Kubernetes
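With kubernetes_sd_configs, Prometheus discovers pods and attaches metadata labels such as __meta_kubernetes_pod_annotation_prometheus_io_scrape; relabeling rules then decide which targets to keep. The sketch below models a single relabel rule with action: keep, assuming the common (convention-only) prometheus.io/scrape: "true" pod annotation. relabel_keep is a hypothetical helper, Python's re.fullmatch stands in for Prometheus's anchored RE2 matching, and the pod data is made up.

```python
import re


def relabel_keep(targets: list[dict[str, str]], source_labels: list[str],
                 regex: str, separator: str = ";") -> list[dict[str, str]]:
    """Simplified model of a Prometheus relabel rule with action: keep.

    Joins the source label values with the separator (missing labels count as
    empty strings) and keeps the target only if the anchored regex matches.
    """
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        value = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(value):
            kept.append(labels)
    return kept


discovered = [
    {"__address__": "10.0.0.5:8080",
     "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true"},
    {"__address__": "10.0.0.6:8080"},  # pod without the annotation
]
kept = relabel_keep(discovered, ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"], "true")
print([t["__address__"] for t in kept])  # ['10.0.0.5:8080']
```

In prometheus.yml the equivalent rule sits under relabel_configs with source_labels, regex, and action: keep.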


CI/CD and Automation

Prometheus fits naturally into CI/CD and automation workflows.

  • Jenkins/GitHub Actions: Monitor build pipelines

  • Ansible/Terraform: Automate Prometheus deployments

  • Continuous Monitoring: Integrate into DevOps lifecycle


Best Practices

  • Keep dashboards clean and minimal

  • Use alert thresholds wisely (avoid alert fatigue)

  • Secure Grafana with auth & SSL

  • Limit high-cardinality metrics
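Each unique combination of label values is its own time series, so the worst-case series count for one metric is the product of the number of distinct values per label. A quick calculation with made-up label counts shows why an unbounded label such as a user ID is dangerous; worst_case_series is a hypothetical helper.

```python
from math import prod


def worst_case_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())


# Hypothetical label sets for an http_requests_total metric.
bounded = {"method": 5, "status": 10, "instance": 20}
with_user_id = {**bounded, "user_id": 100_000}

print(worst_case_series(bounded))       # 1000
print(worst_case_series(with_user_id))  # 100000000
```

Not every combination occurs in practice, so real counts are usually lower, but the bound grows multiplicatively with every label you add.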


Challenges and Limitations

  • High cardinality can overload Prometheus

  • Scaling requires federation or external solutions

  • Local storage deletes data older than the retention period (15 days by default, configurable with --storage.tsdb.retention.time)
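The Prometheus storage documentation gives a rough capacity-planning rule: disk needed is about retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample on average. The sketch below applies it; the workload numbers are illustrative assumptions, not measurements, and estimate_disk_bytes is a hypothetical helper.

```python
def estimate_disk_bytes(retention_days: float, active_series: int,
                        scrape_interval_s: float, bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB disk estimate: retention_seconds * samples_per_second * bytes_per_sample."""
    retention_s = retention_days * 24 * 3600
    samples_per_s = active_series / scrape_interval_s  # one sample per series per scrape
    return retention_s * samples_per_s * bytes_per_sample


# Assumed workload: 100,000 active series scraped every 15 s, kept for 15 days.
gib = estimate_disk_bytes(15, 100_000, 15) / 2**30
print(f"{gib:.1f} GiB")  # 16.1 GiB
```

Because the estimate is linear in retention, doubling the retention period roughly doubles the disk needed, which is one reason long-term storage is usually offloaded to systems like Thanos or Cortex.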


Future of Prometheus and Grafana

With the rise of observability, monitoring is evolving:

  • AI/ML-based anomaly detection

  • Unified metrics, logs, and traces

  • Cloud-native observability platforms


Conclusion

Prometheus and Grafana together provide a complete monitoring solution for modern DevOps teams. From basic installation to advanced scaling with Thanos, they offer flexibility, reliability, and power. Mastering them is a big step toward mastering infrastructure observability, a skill every DevOps engineer needs today.


FAQs

Q1. Is Prometheus better than other monitoring tools?
It depends on your needs, but it is a strong choice for cloud-native and containerized environments.

Q2. Can Grafana work without Prometheus?
Yes, Grafana supports multiple data sources like InfluxDB, Elasticsearch, and Loki.

Q3. What is the difference between PromQL and SQL?
PromQL is designed for time-series data, while SQL is for relational databases.

Q4. How do I monitor Kubernetes with Prometheus?
Use the kube-prometheus stack, which comes with Prometheus, Alertmanager, and Grafana preconfigured.

Q5. Can Prometheus store metrics long-term?
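Only within limits. Local storage keeps 15 days by default and can be extended with --storage.tsdb.retention.time, but it lives on a single node's disk; for durable, scalable long-term storage, use a system such as Thanos or Cortex.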
