Cloudflare Data Exploration with Python Notebooks using marimo

Harnessing Cloudflare Data: A Practical Guide to Analysis with Python and Marimo

Cloudflare sits at the edge of the internet, processing vast amounts of data about your web traffic, performance, and security threats. While the standard dashboard offers valuable insights, it only scratches the surface. To unlock the full potential of your data, you need to go deeper with custom analysis, and Python is well suited to the job.

This guide will walk you through how to use Python, specifically with an innovative notebook tool called marimo, to query, process, and visualize your Cloudflare data. By the end, you’ll be able to build your own interactive dashboards for deeper security analysis and performance monitoring.

Why Go Beyond the Standard Cloudflare Dashboard?

The default Cloudflare dashboard is excellent for a quick overview, but a custom approach offers significant advantages:

  • Deeper Insights: Ask specific, complex questions that the standard UI can’t answer. Correlate different data points to uncover hidden trends.
  • Custom Visualizations: Build charts and graphs tailored to your specific needs, focusing on the metrics that matter most to your business.
  • Automation: Create scripts that automatically fetch and analyze data, sending alerts or generating reports on a schedule.
  • Integration: Combine Cloudflare data with other data sources, such as your internal application logs or business metrics, for a holistic view.

The Modern Toolkit: Python, Marimo, and the Cloudflare API

To build our custom analysis pipeline, we’ll use a powerful combination of modern tools.

  1. The Cloudflare GraphQL API: This is the heart of our data access. Unlike traditional REST APIs, GraphQL lets you request exactly the fields you need in a single query, so you avoid both over-fetching and multiple round trips. We’ll be using the GraphQL Analytics API endpoint, https://api.cloudflare.com/client/v4/graphql.
  2. Python: The go-to language for data science, Python has a rich ecosystem of libraries that make data manipulation and visualization straightforward. We’ll rely on requests for API calls, pandas for data structuring, and plotly for charting.
  3. Marimo: This is the game-changer. Marimo is a reactive Python notebook. Unlike traditional Jupyter notebooks, where you re-run dependent cells by hand after a change, marimo tracks the relationships between your cells and UI elements. When you change a piece of code or adjust a slider, every cell that depends on it updates automatically. This makes it ideal for building interactive, app-like dashboards.
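
To make the GraphQL point concrete, here is a minimal sketch of what a request body looks like: a query string naming only the fields wanted, plus a variables object. The helper name build_graphql_payload, the placeholder zone ID, and the placeholder date are illustrative assumptions, and the field names mirror the query used later in this guide rather than a verified schema.

```python
import json

# Illustrative query: only the fields listed inside the braces are returned.
EXAMPLE_QUERY = """
query ($zoneTag: string) {
  viewer {
    zones(filter: {zoneTag: $zoneTag}) {
      httpRequests1dGroups(limit: 1, filter: {date_geq: "2024-01-01"}) {
        sum {
          requests
        }
      }
    }
  }
}
"""


def build_graphql_payload(query: str, variables: dict) -> str:
    """Serialize a GraphQL request body: one query string plus its variables."""
    return json.dumps({"query": query, "variables": variables})


# "your-zone-id" is a placeholder, not a real identifier.
body = build_graphql_payload(EXAMPLE_QUERY, {"zoneTag": "your-zone-id"})
print(sorted(json.loads(body).keys()))
```

Nothing is sent over the network here; the sketch only shows that the whole request is one JSON document with a query and its variables.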

Step-by-Step: Analyzing Cloudflare Traffic with Python and Marimo

Let’s build a simple dashboard to analyze traffic sources by country and identify potential bot activity.

Step 1: Prerequisites and Setup

First, you need to install the necessary Python libraries. Open your terminal and run:

pip install marimo requests pandas plotly

Next, you need your Cloudflare credentials:

  • Zone ID: The unique identifier for your website in Cloudflare.
  • API Token: A security token to authenticate your requests.

Security Tip: Always treat your API token like a password. Store it securely using environment variables or a secrets management tool. Never hardcode your API token directly into your script, especially if you plan to share it or commit it to a version control system.
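
As a rough sketch of the tip above, the helpers below read a required secret from the environment and mask it for logging. The function names load_secret and redact are hypothetical; the variable names you would pass in (for example CLOUDFLARE_API_TOKEN) are whatever you exported in your shell.

```python
import os


def load_secret(name: str) -> str:
    """Return a required secret from the environment, failing loudly if absent."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def redact(secret: str, visible: int = 4) -> str:
    """Mask all but the last few characters so a secret is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# A made-up value, used only to show the masking.
print(redact("abcd1234efgh5678"))  # prints ************5678
```

Failing early with a clear message is usually easier to debug than an authentication error returned later by the API.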

Step 2: Querying the Cloudflare API

We’ll start by writing a Python function to query the GraphQL API. This query will fetch the number of requests and the top countries of origin over the last 24 hours.

Create a new marimo notebook by running marimo edit in your terminal. In a cell, add the following code:

import marimo as mo
import requests
import pandas as pd
import os
from datetime import datetime, timedelta, timezone

# --- Configuration ---
# mo.ui.text() creates an interactive text input in your notebook.
# Best practice: Load secrets from environment variables.
API_TOKEN = mo.ui.text(value=os.environ.get("CLOUDFLARE_API_TOKEN", ""), label="Cloudflare API Token", kind="password")
ZONE_ID = mo.ui.text(value=os.environ.get("CLOUDFLARE_ZONE_ID", ""), label="Cloudflare Zone ID")

# --- GraphQL Query ---
# This query asks for the top 10 countries by request count since the start date in $filter.
# Dataset, field, and filter type names must match the schema available to your account;
# check them against the GraphQL schema (for example via introspection) before relying on it.
graphql_query = """
query GetTrafficAnalytics($zoneTag: string, $filter: ZoneAnalyticsFilter) {
  viewer {
    zones(filter: {zoneTag: $zoneTag}) {
      httpRequests1dGroups(limit: 10, filter: $filter, orderBy: [sum_requests_DESC]) {
        sum {
          requests
          bytes
        }
        dimensions {
          clientCountryName
        }
      }
    }
  }
}
"""

# --- API Fetch Function ---
def fetch_cloudflare_data(token, zone_id):
    # Skip the request until both credentials have been entered.
    if not token or not zone_id:
        return None
    headers = {"Authorization": f"Bearer {token}"}
    # Start from yesterday (UTC) so the daily buckets cover roughly the last 24 hours.
    since = (datetime.now(timezone.utc).date() - timedelta(days=1)).isoformat()
    variables = {
        "zoneTag": zone_id,
        "filter": {"date_geq": since}
    }
    response = requests.post(
        "https://api.cloudflare.com/client/v4/graphql",
        json={"query": graphql_query, "variables": variables},
        headers=headers,
        timeout=30 # Avoid hanging the notebook on a stalled connection
    )
    response.raise_for_status() # Raise an exception for bad status codes
    return response.json()
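
GraphQL servers commonly report query problems in an "errors" array inside an otherwise successful HTTP response, so raise_for_status() alone may not surface them. The sketch below shows one way to check for that; the names GraphQLError and unwrap_graphql are hypothetical, and the sample payloads are hand-written rather than real API output.

```python
class GraphQLError(RuntimeError):
    """Raised when a GraphQL response reports errors or carries no data."""


def unwrap_graphql(payload: dict) -> dict:
    """Return the "data" object, or raise if the response reports errors."""
    errors = payload.get("errors") or []
    if errors:
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise GraphQLError(messages)
    data = payload.get("data")
    if data is None:
        raise GraphQLError("response contained no data")
    return data


# Hand-written sample payloads, not real API responses.
ok_payload = {"data": {"viewer": {"zones": []}}, "errors": None}
bad_payload = {"data": None, "errors": [{"message": "unknown field"}]}

print(unwrap_graphql(ok_payload))
try:
    unwrap_graphql(bad_payload)
except GraphQLError as exc:
    print(f"query failed: {exc}")
```

Calling a helper like this on the result of fetch_cloudflare_data would turn a silent empty dashboard into an explicit error message.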

Step 3: Processing Data with Pandas

The API returns data in a nested JSON format. We need to parse this and load it into a pandas DataFrame, which is a powerful, table-like structure perfect for analysis.

In a new cell, add the processing logic:

# Fetch the data using the values from the interactive UI elements
api_response = fetch_cloudflare_data(API_TOKEN.value, ZONE_ID.value)

# Process the JSON response into a pandas DataFrame.
# "data" can be null when the query fails and "zones" can be empty,
# so guard both before indexing into the response.
data = (api_response or {}).get("data") or {}
zones = (data.get("viewer") or {}).get("zones") or []
if zones:
    data_groups = zones[0]["httpRequests1dGroups"]

    records = [
        {
            "country": item["dimensions"]["clientCountryName"],
            "requests": item["sum"]["requests"],
            "bytes": item["sum"]["bytes"]
        }
        for item in data_groups
    ]
    df = pd.DataFrame(records)
else:
    df = pd.DataFrame() # Create an empty DataFrame if there's no data
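
Once the data is in a DataFrame, derived columns are cheap to add. The sketch below computes each country's share of requests and its average bytes per request. The function name add_traffic_metrics and the sample numbers are made up for illustration; only the column names country, requests, and bytes come from the processing step above.

```python
import pandas as pd

# Made-up sample rows in the same shape as the DataFrame built above.
sample = pd.DataFrame(
    {
        "country": ["US", "DE", "IN"],
        "requests": [600, 300, 100],
        "bytes": [6_000_000, 1_500_000, 400_000],
    }
)


def add_traffic_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Add request share (%) and average bytes per request to a traffic table."""
    if df.empty or df["requests"].sum() == 0:
        return df.copy()
    out = df.copy()
    total = out["requests"].sum()
    out["request_share_pct"] = (out["requests"] / total * 100).round(1)
    out["bytes_per_request"] = (out["bytes"] / out["requests"]).round(0)
    return out.sort_values("requests", ascending=False).reset_index(drop=True)


enriched = add_traffic_metrics(sample)
print(enriched)
```

A country with an unusually low bytes-per-request figure relative to its request share can be a starting point for a closer look at automated traffic, though it is not proof of it.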

Step 4: Creating Interactive Visualizations

Now for the fun part. We’ll use Plotly to create a bar chart and display our DataFrame. Thanks to marimo’s reactivity, if you were to change the API token or add a date filter, this chart would update automatically.

In the final cell, add the visualization code:

import plotly.express as px

# Create an interactive bar chart using Plotly Express
if not df.empty:
    fig = px.bar(
        df,
        x="requests",
        y="country",
        orientation='h',
        title="Top 10 Traffic Sources by Country",
        labels={"requests": "Total Requests", "country": "Country"},
        text='requests'
    )
    fig.update_layout(yaxis={'categoryorder':'total ascending'})
else:
    fig = mo.md("Enter valid API credentials to see data.")

# Display the UI elements, the DataFrame, and the figure
mo.vstack([
    API_TOKEN,
    ZONE_ID,
    mo.md("### Traffic Data"),
    mo.ui.table(df),
    mo.md("### Traffic Visualization"),
    fig
])
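
The date filter mentioned above could be built from a lookback window, as in this minimal sketch. The helper name build_date_filter is hypothetical, and it assumes the API accepts a date_leq key alongside the date_geq key used earlier; confirm that against the schema before use.

```python
from datetime import date, timedelta
from typing import Optional


def build_date_filter(days_back: int, today: Optional[date] = None) -> dict:
    """Build a filter dict covering the last `days_back` days, inclusive of today."""
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    end = today or date.today()
    start = end - timedelta(days=days_back)
    # date_leq is an assumed filter key; date_geq appears in the query variables above.
    return {"date_geq": start.isoformat(), "date_leq": end.isoformat()}


print(build_date_filter(7, today=date(2024, 3, 10)))
# prints {'date_geq': '2024-03-03', 'date_leq': '2024-03-10'}
```

In a marimo notebook, days_back could come from the value of a UI element such as a slider defined in its own cell, so that moving the slider re-runs the fetch and redraws the chart.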

Actionable Insights from Your Custom Dashboard

This simple example can be extended to uncover crucial security and performance insights:

  • Security Analysis: Modify the GraphQL query to analyze firewall events. You can visualize the top attacking IP addresses, most common attack types (like SQL injection or XSS), or countries where threats originate. This helps you fine-tune your firewall rules.
  • Performance Optimization: Query for cache performance data. Create visualizations that show your cache-hit ratio over time or identify the most frequently missed (uncached) resources. This points you directly to assets that need better caching rules to speed up your site.
  • Bot Detection: Filter traffic by botManagementDecision to create a dashboard that separates human traffic from bot traffic. Analyzing the behavior of automated and likely-automated bots can help you identify content scraping or credential stuffing attacks.
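
As one example of the performance idea above, a cache-hit ratio is simply cached responses divided by all responses. The sketch below assumes you have already aggregated request counts per cache status; the function name cache_hit_ratio, the status labels, and the sample counts are illustrative, and which statuses count as a hit is a choice you should adjust to your own definition.

```python
from typing import Dict, FrozenSet


def cache_hit_ratio(
    counts: Dict[str, int],
    hit_statuses: FrozenSet[str] = frozenset({"hit"}),
) -> float:
    """Return the fraction of requests whose cache status counts as a hit."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for status, n in counts.items() if status.lower() in hit_statuses)
    return hits / total


# Made-up counts per cache status, for illustration only.
sample_counts = {"hit": 750, "miss": 150, "dynamic": 100}
ratio = cache_hit_ratio(sample_counts)
print(f"{ratio:.1%}")  # prints 75.0%
```

Computing this per day or per URL path, rather than once for the whole zone, is what makes it useful for spotting assets that need better caching rules.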

By combining the power of the Cloudflare API with the flexibility of Python and the interactivity of marimo, you move from being a passive consumer of data to an active analyst. You can build tailored, responsive, and shareable dashboards that provide a far deeper understanding of what’s happening at the edge of your network.

Source: https://blog.cloudflare.com/marimo-cloudflare-notebooks/
