
Deploying Thanos for monitoring multiple Openshift / Kubernetes clusters

multicluster thanos prometheus kubernetes monitoring
Mattia Forcellese, DevOps Engineer/Software Architect

Thanos is a highly available Prometheus setup, capable of offloading metrics to deep storage (like S3) to persist all the metrics from your clusters for an indefinite amount of time, aggregating them and even downsampling them (and much more!). One of Thanos' nicest features, the one we will look into today, is the capability of fetching metrics from other Prometheus instances (or, in the particular case of OpenShift clusters, Thanos instances), thus allowing us to monitor and filter all the fancy metrics that we already had from different clusters, from a single Thanos instance.

Note: This guide will explain some of the pain points of configuring such a Thanos instance for scraping Openshift clusters - but all the reasoning should be perfectly applicable to vanilla Kubernetes clusters too.


Overall architecture

The desired architecture, from top to bottom: the main Thanos Querier instance that will scrape every other cluster (this instance could be placed in an external cluster, as in the picture, but also in one of the monitored clusters). This main Thanos can write to an external storage so that metric data does not pile up on the monitored clusters. Lastly, we have the OCP clusters that we want to scrape, each with its Thanos instance and the corresponding Prometheus instances. For an overview of the OpenShift monitoring architecture, and the separation between user workload and cluster monitoring, please refer to the official documentation.

Let’s try to figure out what we need for achieving this architecture, in small steps. For brevity I will call OCP1 and OCP2 the two Openshift clusters that will be monitored in this example.

  • Configure some common labels for the metrics of each cluster (e.g. cluster name, location, etc)
  • Configure (if not done already) our ingress controller to support gRPC and HTTP/2
  • Create an Ingress/Route to let Main Thanos scrape from OCP1/OCP2 Thanos instances
  • Deploy the Main Thanos and configure it with static stores to scrape from OCP1/OCP2
  • Enjoy!

…Wait, is it this simple? Actually, no, if you are on OpenShift. If you are on vanilla Kubernetes or you have total control over the Thanos instances, yes, it should be this easy: configure the ingress, let Thanos scrape the other Thanos instances, and you're done. But if you are on OpenShift and want to leverage the pre-existing Thanos deployments (as I did), there are a few extra steps to take to overcome some of the limitations of the operator-managed Thanos configuration.

gRPC and HTTP/2

Before diving in, a quick note on how Thanos will scrape metrics, which ports it needs and the protocol it uses. Thanos contacts the configured stores via gRPC. A store can be anything implementing the Store API, like Prometheus, or another Thanos (they can be stacked). Thanos calls this gRPC API to fetch metadata about the metrics the store provides, among other information. Typically, Thanos serves gRPC by binding itself on 0.0.0.0:10901 - this is not the case on OpenShift, as we will see later.

Why don't we scrape Prometheus directly? It is a bit harder, because Prometheus is protected by the OpenShift OAuth proxy, and we would need to scrape the two Prometheus instances separately (user workload and cluster) in both OCP clusters. Obviously, if you are on Kubernetes or in any other situation where you can connect and scrape metrics directly from Prometheus, you could and probably should!

As you may already know, gRPC works over HTTP/2, which is supported by the majority of ingress controllers - whatever you are using should be fine - but (in theory, as we will see later) requires TLS for protocol negotiation (known as ALPN). What you may not know is that HTTP/2 can also work without TLS; this is known as h2c (HTTP/2 over TCP, instead of TLS). Keep this in mind, because if you are on OpenShift, you will be in trouble 😵
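
To make the h2c idea concrete, here is a hedged way to check whether an endpoint speaks HTTP/2 without TLS, using curl's --http2-prior-knowledge flag (the host and port below are just placeholders):

curl -s -o /dev/null -w '%{http_version}\n' --http2-prior-knowledge http://<some-h2c-host>:10901/

If this prints 2, the server accepted HTTP/2 over plain TCP (h2c); a TLS-only endpoint would instead be tested with --http2 against an https:// URL.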

Cluster configuration

Anyway, let’s start with our first cluster, OCP1, and modify the user workload monitoring configuration by adding some labels for identifying the cluster (and de-duplicating data):

oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
apiVersion: v1
data:
  config.yaml: |+
    prometheus:
      externalLabels:
        cluster: ocp1
        region: xxxxxx
        cluster_replica: ocp1    

Then, the cluster monitoring config:

oc -n openshift-monitoring edit configmap cluster-monitoring-config
apiVersion: v1
data:
  config.yaml: |
    prometheusK8s:
      externalLabels:
        cluster: ocp1
        region: xxxxx
        cluster_replica: ocp1    
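
A quick way to verify that the monitoring operator picked up these labels, assuming it propagates them to the Prometheus custom resources (which, in my experience, are named k8s and user-workload), is:

oc -n openshift-monitoring get prometheus k8s -o jsonpath='{.spec.externalLabels}'
oc -n openshift-user-workload-monitoring get prometheus user-workload -o jsonpath='{.spec.externalLabels}'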

Next, configure our ingress controller (on OpenShift, by default, HAProxy) to allow HTTP/2 connections. This can be done by following the official documentation (note: this works on OCP 4.6+); it simply requires annotating the cluster ingress configuration:

oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true
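
You can double-check that the annotation is in place with:

oc get ingresses.config/cluster -o yaml | grep default-enable-http2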

If you are using another ingress controller, please refer to its specific documentation about HTTP/2 and gRPC services.

Now, set the current project to openshift-monitoring (this is where Thanos, Prometheus and everything else handled by OCP monitoring resides) and start by creating our route pointing at the Thanos service exposing the gRPC port. Wait… there is none! As of this OCP monitoring version, Thanos does not expose the gRPC port on the service, and there is no plan to change this in the near future. This should be no problem, since declaring a port in the container spec of a pod is mostly informational: every port can be reached, even if the container does not declare it, as long as the process listens on all interfaces (i.e. 0.0.0.0). Although this is true in general, it does not hold for this Thanos service; unfortunately, if you look inside the Thanos pod (at the thanos-query container), you will see that Thanos is started with:

- ...
- '--grpc-address=127.0.0.1:10901'
- ...

This means that Thanos is listening for gRPC queries on port 10901, but only on the local interface (127.0.0.1), that is, only for the other containers in the pod. In fact, as anticipated before, the Thanos instance managed by OpenShift sits behind an OAuth proxy container, which can talk to the query container on the local interface because they share the same pod. So we can't access this directly, but we still need to create a Service pointing at this port (remember: a Service can reference any numbered port; it does not matter whether the container port is actually listening or open).
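
If you want to confirm this on your own cluster, you can grep the flag from the running pods, using the same labels that we will use for the Service below:

oc -n openshift-monitoring get pods -l app.kubernetes.io/name=thanos-query -o yaml | grep -- '--grpc-address'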

Accessing Thanos gRPC

Let’s create it with this manifest:

kind: Service
apiVersion: v1
metadata:
  name: thanos-querier-grpc
  namespace: openshift-monitoring
spec:
  ports:
    - name: grpc
      protocol: TCP
      port: 10901
      targetPort: 10901
  type: ClusterIP
  selector:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-querier
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/part-of: openshift-monitoring

If you now try, locally, to access this service with kubectl port-forward, oc port-forward, etc., you will notice that the service responds correctly! (You could even try to query the gRPC service with gRPCurl and see what happens.) This is because the Kubernetes port-forwarding command works a bit differently than you may expect.
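
For example, a minimal local test could look like this (gRPCurl's -plaintext flag skips TLS, since nothing is terminating TLS at this point):

oc -n openshift-monitoring port-forward svc/thanos-querier-grpc 10901:10901 &
grpcurl -plaintext localhost:10901 list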

The problem is, as we said before, that any other pod trying to reach that service would not be able to communicate (the port is closed!). You can even verify this from a running pod using a gRPCurl image.
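
A hypothetical in-cluster check, using the fullstorydev/grpcurl image, which should fail with a connection refused error:

oc -n openshift-monitoring run grpcurl-test --rm -it --restart=Never \
  --image=fullstorydev/grpcurl -- -plaintext thanos-querier-grpc:10901 list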

As a solution, an ugly yet necessary workaround: we have to create another pod, running kubectl, that will do the port-forwarding for us, and then expose this pod via a Service so that the main Thanos can scrape the metrics from the OCP1 Thanos. Wait, what? Let's try to explain it with a diagram:

Note that port-forwarding in general is slow and not very reliable, but we’ll stick with it, because we need to access the “hidden” port of that container.

In order to do this, we need to create a few resources: a Deployment, a Service, and finally the route (also, depending on your RBAC, a service account/role/role binding may be needed):

kind: Deployment
apiVersion: apps/v1
metadata:
  name: thanos-grpc-port-forwarder
  namespace: openshift-monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      name: thanos-grpc-port-forwarder
  template:
    metadata:
      labels:
        name: thanos-grpc-port-forwarder
    spec:
      serviceAccountName: thanos-port-forwarder
      restartPolicy: Always
      containers:
        - name: kubectl
          image: bitnami/kubectl:latest
          command:
            - kubectl
            - port-forward
            - svc/thanos-querier-grpc
            - "--address=0.0.0.0"
            - "10901:10901"
          ports:
            - containerPort: 10901
              protocol: TCP
          imagePullPolicy: IfNotPresent

---
kind: Service
apiVersion: v1
metadata:
  name: thanos-grpc-port-forwarder
  namespace: openshift-monitoring
spec:
  ports:
    - protocol: TCP
      port: 10901
      targetPort: 10901 # must match the port the kubectl port-forward container listens on
      appProtocol: h2c
  type: ClusterIP
  selector:
    name: thanos-grpc-port-forwarder

---

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: thanos-port-forwarder
  namespace: openshift-monitoring
rules:
  - verbs:
      - get
      - watch
      - list
    apiGroups:
      - ''
    resources:
      - services
  - verbs:
      - get
      - watch
      - list
    apiGroups:
      - ''
    resources:
      - pods
  - verbs:
      - get
      - watch
      - list
      - create
      - update
    apiGroups:
      - ''
    resources:
      - pods/portforward

---

kind: ServiceAccount
apiVersion: v1
metadata:
  name: thanos-port-forwarder
  namespace: openshift-monitoring

---

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: thanos-port-forwarder
  namespace: openshift-monitoring
subjects:
  - kind: ServiceAccount
    name: thanos-port-forwarder
    namespace: openshift-monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: thanos-port-forwarder

Remember the HTTP/2 protocol needed for gRPC? Notice that the thanos-grpc-port-forwarder service specifies an appProtocol of h2c. This is HTTP/2 over TCP, and it is necessary because of the way the thanos container is run: if you check all its arguments, you will notice that there is no flag configuring TLS for the gRPC server (which makes sense, since it was never meant to be reachable from outside). This gRPC server therefore speaks HTTP/2 without TLS, and that is a problem if you are on an OCP version < 4.10, because the ingress controller did not support HTTP/2 routes other than TLS ones (re-encrypt and passthrough), so you would be stuck: you could not create an insecure or edge HTTP/2 route. Note that at this point the thanos-grpc-port-forwarder service could also be a LoadBalancer/NodePort service, thus not needing a route at all; this depends on your infrastructure, but it may be your best way to expose Thanos if you are on OCP < 4.10. Although not documented, OCP 4.10 introduces the possibility of edge-encrypted routes for HTTP/2 via h2c (see this merged pull request for more details).
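
For reference, a minimal sketch of that LoadBalancer alternative (the -lb name is just an example, and whether it actually gets an external address depends entirely on your infrastructure):

kind: Service
apiVersion: v1
metadata:
  name: thanos-grpc-port-forwarder-lb
  namespace: openshift-monitoring
spec:
  type: LoadBalancer
  selector:
    name: thanos-grpc-port-forwarder
  ports:
    - protocol: TCP
      port: 10901
      targetPort: 10901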

Finally, the Route!

kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: thanos-grpc-port-forwarder
  namespace: openshift-monitoring
spec:
  host: <redacted>
  to:
    kind: Service
    name: thanos-grpc-port-forwarder
    weight: 100
  port:
    targetPort: 10901
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Allow
  wildcardPolicy: None

At this point, to check if everything is correct, you can still use gRPCurl:

grpcurl <route>:443 list

It should respond with the list of gRPC services exposed by the Thanos querier.
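
The output should be something along these lines (the exact set of services depends on the Thanos version shipped with your OCP release):

grpc.health.v1.Health
grpc.reflection.v1alpha.ServerReflection
thanos.Exemplars
thanos.Metadata
thanos.Rules
thanos.Store
thanos.Targets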

Done! You can repeat the exact same steps for OCP2, and any other cluster that you want to monitor, and then you are ready to go.

Thanos deploy

For the main Thanos deployment, I would suggest looking at the Helm chart maintained by Bitnami - its configuration is out of the scope of this article, but the relevant part, regarding the scraping of our freshly configured stores, is the query.stores parameter. For example:

helm install thanos bitnami/thanos --set "query.replicaLabel={prometheus_replica,cluster_replica}" --set "query.stores={<ocp1-route>:443,<ocp2-route>:443}"

Also note the query.replicaLabel parameter in the Helm values: it is needed to de-duplicate data when fetching from multiple Prometheus/Thanos instances. prometheus_replica is a default external label used by the Prometheus instances managed by OCP, and cluster_replica is the one that we set earlier on each OCP cluster.
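
If you prefer a values file over --set flags, the same configuration would look roughly like this (a sketch, using the same chart parameters as above):

# values.yaml
query:
  replicaLabel:
    - prometheus_replica
    - cluster_replica
  stores:
    - <ocp1-route>:443
    - <ocp2-route>:443

helm install thanos bitnami/thanos -f values.yaml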

To check that the stores are working fine, you should see a log like this in the thanos-query container:

level=info ts=2022-11-29T17:41:22.948335452Z caller=endpointset.go:381 component=endpointset msg="adding new query with [storeAPI rulesAPI exemplarsAPI targetsAPI MetricMetadataAPI QueryAPI]" address=<ocp1-route>:443 extLset="{cluster=\"ocp1\", prometheus=\"openshift-monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\", region=\"XXXXX\"},{cluster=\"ocp1\", prometheus=\"openshift-monitoring/k8s\", prometheus_replica=\"prometheus-k8s-1\", region=\"XXXXX\"},{cluster=\"ocp1\", prometheus=\"openshift-user-workload-monitoring/user-workload\", prometheus_replica=\"prometheus-user-workload-0\", region=\"XXXXXX\"},{cluster=\"ocp1\", prometheus=\"openshift-user-workload-monitoring/user-workload\", prometheus_replica=\"prometheus-user-workload-1\", region=\"XXXXXX\"},{thanos_ruler_replica=\"thanos-ruler-user-workload-0\"},{thanos_ruler_replica=\"thanos-ruler-user-workload-1\"}"
level=debug ts=2022-11-29T17:41:22.948379345Z caller=endpointset.go:389 component=endpointset msg="updated endpoints" activeEndpoints=1
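
As a final sanity check, open the main Thanos query UI and run a query grouped by the cluster label we configured earlier, for example:

count by (cluster) (up)

Both ocp1 and ocp2 (and any other cluster you added) should show up as separate series.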

Thanks for reading this! Let me know if you found it useful, if something can be done better - and how!  See you next time!