Under the hood of the OpenShift update mechanism
Buckle up for a trip inside the world of cluster operators!
1. Introduction
The OpenShift cluster upgrade is a very interesting topic:
- It is fully automated
- It is resilient
- It can be triggered by a single command or click
- All cluster components, including etcd, the API server, the kubelet and the nodes’ operating systems, are upgraded as part of the same process
- It relies on container images only, even when it comes to upgrading the nodes’ operating systems
OpenShift is a Kubernetes distribution built around the concept of Cluster Operators. These are Kubernetes Operators dedicated to managing the resources required by the cluster to serve its purpose.
These Cluster Operators include one for managing the lifecycle of etcd, another for managing the lifecycle of the API server, another responsible for the monitoring stack, et cetera.
Among those, there are two operators that are interesting when it comes to cluster upgrades:
- The cluster-version-operator
- The machine-config-operator
This blog post is about looking under the hood of these two operators to better understand how the whole upgrade process works. Let’s go down this rabbit hole!
2. Cluster upgrade
Starting the upgrade
Let’s start from the official documentation. Here is the command given to update a cluster to a specific version:
oc adm upgrade --to=quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
Note that to update a cluster, you need to use the oc CLI, and more specifically the upgrade subcommand of oc adm. This subcommand takes a reference to a container image as an argument.
Here, the quay.io/openshift-release-dev/ocp-release@sha256:720f8...
is the release image of OpenShift 4.18.9.
Let’s take a look at the implementation of this subcommand to understand what it does.
The whole oc CLI code can be found in the openshift/oc GitHub repository. The upgrade subcommand implementation lives in the pkg/cli/admin/upgrade/upgrade.go code file.
Each subcommand has its own directory and code file. The starting point to study a subcommand implementation is the Run()
function that can be found in those code files.
Looking at the code, we see that in our case, the Run() function is organized around a switch statement:
func (o *Options) Run() error {
    switch {
    case o.Clear:
        // ...
    case o.ToMultiArch:
        // ...
    case o.ToLatestAvailable, len(o.To) > 0, len(o.ToImage) > 0:
        // ...
    default:
        // ...
    }
}
Earlier, we ran the upgrade subcommand using the --to=<version> argument. Thus, we match the third case condition: len(o.To) > 0.
In this case, the oc client first checks whether the specified version is among the recommended upgrade paths.
If the given version is not a recommended update, the tool issues a warning and the upgrade needs to be explicitly allowed or forced.
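For completeness, here is roughly what that looks like on the command line. This is only a sketch: flag behavior can vary slightly between oc versions, so check oc adm upgrade --help on your own client.

# Acknowledge a target that is known but not recommended from the current version
oc adm upgrade --to=4.18.9 --allow-not-recommended

# Last resort only: point at an explicit release image and skip the checks
oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0 \
  --allow-explicit-upgrade --force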
Once those checks are done, the patchDesiredUpdate() function is called:
func patchDesiredUpdate(ctx context.Context, update *configv1.Update, client configv1client.Interface,
    clusterVersionName string) error {
    updateJSON, err := json.Marshal(update)
    if err != nil {
        return fmt.Errorf("marshal ClusterVersion patch: %v", err)
    }
    patch := []byte(fmt.Sprintf(`{"spec":{"desiredUpdate": %s}}`, updateJSON))
    if _, err := client.ConfigV1().ClusterVersions().Patch(ctx, clusterVersionName, types.MergePatchType, patch,
        metav1.PatchOptions{}); err != nil {
        return fmt.Errorf("Unable to upgrade: %v", err)
    }
    return nil
}
This function updates the ClusterVersion/version object on the cluster, and more precisely its spec.desiredUpdate field, which is filled with the reference to the target version’s container image.
Although it is possible, upgrading to a specific version by patching the ClusterVersion/version object directly can be dangerous: we saw earlier that the oc client checks whether the given target version is on a recommended upgrade path, and that safety net disappears when you bypass the CLI.
It is therefore not recommended to patch the ClusterVersion/version object directly.
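For reference, the merge patch built by patchDesiredUpdate() is roughly equivalent to the following command. It is shown only to illustrate what the oc client does under the hood; as stated above, patching the object by hand bypasses the safety checks and is not recommended.

# Roughly what patchDesiredUpdate() sends to the API server (illustration only)
oc patch clusterversion version --type=merge \
  -p '{"spec":{"desiredUpdate":{"image":"quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0"}}}'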
The automated process
Now that the ClusterVersion/version object has the desired version written in the desiredUpdate field, what happens on the cluster?
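Before answering that, here are a few read-only commands (assuming cluster-admin access) that are handy for watching the upgrade from the outside while the rest of this section unfolds:

# The desired target written by the oc client
oc get clusterversion version -o jsonpath='{.spec.desiredUpdate.image}{"\n"}'

# Overall upgrade progress and conditions
oc get clusterversion
oc adm upgrade

# Per-operator versions and health during the rollout
oc get clusteroperators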
First things first, let’s see what an OpenShift version is and what it is made of.
OpenShift version definition
As I said in the introduction, OpenShift is built around the concept of Cluster Operators (CO). These are programs responsible for managing the lifecycle of the OpenShift core components in an automated manner.
Cluster Operators are deployed using native Kubernetes objects like Deployments. Once they are running, they observe specific resources on the cluster API and reconcile the cluster state with the one described in those resources.
An OpenShift version is defined by the combination of the versions of:
- Each of the Cluster Operators
- The Red Hat Enterprise Linux CoreOS (RHCOS) distribution
For each release, Red Hat publishes a release.txt file that summarizes all the content required to run a specific OpenShift version.
This release.txt file can be found on mirror.openshift.com. The example below is from the following release.txt - 4.18.9 - x86_64:
Name: 4.18.9
Digest: sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
Created: 2025-04-10T10:34:38Z
OS/Arch: linux/amd64
Manifests: 758
Metadata files: 2
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
Release Metadata:
Version: 4.18.9
updates: 4.17.11, 4.17.12, 4.17.13, 4.17.14, 4.17.15, 4.17.16, 4.17.17, 4.17.18, 4.17.19, 4.17.20, 4.17.21, 4.17.22, 4.17.23, 4.17.24, 4.17.25, 4.18.1, 4.18.2, 4.18.3, 4.18.4, 4.18.5, 4.18.6, 4.18.7, 4.18.8
Metadata:
url: https://access.redhat.com/errata/RHSA-2025:3775
Component Versions:
kubectl 1.31.1
kubernetes 1.31.7
kubernetes-tests 1.31.7
machine-os 418.94.202504080525-0 Red Hat Enterprise Linux CoreOS
Images:
NAME PULL SPEC
cluster-config-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b2690fca3c4f54caf58ff3fd615e2a4914aa63027c727c224dabceb966ac761b
cluster-dns-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6be869d0e2ddb592fb6825e47cfce449384851d8ba842dd2700a71cbcb008afe
cluster-etcd-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b3f5ccd36438583a3b58323bf3a5a69d4835bdf06b61f55c7892fd97f1038035
cluster-image-registry-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:167c136dabc99f9f49b36c39ef358bdf7271554427c9e7c7b9aa174d43c5178b
cluster-ingress-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f0dcf5bb3b8ebf6729259ecfdc0e1159e009ed5a2a35b9a6b4342f2f8ab53ad8
cluster-kube-apiserver-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:202576eb3c11b227a505e824c6784fba949a4659949eefa492478652a1c9e16e
cluster-kube-cluster-api-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c551abd805f79f719789ae9ceda1c3626c959d2e403a6ace2c78beb85beb4472
cluster-kube-controller-manager-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:781c07ba125137a28d0582100727b19c29ea536a4752fa28ef069545827e6d04
cluster-kube-scheduler-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb559c500e215c9153afe0f426062c94982e77e3024a781c69210f8919ef1663
cluster-monitoring-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7964aba94c5b660fd0f6ee9119f1c2fdba4a67bcf3cfcf0a6a0a45d1f0834d18
cluster-network-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e13ed4e33d59e9996402cc12f7e15e390c584024406215037a3bf15fe89c4504
cluster-openshift-apiserver-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:64fdffa22a39a810a83efbdf73ea2d1cbf99dc677fb5adc9b4b4088e68ffaa4d
cluster-openshift-controller-manager-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:89d5110d21eb9daf2977134649e2475f05660f1482e34f2f361e525582e5ccab
cluster-storage-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:63663869c2227e12f5d0854964d0aa47639b24238d42e486c4f6b137d1837f1e
cluster-version-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1dfaa99dab1455e6a0d6710a859e9a1e2cdf22acd4f6a48ded3161a8724c0f2a
coredns quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f0bb0d1a38fc778125e865cacf2ef11a10edf0d94cb96cd22a8690d83ccc5fba
ovn-kubernetes quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:42b4c6bdf3081ec264bc5edbc4eb779151de9e4742d6fe59a316645891df649c
// ...
As we can see, this gives various pieces of information about a specific release:
- The number of manifests it contains
- The component versions, including the machine-os version
- The versions from which it can be upgraded
- The list of container images used in this release
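As a side note, the same information can be obtained directly from a release image with oc adm release info, which produces essentially the same output as the published release.txt:

RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0

# Human-readable summary: versions, upgrade paths, component images
oc adm release info $RELEASE_IMAGE

# Only the list of component image pull specs
oc adm release info --pullspecs $RELEASE_IMAGE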
Furthermore, we saw earlier that to upgrade a cluster, we only needed to put the reference to a container image in the spec.desiredUpdate field of the ClusterVersion/version object. But in the release.txt file, multiple images are referenced.
How does the cluster go from knowing one image to knowing all the images it should use to be up to date?
Let’s have a look at the release image to see what we can learn.
Release image
First, let’s retrieve the image. I ran the oc adm upgrade
on a cluster to get the list of recommended updates and their associated container image.
oc adm upgrade
Cluster version is 4.18.8
Recommended updates:
VERSION IMAGE
4.18.9 quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
Then, let’s download the image and inspect its metadata using podman:
RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
podman pull $RELEASE_IMAGE
podman inspect image $RELEASE_IMAGE
Here is an abridged version of the inspect command output:
[
    {
        "Id": "efd84b2a452794d480d67f0cd9b1acc8481e770db4294d6782e87bce3f3651b5",
        // ...
        "Config": {
            "Env": [
                // ...
                "SOURCE_GIT_URL=https://github.com/openshift/cluster-version-operator",
                // ...
            ],
            "Entrypoint": [
                "/usr/bin/cluster-version-operator"
            ],
            "Labels": {
                "io.openshift.release": "4.18.9",
                "io.openshift.release.base-image-digest": "sha256:1dfaa99dab1455e6a0d6710a859e9a1e2cdf22acd4f6a48ded3161a8724c0f2a"
            }
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 518315982,
        "Labels": {
            "io.openshift.release": "4.18.9",
            "io.openshift.release.base-image-digest": "sha256:1dfaa99dab1455e6a0d6710a859e9a1e2cdf22acd4f6a48ded3161a8724c0f2a"
        },
        "ManifestType": "application/vnd.docker.distribution.manifest.v2+json",
    }
]
We can see that, according to the Entrypoint
, a release image is in fact the image of the cluster-version-operator
itself.
We can look inside the image by running it locally, as it contains the sh binary:
podman run -it --rm --entrypoint /bin/sh $RELEASE_IMAGE
While browsing the filesystem, we find two interesting directories at its root:
/manifests
/release-manifests
Looking at their content, we find several Kubernetes manifest files:
sh-5.1# ls -alh manifests/
total 228K
drwxr-xr-x. 1 root root 1.7K Apr 9 11:03 .
dr-xr-xr-x. 1 root root 6 Apr 23 20:18 ..
-rw-rw-r--. 1 root root 713 Apr 9 11:00 0000_00_cluster-version-operator_00_namespace.yaml
-rw-rw-r--. 1 root root 497 Apr 9 11:00 0000_00_cluster-version-operator_01_adminack_configmap.yaml
-rw-rw-r--. 1 root root 448 Apr 9 11:00 0000_00_cluster-version-operator_01_admingate_configmap.yaml
-rw-rw-r--. 1 root root 7.6K Apr 9 11:00 0000_00_cluster-version-operator_01_clusteroperators.crd.yaml
-rw-rw-r--. 1 root root 43K Apr 9 11:00 0000_00_cluster-version-operator_01_clusterversions-CustomNoupdate.crd.yaml
-rw-rw-r--. 1 root root 39K Apr 9 11:00 0000_00_cluster-version-operator_01_clusterversions-Default.crd.yaml
-rw-rw-r--. 1 root root 43K Apr 9 11:00 0000_00_cluster-version-operator_01_clusterversions-DevPreviewNoupdate.crd.yaml
-rw-rw-r--. 1 root root 43K Apr 9 11:00 0000_00_cluster-version-operator_01_clusterversions-TechPreviewNoupdate.crd.yaml
-rw-rw-r--. 1 root root 480 Apr 9 11:00 0000_00_cluster-version-operator_02_roles.yaml
-rw-rw-r--. 1 root root 4.8K Apr 9 11:00 0000_00_cluster-version-operator_03_deployment.yaml
-rw-rw-r--. 1 root root 732 Apr 9 11:00 0000_00_cluster-version-operator_04_service.yaml
-rw-rw-r--. 1 root root 418 Apr 9 11:00 0000_90_cluster-version-operator_00_prometheusrole.yaml
-rw-rw-r--. 1 root root 564 Apr 9 11:00 0000_90_cluster-version-operator_01_prometheusrolebinding.yaml
-rw-rw-r--. 1 root root 9.9K Apr 9 11:00 0000_90_cluster-version-operator_02_servicemonitor.yaml
The manifests directory contains all the manifests required to deploy the cluster-version-operator on the cluster.
sh-5.1# ls -alh release-manifests/
total 17M
drwxrwxrwx. 1 root root 86K Apr 9 12:26 .
dr-xr-xr-x. 1 root root 6 Apr 23 20:18 ..
-r--r--r--. 1 root root 473 Apr 9 10:13 0000_00_config-operator_00_namespace.yaml
-r--r--r--. 1 root root 509 Apr 9 10:31 0000_03_cloud-credential-operator_00_namespace.yaml
-r--r--r--. 1 root root 9.8K Apr 9 10:31 0000_03_cloud-credential-operator_01_crd.yaml
-r--r--r--. 1 root root 13K Apr 9 10:59 0000_03_config-operator_01_clusterresourcequotas.crd.yaml
-r--r--r--. 1 root root 5.3K Apr 9 10:59 0000_03_config-operator_01_proxies.crd.yaml
-r--r--r--. 1 root root 12K Apr 9 10:59 0000_03_config-operator_01_rolebindingrestrictions.crd.yaml
-r--r--r--. 1 root root 18K Apr 9 10:59 0000_03_config-operator_01_securitycontextconstraints-CustomNoupdate.crd.yaml
-r--r--r--. 1 root root 17K Apr 9 10:59 0000_03_config-operator_01_securitycontextconstraints-Default.crd.yaml
-r--r--r--. 1 root root 18K Apr 9 10:59 0000_03_config-operator_01_securitycontextconstraints-DevPreviewNoupdate.crd.yaml
-r--r--r--. 1 root root 18K Apr 9 10:59 0000_03_config-operator_01_securitycontextconstraints-TechPreviewNoupdate.crd.yaml
-r--r--r--. 1 root root 2.5K Apr 9 10:59 0000_03_config-operator_02_rangeallocations.crd.yaml
-r--r--r--. 1 root root 684 Apr 9 10:13 0000_03_marketplace-operator_02_operatorhub.cr.yaml
-r--r--r--. 1 root root 5.1K Apr 9 10:13 0000_03_marketplace_01_operatorhubs.crd.yaml
// ...
-r--r--r--. 1 root root 120K Apr 9 12:26 image-references
-r--r--r--. 1 root root 500 Apr 9 12:26 release-metadata
The release-manifests directory contains all the manifests that need to be applied in order to update the cluster, including the ones required to update the Cluster Operators to their new version.
Consequently, each operator deployment is updated with the reference to the new version of its cluster operator image, and Kubernetes’ native mechanisms come into play to replace the current pod with one running the new version of the operator. The operator in turn modifies the resources it manages to update them as necessary.
Perfect: we have found how the cluster goes from knowing only the release image to knowing all the cluster operator images.
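By the way, you don’t have to run a container to browse these manifests: oc can extract them to a local directory for you (using the RELEASE_IMAGE variable defined earlier).

# Dump the contents of /release-manifests into a local directory
oc adm release extract --to=./release-manifests $RELEASE_IMAGE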
Even though we now have a better understanding of the release image content, we still don’t know two important things:
- How are those manifests applied on the cluster?
- How are the nodes updated?
Let’s try to answer these two questions!
The controller managing the ClusterVersion kind
Earlier, we showed that to upgrade a cluster, we needed to update the ClusterVersion/version object, an OpenShift-specific resource which is managed by… the cluster-version-operator!
Let’s take a look at the cluster-version-operator
code to better understand how it works and what it does with the release image.
The openshift/cluster-version-operator code is accessible on GitHub.
The code we are mainly interested in is located in the pkg/cvo/updatepayload.go file.
As I did earlier, I will not show all of the operator code, but only highlight the interesting parts.
Release image consumption
The cluster-version-operator implements a payloadRetriever struct, which is responsible for downloading the release image onto the cluster.
Its fetchUpdatePayloadToDir() function creates a new pod made of four init containers and one regular container. Those containers run the release image with specific commands, instead of the cluster-version-operator binary, which is the default entrypoint.
Running a pod with multiple containers based on the release image is an elegant way to download it onto the cluster: it uses default Kubernetes mechanisms, respecting the KISS principle.
The pod definition can be found in the pkg/cvo/updatepayload.go file. Here is an abbreviated version of the pod created by the function:
apiVersion: v1
kind: Pod
metadata:
  # ...
spec:
  initContainers:
    - name: cleanup
      command: ["sh", "-c", "rm -fR ./*"]
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
      workingDir: /etc/cvo/updatepayloads/
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
    - name: make-temporary-directory
      command:
        - mkdir
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw-whtp2
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
    - name: move-operator-manifests-to-temporary-directory
      command:
        - mv
        - /manifests
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw-whtp2/manifests
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
    - name: move-release-manifests-to-temporary-directory
      command:
        - mv
        - /release-manifests
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw-whtp2/release-manifests
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
  containers:
    - name: rename-to-final-location
      command:
        - mv
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw-whtp2
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
        - name: kube-api-access-x87fj
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
  volumes:
    - name: payloads
      hostPath:
        path: /etc/cvo/updatepayloads
        type: ''
The pod uses a hostPath volume, named payloads, which targets the /etc/cvo/updatepayloads directory on a master node’s filesystem. It then extracts all the manifests from the release image into this directory.
Once the manifests are written to the master node’s filesystem, the cluster-version-operator reads them and loads them into memory, into a single unordered array.
A graph is then created by the cluster-version-operator based on this unordered array.
Once fully built, the graph is ordered using the ByNumberAndComponent function.
The ordering algorithm is based on the manifest filenames:
// ByNumberAndComponent creates parallelization for tasks whose original filenames are of the form
// 0000_NN_NAME_* - files that share 0000_NN_NAME_ are run in serial, but chunks of files that have
// the same 0000_NN but different NAME can be run in parallel. If the input is not sorted in an order
// such that 0000_NN_NAME elements are next to each other, the splitter will treat those as unsplittable
// elements.
To get a better understanding of what this function does, we can generate a graph using a tool named cluster-version-util and feed it the /manifests and /release-manifests directories we discovered earlier.
This tool outputs a graph representing the order in which the cluster-version-operator will apply the manifests.
The following screenshot is a part of the final graph, the complete graph being too big to be displayed in a readable manner.
At the top of the graph, we find the first file located in the /release-manifests directory: 0000_00_config-operator_00_namespace.yaml.
As explained earlier with the filenames, the cluster-version-operator applies the manifests in a specific order represented by the graph:
- Blocks that are at the same level starting from the root are applied in parallel
- Manifests in the same block are applied sequentially
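To get a rough feel for this grouping without generating the full graph, we can count how many manifests share each 0000_NN_NAME prefix in an extracted release-manifests directory. This is a quick sketch: the regular expression simply assumes the naming convention quoted above.

# Manifests sharing the same 0000_NN_NAME prefix are applied serially;
# different NAMEs under the same 0000_NN prefix can be applied in parallel.
ls release-manifests/ \
  | sed -En 's/^([0-9]{4}_[0-9]{2}_[^_]+)_.*/\1/p' \
  | sort | uniq -c | sort -rn | head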
Summing up
We are now fully aware of how cluster operators and other Kubernetes resources are updated:
- The manifests are shipped through the release image.
- The manifests are applied on the cluster by the cluster-version-operator.
The beautiful thing is that the manifests required to update the cluster-version-operator itself are included in those manifests.
Because the upgrade payload is dumped on a master node’s filesystem, the cluster-version-operator can safely update itself by patching its own deployment and letting native Kubernetes mechanisms restart it. Once the new instance of the cluster-version-operator starts, it reads the manifests from the master node’s filesystem and continues applying the update seamlessly.
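One way to see this self-update happen is to watch the image referenced by the CVO’s own Deployment switch to the new release image during an upgrade (read-only, assuming cluster-admin access):

# The CVO lives in the openshift-cluster-version namespace and manages its own Deployment
oc -n openshift-cluster-version get deployment cluster-version-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'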
OK, that’s nice, but how are components like the kubelet, the container runtime or even the nodes’ operating system updated?
3. Node upgrade
I’ll break the suspense quickly: there is a dedicated cluster operator for the nodes. It is called the machine-config-operator. Let’s have a look at its code.
machine-config-operator
The machine-config-operator is a Cluster Operator shipped with OpenShift. It is responsible for managing the nodes’ configuration in a Kubernetes fashion, meaning declaratively.
Cluster administrators can apply changes to the node configuration and operating system by simply modifying MachineConfig objects through the OpenShift API, the same way they would patch a Deployment, for example.
Those MachineConfig objects are then merged by the machine-config-operator to define the final configuration of the nodes. This merged configuration is exposed by the machine-config-server component of the operator and applied by the machine-config-daemon running on each node.
Here is a simple diagram of its architecture:
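On a live cluster, the result of this merge is visible as rendered-<pool>-<hash> MachineConfig objects, referenced by each MachineConfigPool. A few read-only commands to see it (assuming cluster-admin access):

# Individual MachineConfig fragments plus the merged "rendered-*" configs
oc get machineconfigs

# Each pool points at the rendered config its nodes should converge to
oc get machineconfigpools
oc get machineconfigpool worker -o jsonpath='{.spec.configuration.name}{"\n"}'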
Operating system upgrade
Let’s take a look at the machine-config-daemon
code to find out how the operating system update is done.
The machine-config-daemon code lives in the pkg/daemon directory. There is an update.go file which defines the update() function.
Here is an interesting part of the function code:
// Update the node to the provided node configuration.
func (dn *Daemon) update(oldConfig, newConfig *mcfgv1.MachineConfig, skipCertificateWrite bool) (retErr error) {
    // ...
    if dn.os.IsCoreOSVariant() {
        coreOSDaemon := CoreOSDaemon{dn}
        if err := coreOSDaemon.applyOSChanges(*diff, oldConfig, newConfig); err != nil {
            return err
        }
        defer func() {
            if retErr != nil {
                if err := coreOSDaemon.applyOSChanges(*diff, newConfig, oldConfig); err != nil {
                    errs := kubeErrs.NewAggregate([]error{err, retErr})
                    retErr = fmt.Errorf("error rolling back changes to OS: %w", errs)
                    return
                }
            }
        }()
    } else {
        klog.Info("updating the OS on non-CoreOS nodes is not supported")
    }
    // ...
}
The daemon checks whether the underlying OS is a CoreOS variant. If it is, it calls the applyOSChanges() function.
Note that the configuration of non-CoreOS nodes is not fully managed by the machine-config-operator, and neither is the upgrade of their operating system. We’ll see why later.
Let’s follow the function calls until we reach something meaningful. This leads to the RebaseLayered() function defined in the pkg/daemon/rpm-ostree.go file.
// RebaseLayered rebases system or errors if already rebased.
func (r *RpmOstreeClient) RebaseLayered(imgURL string) error {
    // Try to re-link the merged pull secrets if they exist, since it could have been populated without a daemon reboot
    if err := useMergedPullSecrets(rpmOstreeSystem); err != nil {
        return fmt.Errorf("Error while ensuring access to pull secrets: %w", err)
    }
    klog.Infof("Executing rebase to %s", imgURL)
    return runRpmOstree("rebase", "--experimental", "ostree-unverified-registry:"+imgURL)
}
It introduces a new component called rpm-ostree. The RebaseLayered() function has exactly the same effect as running the following command in a shell:
rpm-ostree rebase --experimental ostree-unverified-registry:<os_img_url>
This means we need to look at another component to understand the whole upgrade process: rpm-ostree.
rpm-ostree
rpm-ostree is a hybrid image/package system. Here are its main features:
- Transactional, background image-based (versioned/checksummed) upgrades
- OS rollback without affecting user data (/usr but not /etc, /var) via libostree
- Client-side package layering (and overrides)
rpm-ostree could be the topic of multiple blog posts; in our case, let’s focus on the rebase subcommand.
rpm-ostree rebase
The rpm-ostree rebase command is similar to the git rebase command: it changes the base. If there are any additional commits, they are reapplied on top of the new base, and if there are conflicts, they are resolved automatically, or manually if required.
What is a base in the rpm-ostree context?
In the rpm-ostree context, a base refers to a bootable, immutable system image.
It contains the core set of binaries, packages and configurations, including the kubelet, cri-o and other useful binaries.
If we look again into the release-manifests directory of the OpenShift release image, we can find the file 0000_80_machine-config_05_osimageurl.yaml, which has the following content:
apiVersion: v1
kind: ConfigMap
metadata:
  name: machine-config-osimageurl
  namespace: openshift-machine-config-operator
data:
  releaseVersion: 4.18.9
  baseOSContainerImage: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:058137a9f5a2d69ff885f7903f28b30b30767cb6e27c80e6c65c690e1788b424"
  baseOSExtensionsContainerImage: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:28346d15c1b6f96096ac293846fbda9efbff284e9629d10b1aa17aff18b5511c"
  osImageURL: ""
The baseOSContainerImage
references a container image that is named rhel-coreos
in the release.txt
file.
So our base
is a container image.
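As a cross-check, the same pull spec can be resolved directly from the release image. The component name used below, rhel-coreos, is taken from the Images list of the release.txt; older releases used a different name, so treat it as an assumption for your version.

# Resolve the base OS image pull spec straight from the release image
oc adm release info $RELEASE_IMAGE --image-for=rhel-coreos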
The same way we did earlier, let’s take a closer look at it. As usual, I’ll pull the image and run a container using it:
OS_IMAGE=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:058137a9f5a2d69ff885f7903f28b30b30767cb6e27c80e6c65c690e1788b424
podman pull $OS_IMAGE
podman run -it --rm --entrypoint=/bin/sh $OS_IMAGE
Let’s list the directories and files at the filesystem root:
sh-5.1# ls -alh
total 44K
dr-xr-xr-x. 1 root root 12 May 12 19:10 .
dr-xr-xr-x. 1 root root 12 May 12 19:10 ..
lrwxrwxrwx. 2 root root 7 Jan 1 1970 bin -> usr/bin
drwxr-xr-x. 1 root root 0 Jan 1 1970 boot
drwxr-xr-x. 5 root root 360 May 12 19:10 dev
drwxr-xr-x. 1 root root 38 May 12 19:10 etc
lrwxrwxrwx. 2 root root 8 Jan 1 1970 home -> var/home
lrwxrwxrwx. 2 root root 7 Jan 1 1970 lib -> usr/lib
lrwxrwxrwx. 2 root root 9 Jan 1 1970 lib64 -> usr/lib64
lrwxrwxrwx. 2 root root 9 Jan 1 1970 media -> run/media
lrwxrwxrwx. 2 root root 7 Jan 1 1970 mnt -> var/mnt
lrwxrwxrwx. 2 root root 7 Jan 1 1970 opt -> var/opt
lrwxrwxrwx. 2 root root 14 Jan 1 1970 ostree -> sysroot/ostree
dr-xr-xr-x. 361 nfsnobody nfsnobody 0 May 12 19:10 proc
lrwxrwxrwx. 2 root root 12 Jan 1 1970 root -> var/roothome
drwxr-xr-x. 1 root root 40 May 12 19:10 run
lrwxrwxrwx. 2 root root 8 Jan 1 1970 sbin -> usr/sbin
lrwxrwxrwx. 2 root root 7 Jan 1 1970 srv -> var/srv
dr-xr-xr-x. 13 nfsnobody nfsnobody 0 May 12 14:29 sys
drwxr-xr-x. 1 root root 12 Jan 1 1970 sysroot
drwxrwxrwt. 1 root root 0 Jan 1 1970 tmp
drwxr-xr-x. 1 root root 38 Apr 21 09:56 usr
drwxr-xr-x. 1 root root 6 Jan 1 1970 var
It seems to contain a basic Linux filesystem layout. There are some particularities though:
- bin, lib, lib64 and sbin are all symlinks to subdirectories of /usr
- home, mnt, opt and root are all symlinks to subdirectories of /var
- There is an ostree directory
Let’s take a look at space usage:
sh-5.1# du -h --max-depth=1 / 2> /dev/null
2.2G /sysroot
0 /boot
0 /dev
0 /proc
0 /run
0 /sys
0 /tmp
0 /usr
12K /etc
0 /var
2.2G /
- /var is empty
- /sysroot is 2.2G, so it must contain all the data (binaries, packages and configurations)
Digging into /sysroot, the data lives in an ostree repository:
sh-5.1# du -h --max-depth=1 /sysroot/ostree/repo/ 2> /dev/null
2.2G /sysroot/ostree/repo/objects
0 /sysroot/ostree/repo/extensions
0 /sysroot/ostree/repo/refs
0 /sysroot/ostree/repo/state
0 /sysroot/ostree/repo/tmp
2.2G /sysroot/ostree/repo/
To be bootable, an OS image must at least contain a vmlinuz and an initramfs file. Let’s look for those in the ostree repo:
sh-5.1# find /ostree/repo/objects/ -type f -exec file {} + 2>/dev/null | grep -E 'ASCII cpio archive|Linux kernel x86 boot executable bzImage'
/ostree/repo/objects/38/063cbf093cf5f80632b276f56471fba5d068aff9759c16037c9d734e4b7f60.file: ASCII cpio archive (SVR4 with no CRC)
/ostree/repo/objects/d7/51c8c8b654a9b9a4fb8b79d049ddaf844583baba27065bc0a180777eca4316.file: Linux kernel x86 boot executable bzImage, version 5.14.0-427.64.1.el9_4.x86_64 (mockbuild@x86-64-01.build.eng.rdu2.redhat.com) #1 SMP PREEMPT_DYNAMIC Fri Apr 4 17:27:15 EDT 2025, RO-rootFS, swap_dev 0xC, Normal VGA
Bingo! We found them.
This means the content inside the container image is formatted as an rpm-ostree repo. It’s similar to a git repo, but it contains RPM packages, a vmlinuz and an initramfs file.
You can learn more about rpm-ostree repos on this page: https://ostreedev.github.io/ostree/repo/
When the rpm-ostree rebase command is executed on a specific node, the container image content is extracted onto the node filesystem as a new bootable rpm-ostree deployment, in the /ostree/deploy directory.
Then, based on the extracted content, the rebase process performs the following operations:
- The /usr content is fully replaced by the content of the /usr directory of the new deployment
- The /etc content is 3-way merged: /etc from the previous deployment, the current /etc content from the node, and /etc from the new deployment
- /boot is updated to target the new deployment: vmlinuz and initramfs are written into it and the boot entries are updated to point to them
During this process, the /var directory is left unchanged. This way, applications keep their data.
The resulting boot entry targets the new deployment directory. Here is an example extracted from /boot/loader/entries/ostree-1.conf:
title Red Hat Enterprise Linux CoreOS 418.94.202504170303-0 (ostree:0)
version 1
options ignition.platform.id=metal $ignition_firstboot systemd.unified_cgroup_hierarchy=1 cgroup_no_v1="all" psi=0 ostree=/ostree/boot.1/rhcos/8566b8b6905d98354dc2423bc33b656e584fc705faf30890cd236f754dd1d1cd/0 root=UUID=910678ff-f77e-4a7d-8d53-86f2ac47a823 rw rootflags=prjquota boot=UUID=cbe17ce2-fd72-4503-9b72-6b7b6a859251
linux /boot/ostree/rhcos-8566b8b6905d98354dc2423bc33b656e584fc705faf30890cd236f754dd1d1cd/vmlinuz-5.14.0-427.65.1.el9_4.x86_64
initrd /boot/ostree/rhcos-8566b8b6905d98354dc2423bc33b656e584fc705faf30890cd236f754dd1d1cd/initramfs-5.14.0-427.65.1.el9_4.x86_64.img
aboot /ostree/deploy/rhcos/deploy/553caf38510cb8d879ec8623d2621be3672e6bb3306722e96570e75e72850109.0/usr/lib/ostree-boot/aboot.img
abootcfg /ostree/deploy/rhcos/deploy/553caf38510cb8d879ec8623d2621be3672e6bb3306722e96570e75e72850109.0/usr/lib/ostree-boot/aboot.cfg
During this operation, the kubelet, cri-o and other binaries on the node are updated. Once the process finishes, the node is rebooted and the update is complete.
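If you want to observe this from a running cluster, rpm-ostree can show the currently booted deployment and, during an upgrade, the freshly staged one. A quick sketch using oc debug; replace <node> with an actual node name:

# List the ostree deployments on a node (booted, staged, rollback)
oc debug node/<node> -- chroot /host rpm-ostree status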
4. Conclusion
In this blog post, we took a deep dive into the various components and elements required to upgrade an OpenShift cluster. We saw how a cluster can be updated using only container images as the shipping vehicle for the update payload, and how the implementation cleverly leverages various native Kubernetes mechanisms.
In the next section, you’ll find a list of useful tools when you plan to upgrade your cluster. I hope at least one of them is new to you!
5. Useful tools
OpenShift Update Graph
This tool helps you find an official and supported upgrade path from a version A to a version B.
OpenShift Operator Update Information Checker
This tool helps you find the right versions of your operators when upgrading from a version 4.y.z to a version 4.y+2.z.
It helps you prepare an upgrade and ensure operators remain compatible with the Kubernetes and OpenShift APIs during the whole process.
OCP Z-Stream Releases Schedule
This is a calendar view of all OpenShift programmed releases in the various available channels.
OpenShift CI WebUI
These links take you to the Continuous Integration (CI) dashboards for OpenShift builds on different CPU architectures:
Each CI dashboard shows the build and test pipelines for its respective architecture. You can:
- See which changes (commits, bug fixes, enhancements) are included in a given OpenShift release
- Compare two OpenShift versions to understand what changed between them
- Investigate build/test failures or regressions on a specific architecture
Basically, if you want to know what went into a release, or why something broke on ppc64le but not on amd64, these dashboards are where to start.
For example, you can list the differences between the 4.16.10 and 4.18.10 amd64 releases using this link:
https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.18.10?from=4.16.10