Under the hood of the OpenShift update mechanism
Buckle up for a trip inside the world of cluster operators!
1. Introduction
The OpenShift cluster upgrade is a very interesting topic:
- It is fully automated
- It is resilient
- It can be triggered by a single command or click
- All cluster components, including etcd, the API server, the kubelet and the nodes’ operating systems, are upgraded as part of the same process
- It relies on container images only, even when it comes to upgrading the nodes’ operating systems
OpenShift is a Kubernetes distribution built around the concept of Cluster Operators. These are Kubernetes Operators dedicated to managing the resources required by the cluster to serve its purpose.
These Cluster Operators include one for managing the lifecycle of etcd, another for managing the lifecycle of the API server, another responsible for the monitoring stack, et cetera.
Among those, there are two operators that are interesting when it comes to cluster upgrades:
- The cluster-version-operator
- The machine-config-operator
This blog post is about looking under the hood of these two operators to better understand how the whole upgrade process works. Let’s go down this rabbit hole!
2. Cluster upgrade
Starting the upgrade
Let’s start from the official documentation. Here is the command given to update a cluster to a specific version:
oc adm upgrade --to=quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
Note that to update a cluster, you need to use the oc CLI, and more specifically the upgrade subcommand of oc adm. This subcommand takes a reference to a container image as an argument.
Here, the quay.io/openshift-release-dev/ocp-release@sha256:720f8...
is the release image of OpenShift 4.18.9.
Let’s take a look at the implementation of this subcommand to understand what it does.
The whole oc CLI code can be found in the openshift/oc GitHub repository. The upgrade subcommand implementation lives in the pkg/cli/admin/upgrade/upgrade.go code file.
Each subcommand has its own directory and code file. The starting point to study a subcommand implementation is the Run()
function that can be found in those code files.
Looking at the code, we see that in our case, the Run() function is organized around a switch statement:
func (o *Options) Run() error {
    switch {
    case o.Clear:
        // ...
    case o.ToMultiArch:
        // ...
    case o.ToLatestAvailable, len(o.To) > 0, len(o.ToImage) > 0:
        // ...
    default:
        // ...
    }
}
Earlier, we ran the upgrade subcommand using the --to=<version> argument. Thus, we match the third case condition: len(o.To) > 0.
In this case, the oc client first checks whether the specified version is among the recommended upgrade paths.
If the given version is not a recommended update, the tool issues a warning and the upgrade needs to be explicitly allowed or forced.
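For completeness, here is roughly what that looks like on the command line. This is only a sketch: flag behavior can vary slightly between oc versions, so check oc adm upgrade --help on your own client.

# Acknowledge a target that is known but not recommended from the current version
oc adm upgrade --to=4.18.9 --allow-not-recommended

# Last resort only: point at an explicit release image and skip the checks
oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0 \
  --allow-explicit-upgrade --force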
Once those checks are done, the patchDesiredUpdate() function is called:
func patchDesiredUpdate(ctx context.Context, update *configv1.Update, client configv1client.Interface,
    clusterVersionName string) error {
    updateJSON, err := json.Marshal(update)
    if err != nil {
        return fmt.Errorf("marshal ClusterVersion patch: %v", err)
    }
    patch := []byte(fmt.Sprintf(`{"spec":{"desiredUpdate": %s}}`, updateJSON))
    if _, err := client.ConfigV1().ClusterVersions().Patch(ctx, clusterVersionName, types.MergePatchType, patch,
        metav1.PatchOptions{}); err != nil {
        return fmt.Errorf("Unable to upgrade: %v", err)
    }
    return nil
}
This function updates the ClusterVersion/version object on the cluster, and more precisely its spec.desiredUpdate field, which is filled with the reference to the target version’s container image.
Although it is possible, upgrading to a specific version by patching the ClusterVersion/version object directly can be dangerous: we saw earlier that the oc client checks whether the given target version is on a recommended upgrade path, and that safety net disappears when you bypass the CLI.
It is therefore not recommended to patch the ClusterVersion/version object directly.
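For reference, the merge patch built by patchDesiredUpdate() is roughly equivalent to the following command. It is shown only to illustrate what the oc client does under the hood; as stated above, patching the object by hand bypasses the safety checks and is not recommended.

# Roughly what patchDesiredUpdate() sends to the API server (illustration only)
oc patch clusterversion version --type=merge \
  -p '{"spec":{"desiredUpdate":{"image":"quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0"}}}'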
The automated process
Now that the ClusterVersion/version object has the desired version written in the desiredUpdate field, what happens on the cluster?
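Before answering that, here are a few read-only commands (assuming cluster-admin access) that are handy for watching the upgrade from the outside while the rest of this section unfolds:

# The desired target written by the oc client
oc get clusterversion version -o jsonpath='{.spec.desiredUpdate.image}{"\n"}'

# Overall upgrade progress and conditions
oc get clusterversion
oc adm upgrade

# Per-operator versions and health during the rollout
oc get clusteroperators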
First things first, let’s see what an OpenShift version is and what it is made of.
OpenShift version definition
As I said in the introduction, OpenShift is built around the concept of Cluster Operators (CO). These are programs responsible for managing the lifecycle of the OpenShift core components in an automated manner.
Cluster Operators are deployed using native Kubernetes objects like Deployments. Once they are running, they observe specific resources on the cluster API and reconcile the cluster state with the one described in those resources.
An OpenShift version is defined by the combination of the versions of:
- Each of the Cluster Operators
- The Red Hat Enterprise Linux CoreOS (RHCOS) distribution
For each release, Red Hat publishes a release.txt file that summarizes all the content required to run a specific OpenShift version.
This release.txt file can be found on mirror.openshift.com. The example below is from the following release.txt - 4.18.9 - x86_64:
Name: 4.18.9
Digest: sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
Created: 2025-04-10T10:34:38Z
OS/Arch: linux/amd64
Manifests: 758
Metadata files: 2
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
Release Metadata:
Version: 4.18.9
updates: 4.17.11, 4.17.12, 4.17.13, 4.17.14, 4.17.15, 4.17.16, 4.17.17, 4.17.18, 4.17.19, 4.17.20, 4.17.21, 4.17.22, 4.17.23, 4.17.24, 4.17.25, 4.18.1, 4.18.2, 4.18.3, 4.18.4, 4.18.5, 4.18.6, 4.18.7, 4.18.8
Metadata:
url: https://access.redhat.com/errata/RHSA-2025:3775
Component Versions:
kubectl 1.31.1
kubernetes 1.31.7
kubernetes-tests 1.31.7
machine-os 418.94.202504080525-0 Red Hat Enterprise Linux CoreOS
Images:
NAME PULL SPEC
cluster-config-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b2690fca3c4f54caf58ff3fd615e2a4914aa63027c727c224dabceb966ac761b
cluster-dns-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6be869d0e2ddb592fb6825e47cfce449384851d8ba842dd2700a71cbcb008afe
cluster-etcd-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b3f5ccd36438583a3b58323bf3a5a69d4835bdf06b61f55c7892fd97f1038035
cluster-image-registry-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:167c136dabc99f9f49b36c39ef358bdf7271554427c9e7c7b9aa174d43c5178b
cluster-ingress-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f0dcf5bb3b8ebf6729259ecfdc0e1159e009ed5a2a35b9a6b4342f2f8ab53ad8
cluster-kube-apiserver-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:202576eb3c11b227a505e824c6784fba949a4659949eefa492478652a1c9e16e
cluster-kube-cluster-api-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c551abd805f79f719789ae9ceda1c3626c959d2e403a6ace2c78beb85beb4472
cluster-kube-controller-manager-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:781c07ba125137a28d0582100727b19c29ea536a4752fa28ef069545827e6d04
cluster-kube-scheduler-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb559c500e215c9153afe0f426062c94982e77e3024a781c69210f8919ef1663
cluster-monitoring-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7964aba94c5b660fd0f6ee9119f1c2fdba4a67bcf3cfcf0a6a0a45d1f0834d18
cluster-network-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e13ed4e33d59e9996402cc12f7e15e390c584024406215037a3bf15fe89c4504
cluster-openshift-apiserver-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:64fdffa22a39a810a83efbdf73ea2d1cbf99dc677fb5adc9b4b4088e68ffaa4d
cluster-openshift-controller-manager-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:89d5110d21eb9daf2977134649e2475f05660f1482e34f2f361e525582e5ccab
cluster-storage-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:63663869c2227e12f5d0854964d0aa47639b24238d42e486c4f6b137d1837f1e
cluster-version-operator quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1dfaa99dab1455e6a0d6710a859e9a1e2cdf22acd4f6a48ded3161a8724c0f2a
coredns quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f0bb0d1a38fc778125e865cacf2ef11a10edf0d94cb96cd22a8690d83ccc5fba
ovn-kubernetes quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:42b4c6bdf3081ec264bc5edbc4eb779151de9e4742d6fe59a316645891df649c
// ...
As we can see, this gives various pieces of information about a specific release:
- The number of manifests it contains
- The component versions, including the machine-os version
- The versions from which it can be upgraded
- The list of container images used in this release
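As a side note, the same information can be obtained directly from a release image with oc adm release info, which produces essentially the same output as the published release.txt:

RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0

# Human-readable summary: versions, upgrade paths, component images
oc adm release info $RELEASE_IMAGE

# Only the list of component image pull specs
oc adm release info --pullspecs $RELEASE_IMAGE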
Furthermore, we saw earlier that to upgrade a cluster, we only needed to put the reference to a container image in the spec.desiredUpdate field of the ClusterVersion/version object. But in the release.txt file, multiple images are referenced.
How does the cluster go from knowing one image to knowing all the images it should use to be up to date?
Let’s have a look at the release image to see what we can learn.
Release image
First, let’s retrieve the image. I ran the oc adm upgrade
on a cluster to get the list of recommended updates and their associated container image.
oc adm upgrade
Cluster version is 4.18.8
Recommended updates:
VERSION IMAGE
4.18.9 quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
Then, let’s download the image and inspect its metadata using podman:
RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0
podman pull $RELEASE_IMAGE
podman inspect image $RELEASE_IMAGE
Here is an abridged version of the inspect command output:
[
    {
        "Id": "efd84b2a452794d480d67f0cd9b1acc8481e770db4294d6782e87bce3f3651b5",
        // ...
        "Config": {
            "Env": [
                // ...
                "SOURCE_GIT_URL=https://github.com/openshift/cluster-version-operator",
                // ...
            ],
            "Entrypoint": [
                "/usr/bin/cluster-version-operator"
            ],
            "Labels": {
                "io.openshift.release": "4.18.9",
                "io.openshift.release.base-image-digest": "sha256:1dfaa99dab1455e6a0d6710a859e9a1e2cdf22acd4f6a48ded3161a8724c0f2a"
            }
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 518315982,
        "Labels": {
            "io.openshift.release": "4.18.9",
            "io.openshift.release.base-image-digest": "sha256:1dfaa99dab1455e6a0d6710a859e9a1e2cdf22acd4f6a48ded3161a8724c0f2a"
        },
        "ManifestType": "application/vnd.docker.distribution.manifest.v2+json",
    }
]
We can see that, according to the Entrypoint
, a release image is in fact the image of the cluster-version-operator
itself.
We can look inside the image by running it locally, as it contains the sh binary:
podman run -it --rm --entrypoint /bin/sh $RELEASE_IMAGE
While browsing the filesystem, we find two interesting directories at its root:
/manifests
/release-manifests
Looking at their content, we find several Kubernetes manifest files:
sh-5.1# ls -alh manifests/
total 228K
drwxr-xr-x. 1 root root 1.7K Apr 9 11:03 .
dr-xr-xr-x. 1 root root 6 Apr 23 20:18 ..
-rw-rw-r--. 1 root root 713 Apr 9 11:00 0000_00_cluster-version-operator_00_namespace.yaml
-rw-rw-r--. 1 root root 497 Apr 9 11:00 0000_00_cluster-version-operator_01_adminack_configmap.yaml
-rw-rw-r--. 1 root root 448 Apr 9 11:00 0000_00_cluster-version-operator_01_admingate_configmap.yaml
-rw-rw-r--. 1 root root 7.6K Apr 9 11:00 0000_00_cluster-version-operator_01_clusteroperators.crd.yaml
-rw-rw-r--. 1 root root 43K Apr 9 11:00 0000_00_cluster-version-operator_01_clusterversions-CustomNoupdate.crd.yaml
-rw-rw-r--. 1 root root 39K Apr 9 11:00 0000_00_cluster-version-operator_01_clusterversions-Default.crd.yaml
-rw-rw-r--. 1 root root 43K Apr 9 11:00 0000_00_cluster-version-operator_01_clusterversions-DevPreviewNoupdate.crd.yaml
-rw-rw-r--. 1 root root 43K Apr 9 11:00 0000_00_cluster-version-operator_01_clusterversions-TechPreviewNoupdate.crd.yaml
-rw-rw-r--. 1 root root 480 Apr 9 11:00 0000_00_cluster-version-operator_02_roles.yaml
-rw-rw-r--. 1 root root 4.8K Apr 9 11:00 0000_00_cluster-version-operator_03_deployment.yaml
-rw-rw-r--. 1 root root 732 Apr 9 11:00 0000_00_cluster-version-operator_04_service.yaml
-rw-rw-r--. 1 root root 418 Apr 9 11:00 0000_90_cluster-version-operator_00_prometheusrole.yaml
-rw-rw-r--. 1 root root 564 Apr 9 11:00 0000_90_cluster-version-operator_01_prometheusrolebinding.yaml
-rw-rw-r--. 1 root root 9.9K Apr 9 11:00 0000_90_cluster-version-operator_02_servicemonitor.yaml
The manifests directory contains all the manifests required to deploy the cluster-version-operator on the cluster.
sh-5.1# ls -alh release-manifests/
total 17M
drwxrwxrwx. 1 root root 86K Apr 9 12:26 .
dr-xr-xr-x. 1 root root 6 Apr 23 20:18 ..
-r--r--r--. 1 root root 473 Apr 9 10:13 0000_00_config-operator_00_namespace.yaml
-r--r--r--. 1 root root 509 Apr 9 10:31 0000_03_cloud-credential-operator_00_namespace.yaml
-r--r--r--. 1 root root 9.8K Apr 9 10:31 0000_03_cloud-credential-operator_01_crd.yaml
-r--r--r--. 1 root root 13K Apr 9 10:59 0000_03_config-operator_01_clusterresourcequotas.crd.yaml
-r--r--r--. 1 root root 5.3K Apr 9 10:59 0000_03_config-operator_01_proxies.crd.yaml
-r--r--r--. 1 root root 12K Apr 9 10:59 0000_03_config-operator_01_rolebindingrestrictions.crd.yaml
-r--r--r--. 1 root root 18K Apr 9 10:59 0000_03_config-operator_01_securitycontextconstraints-CustomNoupdate.crd.yaml
-r--r--r--. 1 root root 17K Apr 9 10:59 0000_03_config-operator_01_securitycontextconstraints-Default.crd.yaml
-r--r--r--. 1 root root 18K Apr 9 10:59 0000_03_config-operator_01_securitycontextconstraints-DevPreviewNoupdate.crd.yaml
-r--r--r--. 1 root root 18K Apr 9 10:59 0000_03_config-operator_01_securitycontextconstraints-TechPreviewNoupdate.crd.yaml
-r--r--r--. 1 root root 2.5K Apr 9 10:59 0000_03_config-operator_02_rangeallocations.crd.yaml
-r--r--r--. 1 root root 684 Apr 9 10:13 0000_03_marketplace-operator_02_operatorhub.cr.yaml
-r--r--r--. 1 root root 5.1K Apr 9 10:13 0000_03_marketplace_01_operatorhubs.crd.yaml
// ...
-r--r--r--. 1 root root 120K Apr 9 12:26 image-references
-r--r--r--. 1 root root 500 Apr 9 12:26 release-metadata
The release-manifests directory contains all the manifests that need to be applied in order to update the cluster, including the ones required to update the Cluster Operators to their new version.
Consequently, each operator deployment is updated with the reference to the new version of its cluster operator image, and Kubernetes’ native mechanisms come into play to replace the current pod with one running the new version of the operator. The operator in turn modifies the resources it manages to update them as necessary.
Perfect: we have found how the cluster goes from knowing only the release image to knowing all the cluster operator images.
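By the way, you don’t have to run a container to browse these manifests: oc can extract them to a local directory for you (using the RELEASE_IMAGE variable defined earlier).

# Dump the contents of /release-manifests into a local directory
oc adm release extract --to=./release-manifests $RELEASE_IMAGE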
Even though we now have a better understanding of the release image content, we still don’t know two important things:
- How are those manifests applied on the cluster?
- How are the nodes updated?
Let’s try to answer these two questions!
The controller managing the ClusterVersion kind
Earlier, we showed that to upgrade a cluster, we needed to update the ClusterVersion/version object, an OpenShift-specific resource which is managed by… the cluster-version-operator!
Let’s take a look at the cluster-version-operator
code to better understand how it works and what it does with the release image.
The openshift/cluster-version-operator code is accessible on GitHub.
The code we are mainly interested in is located in the pkg/cvo/updatepayload.go file.
As I did earlier, I will not show all of the operator code, but only highlight the interesting parts.
Release image consumption
The cluster-version-operator implements a payloadRetriever struct, which is responsible for downloading the release image onto the cluster.
Its fetchUpdatePayloadToDir() function creates a new pod made of four init containers and one regular container. Those containers run the release image with specific commands, instead of the cluster-version-operator binary, which is the default entrypoint.
Running a pod with multiple containers based on the release image is an elegant way to download it onto the cluster: it uses default Kubernetes mechanisms, respecting the KISS principle.
The pod definition can be found in the pkg/cvo/updatepayload.go file. Here is an abbreviated version of the pod created by the function:
apiVersion: v1
kind: Pod
metadata:
  # ...
spec:
  initContainers:
    - name: cleanup
      command: ["sh", "-c", "rm -fR ./*"]
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
      workingDir: /etc/cvo/updatepayloads/
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
    - name: make-temporary-directory
      command:
        - mkdir
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw-whtp2
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
    - name: move-operator-manifests-to-temporary-directory
      command:
        - mv
        - /manifests
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw-whtp2/manifests
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
    - name: move-release-manifests-to-temporary-directory
      command:
        - mv
        - /release-manifests
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw-whtp2/release-manifests
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
  containers:
    - name: rename-to-final-location
      command:
        - mv
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw-whtp2
        - /etc/cvo/updatepayloads/d4o6wJfmlSbXEKmut5kAcw
      volumeMounts:
        - name: payloads
          mountPath: /etc/cvo/updatepayloads
        - name: kube-api-access-x87fj
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      image: 'quay.io/openshift-release-dev/ocp-release@sha256:720f89718effd16de7d77e5533c9608f1845295a2e00dfff543d0cf9aa09b2a0'
  volumes:
    - name: payloads
      hostPath:
        path: /etc/cvo/updatepayloads
        type: ''
The pod uses a hostPath volume, named payloads, which targets the /etc/cvo/updatepayloads directory on a master node’s filesystem. It then extracts all the manifests from the release image into this directory.
Once the manifests are written to the master node’s filesystem, the cluster-version-operator reads them and loads them into memory, into a single unordered array.
A graph is then created by the cluster-version-operator based on this unordered array.
Once fully built, the graph is ordered using the ByNumberAndComponent function.
The ordering algorithm is based on the manifest filenames:
// ByNumberAndComponent creates parallelization for tasks whose original filenames are of the form
// 0000_NN_NAME_* - files that share 0000_NN_NAME_ are run in serial, but chunks of files that have
// the same 0000_NN but different NAME can be run in parallel. If the input is not sorted in an order
// such that 0000_NN_NAME elements are next to each other, the splitter will treat those as unsplittable
// elements.
To get a better understanding of what this function does, we can generate a graph using a tool named cluster-version-util and feed it the /manifests and /release-manifests directories we discovered earlier.
This tool outputs a graph representing the order in which the cluster-version-operator will apply the manifests.
The following screenshot is a part of the final graph, the complete graph being too big to be displayed in a readable manner.
At the top of the graph, we find the first file located in the /release-manifests directory: 0000_00_config-operator_00_namespace.yaml.
As explained earlier with the filenames, the cluster-version-operator applies the manifests in a specific order represented by the graph:
- Blocks that are at the same level starting from the root are applied in parallel
- Manifests in the same block are applied sequentially
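To get a rough feel for this grouping without generating the full graph, we can count how many manifests share each 0000_NN_NAME prefix in an extracted release-manifests directory. This is a quick sketch: the regular expression simply assumes the naming convention quoted above.

# Manifests sharing the same 0000_NN_NAME prefix are applied serially;
# different NAMEs under the same 0000_NN prefix can be applied in parallel.
ls release-manifests/ \
  | sed -En 's/^([0-9]{4}_[0-9]{2}_[^_]+)_.*/\1/p' \
  | sort | uniq -c | sort -rn | head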
Summing up
We are now fully aware of how cluster operators and other Kubernetes resources are updated:
- The manifests are shipped through the release image.
- The manifests are applied on the cluster by the cluster-version-operator.
The beautiful thing is that the manifests required to update the cluster-version-operator itself are included in those manifests.
Because the upgrade payload is dumped on a master node’s filesystem, the cluster-version-operator can safely update itself by patching its own deployment and letting native Kubernetes mechanisms restart it. Once the new instance of the cluster-version-operator starts, it reads the manifests from the master node’s filesystem and continues applying the update seamlessly.
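One way to see this self-update happen is to watch the image referenced by the CVO’s own Deployment switch to the new release image during an upgrade (read-only, assuming cluster-admin access):

# The CVO lives in the openshift-cluster-version namespace and manages its own Deployment
oc -n openshift-cluster-version get deployment cluster-version-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'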
OK, that’s nice, but how are components like the kubelet, the container runtime or even the nodes’ operating system updated?
3. Node upgrade
I’ll break the suspense quickly: there is a dedicated cluster operator for the nodes. It is called the machine-config-operator. Let’s have a look at its code.
machine-config-operator
The machine-config-operator is a Cluster Operator shipped with OpenShift. It is responsible for managing the nodes’ configuration in a Kubernetes fashion, meaning declaratively.
Cluster administrators can apply changes to the node configuration and operating system by simply modifying MachineConfig objects through the OpenShift API, the same way they would patch a Deployment, for example.
Those MachineConfig objects are then merged by the machine-config-operator to define the final configuration of the nodes. This merged configuration is exposed by the machine-config-server component of the operator and applied by the machine-config-daemon running on each node.
Here is a simple diagram of its architecture:
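On a live cluster, the result of this merge is visible as rendered-<pool>-<hash> MachineConfig objects, referenced by each MachineConfigPool. A few read-only commands to see it (assuming cluster-admin access):

# Individual MachineConfig fragments plus the merged "rendered-*" configs
oc get machineconfigs

# Each pool points at the rendered config its nodes should converge to
oc get machineconfigpools
oc get machineconfigpool worker -o jsonpath='{.spec.configuration.name}{"\n"}'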
Operating system upgrade
Let’s take a look at the machine-config-daemon
code to find out how the operating system update is done.
The machine-config-daemon code lives in the pkg/daemon directory. There is an update.go file which defines the update() function.
Here is an interesting part of the function code:
// Update the node to the provided node configuration.
func (dn *Daemon) update(oldConfig, newConfig *mcfgv1.MachineConfig, skipCertificateWrite bool) (retErr error) {
    // ...
    if dn.os.IsCoreOSVariant() {
        coreOSDaemon := CoreOSDaemon{dn}
        if err := coreOSDaemon.applyOSChanges(*diff, oldConfig, newConfig); err != nil {
            return err
        }
        defer func() {
            if retErr != nil {
                if err := coreOSDaemon.applyOSChanges(*diff, newConfig, oldConfig); err != nil {
                    errs := kubeErrs.NewAggregate([]error{err, retErr})
                    retErr = fmt.Errorf("error rolling back changes to OS: %w", errs)
                    return
                }
            }
        }()
    } else {
        klog.Info("updating the OS on non-CoreOS nodes is not supported")
    }
    // ...
}
The daemon checks whether the underlying OS is a CoreOS variant. If it is, it calls the applyOSChanges() function.
Note that the configuration of non-CoreOS nodes is not fully managed by the machine-config-operator, and neither is the upgrade of their operating system. We’ll see why later.
Let’s follow the function calls until we reach something meaningful. This leads to the RebaseLayered() function defined in the pkg/daemon/rpm-ostree.go file.
// RebaseLayered rebases system or errors if already rebased.
func (r *RpmOstreeClient) RebaseLayered(imgURL string) error {
    // Try to re-link the merged pull secrets if they exist, since it could have been populated without a daemon reboot
    if err := useMergedPullSecrets(rpmOstreeSystem); err != nil {
        return fmt.Errorf("Error while ensuring access to pull secrets: %w", err)
    }
    klog.Infof("Executing rebase to %s", imgURL)
    return runRpmOstree("rebase", "--experimental", "ostree-unverified-registry:"+imgURL)
}
It introduces a new component called rpm-ostree. The RebaseLayered() function has exactly the same effect as running the following command in a shell:
rpm-ostree rebase --experimental ostree-unverified-registry:<os_img_url>
This means we need to look at another component to understand the whole upgrade process: rpm-ostree.
rpm-ostree
rpm-ostree is a hybrid image/package system. Here are its main features:
- Transactional, background image-based (versioned/checksummed) upgrades
- OS rollback without affecting user data (/usr but not /etc, /var) via libostree
- Client-side package layering (and overrides)
rpm-ostree could be the topic of multiple blog posts; in our case, let’s focus on the rebase subcommand.
rpm-ostree rebase
The rpm-ostree rebase command is similar to the git rebase command: it changes the base. If there are any additional commits, they are reapplied on top of the new base, and if there are conflicts, they are resolved automatically, or manually if required.
What is a base in the rpm-ostree context?
In the rpm-ostree context, a base refers to a bootable, immutable system image.
It contains the core set of binaries, packages and configurations, including the kubelet, cri-o and other useful binaries.
If we look again into the release-manifests directory of the OpenShift release image, we can find the file 0000_80_machine-config_05_osimageurl.yaml, which has the following content:
apiVersion: v1
kind: ConfigMap
metadata:
  name: machine-config-osimageurl
  namespace: openshift-machine-config-operator
data:
  releaseVersion: 4.18.9
  baseOSContainerImage: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:058137a9f5a2d69ff885f7903f28b30b30767cb6e27c80e6c65c690e1788b424"
  baseOSExtensionsContainerImage: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:28346d15c1b6f96096ac293846fbda9efbff284e9629d10b1aa17aff18b5511c"
  osImageURL: ""
The baseOSContainerImage
references a container image that is named rhel-coreos
in the release.txt
file.
So our base
is a container image.
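As a cross-check, the same pull spec can be resolved directly from the release image. The component name used below, rhel-coreos, is taken from the Images list of the release.txt; older releases used a different name, so treat it as an assumption for your version.

# Resolve the base OS image pull spec straight from the release image
oc adm release info $RELEASE_IMAGE --image-for=rhel-coreos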
The same way we did earlier, let’s take a closer look at it. As usual, I’ll pull the image and run a container using it:
OS_IMAGE=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:058137a9f5a2d69ff885f7903f28b30b30767cb6e27c80e6c65c690e1788b424
podman pull $OS_IMAGE
podman run -it --rm --entrypoint=/bin/sh $OS_IMAGE
Let’s list the directories and files at the filesystem root:
sh-5.1# ls -alh
total 44K
dr-xr-xr-x. 1 root root 12 May 12 19:10 .
dr-xr-xr-x. 1 root root 12 May 12 19:10 ..
lrwxrwxrwx. 2 root root 7 Jan 1 1970 bin -> usr/bin
drwxr-xr-x. 1 root root 0 Jan 1 1970 boot
drwxr-xr-x. 5 root root 360 May 12 19:10 dev
drwxr-xr-x. 1 root root 38 May 12 19:10 etc
lrwxrwxrwx. 2 root root 8 Jan 1 1970 home -> var/home
lrwxrwxrwx. 2 root root 7 Jan 1 1970 lib -> usr/lib
lrwxrwxrwx. 2 root root 9 Jan 1 1970 lib64 -> usr/lib64
lrwxrwxrwx. 2 root root 9 Jan 1 1970 media -> run/media
lrwxrwxrwx. 2 root root 7 Jan 1 1970 mnt -> var/mnt
lrwxrwxrwx. 2 root root 7 Jan 1 1970 opt -> var/opt
lrwxrwxrwx. 2 root root 14 Jan 1 1970 ostree -> sysroot/ostree
dr-xr-xr-x. 361 nfsnobody nfsnobody 0 May 12 19:10 proc
lrwxrwxrwx. 2 root root 12 Jan 1 1970 root -> var/roothome
drwxr-xr-x. 1 root root 40 May 12 19:10 run
lrwxrwxrwx. 2 root root 8 Jan 1 1970 sbin -> usr/sbin
lrwxrwxrwx. 2 root root 7 Jan 1 1970 srv -> var/srv
dr-xr-xr-x. 13 nfsnobody nfsnobody 0 May 12 14:29 sys
drwxr-xr-x. 1 root root 12 Jan 1 1970 sysroot
drwxrwxrwt. 1 root root 0 Jan 1 1970 tmp
drwxr-xr-x. 1 root root 38 Apr 21 09:56 usr
drwxr-xr-x. 1 root root 6 Jan 1 1970 var
It seems to contain a basic Linux filesystem layout. There are some particularities though:
- bin, lib, lib64 and sbin are all symlinks to subdirectories of /usr
- home, mnt, opt and root are all symlinks to subdirectories of /var
- There is an ostree directory
Let’s take a look at space usage:
sh-5.1# du -h --max-depth=1 / 2> /dev/null
2.2G /sysroot
0 /boot
0 /dev
0 /proc
0 /run
0 /sys
0 /tmp
0 /usr
12K /etc
0 /var
2.2G /
- /var is empty
- /sysroot is 2.2G, so it must contain all the data (binaries, packages and configurations)
Digging into /sysroot, the data lives in an ostree repository:
sh-5.1# du -h --max-depth=1 /sysroot/ostree/repo/ 2> /dev/null
2.2G /sysroot/ostree/repo/objects
0 /sysroot/ostree/repo/extensions
0 /sysroot/ostree/repo/refs
0 /sysroot/ostree/repo/state
0 /sysroot/ostree/repo/tmp
2.2G /sysroot/ostree/repo/
To be bootable, an OS image must at least contain a vmlinuz and an initramfs file. Let’s look for those in the ostree repo:
sh-5.1# find /ostree/repo/objects/ -type f -exec file {} + 2>/dev/null | grep -E 'ASCII cpio archive|Linux kernel x86 boot executable bzImage'
/ostree/repo/objects/38/063cbf093cf5f80632b276f56471fba5d068aff9759c16037c9d734e4b7f60.file: ASCII cpio archive (SVR4 with no CRC)
/ostree/repo/objects/d7/51c8c8b654a9b9a4fb8b79d049ddaf844583baba27065bc0a180777eca4316.file: Linux kernel x86 boot executable bzImage, version 5.14.0-427.64.1.el9_4.x86_64 (mockbuild@x86-64-01.build.eng.rdu2.redhat.com) #1 SMP PREEMPT_DYNAMIC Fri Apr 4 17:27:15 EDT 2025, RO-rootFS, swap_dev 0xC, Normal VGA
Bingo! We found them.
This means the content inside the container image is formatted as an rpm-ostree repo. It’s similar to a git repo, but it contains RPM packages, a vmlinuz and an initramfs file.
You can learn more about rpm-ostree repos on this page: https://ostreedev.github.io/ostree/repo/
When the rpm-ostree rebase command is executed on a specific node, the container image content is extracted onto the node filesystem as a new bootable rpm-ostree deployment, in the /ostree/deploy directory.
Then, based on the extracted content, the rebase process performs the following operations:
- The /usr content is fully replaced by the content of the /usr directory of the new deployment
- The /etc content is 3-way merged: /etc from the previous deployment, the current /etc content from the node, and /etc from the new deployment
- /boot is updated to target the new deployment: vmlinuz and initramfs are written into it and the boot entries are updated to point to them
During this process, the /var directory is left unchanged. This way, applications keep their data.
The resulting boot entry targets the new deployment directory. Here is an example extracted from /boot/loader/entries/ostree-1.conf:
title Red Hat Enterprise Linux CoreOS 418.94.202504170303-0 (ostree:0)
version 1
options ignition.platform.id=metal $ignition_firstboot systemd.unified_cgroup_hierarchy=1 cgroup_no_v1="all" psi=0 ostree=/ostree/boot.1/rhcos/8566b8b6905d98354dc2423bc33b656e584fc705faf30890cd236f754dd1d1cd/0 root=UUID=910678ff-f77e-4a7d-8d53-86f2ac47a823 rw rootflags=prjquota boot=UUID=cbe17ce2-fd72-4503-9b72-6b7b6a859251
linux /boot/ostree/rhcos-8566b8b6905d98354dc2423bc33b656e584fc705faf30890cd236f754dd1d1cd/vmlinuz-5.14.0-427.65.1.el9_4.x86_64
initrd /boot/ostree/rhcos-8566b8b6905d98354dc2423bc33b656e584fc705faf30890cd236f754dd1d1cd/initramfs-5.14.0-427.65.1.el9_4.x86_64.img
aboot /ostree/deploy/rhcos/deploy/553caf38510cb8d879ec8623d2621be3672e6bb3306722e96570e75e72850109.0/usr/lib/ostree-boot/aboot.img
abootcfg /ostree/deploy/rhcos/deploy/553caf38510cb8d879ec8623d2621be3672e6bb3306722e96570e75e72850109.0/usr/lib/ostree-boot/aboot.cfg
During this operation, the kubelet, cri-o and other binaries on the node are updated. Once the process finishes, the node is rebooted and the update is complete.
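If you want to observe this from a running cluster, rpm-ostree can show the currently booted deployment and, during an upgrade, the freshly staged one. A quick sketch using oc debug; replace <node> with an actual node name:

# List the ostree deployments on a node (booted, staged, rollback)
oc debug node/<node> -- chroot /host rpm-ostree status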
4. Conclusion
In this blog post, we took a deep dive into the various components and elements required to upgrade an OpenShift cluster. We saw how a cluster can be updated using only container images as the shipping vehicle for the update payload, and how the implementation cleverly leverages various native Kubernetes mechanisms.
In the next section, you’ll find a list of useful tools when you plan to upgrade your cluster. I hope at least one of them is new to you!
5. Useful tools
OpenShift Update Graph
This tool helps you find an official and supported upgrade path from a version A to a version B.
OpenShift Operator Update Information Checker
This tool helps you find the right versions of your operators when upgrading from a version 4.y.z to a version 4.y+2.z.
It helps you prepare an upgrade and ensure operators remain compatible with the Kubernetes and OpenShift APIs during the whole process.
OCP Z-Stream Releases Schedule
This is a calendar view of all OpenShift programmed releases in the various available channels.
OpenShift CI WebUI
These links take you to the Continuous Integration (CI) dashboards for OpenShift builds on different CPU architectures:
Each CI dashboard shows the build and test pipelines for its respective architecture. You can:
- See which changes (commits, bug fixes, enhancements) are included in a given OpenShift release
- Compare two OpenShift versions to understand what changed between them
- Investigate build/test failures or regressions on a specific architecture
Basically, if you want to know what went into a release, or why something broke on ppc64le but not on amd64, these dashboards are where to start.
For example, you can list the differences between the 4.16.10 and 4.18.10 amd64 releases using this link:
https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.18.10?from=4.16.10