# 2. Understanding Containers

This chapter covers

* Understanding what a container is
* Differences between containers and virtual machines
* Creating, running, and sharing a container image with Docker
* Linux kernel features that make containers possible

* Kubernetes primarily manages applications that run in containers, so before we start exploring Kubernetes, we need a good understanding of what a container is.

**Introducing containers**

* Due to the shift to microservice architectures, where systems consist of hundreds of deployed application instances, an alternative to VMs was needed. Containers are that alternative.

#### Comparing containers to virtual machines


<figure><img src="/files/1UmASE7jzxUZCcTa1hUr" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/AQADo5O6NcdqPGOtLG5H" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/uKmiPjNUJ3YE8rojgbXv" alt=""><figcaption></figcaption></figure>

**What enables containers and what enables virtual machines? 🤔**

* While virtual machines rely on CPU virtualization support and hypervisor software on the host, containers are enabled by container technologies supported by the Linux kernel. But instead of interacting with these technologies directly, you typically rely on tools like Docker or Podman, which offer user-friendly interfaces for managing containers.

#### The Docker container platform

* While container technologies have existed for a long time, they only became widely known with the rise of Docker. Docker was the first container system that made containers easily portable across different computers. It simplifies the process of packaging up the application and its dependencies into a single package that can be deployed on any computer running Docker.

**Introducing containers, images and registries**

* Docker is a platform for packaging, distributing and running applications. As mentioned earlier, it allows you to package your application along with its entire environment
* Docker allows you to distribute this package via a public repository to any other Docker-enabled computer

<figure><img src="/files/ovg5OYNwdxpExybF3Gc2" alt=""><figcaption></figcaption></figure>

* A *container image* is the packaged bundle that includes your application and its environment, similar to a zip file or tarball. It consists of the entire filesystem needed by your application, and metadata, such as which executable file to run, the ports the application listens on, and other information about the image.
* An *image registry* is a repository for storing and sharing container images between different people and computers. After you build your image, you can either run it locally, or upload (*push*) the image to a registry and then download (*pull*) it to another computer. Some registries are public, allowing anyone to pull images from it, while others are private and only accessible to individuals, organizations or computers that have the required authentication credentials.
* A *container* is created from a container image and runs as a regular process on the host operating system. However, its environment is isolated from the host and the other processes. The container’s filesystem is derived from the container image, but additional filesystems can also be mounted into the container. Containers are typically resource-restricted, meaning they are allocated specific amounts of resources, such as CPU and memory, and can’t exceed these limits.

**Building, Distributing, Running a Container Image**

* The developer first builds an image. The image is stored locally until the developer pushes it to a registry.
* Anyone with access to the registry can then pull the image to any other computer running Docker and run it there. Docker creates an isolated container based on the image and runs the specified executable within it.
* Running the application on any computer is made possible by the fact that the environment of the application is decoupled from the environment of the host. The sketch below shows this flow with the Docker CLI.
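A minimal sketch of this build-push-pull-run cycle with the Docker CLI (the image name `example.com/alice/myapp` and its tag are illustrative, not from the book):

```
# On the developer's machine: build the image from the Dockerfile
# in the current directory, then push it to a registry (after docker login)
docker build -t example.com/alice/myapp:1.0 .
docker push example.com/alice/myapp:1.0

# On any other Docker-enabled computer: pull the image and run it
docker pull example.com/alice/myapp:1.0
docker run example.com/alice/myapp:1.0
```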

<figure><img src="/files/r5jvdXYh4OXXRCCx6S0k" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/AryAEXZ5YpFUB6hYwB2U" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/0FY3WhAwxiXzf78sCcn2" alt=""><figcaption></figcaption></figure>

* When you run an application in a container, it interacts with the files bundled into the container image, along with files in any additional filesystems you mount into the container.
* The application sees the same files whether it's running on your laptop or on a production server, even if the production server uses a completely different Linux distribution than your laptop. Since the application typically can't access the files in the host's filesystem, it doesn't matter if the software libraries installed on the production server differ from those on your laptop.

Understanding image layers:

* Container images are composed of thin layers that can be reused across multiple images. This allows for efficient transfer of images: only the layers not already present on the host need to be downloaded, since the rest may have been downloaded previously, for example as part of another image containing the same layers.
* Layers not only make image distribution very efficient, they also reduce the storage footprint of images, because Docker stores each layer only once.

<figure><img src="/files/CtLwMdR4TrYbyH2nSGPM" alt=""><figcaption></figcaption></figure>

```
If all three containers have access to the same files, how can they be completely isolated from each other? Are changes that application A makes to a file stored in a shared layer not visible to application B? They aren't. Here's why.

The filesystems are isolated by the copy-on-write (CoW) mechanism. The filesystem of a container consists of the read-only layers from the container image and an additional read/write layer stacked on top. When an application running in container A changes a file in one of the read-only layers, the entire file is copied into the container's read/write layer and the file contents are changed there. Since each container has its own writable layer, changes to shared files are not visible in any other container.
When you delete a file, it is only marked as deleted in the read/write layer, but it's still present in one or more of the layers below. This also means that deleting files does not reduce the size of the image.
```
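You can observe copy-on-write yourself with a short experiment (a sketch using the `busybox` image; `/etc/passwd` is just a convenient file that comes from an image layer):

```
# Modify a file that lives in a read-only image layer; the file is first
# copied into this container's private read/write layer
docker run --name cow-demo busybox sh -c 'echo test >> /etc/passwd; wc -l /etc/passwd'

# A second container from the same image still sees the original file,
# because the change exists only in cow-demo's writable layer
docker run --rm busybox wc -l /etc/passwd
```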

**WARNING**

Even seemingly harmless operations such as changing permissions or ownership of a file result in a new copy of the entire file being created in the read/write layer. If you perform this type of operation on a large file or many files, the image size may swell significantly.

**Installing Docker and running a hello-world container**

* Visit <http://docs.docker.com/install> for instructions on how to install Docker on your operating system
* Run `docker --version` to verify the installation, then run your first container: `docker run busybox echo "Hello World"`

<figure><img src="/files/1cJKVTfavEJhochMT9O1" alt=""><figcaption></figcaption></figure>

What happens when you run a container? 🤔

* The <kbd>docker</kbd> CLI tool sends an instruction to run the container to the Docker daemon, which checks whether the <kbd>busybox</kbd> image is already present in its local image cache. If it isn’t, the daemon pulls it from the Docker Hub registry.

  After downloading the image to your computer, the Docker daemon creates a container from that image and executes the <kbd>echo</kbd> command in it. The command prints the text to the standard output, the process then terminates and the container stops.
* To stop and exit a container that is attached to your console, press Ctrl-C

<figure><img src="/files/hZm8HmnEc2pfWbKYtLO6" alt=""><figcaption></figcaption></figure>

* If you want to run an image from a different registry, you must specify the registry's address along with the image name. For example, to run an image from charan.io, a publicly accessible image registry similar to Docker Hub, you would use the command `docker run charan.io/some/image`

Understanding image tags:

* Docker allows you to have multiple variants and versions of the same image under the same name. Each variant has a unique tag. If you refer to an image without explicitly specifying the tag, Docker assumes that you are referring to the special `latest` tag.
* Even for a single version, there are usually several variants of an image, for example `redis:7.4.1-bookworm` and `redis:7.4.1-alpine`.
* To run a specific version and/or variant of the image, specify the tag in the image name:

```
docker run redis:7.4.1-alpine
```

<figure><img src="/files/CGPdRnWa3LL76lzgkKo7" alt=""><figcaption></figcaption></figure>

* Note that Docker itself is not what provides process isolation. The actual isolation of processes takes place at the Linux kernel level, using the mechanisms the kernel provides.
* Docker is just a tool that utilizes those mechanisms, and it's by no means the only one.

**Introducing the Open Container Initiative (OCI):**

* After the success of Docker, the Open Container Initiative (OCI) was born to create open industry standards around container formats and runtimes.
* OCI members created the OCI Image Format Specification, which prescribes a standard format for container images, and the OCI Runtime Specification, which defines a standard interface for container runtimes with the aim of standardizing the creation, configuration and execution of containers.

Container Runtime Interface (CRI), CRI-O, and containerd

* Kubernetes initially used Docker as the container runtime. However, Kubernetes now supports different container runtimes through the Container Runtime Interface (CRI), which defines a set of methods for creating, starting, stopping and managing containers.
* One implementation of CRI is CRI-O, a lightweight container runtime optimized for Kubernetes, which allows it to run containers without using Docker. Another commonly used CRI implementation is containerd, a high-performance container runtime originally developed by Docker.
* Because of the Open Container Initiative and the Container Runtime Interface, you can build container images with Docker and then run them in a cluster that employs any other OCI-compliant container runtime.

**Deploying the Kubernetes in Action Demo Application (KIADA)**

Introducing the KIADA application:

* The Kubernetes in Action Demo Application (KIADA) is a web-based application that shows quotes from the book Kubernetes in Action, asks you Kubernetes-related questions to help you check how your knowledge is progressing, and provides a list of hyperlinks to external websites related to Kubernetes.

Architecture of the KIADA application:

* The architecture of the KIADA application is shown below. The HTML is served by a web application running in a Node.js server; the client-side JavaScript code then retrieves the quote and question from the Quote and Quiz RESTful services. Together, the Node.js application and the two services make up the complete KIADA application.
* The web browser thus talks directly to three different services.

<figure><img src="/files/jB5drqvlwwNEhWTanp3k" alt=""><figcaption></figcaption></figure>

* The HTML version is accessible at the request URI <kbd>/html,</kbd> whereas the text version is at <kbd>/text</kbd>. If the client requests the root URI path <kbd>/</kbd>, the application inspects the <kbd>Accept</kbd> request header to guess whether the client is a graphical web browser, in which case it redirects it to <kbd>/html</kbd>, or a text-based tool like <kbd>curl</kbd>, in which case it sends the plain-text response.
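Assuming the application is reachable at `http://localhost:1234` (as it will be once you run the container later in this chapter), you could observe this content negotiation with `curl`:

```
# curl sends "Accept: */*" by default, so the app responds with plain text
curl http://localhost:1234/

# Pretend to be a graphical browser; -L follows the redirect to /html
curl -L -H "Accept: text/html" http://localhost:1234/
```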

<figure><img src="/files/novi0E2rQo2XiO0HGRlX" alt=""><figcaption></figcaption></figure>

#### Building the application

* Let's build up the application gradually instead of starting with the fully functional version.
* The initial version returns the version of the application, the network hostname of the server that processed the client's request, and the IP of the client in its plain-text response.

The source code of the application is available at <https://github.com/charan-happy/KIADA.git>

The Dockerfile for our app:

```docker
# Start from an image that already contains Node.js
FROM node:23-alpine
# Add the application code to the image
COPY app.js /app.js
COPY html /html
# Command to run when the container starts
ENTRYPOINT ["node", "app.js"]
```

The `FROM` line defines the base image used as the starting point for our application.

The first `COPY` line copies app.js from the local directory into the root directory of the image.

The second `COPY` line copies the html directory into the image. Finally, `ENTRYPOINT` specifies the command that Docker should run when you start the container; in the listing the command is `node app.js`.
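As an aside, `ENTRYPOINT` can be written in two forms; this is standard Dockerfile behavior, not specific to this app. The exec form used in the listing is generally preferred, because the application then runs as the container's main process instead of as a child of a shell:

```docker
ENTRYPOINT ["node", "app.js"]   # exec form: node is executed directly
ENTRYPOINT node app.js          # shell form: runs as /bin/sh -c "node app.js"
```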

```
## Choosing a base image
You may wonder why use this specific image as your base. Because your app is a Node.js app, you need your image to contain the node binary file to run the app. You could have used any image containing this binary, or you could have even used a Linux distribution base image such as fedora or ubuntu and installed Node.js into the container when building the image. But since the node image already contains everything needed to run Node.js apps, it doesn’t make sense to build the image from scratch. In some organizations, however, the use of a specific base image and adding software to it at build-time may be mandatory.
```

**Building the container image**

`$ docker build -t kiada:latest .`\
![](/files/FcTmD4Vxbewbebnb5yfY)

Once you run the above command, Docker builds the image and tags it as `kiada:latest`.

The `-t` option specifies the desired image name and tag. The dot at the end specifies that the Dockerfile and the artifacts needed to build the image are in the current directory. This is the so-called build context.

* Once the build process is complete, the newly created image is available in your computer's local image store. You can see it by running `docker images`.

**How the Image is built**

<figure><img src="/files/h5N4OzZobzDMT1ZxdcxH" alt=""><figcaption></figcaption></figure>

* The build itself isn't performed by the docker CLI tool. Instead, the contents of the entire directory are uploaded to the Docker daemon, which builds the image. The CLI and the daemon don't necessarily have to be on the same computer.

```
Tip :
Don't add unnecessary files to the build directory, as they will slow down the build process, especially if the Docker daemon is located on a remote system.
```

* To build the image, Docker first pulls the base image from the public image registry, unless the image is already stored locally.
* It then creates a new container from that image and executes the first directive from the Dockerfile.
* The container's final state yields a new image with its own ID.
* The build process continues by processing the remaining directives in the Dockerfile. Each one creates a new image.
* The final image is then tagged with the tag you specified with the `-t` flag in the `docker build` command.

**Understanding image layers**

* You may think that each image consists of only the layers of the base image plus a single new layer on top, but that's not the case. When building an image, a new layer is created for each individual directive in the Dockerfile.
* During the build of the `kiada` image, after it pulls all the layers of the base image, Docker creates a new layer and adds the `app.js` file into it. It does the same for the `html` directory, and finally creates the last layer, which specifies the command to run when the container is started. This last layer is then tagged as `kiada:latest`.

We can see the layers of an image and their sizes by running `docker history` (layers are printed from top to bottom, i.e., the topmost layer is printed first).

```
Tip
Each directive creates a new layer. As mentioned previously, deleting a file only marks the file as deleted in the new layer and doesn't actually remove the file from the underlying layers. Therefore, you must ensure that the command you run with the RUN directive deletes all temporary files it creates before it completes. Deleting those files in a subsequent RUN directive is pointless (see the sketch after this tip).

```
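A hedged sketch of what this means in practice (the package URL is purely illustrative): perform the download, unpacking and cleanup within a single `RUN` directive, so the temporary file is never committed into a layer:

```docker
# Good: the temporary archive is deleted in the same layer it was created in
RUN wget -O /tmp/pkg.tar.gz https://example.com/pkg.tar.gz \
 && tar -xzf /tmp/pkg.tar.gz -C /opt \
 && rm /tmp/pkg.tar.gz

# Pointless: the archive is already baked into the previous layer;
# this directive would only mark it as deleted in a new layer
# RUN rm /tmp/pkg.tar.gz
```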

With the image built, we can now run the container: `$ docker run --name kiada-container -p 1234:8080 -d kiada`

<figure><img src="/files/k81CkVXdWjOV0ckKj2bQ" alt=""><figcaption></figcaption></figure>

* This tells Docker to run a new container called <kbd>kiada-container</kbd> from the <kbd>kiada</kbd> image. The container is detached from the console (<kbd>-d</kbd> flag) and runs in the background. Port <kbd>1234</kbd> on the host computer is mapped to port <kbd>8080</kbd> in the container (specified by the <kbd>-p</kbd> <kbd>1234:8080</kbd> option), so you can access the app at [http://localhost:1234](http://localhost:1234/).

<figure><img src="/files/bExfyy4RhDA825E97wfY" alt=""><figcaption></figcaption></figure>

* Now access the application at [http://localhost:1234](http://localhost:1234/) using <kbd>curl</kbd> or your internet browser:

```
Note :
If the Docker Daemon runs on a different machine, you must replace localhost with the IP of that machine. You can look it up in the DOCKER_HOST environment variable.
```

<figure><img src="/files/Qq0s3AzuzOu9bu5qVUSX" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/KlwSRJtMeq2TLn2ANUA1" alt=""><figcaption></figcaption></figure>

* To list all running containers, run `docker ps`

<figure><img src="/files/nejA7Rg0TFWXCtgezZMI" alt=""><figcaption></figcaption></figure>

* To get additional information about the container, run `docker inspect <container-name>`

<figure><img src="/files/xMWqclQGpFbBlWUmP7Ln" alt=""><figcaption></figcaption></figure>

* Docker captures and stores everything the application writes to the standard output and error streams. This is typically the place where applications write their logs. You can use the `docker logs <container-name>` command to see the output.

<figure><img src="/files/poEzACgHTjIzsoeLhZZR" alt=""><figcaption></figcaption></figure>

Distributing the container image:

* The image you've built is only available locally. To run it on other computers, you must first push it to an external image registry. Let's push it to the public Docker Hub registry so that you don't need to set up a private one. Other choices exist as well, such as ACR, ECR, GCR, and Quay.io.
* Before you push the image, you must re-tag it according to Docker Hub's image naming schema. The image name must include your Docker Hub ID, which you choose when you register at <http://hub.docker.com>.
* To re-tag the image, use `docker tag kiada <yourdockerid>/kiada:0.1`

<figure><img src="/files/fUe8UGml4Nsnjxyd70Jn" alt=""><figcaption></figcaption></figure>

As you can see, both <kbd>kiada</kbd> and <kbd>charan63/kiada:0.1</kbd> point to the same image ID, meaning that these aren't two images, but a single image with two tags.

* Now, before pushing the image to Docker Hub, let's log in to the registry and push the image:

`docker login -u <username> docker.io`

`docker push <your-docker-hub-id>/kiada:0.1`

* You can now run the image on any Docker-enabled host with the following command:

`docker run --name kiada-container -p 1234:8080 -d charan63/kiada:0.1`

To stop a container run `docker stop kiada-container`

To see all stopped and running containers use `docker ps -a`

To start a stopped container, run `docker start kiada-container`

To delete a container, run `docker rm kiada-container`

* If you delete a container, only the container is deleted, not the image it was created from.

To delete an image, use `docker rmi kiada:latest`

Alternatively, you can use `docker image prune` to remove all dangling images (add the `-a` flag to remove all unused images).

**Understanding containers**

* Let's understand how containers achieve process isolation without using virtual machines.

**Using namespaces to customize the environment of a process**&#x20;

* The first feature called `linux namespace` ensures that each process has its own view of the system. This means that a process running in a container will only see some of the files , process, network interfaces on the system, as well as a different hostname, just as if it were running in a separate virtual machine.
* Initially, all system resources available in linux os such as filesystem, process id, user id, network interfaces and others, are all in the same bucket that all processes see and use.&#x20;
* But, the kernel allows you to create additional buckets known as namespaces and move resources into them so that they are organized in smaller sets.&#x20;
* This allows you to make each set visible only to process  or a group of processes.&#x20;
* When you create a new process, you can specify the namespace which it should use. The process only sees resources that are in this namespace and none in other namespaces
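A minimal sketch using the `unshare` utility from util-linux (requires root; the hostname value is illustrative). It runs a shell in a new UTS namespace, where changing the hostname does not affect the host:

```
# Change the hostname inside a new UTS namespace and print it there
sudo unshare --uts sh -c 'hostname container-demo; hostname'

# Back in the default namespace, the hostname is unchanged
hostname
```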

Introducing the available namespaces:

* There isn't just a single type of namespace. There are in fact several types, one for each resource type. A process thus uses not just one namespace, but one namespace for each type.

The following types of namespaces exist:

👉 The Mount namespace (mnt) isolates mount points (file systems).\
👉 The Process ID namespace (pid) isolates process IDs.\
👉 The Network namespace (net) isolates network devices, stacks, ports, etc.\
👉 The Inter-process communication namespace (ipc) isolates the communication between processes (this includes isolating message queues, shared memory, and others).\
👉 The UNIX Time-sharing System (UTS) namespace isolates the system hostname and the Network Information Service (NIS) domain name.\
👉 The User ID namespace (user) isolates user and group IDs.\
👉 The Time namespace allows each container to have its own offset to the system clocks.\
👉 The Cgroup namespace isolates the Control Groups root directory.

<figure><img src="/files/atwyG4r5Q88zBpQvejvH" alt=""><figcaption></figcaption></figure>

* Initially, only the default network namespace exists, containing the system's network interfaces. To prepare a container, you create a new network namespace along with two network interfaces for it. The interfaces can then be moved from the default namespace to the new namespace; once there, they can be renamed, because names must only be unique within each namespace. Finally, the process is started in this network namespace, which allows it to see only the two interfaces that were created for it. A rough equivalent using the `ip` tool is sketched below.
* By looking solely at the available network interfaces, the process can't tell whether it's in a container, in a VM, or in an OS running directly on a bare-metal machine.
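A hedged sketch of these steps using the iproute2 tools directly (the namespace and interface names are illustrative; requires root):

```
# Create a new network namespace; it starts with only a loopback interface
sudo ip netns add demo

# Create a pair of connected virtual interfaces (a veth pair)
sudo ip link add veth-host type veth peer name veth-demo

# Move one end into the new namespace and rename it there
sudo ip link set veth-demo netns demo
sudo ip netns exec demo ip link set veth-demo name eth0

# A process running in the namespace sees only lo and eth0
sudo ip netns exec demo ip link show
```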

Understanding how namespaces isolate processes from each other:

* By creating a dedicated namespace instance for each available namespace type and assigning them to a process, you can make the process believe that it's running in its own OS.
* The main reason is that each process then has its own environment. A process can only see and use the resources in its own namespaces; it can't use any in other namespaces. Likewise, other processes can't use its resources either.

Sharing namespaces between multiple processes:

* We don't always want to isolate containers completely from each other. Related containers may want to share certain resources.

<figure><img src="/files/yljjwyTtFyuvntU0Mhcw" alt=""><figcaption></figcaption></figure>

In summary, processes may want to share some resources but not others. This is possible because separate namespace types exist. A process has an associated namespace for each type.

A container is a process to which several namespaces (one for each type) are assigned. Some may be shared with other processes, while others are not. This means that the boundaries between processes don't all fall along the same line.
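Docker exposes this directly (real `docker run` flags; the container names are illustrative): a new container can join an existing container's network or PID namespace while keeping the rest of its namespaces separate:

```
# Start a container normally; it gets its own set of namespaces
docker run -d --name web nginx:alpine

# Share web's network namespace: same interfaces and IP, so nginx
# is reachable on localhost from inside this container
docker run --rm --network container:web alpine wget -qO- http://localhost

# Share web's PID namespace: this container can see web's processes
docker run --rm --pid container:web alpine ps aux
```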

**Exploring the environment of a running container:**

* What if you want to see what the environment inside the container looks like? What is the system hostname, what is the local IP address, what binaries and libraries are available on the filesystem, and so on?
* To explore these things in the case of a VM, you would typically connect to it remotely via ssh and use a shell to execute commands. With containers, you run a shell in the container instead.

Note: The shell's executable file must be present in the container's filesystem. This isn't always the case with containers running in production.

* To run a shell inside a running container, use `docker exec -it <container-name> bash`

Here, `-i` tells Docker to run the command in interactive mode (keeping standard input open).

`-t` tells it to allocate a pseudo-terminal (TTY) so you can use the shell properly.

You need both flags if you want to use the shell the way you're used to.

If you omit the first, you can't execute any commands.

If you omit the second, the command prompt doesn't appear and some commands may complain that the `TERM` variable is not set.

To list all the processes running inside a container, use `ps aux`

```
NOTE
If you use macOS or Windows, you must list the processes in the VM that hosts the Docker daemon, as that's where your containers run. In Docker Desktop, you can enter the VM using the command wsl -d docker-desktop or with:
docker run --net=host --ipc=host --uts=host --pid=host -it --security-opt=seccomp=unconfined --privileged --rm -v /:/host alpine chroot /host
```

<figure><img src="/files/GmLz05iTAtdWGctQfnY4" alt=""><figcaption><p>The PID namespace makes a process sub-tree appear as a separate process tree with<br>its own numbering sequence</p></figcaption></figure>

* As with the isolated process tree, each container also has an isolated filesystem. If you list the contents of the container's root directory, only the files in the container are displayed. This includes files from the container image and files created during container operation, such as log files.
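For instance, with the kiada container from earlier still running, a quick check (standard Docker commands):

```
# Only the files from the image plus files created at runtime are visible
docker exec kiada-container ls /
```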

Linux namespaces make it possible for processes to access only some of the host's resources, but they don't limit how much of a single resource each process can consume. For example, you can use namespaces to allow a process to access a particular network interface, but you can't limit the network bandwidth the process consumes.

You may also want to prevent a process from consuming all the CPU time, which would prevent critical system processes from running properly. For that, we need the second Linux kernel feature that makes containers possible: Linux Control Groups (cgroups).

* Cgroups limit, account for, and isolate system resources such as CPU, memory, and disk or network bandwidth.
* When using cgroups, a process or group of processes can only use the CPU time, memory and network bandwidth allotted to it. This way, a process cannot occupy resources that are reserved for other processes.

**Limiting a container's use of CPU and memory**

* If you don't impose any restrictions on the container's use of the CPU, it has unrestricted access to all CPU cores on the host. You can explicitly specify which cores a container may use with Docker's `--cpuset-cpus` option.
* For example: `docker run --cpuset-cpus="1,2" ...`
* You can also limit the available CPU time using the options `--cpus`, `--cpu-period`, `--cpu-quota` and `--cpu-shares`.
* For example: `docker run --cpus="0.5" ...`
* Docker provides the following options to limit container memory and swap usage:

`--memory, --memory-reservation, --kernel-memory, --memory-swap and --memory-swappiness`

For example, to set the maximum memory available to the container to 100 MB, run the container as follows (m stands for megabyte):

`docker run --memory="100m" ...`

You can use `docker stats` to see the current CPU and memory consumption of each container on the system.

* Behind the scenes, all these Docker options merely configure the cgroups of the process. It's the kernel that takes care of limiting the resources available to the process.
* Most containers should run without elevated privileges. Only those programs that you trust and that actually need the additional privileges should run in privileged containers.

Note: With Docker you create a privileged container by using the `--privileged` flag.

* If an application only needs to invoke some of the sys-calls that require elevated privileges, creating a container with full privileges is not ideal. Fortunately, the Linux kernel also divides privileges into units called capabilities. Examples of capabilities are:
* `CAP_NET_ADMIN` allows the process to perform network-related operations,
* `CAP_NET_BIND_SERVICE` allows it to bind to port numbers less than 1024,
* `CAP_SYS_TIME` allows it to modify the system clock, and so on.
* Capabilities can be added to or removed (dropped) from a container when you create it. Each capability represents a set of privileges available to the processes in the container. Docker and Kubernetes drop all capabilities except those required by typical applications, but users can add or drop other capabilities if authorized to do so. A sketch with Docker's flags follows below.
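For example (real `docker run` flags; the image is a placeholder), you can drop all capabilities and add back only the one the application actually needs:

```
# Drop every capability, then grant only the right to bind to ports < 1024
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE <image>
```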

Note: Always follow the principle of least privilege when running containers. Don't give them any capabilities that they don't need. This prevents attackers from using them to gain access to your operating system.

* If you need even finer control over what sys-calls a program can make, you can use seccomp (Secure Computing Mode). You can create a custom seccomp profile by creating a JSON file that lists the sys-calls that the container using the profile is allowed to make. You then provide the file to Docker when you create the container, as sketched below.
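A sketch with the real `--security-opt` flag (the profile path is illustrative):

```
# Run a container with a custom seccomp profile listing allowed sys-calls
docker run --security-opt seccomp=/path/to/profile.json <image>
```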

* Containers can also be secured with two additional mandatory access control (MAC) mechanisms: SELinux (Security-Enhanced Linux) and AppArmor (Application Armor).

* With SELinux, you attach labels to files and system resources, as well as to users and processes. A user or process can only access a file or resource if the labels of all subjects and objects involved match a set of policies. AppArmor is similar but uses file paths instead of labels and focuses on processes rather than users.

