Introduction

Anything that you store inside a Docker container is available only for the lifetime of that container. This is because each container creates its own storage namespace (using a storage driver such as AUFS or devicemapper), and as soon as you delete the container, the storage associated with it is deleted as well. This is by design, and is the expected behaviour of the Docker engine.

However, this does not work in an environment where your application needs to store data permanently. You need the application data to survive even if the container is deleted, so that another container can resume the work. This is a bare-minimum requirement for running a stateful application in a Docker container.

The easiest method for persistent storage in Docker containers is to store the data on the base system. Basically, you designate a path on the base system for persistent application data and map a location inside the container to it (this is done while starting the container).
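
For example, a minimal sketch of this mapping looks like the commands below (the host directory /srv/appdata, the container name myapp and the nginx image are just placeholders). Anything the container writes to /opt/data is stored in /srv/appdata on the host and survives removal of the container.

#mkdir -p /srv/appdata
#docker run -d --name myapp -v /srv/appdata:/opt/data nginx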

This method works fine on a single system. But imagine you are working on a container orchestration platform like Kubernetes, Nomad or Mesos: your container can be scheduled on any server available to the scheduler, so the persistent storage must be available on all of those systems before it can be mapped to a location inside the container. This can be achieved with a network file system like GlusterFS, which can scale out across many nodes.

In this article, let us go through the process of using the GlusterFS file system with Kubernetes for persistent storage in pods.

Note: If you have a similar requirement in the AWS cloud, you can simply use the EFS service from AWS with Kubernetes. The method outlined in this article is more helpful in a non-cloud environment for running Kubernetes.

Prerequisites

This article assumes that you have a working Kubernetes cluster and proper access to it. This cluster will be used to demonstrate persistent storage for pods.

This article also assumes the operating system to be Ubuntu 18.04, although this is not a strict requirement. You can adapt the steps to any Linux platform available to you.

What is Gluster?

GlusterFS is a distributed file system that can scale across multiple nodes. The file system exposes network-attached storage using either the NFS protocol or the native glusterfs protocol.

The idea of GlusterFS is to combine multiple servers, with their resources and storage, into one giant storage pool that can scale out on demand, with built-in facilities to replace servers, move data and so on. Consider it as RAID distributed over the network, with multiple servers taking part.

How to Install Gluster?

Gluster can be installed using a package manager like APT or YUM. The commands shown below install Gluster on Ubuntu.

#apt-get update
#apt install software-properties-common -y
#add-apt-repository ppa:gluster/glusterfs-3.12
#apt-get update
#apt-get install glusterfs-server
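
On Ubuntu, installing the glusterfs-server package normally starts the glusterd management daemon automatically. If it is not running on your system, you can start and enable it with the commands below (assuming a systemd-based distribution such as Ubuntu 18.04).

#systemctl start glusterd
#systemctl enable glusterd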


How to Create a Gluster Volume?

In the previous step, we installed the GlusterFS software. Now we need to create a volume using a dedicated file system mounted at a particular location on the server. Imagine we have 20 GB of storage available on a new hard disk attached to the server.

Let us assume the device name of the new hard drive attached to the server is /dev/xvdb. We first need to format that hard drive so that we can use it for the Gluster volume.
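
If you are not sure which device name corresponds to the new disk, lsblk lists the attached block devices along with their sizes and mount points.

#lsblk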


#mkfs.xfs /dev/xvdb

The above command creates an XFS file system on our hard drive. XFS is the recommended file system for Gluster volumes. Let us now mount the hard drive that we formatted at a location like /data/. The below commands do exactly that.

#mkdir /data/
#mount /dev/xvdb /data/
#mkdir /data/glusterfs
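
To make this mount persist across reboots, you may also want to add an entry for it to /etc/fstab. A typical line would look like the one below (adjust the device and options for your environment).

/dev/xvdb /data xfs defaults 0 0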

It’s always better to create a directory inside the mounted file system and build the volume on that directory rather than on the mount point itself. If the underlying hard drive is ever not mounted, the “glusterfs” directory will be missing, Gluster can easily detect that, and data will not be written to the wrong location. Since the directory “glusterfs” lives on the file system of /dev/xvdb, the Gluster brick simply will not be available if that file system is not mounted.

Now let us create the volume using the /data/glusterfs path we created. 

#gluster volume create test-volume 10.12.1.105:/data/glusterfs
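
A newly created volume has to be started before clients can mount it. Start it with the command below.

#gluster volume start test-volume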

In the create command above, 10.12.1.105 is the IP address of our Gluster node. We can confirm that the volume was created and started successfully using the command below.

# gluster volume info
Volume Name: test-volume
Type: Distribute
Volume ID: 9486401c-a32f-4718-88e3-09ed6370c6da
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.12.1.105:/data/glusterfs
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

We can see that we have a distributed volume available to be used on other servers for persistent storage. 

Keep in mind that Gluster lets you have multiple servers with replicas for redundant storage. In this article we are using a single-node Gluster setup; in a production environment you would span replicas across multiple nodes, as in the example below.
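
For example, on a three-node cluster you would first probe the peers and then create a replicated volume. The host names server1, server2 and server3 below are hypothetical placeholders.

#gluster peer probe server2
#gluster peer probe server3
#gluster volume create test-volume replica 3 server1:/data/glusterfs server2:/data/glusterfs server3:/data/glusterfs
#gluster volume start test-volume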

How to Use Gluster Storage Inside a Kubernetes Pod?

In the introduction, we discussed mapping a location on the base system to a location inside the container for persistent storage. To use the GlusterFS file system as persistent storage, we first need to ensure that the Kubernetes nodes themselves can mount the Gluster file system. It is essential for a Kubernetes node to be able to mount Gluster locally so that it can map a location into a pod.

Log in to each Kubernetes node and install the GlusterFS client using the below commands.

#apt-get update
#apt install software-properties-common -y
#add-apt-repository ppa:gluster/glusterfs-3.12
#apt-get update
#apt-get install glusterfs-client

To verify that the client is working as expected, you can test-mount the volume we created using the below command.

#mount -t glusterfs 10.12.1.105:/test-volume /mnt

Replace 10.12.1.105 with your Gluster node IP. If the mount succeeds, pods should also be able to access the storage. You can unmount it once the test succeeds using the below command.

#umount /mnt

Follow this same procedure on all Kubernetes nodes where pods can be scheduled.

Now we need to tell Kubernetes about our Gluster storage endpoints. We can do that using the JSON file below. Create a file named “gfsendpoint.json” with the below content.


{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "subsets": [
    {
      "addresses": [
        {
          "ip": "10.12.1.105"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    }
  ]
}

As you can see, the above JSON contains our Gluster server IP address. If you have multiple GlusterFS servers, you can provide all of their IP addresses: “subsets” is a list, so you can add one block per GlusterFS server, as in the example below.
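
For example, an Endpoints object for two Gluster nodes would look like the JSON below (the second address, 10.12.1.106, is a hypothetical placeholder).

{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "subsets": [
    {
      "addresses": [{ "ip": "10.12.1.105" }],
      "ports": [{ "port": 1 }]
    },
    {
      "addresses": [{ "ip": "10.12.1.106" }],
      "ports": [{ "port": 1 }]
    }
  ]
}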

You can now apply the above JSON using the kubectl command as shown below.

#kubectl create -f gfsendpoint.json
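
You can confirm that the endpoints were registered using the command below.

#kubectl get endpoints glusterfs-cluster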

Now we need to create a service for the endpoint we created. Create a JSON file named “gfsservice.json” with the below content. 

{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "spec": {
    "ports": [
      {"port": 1}
    ]
  }
}


We now need to apply this service to the Kubernetes cluster using the kubectl command shown below.

#kubectl create -f gfsservice.json
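
You can confirm that the service was created using the command below.

#kubectl get svc glusterfs-cluster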

Now we can refer to this endpoint in a pod as part of the volume definition. Create a file named “gfspod.json” with the JSON content below.

{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "glusterfs"
    },
    "spec": {
        "containers": [
            {
                "name": "glusterfs",
                "image": "nginx",
                "volumeMounts": [
                    {
                        "mountPath": "/opt/data",
                        "name": "glusterfsvol"
                    }
                ]
            }
        ],
        "volumes": [
            {
                "name": "glusterfsvol",
                "glusterfs": {
                    "endpoints": "glusterfs-cluster",
                    "path": "test-volume"
                  }
            }
        ]
    }
}

In the above JSON, “path” is the name of the volume that we created on the Gluster server. In our case the volume name is test-volume, hence we use test-volume as the path. Replace it with the name of the Gluster volume appropriate for your environment. Create the pod using the command below.

#kubectl create -f gfspod.json

You can now try creating files inside the pod under /opt/data; they should show up on the Gluster volume and stay persistent there.

Some commands that are handy to verify that this is indeed working are listed below.

#kubectl get pods
#kubectl exec -it <podname> -- ls /opt/data
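
To confirm that the data really outlives the pod, you can create a file inside the pod, delete and recreate the pod, and check that the file is still present. A quick test along those lines, using the pod and volume names from this article, is shown below (wait for the new pod to reach the Running state before the final command).

#kubectl exec -it glusterfs -- touch /opt/data/testfile
#kubectl delete -f gfspod.json
#kubectl create -f gfspod.json
#kubectl exec -it glusterfs -- ls /opt/data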


Dascase provides critical DevOps-as-a-Service, Infrastructure as Code, cloud migrations, infrastructure solutions and digital transformation to high-growth companies looking for expert support with DevOps, Kubernetes, cloud security, cloud infrastructure, and CI/CD pipelines. Our managed and consulting services are a more cost-effective option than hiring in-house, and we scale as your team and company grow. Check out some of the use cases, learn how we work with clients, and read more about our service offerings.

