Wednesday, July 15, 2015

DRBD (Distributed Replicated Block Device)
---------------------------------------------------------

DRBD refers to block devices designed as a building block for high availability (HA) clusters on Linux systems. It works by mirroring a whole block device over the network. This is somewhat similar to network-based RAID-1, where data is copied to two storage devices and the failure of one results in activating the other.


However, there is a marked difference between network-based RAID-1 and DRBD. In network-based RAID-1 there is a single application which accesses either of the two storage devices at any point in time; the RAID layer simply reads from the other device when one fails. Because there is an abstraction layer between the application and the RAID devices, the application is not aware of which device it is interacting with at any given point.

This is not the case with DRBD, where there are two instances of the application and each can read only from one of the two storage devices.

Requirements

– Two disks (preferably of the same size)
– Networking between the machines (node1 & node2)
– Working DNS resolution (/etc/hosts file)
– NTP-synchronized time on both nodes
– SELinux set to permissive (!)
– iptables port 7788 allowed
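
Before starting, it is worth quickly verifying the networking, DNS and NTP requirements. The commands below are only a rough sketch (run them on node1 and the equivalent on node2); the ssh check assumes you already have SSH access between the nodes:

# ping -c 2 node2        # name resolution via /etc/hosts plus basic connectivity
# ntpstat                # or: ntpq -p  (is NTP synchronized?)
# date; ssh node2 date   # rough comparison of the clocks on the two nodes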

A. Test Environment:

Two CentOS-6.6 systems with IPs 172.16.20.46 and 172.16.20.47, designated as "node1" and "node2" respectively.

On both systems the disks are partitioned as;
  /dev/vdb1               1        2082     1049296+  83  Linux
  /dev/vdb2            2083      6241     2096136   83  Linux

DNS resolution is done through /etc/hosts on both systems by mapping hostname <--> IP address (instead of using FQDNs) as;
 172.16.20.46 node1
 172.16.20.47 node2

B. SELinux and IPTABLES:


B1. Setting SELinux to permissive (Not a very good idea though!! :) )
# setenforce 0

B2. Allow the DRBD port to accept connections from all sources (for testing only; restrict it to specific IPs as per your need);
# iptables -I INPUT 4 -p tcp --dport 7788 -j ACCEPT
# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables: [  OK  ]
 
# service iptables restart
iptables: Setting chains to policy ACCEPT: filter           [  OK  ]
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Unloading modules:                               [  OK  ]
iptables: Applying firewall rules:                         [  OK  ]
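
To confirm that the rule actually landed in the INPUT chain (an optional check, not in the original steps), list it and look for port 7788:

# iptables -L INPUT -n --line-numbers | grep 7788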

C. Configuration:

Note:  Carry out the following steps on both the systems unless mentioned otherwise;

1. Set up ELRepo (not EPEL, as I could not find DRBD packages for the CentOS-6.6 release) on both systems;
# wget http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
# rpm -ivh elrepo-release-6-6.el6.elrepo.noarch.rpm

2. Check the yum repositories once;

# yum repolist
repo id      repo name                                                  status
base         CentOS-6 - Base                                             6,518
datastax     DataStax Repo for Apache Cassandra                            158
elrepo       ELRepo.org Community Enterprise Linux Repository - el6        334
epel         Extra Packages for Enterprise Linux 6 - x86_64             11,735
extras       CentOS-6 - Extras                                              38
updates      CentOS-6 - Updates                                          1,336

3. Install the packages;

# yum install drbd83-utils kmod-drbd83

4. Now, insert the DRBD module manually into the kernel on both systems (or reboot to make it effective);
# modprobe drbd
# lsmod | grep drbd
drbd       332493  0
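
If you prefer the module to be loaded automatically at every boot (independently of the drbd init script), one option on CentOS 6 is a small script under /etc/sysconfig/modules/. The file name drbd.modules below is just an example:

# cat > /etc/sysconfig/modules/drbd.modules << 'EOF'
#!/bin/sh
# load the DRBD kernel module at boot
/sbin/modprobe drbd
EOF
# chmod +x /etc/sysconfig/modules/drbd.modules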

5. Create the Distributed Replicated Block Device resource file (/etc/drbd.d/clusterdb.res):
resource clusterdb
{
    startup {
        wfc-timeout 30;
        outdated-wfc-timeout 20;
        degr-wfc-timeout 30;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret sync_disk;
    }

    syncer {
        rate 10M;
        al-extents 257;
        on-no-data-accessible io-error;
    }

    on node1 {
        device /dev/drbd0;
        disk /dev/vdb2;
        address 172.16.20.46:7788;
        flexible-meta-disk internal;
    }

    on node2 {
        device /dev/drbd0;
        disk /dev/vdb2;
        address 172.16.20.47:7788;
        meta-disk internal;
    }
}
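
Once the file is saved, the syntax can be sanity-checked (optional, not strictly required) by letting drbdadm parse the configuration and print it back:

# drbdadm dump clusterdb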

6. Copy the DRBD configuration file (viz. /etc/drbd.d/clusterdb.res) and the entries in the /etc/hosts file to "node2", for example as shown below;

7. Initialize the DRBD meta data storage on both machines;

# drbdadm create-md clusterdb

--==  Thank you for participating in the global usage survey  ==--
The server's response is:

you are the 25728th user to install this version
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
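
The usage-survey prompt shown above can be disabled, if desired, by setting the usage-count option in the global section of the DRBD configuration (e.g. /etc/drbd.d/global_common.conf if your package ships it, or /etc/drbd.conf) before running create-md:

global {
    usage-count no;
}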

8. Start the DRBD service on both systems;
# service drbd start

Starting DRBD resources: [ d(clusterdb) s(clusterdb) n(clusterdb) ]..........


       DRBD's startup script waits for the peer node(s) to appear.

       - In case this node was already a degraded cluster before the
        reboot the timeout is 30 seconds. [degr-wfc-timeout]
        - If the peer was available before the reboot the timeout will
         expire after 30 seconds. [wfc-timeout]
        (These values are for resource 'clusterdb'; 0 sec -> wait forever)
        To abort waiting enter 'yes' [  29]:
.

9. Check status as;

# service drbd status
or,
# cat /proc/drbd
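
The same information can also be queried per resource with drbdadm (role, connection state and disk state respectively):

# drbdadm role clusterdb
# drbdadm cstate clusterdb
# drbdadm dstate clusterdb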

10. As you can see, at the beginning both nodes are Secondary, which is normal. We need to decide which one will act as the primary and initiate the first 'full sync' between the two nodes. In our case, we choose "node1", thus;
A. On node1:
# drbdadm -- --overwrite-data-of-peer primary clusterdb

11. Put a watch on the sync in progress;

# watch 'cat /proc/drbd'
Every 2.0s: cat /proc/drbd                                 Tue Jul 14 08:59:44 2015

version: 8.3.16 (api:88/proto:86-97)

GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:429056 nr:0 dw:0 dr:429720 al:0 bm:26 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1666980
        [===>................] sync'ed: 20.8% (1666980/2096036)K
        finish: 0:03:38 speed: 7,600 (9,128) K/sec
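
The syncer rate of 10M configured earlier caps the resynchronization speed, which is roughly what the output above shows. If the network and disks can take more, the rate can be raised temporarily at runtime with drbdsetup (30M here is only an example):

# drbdsetup /dev/drbd0 syncer -r 30M

and, after the sync has finished, reverted to the configured rate with:

# drbdadm adjust clusterdb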

12. Once full sync is achieved, check the status on both the nodes;

A. Node1
# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:2096036 nr:0 dw:0 dr:2096700 al:0 bm:128 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

B. Node2
[root@node2 ~]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:2096036 dw:2096036 dr:0 al:0 bm:128 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

13. Create a filesystem on the DRBD device;

On the primary node i.e. node1:
# mkfs.ext4 /dev/drbd0

14. Now, mount the DRBD device on the primary node;

# mkdir /my-DRBD-Data
# mount /dev/drbd0 /my-DRBD-Data
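
A quick check that the device is really mounted and has the expected size (roughly 2 GB for /dev/vdb2 here):

# df -h /my-DRBD-Data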

15. Please note:  

One does not need to mount the disk on the secondary system explicitly. All data written to the "/my-DRBD-Data" folder will be synced to the other system, i.e. "node2".

16. Let's test that out;

16a. Unmount the "/my-DRBD-Data" folder from the primary node "node1".
16b. Make the secondary node the primary node.
16c. Mount back the "/my-DRBD-Data" on the second machine "node2"; you will see the same contents in the /my-DRBD-Data folder.
17. Actual test;

17a. Create some data on Node1:
# cd /my-DRBD-Data
# mkdir bijit
# echo "hi" > test1.txt
# echo "hello" > test2.txt
 
[root@node1 my-DRBD-Data]# ll
total 28
drwxr-xr-x. 2 root root  4096 Jul 14 09:20 bijit
drwx------. 2 root root 16384 Jul 14 09:15 lost+found
-rw-r--r--. 1 root root     3 Jul 14 09:20 test1.txt
-rw-r--r--. 1 root root     6 Jul 14 09:21 test2.txt

17b. Unmount the data on the primary Node:

[root@node1 /]# umount /my-DRBD-Data/

17c. Make the secondary node the primary:

[root@node1 /]# drbdadm secondary clusterdb
[root@node2 ~]# drbdadm -- --overwrite-data-of-peer primary clusterdb

17d. Mount back the "/my-DRBD-Data" on the second machine "node2";
[root@node2 ~]# mkdir /my-DRBD-Data
[root@node2 ~]# mount /dev/drbd0 /my-DRBD-Data/
 
17e. Check the contents on Node2:
[root@node2 ~]# ll /my-DRBD-Data/
total 28
drwxr-xr-x. 2 root root  4096 Jul 14 09:20 bijit
drwx------. 2 root root 16384 Jul 14 09:15 lost+found
-rw-r--r--. 1 root root     3 Jul 14 09:20 test1.txt
-rw-r--r--. 1 root root     6 Jul 14 09:21 test2.txt

Now, delete and add some content on "node2", which now acts as the primary.

Unmount it and make it secondary. Switch to "node1", make it the primary again, and mount it back. Check whether the changes have been replicated! A possible command sequence is sketched below.
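
A minimal sketch of that switch-back, reusing the commands from step 17 in the opposite direction:

[root@node2 ~]# umount /my-DRBD-Data/
[root@node2 ~]# drbdadm secondary clusterdb
[root@node1 /]# drbdadm primary clusterdb
[root@node1 /]# mount /dev/drbd0 /my-DRBD-Data/
[root@node1 /]# ll /my-DRBD-Data/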
