wiki'd

by JoKeru

Server Backup with DRBD (and LVM + Snapshots)

Distributed Replicated Block Device (DRBD) mirrors block devices between multiple hosts over a dedicated network link to form high availability clusters. The replication is transparent to other applications on the host systems. Any block device (hard disks, partitions, RAID devices, logical volumes, etc.) can be mirrored. DRBD can be understood as network-based RAID-1.

DRBD can also be used as a network backup or disaster recovery solution. In our setup we'll be stacking the following layers:
[cc lang='bash']
| physical disk |
| lvm           |
| drbd          |
| ext4          |
[/cc]
The second layer is LVM because on the backup server you cannot access the data directly: DRBD will not let you mount /dev/drbd0 while the node holds the secondary role, as the volume is locked by DRBD. But LVM snapshots of the backing volume let us work around that lock, or at least gain read access to the replicated data.

[cc lang='bash']
# node01 is the main server
# /dev/sdb is the disk containing important data, 16g size
$ apt-get install drbd8-utils lvm2
$ fdisk /dev/sdb # create 1 primary lvm partition (type 8e), 16g size
$ pvcreate /dev/sdb1
$ vgcreate vg0 /dev/sdb1
$ lvcreate -l 100%FREE -n lv0 vg0
$ lvdisplay | grep "Current LE" # this value will be used on node02 setup
Current LE 4095
[/cc]

[cc lang='bash']
# node02 is the backup server
# /dev/sdb is the disk storing the backup data, 32g size
$ apt-get install drbd8-utils lvm2
$ fdisk /dev/sdb # create 1 primary lvm partition (type 8e), 32g size
$ pvcreate /dev/sdb1
$ vgcreate vg0 /dev/sdb1
$ lvcreate -l 4095 -n lv0 vg0 # this is the "Current LE" value from node01
[/cc]

Configure and start DRBD on both servers:
[cc lang='bash']
# node01 & node02
$ cat <<'EOF' > /etc/drbd.conf
global { usage-count no; }
common { syncer { rate 1000M; } }
resource r0 {
  protocol C; # synchronous replication protocol
  startup {
    wfc-timeout 15;
    degr-wfc-timeout 60;
  }
  net {
    cram-hmac-alg sha1;
    shared-secret "secret";
  }
  on node01 {
    device /dev/drbd0;
    disk /dev/vg0/lv0;
    address 10.20.30.40:7788;
    meta-disk internal;
  }
  on node02 {
    device /dev/drbd0;
    disk /dev/vg0/lv0;
    address 50.60.70.80:7788;
    meta-disk internal;
  }
}
EOF
$ drbdadm create-md r0
$ service drbd start
[/cc]
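Before promoting either node, it's worth confirming that the two peers actually see each other. drbdadm can query the resource state directly on either node (until the initial sync has run, both sides will report their data as Inconsistent):
[cc lang='bash']
# node01 or node02
$ drbdadm cstate r0 # connection state, should be Connected
$ drbdadm role r0   # Secondary/Secondary at this point
$ drbdadm dstate r0 # Inconsistent/Inconsistent until the initial sync
[/cc]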

Run this only on the main server to promote node01 as primary and start syncing the data to node02:
[cc lang='bash']
# node01
$ drbdadm -- --overwrite-data-of-peer primary all
$ watch -n1 "cat /proc/drbd"
# wait until the data is fully synced
$ mkfs.ext4 /dev/drbd0
$ mkdir /important
$ echo '/dev/drbd0 /important ext4 errors=remount-ro 0 1' >> /etc/fstab
$ mount -a
# let's add some content
$ cd /important/
$ dd if=/dev/zero of=zero.100 bs=1M count=100
$ md5sum zero.100
2f282b84e7e608d5852449ed940bfc51 zero.100
[/cc]
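Since zero.100 is just 100 MB of zeros, its checksum is deterministic: you can reproduce it on any machine without touching the DRBD volume, and keep it as a reference to spot corruption later (assuming GNU coreutils dd and md5sum):
[cc lang='bash']
# pipe 100 MB of zeros straight into md5sum; no file needed
$ dd if=/dev/zero bs=1M count=100 2>/dev/null | md5sum
2f282b84e7e608d5852449ed940bfc51  -
[/cc]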

Now let's test if the data is actually syncing between the main and the backup:
[cc lang='bash']
# node01
$ umount /important
$ drbdadm secondary r0 # demote the main server to the secondary role

# node02
$ drbdadm primary r0 # promote the backup server to the primary role
$ mkdir /important
$ mount /dev/drbd0 /important
$ cd /important
$ ls
lost+found zero.100
$ md5sum zero.100
2f282b84e7e608d5852449ed940bfc51 zero.100 # same md5sum as main server
$ service drbd status
drbd driver loaded OK; device status:
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /important ext4
[/cc]

Let's switch back to the original scenario:
[cc lang='bash']
# node02
$ umount /important
$ drbdadm secondary r0

# node01
$ drbdadm primary r0 # promote the main server back to the primary role
$ mount -a
[/cc]

Now that everything is working as expected, let's take advantage of our remote real-time backup system using LVM Snapshots:
[cc lang='bash']
# node02
$ lvcreate -L1G -s -n lv0-bkp01 /dev/vg0/lv0
$ mkdir /important-bkp01
$ mount -t ext4 /dev/vg0/lv0-bkp01 /important-bkp01/
$ cd /important-bkp01/
$ ls
lost+found zero.100
$ md5sum zero.100
2f282b84e7e608d5852449ed940bfc51 zero.100 # same md5sum as main server
[/cc]
You can cron the creation and removal of snapshots according to your needs (and your disk size) to provide either a back-in-time file recovery solution or complete read access to a mirror of the data on the primary server.
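As a sketch, a crontab along these lines keeps a rolling one-day snapshot on node02. The file name and retention policy are just examples, GNU date is assumed, and note that % must be escaped as \% inside crontab entries:
[cc lang='bash']
# node02, e.g. /etc/cron.d/drbd-snapshots (hypothetical file)
# 01:00 - drop yesterday's snapshot to free the space
0 1 * * * root lvremove -f /dev/vg0/lv0-bkp-$(date +\%F --date=yesterday)
# 02:00 - create a fresh dated 1G snapshot of the DRBD backing volume
0 2 * * * root lvcreate -L1G -s -n lv0-bkp-$(date +\%F) /dev/vg0/lv0
[/cc]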
