Disk cloning in Linux using the dd command
Everyone likes to have a copy of a disk drive so that data can be recovered. Even if you don’t want a copy now, you will want one when your first HDD fails. Cloning is also used to copy all of the data from an initial reference (“etalon”) disk image to the disks of several hosts, which saves a lot of time. What are the options for cloning a disk? Let’s look:
- have a disk drive big enough to store the dump of your host disk drive as a file on that drive
- connect several disk drives to your local host and copy to them one by one
- connect several disk drives to remote hosts and transmit your host disk drive data over the network to the disk drives of those remote hosts
All of these options are available from the Linux command line and are easy to set up and understand. They rely on the “everything is a file” principle of Linux, which applies from the disk driver up to files as seen by applications.
“Everything is a file” in Linux
Let’s look at how a disk drive partition and a regular file look in Linux:
#!/usr/bin/env bash
ls -l /dev/sda5
ls -l just_a_file
ls is the command that lists the files named on its command line: /dev/sda5 is a special file associated with a disk drive partition, and just_a_file is an ordinary file on the file system. What is the difference in the output? The first letter “b” shows that /dev/sda5 is a block device file, not just a regular file. Looking from the inside of Linux, the global concept of data storage and manipulation is “everything is a file”. It means that, from the Linux point of view, even devices can be represented as files. Deep in the OS code, a file is an abstract object with a well-defined set of functions specific to the data it holds:
- read function
- write function
- open function (to coordinate data access between several readers/writers)
- close function
- ioctl function (device-specific control actions for device driver files)
In this concept, the hard disk drive is represented as a file in the /dev/ directory. As a file, this HDD has a read function defined, and it behaves like a read of file data in any application program. So we can read the contents of the HDD. By the same logic, this file also has a write function defined, so we can write to the HDD as well.
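The idea above can be tried without touching a real disk. The following is only an illustrative sketch: file_kind is a hypothetical helper (it is not part of the original scripts) that uses the shell's built-in file tests to tell device files from regular files, and /dev/zero is used as a harmless device file that any user can read. For a partition such as /dev/sda5 the helper would be expected to print "block device", matching the “b” shown by ls -l.

```shell
#!/usr/bin/env bash
# file_kind is a hypothetical helper: it reports how the kernel classifies a path.
file_kind() {
  local path=$1
  if   [ -b "$path" ]; then echo "block device"
  elif [ -c "$path" ]; then echo "character device"
  elif [ -d "$path" ]; then echo "directory"
  elif [ -f "$path" ]; then echo "regular file"
  else                      echo "other"
  fi
}

plain_file=$(mktemp)     # an ordinary file on the file system
file_kind "$plain_file"
file_kind /dev/zero      # a device file served by a kernel driver

# The same kind of read works on a device file: take 4 bytes from /dev/zero.
bytes_read=$(head -c 4 /dev/zero | wc -c)
echo "read $bytes_read bytes from /dev/zero"

rm -f "$plain_file"
```

The point of the sketch is that head does not need to know it is reading from a driver: it opens the path and reads from it like from any other file.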
How to detect disk drive files?
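A reasonable question is how to find the device file and its “file size” for a copy. The “lsblk” and “df” Linux utilities help here. The first utility lists the block device files of the system together with their sizes. The second utility reports sizes in blocks; note that df describes file systems, so for a device that holds no mounted file system its numbers refer to the file system containing the device file rather than to the disk itself.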
lsblk script body:
#!/usr/bin/env bash
lsblk
lsblk script execution console:
In the console output we can see the disk file “/dev/sda” with subordinate files that represent the partitions of this disk: “/dev/sda1”, “/dev/sda2”, “/dev/sda3”.
df script body:
#!/usr/bin/env bash
df /dev/sda
In the console output, df reports the size in 1K blocks.
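When an exact size in bytes is needed for planning a copy, it can be read directly. The sketch below is one possible approach, not the article's original method: size_in_bytes is a hypothetical helper that calls blockdev --getsize64 (from util-linux) for block devices and stat -c '%s' (GNU coreutils) for regular files. Reading a device size usually needs root rights, so the demonstration uses a temporary file as a stand-in for a disk.

```shell
#!/usr/bin/env bash
# size_in_bytes is a hypothetical helper: print the size of a block device
# or of a regular file in bytes.
size_in_bytes() {
  local target=$1
  if [ -b "$target" ]; then
    blockdev --getsize64 "$target"   # block device: ask the kernel
  else
    stat -c '%s' "$target"           # regular file: size from the inode
  fi
}

# Stand-in for a disk: a sparse 8 MiB file.
image_file=$(mktemp)
truncate -s 8M "$image_file"
image_size=$(size_in_bytes "$image_file")
echo "$image_file holds $image_size bytes"
rm -f "$image_file"
```

For a real disk the call would look like size_in_bytes /dev/sda, run from a root shell.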
dd – standard disk dump utility of Linux
Now let’s look at “dd”, the standard Linux disk dump utility. “dd” operates on files (on data streams, to be exact) as input and output: it reads data from the input file block by block and passes it to the output file. The “dd” utility has several run parameters; we will look at the main ones and at how to use them.
dd invocation format with the most widely used parameters:
dd [if=<input_file>] [of=<output_file>] [bs=<number_of_bytes_at_the_single_io_operation>] [count=<number_of_block_to_be_dumped>]
dd running notes:
- The “if” parameter can be omitted; in that case input is read from the standard input stream.
- The “of” parameter can be omitted; in that case output is written to the standard output stream.
- The “bs” parameter can be omitted; in that case a default block size is used (512 bytes in GNU dd).
- The “count” parameter can be omitted; in that case the whole content of the input file is copied to the output.
(Reading and writing device files with dd normally requires root permissions. Be sure you know what you are about to do: a wrong command can destroy your data.)
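The effect of these parameters can be observed safely on ordinary files, without root rights. The following is a small sketch under that assumption; the file name four_blocks and the variable names are chosen only for illustration. dd prints its statistics to standard error, which is discarded here to keep the output short.

```shell
#!/usr/bin/env bash
work_dir=$(mktemp -d)

# bs and count together fix the amount of data: 4 blocks of 1024 bytes each.
dd if=/dev/zero of="$work_dir/four_blocks" bs=1024 count=4 2>/dev/null
four_blocks_size=$(wc -c < "$work_dir/four_blocks")

# No "if" and no "of": dd reads standard input and writes standard output.
# With bs=1 and count=3 only the first three bytes pass through.
first_three=$(printf 'hello' | dd bs=1 count=3 2>/dev/null)

echo "file size: $four_blocks_size bytes, first bytes of input: $first_three"
rm -rf "$work_dir"
```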
Using dd to store a disk dump in a file:
Now we go through the use cases one by one to see how to use “dd”. First, let’s look at storing a disk image in a file. In our case, we will copy an image of the disc in the cdrom device to a file that we specify.
dd disk to file cloning script body:
#!/usr/bin/env bash
dd if=/dev/sr0 of=cdrom_disk_dump_file
ls -l cdrom_disk_dump_file
rm cdrom_disk_dump_file
dd disk to file cloning script running at the console:
As we can see in the console output, the raw cdrom dump has been copied to a file on our disk at a rate of 392Mb/s. The throughput can vary with the type of disk you are using and with the “bs” parameter selected. “1457256+0” is the number of blocks copied in and out during the “dd” run: the figure before the “+” counts full blocks and the figure after it counts partial blocks.
(Please note that /dev/sr0 is the cdrom device name on my system; your system may use another name.)
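After a dump it is worth checking that the image really matches its source. The sketch below shows one way to do that with cmp; dump_and_verify is a hypothetical helper and bs=1M is an assumed block size, neither comes from the original script. A file filled with random bytes stands in for /dev/sr0 so that the example runs without a drive or root rights.

```shell
#!/usr/bin/env bash
# dump_and_verify is a hypothetical helper: copy a source to an image file
# with dd, then compare the two byte by byte with cmp.
dump_and_verify() {
  local src=$1 image=$2
  dd if="$src" of="$image" bs=1M 2>/dev/null || return 1
  cmp -s "$src" "$image"    # exit status 0 only if both are identical
}

demo_dir=$(mktemp -d)
head -c 100000 /dev/urandom > "$demo_dir/fake_disk"   # stand-in for /dev/sr0

if dump_and_verify "$demo_dir/fake_disk" "$demo_dir/disk_dump_file"; then
  dump_status="identical"
else
  dump_status="different"
fi
echo "dump is $dump_status"
rm -rf "$demo_dir"
```

With a real device the first argument would be the device file, for example /dev/sr0, and the comparison reads the whole medium a second time.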
Using dd to store a raw disk image on another disk connected to the host:
As mentioned before, it is very useful to clone the main disk as it is, from one disk drive to another. “dd” is useful for this case as well.
dd raw disk image clone to other disk script body:
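#!/usr/bin/env bash
# show both disks before the copy
lsblk /dev/sda /dev/sdb
# clone /dev/sda onto /dev/sdb; everything on /dev/sdb is overwritten
sudo dd if=/dev/sda of=/dev/sdb bs=1k
# show both disks after the copy
lsblk /dev/sda /dev/sdb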
dd raw disk image clone to other disk script running at the console:
As we can see, before the copy the destination disk /dev/sdb holds no data. After the copy, the partition structure of /dev/sdb is the same as that of /dev/sda.
Note: the “sudo” command is needed here to gain root rights for working on the device files directly.
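Because dd overwrites the destination without asking, a simple guard before a disk-to-disk clone can be useful. The following sketch checks that the target can hold every byte of the source; size_of and target_is_big_enough are hypothetical helper names, and sparse files stand in for /dev/sda and /dev/sdb so that the example can be run safely.

```shell
#!/usr/bin/env bash
# Both helpers are hypothetical names used only for this sketch.
size_of() {
  if [ -b "$1" ]; then blockdev --getsize64 "$1"; else stat -c '%s' "$1"; fi
}

# Succeeds only if the target can hold every byte of the source.
target_is_big_enough() {
  local src_size dst_size
  src_size=$(size_of "$1") || return 1
  dst_size=$(size_of "$2") || return 1
  [ "$dst_size" -ge "$src_size" ]
}

demo_dir=$(mktemp -d)
truncate -s 2M "$demo_dir/source_disk"
truncate -s 1M "$demo_dir/small_disk"
truncate -s 4M "$demo_dir/large_disk"

for target in small_disk large_disk; do
  if target_is_big_enough "$demo_dir/source_disk" "$demo_dir/$target"; then
    echo "$target: ok to clone"
  else
    echo "$target: too small, refusing"
  fi
done
rm -rf "$demo_dir"
```

In a real script the dd call would be placed inside the "ok to clone" branch.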
Store disk dump at another network-connected host disk
In this example, we will transmit the dump of our disk over the network by using the nc utility. This utility is run on our host both to put the data on the network and to get it from the network. Here everything runs locally, but you can run the sending and the receiving scripts on different hosts.
Script sending disk dump body:
#!/usr/bin/env bash
lsblk /dev/sda /dev/sdb
# read the disk and send it to the listener on localhost port 9000
sudo dd if=/dev/sda | nc localhost 9000
dd has no “of” parameter here, so it writes to the standard output stream. This stream is redirected to the standard input stream of the following nc call by using “|”. nc transmits the disk data over a TCP connection to “localhost” (our current host), port 9000. For this to succeed we only need a TCP listener on port 9000, started before the sender. It follows next.
Script receiving disk dump body:
#!/usr/bin/env bash
lsblk /dev/sda /dev/sdb
# listen on port 9000 and write the received stream to /dev/sdb
nc -l 9000 | sudo dd of=/dev/sdb bs=1M
lsblk /dev/sda /dev/sdb
Here we listen for a TCP connection on port 9000 with nc. nc then passes all of the connection data to its standard output stream, which is redirected to the standard input stream of dd by using “|”. dd has no “if” parameter here, so it takes the standard input stream as input (that is, the standard output stream of nc). In the end, we can see that the partition structure of /dev/sdb is the same as that of /dev/sda, as expected.
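A streamed copy can be checked by comparing checksums of the source and of the received data. The sketch below illustrates the idea under several assumptions: a plain pipe stands in for the nc sender and listener pair so that it runs without a network or root rights, stream_copy and checksum_of_first_bytes are hypothetical helpers, and iflag=fullblock is a GNU dd option that makes the receiving dd collect full blocks when reading from a pipe.

```shell
#!/usr/bin/env bash
# stream_copy is a hypothetical helper. In the article's setup the pipe
# would be split between two hosts by "nc localhost 9000" and "nc -l 9000".
stream_copy() {
  local src=$1 dst=$2
  # Sender: no "of", so dd writes to standard output.
  # Receiver: no "if", so dd reads from standard input.
  dd if="$src" bs=1M 2>/dev/null | dd of="$dst" bs=1M iflag=fullblock 2>/dev/null
}

# Checksum of the first COUNT bytes of a file; limiting the length matters
# when the destination disk is larger than the source.
checksum_of_first_bytes() {
  local file=$1 count=$2
  head -c "$count" "$file" | sha256sum | cut -d ' ' -f 1
}

demo_dir=$(mktemp -d)
head -c 300000 /dev/urandom > "$demo_dir/source.img"
stream_copy "$demo_dir/source.img" "$demo_dir/received.img"

src_size=$(stat -c '%s' "$demo_dir/source.img")
received_size=$(stat -c '%s' "$demo_dir/received.img")
src_sum=$(checksum_of_first_bytes "$demo_dir/source.img" "$src_size")
dst_sum=$(checksum_of_first_bytes "$demo_dir/received.img" "$src_size")

if [ "$src_sum" = "$dst_sum" ]; then echo "copy matches"; else echo "copy differs"; fi
rm -rf "$demo_dir"
```

On two real hosts each side would compute its own checksum and the two values would be compared by hand or sent over a separate connection.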