Wednesday, 15 April 2015

USB

I was answering a support query recently and mentioned to the client that USB is outright bad in all respects [for data recovery use].

Well, pretty much so:
  • if one of the drives has a bad block, the USB converter will quite likely lock up on hitting that block;
  • with USB 2.0, speed is 15 MB/sec maximum, for all drives combined,
    • even if you have what appear to be different ports, they will be routed through the same root port or hub anyway;
  • devices advertised as USB 3.0 often work at 2.0 speeds, with no warning whatsoever;
  • power supply issues and limitations are difficult to control,
    • especially so if hubs are involved;
  • any setup with daisy-chained hubs is unstable,
    • especially so with USB 3.0.

So, think twice before starting a recovery with a laptop-based all-USB setup.
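
To put the 15 MB/sec figure in perspective, here is a rough back-of-the-envelope sketch. The function name and the four 2 TB drives are made up for illustration; the only number taken from above is the 15 MB/sec shared limit, and the estimate assumes every drive is read in full with no lock-ups or retries.

```python
def imaging_hours(drive_sizes_gb, bus_mb_per_sec=15.0):
    """Hours needed to read every drive in full over one shared bus.

    Assumes the bus limit applies to all drives combined, so the total
    amount of data is what matters, not the number of drives.
    """
    total_mb = sum(drive_sizes_gb) * 1000  # decimal units, as on drive labels
    return total_mb / bus_mb_per_sec / 3600


# Hypothetical four-drive array, 2 TB per drive, all behind USB 2.0.
hours = imaging_hours([2000, 2000, 2000, 2000])
print(f"{hours:.0f} hours, or about {hours / 24:.1f} days")
```

That is about six days of uninterrupted reading for a single pass, before any bad block gets a chance to lock up the converter.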

Tuesday, 14 April 2015

RAID block size limiters

If you are doing a RAID recovery and the software lets you limit the block sizes it searches for (which is quite common: ReclaiMe Pro has it, Runtime has it, ZAR has it, and perhaps R-Studio does too), and you happen to know the block size exactly, do not set the limiter to the exact block size.

If you know the block size is 128 whatever units, set the limits to 64 low and 256 high (of the same units, repeat, the same units). Otherwise, if the automatic detection gets you a value at one of the edges of the range, you do not know whether the value is correct or whether detection hit the limit and was not able to change the block size any further. The final block size must be inside the allowed range, not on the edge.
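
The rule can be sketched as a small check. This is only an illustration: the function names are hypothetical and do not correspond to any option in the tools mentioned above, and the half/double limits simply follow the 64/128/256 example.

```python
def limits_around(known_block_size):
    """Limits one step below and one step above the expected block size."""
    return known_block_size // 2, known_block_size * 2


def judge_detected_block_size(detected, low, high):
    """Tell whether a detected block size can be trusted given the limits."""
    if not low <= detected <= high:
        raise ValueError("detected value is outside the allowed range")
    if detected in (low, high):
        # Either the value is correct, or detection ran into the limit.
        return "inconclusive"
    return "ok"


low, high = limits_around(128)
print(low, high)
print(judge_detected_block_size(128, low, high))  # strictly inside the range
print(judge_detected_block_size(128, 128, 128))   # limiter set to exact size
```

With the limiter set to exactly 128, any result is on the edge by construction, so the detection tells you nothing you did not already assume.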

Friday, 20 March 2015

Fault tolerance in storage systems

When someone says RAID5 is fault-tolerant, that statement alone does not say enough.

  1. The specific implementation must be named.
  2. The set of anticipated failures must be listed.
  3. For each of the anticipated failures, the extent of degradation must be specified.

So, a generic implementation of RAID5 does not lose data when exactly one drive fails. This does not say anything about performance or, more generally, data availability. Another example: a generic NAS does not lose data if its network connection fails. However, the data is unavailable until the connection is fixed in some way or other.

So, when talking about fault tolerance, don't forget to include at least the set of anticipated failures.
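
One way to keep such a statement honest is to write it down as a record with all three parts. The sketch below is just an illustration; the class and field names are hypothetical, and the two entries restate the RAID5 and NAS examples above.

```python
from dataclasses import dataclass, field


@dataclass
class FaultToleranceClaim:
    """A fault tolerance statement: implementation, failures, degradation."""
    implementation: str
    # Maps each anticipated failure to the extent of degradation.
    outcomes: dict = field(default_factory=dict)

    def describe(self, failure):
        # Anything not listed is simply not covered by the claim.
        return self.outcomes.get(failure, "not specified")


raid5 = FaultToleranceClaim(
    implementation="generic RAID5",
    outcomes={"exactly one drive fails": "no data loss"},
)
nas = FaultToleranceClaim(
    implementation="generic NAS",
    outcomes={
        "network connection fails":
            "no data loss; data unavailable until the connection is fixed",
    },
)

print(raid5.describe("exactly one drive fails"))
print(raid5.describe("two drives fail"))
print(nas.describe("network connection fails"))
```

A failure that is missing from the list comes back as "not specified", which is exactly what a bare "it is fault-tolerant" leaves you with.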

Monday, 19 January 2015

Two different definitions of fragmentation

There are two distinct definitions of file fragmentation on a filesystem.

1. The file is fragmented if reading the entire file requires a head seek after the first data block has been read. 

This definition mostly concerns read performance on rotational drives. A head seek is slow, so it is something you want to avoid.

If the file is sparse, i.e. has a large area full of zeros in it, then the filesystem will not store the zeros. Instead, only the non-zero start and end of the file are stored. From the performance standpoint, that's fine. The filesystem driver will read the first part of the data, then generate the required amount of zeros, and continue reading the last part of the data without ever making a head seek.

If the file contains some metadata (e.g. a ReFS page table) embedded in its content, that is also fine from the performance standpoint. The ReFS driver will read the file data, and at some point the page table will be needed. It conveniently happens that the page table occupies the next cluster after the file data, so the driver will read the page table, analyze it, and continue reading file data.

2. The file is fragmented if the extent of data that starts at the first data block and has the same size as the file does not match the file content when that extent is read directly from disk.

Now this definition is about recoverability. From the data recovery standpoint, the file is not fragmented if it can be recovered without any metadata, by finding its header and reading an extent of the appropriate size from the disk, starting with the header.

If the file is sparse, you cannot read it by just taking the required amount of disk data starting from the file header. You must insert some unspecified amount of zeros between the start and the end of the file.

Also, if metadata is embedded within the file content, as is often the case with ReFS, the recovered file will contain ReFS page tables, rendering the file useless.
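
The second definition is easy to express as a check. The sketch below uses toy byte strings in place of a real disk image; the function name, the offsets and the "HEAD"/"TAIL" markers are all made up for illustration.

```python
def is_fragmented_for_recovery(disk, first_block_offset, file_content):
    """Definition 2: compare the file with a raw extent of the same size.

    The extent starts at the file's first data block and is as long as
    the file itself, which is what a metadata-less recovery would read.
    """
    end = first_block_offset + len(file_content)
    extent = disk[first_block_offset:end]
    return extent != file_content


# A plain file stored in one piece: the raw extent is the file.
plain_file = b"HEADdataTAIL"
disk_a = b"...." + plain_file + b"...."
print(is_fragmented_for_recovery(disk_a, 4, plain_file))

# A sparse file: only the non-zero start and end are stored, next to
# each other, so no head seek is needed under definition 1. The raw
# extent still does not match, because the zeros are not on disk.
sparse_file = b"HEAD" + b"\x00" * 8 + b"TAIL"
disk_b = b"...." + b"HEAD" + b"TAIL" + b"." * 12
print(is_fragmented_for_recovery(disk_b, 4, sparse_file))
```

The sparse file is the case where the two definitions disagree: it reads without a seek, yet cannot be recovered by taking a raw extent.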

Sunday, 11 January 2015

X-RAID2 with two drives

Just thought I would clarify one misconception about X-RAID2.

If there are two identical drives, then the array is in fact RAID1.

No, not exactly. The array may be RAID1, but there are cases when there are two (or more) RAID1s, combined by LVM. This happens when a two-disk set is upgraded by replacing both drives with larger ones. If the array never had its drives replaced, it is indeed a single RAID1.