Monday, 19 January 2015

Two different definitions of fragmentation

There are two distinct definitions of file fragmentation on a filesystem.

1. The file is fragmented if reading the entire file requires a head seek after the first data block has been read. 

This definition mostly concerns read performance on rotational drives. A head seek is slow, so it is something you want to avoid.

If the file is sparse, i.e. has a large area full of zeros in it, then the filesystem does not store the zeros. Instead, only the non-zero start and end of the file are stored. From the performance standpoint, that's fine. The filesystem driver reads the first part of the data, generates the required amount of zeros, and continues reading the last part of the data without ever making a head seek.

If the file contains some metadata (e.g. a ReFS page table) embedded in its content, that is also fine from the performance standpoint. The ReFS driver reads the file data, and at some point the page table is needed. Conveniently, the page table occupies the next cluster after the file data, so the driver reads the page table, analyzes it, and continues reading file data.
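To make the first definition concrete, here is a minimal sketch in Python. It assumes the driver's work can be modelled as an ordered list of (disk offset, length) reads; the function name `count_seeks` and the 4KB cluster size are made up for illustration and do not come from any real driver.

```python
CLUSTER = 4096  # hypothetical cluster size, for illustration only


def count_seeks(reads):
    """Count head seeks needed after the first read.

    `reads` is the ordered list of (disk_offset, length) pairs the driver
    issues. A seek is counted whenever a read does not start exactly where
    the previous one ended.
    """
    seeks = 0
    expected = None  # disk offset where the next read would start with no seek
    for disk_offset, length in reads:
        if expected is not None and disk_offset != expected:
            seeks += 1
        expected = disk_offset + length
    return seeks


# Sparse file: only the head and the tail are stored, back to back.
# The zeros in between are generated by the driver and cost no disk read.
sparse_reads = [(1 * CLUSTER, CLUSTER), (2 * CLUSTER, CLUSTER)]

# Embedded metadata: data, then a page table in the next cluster, then data.
metadata_reads = [(1 * CLUSTER, CLUSTER), (2 * CLUSTER, CLUSTER), (3 * CLUSTER, CLUSTER)]

# A file whose second part lives elsewhere on the disk.
scattered_reads = [(1 * CLUSTER, CLUSTER), (10 * CLUSTER, CLUSTER)]

for name, reads in [("sparse", sparse_reads),
                    ("metadata", metadata_reads),
                    ("scattered", scattered_reads)]:
    print(name, "fragmented" if count_seeks(reads) else "not fragmented")
```

Under this model, both the sparse file and the file with embedded metadata need no seek, so neither is fragmented by the first definition.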

2. The file is fragmented if the extent of data that starts with the first data block and has the same size as the file does not match the file content when that extent is read directly from disk.

This definition is about recoverability. From the data recovery standpoint, the file is not fragmented if it can be recovered without any metadata, by finding its header and reading an extent of the appropriate size from the disk, starting with the header.

If the file is sparse, you cannot read it by just taking the required amount of disk data starting from the file header. You must insert some unspecified amount of zeros between the start and the end of the file.

Also, if metadata is embedded in the file content, as is often the case with ReFS, the recovered file will contain ReFS page tables in it, rendering the file useless.
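The second definition can be sketched as a direct byte comparison. This is only an illustration in Python: the disk images are tiny synthetic byte strings, and the function name, the "HDR!" signature and the 4KB cluster size are all made up.

```python
CLUSTER = 4096  # hypothetical cluster size, for illustration only


def is_recoverable_without_metadata(disk, header_offset, file_content):
    """Definition 2: take an extent of the file's size starting at the header
    and compare it with the real file content."""
    extent = disk[header_offset:header_offset + len(file_content)]
    return extent == file_content


head = b"HDR!" + b"\x01" * (CLUSTER - 4)  # first cluster, starts with a header
tail = b"\x02" * CLUSTER                  # last cluster of the file
other = b"\xff" * CLUSTER                 # unrelated data on the disk

# Ordinary contiguous file.
plain_file = head + tail
plain_disk = other + head + tail + other

# Sparse file: the logical content has a hole of zeros, the disk does not.
sparse_file = head + b"\x00" * (2 * CLUSTER) + tail
sparse_disk = other + head + tail + other * 3

# Metadata (standing in for a page table) stored between the data clusters.
page_table = b"\xee" * CLUSTER
meta_file = head + tail
meta_disk = other + head + page_table + tail + other

for name, disk, content in [("plain", plain_disk, plain_file),
                            ("sparse", sparse_disk, sparse_file),
                            ("metadata", meta_disk, meta_file)]:
    ok = is_recoverable_without_metadata(disk, CLUSTER, content)
    print(name, "not fragmented" if ok else "fragmented")
```

The sparse file and the file with embedded metadata need no head seek when read, yet both count as fragmented here.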

Sunday, 11 January 2015

X-RAID2 with two drives

Just thought I would clarify one misconception about X-RAID2.

If there are two identical drives, then the array is in fact RAID1.

No, not exactly. The array may be a single RAID1, but there are cases when there are two (or more) RAID1s, combined by LVM. This happens when a two-disk set is upgraded by replacing both drives with larger ones. If the array never had its drives replaced, it is indeed a single RAID1.
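As a rough illustration of how the layers pile up, here is a simplified model in Python. It assumes that every upgrade replaces both drives with identical larger ones and that each upgrade adds one RAID1 covering only the newly gained space. The function name and the sizes are hypothetical, and a real unit may lay things out differently.

```python
def raid1_layers(drive_size_history):
    """Return the capacities of the RAID1 layers that LVM would combine.

    `drive_size_history` lists the per-drive size after the initial setup
    and after each upgrade, e.g. [1000, 2000] for 1TB drives replaced by
    2TB drives (sizes in GB).
    """
    layers = []
    covered = 0  # per-drive space already occupied by earlier layers
    for size in drive_size_history:
        if size > covered:
            layers.append(size - covered)  # new RAID1 on the extra space
            covered = size
    return layers


print(raid1_layers([1000]))              # drives never replaced
print(raid1_layers([1000, 2000]))        # one upgrade
print(raid1_layers([1000, 2000, 4000]))  # two upgrades
```

In this model, a set that was never upgraded yields one layer, and each upgrade to larger drives adds one more.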

Monday, 5 January 2015

NAS recovery training

What we see in the support requests is that people have difficulties recovering their (or their customers') NASes. In December 2014, more than half of the support queries associated with ReclaiMe software were in some way related to a NAS. So we decided to put off filesystems for a while and do a solid course on NAS recovery, starting with initial data collection, through partitioning schemes like FlexRAID or SHR (Synology Hybrid RAID), and on to file extraction. This should be available on our training site eventually (hopefully by the end of February).

Sunday, 4 January 2015

EXT3 undelete

As you probably know, as far as undelete is concerned there are three basic variants of the EXT filesystem, depending on whether extents and journaling are used.
  • no extents and no journaling - EXT2; 
  • journaling only, but no extents - EXT3;
  • both journaling and extents - EXT4. 
Journaling, for whatever fancy filesystem development reasons, requires inodes to be cleared (overwritten with zeros) when files are deleted.

However, we now have an undelete capability for EXT3 (where extents are not used) in our ReclaiMe Pro software. The undelete requires a full scan of the disk to work, and file names cannot be recovered (because the inodes are gone), but otherwise it works fairly well.

Saturday, 4 October 2014

Entropy analysis

This is the most perfect example of entropy analysis I've seen done with ReclaiMe Pro to date.


This one features three distinct frequencies. Think of the entropy analyzer as an oscilloscope, giving you wavelength (inverse frequency) and phase. Block size is the wavelength and block start offset is the phase.

So there are three distinct wavelengths (block sizes):
  1. 16KB block of parity;
  2. 4KB filesystem (NTFS) cluster;
  3. 1KB NTFS MFT entry size.
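To give a feel for the wavelength and phase idea, here is a toy sketch in Python. It is not how ReclaiMe Pro does it. It assumes a simple approach: for a candidate block size, collect the bytes found at each offset within the block across the whole data and compute their Shannon entropy. The synthetic data imitates 1KB records that start with the same four-byte signature, and all function names are made up.

```python
import math
import random
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def entropy_by_phase(data, period):
    """For a candidate block size (`period`), return the entropy of the bytes
    found at each offset (phase) within the block, across all blocks."""
    return [shannon_entropy(data[phase::period]) for phase in range(period)]


# Synthetic data: 64 records of 1KB, each starting with the same signature
# followed by pseudo-random filler.
rng = random.Random(0)
RECORD = 1024
records = [b"FILE" + bytes(rng.randrange(256) for _ in range(RECORD - 4))
           for _ in range(64)]
data = b"".join(records)

profile = entropy_by_phase(data, RECORD)
low = [phase for phase, h in enumerate(profile) if h < 1.0]
print("low-entropy phases at period 1024:", low)
```

With the right period, the phases that hold the repeated signature stand out as near-zero entropy against the random filler. With a period that does not match the record size, the signature bytes are smeared over many phases and the dip is much weaker.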