Archive for January, 2009

ZFS and the uberblock

January 6, 2009

Inspired by Constantin's comment about USB sticks wearing out on Matthias's blog entry about an eco-friendly home server, I tried to find out more about how, and how often, the ZFS uberblock is written.

Using DTrace, it’s not that difficult:

We start by finding out which DTrace probes exist for the uberblock:

$ dtrace -l | grep -i uberblock
31726        fbt               zfs            vdev_uberblock_compare entry
31727        fbt               zfs            vdev_uberblock_compare return
31728        fbt               zfs          vdev_uberblock_load_done entry
31729        fbt               zfs          vdev_uberblock_load_done return
31730        fbt               zfs          vdev_uberblock_sync_done entry
31731        fbt               zfs          vdev_uberblock_sync_done return
31732        fbt               zfs               vdev_uberblock_sync entry
31733        fbt               zfs               vdev_uberblock_sync return
34304        fbt               zfs          vdev_uberblock_sync_list entry
34305        fbt               zfs          vdev_uberblock_sync_list return
34404        fbt               zfs                  uberblock_update entry
34405        fbt               zfs                  uberblock_update return
34408        fbt               zfs                  uberblock_verify entry
34409        fbt               zfs                  uberblock_verify return
34416        fbt               zfs               vdev_uberblock_load entry
34417        fbt               zfs               vdev_uberblock_load return

So there are two probes on uberblock_update: fbt:zfs:uberblock_update:entry and fbt:zfs:uberblock_update:return!
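
A quick way to check that the entry probe actually fires (for example, while a pool is busy writing) is a one-liner like this:

$ dtrace -n 'fbt:zfs:uberblock_update:entry { printf("%Y %s", walltimestamp, execname); }'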

Now we can find out more about it by searching the OpenSolaris sources: searching for the definition of uberblock_update in the onnv project yields one hit, at line 49 of the file uberblock.c, and clicking on it shows:

source extract: line 49 of file uberblock.c
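
The extract boils down to the function's signature, which takes a pointer to an uberblock, a pointer to a vdev, and a transaction group number; roughly:

uberblock_update(uberblock_t *ub, vdev_t *rvd, uint64_t txg)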

Now, when searching again for the definitions of the first two arguments of uberblock_update (args[0] and args[1], which are of type uberblock and vdev), we get:

For uberblock, the following hits are shown:

When clicking on the link to the definition of struct uberblock (around line 53 in file uberblock_impl.h), we get:
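
In essence, the structure is quite small (roughly, with comments abridged):

struct uberblock {
        uint64_t        ub_magic;       /* UBERBLOCK_MAGIC */
        uint64_t        ub_version;     /* SPA_VERSION */
        uint64_t        ub_txg;         /* txg of last sync */
        uint64_t        ub_guid_sum;    /* sum of all vdev guids */
        uint64_t        ub_timestamp;   /* UTC time of last sync */
        blkptr_t        ub_rootbp;      /* MOS objset_phys_t */
};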

For the members of struct vdev, it’s not that easy. First, we get a long hit list when searching for the definition of vdev in the source browser. But if we search for "struct vdev" in that list, using the browser’s search function, we get:

When clicking on the definition of struct vdev (around line 108 in file vdev_impl.h), we can see all the members of this structure.

Here are all the links, plus one more for struct blkptr (a member of struct uberblock), again in one place:

Now we are prepared to access the data via DTrace, by printing the arguments and members as in the following example:

printf ("%d %d %d", args[0]->ub_timestamp, args[1]->vdev_id, args[2]);

So here is a sample DTrace script that prints as much information as possible for each uberblock_update event, and also prints any relevant I/O (the hope being that, by looking at both at the same time, we can see where and how often the uberblocks are written):

/* block-level I/O: timestamp, process, block number and transfer size */
io:genunix:default_physio:start,
io:genunix:bdev_strategy:start,
io:genunix:biodone:done
{
        printf("%d %s %d %d", timestamp, execname,
            args[0]->b_blkno, args[0]->b_bcount);
}

/* uberblock updates: timestamp, process, pid, root blkptr props, vdev size, txg */
fbt:zfs:uberblock_update:entry
{
        printf("%d %s, %d, %d, %d, %d", timestamp, execname,
            pid, args[0]->ub_rootbp.blk_prop, args[1]->vdev_asize, args[2]);
}

The lines for showing the I/O are derived from the DTrace scripts for I/O analysis in the DTrace Toolkit.

Although I was unable to print out members of struct vdev (the second argument to uberblock_update()) with the fbt:zfs:uberblock_update:entry probe (I also tried fbt:zfs:uberblock_update:return, but ran into other problems with that one), the results of running the script with

$ dtrace -s zfs-uberblock-report-02.d

are quite interesting. Here's an extract (long lines shortened):

  0  33280  uberblock_update:entry 102523281435514 sched, 0, 922..345, 0, 21005
0   5510     bdev_strategy:start 102523490757174 sched 282 1024
0   5510     bdev_strategy:start 102523490840779 sched 794 1024
0   5510     bdev_strategy:start 102523490873844 sched 18493722 1024
0   5510     bdev_strategy:start 102523490903928 sched 18494234 1024
0   5498            biodone:done 102523491215729 sched 282 1024
0   5498            biodone:done 102523491576878 sched 794 1024
0   5498            biodone:done 102523491873015 sched 18493722 1024
0   5498            biodone:done 102523492232464 sched 18494234 1024
...
0  33280  uberblock_update:entry 102553280316974 sched, 0, 922..345, 0, 21006
0   5510     bdev_strategy:start 102553910907205 sched 284 1024
0   5510     bdev_strategy:start 102553910989248 sched 796 1024
0   5510     bdev_strategy:start 102553911022603 sched 18493724 1024
0   5510     bdev_strategy:start 102553911052733 sched 18494236 1024
0   5498            biodone:done 102553911344640 sched 284 1024
0   5498            biodone:done 102553911623733 sched 796 1024
0   5498            biodone:done 102553911981236 sched 18493724 1024
0   5498            biodone:done 102553912250614 sched 18494236 1024
...
0  33280  uberblock_update:entry 102583279275573 sched, 0, 922..345, 0, 21007
0   5510     bdev_strategy:start 102583540376459 sched 286 1024
0   5510     bdev_strategy:start 102583540459265 sched 798 1024
0   5510     bdev_strategy:start 102583540492968 sched 18493726 1024
0   5510     bdev_strategy:start 102583540522840 sched 18494238 1024
0   5498            biodone:done 102583540814677 sched 286 1024
0   5498            biodone:done 102583541091636 sched 798 1024
0   5498            biodone:done 102583541406962 sched 18493726 1024
0   5498            biodone:done 102583541743494 sched 18494238 1024

Using the following (n)awk one-liners:

$ nawk '/uberblock/{print}' zfs-ub-report-02.d.out
$ nawk '/uberblock/{a=0}{a++;if ((a==2)){print}}' zfs-ub-report-02.d.out
$ nawk '/uberblock/{a=0}{a++;if ((a>=1)&&(a<=5)){print}}' zfs-ub-report-02.d.out

we can print:

  • only the uberblock_update lines, or
  • just the next line after the line that matches the uberblock_update entry, or
  • the uberblock_update entry itself plus the 4 lines that follow it.

When running the script for a while and capturing its output, we can later analyze at which block number the first write after each uberblock_update() lands. The numbers are always even, the lowest is 256 and the highest is 510, with a write size of 1024 bytes. The block numbers step through 256, 258, 260, and so on until they reach 510; then they start over at 256. So every (510-256)/2+1 = 128th iteration (one more than the plain difference, because the first element has to be counted as well), the same block is overwritten again. The same holds for the block ranges 768…1022, 18493696…18493950 and 18494208…18494462 (the third and fourth ranges depend on the size of the device, since two of the four ZFS label copies, each containing an uberblock array, sit at its end).
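
For example, the following sketch (assuming the block number is the sixth whitespace-separated column, as in the extract above) counts how often each of those first block numbers shows up in the captured output:

$ nawk '/uberblock/{a=0} {a++} a==2 {print $6}' zfs-ub-report-02.d.out | sort -n | uniq -c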

Now that we understand how and in which order the uberblocks are written, we are prepared to examine after how many days the uberblock area of a USB stick without wear leveling would probably be worn out. More on that and how we can use zdb for that, in my next blog entry.
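
As a rough back-of-envelope sketch ahead of that entry, assuming one uberblock update about every 30 seconds (as the timestamps above suggest), a ring of 128 uberblock slots, and a hypothetical endurance of 10,000 program/erase cycles (real sticks differ, and a flash erase block typically spans many of these 1 KB slots, which only makes things worse):

$ echo "86400 / 30 / 128" | bc -l       # rewrites of one uberblock slot per day
22.50000000000000000000
$ echo "10000 / 22.5" | bc -l           # days until a 10,000-cycle block would be worn out
444.44444444444444444444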

Some more links on this topic:

A compact primer on RBAC

January 5, 2009

A nice and compact primer on RBAC (Role-Based Access Control), with some examples for a quick start, is available here.
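
As a tiny teaser, and with made-up role and user names, the basic flow looks roughly like this (the primer covers the details):

# roleadd -m -d /export/home/netadm -P "Network Management" netadm
# passwd netadm
# usermod -R netadm someuser

After that, someuser can assume the role when needed:

$ su - netadm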

About screencasts…

January 4, 2009

Read about Sarah's findings from viewing user-testing screencasts. Her blog is full of examples of and comments on application usability and web design, so I'll add it to my link list.

SAP’s co-founder Hasso Plattner about the economy, the environment, and more

January 3, 2009

SAP’s co-founder Hasso Plattner, now chairman of the supervisory board, recently had an interview with Der Spiegel, the well-known German news magazine.

It’s well worth reading, either in the original German version or in the English translation.

The last paragraphs, about lessons learned from the financial crisis, reminded me of Jared Diamond's book Collapse: How Societies Choose to Fail or Succeed, which I had just finished reading and which is one of the most interesting books I have ever read. It is a good reminder not to always follow the mainstream (whatever kind, and on whichever side it is) and to ask yourself from time to time which stories are currently not being discussed in the media (without falling into the trap of conspiracy theories). Toyota's Five Whys may be a guideline.

Amazing: A General Motors EV1 in Altlussheim/Germany!

January 3, 2009

Having watched the movie Who Killed the Electric Car?, I never thought I'd come across one of the remaining EV1s. So I was quite surprised when, some weeks ago, I read about one being on display just 20 km from our Sun office in Walldorf, in the small but nice transportation museum Museum Autovision in Altlußheim! With their collection of electric cars and various incarnations of the Wankel engine, it's definitely worth a visit!

For some time, they showed Volkswagen's 1-litre car. It looks like they had to give it back. But with the GM EV1, they now have another groundbreaking car on display: a car of which more than 1,000 were built and which produces no exhaust gas (at least not after it has left the production plant, and provided it is charged with renewable energy). Most of these cars were destroyed by GM; only a few are left, one of which is now in Altlußheim.

Among the other cars on display are:

Here are some pictures I took:

Front view. Notice the small flap which covers the charge port (right above the GM logo).



The engine compartment. The big part is the converter.



The rear wheels are partially covered, for low resistance air flow.



Rear view



Yes, it’s the real EV1!



And yes, it’s from General Motors!



The EV1’s shape is still modern.



The interior looks quite sporty.


Pictures from some of the other cars:

The Audi Duo, another seminal hybrid car



The Toyota Prius 1, the first hybrid car from Toyota. The brick on the floor, next to the car’s left rear wheel, is the battery pack.



The engine compartment of the Toyota Prius 1



The Honda Insight, a hybrid car from Honda

ZFS on external disks: Why free disk space might not increase when deleting files…

January 2, 2009

Interesting: After copying about 30,000 files from an internal disk (OpenSolaris 2008.11) to a new directory on an external USB disk, and later removing a similar number of files, the free disk space as reported by zpool list or in the "Available" column of df -k did not increase! And the df -k output showed that the total number of blocks of the only file system on that disk had decreased! What had happened?

Here’s the explanation:

For backup purposes, I wanted to use an external 2.5″ USB disk with a capacity of about 186GB. On that USB disk, I had created a zpool, using:
$ zpool create -f dpool-2 c5t0d0
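
To double-check what the new pool reports, something like this can be used:

$ zpool list dpool-2
$ zfs list dpool-2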

For a first backup, I copied the mentioned 30,000 files, with a total size of about 75GB, from one system to a USB disk connected to another server (also running OpenSolaris 2008.11) via the network. I used a simple scp -pr, which preserves modes and times but not uids and gids. One possible procedure for preserving file ownership as well, as mentioned here, for example, would be:
cd source_dir; tar -cf - . | ssh user@targethost "cd target_dir; tar -xvf -"

Anyway, I thought it would be a good idea to export that USB disk and import it on another server (also on OpenSolaris 2008.11) to check if there were any problems. I also wanted to run a
$ zpool scrub dpool-2
to verify the data integrity. Everything went smoothly.

OK. So I exported the zpool again, connected the disk to the server holding the original data, imported the zpool, and started copying the same data I had copied before via the network, this time locally from within the source directory, using
$ tar -cf - . | ( cd /dpool-2; tar -xpf - )

As this finished without errors as well, and a comparison using /usr/bin/diff or the default /usr/gnu/bin/diff only showed a problem with one file that had an interesting character in its name (the problem went away after changing LANG from en_US.UTF-8 to C), I decided to remove the old 30,000 files (the ones I had copied via the network). I had removed files on ZFS before, using rm -rf, and it had always been amazingly fast compared to PCFS or UFS. But this time, removing those files took quite a while, with noticeable disk activity.
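
For reference, such a comparison with the locale pinned to C can be run like this (the directory names are just placeholders):

$ LANG=C diff -r /data/original /dpool-2/backup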

After all the “old” files were removed, I discovered the following:

  • Free space, as shown by df -k or zpool list, dropped from 112GB to 36GB!
  • Used space, as shown by zpool list, increased from 75GB to 150GB!
  • df -k reported a total size of just 108GB for the only file system on that zpool, compared to the 183GB reported before!

The first step towards the solution was to execute the following command:
$ zfs list -t snapshot
The output showed several snapshots, taken while the disk was connected to the second server (on which I had set up Time Slider for automatic snapshots). When I connected the disk to the final server, the snapshots were still there, and since they remained valid after the files were removed, the deleted files could still have been restored from them, which is exactly why the space was not freed. So the easy way to reclaim the space was simply to remove all those snapshots with zfs destroy, as in the following example:
$ zfs destroy dpool-2@zfs-auto-snap:frequent-2009-01-02-18:45
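
If there are many snapshots, something along these lines removes all of the pool's snapshots in one go (a sketch; review the output of zfs list -t snapshot before destroying anything):

$ zfs list -H -t snapshot -o name | grep '^dpool-2' | while read snap; do zfs destroy "$snap"; done
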
After the final snapshot for dpool-2 was removed, df -k and zpool list showed the expected values again. "Problem" solved!