jeffr_tech wrote,

What's in a journal anyway?

In this post I'll detail the contents of the journal and the recovery operation. Since we know that soft updates leaves only two inconsistencies, leaked inodes and leaked blocks, we only have to journal the conditions that can create these circumstances. In truth we have to track all link count changes to an inode, since an inode can have multiple named references via hard links. Blocks are somewhat simpler, although ffs fragments complicate them considerably. At recovery time we verify whether links or pointers to blocks exist and use this information to free them if necessary. There are only 4 journal records (add ref, rem ref, new block, free block) and each is only 32 bytes. This is effectively an intent log; there is no copy of the metadata in each record. Sounds simple, no?
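To make "32 bytes, no copy of the metadata" concrete, here is a minimal C sketch of what such fixed-size intent records could look like. The enum, struct and field names, the `frags` field and the padding layout are all hypothetical. The post only lists which fields each record carries, and those fields are described in the next two paragraphs. The static assertions simply check that each layout really comes to 32 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for the four record types named in the post. */
enum jrec_op {
    JOP_ADDREF  = 1,   /* a name now references an inode */
    JOP_REMREF  = 2,   /* a name no longer references an inode */
    JOP_NEWBLK  = 3,   /* a block was allocated to an inode */
    JOP_FREEBLK = 4    /* a block was released from an inode */
};

/* Hypothetical add/rem ref record: 32 bytes, no metadata copy. */
struct jref_rec {
    uint32_t op;       /* JOP_ADDREF or JOP_REMREF */
    uint32_t ino;      /* inode gaining or losing the link */
    uint32_t parent;   /* directory inode containing the name */
    uint16_t nlink;    /* link count before the adjustment */
    uint16_t pad0;
    int64_t  diroff;   /* 64-bit offset of the entry within the directory */
    uint64_t pad1;     /* reserved, keeps the record at 32 bytes */
};

/* Hypothetical new/free block record: 32 bytes. */
struct jblk_rec {
    uint32_t op;       /* JOP_NEWBLK or JOP_FREEBLK */
    uint32_t ino;      /* owning inode */
    int64_t  lbn;      /* logical block number; negative for indirect blocks */
    int64_t  blkno;    /* disk block address */
    uint16_t frags;    /* assumed: fragment count, since ffs allocates fragments */
    uint16_t pad0;
    uint32_t pad1;
};

/* Both layouts must come out to exactly 32 bytes. */
static_assert(sizeof(struct jref_rec) == 32, "ref record must be 32 bytes");
static_assert(sizeof(struct jblk_rec) == 32, "block record must be 32 bytes");
```

Fixed-size records keep the journal trivially seekable. At 32 bytes apiece, one MiB holds 1,048,576 / 32 = 32,768 of them.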


In the addref/remref case the journal record contains the inode number, the directory inode number, the 64-bit offset within the directory of the new or removed link, and the link count before the adjustment. At recovery time, when we find one of these records, we verify whether the path exists and adjust the link count appropriately. The path may not exist if the parent inode doesn't point at the directory block that this filename occupied, if the directory write didn't happen in time, or in any number of other scenarios. If we adjust a link count down to 0 we free the inode and any blocks it owns. If it is a directory, we also recursively decrement the link counts of its children, potentially freeing them as well. This only happens when you crash immediately after adding a tree of files, as with tar. The directory offset tells us the exact place where the name should exist, so we don't need to know the actual name. This is also how we handle multiple links to the same inode within the same directory. The recovery operation actually finds all valid journal records for each inode and sorts them into a list, removing duplicates, before operating on the inode. That way we know whether a name was added and immediately removed, and if the same name was added twice we don't adjust the link count twice.
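One way to picture this per-inode bookkeeping is the C sketch below. It is a simplified model, not the actual recovery code. `struct ref_rec`, `recover_nlink` and the `on_disk` array are all hypothetical. `on_disk` holds the caller's answer to "does this directory offset still name the inode?". Each (directory, offset) slot is counted only once, which is what stops a duplicated add from bumping the link count twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ref_op { REF_ADD, REF_REM };

/* Hypothetical in-memory form of one add/rem ref journal record. */
struct ref_rec {
    enum ref_op op;
    uint32_t    parent;  /* directory inode */
    int64_t     diroff;  /* offset of the name within that directory */
    int         nlink;   /* inode link count before this adjustment */
};

/*
 * Compute the link count one inode should have after recovery.
 * recs:    all valid records for this inode, in journal order (n >= 1).
 * on_disk: on_disk[i] says whether the entry at (recs[i].parent,
 *          recs[i].diroff) names the inode on disk right now.
 * Simplified model: links never mentioned in the journal are unchanged.
 */
static int recover_nlink(const struct ref_rec *recs, const bool *on_disk,
                         size_t n)
{
    int existed_before = 0;  /* journaled slots present before the first op */
    int exists_now = 0;      /* journaled slots verified on disk now */

    for (size_t i = 0; i < n; i++) {
        bool dup = false;
        for (size_t j = 0; j < i; j++) {
            if (recs[j].parent == recs[i].parent &&
                recs[j].diroff == recs[i].diroff) {
                dup = true;  /* same slot seen earlier: count it once */
                break;
            }
        }
        if (dup)
            continue;
        /* The first op on a slot reveals its state before the journal:
           a removal means the name existed, an add means it did not. */
        if (recs[i].op == REF_REM)
            existed_before++;
        if (on_disk[i])
            exists_now++;
    }
    /* recs[0].nlink is the count before any journaled change. */
    return (recs[0].nlink - existed_before) + exists_now;
}
```

Take a freshly created file whose directory entry never reached disk. It comes out at 0, so recovery would go on to free the inode and its blocks. The recursive freeing of a directory's children is left out of this sketch.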

For adding and removing blocks we record the inode, the logical block number (lbn, analogous to a file offset), and the disk block address. The lbn may be negative to indicate indirect blocks, which are blocks that hold pointers to data blocks for large files. If we discover that a block does not exist at the indicated lbn, we may recursively free its indirect block children. This allows us to truncate huge files with a very small number of journal entries: no more than 15, which is the number of direct and indirect block pointers contained in an inode. There is an additional test to make certain that the freed block was not allocated to a new file after this record was written.
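The "at most 15 records" claim follows from the inode having 12 direct and 3 indirect pointers, where each indirect pointer roots a whole tree. The C toy model below illustrates this, under clearly artificial assumptions. The in-memory "disk", the tiny `NPTR` fan-out, `struct toy_inode`, and the functions are hypothetical. The tree level is passed in directly rather than derived from a negative lbn. The "was this block reallocated?" test is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDADDR 12   /* direct block pointers in an ffs inode */
#define NIADDR 3    /* single, double and triple indirect pointers */
#define NPTR   4    /* hypothetical: pointers per indirect block, tiny for the demo */
#define NBLKS  64   /* hypothetical toy disk size in blocks */

/* Toy disk: an indirect block stores NPTR child block numbers (0 = hole). */
static int64_t disk[NBLKS][NPTR];
static int freed[NBLKS];

struct toy_inode {
    int64_t db[NDADDR];   /* direct data blocks */
    int64_t ib[NIADDR];   /* ib[k] roots a (k+1)-level indirect tree */
};

/* Free blkno and, if it is an indirect block (level > 0), everything
   below it. Returns the number of blocks freed. */
static size_t free_tree(int64_t blkno, int level)
{
    if (blkno == 0)
        return 0;                 /* hole: nothing allocated */
    size_t n = 0;
    if (level > 0)
        for (int i = 0; i < NPTR; i++)
            n += free_tree(disk[blkno][i], level - 1);
    freed[blkno] = 1;
    return n + 1;
}

/* Truncating to zero needs one free-block record per non-empty inode
   pointer, so never more than NDADDR + NIADDR = 15 records. */
static size_t truncate_records(const struct toy_inode *ip)
{
    size_t recs = 0;
    for (int i = 0; i < NDADDR; i++)
        recs += ip->db[i] != 0;
    for (int i = 0; i < NIADDR; i++)
        recs += ip->ib[i] != 0;
    return recs;
}

/* Replaying those records: each one frees an entire subtree. */
static size_t replay_truncate(const struct toy_inode *ip)
{
    size_t n = 0;
    for (int i = 0; i < NDADDR; i++)
        n += free_tree(ip->db[i], 0);
    for (int i = 0; i < NIADDR; i++)
        n += free_tree(ip->ib[i], i + 1);
    return n;
}
```

The journal cost depends only on how many inode pointers are in use, not on file size. Three records can release any number of blocks hanging off a double indirect tree.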

In ffs, the filesystem is divided into 'cylinder groups' (CGs), which partition the data blocks and inodes for locality. Each CG has summary information describing where there are fragments and large clusters of available blocks, how many inodes are free, etc. Some of this summary information is copied into the superblock so that we can quickly find a CG with free blocks, inodes, etc. As a final stage in the recovery operation, every CG that was modified has its summary information recomputed, and the superblock copy is updated to match.
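The final "recompute and fold into the superblock" step could look roughly like the C sketch below. `struct cg_sum` is only loosely modeled on the kind of counters FFS keeps: free whole blocks, free loose fragments, and free inodes. The bitmap representation, `cg_recount` and `sb_update` are hypothetical, and 8 fragments per block is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAG 8   /* assumed fragments per block */

/* Hypothetical per-CG summary counters. */
struct cg_sum { int64_t nbfree, nffree, nifree; };

/* Recount a CG from its bitmaps; true means the fragment/inode is free. */
static struct cg_sum cg_recount(const bool *fragfree, int nfrags,
                                const bool *inofree, int ninodes)
{
    struct cg_sum s = {0, 0, 0};
    for (int b = 0; b + FRAG <= nfrags; b += FRAG) {
        int nfree = 0;
        for (int f = 0; f < FRAG; f++)
            nfree += fragfree[b + f];
        if (nfree == FRAG)
            s.nbfree++;           /* an entire block is available */
        else
            s.nffree += nfree;    /* loose fragments in a partial block */
    }
    for (int i = 0; i < ninodes; i++)
        s.nifree += inofree[i];
    return s;
}

/* Replace a modified CG's old summary and adjust the superblock totals
   by the difference, so untouched CGs need not be rescanned. */
static void sb_update(struct cg_sum *sb, struct cg_sum *cg, struct cg_sum fresh)
{
    sb->nbfree += fresh.nbfree - cg->nbfree;
    sb->nffree += fresh.nffree - cg->nffree;
    sb->nifree += fresh.nifree - cg->nifree;
    *cg = fresh;
}
```

Applying the difference rather than summing every CG means only the groups touched during recovery need to be rescanned.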

So how fast is it? In my tests so far it looks like less than 2 seconds per megabyte of journal in use. A megabyte of journal space holds 32,768 of the 32-byte records, describing up to 32,768 filesystem operations! Even on a machine with a very large amount of memory, it's unlikely that you could have more than a few hundred thousand operations outstanding, so this is really quite acceptable. Furthermore, the recovery operation currently generates a text log of every decision it makes, and that log is several times the size of the binary journal. Disabling it will probably halve the recovery time.
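The arithmetic behind these numbers can be checked with a few lines of C. Two assumptions apply. First, "megabyte" is read as 2^20 bytes, which is what makes 32-byte records come out to exactly 32,768. Second, the measured 2 s/MiB upper bound is taken to scale linearly. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define JREC_BYTES   32u   /* size of every journal record, per the post */
#define SECS_PER_MIB 2.0   /* measured upper bound quoted in the post */

/* How many records fit in a journal of `bytes` bytes. */
static uint64_t journal_records(uint64_t bytes)
{
    return bytes / JREC_BYTES;
}

/* Rough recovery-time bound for `records` records, assuming the
   2 s/MiB figure scales linearly with journal usage. */
static double recovery_secs(uint64_t records)
{
    double mib = (double)(records * JREC_BYTES) / (1024.0 * 1024.0);
    return mib * SECS_PER_MIB;
}
```

For example, take 300,000 outstanding records as a stand-in for "a few hundred thousand". That is 9,600,000 bytes, about 9.2 MiB of journal, or roughly 18 seconds by this bound.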


Anonymous

December 25 2009, 20:41:54 UTC

Journal Checksumming

This is absolutely awesome: UFS will be the most intelligent filesystem out there!

Since you don't want to be changing the journal format later on, you might want to consider journal checksumming, like ext4 does. This would be good for reliability, and maybe errors could somehow be reported when using UFS in a ZFS pool.

jeffr_tech

December 27 2009, 06:32:03 UTC

Re: Journal Checksumming

I certainly wouldn't claim it was the most intelligent filesystem. I do think it's an interesting variation on classic consistency mechanisms.

I already have a checksum on the journal. And changing the journal format isn't too problematic so long as you make sure the filesystem is clean before switching implementations.
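The reply doesn't say which checksum the journal uses. Purely as an illustration, the C sketch below assumes a CRC-32 computed over each journal segment's records. Recovery then replays a segment only if the stored value matches. The segment header layout and `jseg_valid` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical journal segment header covering a run of 32-byte records. */
struct jseg_hdr {
    uint64_t seq;     /* segment sequence number */
    uint32_t nrecs;   /* number of records that follow */
    uint32_t crc;     /* CRC-32 of the record bytes */
};

/* A torn or partially written segment fails this check and is skipped. */
static int jseg_valid(const struct jseg_hdr *h, const void *recs)
{
    return crc32_ieee(recs, (size_t)h->nrecs * 32u) == h->crc;
}
```

A checksum like this is what lets recovery tell a segment that was fully written from one cut short by the crash.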

Anonymous

December 27 2009, 19:02:08 UTC

Re: Journal Checksumming

Definitions of intelligence vary: I think it's intelligent in that it avoids doing stupid things. As a filesystem, the SU+J approach you've made has its know-how centered on its own core function: writing consistently to the disk.

I guess other filesystems have more specific functionality for SSDs or other uses.

The idea of checksumming the journal is pretty basic, and it saves us from attempting to replay an inconsistent journal: I am glad you did it already. I guess we could checksum more information from the filesystem as long as we don't affect performance or break compatibility, but that's a topic for future research.

Anonymous

December 29 2009, 19:04:58 UTC

UFS...

I think UFS will be the best filesystem too!

rwatsoff

January 8 2010, 15:14:11 UTC

Consistency and snapshots

Since you're explaining consistency mechanisms for UFS, could you say a little about how the current sync/SU models protect the consistency of the copy-on-write snapshots supported by UFS? Does the advent of journaling change things at all there, and will they remain fully supported in the new world order even though bgfsck will no longer be required?

jeffr_tech

January 9 2010, 02:42:30 UTC

Re: Consistency and snapshots

First off Robert allow me to commend you on a fantastic user icon.

For now, background fsck will be incompatible with softdep journaling. It would be possible to adapt the checker to work in a background mode, but I hope that it is fast enough that there will never be a desire to do so. So snapshots will no longer be used for consistency checking.

Snapshots are conveniently regular files with regard to allocation and truncation, so they need no extra effort. However, some operations performed when creating a snapshot are presently not journaled, and I will attend to them before committing to -CURRENT. They may only affect the copy filesystem and not real allocation information, so they may not actually require any work, but I need to review them carefully.

rwatsoff

January 9 2010, 13:12:59 UTC

Re: Consistency and snapshots

I'm afraid that sheep, from near Morpeth, England, was a bit grumpy when I started taking its photo, and a few shots later had not changed its ways!

I assumed as much for background fsck, and to be honest, I suspect no one will be sad to see it go. However, snapshots are in fact quite a useful facility, and I had wondered about the integration between soft updates and copy-on-write. I assume our copy-on-write allocates a new block for the snapshot file, and copies the data in, rather than doing anything too fancy (which would be problematic for inode tables, increase fragmentation, and cause atomicity problems). However, does SU(+J) provide guarantees about the order of the writes such that the snapshot definitely has a stable copy of the original block on disk before the new version of the data block on the live file system goes to disk? I.e., can a snapshot in fact change in the presence of an untimely power loss, as it continues to reference a block that has been updated by the live file system? (All this asked as someone with no knowledge of the implementation!)

Thanks
