Wednesday, February 9, 2011

HBase I/O: HFile

In the beginning, HBase used the MapFile class to store data persistently on disk; then, starting with version 0.20, a new file format was introduced. HFile is a MapFile-like implementation with HBase-specific features.

HFile doesn't know anything about the key and value structure/type (row key, qualifier, family, timestamp, …). As in Hadoop's SequenceFile (block-compressed), keys and values are grouped in blocks, and blocks contain records. Each record has two int values holding the key length and the value length, followed by the key and value byte arrays.
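The record layout described above can be sketched with plain java.io streams. This is a stand-alone illustrative example, not actual HBase code:

```java
import java.io.*;

public class RecordLayoutDemo {
    // One block record, as described above: key length (int),
    // value length (int), then the key and value byte arrays.
    static byte[] encode(byte[] key, byte[] value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(key.length);    // Key Length
        out.writeInt(value.length);  // Value Length
        out.write(key);              // Key
        out.write(value);            // Value
        out.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] rec = encode("row-1".getBytes(), "hello".getBytes());
        // 4 (key len) + 4 (value len) + 5 + 5 = 18 bytes
        System.out.println(rec.length);

        // Read it back: lengths first, then the payloads.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(rec));
        byte[] key = new byte[in.readInt()];
        byte[] value = new byte[in.readInt()];
        in.readFully(key);
        in.readFully(value);
        System.out.println(new String(key) + " -> " + new String(value));
    }
}
```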

HFile.Writer has only a couple of append overloads, one for the KeyValue class and the other for byte arrays. As with SequenceFile, each key added must be greater than the previous one; if this condition is not satisfied, an IOException is thrown.

By default, every 64k of data (key + value) records are squeezed together into a block, and the block is written to the HFile OutputStream with the specified compression, if any. The compression algorithm and the block size are both constructor arguments.
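As a rough sketch, creating a writer and appending sorted keys looks something like this. The constructor shown follows the 0.90-era API and signatures vary between HBase versions, so treat the exact names and the file path as assumptions:

```java
// Sketch only: approximate 0.90-era API; signatures differ across HBase versions.
FileSystem fs = FileSystem.get(conf);
HFile.Writer writer = new HFile.Writer(fs, new Path("/tmp/sample.hfile"),
    64 * 1024,   // block size: a block is flushed every ~64k of key+value data
    "gz",        // compression codec name, or null for no compression
    null);       // key comparator (null here means raw byte order)

// Keys must be appended in ascending order; an out-of-order key throws IOException.
writer.append(Bytes.toBytes("key-1"), Bytes.toBytes("value-1"));
writer.append(Bytes.toBytes("key-2"), Bytes.toBytes("value-2"));
```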

One thing that SequenceFile is not good at is adding metadata. Metadata can be added to a SequenceFile only through its constructor, so you need to prepare all your metadata before creating the Writer.

HFile adds two "metadata" types: one called Meta-Block and the other called FileInfo. Both metadata types are kept in memory until close() is called.

Meta-Blocks are designed to keep large amounts of data and their keys are Strings, while FileInfo is a simple Map preferred for small pieces of information, with keys and values that are both byte arrays. The region server's StoreFile uses Meta-Blocks to store a BloomFilter, and FileInfo for the max sequence id, the major-compaction flag and time-range info.
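Continuing the writer sketch, the two metadata types map to two writer calls. Method names follow the 0.90-era API, and the BloomFilter value and FileInfo key below are only placeholders:

```java
// Sketch only: both kinds of metadata are buffered and written out on close().
writer.appendMetaBlock("BLOOM_FILTER", myBloomFilter);  // Meta-Block: String key, large Writable payload
writer.appendFileInfo(Bytes.toBytes("MAX_SEQ_ID"),      // FileInfo: small byte-array key/value pairs
                      Bytes.toBytes(42L));
writer.close();  // Meta-Blocks, FileInfo, the indices and the trailer are flushed here
```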

On close(), Meta-Blocks and FileInfo are written to the OutputStream. To speed up lookups, an index is written for the Data-Blocks and for the Meta-Blocks. Each index contains n records (where n is the number of blocks) with block information (block offset, size and first key).
At the end, a Fixed File Trailer is written; this block contains the offsets and counts for all the HFile indices, the HFile version, the compression codec and a few other pieces of information.

Once the file is written, the next step is reading it. You have to start by loading the FileInfo: the loadFileInfo() method of HFile.Reader loads the trailer block and all the indices into memory, which makes it easy to look up keys. Through the HFileScanner you can seek to a specified key and iterate from there.
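On the read side, a minimal sketch looks like this (again with 0.90-era names; the seekTo() return convention shown is an assumption based on that API):

```java
// Sketch only: open the file, load trailer + indices, then scan.
HFile.Reader reader = new HFile.Reader(fs, path, null, false);
reader.loadFileInfo();                        // reads the trailer and indices into memory
HFileScanner scanner = reader.getScanner(false, false);
if (scanner.seekTo(Bytes.toBytes("key-2")) == 0) {  // 0 means the exact key was found
    do {
        ByteBuffer key = scanner.getKey();
        ByteBuffer value = scanner.getValue();
        // ... use key/value ...
    } while (scanner.next());
}
reader.close();
```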
The picture above describes the internal format of HFile...

5 comments:

  1. Hi Matteo,

    Great post! One small note: the meta blocks are optional and you could have an HFile that does not have any of them.

    Cheers,
    Lars

  2. I am having a hard time understanding the value of a data block. What would HBase lose if the concept of a block were removed?
    You could compress the whole file (in fact the compression would be better), you could build an index for the whole file, and you could make the file available by replicating it.

    Replies
    1. Blocks are useful for a couple of reasons. First, you have an easy way to split the file and distribute the slices (e.g. for map reduce).
      If you compressed the whole file, how could you seek into the middle of the file and look up your keys without having to read other pieces just to apply decompression? With blocks you can jump to the block that contains your key and decompress just that block, so you have a sequential read of just a few MB (the block size).

      (Also, from HBase 0.92 a new file format, HFile v2, is introduced that adds the possibility of inline blocks, which means indexes and bloom filters at the end of data blocks... but that is another post, coming soon.)

  3. This comment has been removed by the author.

  4. Those indices contains n records (where n is the number of blocks) with block information (block offset, size and first key).
    +++
    Is the first key that is stored the rowkey, or the rowkey+cf+column+timestamp key?

    The reason I ask is that there could be cases when a row spans multiple data blocks. Having the second type of key as the 'first key' could speed up queries (if we query for a row+column value), right?

    Looks like the 'start key' that is stored is the rowkey, as that could influence the flat-wide vs tall-narrow table design choice.
