Storage Space Efficiency in Avro and HBase

Categories: BigData

I recently had a customer who suggested (for various reasons) storing large amounts of write-once data in HBase, using an (implicit) schema with long and complicated column names. Given the volume of data involved, I immediately had concerns about how efficiently this approach would use disk storage. Various sites warn against long column names in HBase, but I could not find any actual measurements of the effect. A colleague and I therefore measured the storage efficiency of HBase with column names of various lengths, and compared it to Avro.