They define a set of shared features, such as storage and tuning settings, for the group of columns that belong to the family.
The naming convention for an HBase column is family:name, that is, the column family name, a colon, and then the column name.
For example, a column science that belongs to the column family subject is written as subject:science. Names are case-sensitive.
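The naming rule can be illustrated with a small sketch. The helper split_column below is a hypothetical name, not part of any HBase client. It splits a full column name at the first colon, on the assumption that the family name never contains a colon while the column name after it may.

```python
from typing import Tuple


def split_column(full_name: str) -> Tuple[str, str]:
    """Split 'family:name' into (family, name) at the first colon."""
    family, sep, name = full_name.partition(":")
    if not sep or not family:
        # No colon at all, or nothing before it: not a valid full column name.
        raise ValueError(f"expected 'family:name', got {full_name!r}")
    return family, name


print(split_column("subject:science"))  # ('subject', 'science')
```

Splitting at the first colon only means that a column name which itself contains a colon stays intact.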
Column families need to be created upfront, while columns can be added at any later point in time.
Column families determine how data is laid out in storage: each column family is kept in its own set of files.
Physically, all columns of a column family are stored together on the filesystem.
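The rule that families are fixed when the table is created, while columns appear whenever a value is written, can be sketched with a toy in-memory model. ToyTable and its methods are hypothetical and only imitate the behaviour described here. They are not an HBase API, and the row key and values are made up.

```python
class ToyTable:
    """In-memory stand-in for a table; illustrative only, not an HBase API."""

    def __init__(self, name, *families):
        if not families:
            raise ValueError("a table needs at least one column family")
        self.name = name
        # One separate store per column family, fixed at creation time.
        self.stores = {family: {} for family in families}

    def put(self, row, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.stores:
            raise KeyError(f"unknown column family: {family!r}")
        # The column itself needs no prior declaration.
        self.stores[family][(row, qualifier)] = value

    def columns(self, family):
        """Return the sorted column names seen so far in one family."""
        return sorted({qualifier for _, qualifier in self.stores[family]})


table = ToyTable("subjects", "commerce", "science")
table.put("student1", "science:maths", "85")
table.put("student1", "science:physics", "78")  # new column, no schema change
print(table.columns("science"))  # ['maths', 'physics']

try:
    table.put("student1", "arts:history", "90")  # family was never declared
except KeyError as err:
    print("rejected:", err)
```

Keeping one dictionary per family mirrors the idea that the data of a family lives together and apart from the data of other families.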
Creation using the command line:
Using the HBase shell, we can create a table together with its column families as:
create 'subjects', 'commerce', 'science'
where the syntax is
create 'tablename', 'colfamily1', 'colfamilyn'
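Columns are not declared when the table is created, because a column family name cannot contain a colon. A column comes into existence when a value is first written to it (the row key student1 and the values here are illustrative):
put 'subjects', 'student1', 'commerce:accounts', '72'
put 'subjects', 'student1', 'science:maths', '85'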
Advantages:
- It is easier to tune and manage storage at the column family level.
- Columns in the same family share the same access pattern and storage characteristics.
Disadvantages:
- Performance degrades as the number of column families grows; the HBase reference guide advises keeping it low, at around two or three per table.
- Managing data becomes an overhead when there are many column families.