Question:- Define a cell in HBase.
Answer:- A cell is the smallest storage unit of an HBase table. It is addressed by the tuple {row, column family, column qualifier, timestamp} and holds a value.
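To make the tuple concrete, here is a minimal, hypothetical sketch in plain Java (no HBase dependency) modeling a cell coordinate of row key, column family, column qualifier, and timestamp:

```java
// Hypothetical model of an HBase cell coordinate, for illustration only:
// {row, column family, column qualifier, timestamp} -> value.
class CellDemo {
    static final class Cell {
        final String row, family, qualifier;
        final long timestamp;
        final byte[] value;

        Cell(String row, String family, String qualifier, long timestamp, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        @Override public String toString() {
            // Rendered roughly the way the HBase shell displays coordinates.
            return row + "/" + family + ":" + qualifier + "/" + timestamp
                 + " = " + new String(value);
        }
    }

    public static void main(String[] args) {
        // One cell: row "user1", family "info", qualifier "name", version 1.
        Cell c = new Cell("user1", "info", "name", 1L, "Alice".getBytes());
        System.out.println(c);
    }
}
```

The class names and rendering here are assumptions for illustration; the real client exposes cells through the org.apache.hadoop.hbase.Cell interface.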
Question:- Define compaction in HBase?
Answer:- Compaction is the process of merging multiple HFiles into a single file; once the merged file is written, the old files are deleted. HBase uses several types of tombstone markers to make deleted cells invisible, and these tombstones (along with the cells they hide) are purged during compaction.
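As an illustration of the idea (not HBase's actual implementation), this hypothetical Java sketch merges several sorted "files" into one, dropping entries hidden by a tombstone, much as a major compaction purges deleted cells:

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of compaction: merge several sorted key->value maps
// (standing in for HFiles) into one, dropping keys marked by a tombstone.
class CompactionDemo {
    static final String TOMBSTONE = "__DELETED__";

    static SortedMap<String, String> compact(List<SortedMap<String, String>> hfiles) {
        SortedMap<String, String> merged = new TreeMap<>();
        // Later files win for duplicate keys (newer data shadows older data).
        for (SortedMap<String, String> f : hfiles) merged.putAll(f);
        // Major compaction: physically remove tombstoned cells.
        merged.values().removeIf(TOMBSTONE::equals);
        return merged;
    }

    public static void main(String[] args) {
        SortedMap<String, String> older = new TreeMap<>(Map.of("a", "1", "b", "2"));
        SortedMap<String, String> newer = new TreeMap<>(Map.of("b", TOMBSTONE, "c", "3"));
        // "b" was deleted in the newer file, so it disappears after compaction.
        System.out.println(compact(List.of(older, newer)));
    }
}
```

The tombstone constant and map-based "HFiles" are stand-ins invented for this sketch; real HFiles are sorted, immutable on-disk files of key/value pairs.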
Question:- What is the use of HColumnDescriptor class?
Answer:- HColumnDescriptor stores information about a column family, such as its compression settings and the maximum number of versions to keep.
Question:- What is the function of HMaster?
Answer:- HMaster is the master server, responsible for monitoring all RegionServer instances in the cluster.
Question:- How many types of compaction are there in HBase?
Answer:- There are two types of compaction: minor compaction and major compaction. A minor compaction merges a few smaller HFiles into one larger HFile, while a major compaction merges all the HFiles of a store into a single HFile and removes deleted (tombstoned) cells.
Question:- Define HRegionServer in HBase.
Answer:- HRegionServer is the RegionServer implementation, responsible for managing and serving regions.
Question:- Which filter accepts a page size as a parameter in HBase?
Answer:- PageFilter accepts the page size as a parameter.
Question:- Which method is used to access HFile directly without using HBase?
Answer:- The HFile.main() method can be used to access an HFile directly, without going through HBase.
Question:- What type of data can HBase store?
Answer:- HBase can store any type of data that can be converted into bytes.
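For example, a client serializes values to bytes before writing (the real HBase client provides a Bytes utility class for this; the sketch below uses only standard Java):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Anything convertible to bytes can be stored: strings, numbers, blobs.
class BytesDemo {
    public static void main(String[] args) {
        byte[] str = "hello".getBytes(StandardCharsets.UTF_8);            // text
        byte[] num = ByteBuffer.allocate(Long.BYTES).putLong(42L).array(); // a long

        // Round-trip back from bytes, as a reader would.
        String s = new String(str, StandardCharsets.UTF_8);
        long n = ByteBuffer.wrap(num).getLong();
        System.out.println(s + " " + n);
    }
}
```

In real client code the equivalent helpers are Bytes.toBytes() and Bytes.toString()/Bytes.toLong() from org.apache.hadoop.hbase.util.Bytes.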
Question:- What is the use of Apache HBase?
Answer:- Apache HBase is used when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Question:- What are the features of Apache HBase?
Answer:-
• Linear and modular scalability.
• Strictly consistent reads and writes.
• Automatic and configurable sharding of tables.
• Automatic failover support between RegionServers.
• Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
• Easy-to-use Java API for client access.
• Block cache and Bloom filters for real-time queries.
• Query predicate push-down via server-side filters.
• Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options.
• Extensible JRuby-based (JIRB) shell.
• Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.
Question:- How do I upgrade Maven-managed projects from HBase 0.94 to HBase 0.96+?
Answer:- In HBase 0.96, the project moved to a modular structure. Adjust your project's dependencies to rely on the hbase-client module (or another module, as appropriate) rather than a single JAR. You can model your Maven dependency after one of the following, depending on your targeted version of HBase. See Section 3.5, "Upgrading from 0.94.x to 0.96.x" or Section 3.3, "Upgrading from 0.96.x to 0.98.x" for more information.

• Maven dependency for HBase 0.98:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.98.5-hadoop2</version>
</dependency>

• Maven dependency for HBase 0.96:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.96.2-hadoop2</version>
</dependency>

• Maven dependency for HBase 0.94:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.94.3</version>
</dependency>
Question:- What is the Hierarchy of Tables in Apache HBase?
Answer:- The hierarchy for tables in HBase is: Tables >> Rows >> Column Families >> Columns >> Cells.

When a table is created, one or more column families are defined as high-level categories for storing the data corresponding to an entry in the table. As suggested by HBase being "column-oriented", column-family data for all table entries, or rows, is stored together. For a given (row, column family) combination, multiple columns can be written at the time the data is written; therefore, two rows in an HBase table need not share the same columns, only the same column families. For each (row, column family, column) combination, HBase can store multiple cells, each cell associated with a version, or timestamp, corresponding to when the data was written. HBase clients can choose to read only the most recent version of a given cell, or to read all versions.
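The hierarchy above can be pictured as nested sorted maps. This hypothetical plain-Java sketch (not the HBase API) stores multiple timestamped versions per (row, family, column) and reads back the most recent one:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: row -> family -> column -> (timestamp -> value),
// with timestamps sorted descending so the newest version comes first.
class HierarchyDemo {
    static final NavigableMap<String, Map<String, Map<String, NavigableMap<Long, String>>>> table =
            new TreeMap<>();

    static void put(String row, String fam, String col, long ts, String val) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(fam, f -> new TreeMap<>())
             .computeIfAbsent(col, c -> new TreeMap<>(Comparator.reverseOrder()))
             .put(ts, val);
    }

    static String getLatest(String row, String fam, String col) {
        // firstEntry() is the highest timestamp because of the reverse order.
        return table.get(row).get(fam).get(col).firstEntry().getValue();
    }

    public static void main(String[] args) {
        put("row1", "info", "city", 1L, "Pune");
        put("row1", "info", "city", 2L, "Mumbai"); // newer version of the same cell
        System.out.println(getLatest("row1", "info", "city"));
    }
}
```

The row, family, and column names are made up for the example; the point is that a read without an explicit version returns the newest cell, mirroring the default behavior described above.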
Question:- How can I troubleshoot my HBase cluster?
Answer:- Always start with the Master log. Normally it just prints the same lines over and over again; if not, there is an issue. Google or search-hadoop.com should return some hits for the exceptions you are seeing. An error rarely comes alone in Apache HBase: when something goes wrong, what follows may be hundreds of exceptions and stack traces coming from all over the place. The best way to approach this type of problem is to walk the log back to where it all began. For example, one trick with RegionServers is that they print some metrics when aborting, so grepping for "Dump" should get you near the start of the problem. RegionServer "suicides" are normal, as this is what they do when something goes wrong. For example, if ulimit and dfs.datanode.max.transfer.threads (the two most important initial settings; see [ulimit]) are not increased, at some point it becomes impossible for DataNodes to create new threads, which from HBase's point of view looks as if HDFS were gone. Think about what would happen if your MySQL database were suddenly unable to access files on the local file system; it is the same with HBase and HDFS. Another very common reason to see RegionServers committing seppuku is prolonged garbage-collection pauses that last longer than the default ZooKeeper session timeout. For more information on GC pauses, see Todd Lipcon's three-part blog post and the section on long GC pauses above.
