Question:- What is Super Column in Cassandra?
Answer:- A Cassandra Super Column is an element that groups a collection of related columns. It is a key–value pair whose value is a sorted array of columns. Super columns sit in the following hierarchy: keyspace > column family > super column > column, a structure often pictured as nested JSON. Like row keys, super columns hold no independent values of their own; they serve only to collect other columns. Note that the super column keys appearing in different rows do not have to match.
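A rough way to picture this hierarchy is as nested maps. The sketch below uses plain Python dictionaries; every keyspace, column family, row, and column name in it is made up for illustration, and it says nothing about how Cassandra lays data out on disk.

```python
# Hypothetical data: column family -> row key -> super column -> columns.
# The outer dictionary stands in for one keyspace.
keyspace = {
    "UserProfiles": {                  # column family (name assumed)
        "alice": {                     # row key
            "home_address": {          # super column: only groups columns
                "city": "Pune",
                "zip": "411001",
            },
            "work_address": {
                "city": "Mumbai",
            },
        },
        "bob": {
            "home_address": {"city": "Delhi"},
        },
    }
}


def super_column_names(ks, column_family, row_key):
    """Return the super column names of one row in sorted order."""
    return sorted(ks[column_family][row_key])


def get_column(ks, column_family, row_key, super_column, column):
    """Walk the hierarchy down to a single column value."""
    return ks[column_family][row_key][super_column][column]
```

In this toy data the row "alice" has a "work_address" super column and the row "bob" does not, which mirrors the point that super column keys need not match across rows.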
Question:- Define the consistency levels for write operations in Cassandra.
Answer:- • ALL: Highly consistent. A write must be written to the commitlog and memtable on all replica nodes in the cluster. • EACH_QUORUM: A write must be written to the commitlog and memtable on a quorum of replica nodes in every data center. • LOCAL_QUORUM: A write must be written to the commitlog and memtable on a quorum of replica nodes in the local data center. • ONE: A write must be written to the commitlog and memtable of at least one replica node. • TWO, THREE: Same as ONE but with at least two and three replica nodes, respectively. • LOCAL_ONE: A write must be written to at least one replica node in the local data center. • ANY: A write must be written to at least one node; a stored hint counts if no replica is reachable. • SERIAL: Linearizable consistency for lightweight transactions, preventing unconditional updates. • LOCAL_SERIAL: Same as SERIAL but restricted to the local data center.
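The quorum-based levels above depend on a majority of replicas, which is commonly computed as floor(replication factor / 2) + 1. The following is a minimal sketch of that arithmetic for a single data center; the function names are illustrative and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed in one data center (toy model)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3, "LOCAL_ONE": 1}
    if level in fixed:
        needed = fixed[level]
    elif level == "LOCAL_QUORUM":
        needed = quorum(replication_factor)
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not covered by this sketch: {level}")
    if needed > replication_factor:
        raise ValueError("not enough replicas for this level")
    return needed
```

With a replication factor of 3, LOCAL_QUORUM needs 2 acknowledgements, so the write can still succeed while one replica is down, whereas ALL cannot.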
Question:- What is the difference between Column and Super Column?
Answer:- Both elements are tuples with a name and a value. However, a Column's value is a string, while a Super Column's value is a map of columns that may hold different data types. Unlike a Column, a Super Column has no third component, the timestamp.
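The difference can be sketched with two small Python data classes. This is only an illustration of the shapes described above, with hypothetical class and field names; it is not Cassandra's actual Thrift definition.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    name: str
    value: str
    timestamp: int  # third component; used to pick the newest write


@dataclass
class SuperColumn:
    name: str
    # The value is a map of columns; there is no timestamp at this level.
    columns: Dict[str, Column] = field(default_factory=dict)

    def add(self, column: Column) -> None:
        """Store a column under its own name."""
        self.columns[column.name] = column


# Usage: a super column grouping two ordinary columns.
address = SuperColumn(name="home_address")
address.add(Column(name="city", value="Pune", timestamp=1))
address.add(Column(name="zip", value="411001", timestamp=2))
```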
Question:- What is Column Family?
Answer:- As the name suggests, a column family is a structure that can hold an arbitrary number of rows. Each row is a set of key–value pairs, where the key is the name of the column and the value is the column data. It is similar to a HashMap in Java or a dictionary in Python. Remember, the rows are not limited to a predefined list of columns. The column family is fully flexible: one row may have 100 columns while another has only 2.
Question:- Define the use of the source command in Cassandra.
Answer:- Source command is used to execute a file consisting of CQL statements.
Question:- What is Thrift?
Answer:- Thrift is a legacy RPC protocol and API combined with a code generation tool; in Cassandra it has been superseded by CQL. The purpose of using Thrift in Cassandra is to provide access to the database from many programming languages.
Question:- Explain Tombstone in Cassandra.
Answer:- A tombstone is a marker written in a row to indicate that a column has been deleted. The marked columns are physically removed later, during compaction. Tombstones matter because Cassandra is eventually consistent: the deletion has to reach every replica before the data can be dropped, otherwise a replica that missed the delete could bring the old value back.
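The idea can be illustrated with a toy, single-replica model in Python. Everything here (the Replica class, its methods, and the grace parameter, which is loosely modeled on Cassandra's gc_grace_seconds setting) is a simplified assumption for illustration, not Cassandra's real storage engine.

```python
class Replica:
    """Toy replica: maps column name -> (value, timestamp, is_tombstone)."""

    def __init__(self):
        self.cells = {}

    def write(self, name, value, ts):
        self._apply(name, (value, ts, False))

    def delete(self, name, ts):
        # A delete writes a marker instead of removing the cell.
        self._apply(name, (None, ts, True))

    def _apply(self, name, cell):
        # Keep whichever cell carries the newer timestamp.
        current = self.cells.get(name)
        if current is None or cell[1] >= current[1]:
            self.cells[name] = cell

    def read(self, name):
        cell = self.cells.get(name)
        if cell is None or cell[2]:
            return None  # missing or tombstoned
        return cell[0]

    def compact(self, now, grace):
        """Drop tombstones that are at least `grace` time units old."""
        self.cells = {
            name: cell
            for name, cell in self.cells.items()
            if not (cell[2] and now - cell[1] >= grace)
        }
```

In this model a late-arriving older write cannot undo a delete, because the tombstone carries the newer timestamp; once compaction drops the tombstone, that protection is gone, which is why the marker is kept for a grace period.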
Question:- On what platforms does Cassandra run?
Answer:- Since Cassandra is a Java application, it can run on any platform that provides a Java Runtime Environment (JRE) or Java Virtual Machine (JVM). Cassandra also runs on Red Hat, CentOS, Debian, and Ubuntu Linux platforms.
Question:- What ports does Cassandra use?
Answer:- By default, Cassandra uses port 7000 for cluster management, 9160 for Thrift clients, and 8080 for JMX (7199 from Cassandra 0.8 onward). These are all TCP ports and can be edited in the configuration file: bin/cassandra.in.sh
Question:- Can you add or remove column families in a working cluster?
Answer:- Yes, but keep the following steps in mind: • Clear the commitlog with ‘nodetool drain’ • Shut down Cassandra and make sure that no data is left in the commitlog • Delete the SSTable files for the removed CFs
Question:- What is replication factor in Cassandra?
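Answer:- Replication factor is the number of copies of each piece of data kept across the cluster. When authentication is enabled, it is also important to increase the replication factor of the system_auth keyspace so that users can still log into the cluster when a node is down.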
Question:- Can we change the replication factor on a live cluster?
Answer:- Yes, but it will require running repair to alter the replica count of the existing data.
Question:- How to iterate all rows in a Column Family?
Answer:- Using get_range_slices. You can start iteration with an empty string, and after each iteration the last key read serves as the start key for the next iteration.
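The paging loop described above can be sketched in Python. The get_range_slices function below is a hypothetical in-memory stand-in with a simplified signature, not the real Thrift call, and it returns rows in sorted key order, which a real cluster does not guarantee. It assumes the start key is inclusive, so every page after the first begins with a row that has already been seen.

```python
from bisect import bisect_left

# Hypothetical rows standing in for a column family.
ROWS = {
    "k1": {"a": "1"},
    "k2": {"b": "2"},
    "k3": {"c": "3"},
    "k4": {"d": "4"},
    "k5": {"e": "5"},
}


def get_range_slices(start_key: str, count: int):
    """Stand-in: up to `count` (key, row) pairs with key >= start_key."""
    keys = sorted(ROWS)
    i = bisect_left(keys, start_key)
    return [(k, ROWS[k]) for k in keys[i:i + count]]


def iterate_all_rows(page_size: int = 3):
    """Yield every row once, paging from the empty start key."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = ""          # the empty string sorts before every key
    first_page = True
    while True:
        page = get_range_slices(start, page_size)
        # The start key is inclusive, so skip the repeated row on later pages.
        rows = page if first_page else page[1:]
        for key, row in rows:
            yield key, row
        if len(page) < page_size:
            break       # a short page means the range is exhausted
        start = page[-1][0]
        first_page = False
```

The guard on page_size is there because a page of size 1 would only ever return the start key again and the loop would never advance.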
Question:- Compare NoSQL & RDBMS
Answer:- • Data format: NoSQL does not follow any fixed order; RDBMS is organized and structured • Scalability: NoSQL is very good; RDBMS is average • Querying: NoSQL is limited, as there is no JOIN clause; RDBMS uses SQL • Storage mechanism: NoSQL uses key–value pair, document, column storage, etc.; RDBMS stores data and relationships in different tables
