Question:- Explain the CAP theorem.

Answer:- The CAP theorem matters whenever a system has to scale out across multiple machines, because it constrains the guarantees a distributed design can offer. It states that a distributed system such as Cassandra can provide at most two of three properties at the same time: consistency, availability, and partition tolerance. Consistency guarantees that a read returns the most recent write; availability guarantees a reasonable response within a reasonable time; and partition tolerance means the system continues to operate when network partitions occur. Because a distributed system cannot rule out network partitions, the two practical options are AP and CP.

Question:- State the differences between a node, a cluster, and a data center in Cassandra.

Answer:- Cassandra is organized at three levels. A node is a single machine running Cassandra. A cluster is the collection of all nodes that together store the data. A data center is a logical grouping of nodes within a cluster, which is useful when serving customers in different geographical areas; the nodes of one cluster can be grouped into several data centers.

Question:- How to write a query in Cassandra?

Answer:- Queries in Cassandra are written in CQL (Cassandra Query Language). The cqlsh shell is used to interact with the database.

Question:- What OS does Cassandra support?

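Answer:- Cassandra runs on the JVM and is supported mainly on Linux. Older releases (3.x and earlier) also ran on Windows, but Windows support was dropped in Cassandra 4.0.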

Question:- What is Cassandra Data Model?

Answer:- Cassandra data model consists of four main components:
• Cluster: Made up of multiple nodes and keyspaces
• Keyspace: A namespace to group multiple column families, typically one per application
• Column: Consists of a column name, value, and timestamp
• Column Family: Multiple columns referenced by a row key

Question:- What is CQL?

Answer:- CQL (Cassandra Query Language) is the language used to access and query the Apache Cassandra distributed database. A CQL parser processes each statement and delegates the implementation details to the server. The syntax of CQL is similar to SQL, but it does not alter the Cassandra data model.

Question:- Explain the concept of compaction in Cassandra.

Answer:- Compaction is a maintenance process in Cassandra in which SSTables are merged and reorganized to optimize the data structures on disk. It is needed because every memtable flush writes a new SSTable, so the number of SSTables keeps growing. There are two types of compaction in Cassandra.
• Minor compaction: It starts automatically when a new SSTable is created. Here, Cassandra condenses all the equally sized SSTables into one.
• Major compaction: It is triggered manually using nodetool. It compacts all SSTables of a column family into one.
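
The merge step at the heart of compaction can be sketched in Python. This is a simplified illustration, not Cassandra's implementation: the `SSTable` type alias and the `compact` function are hypothetical names, each SSTable is modelled as an in-memory dictionary, a value of `None` stands in for a tombstone, and tombstones are dropped immediately (real Cassandra waits for a grace period first).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: row key -> (value, write timestamp).
# A value of None stands in for a tombstone (deletion marker).
SSTable = Dict[str, Tuple[Optional[str], int]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several SSTables into one, keeping the newest version of each key."""
    merged: SSTable = {}
    for table in sstables:
        for key, (value, timestamp) in table.items():
            # Keep this entry only if it is newer than what we have so far.
            if key not in merged or timestamp > merged[key][1]:
                merged[key] = (value, timestamp)
    # Simplification: drop keys whose newest version is a tombstone.
    return {k: v for k, v in merged.items() if v[0] is not None}


older = {"k1": ("old", 1), "k2": ("x", 2)}
newer = {"k1": ("new", 5), "k2": (None, 7), "k3": ("y", 3)}
result = compact([older, newer])
```

Here `k1` keeps its newer value, `k2` disappears because its latest version is a deletion, and `k3` is carried over unchanged.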

Question:- Does Cassandra support ACID transactions?

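Answer:- Unlike relational databases, Cassandra does not support full ACID transactions with rollback or locking. It does offer atomic, isolated, and durable writes at the partition level, with tunable consistency.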

Question:- Explain Cqlsh.

Answer:- cqlsh stands for Cassandra Query Language Shell, the interactive CQL terminal. It is a Python-based command-line prompt used on Linux or Windows that executes CQL statements as well as shell commands such as ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE, and many others. With cqlsh, users can define a schema, insert data, and execute a query.

Question:- What is Super Column in Cassandra?

Answer:- A super column in Cassandra is a special element that groups related columns together. It is a key–value pair whose value is a sorted array of columns. When the data structure is written out as JSON, the hierarchy is: keyspace > column family > super column > column. Like row keys, super columns hold no values of their own and are used only to collect other columns. Note that super column keys appearing in different rows do not have to match.

Question:- Define the consistency levels for write operations in Cassandra.

Answer:-
• ALL: Highly consistent. A write must be written to a commitlog and a memtable on all replica nodes in the cluster.
• EACH_QUORUM: A write must be written to a commitlog and a memtable on a quorum of replica nodes in every data center.
• LOCAL_QUORUM: A write must be written to a commitlog and a memtable on a quorum of replica nodes in the same data center as the coordinator.
• ONE: A write must be written to a commitlog and a memtable of at least one replica node.
• TWO, THREE: Same as ONE but with at least two and three replica nodes, respectively.
• LOCAL_ONE: A write must be written to at least one replica node in the local data center.
• ANY: A write must be written to at least one node; a stored hint counts if no replica is reachable.
• SERIAL: Linearizable consistency, used with lightweight transactions to prevent unconditional updates.
• LOCAL_SERIAL: Same as SERIAL but restricted to the local data center.
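
The replica counts behind these levels can be sketched in Python. This is an illustrative calculation only, not Cassandra's code: `quorum`, `required_acks`, `rf_by_dc`, and `local_dc` are hypothetical names, and the sketch assumes the usual quorum size of floor(RF / 2) + 1, where RF is the replication factor. ANY, SERIAL, and LOCAL_SERIAL are deliberately left out because they are not simple replica counts.

```python
from typing import Dict


def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(level: str, rf_by_dc: Dict[str, int], local_dc: str) -> int:
    """Number of replica acknowledgements a write needs at the given level.

    rf_by_dc maps a data center name to its replication factor (assumed input).
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3, "LOCAL_ONE": 1}
    if level in fixed:
        return fixed[level]
    if level == "ALL":
        return sum(rf_by_dc.values())
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        # A quorum is needed in every data center; this returns the total.
        return sum(quorum(rf) for rf in rf_by_dc.values())
    raise ValueError(f"level not covered by this sketch: {level}")


# Example: two data centers with three replicas each.
rf = {"dc1": 3, "dc2": 3}
local = required_acks("LOCAL_QUORUM", rf, "dc1")
each = required_acks("EACH_QUORUM", rf, "dc1")
everything = required_acks("ALL", rf, "dc1")
```

With three replicas per data center, a local quorum needs two acknowledgements, which is why LOCAL_QUORUM tolerates one unavailable replica in the local data center.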

Question:- What is the difference between Column and Super Column?

Answer:- Both elements work on the principle of tuples having name and value. However, the former’s value is a string, while the value of the latter is a map of columns with different data types. Unlike Columns, Super Columns do not contain the third component of timestamp.
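
The contrast can be sketched with two small Python classes. These are hypothetical illustrations of the shapes described above, not types from any Cassandra client library; the field names and the sample data are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    """A name/value tuple that also carries a timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class SuperColumn:
    """A name whose value is a map of columns; it has no timestamp of its own."""
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)

    def add(self, column: Column) -> None:
        # Columns are stored under their own names.
        self.columns[column.name] = column


address = SuperColumn("address")
address.add(Column("city", "Pune", 1))
address.add(Column("zip", "411001", 2))
```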

Question:- What is Column Family?

Answer:- As the name suggests, a column family is a structure that groups columns, and it can hold an effectively unbounded number of rows. Each row is identified by a row key and stores its columns as key–value pairs, where the key is the name of the column and the value represents the column data. It is much like a hashmap in Java or a dictionary in Python. Remember, the rows are not limited to a predefined list of columns here. The column family is also flexible: one row can have 100 columns while another has only 2.
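
The dictionary analogy can be made concrete with a short Python sketch. The row keys, column names, and values below are made up for illustration, and a plain nested dictionary only approximates how Cassandra stores data.

```python
# Hypothetical column family: row key -> {column name: column value}.
users = {
    "user1": {"name": "Alice", "email": "alice@example.com", "city": "Pune"},
    "user2": {"name": "Bob"},  # this row simply has fewer columns
}

# Rows are not tied to a predefined column list, so a new column
# can be added to one row without touching the others.
users["user2"]["phone"] = "555-0100"

column_counts = {row_key: len(columns) for row_key, columns in users.items()}
```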

Question:- Define the use of the source command in Cassandra.

Answer:- The SOURCE command in cqlsh executes a file containing CQL statements.