What is Column Family?

Question:- Define the use of the source command in Cassandra.

Answer:- The SOURCE command is used in cqlsh to execute a file containing CQL statements.

Question:- What is the use of Thrift in Cassandra?

Answer:- Thrift is a legacy RPC protocol (API) combined with a code generation tool. Cassandra used Thrift to make the database accessible from many programming languages. It has since been superseded by CQL.

Question:- Explain Tombstone in Cassandra.

Answer:- A tombstone is a marker written in place of deleted data, for example when a column is deleted. The marked data is physically removed later, during compaction. Tombstones matter because Cassandra is eventually consistent: the delete must reach every replica before the data is purged, otherwise a replica that missed the delete could bring the data back.
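The tombstone lifecycle can be sketched as a toy, single-node model. This is only an illustration, not Cassandra's storage engine: the ToyTable class and its methods are hypothetical, and the grace period is an assumption loosely modelled on the gc_grace_seconds table option.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # assumed grace period for this sketch


class ToyTable:
    """A toy single-node store; not Cassandra's real storage engine."""

    def __init__(self):
        # column name -> (value, write_timestamp, is_tombstone)
        self.cells = {}

    def write(self, column, value, now):
        self.cells[column] = (value, now, False)

    def delete(self, column, now):
        # A delete does not remove data; it writes a tombstone marker.
        self.cells[column] = (None, now, True)

    def read(self, column):
        cell = self.cells.get(column)
        if cell is None or cell[2]:
            return None  # missing, or shadowed by a tombstone
        return cell[0]

    def compact(self, now):
        # Compaction purges only tombstones older than the grace period.
        self.cells = {
            col: cell
            for col, cell in self.cells.items()
            if not (cell[2] and now - cell[1] >= GC_GRACE_SECONDS)
        }


table = ToyTable()
table.write("email", "a@example.com", now=0)
table.delete("email", now=100)
table.compact(now=200)                     # too early: tombstone is kept
table.compact(now=100 + GC_GRACE_SECONDS)  # grace period over: purged
```

The grace period exists so that every replica has a chance to learn about the delete before the marker disappears.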

Question:- On what platforms does Cassandra run?

Answer:- Since Cassandra is a Java application, it can run on any platform that provides a Java Runtime Environment (JRE) and Java Virtual Machine (JVM). This includes the Red Hat, CentOS, Debian, and Ubuntu Linux platforms.

Question:- What ports does Cassandra use?

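Answer:- By default, Cassandra uses port 7000 for cluster communication, 9160 for Thrift clients, and 8080 for JMX. These are all TCP ports. The cluster and Thrift ports can be changed in the configuration file, and the JMX port in bin/cassandra.in.sh. Note that these defaults apply to early releases; later versions use 7199 for JMX and 9042 for CQL native clients.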

Question:- Can you add or remove column families in a working cluster?

Answer:- Yes, but the following steps must be carried out carefully:
• Empty the commitlog with ‘nodetool drain’
• Shut down Cassandra and verify that no data is left in the commitlog
• Delete the SSTable files for the removed CFs

Question:- What is replication factor in Cassandra?

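Answer:- Replication factor is the number of copies of each piece of data stored across the cluster. For example, a replication factor of 3 means every row is stored on three nodes. When authentication is enabled, it is important to increase the replication factor of the system_auth keyspace so that users can still log into the cluster when a node is down.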

Question:- Can we change the replication factor on a live cluster?

Answer:- Yes, but it will require running repair to alter the replica count of the existing data.
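A minimal sketch of why repair is needed after raising the replication factor. It assumes a toy ring where replicas are the first RF nodes found walking clockwise from the key's token. The hash function, node names, and helper functions are illustrative assumptions, not Cassandra's actual partitioner or placement strategy.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on a toy ring (not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def replicas(key: str, nodes: list[str], rf: int) -> list[str]:
    """Return the rf distinct nodes found walking clockwise from the key's token."""
    ring = sorted(nodes, key=token)
    tokens = [token(n) for n in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
before = replicas("user:42", nodes, rf=2)
after = replicas("user:42", nodes, rf=3)

# Nodes that become replicas only after the change hold no copy yet;
# repair is what streams the existing data to them.
needs_repair = [n for n in after if n not in before]
print(before, after, needs_repair)
```

In this model, changing the replication factor only changes which nodes are responsible for a key. The newly responsible node starts out empty, which is the gap that repair closes.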

Question:- How to iterate all rows in a Column Family?

Answer:- Using get_range_slices. You can start iteration with an empty string, and after each iteration the last key read serves as the start key for the next iteration.
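The paging pattern above can be sketched against an in-memory stand-in. The get_range_slices function below is a hypothetical simplification of the Thrift call, not its real signature. It assumes the start key is inclusive, so the last key of one page reappears as the first row of the next and must be skipped.

```python
from typing import Iterator

# Toy data standing in for a column family. Keys are kept in sorted order here,
# whereas a real cluster returns rows in partitioner order.
ROWS = {f"key{i:02d}": {"n": i} for i in range(7)}


def get_range_slices(start_key: str, count: int) -> list[tuple[str, dict]]:
    """Hypothetical stand-in for the Thrift call: start_key is inclusive,
    and an empty string means 'from the beginning'."""
    keys = sorted(k for k in ROWS if k >= start_key)
    return [(k, ROWS[k]) for k in keys[:count]]


def iterate_all_rows(page_size: int = 3) -> Iterator[tuple[str, dict]]:
    if page_size < 2:
        # With an inclusive start key, a page of one row never makes progress.
        raise ValueError("page_size must be at least 2")
    start_key = ""
    first_page = True
    while True:
        page = get_range_slices(start_key, page_size)
        # After the first page, the first row repeats the previous last key.
        rows = page if first_page else page[1:]
        for row in rows:
            yield row
        if len(page) < page_size:
            break  # short page: nothing left to read
        start_key = page[-1][0]
        first_page = False


all_keys = [key for key, _ in iterate_all_rows(page_size=3)]
```

Each page after the first yields one row fewer than the page size, because the overlapping row is dropped.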

Question:- Compare NoSQL & RDBMS

Answer:-
• Data format: NoSQL does not follow any order; RDBMS is organized and structured
• Scalability: NoSQL is very good; RDBMS is average
• Querying: NoSQL is limited, as there is no join clause; RDBMS uses SQL
• Storage: NoSQL uses key-value pairs, documents, column storage, etc.; RDBMS stores data and relationships in different tables

Question:- What is NoSQL?

Answer:- NoSQL encompasses a wide variety of database technologies that were developed in response to a rise in the volume of data stored about users, objects, and products, the frequency with which this data is accessed, and performance and processing needs. Relational databases, on the other hand, were not designed to cope with the scale and agility challenges that face modern applications, nor were they built to take advantage of the cheap storage and processing power available today.

Question:- What are the features of NoSQL?

Answer:- When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:
• Large volumes of structured, semi-structured, and unstructured data
• Agile sprints, quick iteration, and frequent code pushes
• Object-oriented programming that is easy to use and flexible
• Efficient, scale-out architecture instead of expensive, monolithic architecture

Question:- Explain the difference between NoSQL v/s Relational database?

Answer:- The history seems to look like this: Google needs a storage layer for its inverted search index and figures a traditional RDBMS is not going to cut it. So it implements a NoSQL data store, BigTable, on top of its GFS file system. The key point is that thousands of cheap commodity machines provide the speed and the redundancy. Everyone else realizes what Google just did.

Brewer's CAP theorem is proven. All RDBMS systems in use are CA systems, and people begin experimenting with CP and AP systems as well. K/V stores are vastly simpler, so they are the primary vehicle for the research. Software-as-a-service systems in general do not provide an SQL-like store, so people get more interested in NoSQL-type stores.

I think much of the take-off can be traced to this history. Scaling Google took some new ideas at Google, and everyone else follows suit because it is the only solution they know to the scaling problem right now. Hence, people are willing to rework everything around Google's distributed database idea, because it is the only way to scale beyond a certain size.

Question:- Explain “Polyglot Persistence” in NoSQL?

Answer:- In 2006, Neal Ford coined the term polyglot programming to express the idea that applications should be written in a mix of languages, because different languages are suitable for tackling different problems. Complex applications combine different types of problems, so picking the right language for each job may be more productive than trying to fit all aspects into a single language. Similarly, when working on an e-commerce business problem, it is important that the data store for the shopping cart is highly available and can scale, but the same data store cannot help you find products bought by the customers’ friends, which is a totally different question. We use the term polyglot persistence to define this hybrid approach to persistence.
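The e-commerce example can be sketched with two toy in-memory stores, each shaped for one kind of question. CartStore and SocialGraph are hypothetical classes written for this illustration. In a real system they would be backed by, say, a key-value database and a graph database respectively.

```python
class CartStore:
    """Toy key-value store: fast lookups of a cart by customer id."""

    def __init__(self):
        self._carts: dict[str, list[str]] = {}

    def add_item(self, customer: str, product: str) -> None:
        self._carts.setdefault(customer, []).append(product)

    def get_cart(self, customer: str) -> list[str]:
        return list(self._carts.get(customer, []))


class SocialGraph:
    """Toy graph store: answers relationship questions."""

    def __init__(self):
        self._friends: dict[str, set[str]] = {}
        self._purchases: dict[str, set[str]] = {}

    def add_friendship(self, a: str, b: str) -> None:
        self._friends.setdefault(a, set()).add(b)
        self._friends.setdefault(b, set()).add(a)

    def record_purchase(self, customer: str, product: str) -> None:
        self._purchases.setdefault(customer, set()).add(product)

    def bought_by_friends(self, customer: str) -> set[str]:
        products: set[str] = set()
        for friend in self._friends.get(customer, set()):
            products |= self._purchases.get(friend, set())
        return products


carts = CartStore()
graph = SocialGraph()
carts.add_item("alice", "laptop")
graph.add_friendship("alice", "bob")
graph.record_purchase("bob", "headphones")
```

The cart lookup is a single key access, while the friends query has to traverse relationships, which is why the two are kept in differently shaped stores.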