Question:- Explain Tombstone in Cassandra.
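Answer:- A tombstone is a marker written in place of data when a row or column is deleted. The marked data is not removed immediately; it is purged during compaction, once the tombstone is older than the configured grace period (gc_grace_seconds). Tombstones matter because Cassandra is eventually consistent: a replica that was unavailable when the delete was issued must still learn about it, otherwise the deleted data could reappear.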
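The idea can be illustrated with a toy, single-node model. This is only a sketch and not Cassandra's storage engine: the ToyTable class and its methods are hypothetical, and timestamps are passed in by hand so the behaviour is easy to follow.

```python
class ToyTable:
    """Toy model of deletes via tombstones (hypothetical, for illustration only)."""

    def __init__(self, gc_grace_seconds):
        self.gc_grace_seconds = gc_grace_seconds
        # key -> (value, write_timestamp, is_tombstone)
        self.cells = {}

    def write(self, key, value, now):
        self.cells[key] = (value, now, False)

    def delete(self, key, now):
        # A delete is itself a write: it records a tombstone instead of removing data.
        self.cells[key] = (None, now, True)

    def read(self, key):
        cell = self.cells.get(key)
        if cell is None or cell[2]:
            return None  # missing, or shadowed by a tombstone
        return cell[0]

    def compact(self, now):
        # Purge only tombstones that are at least gc_grace_seconds old.
        self.cells = {
            key: cell
            for key, cell in self.cells.items()
            if not (cell[2] and now - cell[1] >= self.gc_grace_seconds)
        }


table = ToyTable(gc_grace_seconds=10)
table.write("user:1", "alice", now=0)
table.delete("user:1", now=5)
table.compact(now=8)    # tombstone kept: it is only 3 seconds old
table.compact(now=20)   # tombstone purged: older than the grace period
```

Keeping the tombstone for a grace period gives other replicas time to receive the delete before the marker disappears.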
Question:- On what platforms does Cassandra run?
Answer:- Since Cassandra is a Java application, it can successfully run on any Java-driven platform or on Java Runtime Environment (JRE) or Java Virtual Machine (JVM). Cassandra also runs on Red Hat, CentOS, Debian, and Ubuntu Linux platforms.
Question:- What ports does Cassandra use?
Answer:- By default, Cassandra uses port 7000 for cluster communication, 9160 for Thrift clients, and 8080 for JMX (newer releases default to 7199 for JMX). These are all TCP ports. They can be edited in the configuration file or, for JVM options such as the JMX port, in bin/cassandra.in.sh.
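A quick way to see whether these ports are reachable on a node is a plain TCP connection test. The sketch below uses only Python's standard socket module; the DEFAULT_PORTS mapping and the function names are assumptions made for illustration, and the port numbers are the defaults quoted above, which may differ on your installation.

```python
import socket

# Default ports as quoted in the answer above (assumed; adjust for your release).
DEFAULT_PORTS = {"cluster": 7000, "thrift": 9160, "jmx": 8080}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def check_node(host, ports=DEFAULT_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling check_node("127.0.0.1") returns a dictionary such as {"cluster": ..., "thrift": ..., "jmx": ...} with a boolean per service. An open port only shows that something is listening, not that it is Cassandra.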
Question:- Can you add or remove column families in a working cluster?
Answer:- Yes, but the following steps must be carried out correctly:
• Empty the commitlog with ‘nodetool drain’
• Shut down Cassandra and verify that no data is left in the commitlog
• Delete the SSTable files for the removed CFs
Question:- What is replication factor in Cassandra?
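Answer:- The replication factor is the number of copies of each piece of data stored across the nodes of the cluster. A replication factor of 1 means only one copy exists, so losing that node makes the data unavailable. For example, when authentication is enabled, the replication factor of the system_auth keyspace should be increased so that users can still log into the cluster when a node is down.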
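To make the notion of "number of copies" concrete, here is a simplified sketch of replica placement on a token ring: the first replica is the node that owns the key's token, and the remaining copies go to the next nodes clockwise. This is an assumption-laden toy (one token per node, no racks or datacenters), not Cassandra's actual placement code; token_for, replicas_for, and the node names are hypothetical.

```python
import bisect
import hashlib


def token_for(key, ring_size=2**32):
    """Hash a row key onto the ring (MD5 is used here only as a stable hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % ring_size


def replicas_for(key, ring, replication_factor):
    """Pick replication_factor nodes for key; ring is a token-sorted list of (token, node)."""
    if replication_factor > len(ring):
        raise ValueError("replication factor exceeds number of nodes")
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


# Hypothetical four-node ring with evenly spaced tokens.
ring = [(0, "node-a"), (2**30, "node-b"), (2**31, "node-c"), (3 * 2**30, "node-d")]
print(replicas_for("user:42", ring, replication_factor=3))
```

With a replication factor of 3 on this four-node ring, every key lives on three distinct nodes, so any single node can fail without making the key unavailable.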
Question:- Can we change the replication factor on a live cluster?
Answer:- Yes, but it will require running repair to alter the replica count of the existing data.
Question:- How to iterate all rows in a Column Family?
Answer:- Using get_range_slices. You can start iteration with an empty string, and after each iteration the last key read serves as the start key for the next iteration.
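The paging pattern described above can be sketched as follows. The get_range_slices function here is a hypothetical in-memory stand-in, not the Thrift client call, and it returns keys in sorted order for simplicity (a real cluster using the random partitioner returns rows in token order). Because the start key is inclusive, each page after the first repeats the last row of the previous page, so the sketch skips it.

```python
def get_range_slices(store, start_key, count):
    """Hypothetical stand-in: up to `count` (key, row) pairs with key >= start_key."""
    keys = sorted(k for k in store if k >= start_key)
    return [(k, store[k]) for k in keys[:count]]


def iterate_all_rows(store, page_size=3):
    """Yield every row once, using the last key read as the next start key."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 because pages overlap by one row")
    start_key = ""  # an empty start key means "from the beginning"
    first_page = True
    while True:
        page = get_range_slices(store, start_key, page_size)
        # The start key is inclusive, so drop the repeated first row on later pages.
        rows = page if first_page else page[1:]
        for key, row in rows:
            yield key, row
        if len(page) < page_size:
            return  # a short page means the end of the range was reached
        start_key = page[-1][0]
        first_page = False


store = {name: {"len": len(name)} for name in ["a", "b", "c", "d", "e", "f", "g"]}
all_keys = [key for key, _ in iterate_all_rows(store)]
```

The page size must be at least 2, otherwise the one-row overlap would prevent the iteration from ever advancing.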
Question:- Compare NoSQL & RDBMS
Answer:-
• Data format: NoSQL does not follow any order; RDBMS is organized and structured
• Scalability: NoSQL is very good; RDBMS is average
• Querying: NoSQL is limited, as there is no join clause; RDBMS uses SQL
• Storage mechanism: NoSQL uses key-value pair, document, column storage, etc.; RDBMS stores data and relationships in different tables
Question:- What is NoSQL?
Answer:- NoSQL encompasses a wide variety of database technologies that were developed in response to a rise in the volume of data stored about users, objects and products, the frequency with which this data is accessed, and growing performance and processing needs. Relational databases, on the other hand, were not designed to cope with the scale and agility challenges that face modern applications, nor were they built to take advantage of the cheap storage and processing power available today.
Question:- What are the features of NoSQL?
Answer:- When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:
• Large volumes of structured, semi-structured, and unstructured data
• Agile sprints, quick iteration, and frequent code pushes
• Object-oriented programming that is easy to use and flexible
• Efficient, scale-out architecture instead of expensive, monolithic architecture
Question:- Explain the difference between NoSQL and relational databases.
Answer:- The history seems to look like this: Google needs a storage layer for its inverted search index. They figure a traditional RDBMS is not going to cut it, so they implement a NoSQL data store, BigTable, on top of their GFS file system. The key point is that thousands of cheap commodity machines provide the speed and the redundancy. Everyone else realizes what Google just did. Brewer's CAP theorem is proven. All RDBMS systems in use are CA systems. People begin playing with CP and AP systems as well. K/V stores are vastly simpler, so they are the primary vehicle for the research. Software-as-a-service systems in general do not provide an SQL-like store, so people get more interested in NoSQL-type stores. Much of the take-off can be traced to this history. Scaling Google took some new ideas at Google, and everyone else follows suit because this is the only solution they know to the scaling problem right now. Hence, people are willing to rework everything around Google's distributed database idea because it is the only way they know to scale beyond a certain size.
Question:- Explain “Polyglot Persistence” in NoSQL?
Answer:- In 2006, Neal Ford coined the term polyglot programming to express the idea that applications should be written in a mix of languages, because different languages are suitable for tackling different problems. Complex applications combine different types of problems, so picking the right language for each job may be more productive than trying to fit all aspects into a single language. Similarly, when working on an e-commerce business problem, it is important to use a highly available, scalable data store for the shopping cart, but the same data store cannot help you find products bought by the customers’ friends, which is a totally different question. We use the term polyglot persistence to define this hybrid approach to persistence. These are described in NoSQL’s online reference guide and on NoSQL community.
Question:- How does NoSQL DB budget memory?
Answer:- The Replication Node manages the data in a NoSQL DB store and is the main consumer of memory. The Java heap and cache size used by the Replication Node can be important performance factors. By default, the Replication Node heap and cache are calculated by NoSQL DB based on the amount of memory available to the Storage Node. We recommend that you specify the available memory for a Storage Node using the -memory_mb flag for makebootconfig, or the memory_mb Storage Node parameter. If you do not define memory_mb, it will default to the memory available on the node. NoSQL DB will then use 85% of memory_mb as the heap for the Replication Node processes hosted by that Storage Node. If the Storage Node hosts more than one Replication Node, the memory will be divided evenly between all RNs. If the number of Replication Nodes on a Storage Node changes, the per-RN memory will be recalculated dynamically. The percentage used for heap is controlled by the rnHeapPercent Storage Node parameter. You can choose to override the default value of 85%. Each Replication Node uses a cache, and the size of that cache defaults to 70% of the Replication Node heap. You can override the 70% default by setting the rnCachePercent Replication Node parameter. The Replication Node heap can also be specified directly by setting -Xmx in the Replication Node javaMiscParams parameter. Likewise, the Replication Node cache can be set directly with the cacheSize Replication Node parameter. While that is possible, it is advisable to use the Storage Node memory_mb setting instead. As an example, suppose you specify that a Storage Node may use 3000 MB of memory, by setting memory_mb to 3000. If that Storage Node hosts two Replication Nodes, the heap for each RN will be (3000 * .85)/2 = 1275 MB. Each RN cache will be (1275 * .70) = 892 MB (892.5, rounded down).
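The arithmetic in the example above can be checked with a few lines of code. This is only a sketch of the calculation as described in the answer; the function name and argument names are hypothetical and are not part of NoSQL DB, and the real product may round differently.

```python
def rn_memory_budget(memory_mb, num_replication_nodes,
                     rn_heap_percent=85, rn_cache_percent=70):
    """Return (heap_mb, cache_mb) per Replication Node, per the rules in the text."""
    # The heap share of the Storage Node memory is split evenly across its RNs.
    heap_mb = memory_mb * rn_heap_percent / 100 / num_replication_nodes
    # The cache is a percentage of each RN's heap.
    cache_mb = heap_mb * rn_cache_percent / 100
    return heap_mb, cache_mb


heap_mb, cache_mb = rn_memory_budget(3000, 2)
print(heap_mb, cache_mb)  # prints "1275.0 892.5"
```

Changing num_replication_nodes shows how the per-RN heap shrinks as more Replication Nodes share the same Storage Node.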
Question:- How to script NoSQL DB configuration?
Answer:- You may find that you want to build the same NoSQL DB configuration repeatedly for testing purposes. The Admin CLI commands can be scripted in several ways. Many uses of the Admin CLI are simple commands, such as java -jar kvstore.jar makebootconfig to initially configure a StorageNode, shown above. These are as amenable to scripting as any other UNIX commands and will not be discussed further here. The interactive commands available in java -jar kvstore.jar runadmin, among which are those used to create and execute plans, can be scripted in two ways. You can create a file containing the sequence of commands that you want to run, and run them in a batch using java -jar kvstore.jar runadmin load -file
