A list of the most frequently asked Cassandra interview questions and answers is given below.
1) What is Cassandra?
Cassandra is an open-source, distributed NoSQL database system. It is designed to store and manage large volumes of data across many nodes with no single point of failure.
2) In which language is Cassandra written?
Cassandra is written in Java. It was originally developed at Facebook, has a flexible schema, and is highly scalable for big data.
3) What are the benefits/advantages of Cassandra?
Advantages/benefits of Cassandra:
- Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure.
- Cassandra delivers near real-time performance, simplifying the work of developers, administrators, data analysts, and software engineers.
- It provides elastic scalability: a cluster can be scaled out or in by adding or removing nodes as requirements change.
- It is fault tolerant and offers tunable consistency.
- It is a wide-column (column-family) database.
- It has no single point of failure.
- There is no need for a separate caching layer.
- It has a flexible schema design.
- It has flexible data storage, easy data distribution, and fast writes.
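- It does not provide full ACID (Atomicity, Consistency, Isolation, and Durability) transactions, but writes are atomic and isolated within a partition and are made durable by the commit log.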
- It supports multiple data centers and is cloud capable.
4) How does Cassandra store data?
Cassandra stores all data as bytes. When you specify a validator, Cassandra ensures that those bytes are encoded as required, and a comparator then orders the columns according to the ordering specific to that encoding.
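To see why the comparator has to understand the encoding, the sketch below stores column names as raw 4-byte big-endian signed integers (an assumption modelled on an Int32-style type). The functions `encode_int32`, `validate_int32`, and `int32_key` are hypothetical stand-ins for a validator and a comparator, not Cassandra code. Sorting the raw bytes puts the negative number last, while the type-aware comparator orders the values numerically.

```python
import struct


def encode_int32(n: int) -> bytes:
    """Encode an integer as 4 big-endian bytes (signed)."""
    return struct.pack(">i", n)


def validate_int32(raw: bytes) -> bytes:
    """Hypothetical validator: reject bytes that are not a 4-byte int."""
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    return raw


def int32_key(raw: bytes) -> int:
    """Hypothetical comparator key: decode the bytes before comparing."""
    return struct.unpack(">i", raw)[0]


column_names = [validate_int32(encode_int32(n)) for n in (5, -3, 42, 0)]

# Plain byte order: -3 (0xFFFFFFFD) sorts after every non-negative value.
by_bytes = [int32_key(b) for b in sorted(column_names)]
# Encoding-aware order: the comparator decodes the integers first.
by_value = [int32_key(b) for b in sorted(column_names, key=int32_key)]

print(by_bytes)  # [0, 5, 42, -3]
print(by_value)  # [-3, 0, 5, 42]
```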
5) What are the main components of the Cassandra data model?
Following are the main components of Cassandra data model:
- Cluster
- Keyspace
- Column
- Column family
6) What are the other components of Cassandra?
Some other components of Cassandra are:
- Data Center
- Commit log
- Bloom Filter
7) What is a keyspace in Cassandra?
In Cassandra, a keyspace is a namespace that defines how data is replicated on nodes. A cluster typically contains one keyspace per application.
8) What is the syntax to create keyspace in Cassandra?
- CREATE KEYSPACE <identifier> WITH <properties>
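As a concrete instance of that syntax, the helper below assembles a statement that uses `SimpleStrategy` replication and the `durable_writes` property. The function name `create_keyspace_cql` and the keyspace name `shop` are hypothetical; the helper only formats text, and it neither connects to a cluster nor checks that the name is a valid identifier.

```python
def create_keyspace_cql(name: str, replication_factor: int,
                        durable_writes: bool = True) -> str:
    """Build a CREATE KEYSPACE statement (hypothetical helper, text only)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # The <properties> part: a replication map plus durable_writes.
    replication = (
        "{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}}"
    )
    return (
        f"CREATE KEYSPACE {name} "
        f"WITH replication = {replication} "
        f"AND durable_writes = {str(durable_writes).lower()};"
    )


print(create_keyspace_cql("shop", 3))
# CREATE KEYSPACE shop WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3} AND durable_writes = true;
```

The resulting string would then be run in cqlsh or through a driver.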
9) What is a column family in Cassandra?
In Cassandra, a collection of rows is referred to as a “column family” (in CQL, a column family is called a table).
10) How does Cassandra perform a write?
Cassandra performs a write in two steps:
- First, the write is appended to the commit log on disk; then it is written to an in-memory structure known as the memtable.
- When both steps complete successfully, the write is considered successful.
- When the memtable is full, its contents are flushed to disk as an SSTable (sorted string table).
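The toy model below walks through that sequence under simplifying assumptions: the commit log is a Python list standing in for an append-only file, a memtable counts as full after a hypothetical `flush_threshold` of three keys, and the whole log is cleared on flush (real Cassandra discards commit log segments only once the data in them has been flushed). It shows why the commit log comes first: after a crash, replaying it rebuilds any memtable contents that had not yet reached an SSTable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy write path: commit log first, then memtable, then SSTable flush."""
    commit_log: list = field(default_factory=list)  # stands in for the on-disk log
    memtable: dict = field(default_factory=dict)    # in memory, lost on a crash
    sstables: list = field(default_factory=list)    # flushed, sorted, immutable
    flush_threshold: int = 3                        # hypothetical "full" size

    def write(self, key, value) -> str:
        self.commit_log.append((key, value))  # step 1: durable append
        self.memtable[key] = value            # step 2: in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ack"                          # both steps succeeded

    def flush(self) -> None:
        # Write the memtable out as a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()  # simplification: flushed data needs no replay

    def crash_and_recover(self) -> None:
        self.memtable = {}                    # memory contents are lost
        for key, value in self.commit_log:    # replay the log to rebuild them
            self.memtable[key] = value


node = Node()
for k, v in [("b", 1), ("a", 2), ("c", 3), ("d", 4)]:
    node.write(k, v)
node.crash_and_recover()
print(node.sstables)  # [(('a', 2), ('b', 1), ('c', 3))]
print(node.memtable)  # {'d': 4}
```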
11) What is a memtable?
A memtable is an in-memory, write-back cache that holds data in key/column format. Data in a memtable is sorted by key, and each column family has its own memtable, from which column data is retrieved by key. The memtable accumulates writes until it is full and is then flushed to disk.
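A minimal sketch of those properties follows. The `Memtable` class, its row-count `capacity`, and the column-family names `users` and `orders` are illustrative assumptions (real memtables are limited by memory size, not by a row count). Keys stay sorted via `bisect.insort`, each column family gets its own instance, and filling it triggers a flush that emits rows in key order.

```python
from __future__ import annotations

import bisect


class Memtable:
    """Toy memtable: rows kept sorted by key, flushed when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity           # assumed limit, counted in rows
        self.keys: list[str] = []          # row keys, always sorted
        self.rows: dict[str, dict] = {}    # row key -> {column: value}

    def put(self, key: str, column: str, value) -> list | None:
        """Write one column; return the flushed rows if this fills the table."""
        if key not in self.rows:
            bisect.insort(self.keys, key)
            self.rows[key] = {}
        self.rows[key][column] = value
        if len(self.keys) >= self.capacity:
            return self.flush()
        return None

    def get(self, key: str) -> dict | None:
        """Retrieve a row's column data by key."""
        return self.rows.get(key)

    def flush(self) -> list:
        """Emit rows in key order, as they would be written to an SSTable."""
        flushed = [(k, self.rows[k]) for k in self.keys]
        self.keys, self.rows = [], {}
        return flushed


# One memtable per column family.
memtables = {"users": Memtable(capacity=3), "orders": Memtable(capacity=3)}

users = memtables["users"]
users.put("carol", "age", 41)
users.put("alice", "age", 30)
print(users.get("alice"))            # {'age': 30}
print(users.put("bob", "age", 25))
# [('alice', {'age': 30}), ('bob', {'age': 25}), ('carol', {'age': 41})]
```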
12) What are the management tools in Cassandra?
DataStax OpsCenter: a web-based management and monitoring solution for Cassandra clusters, developed by DataStax. It is free to download, and an additional edition of OpsCenter is also available.
SPM: SPM primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, it also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other big data platforms.
13) What are the main features of SPM in Cassandra?
The main features of SPM are:
- Correlation of events and metrics
- Distributed transaction tracing
- Creating real-time graphs with zooming
- Anomaly detection and heartbeat alerting
14) What is a cluster in Cassandra?
In Cassandra, the cluster is the outermost container for keyspaces. It arranges its nodes in a ring and assigns data to them. Data is replicated across nodes, so if a node fails, another replica can take over serving its data.
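Ring placement and replication can be sketched as consistent hashing. Everything here is an assumption for illustration: the three nodes `A`, `B`, `C` with hand-picked tokens, MD5 as the hash (Cassandra's default partitioner uses Murmur3), and a placement rule in the spirit of `SimpleStrategy`, where the first replica goes on the first node at or after the key's token and further replicas on the next nodes clockwise. If the first replica fails, the remaining nodes in the returned list still hold the data.

```python
from __future__ import annotations

import bisect
import hashlib

RING_SIZE = 2**64  # token space used by this sketch


def token_for(key: str) -> int:
    """Map a partition key to a token (MD5 is a stdlib stand-in for Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, node_tokens: dict[str, int]):
        # Arrange nodes around the ring in token order.
        self.entries = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def replicas_for_token(self, tok: int, rf: int) -> list[str]:
        """Owner is the first node whose token >= tok (wrapping around);
        the next rf - 1 nodes clockwise hold the other replicas."""
        start = bisect.bisect_left(self.tokens, tok) % len(self.entries)
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]

    def replicas(self, key: str, rf: int) -> list[str]:
        return self.replicas_for_token(token_for(key), rf)


ring = Ring({"A": 0, "B": RING_SIZE // 3, "C": 2 * RING_SIZE // 3})
print(ring.replicas("user:42", rf=2))  # two distinct neighbouring nodes
```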
15) When can you use ALTER KEYSPACE?
ALTER KEYSPACE can be used to change keyspace properties such as the replication settings (for example, the number of replicas) and durable_writes.
16) What is Cassandra-Cqlsh?
cqlsh is the CQL (Cassandra Query Language) shell, a command-line tool used to communicate with a Cassandra database. With cqlsh you can:
- Define a schema
- Insert data
- Execute queries
17) What are the differences between a node, a cluster, and datacenter in Cassandra?
Node: A node is a single machine running Cassandra.
Cluster: A cluster is the complete set of nodes that together store the database's data.
Datacenter: A datacenter is a group of related nodes within a cluster. Grouping the nodes of a cluster into different datacenters is useful when serving customers in different geographical areas.
18) What is a CQL collection in Cassandra?
A CQL collection stores multiple values in a single column. CQL provides the following collection types:
- LIST: A list is used when the order of elements must be maintained and a value may be stored multiple times (it can hold duplicate elements).
- SET: A set stores a group of unique elements and returns them in sorted order (duplicates are not kept).
- MAP: A map stores key-value pairs of elements.
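The behaviour of the three collection types can be mimicked with Python built-ins as a rough model; this is not a Cassandra client. The CQL type names in the comments (`list<text>`, `set<text>`, `map<text, int>`) are standard CQL, while the values are made up for the example.

```python
# list<text>: keeps insertion order and allows duplicates.
colors_list = []
for c in ["red", "blue", "red"]:
    colors_list.append(c)

# set<text>: keeps only unique values; CQL returns them sorted.
colors_set = set()
for c in ["red", "blue", "red"]:
    colors_set.add(c)
colors_set_returned = sorted(colors_set)

# map<text, int>: unique keys; writing an existing key replaces its value,
# and CQL returns the entries sorted by key.
prices = {}
for item, price in [("pen", 2), ("ink", 5), ("pen", 3)]:
    prices[item] = price
prices_returned = dict(sorted(prices.items()))

print(colors_list)          # ['red', 'blue', 'red']
print(colors_set_returned)  # ['blue', 'red']
print(prices_returned)      # {'ink': 5, 'pen': 3}
```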
19) What is the use of Bloom Filter in Cassandra?
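A Bloom filter is a space-efficient, probabilistic data structure used to test whether an SSTable might contain data for a particular row. It can return false positives but never false negatives, so Cassandra uses it to skip SSTables that definitely do not contain the requested key, saving disk I/O during key lookups.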
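A minimal Bloom filter sketch follows. The bit-array size `m`, hash count `k`, and SHA-256 double hashing are assumptions chosen for a stdlib-only example; Cassandra sizes its filters from a configurable false-positive chance and uses its own hashing. The hypothetical `sstables_to_read` helper shows where the I/O saving comes from: a lookup only touches SSTables whose filter does not rule the key out.

```python
from __future__ import annotations

import hashlib


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key in an m-bit array."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key: str) -> list[int]:
        # Double hashing: derive k positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False: the key is definitely absent, so the SSTable can be skipped.
        # True: the key may be present; false positives are possible.
        return all(self.bits[pos] for pos in self._positions(key))


def sstables_to_read(key: str, filters: dict[str, BloomFilter]) -> list[str]:
    """Return only the SSTables whose filter does not rule the key out."""
    return [name for name, f in filters.items() if f.might_contain(key)]


filters = {"sstable-1": BloomFilter(), "sstable-2": BloomFilter()}
filters["sstable-1"].add("alice")
filters["sstable-2"].add("bob")
print(sstables_to_read("alice", filters))
```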
20) How does Cassandra delete data?
SSTables are immutable, so a row cannot be removed from an SSTable in place. Instead, when data is deleted, Cassandra writes a special marker called a tombstone. When the data is read, the tombstone causes the value to be treated as deleted.
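The read-time effect of a tombstone can be sketched as a last-write-wins merge across immutable SSTables. The dictionaries standing in for SSTables, the integer timestamps, and the `TOMBSTONE` sentinel are illustrative assumptions; the point is that the older SSTable is never modified, and the newer tombstone simply shadows its value.

```python
TOMBSTONE = object()  # sentinel marking a deletion


def read(key, sstables):
    """Merge a key's versions across SSTables; the newest timestamp wins.

    Each SSTable is a dict of key -> (timestamp, value). A delete is just a
    newer entry whose value is TOMBSTONE.
    """
    versions = [table[key] for table in sstables if key in table]
    if not versions:
        return None
    _, value = max(versions, key=lambda v: v[0])
    return None if value is TOMBSTONE else value


old = {"alice": (1, "alice@example.com"), "bob": (1, "bob@example.com")}
new = {"alice": (2, TOMBSTONE)}  # the delete lands in a newer SSTable

print(read("alice", [old, new]))  # None (shadowed by the tombstone)
print(read("bob", [old, new]))    # bob@example.com
```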
21) What is a SuperColumn in Cassandra?
In Cassandra, a SuperColumn is a special element that groups a collection of related columns. It is a key-value pair whose value is a map of columns.
22) What is the difference between Column and SuperColumn?
Difference between Column and SuperColumn:
- The value of a column is a single value (such as a string), while the value of a SuperColumn is a map of columns, which may have different data types.
- A column has three components (name, value, and timestamp); a SuperColumn has no timestamp component.
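The two structures can be contrasted with plain dataclasses. The class names, field names, and sample values are illustrative and do not reflect Cassandra's internal representation; column values are kept as strings to match the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    value: str       # a single value (a string in this sketch)
    timestamp: int   # third component, used to pick the latest write


@dataclass
class SuperColumn:
    name: str
    # The value is a map of column name -> Column; there is no timestamp field.
    columns: dict[str, Column] = field(default_factory=dict)

    def add(self, column: Column) -> None:
        self.columns[column.name] = column


home = SuperColumn("home_address")
home.add(Column("street", "1 Main St", timestamp=1700000000))
home.add(Column("city", "Springfield", timestamp=1700000001))
print(sorted(home.columns))  # ['city', 'street']
```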