ScyllaDB - Getting started

Recently I read "How Discord Migrated Trillions of Messages to ScyllaDB", an article describing how Discord moved its messages cluster from Cassandra to ScyllaDB. The migration reportedly reduced message latencies from 200 milliseconds to 5 milliseconds, which got me intrigued enough to explore ScyllaDB.

ScyllaDB is an open-source distributed NoSQL database that is compatible with Apache Cassandra and aims to provide higher throughput and lower latencies. It is written in C++ and designed to take advantage of modern hardware such as high-core-count CPUs and fast SSDs. It is also designed to be scalable, fault-tolerant, and highly available.

In this blog post, we will look at the steps to use ScyllaDB, from installation to creating and querying data using the Cassandra Query Language (CQL), which ScyllaDB also speaks.

Prerequisites:

Before getting started with ScyllaDB, ensure that you have the following prerequisites:
• A Linux machine running Ubuntu (the commands below target Ubuntu 20.04)
• JDK 11 or higher installed
• Maven installed
• A basic knowledge of Cassandra Query Language (CQL)
• A text editor of your choice

Steps:

Install ScyllaDB:

To install ScyllaDB, we first add the Scylla repository to our Ubuntu system, then update the package list, and finally install the scylla package.
The following commands install ScyllaDB 4.4 on Ubuntu 20.04. They write to system directories, so they are run with sudo.

$ sudo curl -o /etc/apt/sources.list.d/scylla.list \
  https://repositories.scylladb.com/scylla/repo/\
scylladb-4.4-focal.list
$ sudo apt-get update
$ sudo apt-get install scylla

Start ScyllaDB:
After installing ScyllaDB, we need to start the ScyllaDB service. To start the Scylla service, run the following command:

$ sudo systemctl start scylla-server
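
The service can take a little while to begin accepting client connections after systemctl returns. Here is a minimal sketch for checking that, assuming the node listens on the default CQL port 9042 on localhost. The helper name cql_port_open is hypothetical, and it only tests TCP reachability, not that the node is fully ready to serve queries.

```python
import socket


def cql_port_open(host="127.0.0.1", port=9042, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or bad host
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("CQL port reachable:", cql_port_open())
```

If this prints False shortly after starting the service, waiting a few seconds and trying again is usually the first thing to try.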

Create a keyspace:
To create a keyspace in Scylla, we can use the CQL command CREATE KEYSPACE, run from a CQL shell such as cqlsh. A keyspace is similar to a database in the relational world: it is a logical container for tables.

CREATE KEYSPACE myKeyspace WITH replication = {
 'class': 'SimpleStrategy',
 'replication_factor': '1'
};

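Here, we created a keyspace with a replication factor of 1, using the "SimpleStrategy" replication class. Because the name "myKeyspace" is not double-quoted, CQL stores it in lowercase as "mykeyspace", which is the name the Python examples later connect to.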
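
CQL treats unquoted identifiers as case-insensitive and stores them in lowercase, while double-quoted identifiers keep their case. The small sketch below mimics that rule in plain Python so you can predict the stored name. The function stored_identifier is a hypothetical illustration, not part of any driver.

```python
def stored_identifier(name: str) -> str:
    """Mimic how CQL stores a keyspace or table name."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Quoted identifiers keep their case; "" inside is an escaped quote
        return name[1:-1].replace('""', '"')
    # Unquoted identifiers are folded to lowercase
    return name.lower()


print(stored_identifier('myKeyspace'))    # mykeyspace
print(stored_identifier('"myKeyspace"'))  # myKeyspace
```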

Create a table:
To create a table, we can use the CQL command CREATE TABLE. As in the relational world, a table stores rows of data in typed columns.

CREATE TABLE myKeyspace.users (
   user_id uuid PRIMARY KEY,
   username text,
   email text
);

Here we created a table named "users" with three columns: "user_id," which is the primary key of type UUID, "username," which is of type text, and "email," which is also of type text.

Insert data:
To insert data into the table, we can use the CQL command INSERT INTO.

INSERT INTO myKeyspace.users 
   (user_id, username, email)
   VALUES (now(), 'john', 'john@example.com');

Here, we inserted a row into the "users" table with a user_id generated by the function now(), which returns a time-based (version 1) UUID. The uuid() function returns a random UUID instead.
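
Python's standard uuid module offers the same two flavours of UUID, which is handy when the application generates the key itself rather than the database. This is a rough sketch: uuid1() is time-based like now(), and uuid4() is random like uuid(). The helper uuid1_timestamp is a hypothetical name, showing that a version 1 UUID carries its creation time.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals since this date
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_timestamp(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


time_based = uuid.uuid1()  # time-based, comparable to CQL now()
random_id = uuid.uuid4()   # random, comparable to CQL uuid()

print(time_based.version, random_id.version)  # 1 4
print(uuid1_timestamp(time_based))
```

The Python driver accepts uuid.UUID objects as values for uuid columns, so either kind can be passed as a query parameter.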

Query data:
To query data from the table, we can use the CQL command SELECT.

SELECT * FROM myKeyspace.users;

This command returns all the rows present in the "users" table.

Update data:
To update any data in the table, we can use the CQL command UPDATE.

UPDATE myKeyspace.users
SET username = 'peter'
WHERE user_id = d7a57b06-28a7-4eb2-acad-f4fe3a529adf;

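Here, we updated the username from "john" to "peter" for the row whose user_id is d7a57b06-28a7-4eb2-acad-f4fe3a529adf. Since now() generates a different value on every insert, replace this UUID with the user_id returned by your own SELECT.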

Delete data:
To delete any data from the table, we can use the CQL command DELETE.

DELETE FROM myKeyspace.users
WHERE user_id = d7a57b06-28a7-4eb2-acad-f4fe3a529adf;

This command deletes the row whose user_id is d7a57b06-28a7-4eb2-acad-f4fe3a529adf. As with the update, substitute the user_id of a row that exists in your table.

Sample Code w/ Python Driver:

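Now that we've covered the basics of ScyllaDB, let's take a look at some sample code using the Python driver. The examples import the cassandra package, which is installed by pip with either cassandra-driver or ScyllaDB's fork, scylla-driver.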

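from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Connect to the Scylla cluster
# (the auth provider is only needed when authentication is enabled)
auth = PlainTextAuthProvider(username='myusername', password='mypassword')
cluster = Cluster(['127.0.0.1'], auth_provider=auth)
session = cluster.connect('mykeyspace')

# Assumes a table that was created beforehand, for example:
#   CREATE TABLE mytable (id int PRIMARY KEY, name text, age int);

# Insert a row into the mytable table
query = "INSERT INTO mytable (id, name, age) VALUES (%s, %s, %s)"
session.execute(query, (2, 'Bob', 30))

# Select rows from the mytable table. age is not part of the primary key,
# so the filter needs ALLOW FILTERING (a full scan, fine for a demo only)
query = "SELECT * FROM mytable WHERE age > %s ALLOW FILTERING"
rows = session.execute(query, (20,))
for row in rows:
    print(row.id, row.name, row.age)

# Release the connections when done
cluster.shutdown()

This code connects to the Scylla cluster and inserts a row into the "mytable" table with an ID of 2, a name of "Bob", and an age of 30. It then selects all rows from the "mytable" table where the age is greater than 20 and prints out the results. Note that "mytable" is not created anywhere in this post, so it has to exist in the keyspace before the code runs. Because age is a regular column, the query must include ALLOW FILTERING, which scans the whole table.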

Creating a Table:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS mykeyspace
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS mykeyspace.users (
        user_id INT PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        email TEXT
    )
""")

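In this example, we first connect to the Scylla cluster using the Cluster object. We then create a new keyspace and table using CQL statements executed through the session object. If you already created mykeyspace.users with a uuid key in the CQL steps earlier, IF NOT EXISTS leaves that table unchanged, so drop it first (DROP TABLE mykeyspace.users;) before running the examples below, which use an integer user_id.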
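
The snippets in this post open a Cluster and leave it open when the script ends. Here is a minimal sketch of a helper that always shuts the cluster down, relying only on the driver's Cluster.connect() and Cluster.shutdown() methods. The name scylla_session is hypothetical, not something the driver provides.

```python
from contextlib import contextmanager


@contextmanager
def scylla_session(cluster, keyspace=None):
    """Yield a session and shut the cluster down afterwards, even on errors."""
    session = cluster.connect(keyspace)
    try:
        yield session
    finally:
        # Closes all connections held by the cluster and its sessions
        cluster.shutdown()
```

With a real cluster this would be used as: with scylla_session(Cluster(['127.0.0.1']), 'mykeyspace') as session, followed by the session.execute calls inside the block.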

Inserting Data:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')

insert_query = """
    INSERT INTO mykeyspace.users (user_id, first_name, last_name, email)
    VALUES (%s, %s, %s, %s)
"""

session.execute(insert_query, (1, 'John', 'Doe', 'johndoe@example.com'))
session.execute(insert_query, (2, 'Jane', 'Doe', 'janedoe@example.com'))

In this example, we insert two rows into the "users" table. We use a parameterized query to pass in the values for the user_id, first_name, last_name, and email columns.
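
When the same statement runs many times, a prepared statement avoids re-parsing the query on every call. This is a hedged sketch using the driver's session.prepare(), which takes ? placeholders instead of %s. The function insert_users and the constant INSERT_USER are hypothetical names, and session is assumed to be a connected driver session.

```python
INSERT_USER = """
    INSERT INTO mykeyspace.users (user_id, first_name, last_name, email)
    VALUES (?, ?, ?, ?)
"""


def insert_users(session, users):
    """Prepare the INSERT once, then run it for each (id, first, last, email) tuple."""
    prepared = session.prepare(INSERT_USER)
    for user in users:
        session.execute(prepared, user)
    return len(users)
```

For example, insert_users(session, [(1, 'John', 'Doe', 'johndoe@example.com'), (2, 'Jane', 'Doe', 'janedoe@example.com')]) would insert the same two rows as above.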

Querying Data:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')

select_query = """
    SELECT * FROM mykeyspace.users WHERE user_id = %s
"""

result = session.execute(select_query, (1,))
for row in result:
    print(row.user_id, row.first_name, row.last_name, row.email)

In this example, we query the "users" table for the row with user_id = 1. We use a parameterized query to pass in the value for the user_id column, and then loop through the result set to print out the values for each column in the row.
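
A lookup by primary key returns at most one row, so looping is optional. Here is a small sketch using the result set's one() method, which returns None when nothing matches. It assumes the driver's default row factory, where rows are named tuples, and the helper name fetch_user is hypothetical.

```python
def fetch_user(session, user_id):
    """Return the matching row as a dict, or None when no row has this user_id."""
    row = session.execute(
        "SELECT user_id, first_name, last_name, email "
        "FROM mykeyspace.users WHERE user_id = %s",
        (user_id,),
    ).one()
    # Rows are named tuples by default, so _asdict() is available
    return None if row is None else row._asdict()
```

Returning None for a missing row lets the caller tell "not found" apart from a row with empty columns.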

Updating Data:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')

update_query = """
    UPDATE mykeyspace.users SET email = %s WHERE user_id = %s
"""

session.execute(update_query, ('johndoe_updated@example.com', 1))

In this example, we update the email address for the row with user_id = 1. We use a parameterized query to pass in the new email address and the value for the user_id column.
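
One thing to keep in mind is that a CQL UPDATE behaves as an upsert: if no row has the given user_id, the statement creates one rather than failing. The sketch below guards against that with a lightweight transaction (IF EXISTS). It assumes the cluster supports lightweight transactions and uses the driver's was_applied property on the result. The function name update_email_if_exists is hypothetical.

```python
def update_email_if_exists(session, user_id, new_email):
    """Update the email only when the row already exists; return True if applied."""
    result = session.execute(
        "UPDATE mykeyspace.users SET email = %s WHERE user_id = %s IF EXISTS",
        (new_email, user_id),
    )
    # For conditional statements the driver reports whether the change was applied
    return bool(result.was_applied)
```

Lightweight transactions cost more than plain writes, so this is worth using only where creating a stray row would actually be a problem.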

Deleting Data:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')

delete_query = """
    DELETE FROM mykeyspace.users WHERE user_id = %s
"""

session.execute(delete_query, (1,))

In this example, we delete the row with user_id = 1 from the "users" table. We use a parameterized query to pass in the value for the user_id column.

Conclusion

ScyllaDB is a fast, scalable, and fault-tolerant NoSQL database. In this blog post, we went through the steps to install and use ScyllaDB on Linux. We also looked at the basics of CQL commands to create, query, update and delete data from a table. ScyllaDB has a lot of features that we did not cover in this blog post, such as data modeling, high availability, and performance tuning. In the future, we will cover these topics in more detail.
