High Level Design
Overview
This chapter covers the overall design choices made when building SableDB.
The networking layer of SableDB uses a lock-free design, i.e. once a connection is assigned to a worker thread, it does not interact with any other threads or shared data structures.
Having said that, there is one obvious "point" that requires locking: the storage.
The current implementation of SableDB uses RocksDB as its storage engine (but it can, in principle, work with other storage engines such as Sled). Even though the storage itself is thread-safe, SableDB still needs to provide atomicity for operations that access the database more than once (consider ValKey's `getset` command, which requires performing both `get` and `set` as a single operation). SableDB achieves this by using shard locking (more details on this later).
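To make the idea concrete, below is a minimal sketch of how a shard lock can wrap a multi-step operation such as `getset`. The names and types here (`ShardLocker`, `StorageAdapter`, the simple hashing scheme) are illustrative assumptions, not SableDB's actual API.

```rust
use std::collections::HashMap;
use std::sync::{RwLock, RwLockWriteGuard};

/// Stand-in for the RocksDB-backed storage (illustrative only).
struct StorageAdapter {
    db: RwLock<HashMap<Vec<u8>, Vec<u8>>>,
}

/// One lock per shard; a key is mapped to a shard by a simple hash.
struct ShardLocker {
    shards: Vec<RwLock<()>>,
}

impl ShardLocker {
    fn lock_for_write(&self, key: &[u8]) -> RwLockWriteGuard<'_, ()> {
        let slot = key.iter().map(|b| *b as usize).sum::<usize>() % self.shards.len();
        self.shards[slot].write().unwrap()
    }
}

/// GETSET must read the old value and store the new one atomically,
/// so both storage accesses happen under the same shard write lock.
fn getset(
    locker: &ShardLocker,
    store: &StorageAdapter,
    key: &[u8],
    new_val: Vec<u8>,
) -> Option<Vec<u8>> {
    let _guard = locker.lock_for_write(key);
    let mut db = store.db.write().unwrap();
    let old = db.get(key).cloned();
    db.insert(key.to_vec(), new_val);
    old
}
```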
By default, SableDB listens on port 6379 for incoming connections. A newly arrived connection is assigned to a worker thread (using a simple round-robin method). The worker thread spawns a local task (a task is tokio's implementation of green threads) which performs the TLS handshake (if dictated by the configuration) and then splits the connection stream into two:

- Reader end
- Writer end

Each end of the stream is then passed to a newly spawned local task for handling.
Below is a diagram showing the main components within SableDB:
Acceptor thread
The main thread of SableDB - after spawning the worker threads - is used as the TCP acceptor thread.
Unless specified otherwise, SableDB listens on port 6379. Every incoming connection is moved to a worker thread for later handling, so the acceptor can keep accepting new connections.
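The following is a rough sketch of what such an accept loop could look like with tokio. The names (`acceptor_loop`, `worker_senders`, one channel per worker) are assumptions made for illustration and are not SableDB's actual code.

```rust
use tokio::net::{TcpListener, TcpStream};
use tokio::sync::mpsc::{channel, Sender};

/// Accept connections and hand them out to workers in round-robin order.
async fn acceptor_loop(
    listener: TcpListener,
    worker_senders: Vec<Sender<TcpStream>>,
) -> std::io::Result<()> {
    let mut next = 0usize;
    loop {
        let (stream, _peer) = listener.accept().await?;
        // hand the socket to the next worker, then go straight back to accepting
        let _ = worker_senders[next % worker_senders.len()].send(stream).await;
        next = next.wrapping_add(1);
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:6379").await?;
    // a single dummy worker for the sake of the example
    let (tx, mut rx) = channel::<TcpStream>(32);
    tokio::spawn(async move {
        while let Some(_conn) = rx.recv().await {
            // a real worker would run the TLS handshake, reader and writer tasks here
        }
    });
    acceptor_loop(listener, vec![tx]).await
}
```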
TLS handshake
The worker thread moves the newly accepted connection into a task which does the following:

- If TLS is enabled by the configuration, performs the TLS handshake (asynchronously) and splits the connection into two (receiver and writer ends)
- If TLS is not needed, it just splits the connection into two (receiver and writer ends)

The TLS handshake task spawns the reader and writer tasks and moves the proper end of the connection to each of them. A tokio channel is then established between the two tasks for passing data from the reader task to the writer task.
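A skeleton of this split-and-spawn step (ignoring TLS, and using tokio's plain `TcpStream` split plus an mpsc channel) might look like the sketch below; `RedisCommand` is just a placeholder type here, and `tokio::spawn` is used for brevity where SableDB spawns local tasks on the worker's runtime.

```rust
use tokio::net::tcp::{OwnedReadHalf, OwnedWriteHalf};
use tokio::net::TcpStream;
use tokio::sync::mpsc;

struct RedisCommand; // placeholder for the parsed command type

/// Split the connection and hand each half to its own task,
/// connected by a channel (reader -> writer).
fn split_and_spawn(stream: TcpStream) {
    let (read_half, write_half) = stream.into_split();
    let (tx, rx) = mpsc::channel::<RedisCommand>(128);

    tokio::spawn(reader_task(read_half, tx));
    tokio::spawn(writer_task(write_half, rx));
}

async fn reader_task(_read_half: OwnedReadHalf, _tx: mpsc::Sender<RedisCommand>) {
    // reads bytes, parses them into RedisCommand values and sends them on `_tx`
}

async fn writer_task(_write_half: OwnedWriteHalf, _rx: mpsc::Receiver<RedisCommand>) {
    // receives commands, invokes their handlers and writes responses back
}
```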
The reader task
The reader task is responsible for:

- Reading bytes from the stream
- Parsing the incoming message and constructing a `RedisCommand` structure
- Once a full command is read from the socket, moving it to the writer task for processing
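As a rough illustration (not SableDB's real parser), the reader loop could look like the sketch below; `try_parse` is a hypothetical helper standing in for the actual RESP parsing code.

```rust
use tokio::io::AsyncReadExt;
use tokio::net::tcp::OwnedReadHalf;
use tokio::sync::mpsc::Sender;

struct RedisCommand {
    args: Vec<Vec<u8>>,
}

/// Hypothetical parser: returns a command plus the number of bytes consumed
/// once a complete message is buffered, otherwise `None`.
fn try_parse(_buf: &[u8]) -> Option<(RedisCommand, usize)> {
    None // real code would parse the RESP protocol here
}

async fn reader_loop(mut read_half: OwnedReadHalf, tx: Sender<RedisCommand>) {
    let mut acc: Vec<u8> = Vec::new();
    let mut chunk = [0u8; 4096];
    loop {
        match read_half.read(&mut chunk).await {
            Ok(0) | Err(_) => break, // peer closed the connection or read failed
            Ok(n) => acc.extend_from_slice(&chunk[..n]),
        }
        // a single read may complete zero, one or several commands
        while let Some((cmd, consumed)) = try_parse(&acc) {
            acc.drain(..consumed);
            if tx.send(cmd).await.is_err() {
                return; // the writer task is gone
            }
        }
    }
}
```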
The writer task
The writer task's input is the commands read and constructed by the reader task.
Once a command is received, the writer task invokes the proper handler for that command (if the command is not supported, an error message is sent back to the client).
The command handler can return one of two possible actions:
Send a response to the client
There are two ways the writer task can send a response back to the client:

- The command handler returns the complete response (e.g. `+OK\r\n`)
- The command handler writes the response directly to the socket

The decision whether to reply directly or propagate the response to the caller task is made on a per-command basis. The idea is to prevent huge memory spikes where possible.
For example, the `hgetall` command might generate a huge output (depending on the number of fields in the hash and their size), so it is usually better to write the response directly to the socket (in controlled, fixed-size chunks) rather than building the complete response in memory (which can take gigabytes of RAM) and only then writing it to the client.
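One common way to express this choice is a small result type returned by each handler. The sketch below uses made-up names (`HandlerOutput`, `finish_command`) purely for illustration; it is not SableDB's actual API.

```rust
use tokio::io::AsyncWriteExt;
use tokio::net::tcp::OwnedWriteHalf;

enum HandlerOutput {
    /// Small, bounded replies such as "+OK\r\n".
    Reply(Vec<u8>),
    /// The handler already streamed the (potentially huge) reply
    /// directly to the socket, chunk by chunk.
    AlreadySent,
}

async fn finish_command(out: HandlerOutput, sock: &mut OwnedWriteHalf) -> std::io::Result<()> {
    match out {
        HandlerOutput::Reply(bytes) => sock.write_all(&bytes).await,
        HandlerOutput::AlreadySent => Ok(()), // nothing left to do
    }
}
```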
Block the client
When a client executes a blocking call on a resource that is not yet available, the writer task is suspended until:

- A timeout occurs (most blocking commands allow specifying a timeout duration)
- The resource becomes available
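A minimal sketch of this suspension, assuming a tokio `Notify` as the wake-up mechanism (the real implementation may differ), is shown below.

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Notify;
use tokio::time::timeout;

enum BlockResult {
    Ready,
    TimedOut,
}

/// Wait for the resource notification, bounded by the client-supplied timeout.
async fn block_until_ready(wakeup: Arc<Notify>, max_wait: Duration) -> BlockResult {
    match timeout(max_wait, wakeup.notified()).await {
        Ok(()) => BlockResult::Ready,    // e.g. another client pushed to the list
        Err(_) => BlockResult::TimedOut, // reply with a timeout / null response
    }
}
```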