Overview
SableDB uses a Key / Value database for its underlying data storage. We chose to use RocksDB
as its mature, maintained and widely used in the industry by giant companies.
Because the RocksDB is key-value storage and Redis data structures can be more complex, an additional
data encoding is required.
This chapter covers how SableDB encodes the data for the various data types (e.g. String, Hash, Set etc)
Note
Numbers are encoded using Big Endians to preserve lexicographic ordering
SableDB takes advantage of the following RocksDB traits:
- RocksDBkeys are stored lexicographically (this is why- SableDBuses big-endiands)
- RocksDBprovides prefix iterators which allows- SableDBto place iterator on the first item that matches a prefix
The String data type
The most basic data type in SableDB is the String data type. Strings in SableDB are always binary safe
Each String record in the SableDB consists of a single entry in RocksDB:
   A    B      C      D                E        F       G     H
+-----+-----+-------+----------+    +-----+------------+----+-------+
| 1u8 | DB# | Slot# | user key | => | 0u8 | Expirtaion | ID | value |
+-----+-----+-------+----------+    +-----+------------+----+-------+
The key for a String record is encoded as follows:
- Athe first byte (- u8) is always set to- 1- this indicates that this is a data entry (there are other type of keys in the database)
- Bthe database ID is encoded as- u16(this implies that- SableDBsupports up to- 64Kdatabases)
- Cthe slot number
- Dthe actual key value (e.g.- set mykey myvalue->- mykeyis set here)
The value is encoded as follows:
- Ethe first byte is the type bit, value of- 0means that the this record is of type- String
- Fthe record expiration info
- Gunique ID (relevant for complex types like- Hash), for- Stringthis is always- 0
- Hthe user value
Using the above encoding, we can now understand how SableDB reads from the database. Lets have a look a the command:
get mykey
SableDB encodes a key from the user key (mykey) by prepending the following:
- 1u8 - to indicate that this is the data record
- The active database number (defaults to 0)
- The slot number
- The user string key (i.e. mykey)
This is the key that is passed to RocksDB for reading
- If the key exists in the database:
    - If the type (field E) is != 0 - i.e. the entry is not a String, SableDB returns a -WRONGTYPE error
    - If value is expired -> SableDB returns null and deletes the record from the database
    - Otherwise, SableDB returns the H part of the value (the actual user data)
- Else (no such key) return null
The List data type
A List is a composite data type. SableDB stores the metadata of the list using a dedicated record and each list element
is stored in a separate entry.
List metadata:
   A    B       C        D
+-----+---- +--------+------------+
| 1u8 | DB# |  Slot# |  list name |
+-----+---- +--------+------------+
                             E        F        G        H      I       J
                        +-----+------------+--------- +------+------+-------+
                   =>   | 1u8 | Expirtaion | List UID | head | tail |  size |
                        +-----+------------+--------- +------+------+-------+
List item:
   K        L              M                 N      O           P
+-----+--------------+---------------+    +------+--------+------------+
| 2u8 | List ID(u64) |  Item ID(u64) | => | Left | Right  |     value  |
+-----+--------------+---------------+    +------+--------+------------+
Unlike String, a List is using an additional entry in the database that holds the list metadata.
- Encoded items A->Dare the same asString
- Ethe first byte is always set to- 1(unlike- Stringwhich is set to- 0)
- FExpiration info
- GThe list UID. Each list is assigned with a unique ID (an incremental number that never repeat itself, evern after restarts)
- Hthe UID of the list head item (- u64)
- Ithe UID of the list tail item (- u64)
- Jthe list length
In addition to the list metadata (SableDB keeps a single metadata item per list) we add a list item per new list item
using the following encoding:
- Kthe first bit which is always set to- 2("List Item")
- Lthe parent list ID (see field- Gabove)
- Mthe item UID
- Nthe UID of the previous item in the list (- 0means that this item is the head)
- Othe UID of the next item in the list (- 0means that this item is the last item)
- Pthe list value
The above encoding allows SableDB to iterate over all list items by creating a RocksDB iterator and move it to
the prefix [ 2 | <list-id>] (2 indicates that only list items should be scanned, and list-id makes sure that only
the requested list items are visited)
The Hash data type
Hash items are encoded using the following:
Hash metadata:
   A    B       C        D                E        F        G         H
+-----+---- +--------+-----------+    +-----+------------+---------+-------+
| 1u8 | DB# |  Slot# | Hash name | => | 2u8 | Expirtaion | Set UID |  size |
+-----+---- +--------+-----------+    +-----+------------+---------+-------+
Hash item:
   P        Q           R           S
+-----+--------------+-------+    +-------+
| 3u8 | Hash ID(u64) | field | => | value |
+-----+--------------+-------+    +-------+
- Encoded items A->Hare basically identical to the hashA->Hfields
- Palways set to- 3("hash member")
- Qthe hash ID for which this member belongs to
- Rthe hash field
- Sthe field's value
The Sorted Set data type
The sorted set ( Z* commands) is encoded using the following:
Sorted set metadata:
   A    B       C        D                E        F        G         H
+-----+---- +--------+-----------+    +-----+------------+---------+-------+
| 1u8 | DB# |  Slot# | ZSet name | => | 3u8 | Expirtaion | ZSet UID|  size |
+-----+---- +--------+-----------+    +-----+------------+---------+-------+
ZSet item 1 (Index: "Find by member"):
   K        L              M           O
+-----+--------------+---------+    +-------+
| 4u8 | ZSet ID(u64) |  member | => | score |
+-----+--------------+---------+    +-------+
ZSet item 2 (Index: "Find by score"):
   P        Q           R       S            T
+-----+--------------+-------+-------+    +------+
| 5u8 | ZSet ID(u64) | score |member | => | null |
+-----+--------------+-------+-------+    +------+
Sorted set requires double index (score & member), this is why each zset item member is kept using 2 records.
The zset metadata contains:
- Encoded items A->Dare the same asString
- Ewill always contains- 3for- sorted set
- Fthe expiration info
- Gthe unique zset ID
- Hthe set size (number of members)
Each zset item are kept using 2 records:
Index: "Find by member"
The first record allows SableDB to find a member score (the key is the member value)
- Kthe first bit which is always set to- 4("ZSet member Item")
- Lthe zset ID for which this item belongs to
- Mthe zset member
- Othis member score value
Index: "Find by score"
The second record, allows SableDB to find member by score (we use the score as the key)
- Pthe first bit is always set to- 5("Zset score item")
- Qthe zset ID for which this item belongs to
- Rthe record's score value
- Sthe member
- Tnot used
The above encoding records provides all the indexing required by SableDB to implement the sorted set commands.
For example, in order to implement the command ZCOUNT (Returns the number of elements in the sorted set at key with a score between min and max):
- SableDBfirst loads the metadata using the zset key in order to obtain its unique ID
- Creates an iterator using the prefix [5 | ZSET UID | MIN_SCORE](Index: "Find by score")
- Start iterating until it either finds the first entry that does not belong to the zset, or it finds the MAX_SCOREvalue
The Set data type
Set items are encoded using the following:
Set metadata:
   A    B       C        D                E        F        G         H
+-----+---- +--------+-----------+    +-----+------------+---------+-------+
| 1u8 | DB# |  Slot# | Set name  | => | 4u8 | Expirtaion | Set UID |  size |
+-----+---- +--------+-----------+    +-----+------------+---------+-------+
Set item:
   P        Q           R           S
+-----+--------------+-------+    +------+
| 6u8 | Set ID(u64)  | field | => | null |
+-----+--------------+-------+    +------+
- Encoded items A->Hare basically identical to the sorted setA->Hfields
- Palways set to- 6("set member")
- Qthe set ID for which this member belongs to
- Rthe set field
- Snull (not used)
Bookkeeping records
Every composite item (Hash, Sorted Set, List or Set) created by SableDB, also creates a record in the bookkeeping "table".
A bookkeeping records keeps track of the composite item unique ID + its type (which is needed by the data eviction job)
The bookkeeping record is encoded as follows:
Bookkeeping:
   A    B       C        D                E
+-----+----+--------+-----------+    +----------+
| 0u8 | UID|  DB#   | UID type  | => | user key |
+-----+----+--------+-----------+    +----------+
- Aa bookkeeping records starts with- 0
- Ba- u64field containing the composite item UID (e.g.- Hash UID)
- Cthe database ID for which the UID belongs to
- Dthe UID type when it was created (e.g. "hash" or "set")
- Ethe user key associated with the UID (e.g. the hash name)