Understanding Caches

January 10, 2024

In Symbol, the global blockchain state is stored in one or more caches. There are some core caches, but other caches can be added via the plugin system. The cache implementations are template-heavy, so they can be a bit confusing to understand. Let's examine the Mosaic Cache to understand how they work!

Cache Descriptor

Every cache must have a descriptor that describes the cache. In the case of the mosaic cache, the descriptor looks like this:

struct MosaicCacheDescriptor {
public:
    static constexpr auto Name = "MosaicCache";

public:
    // key value types
    using KeyType = MosaicId;
    using ValueType = state::MosaicEntry;

    // cache types
    using CacheType = MosaicCache;
    using CacheDeltaType = MosaicCacheDelta;
    using CacheViewType = MosaicCacheView;

    using Serializer = MosaicEntryPrimarySerializer;
    using PatriciaTree = MosaicPatriciaTree;

public:
    /// Gets the key corresponding to \a entry.
    static auto GetKeyFromValue(const ValueType& entry) {
        return entry.mosaicId();
    }
};

Name is the name of the cache. When a cache database is enabled, the data for the cache will be stored in a directory with this name. Also, this is frequently used in logs.

All data caches are structured as key value maps. KeyType is the type of the key and ValueType is the type of the value. There are a few requirements on these types:

  1. Given an instance of ValueType there must be a way to derive the corresponding KeyType. The function GetKeyFromValue performs this derivation. For this cache, MosaicEntry directly exposes its mosaic id via the mosaicId accessor.
  2. ValueType must support serialization to and deserialization from a binary buffer.

Each Symbol cache is composed of three parts:

  • CacheType - owns the underlying data
  • CacheViewType - a read only view on the data
  • CacheViewType - an in-memory writable view on the data using copy on write semantics

Serializer provides functions to serialized and deserialize ValueType. PatriciaTree uses the Serializer to implement a Merkle Patricia Tree.

Key Type

KeyType is type of key that is used to lookup data in a cache. For the MosaicCache this is defined as MosaicId, which is a simple 64-bit unsigned integer value:

struct MosaicId_tag {};
using MosaicId = utils::BaseValue<uint64_t, MosaicId_tag>;

ℹ️ BaseValue is a wrapper around primitive types in order to prevent the mixing of different conceptual types. For example, while MosaicId and NamespaceId are both 64-bit unsigned integers, they represent different things. If they were simply defined as uint64_t there would be no errors assigning one to the other. Using BaseValue, this assignment would lead to a compile-time error.

Value Type

ValueType is type of value that is returned by the cache. Notice that this is a C++ type, and not a simple struct. This simplifies code interacting with the cache. For the MosaicCache this is defined as MosaicEntry, which is defined below:

/// Tuple composed of a mosaic definition and its current state.
class PLUGIN_API_DEPENDENCY MosaicEntry : public MosaicEntrySupplyMixin {
public:
    static constexpr auto Is_Deactivation_Destructive = false;

public:
    /// Creates a mosaic entry around mosaic \a id and mosaic \a definition.
    MosaicEntry(MosaicId id, const MosaicDefinition& definition);

public:
    /// Gets the mosaic id.
    MosaicId mosaicId() const;

    /// Gets the mosaic definition.
    const MosaicDefinition& definition() const;

    /// Returns \c true if entry is active at \a height.
    bool isActive(Height height) const;

private:
    MosaicId m_id;
    MosaicDefinition m_definition;
};

MosaicEntrySupplyMixin is a mixin that contains data and functions relating to mosaic supply. Usage of a mixin is not required. These data and functions could be defined directly in MosaicEntry. The benefit of using a mixin is that it leads to cleaner code by keeping related data and functions grouped together.

Remember, that mosaics are artifacts that can expire. This adds two additional requirements to any value type:

  1. Is_Deactivation_Destructive - Flag set to true if the value changes on expiry and needs to be rehashed. In the case of mosaics, this is false because mosaics are not pruned. In the case of namespaces and lock hashes, this is true because both of those are pruned.
  2. isActive - Function that accepts a height and returns true if the value is active at that height.

Cache

In Symbol, each cache is simply a data store that provides different views to the underlying data. If you're working with plugins and extensions, you'll rarely have or need access to a cache directly. Still, it's important to understand how a cache works.

Types

While caches are conceptually key value stores, for optimization reasons, they will often have secondary stores as well. Importantly, the cache is fully described by the data in the primary store. In fact, this is the only data used to build the Merkle Patricia Tree. Secondary stores are normally used for grouping data and should not contain any new data. In other words, all secondary stores should be able to be rebuilt from the primary store.

CacheTypes structure is used to describe the key value stores:

/// Mosaic cache types.
struct MosaicCacheTypes {
public:
    using CacheReadOnlyType = ReadOnlyArtifactCache<BasicMosaicCacheView, BasicMosaicCacheDelta, MosaicId, state::MosaicEntry>;

// region secondary descriptors

public:
    struct HeightGroupingTypesDescriptor {
    public:
        using KeyType = Height;
        using ValueType = utils::IdentifierGroup<MosaicId, Height, utils::BaseValueHasher<MosaicId>>;
        using Serializer = MosaicHeightGroupingSerializer;

    public:
        static auto GetKeyFromValue(const ValueType& heightMosaics) {
            return heightMosaics.key();
        }
    };

// endregion

public:
    using PrimaryTypes = MutableUnorderedMapAdapter<MosaicCacheDescriptor, utils::BaseValueHasher<MosaicId>>;
    using HeightGroupingTypes = MutableUnorderedMapAdapter<HeightGroupingTypesDescriptor, utils::BaseValueHasher<Height>>;

public:
    using BaseSetDeltaPointers = MosaicBaseSetDeltaPointers;
    using BaseSets = MosaicBaseSets;
};

Here, we can see a secondary store with its own descriptor: HeightGroupingTypesDescriptor. Notice that a secondary store descriptor only requires KeyType, ValueType and Serializer. These have the same meanings as in the primary type descriptor discussed previously.

PrimaryTypes is the primary store. It is a unordered map with a key type of MosaicId and a value type of MosaicEntry (from MosaicCacheDescriptor). Notice that the Mutable variant must be used because MosaicEntry is mutable. This will enable copy on write semantics. utils::BaseValueHasher is simply a hash object for base value so that it can be used in STL unordered containers.

HeightGroupingTypes is a secondary store. It is a unordered map with a key type of Height and a value type of IdentifierGroup<...> (from HeightGroupingTypesDescriptor). Notice that the Mutable variant must be used because IdentifierGroup<...> is mutable.

Base Sets

The underlying cache data is composed of one or more stores. All of these stores - primary and secondary key value stores along with an optional Merkle Patricia Tree store - are stored in a BaseSets structure:

struct MosaicBaseSets : public CacheDatabaseMixin {
public:
    /// Indicates the set is not ordered.
    using IsOrderedSet = std::false_type;

public:
    explicit MosaicBaseSets(const CacheConfiguration& config)
            : CacheDatabaseMixin(config, { "default", "height_grouping" })
            , Primary(GetContainerMode(config), database(), 0)
            , HeightGrouping(GetContainerMode(config), database(), 1)
            , PatriciaTree(hasPatriciaTreeSupport(), database(), 2)
    {}

public:
    MosaicCacheTypes::PrimaryTypes::BaseSetType Primary;
    MosaicCacheTypes::HeightGroupingTypes::BaseSetType HeightGrouping;
    CachePatriciaTree<MosaicPatriciaTree> PatriciaTree;
...

CacheDatabaseMixin is responsible for initializing the underlying cache storage. Typically, this will be RocksDB (when enableCacheDatabaseStorage is true). The RocksDB column family names are provided as the second argument. When enableVerifiableState is true, an additional patricia_tree column family will be added automatically.

IsOrderedSet needs to be provided (as std::true_type or std::false_type) to indicate whether or not the data is ordered. Here it is not. This is needed in order to allow pruning of ordered sets.

ℹ️ In this article we're looking at the most common type of caches - data caches. However, there are also a few runtime caches - like a hash cache that stores recent confirmed transaction hashes. Due to the temporal nature of these caches, they need to be pruned. Also, they are not input into the blockchain state hash because they can be completely rebuilt easily by processing a relatively small finite number of blocks.

Cache Container

Now that we have all of our descriptors set up, we can actually define the cache. The cache is simply a wrapper around the data contained in the BaseSets. It adds the ability to create different types of views on that data:

  1. View - Read only view of the data.
  2. Delta - Writable view of the data that can be committed back and change the underlying data. Importantly, all changes are made using copy on write semantics. If they are not committed, they will be rejected. For example, if a block contains some valid transactions and then an invalid one, after the invalid one is rejected all in-memory changes will be reverted.
  3. DetachedDelta - Writable view of the data that CANNOT be committed back nor change the underlying data. This view is used most importantly during block production. The harvester needs to ensure that the block it creates is valid and does not contain any conflicting transactions. In order to accomplish this, it validates and executes transactions against the blockchain state while it builds the block.

Now that we understand views, let's look at the MosaicCache:

/// Cache composed of mosaic information.
using BasicMosaicCache = BasicCache<MosaicCacheDescriptor, MosaicCacheTypes::BaseSets>;

/// Synchronized cache composed of mosaic information.
class MosaicCache : public SynchronizedCache<BasicMosaicCache> {
public:
    DEFINE_CACHE_CONSTANTS(Mosaic)

public:
    /// Creates a cache around \a config.
    explicit MosaicCache(const CacheConfiguration& config) : SynchronizedCache<BasicMosaicCache>(BasicMosaicCache(config))
    {}
};

This is really simple because nearly all of the logic is common across all caches and has been refactored. BasicCache defines four important functions:

  1. createView - Creates a read only view of the data.
  2. createDelta - Creates a writable view of the data that can be committed back and change the underlying data.
  3. createDetachedDelta - Creates a writable view of the data that CANNOT be committed back nor change the underlying data.
  4. commit - Commits pending changes in a delta view to the underlying storage.

SynchronizedCache adds reader writer lock semantics across all data views. View creation acquires a READ lock in all cases. A WRITE lock is only acquired in commit after all changes are validated and pending in memory. Commit effectively flushes the changes from memory to storage.

DEFINE_CACHE_CONSTANTS simply derives a cache id and a cache name from the cache friendly name (Mosaic) and makes them accessible from the cache.

Cache View

Due to similarities across cache views, most of the common cache functionality is refactored into mixins. For example, if a cache wants to expose functionality to check the existence of an item, it can add the contains mixin:

/// Mixin for adding contains support to a cache.
template<typename TSet, typename TCacheDescriptor>
class ContainsMixin {
private:
    using KeyType = typename TCacheDescriptor::KeyType;

public:
    /// Creates a mixin around \a set.
    explicit ContainsMixin(const TSet& set) : m_set(set)
    {}

public:
    /// Gets a value indicating whether or not the cache contains an element with \a key.
    bool contains(const KeyType& key) const {
        return m_set.contains(key);
    }

private:
    const TSet& m_set;
};

This mixin simply wraps a const reference to a set. It then exposes a contains function that simply delegates to the one on the set.

Looking at the mosaic cache view, we can see it is simply composed of a lot of mixins:

/// Mixins used by the mosaic cache view.
struct MosaicCacheViewMixins : public PatriciaTreeCacheMixins<MosaicCacheTypes::PrimaryTypes::BaseSetType, MosaicCacheDescriptor> {};

/// Basic view on top of the mosaic cache.
class BasicMosaicCacheView
        : public utils::MoveOnly
        , public MosaicCacheViewMixins::Size
        , public MosaicCacheViewMixins::Contains
        , public MosaicCacheViewMixins::Iteration
        , public MosaicCacheViewMixins::ConstAccessor
        , public MosaicCacheDeltaMixins::PatriciaTreeView
        , public MosaicCacheViewMixins::ActivePredicate {
public:
    using ReadOnlyView = MosaicCacheTypes::CacheReadOnlyType;

public:
    /// Creates a view around \a mosaicSets.
    explicit BasicMosaicCacheView(const MosaicCacheTypes::BaseSets& mosaicSets)
            : MosaicCacheViewMixins::Size(mosaicSets.Primary)
            , MosaicCacheViewMixins::Contains(mosaicSets.Primary)
            , MosaicCacheViewMixins::Iteration(mosaicSets.Primary)
            , MosaicCacheViewMixins::ConstAccessor(mosaicSets.Primary)
            , MosaicCacheViewMixins::PatriciaTreeView(mosaicSets.PatriciaTree.get())
            , MosaicCacheViewMixins::ActivePredicate(mosaicSets.Primary)
    {}
};

/// View on top of the mosaic cache.
class MosaicCacheView : public ReadOnlyViewSupplier<BasicMosaicCacheView> {
public:
    /// Creates a view around \a mosaicSets.
    explicit MosaicCacheView(const MosaicCacheTypes::BaseSets& mosaicSets) : ReadOnlyViewSupplier(mosaicSets)
    {}
};

PatriciaTreeCacheMixins is a simple struct that contains mixin aliases. Since mixins typically take multiple template arguments, this simplifies their use. MosaicCacheViewMixins further simplifies this by providing appropriate template arguments to PatriciaTreeCacheMixins.

In the definition of BasicMosaicCacheView, notice that the selected mixins are inherited. Instances of MoveOnly can only be moved but not copied accidentally. The constructor accepts the BaseSets discussed previously and constructs the inherited mixins.

Finally, MosaicCacheView uses ReadOnlyViewSupplier to add a function called asReadOnly. This returns a read only view (of type ReadOnlyView). The delta view exposes a similar function, which provides an easy way to write code that can interrogate both read only and writable cache views.

Cache Delta

The definition of the cache delta view, should look very similar to that of cache read only view:

/// Mixins used by the mosaic cache delta.
struct MosaicCacheDeltaMixins
        : public PatriciaTreeCacheMixins<MosaicCacheTypes::PrimaryTypes::BaseSetDeltaType, MosaicCacheDescriptor> {
    using Touch = HeightBasedTouchMixin<
        typename MosaicCacheTypes::PrimaryTypes::BaseSetDeltaType,
        typename MosaicCacheTypes::HeightGroupingTypes::BaseSetDeltaType>;
};

/// Basic delta on top of the mosaic cache.
class BasicMosaicCacheDelta
        : public utils::MoveOnly
        , public MosaicCacheDeltaMixins::Size
        , public MosaicCacheDeltaMixins::Contains
        , public MosaicCacheDeltaMixins::ConstAccessor
        , public MosaicCacheDeltaMixins::MutableAccessor
        , public MosaicCacheDeltaMixins::PatriciaTreeDelta
        , public MosaicCacheDeltaMixins::ActivePredicate
        , public MosaicCacheDeltaMixins::BasicInsertRemove
        , public MosaicCacheDeltaMixins::Touch
        , public MosaicCacheDeltaMixins::DeltaElements {
public:
    using ReadOnlyView = MosaicCacheTypes::CacheReadOnlyType;

public:
    /// Creates a delta around \a mosaicSets.
    explicit BasicMosaicCacheDelta(const MosaicCacheTypes::BaseSetDeltaPointers& mosaicSets);

public:
    using MosaicCacheDeltaMixins::ConstAccessor::find;
    using MosaicCacheDeltaMixins::MutableAccessor::find;

public:
    /// Inserts the mosaic \a entry into the cache.
    void insert(const state::MosaicEntry& entry);

    /// Removes the value identified by \a mosaicId from the cache.
    void remove(MosaicId mosaicId);

private:
    MosaicCacheTypes::PrimaryTypes::BaseSetDeltaPointerType m_pEntryById;
    MosaicCacheTypes::HeightGroupingTypes::BaseSetDeltaPointerType m_pMosaicIdsByExpiryHeight;
};

/// Delta on top of the mosaic cache.
class MosaicCacheDelta : public ReadOnlyViewSupplier<BasicMosaicCacheDelta> {
public:
    /// Creates a delta around \a mosaicSets.
    explicit MosaicCacheDelta(const MosaicCacheTypes::BaseSetDeltaPointers& mosaicSets) : ReadOnlyViewSupplier(mosaicSets)
    {}
};

PatriciaTreeCacheMixins is used again, but this time it is provided different template arguments. It is passed BaseSetDeltaType instead of BaseSetType. Additionally, an alias for a custom HeightBasedTouchMixin is added.

BasicMosaicCacheDelta is similar to BasicMosaicCacheView. Notice that it overrides insert and remove. The custom implementations of these functions handle updating the secondary store, which groups mosaic ids by expiration height. m_pEntryById and m_pMosaicIdsByExpiryHeight are pointers to the primary and secondary key value stores, which are assigned in the constructor.

MosaicCacheDelta also uses ReadOnlyViewSupplier to add a function called asReadOnly. This returns a read only view (of type ReadOnlyView). Importantly, the return type of this function is the same as in MosaicCacheView.

ℹ️ Notice that the same MosaicCacheDelta type is used for both Delta and DetachedDelta views. This allows observers to seamlessly work with either.

Serializers

In order to support RocksDB, each primary and secondary store must have an associated serializer. For the mosaic cache, the following serializers are defined:

/// Primary serializer for mosaic cache.
struct MosaicEntryPrimarySerializer : public CacheSerializerAdapter<state::MosaicEntrySerializer, MosaicCacheDescriptor> {};

/// Serializer for mosaic cache height grouped elements.
struct MosaicHeightGroupingSerializer : public IdentifierGroupSerializer<MosaicCacheTypes::HeightGroupingTypesDescriptor> {};

Both primary and secondary store serializers must define Save and Load functions for serializing and deserializing values to and from streams. Save accepts a value and a stream and serializes the value to the stream. Load accepts a stream and deserializes the first value.

There is a slight difference between serializers for primary and secondary stores. Primary serializers always write versioned data, but secondary stores do not. Primary serializers must always be able to handle data from any point in the blockchain. If they could not, state hashes would break. In contrast, secondary stores can always be completely rebuilt from primary store data. To support this, primary serializer must additionally define State_Version, which indicates the version of the state.

Reviewing the MosaicEntrySerializer we can see Save, Load and State_Version all defined:

/// Policy for saving and loading mosaic entry data.
struct MosaicEntrySerializer {
    /// Serialized state version.
    static constexpr uint16_t State_Version = 1;

    /// Saves \a entry to \a output.
    static void Save(const MosaicEntry& entry, io::OutputStream& output);

    /// Loads a single value from \a input.
    static MosaicEntry Load(io::InputStream& input);
};

Save and Load do not do anything with State_Version. Notice that the MosaicEntrySerializer is wrapped in a CacheSerializerAdapter. This adapter handles writing and checking State_Version. It also automatically prepares the streams passed to MosaicEntrySerializer.

In contrast, IdentifierGroupSerializer<...> is not wrapped in CacheSerializerAdapter. Nor does it define State_Version.

Patricia Tree

Lastly, let's look at the MosaicPatriciaTree:

using BasicMosaicPatriciaTree = tree::BasePatriciaTree<
    SerializerHashedKeyEncoder<MosaicCacheDescriptor::Serializer>,
    PatriciaTreeRdbDataSource,
    utils::BaseValueHasher<MosaicId>>;

class MosaicPatriciaTree : public BasicMosaicPatriciaTree {
public:
    using BasicMosaicPatriciaTree::BasicMosaicPatriciaTree;
    using Serializer = MosaicCacheDescriptor::Serializer;
};

BasicMosaicPatriciaTree is simply an alias for a BasePatriciaTree with specific template arguments provided. BasePatriciaTree is a wrapper around a Merkle Patricia Tree implementation that gives it similar copy on write semantics to BaseSet. SerializerHashedKeyEncoder defines a key encoder that converts a cache key into a Merkle Patricia Tree key. For the mosaic cache, it accepts a MosaicId and hashes it to produce a Merkle Patricia Tree key. PatriciaTreeRdbDataSource provides access to Merkle Patricia Tree nodes using RocksDB.

ℹ️ All the key value stores we've discussed are actually BaseSets. Conceptually, a BaseSet wraps a standard STL container and adds copy on write semantics. In reality, it does a bit more, including forwarding requests to RocksDB as appropriate. It is VERY template heavy code.

ℹ️ Why do we need a key encoder instead of using MosaicId directly as the Merkle Patricia Tree key? Since hashes are cryptographically random, using hashes leads to a more balanced tree, which leads to better performance. It also prevents attacks that could otherwise reduce performance by creating a very unbalanced tree.

Finally, note that the MosaicEntryPrimarySerializer is used for serializing the values that are stored in the tree. Importantly, this is the same serializer used to write the values to RocksDB. Since the exact same serializer is used in both places, we can be sure that they are always consistent.

Postscript

I hope you have a better understanding of how caches work in Symbol. I also hope you don't have too big of a headache!