In Symbol, the global blockchain state is stored in one or more caches. There are some core caches, but other caches can be added via the plugin system. The cache implementations are template-heavy, so they can be a bit confusing to understand. Let's examine the Mosaic Cache to understand how they work!
Every cache must have a descriptor that describes the cache. In the case of the mosaic cache, the descriptor looks like this:
struct MosaicCacheDescriptor {
public:
static constexpr auto Name = "MosaicCache";
public:
// key value types
using KeyType = MosaicId;
using ValueType = state::MosaicEntry;
// cache types
using CacheType = MosaicCache;
using CacheDeltaType = MosaicCacheDelta;
using CacheViewType = MosaicCacheView;
using Serializer = MosaicEntryPrimarySerializer;
using PatriciaTree = MosaicPatriciaTree;
public:
/// Gets the key corresponding to \a entry.
static auto GetKeyFromValue(const ValueType& entry) {
return entry.mosaicId();
}
};
Name
is the name of the cache.
When a cache database is enabled, the data for the cache will be stored in a directory with this name.
Also, this is frequently used in logs.
All data caches are structured as key value maps.
KeyType
is the type of the key and ValueType
is the type of the value.
There are a few requirements on these types:
ValueType
there must be a way to derive the corresponding KeyType
.
The function GetKeyFromValue
performs this derivation.
For this cache, MosaicEntry
directly exposes its mosaic id via the mosaicId
accessor.ValueType
must support serialization to and deserialization from a binary buffer.Each Symbol cache is composed of three parts:
CacheType
- owns the underlying dataCacheViewType
- a read only view on the dataCacheViewType
- an in-memory writable view on the data using copy on write semanticsSerializer
provides functions to serialized and deserialize ValueType
.
PatriciaTree
uses the Serializer
to implement a Merkle Patricia Tree.
KeyType
is type of key that is used to lookup data in a cache.
For the MosaicCache
this is defined as MosaicId
, which is a simple 64-bit unsigned integer value:
struct MosaicId_tag {};
using MosaicId = utils::BaseValue<uint64_t, MosaicId_tag>;
ℹ️ BaseValue is a wrapper around primitive types in order to prevent the mixing of different conceptual types. For example, while
MosaicId
andNamespaceId
are both 64-bit unsigned integers, they represent different things. If they were simply defined asuint64_t
there would be no errors assigning one to the other. UsingBaseValue
, this assignment would lead to a compile-time error.
ValueType
is type of value that is returned by the cache.
Notice that this is a C++ type, and not a simple struct.
This simplifies code interacting with the cache.
For the MosaicCache
this is defined as MosaicEntry
, which is defined below:
/// Tuple composed of a mosaic definition and its current state.
class PLUGIN_API_DEPENDENCY MosaicEntry : public MosaicEntrySupplyMixin {
public:
static constexpr auto Is_Deactivation_Destructive = false;
public:
/// Creates a mosaic entry around mosaic \a id and mosaic \a definition.
MosaicEntry(MosaicId id, const MosaicDefinition& definition);
public:
/// Gets the mosaic id.
MosaicId mosaicId() const;
/// Gets the mosaic definition.
const MosaicDefinition& definition() const;
/// Returns \c true if entry is active at \a height.
bool isActive(Height height) const;
private:
MosaicId m_id;
MosaicDefinition m_definition;
};
MosaicEntrySupplyMixin
is a mixin that contains data and functions relating to mosaic supply.
Usage of a mixin is not required.
These data and functions could be defined directly in MosaicEntry
.
The benefit of using a mixin is that it leads to cleaner code by keeping related data and functions grouped together.
Remember, that mosaics are artifacts that can expire. This adds two additional requirements to any value type:
Is_Deactivation_Destructive
- Flag set to true if the value changes on expiry and needs to be rehashed.
In the case of mosaics, this is false
because mosaics are not pruned.
In the case of namespaces and lock hashes, this is true
because both of those are pruned.isActive
- Function that accepts a height and returns true if the value is active at that height.In Symbol, each cache is simply a data store that provides different views to the underlying data. If you're working with plugins and extensions, you'll rarely have or need access to a cache directly. Still, it's important to understand how a cache works.
While caches are conceptually key value stores, for optimization reasons, they will often have secondary stores as well. Importantly, the cache is fully described by the data in the primary store. In fact, this is the only data used to build the Merkle Patricia Tree. Secondary stores are normally used for grouping data and should not contain any new data. In other words, all secondary stores should be able to be rebuilt from the primary store.
CacheTypes
structure is used to describe the key value stores:
/// Mosaic cache types.
struct MosaicCacheTypes {
public:
using CacheReadOnlyType = ReadOnlyArtifactCache<BasicMosaicCacheView, BasicMosaicCacheDelta, MosaicId, state::MosaicEntry>;
// region secondary descriptors
public:
struct HeightGroupingTypesDescriptor {
public:
using KeyType = Height;
using ValueType = utils::IdentifierGroup<MosaicId, Height, utils::BaseValueHasher<MosaicId>>;
using Serializer = MosaicHeightGroupingSerializer;
public:
static auto GetKeyFromValue(const ValueType& heightMosaics) {
return heightMosaics.key();
}
};
// endregion
public:
using PrimaryTypes = MutableUnorderedMapAdapter<MosaicCacheDescriptor, utils::BaseValueHasher<MosaicId>>;
using HeightGroupingTypes = MutableUnorderedMapAdapter<HeightGroupingTypesDescriptor, utils::BaseValueHasher<Height>>;
public:
using BaseSetDeltaPointers = MosaicBaseSetDeltaPointers;
using BaseSets = MosaicBaseSets;
};
Here, we can see a secondary store with its own descriptor: HeightGroupingTypesDescriptor
.
Notice that a secondary store descriptor only requires KeyType
, ValueType
and Serializer
.
These have the same meanings as in the primary type descriptor discussed previously.
PrimaryTypes
is the primary store.
It is a unordered map with a key type of MosaicId
and a value type of MosaicEntry
(from MosaicCacheDescriptor
).
Notice that the Mutable
variant must be used because MosaicEntry
is mutable.
This will enable copy on write semantics.
utils::BaseValueHasher
is simply a hash object for base value so that it can be used in STL unordered containers.
HeightGroupingTypes
is a secondary store.
It is a unordered map with a key type of Height
and a value type of IdentifierGroup<...>
(from HeightGroupingTypesDescriptor
).
Notice that the Mutable
variant must be used because IdentifierGroup<...>
is mutable.
The underlying cache data is composed of one or more stores.
All of these stores - primary and secondary key value stores along with an optional Merkle Patricia Tree store - are stored in a BaseSets
structure:
struct MosaicBaseSets : public CacheDatabaseMixin {
public:
/// Indicates the set is not ordered.
using IsOrderedSet = std::false_type;
public:
explicit MosaicBaseSets(const CacheConfiguration& config)
: CacheDatabaseMixin(config, { "default", "height_grouping" })
, Primary(GetContainerMode(config), database(), 0)
, HeightGrouping(GetContainerMode(config), database(), 1)
, PatriciaTree(hasPatriciaTreeSupport(), database(), 2)
{}
public:
MosaicCacheTypes::PrimaryTypes::BaseSetType Primary;
MosaicCacheTypes::HeightGroupingTypes::BaseSetType HeightGrouping;
CachePatriciaTree<MosaicPatriciaTree> PatriciaTree;
...
CacheDatabaseMixin
is responsible for initializing the underlying cache storage.
Typically, this will be RocksDB (when enableCacheDatabaseStorage
is true).
The RocksDB column family names are provided as the second argument.
When enableVerifiableState
is true, an additional patricia_tree
column family will be added automatically.
IsOrderedSet
needs to be provided (as std::true_type
or std::false_type
) to indicate whether or not the data is ordered.
Here it is not.
This is needed in order to allow pruning of ordered sets.
ℹ️ In this article we're looking at the most common type of caches - data caches. However, there are also a few runtime caches - like a hash cache that stores recent confirmed transaction hashes. Due to the temporal nature of these caches, they need to be pruned. Also, they are not input into the blockchain state hash because they can be completely rebuilt easily by processing a relatively small finite number of blocks.
Now that we have all of our descriptors set up, we can actually define the cache.
The cache is simply a wrapper around the data contained in the BaseSets
.
It adds the ability to create different types of views on that data:
View
- Read only view of the data.Delta
- Writable view of the data that can be committed back and change the underlying data.
Importantly, all changes are made using copy on write semantics.
If they are not committed, they will be rejected.
For example, if a block contains some valid transactions and then an invalid one, after the invalid one is rejected all in-memory changes will be reverted.DetachedDelta
- Writable view of the data that CANNOT be committed back nor change the underlying data.
This view is used most importantly during block production.
The harvester needs to ensure that the block it creates is valid and does not contain any conflicting transactions.
In order to accomplish this, it validates and executes transactions against the blockchain state while it builds the block.Now that we understand views, let's look at the MosaicCache
:
/// Cache composed of mosaic information.
using BasicMosaicCache = BasicCache<MosaicCacheDescriptor, MosaicCacheTypes::BaseSets>;
/// Synchronized cache composed of mosaic information.
class MosaicCache : public SynchronizedCache<BasicMosaicCache> {
public:
DEFINE_CACHE_CONSTANTS(Mosaic)
public:
/// Creates a cache around \a config.
explicit MosaicCache(const CacheConfiguration& config) : SynchronizedCache<BasicMosaicCache>(BasicMosaicCache(config))
{}
};
This is really simple because nearly all of the logic is common across all caches and has been refactored.
BasicCache
defines four important functions:
createView
- Creates a read only view of the data.createDelta
- Creates a writable view of the data that can be committed back and change the underlying data.createDetachedDelta
- Creates a writable view of the data that CANNOT be committed back nor change the underlying data.commit
- Commits pending changes in a delta view to the underlying storage.SynchronizedCache
adds reader writer lock semantics across all data views.
View creation acquires a READ lock in all cases.
A WRITE lock is only acquired in commit after all changes are validated and pending in memory.
Commit effectively flushes the changes from memory to storage.
DEFINE_CACHE_CONSTANTS
simply derives a cache id and a cache name from the cache friendly name (Mosaic
) and makes them accessible from the cache.
Due to similarities across cache views, most of the common cache functionality is refactored into mixins. For example, if a cache wants to expose functionality to check the existence of an item, it can add the contains mixin:
/// Mixin for adding contains support to a cache.
template<typename TSet, typename TCacheDescriptor>
class ContainsMixin {
private:
using KeyType = typename TCacheDescriptor::KeyType;
public:
/// Creates a mixin around \a set.
explicit ContainsMixin(const TSet& set) : m_set(set)
{}
public:
/// Gets a value indicating whether or not the cache contains an element with \a key.
bool contains(const KeyType& key) const {
return m_set.contains(key);
}
private:
const TSet& m_set;
};
This mixin simply wraps a const reference to a set.
It then exposes a contains
function that simply delegates to the one on the set.
Looking at the mosaic cache view, we can see it is simply composed of a lot of mixins:
/// Mixins used by the mosaic cache view.
struct MosaicCacheViewMixins : public PatriciaTreeCacheMixins<MosaicCacheTypes::PrimaryTypes::BaseSetType, MosaicCacheDescriptor> {};
/// Basic view on top of the mosaic cache.
class BasicMosaicCacheView
: public utils::MoveOnly
, public MosaicCacheViewMixins::Size
, public MosaicCacheViewMixins::Contains
, public MosaicCacheViewMixins::Iteration
, public MosaicCacheViewMixins::ConstAccessor
, public MosaicCacheDeltaMixins::PatriciaTreeView
, public MosaicCacheViewMixins::ActivePredicate {
public:
using ReadOnlyView = MosaicCacheTypes::CacheReadOnlyType;
public:
/// Creates a view around \a mosaicSets.
explicit BasicMosaicCacheView(const MosaicCacheTypes::BaseSets& mosaicSets)
: MosaicCacheViewMixins::Size(mosaicSets.Primary)
, MosaicCacheViewMixins::Contains(mosaicSets.Primary)
, MosaicCacheViewMixins::Iteration(mosaicSets.Primary)
, MosaicCacheViewMixins::ConstAccessor(mosaicSets.Primary)
, MosaicCacheViewMixins::PatriciaTreeView(mosaicSets.PatriciaTree.get())
, MosaicCacheViewMixins::ActivePredicate(mosaicSets.Primary)
{}
};
/// View on top of the mosaic cache.
class MosaicCacheView : public ReadOnlyViewSupplier<BasicMosaicCacheView> {
public:
/// Creates a view around \a mosaicSets.
explicit MosaicCacheView(const MosaicCacheTypes::BaseSets& mosaicSets) : ReadOnlyViewSupplier(mosaicSets)
{}
};
PatriciaTreeCacheMixins
is a simple struct that contains mixin aliases.
Since mixins typically take multiple template arguments, this simplifies their use.
MosaicCacheViewMixins
further simplifies this by providing appropriate template arguments to PatriciaTreeCacheMixins
.
In the definition of BasicMosaicCacheView
, notice that the selected mixins are inherited.
Instances of MoveOnly
can only be moved but not copied accidentally.
The constructor accepts the BaseSets
discussed previously and constructs the inherited mixins.
Finally, MosaicCacheView
uses ReadOnlyViewSupplier
to add a function called asReadOnly
.
This returns a read only view (of type ReadOnlyView
).
The delta view exposes a similar function, which provides an easy way to write code that can interrogate both read only and writable cache views.
The definition of the cache delta view, should look very similar to that of cache read only view:
/// Mixins used by the mosaic cache delta.
struct MosaicCacheDeltaMixins
: public PatriciaTreeCacheMixins<MosaicCacheTypes::PrimaryTypes::BaseSetDeltaType, MosaicCacheDescriptor> {
using Touch = HeightBasedTouchMixin<
typename MosaicCacheTypes::PrimaryTypes::BaseSetDeltaType,
typename MosaicCacheTypes::HeightGroupingTypes::BaseSetDeltaType>;
};
/// Basic delta on top of the mosaic cache.
class BasicMosaicCacheDelta
: public utils::MoveOnly
, public MosaicCacheDeltaMixins::Size
, public MosaicCacheDeltaMixins::Contains
, public MosaicCacheDeltaMixins::ConstAccessor
, public MosaicCacheDeltaMixins::MutableAccessor
, public MosaicCacheDeltaMixins::PatriciaTreeDelta
, public MosaicCacheDeltaMixins::ActivePredicate
, public MosaicCacheDeltaMixins::BasicInsertRemove
, public MosaicCacheDeltaMixins::Touch
, public MosaicCacheDeltaMixins::DeltaElements {
public:
using ReadOnlyView = MosaicCacheTypes::CacheReadOnlyType;
public:
/// Creates a delta around \a mosaicSets.
explicit BasicMosaicCacheDelta(const MosaicCacheTypes::BaseSetDeltaPointers& mosaicSets);
public:
using MosaicCacheDeltaMixins::ConstAccessor::find;
using MosaicCacheDeltaMixins::MutableAccessor::find;
public:
/// Inserts the mosaic \a entry into the cache.
void insert(const state::MosaicEntry& entry);
/// Removes the value identified by \a mosaicId from the cache.
void remove(MosaicId mosaicId);
private:
MosaicCacheTypes::PrimaryTypes::BaseSetDeltaPointerType m_pEntryById;
MosaicCacheTypes::HeightGroupingTypes::BaseSetDeltaPointerType m_pMosaicIdsByExpiryHeight;
};
/// Delta on top of the mosaic cache.
class MosaicCacheDelta : public ReadOnlyViewSupplier<BasicMosaicCacheDelta> {
public:
/// Creates a delta around \a mosaicSets.
explicit MosaicCacheDelta(const MosaicCacheTypes::BaseSetDeltaPointers& mosaicSets) : ReadOnlyViewSupplier(mosaicSets)
{}
};
PatriciaTreeCacheMixins
is used again, but this time it is provided different template arguments.
It is passed BaseSetDeltaType
instead of BaseSetType
.
Additionally, an alias for a custom HeightBasedTouchMixin
is added.
BasicMosaicCacheDelta
is similar to BasicMosaicCacheView
.
Notice that it overrides insert
and remove
.
The custom implementations of these functions handle updating the secondary store, which groups mosaic ids by expiration height.
m_pEntryById
and m_pMosaicIdsByExpiryHeight
are pointers to the primary and secondary key value stores, which are assigned in the constructor.
MosaicCacheDelta
also uses ReadOnlyViewSupplier
to add a function called asReadOnly
.
This returns a read only view (of type ReadOnlyView
).
Importantly, the return type of this function is the same as in MosaicCacheView
.
ℹ️ Notice that the same
MosaicCacheDelta
type is used for bothDelta
andDetachedDelta
views. This allows observers to seamlessly work with either.
In order to support RocksDB, each primary and secondary store must have an associated serializer. For the mosaic cache, the following serializers are defined:
/// Primary serializer for mosaic cache.
struct MosaicEntryPrimarySerializer : public CacheSerializerAdapter<state::MosaicEntrySerializer, MosaicCacheDescriptor> {};
/// Serializer for mosaic cache height grouped elements.
struct MosaicHeightGroupingSerializer : public IdentifierGroupSerializer<MosaicCacheTypes::HeightGroupingTypesDescriptor> {};
Both primary and secondary store serializers must define Save
and Load
functions for serializing and deserializing values to and from streams.
Save
accepts a value and a stream and serializes the value to the stream.
Load
accepts a stream and deserializes the first value.
There is a slight difference between serializers for primary and secondary stores.
Primary serializers always write versioned data, but secondary stores do not.
Primary serializers must always be able to handle data from any point in the blockchain.
If they could not, state hashes would break.
In contrast, secondary stores can always be completely rebuilt from primary store data.
To support this, primary serializer must additionally define State_Version
, which indicates the version of the state.
Reviewing the MosaicEntrySerializer
we can see Save
, Load
and State_Version
all defined:
/// Policy for saving and loading mosaic entry data.
struct MosaicEntrySerializer {
/// Serialized state version.
static constexpr uint16_t State_Version = 1;
/// Saves \a entry to \a output.
static void Save(const MosaicEntry& entry, io::OutputStream& output);
/// Loads a single value from \a input.
static MosaicEntry Load(io::InputStream& input);
};
Save
and Load
do not do anything with State_Version
.
Notice that the MosaicEntrySerializer
is wrapped in a CacheSerializerAdapter
.
This adapter handles writing and checking State_Version
.
It also automatically prepares the streams passed to MosaicEntrySerializer
.
In contrast, IdentifierGroupSerializer<...>
is not wrapped in CacheSerializerAdapter
.
Nor does it define State_Version
.
Lastly, let's look at the MosaicPatriciaTree
:
using BasicMosaicPatriciaTree = tree::BasePatriciaTree<
SerializerHashedKeyEncoder<MosaicCacheDescriptor::Serializer>,
PatriciaTreeRdbDataSource,
utils::BaseValueHasher<MosaicId>>;
class MosaicPatriciaTree : public BasicMosaicPatriciaTree {
public:
using BasicMosaicPatriciaTree::BasicMosaicPatriciaTree;
using Serializer = MosaicCacheDescriptor::Serializer;
};
BasicMosaicPatriciaTree
is simply an alias for a BasePatriciaTree
with specific template arguments provided.
BasePatriciaTree
is a wrapper around a Merkle Patricia Tree implementation that gives it similar copy on write semantics to BaseSet
.
SerializerHashedKeyEncoder
defines a key encoder that converts a cache key into a Merkle Patricia Tree key.
For the mosaic cache, it accepts a MosaicId
and hashes it to produce a Merkle Patricia Tree key.
PatriciaTreeRdbDataSource
provides access to Merkle Patricia Tree nodes using RocksDB.
ℹ️ All the key value stores we've discussed are actually
BaseSet
s. Conceptually, aBaseSet
wraps a standard STL container and adds copy on write semantics. In reality, it does a bit more, including forwarding requests to RocksDB as appropriate. It is VERY template heavy code.
ℹ️ Why do we need a key encoder instead of using
MosaicId
directly as the Merkle Patricia Tree key? Since hashes are cryptographically random, using hashes leads to a more balanced tree, which leads to better performance. It also prevents attacks that could otherwise reduce performance by creating a very unbalanced tree.
Finally, note that the MosaicEntryPrimarySerializer
is used for serializing the values that are stored in the tree.
Importantly, this is the same serializer used to write the values to RocksDB.
Since the exact same serializer is used in both places, we can be sure that they are always consistent.
I hope you have a better understanding of how caches work in Symbol. I also hope you don't have too big of a headache!