Class RocksDBStoragePartition
- java.lang.Object
-
- com.linkedin.davinci.store.AbstractStoragePartition
-
- com.linkedin.davinci.store.rocksdb.RocksDBStoragePartition
-
- Direct Known Subclasses:
ReplicationMetadataRocksDBStoragePartition
@NotThreadSafe public class RocksDBStoragePartition extends AbstractStoragePartition
InRocksDBStoragePartition, it assumes the update(insert/delete) will happen sequentially. If the batch push is bytewise-sorted by key, this class is leveragingSstFileWriterto generate the SST file directly and ingest all the generated SST files into the RocksDB database at the end of the push. If the ingestion is unsorted, this class is using the regular RocksDB interface to support update operations.
-
-
Field Summary
Fields Modifier and Type Field Description protected booleanblobTransferEnabledprotected java.util.List<org.rocksdb.ColumnFamilyDescriptor>columnFamilyDescriptorsprotected java.util.List<org.rocksdb.ColumnFamilyHandle>columnFamilyHandleListColumn Family is the concept in RocksDB to create isolation between different value for the same key.protected booleandeferredWriteWhether the input is sorted or not.protected intpartitionIdprotected static org.rocksdb.ReadOptionsREAD_OPTIONS_DEFAULTprotected java.util.concurrent.locks.ReentrantReadWriteLockreadCloseRWLockSince all the modification functions are synchronized, we don't need any other synchronization for the update path to guard RocksDB closing behavior.protected booleanreadOnlyWhether the database is read only or not.protected booleanreadWriteLeaderForDefaultCFprotected booleanreadWriteLeaderForRMDCFprotected java.lang.StringreplicaIdprotected org.rocksdb.RocksDBrocksDBprotected java.lang.StringstoreNameprotected java.lang.StringstoreNameAndVersionprotected booleanwriteOnlyprotected org.rocksdb.WriteOptionswriteOptionsHere RocksDB disables WAL, but relies on the 'flush', which will be invoked throughsync()to avoid data loss during recovery.
-
Constructor Summary
Constructors Modifier Constructor Description RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, java.lang.String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, VeniceStoreVersionConfig storeConfig)protectedRocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, java.lang.String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, java.util.List<byte[]> columnFamilyNameList, VeniceStoreVersionConfig storeConfig)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description voidbeginBatchWrite(java.util.Map<java.lang.String,java.lang.String> checkpointedInfo, java.util.Optional<java.util.function.Supplier<byte[]>> expectedChecksumSupplier)booleancheckDatabaseIntegrity(java.util.Map<java.lang.String,java.lang.String> checkpointedInfo)checks whether the current state of the database is valid during the start of ingestion.voidclose()Close the specific partitionvoidcreateSnapshot()Creates a snapshot of the current state of the storage if the blob transfer feature is enabled via the store configurationvoiddelete(byte[] key)Delete a key from the partition databasevoiddeleteFilesInDirectory(java.lang.String fullPath)voiddrop()Drop when it is not required anymore.voidendBatchWrite()byte[]get(byte[] key)Get a value from the partition databasejava.nio.ByteBufferget(byte[] key, java.nio.ByteBuffer valueToBePopulated)byte[]get(java.nio.ByteBuffer keyBuffer)<K,V>
Vget(K key)Get a Value from the partition databasejava.util.Map<org.rocksdb.MemoryUsageType,java.lang.Long>getApproximateMemoryUsageByType(java.util.Set<org.rocksdb.Cache> caches)protected java.lang.BooleangetBlobTransferEnabled()voidgetByKeyPrefix(byte[] keyPrefix, BytesStreamingCallback callback)Populate provided callback with key-value pairs from the partition database where the keys have provided prefix.protected java.util.List<org.rocksdb.ColumnFamilyHandle>getColumnFamilyHandleList()protected org.rocksdb.EnvOptionsgetEnvOptions()java.lang.StringgetFullPathForTempSSTFileDir()AbstractStorageIteratorgetIterator()protected org.rocksdb.OptionsgetOptions()longgetPartitionSizeInBytes()Get the partition database size in byteslonggetRmdByteUsage()RocksDBSstFileWritergetRocksDBSstFileWriter()longgetRocksDBStatValue(java.lang.String statName)protected org.rocksdb.OptionsgetStoreOptions(StoragePartitionConfig storagePartitionConfig, boolean isRMD)protected voidmakeSureRocksDBIsStillOpen()java.util.List<byte[]>multiGet(java.util.List<byte[]> keys)java.util.List<java.nio.ByteBuffer>multiGet(java.util.List<java.nio.ByteBuffer> keys, java.util.List<java.nio.ByteBuffer> values)voidput(byte[] key, byte[] value)Puts a value into the partition databasevoidput(byte[] key, java.nio.ByteBuffer valueBuffer)<K,V>
voidput(K key, V value)voidreopen()Reopen the underlying RocksDB database, and this operation will unload the data cached in memory.java.util.Map<java.lang.String,java.lang.String>sync()Sync current database.booleanvalidateBatchIngestion()booleanverifyConfig(StoragePartitionConfig partitionConfig)-
Methods inherited from class com.linkedin.davinci.store.AbstractStoragePartition
deleteWithReplicationMetadata, getPartitionId, getReplicationMetadata, putReplicationMetadata, putWithReplicationMetadata, putWithReplicationMetadata
-
-
-
-
Field Detail
-
READ_OPTIONS_DEFAULT
protected static final org.rocksdb.ReadOptions READ_OPTIONS_DEFAULT
-
writeOptions
protected final org.rocksdb.WriteOptions writeOptions
Here RocksDB disables WAL, but relies on the 'flush', which will be invoked throughsync()to avoid data loss during recovery.
-
replicaId
protected final java.lang.String replicaId
-
storeName
protected final java.lang.String storeName
-
storeNameAndVersion
protected final java.lang.String storeNameAndVersion
-
blobTransferEnabled
protected final boolean blobTransferEnabled
-
partitionId
protected final int partitionId
-
readCloseRWLock
protected final java.util.concurrent.locks.ReentrantReadWriteLock readCloseRWLock
Since all the modification functions are synchronized, we don't need any other synchronization for the update path to guard RocksDB closing behavior. The followingreadCloseRWLockis only used to guardget(byte[])since we don't want to synchronize get requests.
-
rocksDB
protected org.rocksdb.RocksDB rocksDB
-
deferredWrite
protected final boolean deferredWrite
Whether the input is sorted or not.
deferredWrite = sortedInput => ingested via batch push which is sorted in VPJ, can useRocksDBSstFileWriterto ingest the input data to RocksDB
!deferredWrite = !sortedInput => can not use RocksDBSstFileWriter for ingestion
-
readOnly
protected final boolean readOnly
Whether the database is read only or not.
-
writeOnly
protected final boolean writeOnly
-
readWriteLeaderForDefaultCF
protected final boolean readWriteLeaderForDefaultCF
-
readWriteLeaderForRMDCF
protected final boolean readWriteLeaderForRMDCF
-
columnFamilyHandleList
protected final java.util.List<org.rocksdb.ColumnFamilyHandle> columnFamilyHandleList
Column Family is the concept in RocksDB to create isolation between different value for the same key. All KVs are stored in `DEFAULT` column family, if no column family is specified. If we stores replication metadata in the RocksDB, we stored it in a separated column family. We will insert all the column family descriptors into columnFamilyDescriptors and pass it to RocksDB when opening the store, and it will fill the columnFamilyHandles with handles which will be used when we want to put/get/delete from different RocksDB column families.
-
columnFamilyDescriptors
protected final java.util.List<org.rocksdb.ColumnFamilyDescriptor> columnFamilyDescriptors
-
-
Constructor Detail
-
RocksDBStoragePartition
protected RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, java.lang.String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, java.util.List<byte[]> columnFamilyNameList, VeniceStoreVersionConfig storeConfig)
-
RocksDBStoragePartition
public RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, java.lang.String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, VeniceStoreVersionConfig storeConfig)
-
-
Method Detail
-
makeSureRocksDBIsStillOpen
protected void makeSureRocksDBIsStillOpen()
-
getEnvOptions
protected org.rocksdb.EnvOptions getEnvOptions()
-
getBlobTransferEnabled
protected java.lang.Boolean getBlobTransferEnabled()
-
getStoreOptions
protected org.rocksdb.Options getStoreOptions(StoragePartitionConfig storagePartitionConfig, boolean isRMD)
-
getColumnFamilyHandleList
protected java.util.List<org.rocksdb.ColumnFamilyHandle> getColumnFamilyHandleList()
-
getRmdByteUsage
public long getRmdByteUsage()
- Overrides:
getRmdByteUsagein classAbstractStoragePartition
-
checkDatabaseIntegrity
public boolean checkDatabaseIntegrity(java.util.Map<java.lang.String,java.lang.String> checkpointedInfo)
Description copied from class:AbstractStoragePartitionchecks whether the current state of the database is valid during the start of ingestion.- Overrides:
checkDatabaseIntegrityin classAbstractStoragePartition
-
beginBatchWrite
public void beginBatchWrite(java.util.Map<java.lang.String,java.lang.String> checkpointedInfo, java.util.Optional<java.util.function.Supplier<byte[]>> expectedChecksumSupplier)- Overrides:
beginBatchWritein classAbstractStoragePartition
-
endBatchWrite
public void endBatchWrite()
- Overrides:
endBatchWritein classAbstractStoragePartition
-
createSnapshot
public void createSnapshot()
Description copied from class:AbstractStoragePartitionCreates a snapshot of the current state of the storage if the blob transfer feature is enabled via the store configuration- Specified by:
createSnapshotin classAbstractStoragePartition
-
put
public void put(byte[] key, byte[] value)Description copied from class:AbstractStoragePartitionPuts a value into the partition database- Specified by:
putin classAbstractStoragePartition
-
put
public void put(byte[] key, java.nio.ByteBuffer valueBuffer)- Specified by:
putin classAbstractStoragePartition
-
put
public <K,V> void put(K key, V value)- Specified by:
putin classAbstractStoragePartition
-
get
public byte[] get(byte[] key)
Description copied from class:AbstractStoragePartitionGet a value from the partition database- Specified by:
getin classAbstractStoragePartition- Parameters:
key- key to be retrieved- Returns:
- null if the key does not exist, byte[] value if it exists.
-
get
public java.nio.ByteBuffer get(byte[] key, java.nio.ByteBuffer valueToBePopulated)- Overrides:
getin classAbstractStoragePartition
-
get
public <K,V> V get(K key)
Description copied from class:AbstractStoragePartitionGet a Value from the partition database- Specified by:
getin classAbstractStoragePartition- Type Parameters:
K- the type for KeyV- the type for the return value- Parameters:
key- key to be retrieved- Returns:
- null if the key does not exist, V value if it exists
-
get
public byte[] get(java.nio.ByteBuffer keyBuffer)
- Specified by:
getin classAbstractStoragePartition
-
multiGet
public java.util.List<byte[]> multiGet(java.util.List<byte[]> keys)
-
multiGet
public java.util.List<java.nio.ByteBuffer> multiGet(java.util.List<java.nio.ByteBuffer> keys, java.util.List<java.nio.ByteBuffer> values)
-
getByKeyPrefix
public void getByKeyPrefix(byte[] keyPrefix, BytesStreamingCallback callback)Description copied from class:AbstractStoragePartitionPopulate provided callback with key-value pairs from the partition database where the keys have provided prefix. If prefix is null, callback will be populated will all key-value pairs from the partition database.- Specified by:
getByKeyPrefixin classAbstractStoragePartition
-
validateBatchIngestion
public boolean validateBatchIngestion()
- Overrides:
validateBatchIngestionin classAbstractStoragePartition
-
delete
public void delete(byte[] key)
Description copied from class:AbstractStoragePartitionDelete a key from the partition database- Specified by:
deletein classAbstractStoragePartition
-
sync
public java.util.Map<java.lang.String,java.lang.String> sync()
Description copied from class:AbstractStoragePartitionSync current database.- Specified by:
syncin classAbstractStoragePartition- Returns:
- Database related info, which is required to be checkpointed.
-
deleteFilesInDirectory
public void deleteFilesInDirectory(java.lang.String fullPath)
-
drop
public void drop()
Description copied from class:AbstractStoragePartitionDrop when it is not required anymore.- Specified by:
dropin classAbstractStoragePartition
-
close
public void close()
Description copied from class:AbstractStoragePartitionClose the specific partition- Specified by:
closein classAbstractStoragePartition
-
reopen
public void reopen()
Reopen the underlying RocksDB database, and this operation will unload the data cached in memory.- Overrides:
reopenin classAbstractStoragePartition
-
getRocksDBStatValue
public long getRocksDBStatValue(java.lang.String statName)
-
getApproximateMemoryUsageByType
public java.util.Map<org.rocksdb.MemoryUsageType,java.lang.Long> getApproximateMemoryUsageByType(java.util.Set<org.rocksdb.Cache> caches)
-
verifyConfig
public boolean verifyConfig(StoragePartitionConfig partitionConfig)
- Specified by:
verifyConfigin classAbstractStoragePartition- Parameters:
partitionConfig-- Returns:
-
getPartitionSizeInBytes
public long getPartitionSizeInBytes()
Description copied from class:AbstractStoragePartitionGet the partition database size in bytes- Specified by:
getPartitionSizeInBytesin classAbstractStoragePartition- Returns:
- partition database size
-
getOptions
protected org.rocksdb.Options getOptions()
-
getFullPathForTempSSTFileDir
public java.lang.String getFullPathForTempSSTFileDir()
-
getRocksDBSstFileWriter
public RocksDBSstFileWriter getRocksDBSstFileWriter()
-
getIterator
public AbstractStorageIterator getIterator()
- Overrides:
getIteratorin classAbstractStoragePartition
-
-