Secondary Storage
Secondary (warm) storage provides a low-cost means to extend primary storage. As primary storage is typically based on fast but expensive NVMe technology, secondary storage can be used to extend local storage capacity using lower cost SSDs or spinning disks. Because it uses lower cost technology, secondary storage often has much higher capacity than primary storage. Only segment files get moved to the secondary storage. Older files get moved before more recent ones.
Files moved based on available disk space
When secondary storage is configured using the SECONDARY_DATA_DIRECTORY option, LogScale moves segment files to secondary storage once the primary disk reaches the usage threshold set using PRIMARY_STORAGE_PERCENTAGE. LogScale does not check what is using the space; it bases the decision on the OS response to system calls for "disk space used" and "disk space total", for the mount point that the primary data directory is on.
When the PRIMARY_STORAGE_PERCENTAGE threshold is exceeded, LogScale copies files totaling the excess number of bytes to the secondary storage, and then deletes the segment files from the primary data directory. The files are selected based on the latest event timestamp in them, so that the most recent events stay on the primary disk. This is done to get the best possible query performance from the assumed faster primary drive, since LogScale is normally used for querying the latest data.
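The threshold check described above can be approximated from the command line. The following is a minimal sketch, not LogScale's actual implementation: the function name excess_over_threshold and the DATA_DIR variable are hypothetical, and df is used here only as a stand-in for the OS calls LogScale makes. Values are in KiB because df -P -k reports 1024-byte blocks.

```shell
#!/bin/sh
# excess_over_threshold USED TOTAL THRESHOLD_PERCENT
# Prints how much of USED lies above the threshold (0 if below it).
# Units are whatever the caller passes in (KiB in the example below).
excess_over_threshold() {
    used=$1 total=$2 pct=$3
    limit=$(( total * pct / 100 ))
    if [ "$used" -gt "$limit" ]; then
        echo $(( used - limit ))
    else
        echo 0
    fi
}

# Hypothetical mount point; point DATA_DIR at the primary data directory.
DATA_DIR=${DATA_DIR:-/}

# df -P -k prints total blocks in column 2 and used blocks in column 3.
df -P -k "$DATA_DIR" | awk 'NR==2 {print $3, $2}' | {
    read -r used_kb total_kb
    echo "KiB above an 80% threshold: $(excess_over_threshold "$used_kb" "$total_kb" 80)"
}
```

A non-zero result is roughly the amount of segment data that would be eligible to move to secondary storage at that moment, under the assumptions stated above.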
The extra storage gained is thus almost the available space of the secondary data directory, as only a single segment file is ever present on both volumes at once.
The secondary directory needs to be private to the LogScale node, just like the primary directory does.
Note
Never share data directories across LogScale nodes.
As an example, suppose you have a server with a 1 TB NVMe drive used for system files and LogScale data. Adding a 2 TB SAN connection (or 2x2 TB local spinning disks in a mirror) and designating it as the secondary storage directory enables LogScale to store up to 2.8 TB, while still querying the latest ~800 GB from the NVMe and keeping all segment files still under construction on the NVMe. When searching beyond what the NVMe holds, LogScale reads from the slower disks.
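The 2.8 TB figure follows from the default 80% threshold on the primary disk plus the full secondary volume. A small sketch of that arithmetic, using a hypothetical helper name and sizes in GB:

```shell
#!/bin/sh
# usable_capacity_gb PRIMARY_GB THRESHOLD_PERCENT SECONDARY_GB
# Primary space usable up to the threshold, plus the whole secondary volume.
usable_capacity_gb() {
    echo $(( $1 * $2 / 100 + $3 ))
}

# 1 TB NVMe at 80% (800 GB) plus 2 TB secondary.
usable_capacity_gb 1000 80 2000   # prints 2800
```

This ignores space taken by system files on the NVMe, which count toward the threshold and reduce what is left for LogScale data.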
Configuration
Secondary storage is configured using the SECONDARY_DATA_DIRECTORY variable, as shown in the following example:
# SECONDARY_DATA_DIRECTORY enables the feature
# and sets where to store the files.
SECONDARY_DATA_DIRECTORY=/secondaryMountPoint/humio-data2
# PRIMARY_STORAGE_PERCENTAGE sets the percentage of the drive holding the
# primary data directory that must be in use (by LogScale data or otherwise)
# before LogScale moves any segment files to the secondary location.
# If set to zero, LogScale will move files to secondary as soon as possible,
# which is when they become immutable completed segment files.
# (Default 80)
PRIMARY_STORAGE_PERCENTAGE=80
Note
When using Docker, make sure to mount the secondary directory into the container as well.
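As an illustration of the Docker note, the sketch below prints a docker run command with both data directories bind-mounted. The host paths, the choice to use identical paths inside the container, and the image name are all assumptions for this example; substitute the values from your own deployment. The command is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical paths and image name; adjust for your deployment.
PRIMARY_DIR=/data/humio-data
SECONDARY_DIR=/secondaryMountPoint/humio-data2
IMAGE=humio/humio-core

# Prints a docker command that mounts BOTH directories into the container
# and points SECONDARY_DATA_DIRECTORY at the in-container secondary path.
docker_cmd() {
    printf 'docker run -v %s:%s -v %s:%s -e SECONDARY_DATA_DIRECTORY=%s %s\n' \
        "$PRIMARY_DIR" "$PRIMARY_DIR" \
        "$SECONDARY_DIR" "$SECONDARY_DIR" \
        "$SECONDARY_DIR" "$IMAGE"
}

docker_cmd
```

The point of the second -v flag is that SECONDARY_DATA_DIRECTORY refers to a path as seen inside the container, so that path must be backed by the host's secondary volume.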
Add a new primary and make the current primary disk secondary
If you have a relatively slow disk as your primary storage, you might want to add a new, faster disk to the server, and use that disk as the primary storage, while leaving the bulk of the data on the old slow disk.
While this is possible, there is a process involved, as only completed segment files can reside in the secondary storage. All other support files, and segment files in progress (mini-segments), need to reside on the primary disk. LogScale must be shut down while this operation takes place.
Only files matching humiodata.* (and bloom5*) can stay on the secondary storage; everything else must be on the primary. You must move the soft links */humiodata.current along with the files they point to.
You will need to move some specific files from the "new secondary" (old primary) onto the "new primary" while the system is shut down for that to work, as some files must be on the primary. The list below shows their names as they appear in /humio-data. The directory structure must be preserved. The files to move are:
Files matching dataspace_*/datasource_*/humiodata.current
For all the above humiodata.current soft links, the file the soft link points to as well
The uploadedfiles directory
The global-data-snapshot.json file
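The list above can be turned into a reviewable inventory before anything is moved. The following sketch only prints paths; it does not move or modify files. The function name list_primary_files is hypothetical, and the sketch assumes each humiodata.current soft link has a relative target in the same directory (absolute targets are reported on stderr instead).

```shell
#!/bin/sh
# list_primary_files OLD_PRIMARY
# Prints, relative to OLD_PRIMARY, the paths that must live on the primary disk.
list_primary_files() {
    (
        cd "$1" || exit 1
        for link in dataspace_*/datasource_*/humiodata.current; do
            # Skip the unexpanded pattern and anything that is not a soft link.
            [ -L "$link" ] || continue
            printf '%s\n' "$link"
            target=$(readlink "$link")
            case "$target" in
                /*) printf 'absolute link target, check manually: %s\n' "$target" >&2 ;;
                *)  printf '%s\n' "$(dirname "$link")/$target" ;;
            esac
        done
        if [ -d uploadedfiles ]; then printf '%s\n' uploadedfiles; fi
        if [ -f global-data-snapshot.json ]; then printf '%s\n' global-data-snapshot.json; fi
        exit 0
    )
}

# Example (path is a placeholder): list_primary_files /humio-data > files-to-move.txt
```

Because the paths are relative to the old primary directory, the resulting list preserves the directory structure and could be fed to a copy tool that accepts a file list, for example rsync with --files-from. Note that with --files-from, rsync's -a option does not imply recursion, so the uploadedfiles directory would also need -r.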
If the above files are moved from the secondary to the primary, you may leave the remaining segment files where they are and start out with almost all data on secondary. Alternatively, you can move selected completed segment files from secondary to primary as well, to get improved performance from the new disk on searches that hit them. For example, you could move all segments that are less than seven days old if that matches the typical search range for the system.
LogScale does not move files from secondary back to primary. Once the primary is full, LogScale starts migrating segment files from primary to secondary.
Migrating storage directories
When expanding the capacity for primary or secondary storage, and online expansion is not an option, you can move the existing data stored on the partitions and minimize downtime using rsync. Rsync allows you to sync only the new data between directories.
Assuming you are moving secondary data storage from /var/lib/humio-secondary to /var/lib/humio-new-secondary, you can do an initial rsync while LogScale is running. Make sure your new mount has its owner and group set appropriately, generally humio:humio.
rsync -acv /var/lib/humio-secondary/ /var/lib/humio-new-secondary/
This can be run multiple times.
When you are ready to complete the move, start by stopping LogScale:
systemctl stop humio
This command may differ depending on your LogScale deployment. Then run rsync once more to copy the data written since your last rsync; the --delete option removes files from the destination that no longer exist in the source. Double-check your source and destination directories before running it.
rsync -acv --delete /var/lib/humio-secondary/ /var/lib/humio-new-secondary/
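Before pointing LogScale at the new location, it can be reassuring to confirm that nothing is left to transfer. One way to do that, sketched below under the assumption that rsync is installed, is a checksum-based dry run (-n) with itemized output (--itemize-changes): if it prints nothing, source and destination match. The function name verify_sync is hypothetical.

```shell
#!/bin/sh
# verify_sync SRC DST
# Returns 0 if a dry run finds nothing to copy or delete,
# 1 if differences remain (they are printed), 2 if rsync itself fails.
verify_sync() {
    pending=$(rsync -acn --delete --itemize-changes "$1/" "$2/") || return 2
    if [ -n "$pending" ]; then
        printf '%s\n' "$pending"
        return 1
    fi
}
```

With the directories from this example, the check would be run as verify_sync /var/lib/humio-secondary /var/lib/humio-new-secondary while LogScale is still stopped.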
There are two options for having LogScale use the new partition: you can either update the LogScale configuration to use the new mount, or you can mount the new partition in place of the old partition's mount point.
To update the LogScale configuration appropriately, change the value of either the DIRECTORY or SECONDARY_DATA_DIRECTORY option in the /etc/humio/server_XX.conf file to point to the new mount. Then restart LogScale:
systemctl start humio
To unmount the old partition and mount the new partition in its place, first make sure that the appropriate changes are made for your OS, such as updating /etc/fstab, then mount the new partition in the location of the old one. Once that's done, start LogScale.