Media database performance and recommendations

When designing or specifying an XProtect VMS system, many variables affect the load and performance of the recording server and storage system, and thus the design of the XProtect VMS. For instance:

Devices
- Video resolution, video codec, framerate, bitrate, etc.
Recording server
- Recording server specifications
- Number of devices per server
Storage solution
- Disks or combination of disks to use
- RAID level
- Use of archiving
When and what to record
- Record – always, record on motion, record on event, or a combination
- Record – video, audio, metadata, or a combination
How many users are expected to view recorded media at the same time

Recording database

When choosing a storage system for the recording database, it would be ideal if the XProtect VMS requirements could be calculated and compared with the storage system’s performance as listed in a table or specification sheet, for instance as IOPS, since IOPS is often considered a good metric. Unfortunately, an IOPS figure only makes sense if all of the following are true:

The size of the data block being written by the application matches the size used for the IOPS specified for the storage system
The size of the data blocks being written is of a constant size
The access time (or response time) is known – which depends on knowing the level of sequential/non-sequential disk access
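To illustrate why the block size matters, the sketch below converts a recording throughput into the write IOPS it would require if every write used one fixed block size (throughput = IOPS × block size). The camera count, bitrate, and block sizes are hypothetical values chosen for illustration; they are not the block sizes used by the XProtect media database.

```python
def iops_needed(bitrate_mbit_s: float, block_size_kib: float) -> float:
    """Write operations per second needed to sustain a bitrate using fixed-size blocks."""
    bytes_per_second = bitrate_mbit_s * 1_000_000 / 8  # Mbit/s -> bytes/s
    block_bytes = block_size_kib * 1024                # KiB -> bytes
    return bytes_per_second / block_bytes


total_mbit_s = 200 * 4  # hypothetical: 200 cameras at 4 Mbit/s each
for block_kib in (4, 64, 512):
    print(f"{block_kib:>3} KiB blocks: {iops_needed(total_mbit_s, block_kib):,.0f} IOPS")
```

For the same 800 Mbit/s of data, moving from 4 KiB to 512 KiB blocks changes the required IOPS by a factor of 128, which is why an IOPS figure measured with one block size says little about a workload with varying block sizes.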

In the real world, an XProtect VMS system uses the storage system far more dynamically, which makes IOPS meaningless as a metric:

Data blocks vary in size from large blocks of video data to very small blocks of metadata
In addition to the recorded media data, the media database makes small updates to index files at frequent but variable intervals
Video, audio, and metadata streams are not of a fixed size; they vary over time depending on what the camera, microphone, or sensor captures
The disk access time varies over time depending on what type of data is being written

Because the specific application load and storage performance cannot be known or calculated, it is not possible to give an exact answer to what storage performance is needed.

However, based on knowledge of how the XProtect VMS and media databases work, as well as test results and real-world experience, the tables below give a general idea of the recommended disk types as a function of cameras per recording server and bitrate/framerate.

Disk recommendations

The ‘per camera’ units in the tables below differ between MPEG and MJPEG cameras because, for MPEG devices, video is by default received as one GOP per second regardless of the framerate, and is thus stored as a single record per second in the media database no matter the framerate. With MPEG, it is therefore mostly the bitrate that matters, not the framerate.

With MJPEG each individual image is stored as a separate record in the media database. So with MJPEG the framerate has a much higher impact on the disk performance than the bitrate.
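The difference can be expressed as a simplified model of how many records each camera adds to the media database per second. This is a sketch based only on the two paragraphs above; the function name is hypothetical, and the model ignores record size, audio, and metadata.

```python
def records_per_second(codec: str, framerate: float, gop_seconds: float = 1.0) -> float:
    """Approximate media database records written per second by one camera.

    Simplified model: MJPEG stores every image as its own record, while
    MPEG-4/H.264/H.265 store one record per GOP (1 second by default).
    """
    if codec.strip().upper() == "MJPEG":
        return float(framerate)  # one record per image
    return 1.0 / gop_seconds  # one record per GOP, independent of framerate


# Usage: 100 cameras at 15 fps with each codec
mjpeg_total = sum(records_per_second("MJPEG", 15) for _ in range(100))
h264_total = sum(records_per_second("H.264", 15) for _ in range(100))
print(mjpeg_total, h264_total)
```

With 100 cameras at 15 fps, the MJPEG cameras produce 1,500 records per second versus 100 for H.264, while the H.264 record count would stay the same at 30 fps.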

Recording video only

The tables below show the general disk recommendations for various scenarios.

The following is assumed:

Pre-buffer is running in memory
Only video is recorded - no audio or metadata
Video is only recorded when motion is detected
Recordings are not archived
When HDDs are used, they run at 10,000 RPM or more and have a low average access time
Storage is configured using RAID 1

Note: If archiving is enabled, the recommendations in the tables above can still be used by dividing the number of cameras listed in the tables by two.
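Applied to a value read from the tables, the note amounts to a one-line adjustment. The function name is hypothetical, and rounding down for odd numbers is an assumption that keeps the estimate conservative.

```python
def cameras_per_server(table_value: int, archiving: bool) -> int:
    """Halve the table's camera count when archiving is enabled (rule of thumb from the note).

    Floor division rounds odd values down, erring on the conservative side.
    """
    return table_value // 2 if archiving else table_value


print(cameras_per_server(300, archiving=True))
```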

Recording video, audio, and metadata

When audio and metadata are recorded as well, the load on the storage system increases, not so much because of the increase in total recorded bitrate, but because each additional audio and metadata device produces data that must also be stored on the disk. With more devices, disk access becomes more non-sequential, which slows HDDs down.

So when recording audio and metadata in addition to video, a faster storage system is needed, even though the bitrate increases only slightly in percentage terms.
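A rough comparison makes the point concrete. The sketch counts the devices writing to disk and the total bitrate for a number of units, using the ‘unit’ definition from this section (a camera, a microphone, a speaker, and a metadata source). The per-device bitrates are hypothetical values chosen for illustration only.

```python
# Hypothetical per-device bitrates in kbit/s, for illustration only
VIDEO_KBIT_S = 4000
AUDIO_KBIT_S = 64      # assumed for both microphone and speaker
METADATA_KBIT_S = 20


def unit_load(units: int, with_audio_and_metadata: bool) -> tuple[int, int]:
    """Return (devices writing to disk, total kbit/s) for the given number of units."""
    if not with_audio_and_metadata:
        return units, units * VIDEO_KBIT_S
    devices_per_unit = 4  # camera + microphone + speaker + metadata source
    kbit_s_per_unit = VIDEO_KBIT_S + 2 * AUDIO_KBIT_S + METADATA_KBIT_S
    return units * devices_per_unit, units * kbit_s_per_unit


video_devices, video_kbit = unit_load(100, with_audio_and_metadata=False)
all_devices, all_kbit = unit_load(100, with_audio_and_metadata=True)
print(f"devices: {video_devices} -> {all_devices}")
print(f"bitrate increase: {100 * (all_kbit / video_kbit - 1):.1f}%")
```

With these assumed values, the number of devices writing to disk quadruples while the total bitrate grows by less than 4%.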

The tables below show the general disk recommendations for various scenarios.

The following is assumed:

Pre-buffer is running in memory
A camera, a microphone, a speaker, and a metadata source are listed as a ‘unit’ in the tables
Video, audio and metadata are only recorded when motion is detected on the video stream
Recordings are not archived
When HDDs are used, they run at 10,000 RPM or more and have a low average access time
Storage is configured using RAID 1

Note: If archiving is enabled, the recommendations in the tables above can still be used by dividing the number of cameras listed in the tables by two.

Archiving

As with the recording database, the recommendations for using archiving depend on the needs and requirements.

In general, with many devices and long retention times, archiving is recommended. The tables below give a general recommendation for when to use archiving in various scenarios; archiving is recommended where a cell is marked with a checkmark.

Assuming the archive storage system is fast enough to write the needed data sequentially in the time between archiving runs, all disk and storage technologies and RAID levels are equally suitable for storing the archive database.
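Whether the archive storage is fast enough can be estimated from the data that accumulates between archiving runs. The sketch below assumes continuous recording at an average total bitrate and a fixed time window in which each archiving run must complete; the function and its parameters are hypothetical, and motion-based recording will usually produce less data than this estimate.

```python
def archive_write_rate_mb_s(total_bitrate_mbit_s: float,
                            hours_between_archives: float,
                            hours_available: float) -> float:
    """Sequential write rate (MB/s) needed to archive one interval's recordings in time."""
    if not 0 < hours_available <= hours_between_archives:
        raise ValueError("hours_available must be positive and at most hours_between_archives")
    data_mb = total_bitrate_mbit_s / 8 * hours_between_archives * 3600  # MB recorded per interval
    return data_mb / (hours_available * 3600)


# Hypothetical: 800 Mbit/s recorded, archived once a day, each run must finish within 12 hours
print(archive_write_rate_mb_s(800, 24, 12))
```

In this hypothetical case the archive storage must sustain about 200 MB/s of sequential writes; allowing the full 24 hours halves that to the average recording rate of 100 MB/s.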

Reducing framerate when archiving

If using MJPEG and storing the recordings for a long period, reducing the framerate over time can be a good way to reduce the amount of storage needed.

When using MPEG-4, H.264, or H.265, the storage reduction is typically smaller, because the keyframe in each GOP can account for 50-80% of the GOP’s data, leaving comparatively little to remove. Furthermore, the process of archiving the recordings puts extra load on the recording servers and storage system, so in most cases the benefit does not outweigh the cost.
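The difference in savings can be estimated with simple arithmetic. The sketch assumes that, for GOP-based codecs, only the keyframes are kept when the framerate is reduced, and that MJPEG images are of roughly equal size; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def mjpeg_saving(original_fps: float, reduced_fps: float) -> float:
    """Fraction of storage saved when reducing an MJPEG framerate (every image is independent)."""
    return 1 - reduced_fps / original_fps


def gop_keyframe_only_saving(keyframe_share: float) -> float:
    """Fraction of storage saved when only the keyframe of each GOP is kept."""
    return 1 - keyframe_share


print(f"MJPEG 15 -> 5 fps: {mjpeg_saving(15, 5):.0%} saved")
for share in (0.5, 0.8):  # keyframe share range mentioned in the text
    print(f"GOP, keyframe {share:.0%} of data: {gop_keyframe_only_saving(share):.0%} saved")
```

Reducing MJPEG from 15 to 5 fps saves about two thirds, while keeping only the keyframes of a GOP saves 20-50% for the 50-80% keyframe share, before the extra archiving load is taken into account.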

Multiple storage configurations

When storing recordings for more than 20,000 ‘device-days’ it is recommended to define two or more storage configurations in the recording server and distribute the devices across them. These storage configurations do not need to record to separate disks. They can use the same disk(s) as long as the recordings are stored in different folders.

The reason for this recommendation is that large-scale, long-term testing has shown that disk performance in Microsoft Windows slowly degrades once a very high number of folders and subfolders exist in the path used for recording.

A ‘device-day’ equals a single device storing recordings for one day, so ‘device-days’ are calculated by multiplying the number of devices by the retention time in days.

For example, 400 cameras, 400 microphones, and 400 speakers with 30 days of retention in the same storage configuration give (400 + 400 + 400) * 30 = 36,000 ‘device-days’.

If instead two storage configurations sharing the same drive are used, each with 200 cameras, 200 microphones, and 200 speakers, the ‘device-days’ drop to 18,000 per storage configuration, which keeps each below the 20,000 threshold and ensures consistently high performance.
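The calculation can be sketched as follows. Using ceil(device-days / 20,000) as the number of storage configurations is an assumption that extends the 20,000 guideline and presumes the devices are distributed evenly; the function names are hypothetical.

```python
import math

DEVICE_DAYS_THRESHOLD = 20_000  # guideline from the text


def device_days(devices: int, retention_days: int) -> int:
    """Number of devices multiplied by retention time in days."""
    return devices * retention_days


def storage_configurations_needed(devices: int, retention_days: int) -> int:
    """Smallest number of evenly loaded storage configurations at or below the threshold."""
    total = device_days(devices, retention_days)
    return max(1, math.ceil(total / DEVICE_DAYS_THRESHOLD))


devices = 400 + 400 + 400  # cameras + microphones + speakers
configs = storage_configurations_needed(devices, 30)
print(device_days(devices, 30), configs, device_days(devices, 30) // configs)
```

For the example above this gives 36,000 device-days, two storage configurations, and 18,000 device-days per configuration.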

Number of recording servers

In XProtect VMS installations with thousands of cameras it may seem like a good idea to run as many cameras per recording server as possible to keep the number of servers and storage systems as low as possible.

However, although a single recording server can run more than 500 cameras with related audio and metadata devices enabled, it is often more cost efficient to run fewer cameras per server.

The reason for this is that the ratio between cost and performance for the servers and storage systems doesn’t scale linearly. Moreover, the performance of the individual disks in the storage system also doesn’t scale linearly with more devices recording to it as the disk access becomes even more fragmented and non-sequential.

That said, a recording server should not run as few cameras as possible either, as that also becomes expensive and requires many servers, more physical space, and more maintenance. Depending on the configuration, the sweet spot between performance and cost is typically 200-300 cameras per recording server.
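A first estimate of the number of recording servers follows from the total camera count and the chosen number of cameras per server. The default of 250 is simply the midpoint of the 200-300 range above, the function name is hypothetical, and failover recording servers are not included.

```python
import math


def recording_servers_needed(total_cameras: int, cameras_per_server: int = 250) -> int:
    """Recording servers needed, rounding up so no server exceeds cameras_per_server."""
    if cameras_per_server <= 0:
        raise ValueError("cameras_per_server must be positive")
    return math.ceil(total_cameras / cameras_per_server)


for per_server in (200, 250, 300, 500):
    print(per_server, recording_servers_needed(3000, per_server))
```

For 3,000 cameras, 200-300 cameras per server means 10-15 servers, compared with 6 servers at 500 cameras each.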

Another advantage of not running as many cameras as possible per recording server is that it is cheaper to expand the installation with more recording servers, use failover recording servers, or replace broken servers.

Dual stream recording

All XProtect VMS products support dual stream recording, a feature that records two different streams from the same camera. From the storage system’s perspective, enabling dual stream recording corresponds to adding another camera to the VMS, because the second recorded stream has its own media database and thus increases the storage system load.

When enabled, the XProtect VMS Smart Client can automatically select which recorded stream to playback, or alternatively, the user can manually select which one to playback.
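Because each recorded stream has its own media database, the storage load of dual stream recording can be approximated by counting media databases rather than cameras. This is a simplified sketch with a hypothetical function name; it counts databases only and ignores that the second stream often has a different bitrate.

```python
def media_databases(cameras: int, dual_stream_cameras: int = 0, other_devices: int = 0) -> int:
    """Count media databases: one per camera, one more per camera with dual stream
    recording enabled, plus one per other recording device (e.g. microphones)."""
    if not 0 <= dual_stream_cameras <= cameras:
        raise ValueError("dual_stream_cameras must be between 0 and cameras")
    return cameras + dual_stream_cameras + other_devices


print(media_databases(200, dual_stream_cameras=200))
```

From the storage perspective, 200 cameras with dual stream recording enabled on all of them behave roughly like 400 cameras.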

Video codecs

Typical video codecs used in XProtect VMS installations today are MJPEG, H.264, and H.265, including smart variants such as Axis Zipstream. Assuming the same image resolution and framerate, the five codecs (counting the smart variants separately) have the following characteristics:

	MJPEG	H.264	H.264 Smart codec	H.265	H.265 Smart codec
Format	Single image	GOP	GOP	GOP	GOP
Latency	Low	Medium	Medium	Medium	Medium
Bandwidth & Storage needs	Very high	Medium-Low	Low	Low	Very Low
Suited for manually controlled PTZ cameras	Yes	Not optimal	Not optimal	Not optimal	Not optimal
Processing needed for decoding	Low	Medium	Medium	Medium-High depending on CPU/GPU	Medium-High depending on CPU/GPU

H.264/H.265 and their smart variants are normally the best choice of codec. However, in some cases, for instance with manually controlled PTZ cameras, MJPEG may provide lower latency and a better user experience when viewing the live video feed. In this case, it is recommended to configure two streams for the PTZ camera: one stream using MJPEG for live viewing, and a second stream using H.264/H.265 for recording.
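The recommended PTZ setup can be described as a small data structure with a consistency check. This is a hypothetical model for illustration only; the class and function names are invented, and it is not an XProtect configuration format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical description of one camera stream."""
    name: str
    codec: str
    live: bool
    record: bool


def ptz_streams() -> list[StreamConfig]:
    """Two streams as recommended for manually controlled PTZ cameras."""
    return [
        StreamConfig(name="live", codec="MJPEG", live=True, record=False),      # low latency for operators
        StreamConfig(name="recording", codec="H.264", live=False, record=True),  # lower storage use
    ]


def check_ptz_streams(streams: list[StreamConfig]) -> None:
    """Raise if the setup does not follow the MJPEG-live / H.264-or-H.265-recording pattern."""
    if not any(s.live and s.codec == "MJPEG" for s in streams):
        raise ValueError("expected an MJPEG stream for live viewing")
    if not any(s.record and s.codec in ("H.264", "H.265") for s in streams):
        raise ValueError("expected an H.264/H.265 stream for recording")


check_ptz_streams(ptz_streams())
```

Keeping the live and recording roles as separate flags makes it explicit that the MJPEG stream is never recorded, so its high bandwidth does not reach the storage system.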

GOP length

In the XProtect VMS, the default GOP length for MPEG-4, H.264, and H.265 is 1 second. The GOP length can be adjusted as needed; the table below summarizes the impact on the XProtect VMS compared with the standard 1-second GOP.

	Shorter GOP, e.g. 0.5 seconds	Standard GOP, 1 second	Longer GOP, e.g. 2 seconds
Bandwidth	More needed	Standard	Less needed
Storage space	More needed	Standard	Less needed
Storage performance	Lower	Standard	Higher
Video quality	Potentially higher	Standard	Potentially lower
Load of doing VMD on keyframes	Higher - VMD is done twice as often	Standard	Lower – VMD is done half as often
Load of doing VMD at 1 second intervals	Standard	Standard	Extremely high – cannot be done on keyframes only, so all video frames must be decoded
Decode and show live video	Faster to show initial video	Standard	Slower to show initial video
Decode and display a random recorded image	Less resources needed and faster to display	Standard	More resources needed and slower to display
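Some rows of the table follow from simple arithmetic on the GOP length. The sketch below assumes a constant framerate, one media database record per GOP, and that showing a random recorded image requires decoding from the preceding keyframe, with every frame in the GOP equally likely to be requested; the function names are hypothetical.

```python
def keyframes_per_second(gop_seconds: float) -> float:
    """One keyframe (and one media database record) per GOP."""
    return 1.0 / gop_seconds


def vmd_possible_on_keyframes(gop_seconds: float, vmd_interval_seconds: float = 1.0) -> bool:
    """VMD can run on keyframes only if at least one keyframe arrives per VMD interval."""
    return gop_seconds <= vmd_interval_seconds


def average_frames_to_decode(fps: float, gop_seconds: float) -> float:
    """Mean frames decoded to show a random frame, decoding from the GOP's keyframe.

    Frame k of an N-frame GOP needs k decodes (k = 1..N), so the mean is (N + 1) / 2.
    """
    frames_per_gop = max(1, round(fps * gop_seconds))
    return (frames_per_gop + 1) / 2


for gop in (0.5, 1.0, 2.0):
    print(gop, keyframes_per_second(gop), vmd_possible_on_keyframes(gop),
          average_frames_to_decode(30, gop))
```

At 30 fps, a 2-second GOP roughly doubles the average decoding work for a random recorded image compared with the standard GOP, and it leaves no keyframe in every 1-second VMD interval.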