

blockiness, blurriness or choppy motion for what they are. Today, using machine-learning technology, systems are able to recognise these artifacts automatically just by evaluating the displayed video, and then to score the video with a metric that correlates tightly with human perceptual scores. This is what Spirent’s Umetrix Video Non-Reference (NR) system does. Its “record once, score many” architecture supports packet, frame, full-reference and non-reference algorithms that can be combined for specific video environments and use cases. Spirent offers non-reference models for the detection of a variety of artifacts, including NR-Compression, NR-Buffering, NR-Scaling and NR-Live.

How Do Spirent’s NR Algorithms Work?

We start with a form of artificial intelligence known as machine learning. Broadly speaking, this means that we’ve taken a support-vector-based supervised learning system and trained it to solve a particular problem. The algorithm “sees” the rendered video, typically as a digital video stream such as HDMI. By repeatedly giving the algorithm sample video clips and the corresponding (desired) video quality scores, it builds up knowledge akin to “If I see this, then the score must be that.” Each NR model (Compression, Buffering, Scaling, etc.) is trained on a pertinent set of artifacts. Our Umetrix Video Non-Reference models are currently built on a variation of the BRISQUE (Blind/Referenceless Image Spatial QUality Evaluator) video quality assessment model. BRISQUE is a state-of-the-art, natural-scene-statistics-based blind quality assessment tool developed at the University of Texas at Austin’s Laboratory for Image and Video Engineering (LIVE), and it has become one of the most widely used quality assessment tools in broadcast and content production environments. Spirent’s BRISQUE implementation, like all machine-learning-based systems, needs to be trained. To accomplish this training, we develop a training data set containing a large number of video sample clips along with their associated video quality scores. Traditionally, training data sets are based on human subjective mean opinion scores. The issue with such data sets is that human scoring is expensive, difficult and slow, and is therefore necessarily limited to, at most, a few hundred video clip samples.
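
As an illustration of this training step, the sketch below pairs simplified BRISQUE-style image statistics with quality scores using a support vector regressor. It is a minimal sketch, not Spirent’s implementation: real BRISQUE fits generalised Gaussian models to produce a richer 36-dimensional feature set, and the clip names and scores here are hypothetical.

```python
# Minimal sketch of a BRISQUE-style training loop (illustrative only).
# Assumes OpenCV, NumPy and scikit-learn; file names and scores are hypothetical.
import cv2
import numpy as np
from sklearn.svm import SVR

def mscn_features(gray):
    """Mean-subtracted contrast-normalised (MSCN) statistics for one frame.

    Real BRISQUE fits generalised Gaussian distributions to MSCN
    coefficients and their pairwise products; here we use simple
    moments as a stand-in to keep the sketch short.
    """
    gray = gray.astype(np.float64)
    mu = cv2.GaussianBlur(gray, (7, 7), 7 / 6)                        # local mean
    sigma = np.sqrt(np.abs(
        cv2.GaussianBlur(gray * gray, (7, 7), 7 / 6) - mu * mu))      # local contrast
    mscn = (gray - mu) / (sigma + 1.0)
    return [mscn.mean(), mscn.var(),
            ((mscn - mscn.mean()) ** 3).mean(),                       # skewness proxy
            ((mscn - mscn.mean()) ** 4).mean()]                       # kurtosis proxy

def clip_features(path, max_frames=30):
    """Average per-frame features over the first few frames of a clip."""
    cap, feats = cv2.VideoCapture(path), []
    while len(feats) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        feats.append(mscn_features(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)))
    cap.release()
    return np.mean(feats, axis=0)

# Hypothetical training pairs: (clip file, target quality score on a 1-100 scale).
training = [("clip_crf18.mp4", 92.0), ("clip_crf28.mp4", 71.0), ("clip_crf40.mp4", 38.0)]
X = np.array([clip_features(path) for path, _ in training])
y = np.array([score for _, score in training])

model = SVR(kernel="rbf").fit(X, y)   # "if I see this, then the score must be that"
print(model.predict(X[:1]))           # unseen clips are scored the same way
```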

Spirent has created a method by which a very large training data set can be produced. This is critical, because machine-learning systems improve in prediction precision as the training data set grows. Rather than a training data set with a few key types of artifact samples spread across a few select types of scenes, our training data set covers a vast array of artifact samples, with each type of artifact and each degree of distortion represented in a wide variety of video content. And rather than being limited to hundreds of samples, Spirent’s training data set contains literally hundreds of thousands of sample videos. That is the key to a successful machine-learning system like BRISQUE: one that can recognise image artifacts in the varied content found in the real world.
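
As a sketch of how such a data set can be generated in volume (an assumption about tooling, not a description of Spirent’s pipeline), the script below re-encodes each source clip across a sweep of compression levels using ffmpeg’s libx264 encoder, where higher constant-rate-factor (CRF) values mean heavier compression.

```python
# Sketch: generate compressed variants of each source clip with ffmpeg.
# Assumes ffmpeg with libx264 is on the PATH; directory names are hypothetical.
import subprocess
from pathlib import Path

SOURCES = Path("sources")          # pristine reference clips
OUT = Path("dataset")
OUT.mkdir(exist_ok=True)

# CRF sweep: ~18 is visually near-lossless, ~45 is heavily blocky and blurry.
for src in SOURCES.glob("*.mp4"):
    for crf in range(18, 46, 3):
        dst = OUT / f"{src.stem}_crf{crf}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src),
             "-c:v", "libx264", "-crf", str(crf),
             "-an",                # drop audio; only the video is scored
             str(dst)],
            check=True)
```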

And How Well Do They Work?

To determine how well the Umetrix Video Non-Reference Compression model works, we need to compare it to a well-known metric. One of the best in the industry today is VMAF, the Video Multi-method Assessment Fusion metric developed by Netflix and the University of Southern California. VMAF is itself a video scoring system that combines human perceptual vision modeling with artificial intelligence to produce a quality score on a 1-to-100 scale. Note that VMAF is a full-reference metric: it relies on a pristine reference video for comparison. VMAF has been shown to be superior to many other algorithms in its ability to produce scores that correlate well with how people rate video quality.
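
For readers who want to produce a VMAF score themselves, one common route is ffmpeg’s libvmaf filter, shown below. This is a generic example rather than the tooling used here; the file names are placeholders, and it assumes an ffmpeg build with libvmaf support (the JSON log layout shown is that of libvmaf v2.x).

```python
# Sketch: score a compressed clip against its reference with ffmpeg's libvmaf
# filter. The distorted clip is input 0, the pristine reference is input 1.
import json
import subprocess

subprocess.run(
    ["ffmpeg", "-i", "clip_crf28.mp4", "-i", "reference.mp4",
     "-lavfi", "[0:v][1:v]libvmaf=log_fmt=json:log_path=vmaf.json",
     "-f", "null", "-"],
    check=True)

with open("vmaf.json") as f:
    result = json.load(f)
print(result["pooled_metrics"]["vmaf"]["mean"])   # pooled VMAF on the 1-100 scale
```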

To determine how well Umetrix Video Non-Reference Compression compares to the full-reference VMAF metric, we use a four-step process.

First, we create a score baseline data set containing several thousand video clips. We begin with source videos that contain a wide variety of scene types, varying in image complexity, lighting, colour and other attributes. Each clip is encoded multiple times at varying levels of compression to produce a large data set containing the types of video anomalies produced by compression encoding. As the degree of compression increases, the encoder typically discards fine detail and uses larger block sizes, causing blurriness and blockiness in the rendered video.

Next, we generate a VMAF score for every clip in the data set by passing it through the VMAF full-reference algorithm. This gives us a baseline data set of video clips with VMAF scores for a wide variety of scene types and compression levels. The baseline data set also contains the reference video associated with each of the encoded clips. Chart 1 shows the distribution of VMAF values for the video clips in the score baseline data set.

In the third step, we run the encoded clips in the baseline data set through our own Umetrix Video Non-Reference Compression model to produce a video quality score for each clip. At this stage, like VMAF, Umetrix Video produces a 1-to-100 score. Unlike VMAF, of course, Umetrix only “sees” the compressed clip, not the reference video.

Lastly, we observe the correlation between the intended score (the VMAF score) and our own Umetrix score for each of the several thousand compressed clips.
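
The correlation in this final step can be computed along the lines of the sketch below, which compares the two score sets with SciPy’s Pearson and Spearman coefficients. The scores shown are hypothetical stand-ins; in practice each array would hold one entry per clip in the baseline data set.

```python
# Sketch: correlate full-reference VMAF scores with non-reference scores.
# The values below are hypothetical placeholders, one entry per clip.
import numpy as np
from scipy import stats

vmaf_scores = np.array([92.0, 71.0, 55.0, 38.0, 24.0])   # full-reference targets
nr_scores   = np.array([90.0, 74.0, 52.0, 41.0, 22.0])   # non-reference predictions

r, _ = stats.pearsonr(vmaf_scores, nr_scores)       # linear agreement
rho, _ = stats.spearmanr(vmaf_scores, nr_scores)    # rank-order agreement
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```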


Chart 1 – Score baseline data set VMAF score distribution
