Written by Ken Haren, Director of Product Management for Cloud Monitoring Services
Background
In the past we’ve talked about how we’re entering the next phase of OTT monitoring. Here, we introduce an interesting use case that you may not think of right away when you consider a stream monitoring service like Telestream Cloud Stream Monitor – load testing origin servers and critical infrastructure in the distribution chain.
One of the key capabilities that we’ve designed Telestream Cloud Stream Monitor for is simulating the viewer experience across different regions. Stream Monitor customers can easily spin up synthetic clients to create streaming session data in any of over 90 global regions. Because Stream Monitor continuously pulls all stream variants concurrently, you can review comprehensive metrics on stream health and delivery performance for every variant in the ABR ladder, and visualize the difference in performance of those variants in each monitored region. This is a powerful way to validate that, in aggregate, the video delivery pipeline is performing as expected. However, what we had not initially anticipated, but have increasingly seen our customers leverage, is the service’s ability to effectively scale concurrent streaming sessions for load testing origin servers and critical infrastructure in the delivery chain.
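To make the synthetic-client idea concrete, here is a minimal sketch (not Stream Monitor’s actual implementation) of a client that fetches a hypothetical HLS master playlist and then pulls every variant in the ABR ladder concurrently, timing each request:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin

MASTER_URL = "https://cdn.example.com/live/channel1/master.m3u8"  # hypothetical playback URL

def timed_get(url: str) -> tuple[bytes, float]:
    """Download a URL, returning (body, elapsed seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()
    return body, time.monotonic() - start

def variant_urls(master_url: str) -> list[str]:
    """Return the variant playlist URIs listed in an HLS master playlist."""
    body, _ = timed_get(master_url)
    return [urljoin(master_url, line)
            for line in body.decode().splitlines()
            if line and not line.startswith("#")]

if __name__ == "__main__":
    variants = variant_urls(MASTER_URL)
    # Pull every rung of the ABR ladder at the same time, as a real player never would,
    # so the full set of variants is exercised on each request cycle.
    with ThreadPoolExecutor(max_workers=max(1, len(variants))) as pool:
        for url, (_, elapsed) in zip(variants, pool.map(timed_get, variants)):
            print(f"{url}: fetched in {elapsed:.3f}s")
```

A real monitoring client also walks the media playlists and downloads segments continuously; the point here is simply that one synthetic client exercises the whole ladder at once, from whichever region it runs in.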
Streaming Delivery Infrastructure
While no two streaming services are built exactly the same, we consistently see common themes. Video streaming services and customer-managed infrastructure typically deploy with the following components:
- Live Encode/Transcode
- Live Packager / JITP
- Live Origin
- Origin Shield / Load Balancer
- CDN
While there may be additional functions like ad insertion, DRM, authentication, personalization and geo-restrictions, two things are fundamental: the live origin’s ability to sustain throughput, and a clear understanding of how effectively the caching network or networks in front of it offload that origin. If traffic overwhelms the live origin, all viewers everywhere will be impacted. Here is a quick snapshot of these fundamental components, with a load-balanced pair of origin servers and two CDN partners pulling from the live origin.
In the past, the live origin was primarily a function of the CDN. However, for a variety of reasons, customers are increasingly taking ownership of the content origin. The motivation may be to facilitate a multi-CDN delivery strategy more efficiently, integrate more tightly with catch-up/replay windowing, manage the ad insertion workflow, customize regional localization of streams, or model sophisticated rights policies like blackout restrictions. It has become essential not only to monitor streaming operations, but also to develop a thorough understanding of the performance thresholds the infrastructure can support while continuing to deliver a high-quality audience experience.
The above diagram represents a relatively basic infrastructure setup but is emblematic of what we see in many of our customers’ environments. Stream Monitor is usually leveraged to monitor streaming health at the content edge (far right of the diagram) by pulling the streams through regional edge caches using the CDN playback URLs. Customers will also typically point Stream Monitor at the origin address to validate asset availability at the origin. In this way, performance degradation detected at the CDN edge can be correlated with stream performance measured at the content origin, and issues can be identified as localized to a specific region, specific to a CDN provider, or service-wide. This kind of root cause analysis is a key value of Stream Monitor.
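As a simplified illustration of that correlation logic, the sketch below classifies a set of edge and origin measurements as either edge-localized or origin-wide degradation. The threshold and field names are illustrative assumptions, not Stream Monitor’s data model:

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    location: str   # monitoring point, e.g. a cloud region
    source: str     # "cdn-a", "cdn-b", or "origin"
    ttfb_ms: float  # time to first byte for a segment request

def classify(measurements: list[Measurement], slow_ms: float = 500.0) -> str:
    """Localize degradation: slow origin pulls imply service-wide impact,
    slow edge pulls with a healthy origin point to a specific CDN or region."""
    origin_slow = any(m.ttfb_ms > slow_ms for m in measurements if m.source == "origin")
    slow_edges = sorted({(m.source, m.location) for m in measurements
                         if m.source != "origin" and m.ttfb_ms > slow_ms})
    if origin_slow:
        return "origin degradation: expect service-wide impact"
    if slow_edges:
        return f"edge-localized degradation at: {slow_edges}"
    return "all monitored paths healthy"

print(classify([
    Measurement("Montreal Canada (GCP)", "cdn-a", 120.0),
    Measurement("The Dalles, OR, USA (GCP)", "cdn-a", 840.0),
    Measurement("Montreal Canada (GCP)", "origin", 95.0),
]))  # -> edge-localized degradation at: [('cdn-a', 'The Dalles, OR, USA (GCP)')]
```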
Load Testing Origin Throughput
One thing that has become apparent is that it is not always easy to understand the limits of the architecture as designed. As our customers deploy their own live and on-demand origin servers, or leverage third-party services, testing performance thresholds, load balancing and scaling strategies has become a bigger need. Stream Monitor can help with this.
A recent customer engagement provides some useful insights into how exactly Stream Monitor can, in a very controlled manner, drive traffic to the content origin to test the overall stream throughput that the origin infrastructure supports. In this use case, the customer has modeled a series of MPEG DASH and HLS streaming assets in Stream Monitor.
Each asset is concurrently pulled from 6 discrete monitoring points with a wide geographic distribution, with as many as 90 discrete locations available to choose from. Synthetic clients are deployed into public cloud data centers where bandwidth is not a meaningful constraint, so each can simulate many concurrent sessions. This was essential in testing the cloud provider’s load balancer, helping to identify performance discrepancies between regions based on which origin server was receiving the client requests. The locations chosen were:
- Canada (AWS)
- Canada Central (Azure)
- Canada East (Azure)
- Montreal Canada (GCP)
- Moncks Corner, SC, USA (GCP)
- The Dalles, OR, USA (GCP)
A mix of HLS and DASH assets that typify audience streams is used, with each asset consisting of a variety of streaming variants, ranging from a maximum 30Mbps UHD stream down the ABR ladder to a minimum 3Mbps 720p stream. In total, each monitoring point pulls close to 50Mbps of streaming video for every asset it monitors. A collection of 4 DASH and 2 HLS assets is modeled into a single channel, and as each channel session is started, the following is dynamically provisioned:
- Monitoring DASH Assets 1-4 in the 6 selected regions
- Monitoring HLS Assets 1-2 in the 6 selected regions
Each time a new channel was started, the bandwidth seen at the load balancer and on the origin servers increased accordingly. With the configuration noted above, each channel spun up generated approximately 2Gbps of sustained traffic on the load balancer. By organizing the testing environment in this way, the customer was able to easily throttle the load up and down, with a maximum desired test configuration of 10Gbps. Each time a new channel was started, the load was measured and the behavior of the infrastructure observed. In addition to generating load, Stream Monitor continuously collects and analyzes asset availability and streaming performance metrics for each asset variant in every monitoring location. This analysis is presented both within the Stream Monitor real-time dashboards and as customizable reports accessed via the REST interface.
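For readers who want to sanity-check the numbers, here is the back-of-the-envelope arithmetic behind the roughly 2Gbps-per-channel figure, using the asset counts and per-asset bandwidth described above:

```python
import math

# Inputs from the test described above; the helper is plain arithmetic.
ASSETS_PER_CHANNEL = 6          # 4 DASH + 2 HLS assets modeled into one channel
MBPS_PER_ASSET_PER_POINT = 50   # full ABR ladder pulled concurrently, ~50Mbps per asset
MONITORING_POINTS = 6           # cloud regions, each pulling every asset

def channel_load_gbps() -> float:
    """Sustained traffic one channel generates at the load balancer."""
    return ASSETS_PER_CHANNEL * MBPS_PER_ASSET_PER_POINT * MONITORING_POINTS / 1000

per_channel = channel_load_gbps()
print(f"per channel: ~{per_channel:.1f} Gbps")                 # ~1.8Gbps, i.e. roughly 2Gbps
print(f"channels for 10 Gbps: {math.ceil(10 / per_channel)}")  # 6 at ~1.8Gbps each; ~5 at the rounded 2Gbps figure
```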
Load Testing Origin Servers: Performance Results
Customers can leverage Stream Monitor to dynamically scale up load testing of origin servers and infrastructure operations, generating requests across a wide swath of geographic regions and quickly identifying the overall performance of the infrastructure layer. In the above example, a key capability was being able to start a monitoring session, validate the impact of configuration changes made to optimize performance, identify additional changes that need to be made, and rinse and repeat. Since Stream Monitor usage is purely consumption based, the customer is free to spin up a test, shut it down, reconfigure and restart at will. Failing fast on infrastructure configuration changes led rapidly to key insights into origin server performance under load. The load testing target of 10Gbps was met and sustained for the final 24-hour test run, and analysis revealed that a series of optimizations to the load balancer configuration and origin offload cache layers had dramatically improved the reliability and overall performance of the origin environment.
Rather than guessing what might be causing performance issues, the customer gets live status updates and detailed reports that measure the impact of optimizations in real time. By filtering for specific conditions in the report data and identifying which hosts were responding to client requests when degraded stream performance was detected, the customer is able to greatly simplify what had been a complicated matrix of impacts associated with load, geography, bitrate, format and more. It’s now easy to pivot on the data and correlate the Stream Monitor reports with additional data sources like origin server logs and load balancer telemetry. Comparing responding-host data against measured network QoS KPIs on a per-request basis delivers deep insight into exactly which combinations consistently lead to service degradation, and informs rules and optimizations to avoid them.
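As a rough illustration of that per-request pivot, degraded requests from exported report data can be grouped by the host that served them. The record fields and threshold below are hypothetical, not Stream Monitor’s actual report schema:

```python
from collections import Counter

# Hypothetical exported report rows; field names are illustrative only.
report_rows = [
    {"responding_host": "origin-a", "region": "Canada (AWS)",          "bitrate_kbps": 30000, "download_mbps": 12.1},
    {"responding_host": "origin-b", "region": "Canada East (Azure)",   "bitrate_kbps": 30000, "download_mbps": 48.7},
    {"responding_host": "origin-a", "region": "Montreal Canada (GCP)", "bitrate_kbps": 3000,  "download_mbps": 9.8},
]

def degraded(row: dict, headroom: float = 1.5) -> bool:
    """Flag requests whose measured throughput gives less than 1.5x the variant bitrate."""
    return row["download_mbps"] * 1000 < row["bitrate_kbps"] * headroom

# Pivot: which hosts were serving requests when performance was degraded?
by_host = Counter(r["responding_host"] for r in report_rows if degraded(r))
print(by_host.most_common())  # e.g. [('origin-a', 1)]
```

The same grouping can be repeated by region, CDN, bitrate or format, and then joined against origin server logs or load balancer telemetry on the responding host.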
Customers have also applied similar load and cache server testing, which has revealed the impact that bitrate choices within variants have on cache optimization. An important secondary feature is that, in addition to network QoS and asset availability KPIs, Stream Monitor measures audio/video signal quality. This means another dimension can be added to the stream delivery optimization matrix, and the impact on perceived quality can be measured before changes are introduced to the audience. Not only does this result in improved streaming performance, it has helped customers identify streaming optimizations that improve cache efficiency and lower CDN delivery costs without negatively impacting the audience experience. A real win/win for our customers and their viewers tuning in.
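One simple way to picture that extra dimension: a ladder change is only promoted if it improves cache efficiency while a measured quality score stays above an agreed floor. The metric names and thresholds below are illustrative assumptions, not product features:

```python
def change_is_safe(baseline: dict, candidate: dict, quality_floor: float = 80.0) -> bool:
    """Promote a variant-ladder change only if cache efficiency improves
    while the measured quality score stays above the agreed floor."""
    return (candidate["cache_hit_ratio"] > baseline["cache_hit_ratio"]
            and candidate["quality_score"] >= quality_floor)

baseline  = {"cache_hit_ratio": 0.91, "quality_score": 86.0}  # illustrative numbers only
candidate = {"cache_hit_ratio": 0.96, "quality_score": 84.5}
print(change_is_safe(baseline, candidate))  # True: better caching, quality still above the floor
```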
Summary
While Stream Monitor was not originally intended as a load testing service, it’s proving to be an excellent use case that delivers immediate benefits. The combination of flexible synthetic client monitoring that can be deployed in any region and provisioned at runtime, with a pay-only-for-what-you-use pricing model, means it’s easy for customers to develop and execute load testing plans. The depth of analytics data that Stream Monitor produces, in concert with other relevant infrastructure data sources, delivers the actionable insights needed to understand what can reasonably be expected of your streaming infrastructure as audience load increases. Optimizations to cache strategies, load balancing algorithms and origin scaling rules can be instantly tested with the same monitoring service your operations teams use to monitor production streams, without disruption or sacrificing visibility. Telestream is digging into this use case and working closely with customers to continue enhancing the service for this kind of load analysis and other promising applications specific to geo-fencing and ad insertion policy. If you’re managing a content origin, or planning to, and are interested in developing a solid understanding of the performance envelope it can sustain, Telestream Cloud Stream Monitor may be a good solution for you!
To learn more about the Telestream Cloud Stream Monitor service: