Streaming


Getting packets from A to B

Introduction

This page discusses two strategies for fixing transmission problems: situations where the video on the receiving side is not displayed as expected because of stutters, smears, ghosting, a black image, and so on. These problems can be difficult to fix, and parameters sometimes have dependencies: tuning one side influences the other.

Roughly speaking, two things can be done: reduce the amount of data, and "shape" the data evenly over time. The first is easy, but may affect image quality. Still, it's the preferred level at which to fix things: it is easily done, explained and maintained.

The second strategy, "smooth traffic over time", can achieve miracles but is in practice not easy at all, even if it's just one single setting on the (Axis) camera. It requires sufficient understanding of the network, which is normally not the case. We all expect transparent behavior from the switch: this camera does 5 Mbit/s, the switch is 100 Mbit/s, what's the problem?

Most of us don't know much about switches, and because of that we have high expectations. Let's compare with cars. Our understanding of cars is completely different. A car's datasheet will state certain figures for fuel consumption, a maximum load and a maximum speed. This data, though slightly optimistic, will prove correct when tested. But nobody expects a car to meet all the figures at once. Nobody is surprised when a car doesn't reach its advertised mileage when loaded to maximum capacity. Or carry that weight up a very steep hill. Or achieve maximum speed under those circumstances. Cars come in different qualities, and the same is true even for the high-end ones.

Like cars, switches are devices which can be pushed to one limit or another. Unlike cars, you can't sense their struggle. But a certain jitter or pixelisation in a video stream doesn't always justify the 'fix your network' helpdesk response. You may be cyberdriving a load uphill, and not all specifications can be met at the same time.

The analogy is not perfect but I hope you get it.


Reduce data

Reducing the amount of data is a tempting quick fix for a lot of problems. Results are quickly assessed, and when everyone is satisfied, why dig deeper? Unfortunately, changes like reducing resolution or increasing compression degrade the image immediately. Other parameters act more subtly. The purpose of this section is to collect them all, so you can start off with the ones which don't hurt so much. Or combine them, a bit of each in a bitrate-reduction cocktail, to reduce data with (near-)invisible effects.

One downside is that bitrate as such may only affect the situation indirectly. The problem may not be the amount of data, but its bursting nature; see "Spread traffic over time". But that's difficult, so it makes sense to first try easier ways which simply reduce the amount of bits.


Smart codec

Smart codecs 'intelligently' adapt compression parameters: continuously, from frame to frame, and in a scene-dependent way. The result is still a compliant stream. Parameters listed elsewhere in this overview are tuned automatically, such as:

  • Compression
  • GOP length
  • Framerate

One characteristic, though, is that when a scene gets busier, the smart codec loses the opportunity to optimize, down to a point where it can't provide a bitrate reduction anymore.

Because of that, smart codecs are hardly any help in removing traffic peaks on individual cameras. But as they can optimize away 50-80% of the data in a typical surveillance scene, they belong at the top of this list. There is a cumulative effect: at any one time, a few cameras may have a busy scene, but the others won't, so the combined load on the network is still reduced. The big exception is probably trains and train platforms.

h.264 profile
Setting the highest possible h.264 profile is a 'free lunch' setting: fewer bits without visible impact. The profile (baseline, main, high) indicates the subset of compression techniques that is applied. High uses the most advanced options and yields the smallest stream. It is widely supported by clients, so it should be used.

One consideration against High profile could be that it allows B-frames, which are bad for latency. But most IP cameras don't use B-frames for exactly that reason, so it is safe.
h.265
h.265 takes more computing resources to decode than h.264, which may be a reason not to use it. But the resulting stream is smaller, so when it is an option it should be considered.
Image stabilization
Less movement means less changes to encode: less bits. If camera movement plays a role and image stabilization is available, it is also a near 'free lunch' setting.
Color quantization
Axis cameras can compress color more heavily than brightness. A 10% 'free lunch' bitrate reduction is possible because the human eye won't notice: the eye is more sensitive to brightness than it is to color. Unfortunately, the 'picture parameter set' describing the resulting stream becomes more complex, and some software doesn't support such a stream. That is why this setting is hidden rather deeply.
GOP length
An I-frame is typically a lot larger than a P-frame and also a lot more work to process on the client side, so a longer GOP can bring a lot of benefit. The most subtle way is a dynamic GOP, as offered by some smart codecs: the camera then decides itself when a GOP can be longer and when it should stay at the default in order to preserve image quality.
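The effect can be estimated with a back-of-the-envelope calculation. The frame sizes below are hypothetical, chosen only to illustrate the trend:

```python
# Average bitrate for one I-frame per GOP, with hypothetical sizes:
# 100 KB I-frames, 10 KB P-frames, 25 fps.
I_FRAME_BITS = 100_000 * 8
P_FRAME_BITS = 10_000 * 8
FPS = 25

def avg_bitrate(gop_length: int) -> float:
    """Average stream bitrate in Mbit/s for the given GOP length."""
    bits_per_gop = I_FRAME_BITS + (gop_length - 1) * P_FRAME_BITS
    return bits_per_gop / gop_length * FPS / 1e6

for gop in (25, 50, 100):
    print(f"GOP {gop:3d}: {avg_bitrate(gop):.2f} Mbit/s")
```

With these example numbers, going from a one-second GOP (25) to a four-second GOP (100) saves about 20%, with diminishing returns for longer GOPs.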
Sharpness
H.264 is designed to efficiently compress natural scenes. Natural scenes have mostly soft transitions; sharp lines therefore take more bits. Especially overview scenes allow for reduced sharpness without really impacting perceived image quality.
Contrast
Likewise, but more difficult to judge. The image quickly becomes more dull, and it is easy to perceive that as a loss when comparing images side by side. But small reductions still leave a usable image and take away bits.
Saturation
Likewise. The combined use of sharpness, contrast and saturation can reduce bitrate considerably in scenes that allow for it.

If the forensic value is less important, one could tune down sharpness, contrast and saturation on the camera side to save bits, and increase them on the receiver side to achieve the desired look of the image.


Other options affect the image more visibly:




Compression
Increasing compression directly reduces the bitstream. Small compression increases can be useful to provide just the bitrate reduction needed at acceptable image quality. Larger increases hurt the image more than they reduce the bitrate.
WDR
WDR preserves detail, and detail implies a bigger stream, so switching off WDR often reduces the bitrate. This is obviously a last-resort option, or a no-go, where WDR is actually needed. But sometimes it isn't: one can still see the city square and accept the blown-out white sky.
Resolution
Fewer pixels mean less data, but also less detail.
Focus

It would be weird to set a camera out of focus, yet it roughly compares to lowering the resolution, which does not feel weird. So it deserves mentioning, although it is obviously also a last-resort option.

In practice, resolution is one of the first settings to be touched because of its big effect. Nobody would touch the focus because of its disastrous effect. But when you scale up a lower-resolution image, you will see the result is not so different from out-of-focus.

MBR

Maximum bitrate sets a hard ceiling on the bitrate, at which level compression (increased), framerate (decreased), or both, are strongly adjusted to stay within the limit.

MBR is very useful, if not mandatory, for network capacity management. It makes sense to have a hard maximum on all devices to guarantee collective throughput: an individual image will be 'destroyed' in order to keep the system working. Typical values should be quite a bit higher than the average size of the stream.

MBR by itself is not a good setting to tune an individual camera because quality is lost when it is needed most: when something moves.

For storage size control it is often better to enable the smart codec. Irrelevant data is then cut away, making room for free-running variable bitrate when something happens.

Framerate

Decreasing the framerate means less data to encode, which means fewer bits. The effect is unfortunately very visible, and there are some less obvious considerations:

  • Compressed frame size increases: the difference between frames grows because more happens during the time between them. The bitrate therefore goes down, but not linearly with the framerate (slower)
  • Cameras capture at a fixed rate, typically 25 fps in 50 Hz countries and 30 fps in 60 Hz countries. Lower framerates are achieved by selectively dropping images from that captured stream. This implies not all framerates are equal. For example, 15 fps is a good choice in a 60 Hz country, because every other frame is dropped and the remaining frames are evenly spaced 66.67 milliseconds apart. But the same 15 fps is a poor choice in a 50 Hz country, because the camera can only approximate it by dropping frames on an irregular basis, leading to stutter with inter-frame times alternating between 40 and 80 ms. Axis cameras support a parameter value in the stream URL like fps=25/2 or fps=25/3, which specifies an explicit drop factor, here leading to an evenly spaced 12.5 and 8.33 fps respectively. Unfortunately, no management system allows configuring such values.
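The frame-dropping arithmetic above can be sketched in a few lines. Snapping each ideal display time to the nearest captured frame is a simplified model of what the camera does, used here only to show which combinations produce even spacing:

```python
# Simplified model: the camera can only emit frames that exist on its
# capture grid, so each ideal display time is snapped to the nearest
# captured frame.
from fractions import Fraction

def display_gaps_ms(capture_fps: int, target_fps: int, n: int = 6) -> list:
    """First n gaps (in ms) between displayed frames."""
    cap = Fraction(1000, capture_fps)   # capture grid spacing in ms
    tgt = Fraction(1000, target_fps)    # ideal display interval in ms
    idx = [round(k * tgt / cap) for k in range(n + 1)]
    return [float((idx[k + 1] - idx[k]) * cap) for k in range(n)]

print(display_gaps_ms(30, 15))  # even: all gaps 66.67 ms
print(display_gaps_ms(25, 15))  # uneven: gaps alternate 40/80 ms
```

The 30 fps → 15 fps case lands exactly on every other captured frame, while 25 fps → 15 fps can only alternate between one and two capture intervals, which is the stutter described above.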


Spread traffic over time

One less-acknowledged problem is that a network may have ample capacity for the average bitrate, but not for the actual bitrate. The two can differ a lot. For example, a camera generates a stream of around 2 Mbit/s. When looking at such a stream for a longer period of time, its traffic may look like this:


As you might recognise, this screenshot was made using Wireshark. It shows 2 × 10^7 bits per 10 seconds = 2 Mbit/s. This is an average value; it is not at all what the switch has to deal with at a given point in time. On a smaller timescale, the speed doesn't vary around 2 Mbit/s. It varies between zero and the line speed, 100 Mbit/s. Or better: it is either zero (most of the time) or as close to 100 Mbit/s as the camera can get (when there is a frame to transmit).

Here we see the same data zoomed in: spikes surrounded by a lot of silence.
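The spikes-and-silence pattern follows directly from the numbers. A quick sketch with the illustrative figures from this example (2 Mbit/s average, 25 fps, 100 Mbit/s line speed):

```python
# Each frame leaves the camera at line speed, then the port is idle
# until the next frame. Numbers are illustrative.
avg_bps  = 2_000_000      # average stream bitrate
line_bps = 100_000_000    # port line speed
fps      = 25

frame_bits  = avg_bps / fps                  # 80,000 bits per frame
tx_time_ms  = frame_bits / line_bps * 1000   # time on air per frame
interval_ms = 1000 / fps                     # time between frame starts
print(f"{tx_time_ms:.1f} ms on air, {interval_ms - tx_time_ms:.1f} ms silence")
```

The port is busy only about 2% of the time, but when it is busy, it transmits at full line speed.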


Max capacity is reached sooner than you think

Now imagine a 100 Mbit/s switch with 8 ports. Seven ports connect to cameras and one port is the uplink. What data can we expect on the uplink? A naive calculation would be: 7 times 2 Mbit/s = 14 Mbit/s. While true on average, this is not true at a given point in time. More realistic is a value between 0 Mbit/s (no camera is transmitting) and 700 Mbit/s (all cameras are transmitting)!

What can a switch do when it faces 700 Mbit/s of ingress traffic but can only send it out at 100 Mbit/s? It can buffer or drop. Roughly speaking, the more expensive the switch, the more buffer memory it has, and the data will be sent out eventually, with a bit of added jitter. But when it doesn't have sufficient memory, all it can do is drop packets.
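How long a buffer survives such a burst is easy to estimate. The 1 MB buffer size below is an assumption for illustration; real switches vary from tens of kilobytes to many megabytes per port:

```python
# Worst-case burst from the example: 7 cameras at line speed into one
# 100 Mbit/s uplink, with an assumed 1 MB buffer.
ingress_bps = 7 * 100_000_000
egress_bps  = 100_000_000
buffer_bits = 1_000_000 * 8

fill_rate_bps = ingress_bps - egress_bps          # net inflow: 600 Mbit/s
survive_ms = buffer_bits / fill_rate_bps * 1000   # time until drops start
drain_ms   = buffer_bits / egress_bps * 1000      # jitter added by a full buffer
print(f"buffer full after {survive_ms:.1f} ms, drains in {drain_ms:.0f} ms")
```

Under these assumptions the buffer absorbs only around 13 ms of a full burst before packets start dropping, and a full buffer adds 80 ms of queueing delay while it drains.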

Let's look at the Reduce data section again. All the settings discussed are either image settings, which influence what the encoder will see, or encoder settings, which influence what the encoder will emit. None of these settings directly relates to the transport itself. They influence how much data is generated, not how it is transported. Let's compare the data with boxes: the image and encoder settings influence how many boxes of data are generated. Then the boxes need to be transported. Let's compare the transport with trucks on a highway. The default behavior is that trucks are loaded immediately with all boxes and drive at full speed. From practical commuting experience we know that's fine when the road is empty, but it leads to congestion when it's not.

Reducing speed helps. Many switches support an adjustable line speed per port, and the following trick works:

  • Make sure the total traffic of a camera will not exceed 10 Mbit/s
  • Set all switch ports except the uplink from auto-negotiate to 10 Mbit/s

Now the collective traffic is 'shaped' to not exceed 70 Mbit/s, and it will pass the uplink without problems.
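The arithmetic behind the trick, spelled out: even the worst possible simultaneous burst now stays below the uplink speed.

```python
# Capping each camera port at 10 Mbit/s bounds the worst-case
# collective burst below the 100 Mbit/s uplink.
cameras, port_mbit, uplink_mbit = 7, 10, 100
worst_case_mbit = cameras * port_mbit
print(f"worst case {worst_case_mbit} Mbit/s on a {uplink_mbit} Mbit/s uplink")
assert worst_case_mbit <= uplink_mbit   # no burst can overrun the uplink
```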

What will happen when a camera generates more than 10 Mbit/s? Buffering inside camera memory will take place. For short durations this is fine. But when it happens too often, memory fills up and something has to give. One possibility is that frames are dropped. This happens before RTP packetization, which means frames drop out of the sequence but you can't tell from the RTP sequence numbers: clients will not be able to deduce that data is lost. And because the device then operates in a less well-tested low-memory state, it may eventually crash.


How can it work at all?

You might wonder how the 8-port switch can work at all when data comes in 7 times faster than it goes out. The answer is: statistics. It doesn't happen very often. IP cameras are 'free running' devices; their image capture is not hardwired to some shared signal. They emit their data at slightly different times, so their collective load is effectively, you could say by chance, averaged out. The 700 Mbit/s is a theoretical maximum with a tiny chance of being hit. Of course, in the example, trouble already starts at a collective 100 Mbit/s, not 700 Mbit/s. The collective bandwidth may be higher at one time than another. For example, as clock crystals are not identical, each camera drifts in time at a slightly different pace. What's 40 milliseconds for one camera might be 39.99998 for another. As each camera emits its data at slightly different intervals, every once in a while some cameras will line up and generate a collective peak.
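One rough way to quantify "statistics": if each port is on air about 2% of the time and the cameras behave independently, the binomial distribution gives the chance of simultaneous transmissions. Independence is an assumption that correlated scenes break, so treat this as an illustration only:

```python
import math

# Chance that k or more of 7 independent cameras transmit at the same
# instant, each with a ~2% on-air duty cycle (2 Mbit/s average on a
# 100 Mbit/s port). Independence is an assumption.
duty, cams = 0.02, 7

def p_at_least(k: int) -> float:
    return sum(math.comb(cams, i) * duty**i * (1 - duty)**(cams - i)
               for i in range(k, cams + 1))

# Two or more simultaneous senders already exceed the 100 Mbit/s uplink:
print(f"uplink overrun (>=2 at once): {p_at_least(2):.3%}")
print(f"all seven at once: {p_at_least(7):.1e}")
```

Under these assumptions the full 700 Mbit/s burst is vanishingly rare, but a momentary uplink overrun happens almost 1% of the time, which the switch buffer normally absorbs.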

The real trouble starts when the scene changes. A typical scenario is multiple cameras viewing the same scene, for example a city square with a lot of flags. The wind starts to blow and all flags start waving. Bandwidth rises on all cameras at the same time. The same happens in a train when all doors open. This increases the chance of hitting the uplink capacity.

Gigabit

Gigabit switches obviously solve a lot of these problems, because most cameras have a 100 Mbit/s connector. Still, a 24-port version may approach the danger zone again, and now that cameras with a Gbit connector are appearing, even if these can't generate a full 1 Gbit/s of data, the problem will be back.

How to solve this?

This section is specific to Axis cameras, which have a 'traffic shaping' feature. Other brands may or may not have a similar functionality.

The mentioned trick with a 10 Mbit/s line speed works, but it is very rigid. It is not necessary to limit the collective load to below 100 Mbit/s, as it is seldom the case that all cameras stream at maximum capacity at the same time. Here is a number without research to back it up, but practical experience has shown that limiting the collective worst case to 2 times the uplink capacity is enough. When one camera is more important than the others, it could be given a higher limit.

The traffic shaping feature can be enabled in the low-level settings menu (plain config), Bandwidth group. It accepts a string value; you can set it to e.g. "20mbit", after which the device makes sure it never exceeds 20 Mbit/s, even at the sub-millisecond level. It is a global bottleneck, and it is important to make sure the camera will never need to exceed that limit. It's a 'dangerous' setting, and you will be reaching for the factory-default button when it's not done right. I recommend a workflow like this:

  • Tune down the stream to make it as small as possible given the required image quality. Let's assume you end up at 4 Mbit/s, which is probably quite high for a fixed camera
  • Apply an MBR of 8 Mbit/s. It's twice the normal bitrate, so it will seldom kick in to destroy the image, but it will protect network throughput when needed
  • Take every other required precaution to prevent overruns. The biggest risk is a web browser starting a JPEG stream; this must be avoided. On Axis firmware that does this, you should configure a live view profile with a small, low-fps image
  • Apply the traffic shaping limit of 20 Mbit/s
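For scripted deployment, the same setting can presumably be pushed over VAPIX param.cgi, the interface behind plain config. The parameter name Bandwidth.Limit, the address and the credentials below are assumptions for illustration; verify the actual parameter name in the plain config Bandwidth group on your firmware first:

```python
# Hedged sketch: push the traffic shaping value over HTTP (VAPIX
# param.cgi). "Bandwidth.Limit" is an assumed parameter name; check
# the plain config Bandwidth group before relying on it.
import urllib.parse
import urllib.request

def set_shaping(host: str, user: str, password: str, value: str = "20mbit") -> str:
    query = urllib.parse.urlencode({"action": "update", "Bandwidth.Limit": value})
    url = f"http://{host}/axis-cgi/param.cgi?{query}"
    # Axis devices typically require digest authentication
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, f"http://{host}/", user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(mgr))
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode()

# Example call (hypothetical address and credentials):
# print(set_shaping("192.168.0.90", "root", "secret"))
```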

A side effect is slightly increased latency, as packets leave the camera later than they would have done without traffic shaping. As a typical management system has a few hundred milliseconds of latency itself, the impact isn't huge, but it isn't negligible either. For example: a 250 KB I-frame takes around 20 milliseconds at 100 Mbit/s, and around 100 milliseconds at 20 Mbit/s.

Assuming no buffering takes place at the receiver, the first is hardly noticeable but the second gives a visible interruption.
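The I-frame figures above are just the serialization delay, i.e. frame size in bits divided by the shaped rate:

```python
# Serialization delay of a frame at a given (shaped) link rate.
def tx_ms(frame_bytes: int, rate_mbit: float) -> float:
    return frame_bytes * 8 / (rate_mbit * 1e6) * 1000

print(f"{tx_ms(250_000, 100):.0f} ms at 100 Mbit/s")  # 20 ms
print(f"{tx_ms(250_000, 20):.0f} ms at 20 Mbit/s")    # 100 ms
```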

TCP

So far we haven't discussed TCP. TCP is designed to work around intermittent capacity problems. You will understand by now that it is completely normal for a network under load to drop packets occasionally, and TCP is there to solve that. When the limits are hit only occasionally and sufficient spare bandwidth is normally available, the retransmit function of TCP will simply retry the dropped packets and all will be fine, at the price of jitter.

Only when the number of drops becomes high and insufficient free capacity is available for the retransmits, TCP can't repair the problems anymore and eventually becomes counter-productive. IP cameras are exceptionally good at creating such conditions.

Summary

A short recap and conclusion of the solutions presented:

  • Decreasing the size of the streams is a method to reduce network problems, but it works indirectly: by lowering the data volume you lower the chance of capacity limits being hit. Many data-reducing settings are easy to apply and understand.
  • TCP can correct transport problems as long as there is sufficient free capacity for applying these corrections (retransmits). TCP may impact jitter because of other optimization functions in the protocol.
  • Traffic shaping directly reduces the possibility of capacity limits being hit. But it acts as a global restriction on the camera and increases latency. When more data is generated than can stream out, devices may become unresponsive, crash (older devices with small memories) or exhibit extreme latencies (recent models with lots of memory).

Traffic shaping makes sense in environments with special complications (wireless, ADSL, limited-capacity links) and with staff who have sufficient IT skills. In that case it can make the difference between no video at all and a very good stream.