Mon, Aug 29, 2022
In the past few years, video analytics, also known as video content analysis or intelligent video analytics, has attracted increasing interest from both industry and the academic world. Thanks to the popularization of deep learning, video analytics has introduced the automation of tasks that were once the exclusive purview of humans.
Recent improvements in video analytics have been a game-changer, ranging from applications that monitor traffic jams and send alerts in real time, to others that analyze customer flow in retail to maximize sales, along with other more well-known scenarios such as facial recognition or smart parking.
This kind of technology looks great, but how does it work and how can it benefit your business?
In this guide, you'll discover the basic concept of video analytics, how it's used in the real world to automate processes and gain valuable insights, and what you should consider when implementing intelligent video analytics solutions in your organization.
The main goal of video analytics is to automatically recognize temporal and spatial events in videos. A person who moves suspiciously, traffic signs that are not obeyed, the sudden appearance of flames and smoke; these are just a few examples of what a video analytics solution can detect.
Usually, these systems perform real-time monitoring in which objects, object attributes, movement patterns, or behavior related to the monitored environment are detected. However, video analytics can also be used to analyze historical data to mine insights. This forensic analysis task can detect trends and patterns that answer business questions such as:
Some applications in the field of video analytics are widely known to the general public. One such example is video surveillance, a task that has existed for approximately 50 years. In principle, the idea is simple: install cameras strategically to allow human operators to control what happens in a room, area, or public space.
In practice, however, it is a task that is far from simple. An operator is usually responsible for more than one camera and, as several studies have shown, upping the number of cameras to be monitored adversely affects the operator’s performance. In other words, even if a large amount of hardware is available and generating signals, a bottleneck is formed when it is time to process those signals due to human limitations.
Video analysis software can contribute in a major way by providing a means of accurately dealing with volumes of information.
Machine learning and, in particular, the spectacular development of deep learning approaches, have revolutionized video analytics.
The use of Deep Neural Networks (DNNs) has made it possible to train video analysis systems that mimic human behavior, resulting in a paradigm shift. It started with systems based on classic computer vision techniques (e.g. triggering an alert if the camera image gets too dark or changes drastically) and moved to systems capable of identifying specific objects in an image and tracking their path.
For example, Optical Character Recognition (OCR) has been used for decades to extract text from images. In principle, it could suffice to apply OCR algorithms directly to an image of a license plate to discern its number. In the previous paradigm, this might work if the camera was positioned in such a way that, at the time of executing the OCR, we were certain that we were filming a license plate.
A real-world application of this would be the recognition of license plates at parking facilities, where the camera is located near the gates and could film the license plate when the car stops. However, running OCR constantly on images from a traffic camera is not reliable: if the OCR returns a result, how can we be sure that it really corresponds to a license plate?
In the new paradigm, models based on deep learning are able to identify the exact area of an image in which license plates appear. With this information, OCR is applied only to the exact region in question, leading to reliable results.
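To make the two-step idea concrete, here is a minimal sketch in Python. The `detect_plates` and `run_ocr` callables are hypothetical placeholders for whatever detection model and OCR engine you use; they are not part of any specific library. The image is represented as plain nested lists to keep the example self-contained, whereas a real system would typically work with arrays.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
Image = Sequence[Sequence[int]]  # rows of pixel values (illustrative only)


def crop(image: Image, box: Box) -> List[List[int]]:
    """Return the sub-image covered by box, clamped to the image borders."""
    x_min, y_min, x_max, y_max = box
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min = max(0, x_min), max(0, y_min)
    x_max, y_max = min(width, x_max), min(height, y_max)
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def read_plates(
    image: Image,
    detect_plates: Callable[[Image], List[Box]],  # hypothetical detector
    run_ocr: Callable[[Image], str],              # hypothetical OCR engine
) -> List[Tuple[Box, str]]:
    """Run OCR only on the regions the detector marks as license plates."""
    results = []
    for box in detect_plates(image):
        region = crop(image, box)
        if not region or not region[0]:
            continue  # the box fell entirely outside the frame
        text = run_ocr(region).strip()
        if text:
            results.append((box, text))
    return results
```

Because OCR never sees anything but the detected regions, any text it returns can be attributed to a plate candidate rather than to a street sign or a billboard.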
Historically, healthcare institutions have invested large amounts of money in video surveillance solutions to ensure the safety of their patients, staff, and visitors, at levels that are often regulated by strict legislation. Theft, infant abduction, and drug diversion are some of the most common problems addressed by surveillance systems.
In addition to facilitating surveillance tasks, video analytics allows us to go further, by exploiting the data collected in order to achieve business goals. For example, a video analytics solution could detect when a patient has not been checked on according to their needs and alert the staff. Analysis of patient and visitor traffic can be extremely valuable in determining ways to shorten wait times, while ensuring clear access to the emergency area.
At-home monitoring of older adults or people with health issues is another example of an application that provides great value. For instance, falls are a major cause of injury and death in older people. Although personal medical devices can detect falls, they must be worn and are frequently disregarded by the consumer. A video analytics solution can process the signals of home cameras to detect in real time if a person has fallen. With proper setup, such a system could also determine if a person took a given medication when they were supposed to, for instance.
Mental healthcare is another area in which video analytics can make significant contributions. Systems that analyze facial expressions, body posture, and gaze can be developed to assist clinicians in the evaluation of patients. Such a system is able to detect emotions from body language and micro-expressions, offering clinicians objective information that can confirm their hypotheses or give them new clues.
The University at Buffalo developed a smartphone application designed to help detect autism spectrum disorder (ASD) in children. Using only the smartphone camera, the app tracks facial expression and gaze attention of a child looking at pictures of social scenes (showing multiple people). The app monitors the eye movements and can accurately detect children with ASD since their eye movements are different from those of a person without autism.
Video analytics has proven to be a tremendous help in the area of transport, aiding in the development of smart cities.
An increase in traffic, especially in urban areas, can result in an increase in accidents and traffic jams if adequate traffic management measures are not taken. Intelligent video analysis solutions can play a key role in this scenario.
Traffic analysis can be used to dynamically adjust traffic light control systems and to monitor traffic jams. It can also be useful in detecting dangerous situations in real time, such as a vehicle stopped in an unauthorized space on the highway, someone driving in the wrong direction, a vehicle moving erratically, or vehicles that have been in an accident. In the case of an accident, these systems are helpful in collecting evidence in case of litigation.
At Tryolabs, we developed a video analytics platform that detects pedestrian misbehavior in street videos and can process high volumes of data. The project gave the client relevant statistics for taking action in areas where frequent misbehavior was causing traffic issues.
Vehicle counting, or differentiating between cars, trucks, buses, taxis, and so on, generates high-value statistics used to obtain insights about traffic. Installing speed cameras allows for precise control of drivers en masse. Automatic license plate recognition identifies cars that commit an infraction or, thanks to real-time searching, spots a vehicle that has been stolen or used in a crime.
Instead of using sensors in each parking space, a smart parking system based on video analytics helps drivers find a vacant spot by analyzing images from security cameras.
These are just some examples of the contributions that video analysis technology can make to build safer cities that are more pleasant to live in.
A great example of video analytics used to solve real-world problems comes from New York City. In order to better understand major traffic events, the New York City Department of Transportation used video analytics and machine learning to detect traffic jams, weather patterns, parking violations, and more. The cameras capture the activities, process them, and send real-time alerts to city officials.
The use of machine learning, and video analytics in particular, in the retail sector has been one of the most important technological trends in recent years.
Brick and mortar retailers can use video analytics to understand who their customers are and how they behave.
State-of-the-art algorithms are able to recognize faces and determine people’s key characteristics such as gender and age. These algorithms can also track customers' journeys through stores and analyze navigational routes to detect walking patterns. Adding in the detection of direction of gaze, retailers can identify how long a customer looks at a certain product and finally answer a crucial question: where is the best place to put items in order to maximize sales and improve customer experience?
A lot of actionable information can be gathered with a video analytics solution, such as the number of customers, customer characteristics, duration of visit, and walking patterns. All of this data can be analyzed while taking into account its temporal nature, in order to optimize the organization of the store according to the day of the week, the seasons of the year, or holidays. In this way, a retailer can get an extremely accurate sense of who their customers are, when they visit their store, and how they behave once inside.
A typical solution that counts customers entering and leaving a store provides the data needed to calculate high-impact metrics such as conversion rates. This approach can rely on previously installed security cameras, making it fast and cost-effective to deploy.
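As a rough illustration, counting can be reduced to checking when a tracked person crosses a virtual line drawn at the door. The sketch below assumes that some upstream tracker already provides one list of (x, y) positions per person, that image y grows downward, and that moving to larger y means entering the store. All names are illustrative.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def count_crossings(tracks: Dict[int, List[Point]], line_y: float) -> Tuple[int, int]:
    """Count entrances and exits across a horizontal virtual line.

    Assumption: crossing toward larger y is an entrance, the opposite is an exit.
    """
    entrances = exits = 0
    for points in tracks.values():
        for (_, y_prev), (_, y_curr) in zip(points, points[1:]):
            if y_prev < line_y <= y_curr:
                entrances += 1
            elif y_curr < line_y <= y_prev:
                exits += 1
    return entrances, exits


def conversion_rate(transactions: int, visitors: int) -> float:
    """Share of visitors who made a purchase; 0.0 when nobody came in."""
    return transactions / visitors if visitors else 0.0
```

With the entrance count for a given period and the number of sales from the point-of-sale system, `conversion_rate(transactions, entrances)` gives the metric mentioned above.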
Video analytics is also great for developing anti-theft mechanisms. For instance, face recognition algorithms can be trained to spot known shoplifters or spot in real-time a person hiding an item in their backpack.
What is more, information extracted from video analytics can serve as input data for training machine learning models, which aim to solve larger challenges. As an example, walking patterns and the number of people in the store can be useful inputs to machine learning powered solutions for demand forecasting, price optimization, and inventory forecasting.
Amazon Go is how Amazon entered the grocery industry. It attempts to simplify the customers' shopping experience by avoiding checkouts and letting the customers just walk out of the grocery store, automatically charging them according to what they grabbed. It has been around for several years now, and it is still a disruptive solution. Amazon Go leverages an accurate video analysis software based on several cameras to track the customers' behavior in the store. This software, combined with several sensors placed around the store, lets Amazon Go make confident decisions when it comes to charging users for their purchases.
Video surveillance is an old task of the security domain. However, from the time that systems were monitored exclusively by humans to current solutions based on video analytics, much water has passed under the bridge.
Facial and license plate recognition (LPR) techniques can be used to identify people and vehicles in real-time and make appropriate decisions. For instance, it’s possible to search for a suspect both in real-time and in stored video footage, or to recognize authorized personnel and grant access to a secured facility.
Crowd management is another key function of security systems. Cutting edge video analysis tools can make a big difference in places such as shopping malls, hospitals, stadiums, and airports. These tools can provide an estimated crowd count in real time and trigger alerts when a threshold is reached or surpassed. They can also analyze crowd flow to detect movement in unwanted or prohibited directions.
Consider a surveillance system trained to recognize people in real time. This lays the groundwork for obtaining other results. The most immediate: a count of the number of people passing by daily. More advanced goals, based on historical data, might be to determine the "normal" flow of people according to the day of the week and time and generate alerts in case of unusual traffic. If the monitored area is pedestrian-only, the system could be trained to detect unauthorized objects such as motorcycles or cars and, again, trigger some kind of alert.
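One simple way to express the "normal" flow idea is to keep a mean and standard deviation of the people count for each (weekday, hour) slot and flag counts that deviate too much. The sketch below is one possible approach under that assumption, not a description of any particular product; the three-sigma threshold is an arbitrary default.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple

Bucket = Tuple[int, int]  # (weekday, hour), with Monday == 0
Baseline = Dict[Bucket, Tuple[float, float]]


def build_baseline(history: Iterable[Tuple[datetime, int]]) -> Baseline:
    """Mean and standard deviation of the people count per (weekday, hour)."""
    grouped: Dict[Bucket, List[int]] = defaultdict(list)
    for timestamp, count in history:
        grouped[(timestamp.weekday(), timestamp.hour)].append(count)
    return {bucket: (mean(values), pstdev(values)) for bucket, values in grouped.items()}


def is_unusual(baseline: Baseline, timestamp: datetime, count: int, max_sigmas: float = 3.0) -> bool:
    """Flag a count that deviates from its slot mean by more than max_sigmas."""
    bucket = (timestamp.weekday(), timestamp.hour)
    if bucket not in baseline:
        return False  # no history for this slot, so nothing to compare against
    avg, std = baseline[bucket]
    if std == 0:
        return count != avg
    return abs(count - avg) > max_sigmas * std
```

In practice you would want several weeks of history per slot before trusting the alerts, and probably separate handling for holidays.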
This is one of the great advantages of these approaches: video content analysis systems can be trained to detect specific events, sometimes with a high degree of sophistication. One such example is to detect fires as soon as possible. Or, in the case of airports, to raise an alert when someone enters a forbidden area or walks against the direction intended for passengers. Another great use case is the real-time detection of unattended baggage in a public space.
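Detecting someone walking against the intended direction can be sketched with very little code once a tracker provides each person's positions over time. The function below is a simplified heuristic that compares the net displacement of a track with an allowed direction vector; the names and the 20-pixel minimum distance are assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def moves_against(track: List[Point], allowed: Point, min_distance: float = 20.0) -> bool:
    """True when the net displacement of a track opposes the allowed direction.

    `allowed` is a direction vector in image coordinates. `min_distance`
    (in pixels) ignores people who are standing still or jittering in place.
    """
    if len(track) < 2:
        return False
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if (dx * dx + dy * dy) ** 0.5 < min_distance:
        return False
    # A negative dot product means the movement points away from `allowed`.
    return dx * allowed[0] + dy * allowed[1] < 0
```

For a corridor where passengers should move left to right in the image, `allowed` would be `(1, 0)`, and any track for which the function returns `True` is a candidate for an alert.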
As for classic tasks such as intruder detection, they can be performed robustly, thanks to algorithms that can filter out motion caused by wind, rain, snow, or animals.
The functionality offered by intelligent video analysis grows day by day in the security domain, and this is a trend that will continue in the future.
The Danish football club Brondby was the first soccer club to officially introduce facial recognition technology in 2019 to improve safety on matchdays at its stadium. The system identifies people who are banned from attending games and enables staff to prevent them from entering the stadium.
It has been a long time since data arrived in sports. From soccer coaches to personal trainers, from professional athletes to beginners, everyone is leveraging data to achieve better results.
Soccer match statistics, such as ball possession or the number of passes, have become a default tool for coaches to understand their team's performance. Studies analyzing the importance of ball possession in UEFA Champions League matches have concluded that teams with more ball possession won 49.2%, drew 22.0%, and lost 28.7% of the matches overall, exceeding the winning rates of their rivals. If you are interested in this topic, at Tryolabs we have a tutorial on how to automatically measure soccer ball possession with AI and video analytics.
Understanding an athlete's pose when practicing sports is essential for improving the technique. Video analytics solutions can give this information to athletes or coaches to make it easier to achieve their goals. Also, pose information can be used to prevent injuries by understanding if there are any risky moves.
Video analytics solutions can also be leveraged to understand how the opponents play. Learning their game can help to build effective counters to their strategies. Solutions may range from automatically selecting relevant plays in a match to giving useful statistics to understand the opponent's weaknesses.
In the UK, soccer teams are competing with each other not only in the Premier League but also in the race to have the best possible data. Clubs have hired rocket scientists and chess champions and have even used missile technology. Beyond scouts, teams have started looking for engineers, mathematicians, physicists, and experts in statistics or algorithms.
Some teams, such as Arsenal, have their own in-house data company, while many others rely on third-party companies to give them all the necessary data. This data is used for every single decision: to hire players and coaches, to know what are the best positions in the field for every player, and to track youngsters’ performance in their loans, to name a few.
Let's take a look at a general scheme of how a video analytics solution works. Depending on the particular use case, the architecture of a solution may vary, but the scheme remains the same.
Video content analysis can be done in two different ways: in real time, by configuring the system to trigger alerts for specific events and incidents that unfold in the moment, or in post processing, by performing advanced searches to facilitate forensic analysis tasks.
The data being analyzed can come from various streaming video sources. The most common are CCTV cameras, traffic cameras and online video feeds. However, any video source that uses the appropriate protocol (e.g. RTSP: real-time streaming protocol or HTTP) can generally be integrated into the solution.
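As a minimal sketch of how frames might be pulled from such a source, the generator below works with any object whose `read()` method returns an `(ok, frame)` pair, which is the contract of OpenCV's `cv2.VideoCapture`. Sampling every n-th frame is an assumption made here to keep the processing load manageable; the right rate depends on your use case.

```python
from typing import Any, Iterator, Tuple


def sample_frames(capture: Any, every_n: int = 5) -> Iterator[Tuple[int, Any]]:
    """Yield (frame_index, frame) for every n-th frame until the source ends.

    `capture` is any object whose read() returns (ok, frame), such as an
    OpenCV cv2.VideoCapture instance.
    """
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break  # the stream ended or the connection dropped
        if index % every_n == 0:
            yield index, frame
        index += 1


# Usage with OpenCV (the URL below is a placeholder, not a real camera):
#   import cv2
#   capture = cv2.VideoCapture("rtsp://camera.example/stream")
#   for index, frame in sample_frames(capture, every_n=5):
#       ...  # hand the frame to the detection model
#   capture.release()
```

A production system would also need reconnection logic, since network streams can drop at any time.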
A key goal is coverage: we need to have a clear view of the entire area, and from various angles, where the events being monitored might occur. Remember, more data is better, provided that it can be processed.
Video analysis software can be run centrally on servers that are generally located in the monitoring station, which is known as central processing. Or, it can be embedded in the cameras themselves, a strategy known as edge processing.
The choice of cameras should be carefully considered when designing a solution. A lot of legacy software was developed with central processing capabilities only. In recent years, though, it is not uncommon to come across hybrid solutions. In fact, a good practice is to concentrate, whenever possible, real-time processing on cameras and forensic analysis functionalities on the central server.
With a hybrid approach, the processing performed by the cameras reduces the data being processed by the central servers, which otherwise could require extensive processing capabilities and bandwidth as the number of cameras increases. In addition, it is possible to configure the software to only send data about suspicious events to the server over the network, reducing network traffic and the need for storage.
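The filtering step on the camera side can be sketched as follows. The `Event` structure, the label names, and the 0.6 confidence threshold are all hypothetical; they only illustrate the idea of forwarding a small subset of detections to the central server.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Event:
    camera_id: str
    label: str        # e.g. "person", "vehicle", "fire"
    score: float      # detector confidence in [0, 1]
    frame_index: int


def select_for_upload(events: Iterable[Event], watched: Set[str], min_score: float = 0.6) -> List[Event]:
    """Keep only confident detections of the labels the server cares about."""
    return [e for e in events if e.label in watched and e.score >= min_score]
```

Everything that is filtered out never leaves the camera, which is where the savings in bandwidth and storage come from.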
Meanwhile, centralizing the data for forensic analysis allows for multiple search and analysis tools to be used, from general algorithms to ad-hoc implementations, all utilizing different sets of parameters that help to balance the noise and silence (irrelevant results and missed results) in the results obtained. Essentially, you can plug in your own algorithms to get the desired results, which makes this a particularly flexible and attractive scheme.
Once the physical architecture is planned for and installed, it is necessary to define the scenarios on which you want to focus and then train the models that are going to detect the target events.
Vehicle crashes? Crowd flow? Facial recognition at a retail store to recognize known shoplifters? Each scenario leads to a series of basic tasks that the system must know how to perform.
An example: detect vehicles, optionally recognize their type (e.g. motorcycle, car, truck), track their trajectory frame by frame, and then study the evolution of those paths to detect a possible crash.
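The tracking and trajectory-analysis steps of this example can be sketched in a few lines. The code below assumes that a detector has already produced one list of vehicle centroids per frame. It links detections with a greedy nearest-centroid rule and then flags tracks whose speed collapses abruptly. Both the matching rule and the "sudden stop" cue are deliberate simplifications: an abrupt stop is only one possible hint of a crash, and a real system would combine several signals.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    return hypot(a[0] - b[0], a[1] - b[1])


def track_centroids(frames: List[List[Point]], max_jump: float = 50.0) -> Dict[int, List[Point]]:
    """Greedy nearest-centroid tracking.

    Each detection is linked to the closest unmatched track from the previous
    frame, or starts a new track. Tracks that miss a frame are not resumed.
    """
    tracks: Dict[int, List[Point]] = {}
    active: Dict[int, Point] = {}  # track id -> last known position
    next_id = 0
    for detections in frames:
        updated: Dict[int, Point] = {}
        free = dict(active)  # tracks still available for matching
        for point in detections:
            best = min(free, key=lambda tid: distance(point, free[tid]), default=None)
            if best is not None and distance(point, free[best]) <= max_jump:
                del free[best]
            else:
                best, next_id = next_id, next_id + 1
                tracks[best] = []
            tracks[best].append(point)
            updated[best] = point
        active = updated
    return tracks


def sudden_stops(tracks: Dict[int, List[Point]], drop_ratio: float = 0.2, min_speed: float = 5.0) -> List[int]:
    """Ids of tracks whose per-frame speed collapses between consecutive steps."""
    flagged = []
    for track_id, points in tracks.items():
        speeds = [distance(a, b) for a, b in zip(points, points[1:])]
        if any(prev >= min_speed and curr <= drop_ratio * prev for prev, curr in zip(speeds, speeds[1:])):
            flagged.append(track_id)
    return flagged
```

Speeds here are measured in pixels per frame, so the thresholds would need to be tuned to the camera's position and frame rate.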
The most frequent, basic tasks in video analytics are:
To know more about the basic tasks performed and the types of algorithms that are used to develop video analysis software, we recommend you read this introductory guide to computer vision. More specifically, if you want to dive deeper into object detection and tracking tasks, you can refer to our step-by-step tutorial.
Training models from scratch requires considerable effort. Luckily, there is a fair amount of resources available that make this a less burdensome task.
There are several pre-trained models available for tasks such as image classification, object detection, and facial recognition, which, thanks to transfer learning techniques, allow for the adaptation (fine tuning) of a model for a given use case. This is much less expensive than a complete training.
Finally, open source projects have been increasingly published in recent years by the community to facilitate the building of custom video analysis systems. Relying on computer vision libraries, such as the ones presented in the following section, greatly helps build solutions faster and with more accuracy.
In virtually all cases, a human is needed to monitor the alerts generated by a video analysis system and decide what should be done, if anything. In this sense, these systems act as valuable support for operators, helping them to detect events that might otherwise be overlooked or take a long time to detect manually.
There's no well-established library for video analytics at the moment. The ones that exist are usually some implementation of a research paper, so they tend to be hard to use in a practical context. In other cases, the libraries are meant to be easy to use but perform poorly.
The best option is to hunt for object-tracking or pose-tracking libraries and create something custom.
At Tryolabs, we use image-level algorithms like object detection and pose estimation to perform video analytics, then add our own tracking algorithm layer over them and proceed from there.
The Open Source Computer Vision Library (OpenCV) is the most well-known computer vision library. It contains a comprehensive set of machine learning algorithms to perform common tasks such as image classification, face recognition, and object detection and tracking. It is widely used by companies and research groups, as it can be used via its native C++ interface, or through Java and Python wrappers.
Since it is a general computer vision library, it is possible to implement a video analysis system with OpenCV. However, as it is not a specialized video analytics library, it may be more interesting to turn to other available libraries (depending on the use case). In general, OpenCV is a great tool for approaching classical computer vision tasks and also for preprocessing and postprocessing tasks.
As mentioned before, at Tryolabs we use object detection and pose estimation algorithms and add tracking on top of them to create video analytics solutions. To achieve this we’ve built Norfair, a customizable lightweight Python library for real-time multi-object tracking. Using Norfair you can add tracking capabilities to any detector with just a few lines of code.
Norfair is highly customizable, letting users define their own distance functions. It is modular, since it can be easily inserted into complex video processing pipelines, and it is fast, as the only thing bounding inference speed is the detection network.
Norfair not only lets you track simple bounding boxes but is also compatible with keypoints and even 3D objects. You can also accurately track objects even if the camera is moving by estimating camera motion, potentially accounting for pan, tilt, rotation, movement in any direction, and zoom. Re-identification (ReID) is also supported, allowing the inclusion of appearance embeddings to achieve a more robust tracking system.
Back in 2016, Joseph Redmon et al. presented one of the first single-stage object detectors in the paper You Only Look Once: Unified, Real-Time Object Detection at the CVPR conference. YOLO was designed with both speed and accuracy in mind, which is why it is one of the most popular object detection models for production environments. YOLO is not only a model but a family of object detection models. Over the years, several modifications have been made to the original architecture to achieve even better results. YOLOv4, YOLOv5, YOLOv7, and YOLOX are some of the most popular variations, and this evolution shows no sign of stopping soon.
The authors of YOLOv7 (2022) open-sourced the implementation using PyTorch. This code allows for quickly developing video analytics solutions by making pre-trained object detection models available to users. Another great advantage of YOLOv7's implementation is that it can be extended for pose estimation and instance segmentation tasks.
There is a plethora of off-the-shelf solutions in video analytics, from classic security systems to more complex scenarios such as smart home or healthcare applications.
If your use case is satisfied by one of these standard solutions, they may be an option for you. Be aware that, in general, some kind of adaptation or parameterization of the software has to be done and these solutions only allow customization to a certain degree.
However, most companies aim to gain specific insights to reach individual goals with a video analytics solution, which requires more optimized software. In this case, the ideal solution is to turn to a company specializing in video analytics services, such as Tryolabs. A custom solution is likely to be more accurate and can address unusual or extremely particular use cases.
Video analytics solutions are invaluable in helping us in our daily tasks. There are a vast number of sectors that can benefit from this technology, especially as the complexity of potential applications has been growing in recent years.
From smart cities, to security controls in hospitals and airports, to people tracking for retail and shopping centers, the field of video analytics enables processes that are simultaneously more effective and less tedious for humans, and less expensive for companies.
We hope you enjoyed this post, and that you gained a better understanding of what video analytics is all about, how it works, and how you can leverage it in your organization in order to automate processes and gain valuable insights to make better decisions.
We have been developing Machine Learning solutions since 2010. Partnering with companies in different industries has let us better understand their challenges and how they can use data to drive business results. Please don’t hesitate to drop us a line if you have any questions or comments.