At Tryolabs, we are always on the lookout for the next big thing in detection models. This search usually involves fiddling with newly released research repositories and mixing and matching ideas between them. One thing that has come up time and time again is evaluating how these new detectors, combined with other models such as person re-identification (ReID) networks, perform when tracking objects in video. To make this process easier, we have built, and are now sharing, our tracking code in the form of an open source library we call Norfair.
Norfair is a lightweight, customizable object tracking library that works with most detectors. We have tested it with object detectors, pose estimators, and instance segmentation models, but it is built to work with anything that outputs (x, y) coordinates on images.
Norfair achieves this by having the user define the function that calculates the distance between already tracked objects and the detections provided by the detector. This function can be a simple one-liner computing the Euclidean distance between points, or a complex function using external data, such as embeddings taken from the detector itself or from a person ReID model used in conjunction with it. This makes Norfair heavily customizable and able to work with all types of detectors.
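For detectors that output several points per object, such as pose estimators, one natural choice is to average a per-keypoint distance. The sketch below is illustrative rather than Norfair's own code; it only assumes, as in the examples in this post, that detections expose a `points` array and tracked objects an `estimate` array of the same shape.

```python
import numpy as np

def keypoint_distance(detection, tracked_object):
    """Mean Euclidean distance over matching keypoints.

    Assumes `detection.points` and `tracked_object.estimate` are
    (n_keypoints, 2) arrays of (x, y) image coordinates.
    """
    distances = np.linalg.norm(detection.points - tracked_object.estimate, axis=1)
    return distances.mean()
```

Averaging makes the distance robust to a single noisy keypoint, while still keeping the function a few lines long.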
The following is an example of a particularly simple distance function, which calculates the Euclidean distance between tracked objects and detections. This is possibly the simplest distance function you could use in Norfair, as it relies on a single point per detection/object.
```python
import numpy as np

def euclidean_distance(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)
```
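To see what this function computes, here is a quick check using lightweight stand-ins for Norfair's detection and tracked-object classes (the stand-ins are ours, but they expose the same `points` and `estimate` attributes used above):

```python
from types import SimpleNamespace

import numpy as np

def euclidean_distance(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)

# Stand-ins mimicking the attributes Norfair's objects expose
detection = SimpleNamespace(points=np.array([3.0, 4.0]))
tracked = SimpleNamespace(estimate=np.array([0.0, 0.0]))

print(euclidean_distance(detection, tracked))  # → 5.0
```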
As an example, we use Detectron2 to obtain single-point detections to feed into this distance function. We simply use the centroids of the bounding boxes around cars as our detections, and get the following results.
On the left, you can see the points we get from Detectron2, and on the right, how Norfair tracks them, assigning each a unique identifier over time. Even a straightforward distance function like this one can work when the tracking needed is simple.
Norfair also provides several useful tools for creating a video inference loop. Here is what the full code for creating the previous example looks like, including the code needed to set up Detectron2.
```python
import cv2
import numpy as np
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

from norfair import Detection, Tracker, Video, draw_tracked_objects

# Set up Detectron2 object detector
cfg = get_cfg()
cfg.merge_from_file("faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
detector = DefaultPredictor(cfg)

# Distance function from the previous snippet
def euclidean_distance(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)

# Norfair
video = Video(input_path="video.mp4")
tracker = Tracker(distance_function=euclidean_distance, distance_threshold=20)

for frame in video:
    detections = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    detections = [
        Detection(p)
        for p in detections["instances"].pred_boxes.get_centers().cpu().numpy()
    ]
    tracked_objects = tracker.update(detections=detections)
    draw_tracked_objects(frame, tracked_objects)
    video.write(frame)
```
The video and drawing tools use OpenCV frames, so they are compatible with most Python video code available online. The point tracking is based on SORT, generalized to detections consisting of a dynamically changing number of points per detection.
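The assignment step at the heart of SORT-style trackers can be pictured as matching detections to tracked objects over a distance matrix, discarding pairs whose distance exceeds a threshold. The following is a simplified greedy sketch of that idea, not Norfair's actual implementation:

```python
import numpy as np

def greedy_match(distance_matrix, threshold):
    """Greedily pair detections (rows) with tracked objects (columns),
    cheapest pair first, skipping pairs above the distance threshold.

    Returns a list of (detection_index, object_index) matches.
    """
    matches = []
    d = distance_matrix.astype(float).copy()
    while d.min() <= threshold:  # all remaining pairs too far apart -> stop
        i, j = np.unravel_index(np.argmin(d), d.shape)
        matches.append((int(i), int(j)))
        d[i, :] = np.inf  # each detection matches at most once
        d[:, j] = np.inf  # each tracked object matches at most once
    return matches
```

Unmatched detections would typically spawn new tracked objects, while objects left unmatched for several frames would be dropped, which is how identities persist through brief occlusions.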
We provide an ever-growing list of examples of how Norfair can be used to add tracking capabilities to several different detectors.
Norfair is an easy way to add tracking to any detector, or even to try new ideas by creating your own custom tracker. Norfair already powers several video analytics applications; you can check its performance on the face mask detection tool we developed.
We would be honored to receive feedback from the community, and we gladly welcome more collaborators to the project. Follow us on Twitter to get notified as we continue to add new features and examples to Norfair!