Releasing Norfair: an open source library for object tracking

At Tryolabs, we are always on the lookout for the next big thing in detection models. This search usually involves fiddling with newly released research repositories and mixing and matching ideas between them. One task that comes up time and time again is evaluating how these new detectors, combined with other models such as Person ReID networks, perform when tracking objects in video. To make this process easier, we have built, and are now sharing, our tracking code as an open source library we call Norfair.

Norfair is a lightweight, customizable object tracking library that can work with most detectors. We have tested it with object detectors, pose estimators and instance segmentation models, but it is built to work with anything that outputs (x, y) coordinates on images.

Norfair achieves this by having the user define the function that calculates the distance between already tracked objects and the detections provided by the detector. This function can be a simple one-liner computing the Euclidean distance between points, or a complex function using external data such as embeddings taken from the detector itself or from an external Person ReID model used in conjunction with the detector. This makes Norfair heavily customizable and lets it work with all types of detectors.
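As a rough illustration of the more complex kind of distance function, the sketch below blends the point distance with an appearance term. It is written in plain Python against stand-in classes so it runs on its own; `SketchDetection`, `SketchTrackedObject`, the `embedding` field, and the `appearance_weight` parameter are all hypothetical names invented for this example, not part of Norfair's API.

```python
import math
from dataclasses import dataclass


@dataclass
class SketchDetection:
    # Hypothetical stand-in for a detection: one (x, y) point plus an embedding.
    point: tuple
    embedding: tuple


@dataclass
class SketchTrackedObject:
    # Hypothetical stand-in for a tracked object: its estimated position plus
    # the embedding of the last detection matched to it.
    estimate: tuple
    embedding: tuple


def cosine_distance(a, b):
    """Return 1 - cosine similarity, so identical directions give 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no usable appearance information
    return 1.0 - dot / (norm_a * norm_b)


def embedding_aware_distance(detection, tracked_object, appearance_weight=50.0):
    """Pixel distance plus a weighted appearance penalty (the weight is an assumption)."""
    spatial = math.dist(detection.point, tracked_object.estimate)
    appearance = cosine_distance(detection.embedding, tracked_object.embedding)
    return spatial + appearance_weight * appearance


tracked = SketchTrackedObject(estimate=(100.0, 100.0), embedding=(1.0, 0.0))
same_look = SketchDetection(point=(103.0, 104.0), embedding=(1.0, 0.0))
other_look = SketchDetection(point=(103.0, 104.0), embedding=(0.0, 1.0))

same_look_distance = embedding_aware_distance(same_look, tracked)
other_look_distance = embedding_aware_distance(other_look, tracked)
```

Both detections are equally close in pixels, but the one whose embedding differs ends up with a much larger distance, so it would be less likely to be matched to this tracked object.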

Usage

The following is an example of a particularly simple distance function that calculates the Euclidean distance between a tracked object and a detection. This is possibly the simplest distance function you could use in Norfair, as it uses just a single point per detection/object.

import numpy as np

def euclidean_distance(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)

As an example, we use Detectron2 to get the single-point detections to use with this distance function. We simply use the centroids of the bounding boxes around cars as our detections, and get the following results.

https://github.com/tryolabs/norfair/raw/master/docs/traffic.gif

On the left, you can see the points we get from Detectron2, and on the right how Norfair tracks them, assigning each one a unique identifier that persists over time. Even a straightforward distance function like this one can work when the tracking needed is simple.

Norfair also provides several useful tools for creating a video inference loop. Here is what the full code for creating the previous example looks like, including the code needed to set up Detectron2.

import cv2
import numpy as np
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

from norfair import Detection, Tracker, Video, draw_tracked_objects

# Set up Detectron2 object detector
cfg = get_cfg()
cfg.merge_from_file("faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"
detector = DefaultPredictor(cfg)

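# Distance function: Euclidean distance between a detection and a tracked object
def euclidean_distance(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)

# Norfair
video = Video(input_path="video.mp4")
tracker = Tracker(distance_function=euclidean_distance, distance_threshold=20)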

for frame in video:
    detections = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    detections = [Detection(p) for p in detections['instances'].pred_boxes.get_centers().cpu().numpy()]
    tracked_objects = tracker.update(detections=detections)
    draw_tracked_objects(frame, tracked_objects)
    video.write(frame)


The video and drawing tools use OpenCV frames, so they are compatible with most Python video code available online. The point tracking is based on SORT, generalized to detections consisting of a dynamically changing number of points per detection.
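To make the matching step more concrete, here is a simplified sketch of how detections might be associated with tracked objects using a distance and a threshold. It uses a greedy closest-pair strategy in plain Python with single-point detections; this is an illustration under those assumptions, not Norfair's internal implementation, and `greedy_match` is a hypothetical helper.

```python
import math


def greedy_match(detections, estimates, distance_threshold):
    """Pair detections with tracked estimates, closest pairs first.

    detections and estimates are lists of (x, y) points. Returns the matched
    (detection_index, estimate_index) pairs and the unmatched detection indices.
    """
    candidates = []
    for d_idx, det in enumerate(detections):
        for e_idx, est in enumerate(estimates):
            dist = math.dist(det, est)
            if dist < distance_threshold:
                candidates.append((dist, d_idx, e_idx))

    candidates.sort()  # smallest distance first
    matches, used_dets, used_ests = [], set(), set()
    for dist, d_idx, e_idx in candidates:
        if d_idx in used_dets or e_idx in used_ests:
            continue  # each detection and each object is matched at most once
        matches.append((d_idx, e_idx))
        used_dets.add(d_idx)
        used_ests.add(e_idx)

    unmatched = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched


estimates = [(10.0, 10.0), (200.0, 50.0)]
detections = [(198.0, 52.0), (12.0, 11.0), (400.0, 400.0)]
matches, unmatched = greedy_match(detections, estimates, distance_threshold=20)
```

In a tracker built along these lines, matched detections would update their tracked objects, while unmatched ones (here the detection at index 2) would be candidates for starting new tracked objects.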

Examples

We provide an ever-growing list of examples of how Norfair can be used to add tracking capabilities to several different detectors.

  1. Simple tracking of cars using Detectron2.
  2. Simple tracking of cars using YOLOv4.
  3. Simple tracking of pedestrians using AlphaPose.
  4. Speed up inference by extrapolating detections using OpenPose.

https://github.com/tryolabs/norfair/raw/master/docs/openpose_skip_3_frames.gif
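
The idea behind the last example can be sketched with a toy constant-velocity model: the detector only runs every few frames, and positions in between are extrapolated from the last observed motion. This is a simplified illustration in plain Python, not the filter Norfair uses; `extrapolate_track` and `skip_period` are names assumed for the example.

```python
def extrapolate_track(observations, skip_period):
    """Fill in positions for skipped frames using the last observed velocity.

    observations holds (x, y) points from the frames where the detector ran,
    one every `skip_period` frames. Returns one (x, y) per video frame.
    """
    track = []
    velocity = (0.0, 0.0)  # pixels per frame; unknown until two observations
    for i, (x, y) in enumerate(observations):
        if i > 0:
            prev_x, prev_y = observations[i - 1]
            velocity = ((x - prev_x) / skip_period, (y - prev_y) / skip_period)
        track.append((x, y))  # frame with a real detection
        for step in range(1, skip_period):
            # Frames without a detection: move along the current velocity.
            track.append((x + velocity[0] * step, y + velocity[1] * step))
    return track


# Detector runs on frames 0, 3 and 6; the object moves 6 px right between runs.
observed = [(0.0, 0.0), (6.0, 0.0), (12.0, 0.0)]
per_frame = extrapolate_track(observed, skip_period=3)
```

Running the detector on only one frame in three cuts the detection cost roughly by that factor, at the price of less accurate positions on the frames in between.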

What about that name?

Following Tryolabs’ tradition of naming things after elements of the Metroid world (e.g., Luminoth), we settled on the name Norfair, the underground volcanic area on planet Zebes!

Conclusion

Norfair is an easy way to try tracking on any detector, or even try new ideas by creating your own custom tracker. Norfair powers several video analytics applications. You can check its performance on the face mask detection tool we developed.

We would be honored to receive feedback from the community, and gladly welcome more collaborators to the project. Follow us on Twitter to get notified as we continue to add new features and examples to Norfair!
