Authors (SORT): Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft.
Authors (Deep SORT): Nicolai Wojke, Alex Bewley, and Dietrich Paulus.
Presentation by: George Hendrick
Blog post by: Obiora Odugu
Link to Paper: Read the Paper
SORT (Simple Online and Realtime Tracking) introduces a fast, lightweight object tracking algorithm that works in real-time by using a tracking-by-detection paradigm. It combines object detection (e.g., using Faster R-CNN), motion prediction via the Kalman filter, and data association using the Hungarian algorithm. While SORT achieves impressive speed and accuracy, it struggles with identity switches, especially during occlusions or when objects are close together. Deep SORT extends SORT by incorporating appearance information through deep learning. It uses a convolutional neural network (CNN) trained on a re-identification dataset to extract appearance features, which are then combined with motion data for better data association. This significantly reduces identity switches and improves tracking through occlusions, although it introduces computational overhead and requires a modern GPU for real-time performance.
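To make the SORT association step concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes boxes in (x1, y1, x2, y2) form, replaces the Kalman filter with a plain constant-velocity shift, and replaces the Hungarian algorithm with a brute-force search over permutations, which finds the same optimal assignment but is only practical for a handful of objects. The names iou, predict, associate and the iou_min threshold are illustrative choices.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict(box, velocity):
    """Shift a box by (vx, vy) for one frame.

    Simplified stand-in for the Kalman predict step: a real Kalman filter
    also tracks uncertainty and corrects the state with each new detection.
    """
    vx, vy = velocity
    return (box[0] + vx, box[1] + vy, box[2] + vx, box[3] + vy)


def associate(predicted, detections, iou_min=0.3):
    """Return (track_index, detection_index) pairs that maximize total IoU.

    Brute force over permutations, used here in place of the Hungarian
    algorithm. Pairs whose IoU falls below iou_min are rejected.
    """
    n_t, n_d = len(predicted), len(detections)
    if n_t == 0 or n_d == 0:
        return []
    if n_t <= n_d:
        # Every track picks a distinct detection.
        candidates = (list(zip(range(n_t), p))
                      for p in permutations(range(n_d), n_t))
    else:
        # Every detection picks a distinct track.
        candidates = ([(t, d) for d, t in enumerate(p)]
                      for p in permutations(range(n_t), n_d))
    best, best_score = [], -1.0
    for pairs in candidates:
        score = sum(iou(predicted[t], detections[d]) for t, d in pairs)
        if score > best_score:
            best, best_score = pairs, score
    return [(t, d) for t, d in best
            if iou(predicted[t], detections[d]) >= iou_min]


# Two existing tracks: (last box, velocity in pixels per frame).
tracks = [((0, 0, 10, 10), (2, 0)), ((50, 50, 60, 60), (0, 0))]
# Three detections in the new frame, in arbitrary order.
detections = [(51, 50, 61, 60), (2, 0, 12, 10), (100, 100, 110, 110)]

predicted = [predict(box, vel) for box, vel in tracks]
matches = associate(predicted, detections)
print(matches)  # [(0, 1), (1, 0)]
```

Track 0 is matched to detection 1 and track 1 to detection 0. Detection 2 overlaps no predicted box, so a tracker built on this sketch would start a new track for it. Because the matching relies only on box overlap, two objects that cross or occlude each other can swap identities, which is the weakness Deep SORT targets with appearance features.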
Q1: Do you believe Deep SORT should be considered an upgrade of SORT and should replace it in most scenarios? Is the speed tradeoff worth the greater tracking capability during periods of occlusion?
Bassel and Aleksandar: Bassel and Aleksandar noted that in autonomous vehicles (AVs) occlusion might not even be an issue, so identity switching may be less of a concern there than in the applications on which SORT and Deep SORT were evaluated.
Sujan: Sujan suggested that Deep SORT is a better fit for privacy-related applications, while SORT can be used in AVs, which further supports Bassel and Aleksandar's comment.
Professor: Professor Xugui asked if the papers are from the same group.
George: The first paper (SORT) introduced a simpler object tracking method using Kalman filtering and the Hungarian algorithm. The second paper (Deep SORT) extended it with appearance-based tracking. The first author of SORT, Alex Bewley, is also an author of Deep SORT.
Aleksandar Avdalovic: Does the function on page 18 focus on minimization or maximization, and what does λ (lambda) represent?
George: George responded by referring him to the paper for details. The professor added that λ depends on the loss function and is often a parameter that is tuned experimentally.
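For readers who want a concrete picture, the sketch below assumes the function in question is the combined association cost from the Deep SORT paper, c(i, j) = λ · d_motion(i, j) + (1 − λ) · d_appearance(i, j). Both terms are distances, so lower is better and the assignment step minimizes the total cost. The motion term in the paper is a Mahalanobis distance derived from the Kalman filter state, and the appearance term is a cosine distance between CNN feature vectors. The function names and the numbers below are illustrative only.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two feature vectors; 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def combined_cost(motion_dist, appearance_dist, lam):
    """Weighted sum of the two distances; lam is the λ hyperparameter in [0, 1]."""
    return lam * motion_dist + (1.0 - lam) * appearance_dist


# Hypothetical values for one (track, detection) pair.
motion_dist = 2.0  # assumed Mahalanobis-style motion distance
appearance_dist = cosine_distance([1.0, 0.0], [0.0, 1.0])  # orthogonal features

for lam in (0.0, 0.5, 1.0):
    print(lam, combined_cost(motion_dist, appearance_dist, lam))
```

With λ = 1 only motion matters, which is closest in spirit to SORT. With λ = 0 only appearance matters, which can be preferable when the camera itself moves a lot and motion predictions become unreliable. Intermediate values trade the two off and are typically chosen experimentally, in line with the professor's remark.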
Sujan Gyawali: How is Deep SORT used in real-world scenarios, and how is it different from YOLO?
George: George explained that it is primarily used for tracking humans in videos, but can be applied to general object tracking, particularly in AVs. He went on to clarify that YOLO is for object detection, while Deep SORT integrates object detection with motion prediction and tracking. The professor added that the general process for AVs involves: sensor data → deep learning model for object detection → motion tracking system for object tracking → planning system → low-level control (braking, gas, etc.). He concluded that better tracking systems lead to better vehicle planning.
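The detector versus tracker distinction can be illustrated with a toy example. A detector such as YOLO reports object locations independently for each frame, with no notion of identity. A tracker consumes those per-frame detections and attaches a persistent ID to each object. The sketch below is neither SORT nor Deep SORT: it assumes detections are reduced to centroid points and uses greedy nearest-neighbor matching with a hypothetical max_dist threshold, purely to show what identities over time means.

```python
import math


class NearestCentroidTracker:
    """Toy tracker: each detection inherits the ID of the nearest free track."""

    def __init__(self, max_dist=20.0):
        self.max_dist = max_dist  # assumed gating distance in pixels
        self.tracks = {}          # track ID -> last known centroid (x, y)
        self.next_id = 0

    def update(self, centroids):
        """Take one frame of detector output and return {track ID: centroid}."""
        assigned = {}
        free = dict(self.tracks)  # tracks not yet claimed in this frame
        for c in centroids:
            nearest = min(free, key=lambda t: math.dist(free[t], c),
                          default=None)
            if nearest is not None and math.dist(free[nearest], c) <= self.max_dist:
                assigned[nearest] = c
                del free[nearest]
            else:
                # No track close enough: start a new identity.
                assigned[self.next_id] = c
                self.next_id += 1
        # Tracks left unmatched are dropped immediately in this toy version.
        self.tracks = assigned
        return assigned


tracker = NearestCentroidTracker()
# Per-frame detector output: unordered points, no identities attached.
frame1 = tracker.update([(10, 10), (100, 100)])
frame2 = tracker.update([(103, 101), (12, 11)])
frame3 = tracker.update([(14, 12), (200, 50)])
print(frame1)  # {0: (10, 10), 1: (100, 100)}
print(frame2)  # {1: (103, 101), 0: (12, 11)}
print(frame3)  # {0: (14, 12), 2: (200, 50)}
```

Object 0 keeps its ID across all three frames even though the detector lists it in a different position each time. In the pipeline described above, a planner downstream can then reason about each object's trajectory, which is not possible from unlabelled per-frame detections alone.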
Ruslan: Ruslan mentioned that SORT is good for real-time tracking, while Deep SORT is better for accuracy.
George: George clarified that Deep SORT can also be used in real time, but it is slower than SORT.