Object tracking

deep sort
Deep SORT people tracking, based on YOLO. See Pose_estimation for AlphaPose-based tracking of people only.

https://github.com/mkocabas/multi-person-tracker uses SORT for multiperson tracking.

https://github.com/JunweiLiang/Object_Detection_Tracking We utilize state-of-the-art object detection and tracking algorithms in surveillance videos. Our best object detection model uses Faster R-CNN with a ResNet-101 backbone with dilated CNN and FPN. The tracking algorithm (Deep SORT) uses ROI features from the object detection model. The ActEV-trained models are good for small object detection in outdoor scenes. For indoor cameras, COCO-trained models are better.

https://github.com/ZQPei/deep_sort_pytorch This is an implementation of the Deep SORT MOT (Multiple Object Tracking) algorithm. Deep SORT is basically the same as SORT but adds a CNN model to extract features from the image region of each person bounded by a detector. This CNN model is in fact a re-identification (Re-ID) model, and the detector used in the paper is Faster R-CNN. In the original source code the CNN model is implemented in TensorFlow, which I'm not familiar with, so I re-implemented the CNN feature extraction model in PyTorch and changed the CNN model a little. I also use YOLOv3 instead of Faster R-CNN to generate bounding boxes.
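How the Re-ID features help association can be shown with a minimal sketch, assuming every track and every detection already has an embedding vector. The function names, the toy vectors and the 0.2 distance threshold are illustrative assumptions, not taken from deep_sort_pytorch. Greedy pairing is used here only to keep the example short.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def match_by_appearance(track_feats, det_feats, max_dist=0.2):
    """Greedily pair tracks with detections by smallest cosine distance.

    Hypothetical helper: a simplification for illustration. Deep SORT itself
    solves an assignment problem and also gates matches using motion.
    """
    pairs = sorted(
        (cosine_distance(t, d), ti, di)
        for ti, t in enumerate(track_feats)
        for di, d in enumerate(det_feats)
    )
    used_tracks, used_dets, matches = set(), set(), []
    for dist, ti, di in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even less similar
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return sorted(matches)

# Toy embeddings: detection 1 looks like track 0, detection 0 like track 1.
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
detections = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = match_by_appearance(tracks, detections)
print(matches)
```

Detection 2 stays unmatched because its embedding is far from both tracks; in a tracker it would typically start a new track.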

Smorodov
https://github.com/Smorodov/Multitarget-tracker from Kalman tracking. Ref medium.com hal24k-techblog track objects

Chained tracking
https://github.com/pjl1995/CTracker Official implementation in PyTorch of Chained-Tracker as described in "Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking". https://www.youtube.com/watch?v=UovwAgKys88

Globaltrack
https://github.com/huanglianghua/GlobalTrack Extremely simple tracking process, with NO motion model, NO online learning, NO punishment on position or scale changes, NO scale smoothing and NO trajectory refinement. Outperforms SPLT (ICCV19), SiamRPN, ATOM and MBMD on TLP benchmark (avg. 13,529 frames per video) by MORE THAN 11% (absolute gain). https://www.youtube.com/watch?v=na0H3u4cLqY&feature=youtu.be

DAN
https://github.com/shijieS/SST DAN (Deep Affinity Network), single-shot tracking based on SSD. Evaluated on https://motchallenge.net/data/MOT17/ and https://detrac-db.rit.albany.edu/

Jeremy Cohen
https://github.com/kcg2015/Vehicle-Detection-and-Tracking/ vehicle tracking from https://towardsdatascience.com/computer-vision-for-tracking-8220759eee85

Papers
https://github.com/foolwood/benchmark_results visual tracking paper list of supervised and unsupervised tracking.
 * https://github.com/researchmm/SiamDW We are the Winner of VOT-19 RGB-D challenge
 * https://github.com/iiau-tracker/SPLT
 * https://github.com/vision4robotics/ARCF-tracker Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking (ICCV 2019)
 * https://github.com/ZikunZhou/TADT-python Existing deep trackers mainly use convolutional neural networks pre-trained for generic object recognition task for representations. Despite demonstrated successes for numerous vision tasks, the contributions of using pre-trained deep features for visual tracking are not as significant as that for object recognition. The key issue is that in visual tracking the targets of interest can be arbitrary object class with arbitrary forms. See https://xinli-zn.github.io/TADT-project-page/
 * https://github.com/XU-TIANYANG/GFS-DCF matlab

Face tracking
See Facial Recognition for tiny faces detection, which extracts hundreds of faces from an image.

https://github.com/shunzhang876/AdaptiveFeatureLearning from https://github.com/shijieS/ComputerVisionSummarization further list of visual tracking papers and repos. See tiny faces github repo.

https://sites.google.com/site/shunzhang876/eccv16_facetracking/

Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multi-target tracking methods are not effective for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs). Unlike existing CNN-based approaches that are only trained on large-scale face image datasets offline, we further adapt the pre-trained face CNN to specific videos using automatically discovered training samples from tracklets. Our network directly optimizes the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function. With the learned discriminative features, we apply the Hungarian algorithm to link tracklets within each shot and the hierarchical clustering algorithm to link tracklets across multiple shots to form final trajectories. We extensively evaluate the proposed algorithm on a set of TV sitcoms and music videos and demonstrate significant performance improvement over existing techniques.
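To make the embedding objective concrete, here is a minimal sketch of the standard triplet loss on squared Euclidean distances. It is not the improved variant from the paper, and the margin of 0.2 and the toy 2D embeddings are assumptions chosen for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss (assumed form, not the paper's improved one).

    Zero once the negative is farther from the anchor than the positive
    by at least `margin` in squared distance.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

anchor = [0.0, 0.0]        # face crop from one tracklet
positive = [0.1, 0.0]      # another crop of the same person
easy_negative = [1.0, 0.0]  # different person, already far away
hard_negative = [0.2, 0.0]  # different person, embedded too close

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
print(easy, hard)
```

Only the hard triplet produces a non-zero loss, so only it would push the embedding to separate the two identities.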

Unsupervised tracking

 * https://github.com/594422814/UDT_pytorch pytorch version
 * https://github.com/594422814/UDT We propose an unsupervised visual tracking method in this paper. Different from existing approaches using extensive annotated data for supervised learning, our model is trained on large-scale unlabeled videos in an unsupervised manner. Our motivation is that a robust tracker should be effective in both the forward and backward ways, i.e., the tracker can forward localize the target object in successive frames and backtrace to its initial position in the first frame. We build our method on a Siamese correlation filter network, which is trained using raw videos without labels. Meanwhile, we propose a multiple-frame validation and a cost-sensitive loss to further facilitate the unsupervised learning. Without bells and whistles, our unsupervised tracker achieves the baseline accuracy of fully-supervised trackers, which require complete and accurate labels for training. Furthermore, unsupervised framework exhibits potential in leveraging unlabeled or weakly labeled data to further improve the tracking accuracy. Linked from https://github.com/foolwood/benchmark_results visual tracking paper list.
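The forward and backward idea can be sketched with a toy tracker on 1D "frames". Everything here is an assumption for illustration: the brightest-pixel tracker, the search radius and the frames are hypothetical stand-ins, whereas UDT uses a Siamese correlation filter network and turns the consistency error into a training loss.

```python
def track(frame, prev_pos, radius=2):
    """Toy tracker: return the brightest index within `radius` of prev_pos."""
    lo = max(0, prev_pos - radius)
    hi = min(len(frame), prev_pos + radius + 1)
    return max(range(lo, hi), key=lambda i: frame[i])

def forward_backward_error(frames, start_pos):
    """Track forward through all frames, then backward to the first frame.

    Returns the absolute offset between the starting position and the
    back-traced position; 0 means the two passes are consistent.
    """
    pos = start_pos
    for frame in frames[1:]:            # forward pass
        pos = track(frame, pos)
    for frame in reversed(frames[:-1]):  # backward pass
        pos = track(frame, pos)
    return abs(pos - start_pos)

# A single bright target moving one pixel to the right per frame.
frames = [
    [0, 9, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
    [0, 0, 0, 9, 0, 0],
]
error = forward_backward_error(frames, start_pos=1)
print(error)
```

No labels are needed to compute this error, which is why it can serve as a supervisory signal on raw video.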

zhengthomastang
https://github.com/zhengthomastang/2018AICity_TeamUW tracking cars and their speeds at https://www.youtube.com/watch?v=_i4numqiv7Y Single-camera and Inter-camera Vehicle Tracking and 3D Speed Estimation Based on Fusion of Visual and Semantic Features (Winner of Track 1 and Track 3 at the AI City Challenge Workshop in CVPR 2018). https://www.aicitychallenge.org/.

Uses https://github.com/AlexXiao95/Multi-camera-Vehicle-Tracking-and-Reidentification We achieved Multi-Camera Vehicle Tracking and Re-identification based on a fusion of histogram-based adaptive appearance models, DCNN features, detected license plates, detected car types and traveling time information.
 * https://github.com/zhengthomastang/MOT_Kalman Kalman tracking

Pytracking
https://github.com/visionml/pytracking Visual tracking library based on PyTorch.

Staple
Correlation Filter-based trackers have recently achieved excellent performance, showing great robustness to challenging situations exhibiting motion blur and illumination changes. However, since the model that they learn depends strongly on the spatial layout of the tracked object, they are notoriously sensitive to deformation. Models based on colour statistics have complementary traits: they cope well with variation in shape, but suffer when illumination is not consistent throughout a sequence. Moreover, colour distributions alone can be insufficiently discriminative. In this paper, we show that a simple tracker combining complementary cues in a ridge regression framework can operate faster than 80 FPS and outperform not only all entries in the popular VOT14 competition, but also recent and far more sophisticated trackers according to multiple benchmarks.
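As a rough illustration of the two ingredients named in the abstract, the sketch below shows closed-form ridge regression for a single feature and a linear merge of two per-position score lists. The function names, the scores and the colour weight of 0.3 are assumptions for illustration; the actual tracker learns its template model in the Fourier domain over image patches.

```python
def ridge_weight(xs, ys, lam=1.0):
    """Closed-form ridge regression for one feature without intercept:
    w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def merge_responses(template_scores, colour_scores, colour_weight=0.3):
    """Linearly combine two per-position score lists of equal length."""
    return [
        (1.0 - colour_weight) * t + colour_weight * c
        for t, c in zip(template_scores, colour_scores)
    ]

# Regularisation shrinks the weight towards zero.
w_exact = ridge_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], lam=0.0)
w_shrunk = ridge_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], lam=14.0)

# The template cue prefers position 1, the colour cue position 2.
template = [0.1, 0.8, 0.3]
colour = [0.2, 0.1, 0.9]
merged = merge_responses(template, colour)
best = max(range(len(merged)), key=merged.__getitem__)
print(w_exact, w_shrunk, best)
```

With this weighting the template cue dominates and position 1 is chosen; a larger colour weight would shift the decision towards the colour cue.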

Staple: "Complementary Learners for Real-Time Tracking" from http://www.robots.ox.ac.uk/~luca/staple.html and MATLAB version https://github.com/bertinetto/staple

C++ version: https://github.com/xuduo35/STAPLE/blob/master/README.md uses https://github.com/foolwood/DAT

foolwood
https://github.com/foolwood/deepmask-pytorch

Kalman
https://github.com/abewley/sort SORT tracks any object given detections, for example from YOLO as implemented in the Fotache repo (PyTorch).
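SORT associates Kalman-predicted track boxes with new detections by bounding-box overlap. The sketch below shows only that association step, under stated assumptions: boxes are (x1, y1, x2, y2) tuples, the 0.3 IoU threshold is illustrative, and a brute-force search over permutations stands in for the Hungarian algorithm, which is only practical for a handful of boxes.

```python
from itertools import permutations

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assign(tracks, detections, min_iou=0.3):
    """Return (track_idx, det_idx) pairs maximising the total IoU.

    Hypothetical helper; assumes len(detections) >= len(tracks).
    Pairs whose IoU falls below `min_iou` are dropped afterwards.
    """
    best_total, best_perm = -1.0, ()
    for perm in permutations(range(len(detections)), len(tracks)):
        total = sum(iou(tracks[t], detections[d]) for t, d in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return [(t, d) for t, d in enumerate(best_perm)
            if iou(tracks[t], detections[d]) >= min_iou]

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]  # predicted track boxes
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
matches = assign(tracks, detections)
print(matches)
```

The third detection overlaps no track and is left unmatched, which is where a new track would normally be created.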

Pose tracking
https://github.com/NVlabs/Deep_Object_Pose This is the official DOPE ROS package for detection and 6-DoF pose estimation of known objects from an RGB camera. The network has been trained on the following YCB objects: cracker box, sugar box, tomato soup can, mustard bottle, potted meat can, and gelatin box

Idea: combine the 6D object pose estimation code with Chris Annin's six-axis robot to automate greenhouse pepper growing.

https://github.com/j96w/6-PACK from "6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints" from paperswithcode.com. https://www.youtube.com/watch?v=8Xb6dazqj10

https://github.com/hughw19/NOCS_CVPR2019 Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation at https://arxiv.org/pdf/1901.02970.pdf The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. Contrary to “instance-level” 6D pose estimation tasks, our problem assumes that no exact object CAD models are available during either training or testing time.

Time cycle
https://github.com/xiaolonw/TimeCycle We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation to be useful for performing cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation -- without finetuning -- across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
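The test-time step described above, finding nearest neighbours in feature space, can be sketched as simple label propagation between two frames. The feature vectors, labels and function names are hypothetical; in TimeCycle the features come from the learned representation and the matching is done over dense feature maps.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def propagate_labels(src_feats, src_labels, dst_feats):
    """Give each target feature the label of its nearest source feature."""
    propagated = []
    for feat in dst_feats:
        nearest = min(
            range(len(src_feats)),
            key=lambda k: squared_distance(feat, src_feats[k]),
        )
        propagated.append(src_labels[nearest])
    return propagated

# Frame A has labelled locations; frame B has features only.
frame_a_feats = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
frame_a_labels = ["background", "object", "object"]
frame_b_feats = [[0.8, 0.2], [0.1, 0.9]]

frame_b_labels = propagate_labels(frame_a_feats, frame_a_labels, frame_b_feats)
print(frame_b_labels)
```

Repeating this from frame to frame carries a segmentation mask or a set of keypoints through a video without any finetuning.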

filter flow
https://github.com/aimerykong/predictive-filter-flow learning with Predictive Filter Flow (PFF) for various vision tasks. PFF is a framework not only supporting self/fully/un-supervised learning on images and videos, but also providing better interpretability that one is able to track every single pixel's movement and its kernels in constructing the output.

Siamese tracking
https://github.com/AlexeyAB/DaSiamRPN Distractor-aware Siamese Networks for Visual Object Tracking (DaSiamRPN); won the VOT-18 real-time challenge.

Goturn tracking
https://github.com/nrupatunga/PY-GOTURN/ Python version. Try Ubuntu 16 or 14 if 18 doesn't work.

https://github.com/davheld/GOTURN C++ version.

https://www.learnopencv.com/goturn-deep-learning-based-object-tracking/

https://davheld.github.io/GOTURN/GOTURN.pdf

https://davheld.github.io/

deep lk object tracking

General repos
https://github.com/ido90/AyalonRoad Since several out-of-the-box detectors failed to detect the small, crowded cars in the videos, I manually tagged the vehicles within 15 frames and trained a dedicated CNN (in the general spirit of Faster R-CNN) consisting of pre-trained ResNet34 layers (chosen in accordance with the desired feature-map cell size and receptive field), a location-based network (to incorporate road-map information), and a detection & location head.

Since the low frame rate could not guarantee intersection between the bounding boxes of the same vehicle in adjacent frames, I replaced the assignment mechanism of the SORT tracker with a location-based probabilistic model implemented through a Kalman filter. The resulting traffic data were converted from pixel units to meters and organized in both spatial and vehicle-oriented structures. Several research questions were addressed, e.g. the relations between speed, density and flux (the fundamental traffic diagram), daily and temporal patterns, and the effects of lane transitions.
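A location-based assignment of this kind can be sketched as follows. This is not the AyalonRoad code: it assumes a 1D position along the road, uses only a constant-velocity prediction with a fixed standard deviation in place of a full Kalman filter covariance, and converts units with a single hypothetical scale factor.

```python
import math

def predict(position, velocity, dt):
    """Constant-velocity prediction of the next position."""
    return position + velocity * dt

def log_likelihood(detection, predicted, sigma):
    """Log of a 1D Gaussian density centred on the predicted position."""
    z = (detection - predicted) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def best_detection(position, velocity, dt, detections,
                   sigma=5.0, max_sigmas=3.0):
    """Index of the most likely detection, or None if all are gated out.

    `sigma` and `max_sigmas` are illustrative values, not tuned ones.
    """
    predicted = predict(position, velocity, dt)
    candidates = [
        (log_likelihood(d, predicted, sigma), i)
        for i, d in enumerate(detections)
        if abs(d - predicted) <= max_sigmas * sigma
    ]
    return max(candidates)[1] if candidates else None

def pixels_to_meters(pixels, meters_per_pixel):
    """Convert with one scale factor (assumes no perspective distortion)."""
    return pixels * meters_per_pixel

# Vehicle at pixel 100 moving 40 px/s, next frame 0.5 s later: expected near 120.
idx = best_detection(position=100.0, velocity=40.0, dt=0.5,
                     detections=[101.0, 118.0, 160.0])
speed_mps = pixels_to_meters(40.0, meters_per_pixel=0.25)
print(idx, speed_mps)
```

The detection at pixel 101 is closest to the old position but is rejected, because at a low frame rate the vehicle is expected to have moved well away from where it was last seen.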

links
Yolo_training

Mdnet tracking

Facial Recognition

Neural papers with code

Person reidentification

intel openvino