Object tracking

Deep SORT
Deep SORT people tracking, based on YOLO detections.
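Deep SORT extends SORT's motion model with an appearance embedding per detection. A minimal sketch of the appearance-association step, assuming unit-normalised embeddings; the `associate` helper and its threshold are illustrative, not Deep SORT's actual API:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_embs, det_embs, max_cosine_dist=0.4):
    """Match tracks to detections by appearance.

    track_embs: (T, D) L2-normalised appearance vectors, one per track.
    det_embs:   (N, D) vectors for the current frame's detections.
    Returns a list of (track_idx, det_idx) matches.
    """
    # Cosine distance = 1 - cosine similarity (embeddings are unit-norm).
    cost = 1.0 - track_embs @ det_embs.T
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if cost[r, c] <= max_cosine_dist]

# Toy check: each detection matches the track pointing the same way.
tracks = np.array([[1.0, 0.0], [0.0, 1.0]])
dets = np.array([[0.0, 1.0], [1.0, 0.0]])
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

The threshold matters: pairs whose best assignment is still too dissimilar are left unmatched, so a new track can be spawned for them.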

DAN
https://github.com/shijieS/SST (Deep Affinity Network), single-shot tracking based on SSD. Evaluated on https://motchallenge.net/data/MOT17/ and https://detrac-db.rit.albany.edu/

Papers
https://github.com/foolwood/benchmark_results visual tracking paper list covering supervised and unsupervised tracking.
 * https://github.com/researchmm/SiamDW winner of the VOT-19 RGB-D challenge
 * https://github.com/iiau-tracker/SPLT
 * https://github.com/vision4robotics/ARCF-tracker Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking (ICCV 2019)
 * https://github.com/ZikunZhou/TADT-python Existing deep trackers mainly use convolutional neural networks pre-trained for generic object recognition as feature extractors. Despite demonstrated success on numerous vision tasks, pre-trained deep features contribute less to visual tracking than to object recognition; the key issue is that in tracking the target of interest can be an arbitrary object class with arbitrary form. See https://xinli-zn.github.io/TADT-project-page/
 * https://github.com/XU-TIANYANG/GFS-DCF MATLAB implementation

Face tracking
https://github.com/shunzhang876/AdaptiveFeatureLearning, found via https://github.com/shijieS/ComputerVisionSummarization, a further list of visual tracking papers and repos. See also the Tiny Faces GitHub repo.

https://sites.google.com/site/shunzhang876/eccv16_facetracking/

Multi-face tracking: multi-face tracking in unconstrained videos is a challenging problem, as faces of one person often appear drastically different across shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multi-target tracking methods are not effective for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs). Unlike existing CNN-based approaches that are only trained on large-scale face image datasets offline, we further adapt the pre-trained face CNN to specific videos using automatically discovered training samples from tracklets. Our network directly optimizes the embedding space so that Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function. With the learned discriminative features, we apply the Hungarian algorithm to link tracklets within each shot and hierarchical clustering to link tracklets across shots into final trajectories. We extensively evaluate the proposed algorithm on a set of TV sitcoms and music videos and demonstrate significant performance improvement over existing techniques.
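The two key ingredients in the abstract above can be sketched in a few lines: a hinge triplet loss on Euclidean distances, and Hungarian matching to link tracklets by embedding distance. Function names and the toy vectors are illustrative, not from the paper's code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on Euclidean distances: pull the anchor closer to
    the positive (same identity) than to the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def link_tracklets(embs_a, embs_b):
    """Hungarian matching between two sets of tracklet embeddings, with
    pairwise Euclidean distance as the cost."""
    cost = np.linalg.norm(embs_a[:, None, :] - embs_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

# Same-identity pair already closer than margin: zero loss.
print(triplet_loss(np.zeros(2), np.array([0.1, 0.0]), np.array([1.0, 0.0])))  # 0.0
# Tracklet 0 in shot A links to tracklet 1 in shot B, and vice versa.
print(link_tracklets(np.array([[0.0, 0.0], [1.0, 1.0]]),
                     np.array([[1.1, 1.0], [0.1, 0.0]])))  # [(0, 1), (1, 0)]
```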

Unsupervised tracking

 * https://github.com/594422814/UDT_pytorch PyTorch version
 * https://github.com/594422814/UDT We propose an unsupervised visual tracking method. Unlike existing approaches that use extensive annotated data for supervised learning, our model is trained on large-scale unlabeled videos in an unsupervised manner. Our motivation is that a robust tracker should be effective both forward and backward: it can localize the target object in successive frames and backtrace to its initial position in the first frame. We build our method on a Siamese correlation filter network trained on raw videos without labels, and propose multiple-frame validation and a cost-sensitive loss to further facilitate the unsupervised learning. Without bells and whistles, our unsupervised tracker achieves the baseline accuracy of fully supervised trackers, which require complete and accurate labels for training. Furthermore, the unsupervised framework shows potential for leveraging unlabeled or weakly labeled data to further improve tracking accuracy. Linked from the https://github.com/foolwood/benchmark_results visual tracking paper list.
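The forward-backward idea can be sketched as follows, with a hypothetical single-step tracker `track_step`; a robust tracker should back-trace to roughly its starting point, and the residual distance is what the unsupervised loss penalises:

```python
import numpy as np

def forward_backward_error(track_step, frames, init_pos):
    """Track forward through `frames`, then backward, and return how far the
    back-traced point lands from where it started. `track_step(a, b, p)` is
    any single-step tracker that locates point p of frame a in frame b."""
    pos = init_pos
    for a, b in zip(frames[:-1], frames[1:]):   # forward pass
        pos = track_step(a, b, pos)
    rev = frames[::-1]
    for a, b in zip(rev[:-1], rev[1:]):         # backward pass
        pos = track_step(a, b, pos)
    return float(np.linalg.norm(np.asarray(pos) - np.asarray(init_pos)))

# Toy "frames" are constant offsets and the toy tracker just follows the
# offset, so backtracking returns exactly to the start: error 0.
frames = [np.array([0.0]), np.array([2.0]), np.array([5.0])]
step = lambda a, b, p: p + (b - a)
print(forward_backward_error(step, frames, np.array([1.0])))  # 0.0
```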

zhengthomastang
https://github.com/zhengthomastang/2018AICity_TeamUW tracks cars and their speeds (demo: https://www.youtube.com/watch?v=_i4numqiv7Y). Single-camera and Inter-camera Vehicle Tracking and 3D Speed Estimation Based on Fusion of Visual and Semantic Features (winner of Track 1 and Track 3 at the AI City Challenge Workshop, CVPR 2018). https://www.aicitychallenge.org/

Uses https://github.com/AlexXiao95/Multi-camera-Vehicle-Tracking-and-Reidentification multi-camera vehicle tracking and re-identification based on a fusion of histogram-based adaptive appearance models, DCNN features, detected license plates, detected car types, and travel-time information.
 * https://github.com/zhengthomastang/MOT_Kalman Kalman tracking
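A minimal constant-velocity Kalman filter of the kind used in MOT pipelines; this is a generic sketch, not the repo's code, and a real tracker runs a multi-dimensional state over the whole box:

```python
import numpy as np

class KalmanCV1D:
    """Constant-velocity Kalman filter for one coordinate of a box centre;
    state x = [position, velocity], one position measurement per frame."""
    def __init__(self, pos, q=1e-2, r=1.0):
        self.x = np.array([pos, 0.0])                # initial state
        self.P = np.eye(2)                           # state covariance
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])  # transition, dt = 1 frame
        self.H = np.array([[1.0, 0.0]])              # we observe position only
        self.Q = q * np.eye(2)                       # process noise
        self.R = np.array([[r]])                     # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]

    def update(self, z):
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P

kf = KalmanCV1D(0.0)
for z in [1.0, 2.0, 3.0]:      # object drifting one unit per frame
    kf.predict()
    kf.update(np.array([z]))
pred = kf.predict()            # next-frame position estimate, close to 4
```

The predict step is what gives the tracker a box to match against detections even before the new frame is seen.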

Pytracking
https://github.com/visionml/pytracking Visual tracking library based on PyTorch.

Kalman
https://github.com/abewley/sort SORT tracks any object from YOLO detections, as implemented in the Fotache repo (PyTorch).
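SORT's data-association step is Hungarian matching on an IoU cost between the Kalman-predicted track boxes and the new detections; a rough sketch (threshold and helper names illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(track_boxes, det_boxes, iou_threshold=0.3):
    """One association step: Hungarian matching on negative IoU between the
    predicted track boxes and the current frame's detections."""
    cost = np.array([[-iou(t, d) for d in det_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if -cost[r, c] >= iou_threshold]

tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]
dets = [[21, 21, 31, 31], [1, 1, 11, 11]]  # same objects, shifted and reordered
print(match(tracks, dets))  # [(0, 1), (1, 0)]
```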

Pose tracking
https://github.com/NVlabs/Deep_Object_Pose This is the official DOPE ROS package for detection and 6-DoF pose estimation of known objects from an RGB camera. The network has been trained on the following YCB objects: cracker box, sugar box, tomato soup can, mustard bottle, potted meat can, and gelatin box.

Idea: combine the 6-DoF object pose estimation code with Chris Annin's six-axis robot to automate greenhouse pepper growing.

https://github.com/j96w/6-PACK from "6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints" from paperswithcode.com. https://www.youtube.com/watch?v=8Xb6dazqj10

https://github.com/hughw19/NOCS_CVPR2019 Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation, at https://arxiv.org/pdf/1901.02970.pdf The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. Contrary to "instance-level" 6D pose estimation tasks, our problem assumes that no exact object CAD models are available during either training or testing time.

Time cycle
https://github.com/xiaolonw/TimeCycle We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as a free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation to be useful for performing cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation -- without finetuning -- across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
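At test time the learned features are used as plain nearest neighbours across frames. A toy sketch of propagating a segmentation mask from frame A to frame B this way (brute-force distances, illustrative only):

```python
import numpy as np

def propagate(feat_a, feat_b, mask_a):
    """Each location in frame B takes the label of its nearest-neighbour
    feature in frame A. feat_a, feat_b: (H, W, D); mask_a: (H, W) labels."""
    H, W, D = feat_b.shape
    fa = feat_a.reshape(-1, D)
    fb = feat_b.reshape(-1, D)
    # Brute-force pairwise squared distances, shape (H*W, H*W).
    d2 = ((fb[:, None, :] - fa[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)             # nearest A-location for each B-location
    return mask_a.reshape(-1)[nn].reshape(H, W)

feat_a = np.arange(4.0).reshape(2, 2, 1)        # toy 2x2 feature map, D = 1
mask_a = np.array([[0, 1], [2, 3]])
feat_b = np.ascontiguousarray(feat_a[:, ::-1])  # frame B = A flipped left-right
print(propagate(feat_a, feat_b, mask_a))        # mask flips too: [[1 0] [3 2]]
```

If the features are good, labels follow the object; no fine-tuning is involved, which is exactly the evaluation protocol the abstract describes.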

Filter flow
https://github.com/aimerykong/predictive-filter-flow learning with Predictive Filter Flow (PFF) for various vision tasks. PFF is a framework that not only supports self-, fully-, and un-supervised learning on images and videos, but also provides better interpretability: one can track every single pixel's movement, and the kernel that constructs each output pixel.
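The core operation, one predicted kernel per output pixel, can be sketched directly (a generic per-pixel filtering loop, not the repo's implementation):

```python
import numpy as np

def apply_filter_flow(image, kernels):
    """Every output pixel is produced by its own kernel applied to the local
    input patch, so each pixel's 'movement' can be read off its kernel.
    image: (H, W); kernels: (H, W, k, k) with k odd, one kernel per pixel."""
    H, W = image.shape
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k]   # neighbourhood of pixel (i, j)
            out[i, j] = (patch * kernels[i, j]).sum()
    return out

# Kernels that each put weight 1 on the left neighbour shift the image right.
image = np.array([[1.0, 2.0, 3.0]])
kernels = np.zeros((1, 3, 3, 3))
kernels[..., 1, 0] = 1.0
print(apply_filter_flow(image, kernels))  # [[1. 1. 2.]] (edge-padded shift)
```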

Siamese tracking
https://github.com/AlexeyAB/DaSiamRPN Siamese networks for visual object tracking; winner of the VOT 2018 real-time challenge.
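The heart of any Siamese tracker is cross-correlating the template's feature map against the search region's; a naive single-channel sketch (real trackers do this over deep multi-channel features, with an RPN head on top):

```python
import numpy as np

def cross_correlate(template, search):
    """Slide the template feature map over the search-region feature map and
    score every offset; the response peak is the predicted target location.
    template: (th, tw), search: (sh, sw), single channel for simplicity."""
    th, tw = template.shape
    sh, sw = search.shape
    resp = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = (search[i:i + th, j:j + tw] * template).sum()
    return resp

# The template reappears at offset (1, 2) in the search region.
search = np.zeros((5, 6))
template = np.array([[1.0, 2.0], [3.0, 4.0]])
search[1:3, 2:4] = template
resp = cross_correlate(template, search)
peak = tuple(int(v) for v in np.unravel_index(resp.argmax(), resp.shape))
print(peak)  # (1, 2)
```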

Goturn tracking
https://github.com/nrupatunga/PY-GOTURN/ Python version; try Ubuntu 16.04 or 14.04 if 18.04 doesn't work.

https://github.com/davheld/GOTURN C++ version.

https://www.learnopencv.com/goturn-deep-learning-based-object-tracking/

https://davheld.github.io/GOTURN/GOTURN.pdf

https://davheld.github.io/

Deep LK object tracking

General repos
https://github.com/ido90/AyalonRoad Since several out-of-the-box detectors failed to detect the small, crowded cars in the videos, I manually tagged the vehicles in 15 frames and trained a dedicatedly designed CNN (in the general spirit of Faster R-CNN) consisting of pre-trained ResNet-34 layers (chosen according to the desired feature-map cell size and receptive field), a location-based network (to incorporate road-map information), and a detection & location head.

Since the low frame rate could not guarantee intersection between the bounding boxes of the same vehicle in adjacent frames, I replaced the assignment mechanism of the SORT tracker with a location-based probabilistic model implemented through a Kalman filter. The resulting traffic data were transformed from pixel units to meters and organized in both spatial and vehicle-oriented structures. Several research questions were addressed, e.g. the relations between speed, density, and flux (the fundamental traffic diagram), daily and temporal patterns, and the effects of lane transitions.
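The location-based assignment described above can be sketched as Hungarian matching on squared Mahalanobis distance from each track's Kalman-predicted position, with a chi-square gate; all names and the toy numbers here are illustrative, not the repo's code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def probabilistic_match(pred_means, pred_covs, detections, gate=9.21):
    """Score each track/detection pair by the squared Mahalanobis distance of
    the detection from the track's Kalman-predicted position, then solve the
    assignment. gate=9.21 is the chi-square 99% bound for 2 degrees of
    freedom, so wildly improbable pairings are rejected even if 'least bad'."""
    cost = np.empty((len(pred_means), len(detections)))
    for t, (m, S) in enumerate(zip(pred_means, pred_covs)):
        S_inv = np.linalg.inv(S)
        for d, z in enumerate(detections):
            diff = z - m
            cost[t, d] = diff @ S_inv @ diff
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= gate]

# Detections need not overlap the previous boxes, only land inside the gate.
means = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
covs = [np.eye(2), np.eye(2)]
dets = [np.array([9.0, 10.0]), np.array([1.0, 0.0])]
print(probabilistic_match(means, covs, dets))  # [(0, 1), (1, 0)]
```

Unlike IoU matching, this still works when consecutive boxes of the same vehicle do not overlap at all, which is the low-frame-rate situation described above.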

Links
Yolo_training

Mdnet tracking

Facial Recognition

Neural papers with code