Semantic segmentation

video panoptic segmentation
https://github.com/mcahny/vps Video panoptic segmentation; also covers video semantic segmentation and video instance segmentation.

curated list
https://github.com/mrgloom/awesome-semantic-segmentation A list of GitHub repos implementing academic papers.

Solo

Instance segmentation
https://github.com/dbolya/yolact We present a simple, fully-convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33 fps evaluated on a single Titan Xp, which is significantly faster than any previous competitive approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients. Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
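The YOLACT mask-assembly step (linearly combining prototypes with per-instance coefficients) can be sketched in NumPy. Shapes and names here are illustrative, not the repo's actual API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assemble_masks(prototypes, coefficients):
    """Combine k prototype masks with per-instance coefficients.

    prototypes:   (k, H, W) prototype masks from the protonet branch.
    coefficients: (n, k) per-instance mask coefficients from the head.
    Returns (n, H, W) soft instance masks in [0, 1].
    """
    k, H, W = prototypes.shape
    # Linear combination of the prototypes, followed by a sigmoid,
    # as described in the YOLACT abstract above.
    linear = coefficients @ prototypes.reshape(k, H * W)  # (n, H*W)
    return sigmoid(linear).reshape(-1, H, W)

# Toy example: 4 prototypes, 2 instances, on an 8x8 grid.
protos = np.random.randn(4, 8, 8)
coeffs = np.random.randn(2, 4)
masks = assemble_masks(protos, coeffs)
```

Because the expensive per-pixel work is shared across instances (the prototypes), the per-instance cost is just a small matrix product, which is what makes the approach real-time.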

nanonets.com
https://medium.com/nanonets/how-to-do-image-segmentation-using-deep-learning-c673cc5862ef solves a Udacity course problem.

https://nanonets.com/ Teach your drone to count the number of solar panels.

https://github.com/CSAILVision/semantic-segmentation-pytorch PyTorch implementation for the MIT Scene Parsing benchmark (http://sceneparsing.csail.mit.edu/).

https://github.com/NVlabs/SPADE Semantic Image Synthesis with Spatially-Adaptive Normalization. We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the deep network, which is then processed through stacks of convolution, normalization, and nonlinearity layers. We show that this is suboptimal as the normalization layers tend to "wash away" semantic information. To address the issue, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned transformation. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows user control over both semantic content and style when synthesizing images. by https://github.com/junyanz
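The core SPADE idea (normalize, then re-inject the layout via spatially-varying scale and shift) can be sketched in NumPy. This is a simplification: the real SPADE layer predicts gamma and beta with a small conv net on the layout, whereas here a per-class linear map stands in for it, and all names are illustrative:

```python
import numpy as np

def spade(x, segmap, gamma_per_class, beta_per_class, eps=1e-5):
    """SPADE-style spatially-adaptive normalization (sketch).

    x:               (C, H, W) activations.
    segmap:          (L, H, W) one-hot semantic layout at x's resolution.
    gamma_per_class: (C, L) per-class scale weights (stand-in for the
                     small conv net SPADE actually learns on the layout).
    beta_per_class:  (C, L) per-class shift weights (same stand-in).
    """
    # Parameter-free normalization over spatial positions, per channel.
    # This is the step that would otherwise "wash away" the layout.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_hat = (x - mean) / (std + eps)
    # Spatially-varying scale and shift derived from the layout, so the
    # semantic information is re-injected after normalization.
    gamma = np.einsum('cl,lhw->chw', gamma_per_class, segmap)
    beta = np.einsum('cl,lhw->chw', beta_per_class, segmap)
    return x_hat * (1.0 + gamma) + beta

# Toy example: 3 channels, 2 semantic classes on a 4x4 grid.
x = np.random.randn(3, 4, 4)
segmap = np.zeros((2, 4, 4))
segmap[0, :, :2] = 1.0  # left half is class 0
segmap[1, :, 2:] = 1.0  # right half is class 1
y = spade(x, segmap, np.zeros((3, 2)), np.zeros((3, 2)))
```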

deeplab
https://github.com/sthalles/deeplab_v3

https://medium.com/free-code-camp/diving-into-deep-convolutional-semantic-segmentation-networks-and-deeplab-v3-4f094fa387df

parts
https://github.com/NVlabs/SCOPS

https://varunjampani.github.io/scops/ Parts provide an intermediate representation of objects that is robust to camera, pose, and appearance variations. Existing work on part segmentation is dominated by supervised approaches that rely on large amounts of manual annotations and cannot generalize to unseen object categories. We propose a self-supervised deep learning approach for part segmentation, where we devise several loss functions that aid in predicting part segments that are geometrically concentrated, robust to object variations, and semantically consistent across different object instances. Extensive experiments on different types of image collections demonstrate that our approach can produce part segments that adhere to object boundaries and are also more semantically consistent across object instances than existing self-supervised techniques.
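One of the loss types mentioned above, geometric concentration, can be sketched in NumPy: each soft part map is pushed to be spatially compact by penalizing the response-weighted distance to the part's centroid. This is a schematic version under my own assumptions, not SCOPS's exact formulation:

```python
import numpy as np

def concentration_loss(part_maps):
    """Sketch of a geometric-concentration loss for part segmentation.

    part_maps: (K, H, W) soft part-assignment maps, each non-negative.
    For each part, compute its response-weighted centroid, then penalize
    the response-weighted squared distance to that centroid, so each
    predicted part stays spatially concentrated.
    """
    K, H, W = part_maps.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    loss = 0.0
    for k in range(K):
        p = part_maps[k]
        mass = p.sum() + 1e-8
        cy = (p * ys).sum() / mass  # centroid row
        cx = (p * xs).sum() / mass  # centroid column
        loss += (p * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum() / mass
    return loss / K
```

A map whose response is all at one pixel incurs (near) zero loss, while a spread-out map is penalized, which is the behavior the paper wants from "geometrically concentrated" parts.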

https://varunjampani.github.io/codes Semantic segmentation repos released by NVIDIA labs.

pythia
https://github.com/facebookresearch/pythia Pythia is a modular framework for supercharging vision and language research built on top of PyTorch.

pac
For joint image filtering and super-resolution upsampling. https://github.com/Yijunmaverick/DeepJointFilter

https://suhangpro.github.io/pac/ Convolutions are the fundamental building block of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it also is a major limitation, as it makes convolutions content agnostic.

We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially-varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling. PAC also offers an effective alternative to fully-connected CRF (Full-CRF), called PAC-CRF, which performs competitively, while being considerably faster. In addition, we also demonstrate that PAC can be used as a drop-in replacement for convolution layers in pre-trained networks, resulting in consistent performance improvements.
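The PAC idea (spatially shared filter weights modulated by a kernel on per-pixel features) can be sketched in plain NumPy for a single channel. The fixed Gaussian kernel and all names here are my own illustrative choices; the paper learns the feature space end-to-end:

```python
import numpy as np

def feature_kernel(fi, fj):
    # A fixed Gaussian on pixel features; PAC learns the features
    # (and hence this modulation) rather than fixing them like this.
    d = fi - fj
    return np.exp(-0.5 * float(d @ d))

def pac_conv(x, feats, w, k=3):
    """Pixel-adaptive convolution (PAC) sketch, single channel.

    x:     (H, W) input.
    feats: (H, W, D) per-pixel guidance features.
    w:     (k, k) spatially-shared filter weights.
    Each spatial tap is multiplied by a kernel on the guidance
    features, making the content-agnostic convolution adaptive.
    """
    H, W = x.shape
    r = k // 2
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < H and 0 <= jj < W:
                        m = feature_kernel(feats[i, j], feats[ii, jj])
                        acc += w[di + r, dj + r] * m * x[ii, jj]
            out[i, j] = acc
    return out
```

When the guidance features are constant, the modulation is 1 everywhere and PAC reduces to a standard convolution, which is why it can serve as a drop-in replacement for convolution layers in pre-trained networks.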

maps
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix Generates street-map renderings from satellite images, among other image-to-image translation tasks.

We provide PyTorch implementations for both unpaired and paired image-to-image translation. The code was written by Jun-Yan Zhu and Taesung Park, and supported by Tongzhou Wang. This PyTorch implementation produces results comparable to or better than our original Torch software. If you would like to reproduce the same results as in the papers, check out the original CycleGAN Torch and pix2pix Torch code.

Note: The current software works well with PyTorch 0.41+. Check out the older branch that supports PyTorch 0.1-0.3. You may find useful information in training/test tips and frequently asked questions. To implement custom models and datasets, check out our templates. To help users better understand and adapt our codebase, we provide an overview of the code structure of this repository.

other
https://github.com/msracver/FCIS Fully Convolutional Instance-aware Semantic Segmentation.

deep feature
https://github.com/msracver/Deep-Feature-Flow Deep Feature Flow (https://arxiv.org/abs/1611.07715) was initially described in a CVPR 2017 paper. It provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos). It is worth noting that:

Deep Feature Flow significantly speeds up video recognition by applying the heavy-weight image recognition network (e.g., ResNet-101) on sparse key frames, and propagating the recognition outputs (feature maps) to the other frames by the light-weight flow network (e.g., FlowNet).

The entire system is trained end-to-end for the task of video recognition, which is vital for improving recognition accuracy. Directly adopting state-of-the-art flow estimation methods without end-to-end training would deliver noticeably worse results. Deep Feature Flow can easily make use of sparsely annotated video recognition datasets, where only a small portion of the frames are annotated with ground-truth labels.
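The cheap per-frame step in Deep Feature Flow, warping key-frame features to the current frame with optical flow, can be sketched in NumPy as bilinear sampling (names and border handling here are my own simplifications, not the repo's code):

```python
import numpy as np

def warp_features(feat, flow):
    """Propagate key-frame features to another frame via optical flow.

    feat: (C, H, W) feature map from the heavy network on a key frame.
    flow: (2, H, W) per-pixel flow from the current frame back to the
          key frame (dx, dy). Bilinear sampling, clamped at borders.
    This replaces running the heavy recognition network every frame.
    """
    C, H, W = feat.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    sx = np.clip(xs + flow[0], 0, W - 1)  # source x per pixel
    sy = np.clip(ys + flow[1], 0, H - 1)  # source y per pixel
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
    wx, wy = sx - x0, sy - y0
    # Bilinear blend of the four neighboring feature vectors.
    return ((1 - wy) * (1 - wx) * feat[:, y0, x0]
            + (1 - wy) * wx * feat[:, y0, x1]
            + wy * (1 - wx) * feat[:, y1, x0]
            + wy * wx * feat[:, y1, x1])
```

With zero flow the warp is the identity; in the real system a light-weight flow network (e.g., FlowNet) predicts the flow, so only sparse key frames ever pay for the heavy backbone.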

daijifeng
https://github.com/daijifeng001/MNC based on https://github.com/rbgirshick/py-faster-rcnn MNC is an instance-aware semantic segmentation system based on deep convolutional networks, which won first place in the COCO segmentation challenge 2015 and runs at a fraction of a second per image. We decompose the task of instance-aware semantic segmentation into related sub-tasks, which are solved by multi-task network cascades (MNC) with shared features. The entire MNC network is trained end-to-end with error gradients across cascaded stages.

links
Neural kpu

chat gpt for accounting

Image resolution