1,438 research outputs found
Label-Efficient Deep Learning in Medical Image Analysis: Challenges and Future Directions
Deep learning has seen rapid growth in recent years and achieved
state-of-the-art performance in a wide range of applications. However, training
models typically requires expensive and time-consuming collection of large
quantities of labeled data. This is particularly true within the scope of
medical imaging analysis (MIA), where data are limited and labels are expensive
to be acquired. Thus, label-efficient deep learning methods are developed to
make comprehensive use of the labeled data as well as the abundance of
unlabeled and weak-labeled data. In this survey, we extensively investigated
over 300 recent papers to provide a comprehensive overview of recent progress
on label-efficient learning strategies in MIA. We first present the background
of label-efficient learning and categorize the approaches into different
schemes. Next, we examine the current state-of-the-art methods in detail
through each scheme. Specifically, we provide an in-depth investigation,
covering not only canonical semi-supervised, self-supervised, and
multi-instance learning schemes, but also recently emerged active and
annotation-efficient learning strategies. Moreover, as a comprehensive
contribution to the field, this survey not only elucidates the commonalities
and unique features of the surveyed methods but also presents a detailed
analysis of the current challenges in the field and suggests potential avenues
for future research.Comment: Update Few-shot Method
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
State-of-the-art deep learning models are often trained with a large amountof costly labeled training data. However, requiring exhaustive manualannotations may degrade the model's generalizability in the limited-labelregime. Semi-supervised learning and unsupervised learning offer promisingparadigms to learn from an abundance of unlabeled visual data. Recent progressin these paradigms has indicated the strong benefits of leveraging unlabeleddata to improve model generalization and provide better model initialization.In this survey, we review the recent advanced deep learning algorithms onsemi-supervised learning (SSL) and unsupervised learning (UL) for visualrecognition from a unified perspective. To offer a holistic understanding ofthe state-of-the-art in these areas, we propose a unified taxonomy. Wecategorize existing representative SSL and UL with comprehensive and insightfulanalysis to highlight their design rationales in different learning scenariosand applications in different computer vision tasks. Lastly, we discuss theemerging trends and open challenges in SSL and UL to shed light on futurecritical research directions.<br
Semi-Weakly Supervised Learning for Label-efficient Semantic Segmentation in Expert-driven Domains
Unter Zuhilfenahme von Deep Learning haben semantische Segmentierungssysteme beeindruckende Ergebnisse erzielt, allerdings auf der Grundlage von überwachtem Lernen, das durch die Verfügbarkeit kostspieliger, pixelweise annotierter Bilder limitiert ist.
Bei der Untersuchung der Performance dieser Segmentierungssysteme in Kontexten, in denen kaum Annotationen vorhanden sind, bleiben sie hinter den hohen Erwartungen, die durch die Performance in annotationsreichen Szenarien geschürt werden, zurück.
Dieses Dilemma wiegt besonders schwer, wenn die Annotationen von lange geschultem Personal, z.B. Medizinern, Prozessexperten oder Wissenschaftlern, erstellt werden müssen.
Um gut funktionierende Segmentierungsmodelle in diese annotationsarmen, Experten-angetriebenen Domänen zu bringen, sind neue Lösungen nötig.
Zu diesem Zweck untersuchen wir zunächst, wie schlecht aktuelle Segmentierungsmodelle mit extrem annotationsarmen Szenarien in Experten-angetriebenen Bildgebungsdomänen zurechtkommen.
Daran schließt sich direkt die Frage an, ob die kostspielige pixelweise Annotation, mit der Segmentierungsmodelle in der Regel trainiert werden, gänzlich umgangen werden kann, oder ob sie umgekehrt ein Kosten-effektiver Anstoß sein kann, um die Segmentierung in Gang zu bringen, wenn sie sparsam eingestetzt wird.
Danach gehen wir auf die Frage ein, ob verschiedene Arten von Annotationen, schwache- und pixelweise Annotationen mit unterschiedlich hohen Kosten, gemeinsam genutzt werden können, um den Annotationsprozess flexibler zu gestalten.
Experten-angetriebene Domänen haben oft nicht nur einen Annotationsmangel, sondern auch völlig andere Bildeigenschaften, beispielsweise volumetrische Bild-Daten.
Der Übergang von der 2D- zur 3D-semantischen Segmentierung führt zu voxelweisen Annotationsprozessen, was den nötigen Zeitaufwand für die Annotierung mit der zusätzlichen Dimension multipliziert.
Um zu einer handlicheren Annotation zu gelangen, untersuchen wir Trainingsstrategien für Segmentierungsmodelle, die nur preiswertere, partielle Annotationen oder rohe, nicht annotierte Volumina benötigen.
Dieser Wechsel in der Art der Überwachung im Training macht die Anwendung der Volumensegmentierung in Experten-angetriebenen Domänen realistischer, da die Annotationskosten drastisch gesenkt werden und die Annotatoren von Volumina-Annotationen befreit werden, welche naturgemäß auch eine Menge visuell redundanter Regionen enthalten würden.
Schließlich stellen wir die Frage, ob es möglich ist, die Annotations-Experten von der strikten Anforderung zu befreien, einen einzigen, spezifischen Annotationstyp liefern zu müssen, und eine Trainingsstrategie zu entwickeln, die mit einer breiten Vielfalt semantischer Information funktioniert.
Eine solche Methode wurde hierzu entwickelt und in unserer umfangreichen experimentellen Evaluierung kommen interessante Eigenschaften verschiedener Annotationstypen-Mixe in Bezug auf deren Segmentierungsperformance ans Licht.
Unsere Untersuchungen führten zu neuen Forschungsrichtungen in der semi-weakly überwachten Segmentierung, zu neuartigen, annotationseffizienteren Methoden und Trainingsstrategien sowie zu experimentellen Erkenntnissen, zur Verbesserung von Annotationsprozessen, indem diese annotationseffizient, expertenzentriert und flexibel gestaltet werden
On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey
Stereo matching is one of the longest-standing problems in computer vision
with close to 40 years of studies and research. Throughout the years the
paradigm has shifted from local, pixel-level decision to various forms of
discrete and continuous optimization to data-driven, learning-based methods.
Recently, the rise of machine learning and the rapid proliferation of deep
learning enhanced stereo matching with new exciting trends and applications
unthinkable until a few years ago. Interestingly, the relationship between
these two worlds is two-way. While machine, and especially deep, learning
advanced the state-of-the-art in stereo matching, stereo itself enabled new
ground-breaking methodologies such as self-supervised monocular depth
estimation based on deep networks. In this paper, we review recent research in
the field of learning-based depth estimation from single and binocular images
highlighting the synergies, the successes achieved so far and the open
challenges the community is going to face in the immediate future.Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial:
"Learning-based depth estimation from stereo and monocular images: successes,
limitations and future challenges"
(https://sites.google.com/view/cvpr-2019-depth-from-image/home
Segment Anything Meets Point Tracking
The Segment Anything Model (SAM) has established itself as a powerful
zero-shot image segmentation model, enabled by efficient point-centric
annotation and prompt-based models. While click and brush interactions are both
well explored in interactive image segmentation, the existing methods on videos
focus on mask annotation and propagation. This paper presents SAM-PT, a novel
method for point-centric interactive video segmentation, empowered by SAM and
long-term point tracking. SAM-PT leverages robust and sparse point selection
and propagation techniques for mask generation. Compared to traditional
object-centric mask propagation strategies, we uniquely use point propagation
to exploit local structure information agnostic to object semantics. We
highlight the merits of point-based tracking through direct evaluation on the
zero-shot open-world Unidentified Video Objects (UVO) benchmark. Our
experiments on popular video object segmentation and multi-object segmentation
tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a
point-based segmentation tracker yields better zero-shot performance and
efficient interactions. We release our code that integrates different point
trackers and video segmentation benchmarks at https://github.com/SysCV/sam-pt
Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud
Existing methods for large-scale point cloud semantic segmentation require
expensive, tedious and error-prone manual point-wise annotations. Intuitively,
weakly supervised training is a direct solution to reduce the cost of labeling.
However, for weakly supervised large-scale point cloud semantic segmentation,
too few annotations will inevitably lead to ineffective learning of network. We
propose an effective weakly supervised method containing two components to
solve the above problem. Firstly, we construct a pretext task, \textit{i.e.,}
point cloud colorization, with a self-supervised learning to transfer the
learned prior knowledge from a large amount of unlabeled point cloud to a
weakly supervised network. In this way, the representation capability of the
weakly supervised network can be improved by the guidance from a heterogeneous
task. Besides, to generate pseudo label for unlabeled data, a sparse label
propagation mechanism is proposed with the help of generated class prototypes,
which is used to measure the classification confidence of unlabeled point. Our
method is evaluated on large-scale point cloud datasets with different
scenarios including indoor and outdoor. The experimental results show the large
gain against existing weakly supervised and comparable results to fully
supervised methods\footnote{Code based on mindspore:
https://github.com/dmcv-ecnu/MindSpore\_ModelZoo/tree/main/WS3\_MindSpore}
- …