1,429 research outputs found
A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
In today's digital age, Convolutional Neural Networks (CNNs), a subset of
Deep Learning (DL), are widely used for various computer vision tasks such as
image classification, object detection, and image segmentation. There are
numerous types of CNNs designed to meet specific needs and requirements,
including 1D, 2D, and 3D CNNs, as well as dilated, grouped, attention,
depthwise convolutions, and NAS, among others. Each type of CNN has its unique
structure and characteristics, making it suitable for specific tasks. It's
crucial to gain a thorough understanding and perform a comparative analysis of
these different CNN types to understand their strengths and weaknesses.
Furthermore, studying the performance, limitations, and practical applications
of each type of CNN can aid in the development of new and improved
architectures in the future. We also dive into the platforms and frameworks
that researchers utilize for their research or development from various
perspectives. Additionally, we explore the main research fields of CNN like 6D
vision, generative models, and meta-learning. This survey paper provides a
comprehensive examination and comparison of various CNN architectures,
highlighting their architectural differences and emphasizing their respective
advantages, disadvantages, applications, challenges, and future trends
Learn to Generalize and Adapt across Domains in Semantic Segmentation
L'abstract è presente nell'allegato / the abstract is in the attachmen
AI-Generated Images as Data Source: The Dawn of Synthetic Era
The advancement of visual intelligence is intrinsically tethered to the
availability of large-scale data. In parallel, generative Artificial
Intelligence (AI) has unlocked the potential to create synthetic images that
closely resemble real-world photographs. This prompts a compelling inquiry: how
much visual intelligence could benefit from the advance of generative AI? This
paper explores the innovative concept of harnessing these AI-generated images
as new data sources, reshaping traditional modeling paradigms in visual
intelligence. In contrast to real data, AI-generated data exhibit remarkable
advantages, including unmatched abundance and scalability, the rapid generation
of vast datasets, and the effortless simulation of edge cases. Built on the
success of generative AI models, we examine the potential of their generated
data in a range of applications, from training machine learning models to
simulating scenarios for computational modeling, testing, and validation. We
probe the technological foundations that support this groundbreaking use of
generative AI, engaging in an in-depth discussion on the ethical, legal, and
practical considerations that accompany this transformative paradigm shift.
Through an exhaustive survey of current technologies and applications, this
paper presents a comprehensive view of the synthetic era in visual
intelligence. A project associated with this paper can be found at
https://github.com/mwxely/AIGS .Comment: 20 pages, 11 figure
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions about local
mechanisms have also been discussed that may benefit future works. To the best
our knowledge, this is the first survey about local mechanisms on computer
vision. We hope that this survey can shed light on future research in the
computer vision field
Learning from limited labeled data - Zero-Shot and Few-Shot Learning
Human beings have the remarkable ability to recognize novel visual concepts after observing only few or zero examples of them. Deep learning, however, often requires a large amount of labeled data to achieve a good performance. Labeled instances are expensive, difficult and even infeasible to obtain because the distribution of training instances among labels naturally exhibits a long tail. Therefore, it is of great interest to investigate how to learn efficiently from limited labeled data.
This thesis concerns an important subfield of learning from limited labeled data, namely, low-shot learning. The setting assumes the availability of many labeled examples from known classes and the goal is to learn novel classes from only a few~(few-shot learning) or zero~(zero-shot learning) training examples of them. To this end, we have developed a series of multi-modal learning approaches to facilitate the knowledge transfer from known classes to novel classes for a wide range of visual recognition tasks including image classification, semantic image segmentation and video action recognition. More specifically, this thesis mainly makes the following contributions. First, as there is no agreed upon zero-shot image classification benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets. Second, in order to tackle the labeled data scarcity, we propose feature generation frameworks that synthesize data in the visual feature space for novel classes. Third, we extend zero-shot learning and few-shot learning to the semantic segmentation task and propose a challenging benchmark for it. We show that incorporating semantic information into a semantic segmentation network is effective in segmenting novel classes. Finally, we develop better video representation for the few-shot video classification task and leverage weakly-labeled videos by an efficient retrieval method.Menschen haben die bemerkenswerte Fähigkeit, neuartige visuelle Konzepte zu erkennen, nachdem sie nur wenige oder gar keine Beispiele davon beobachtet haben. Tiefes Lernen erfordert jedoch oft eine große Menge an beschrifteten Daten, um eine gute Leistung zu erzielen. Etikettierte Instanzen sind teuer, schwierig und sogar undurchführbar, weil die Verteilung der Trainingsinstanzen auf die Etiketten naturgemäß einen langen Schwanz aufweist. Daher ist es von großem Interesse zu untersuchen, wie man effizient aus begrenzten gelabelten Daten lernen kann. Diese These betrifft einen wichtigen Teilbereich des Lernens aus begrenzt gelabelten Daten, nämlich das Low-Shot-Lernen. Das Setting setzt die Verfügbarkeit vieler gelabelter Beispiele aus bekannten Klassen voraus, und das Ziel ist es, neuartige Klassen aus nur wenigen (few-shot learning) oder null (zero-shot learning) Trainingsbeispielen davon zu lernen. Zu diesem Zweck haben wir eine Reihe von multimodalen Lernansätzen entwickelt, um den Wissenstransfer von bekannten Klassen zu neuartigen Klassen für ein breites Spektrum von visuellen Erkennungsaufgaben zu erleichtern, darunter Bildklassifizierung, semantische Bildsegmentierung und Videoaktionserkennung. Genauer gesagt, leistet diese Arbeit hauptsächlich die folgenden Beiträge. Da es keinen vereinbarten Benchmark für die Zero-Shot- Bildklassifikation gibt, definieren wir zunächst einen neuen Benchmark, indem wir sowohl die Evaluierungsprotokolle als auch die Datensplits öffentlich zugänglicher Datensätze vereinheitlichen. Zweitens schlagen wir zur Bewältigung der etikettierten Datenknappheit einen Rahmen für die Generierung von Merkmalen vor, der Daten im visuellen Merkmalsraum für neuartige Klassen synthetisiert. Drittens dehnen wir das Zero-Shot-Lernen und das few-Shot-Lernen auf die semantische Segmentierungsaufgabe aus und schlagen dafür einen anspruchsvollen Benchmark vor. Wir zeigen, dass die Einbindung semantischer Informationen in ein semantisches Segmentierungsnetz bei der Segmentierung neuartiger Klassen effektiv ist. Schließlich entwickeln wir eine bessere Videodarstellung für die Klassifizierungsaufgabe ”few-shot video” und nutzen schwach markierte Videos durch eine effiziente Abrufmethode.Max Planck Institute Informatic
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up for a
possible solution accordingly
- …