Cross-Modal Concept Learning and Inference for Vision-Language Models
Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP,
establish the correlation between texts and images, achieving remarkable
success on various downstream tasks with fine-tuning. In existing fine-tuning
methods, the class-specific text description is matched against the whole image. We recognize that this whole-image matching is not effective, since
images from the same class often contain a set of different semantic objects,
and an object further consists of a set of semantic parts or concepts.
Individual semantic parts or concepts may appear in image samples from
different classes. To address this issue, in this paper, we develop a new
method called cross-modal concept learning and inference (CCLI). Using the
powerful text-image correlation capability of CLIP, our method automatically
learns a large set of distinctive visual concepts from images using a set of
semantic text concepts. Based on these visual concepts, we construct a
discriminative representation of images and learn a concept inference network
to perform downstream image classification tasks, such as few-shot learning and
domain generalization. Extensive experimental results demonstrate that our CCLI
method improves upon the current state-of-the-art methods by large margins, for example by up to 8.0% on few-shot learning and by up to 1.3% on domain generalization.
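To make the idea concrete, here is a minimal sketch (not the authors' implementation) of how a concept-based image representation could be built from precomputed embeddings; the inputs image_features and concept_embeddings are assumed, e.g. CLIP image features and a bank of learned visual concepts:

    import torch
    import torch.nn.functional as F

    def concept_representation(image_features, concept_embeddings):
        # image_features:     (B, D) image embeddings, one per image
        # concept_embeddings: (K, D) embeddings of K learned visual concepts
        # Returns (B, K) concept-activation vectors: each image is described
        # by its cosine similarity to every concept in the bank.
        img = F.normalize(image_features, dim=-1)
        con = F.normalize(concept_embeddings, dim=-1)
        return img @ con.t()

    class ConceptClassifier(torch.nn.Module):
        # A minimal inference head over the concept representation.
        def __init__(self, num_concepts, num_classes):
            super().__init__()
            self.head = torch.nn.Linear(num_concepts, num_classes)

        def forward(self, image_features, concept_embeddings):
            return self.head(concept_representation(image_features, concept_embeddings))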
A Survey on Self-Supervised Representation Learning
Learning meaningful representations is at the heart of many tasks in the
field of modern machine learning. Recently, many methods have been introduced that learn image representations without supervision. These
representations can then be used in downstream tasks like classification or
object detection. The quality of these representations is close to that of supervised learning, while no labeled images are needed. This survey paper provides a
comprehensive review of these methods in a unified notation, points out
similarities and differences of these methods, and proposes a taxonomy which
sets these methods in relation to each other. Furthermore, our survey summarizes the most recent experimental results reported in the literature in the form of a meta-study. Our survey is intended as a starting point for researchers and practitioners who want to dive into the field of representation learning.
CLIP-S: Language-Guided Self-Supervised Semantic Segmentation
Existing semantic segmentation approaches are often limited by costly
pixel-wise annotations and predefined classes. In this work, we present
CLIP-S that leverages self-supervised pixel representation learning and
vision-language models to enable various semantic segmentation tasks (e.g.,
unsupervised, transfer learning, language-driven segmentation) without any human annotations or unknown class information. We first learn pixel
embeddings with pixel-segment contrastive learning from different augmented
views of images. To further improve the pixel embeddings and enable
language-driven semantic segmentation, we design two types of consistency
guided by vision-language models: 1) embedding consistency, aligning our pixel
embeddings to the joint feature space of a pre-trained vision-language model,
CLIP; and 2) semantic consistency, forcing our model to make the same
predictions as CLIP over a set of carefully designed target classes with both
known and unknown prototypes. Thus, CLIP-S enables a new task of class-free
semantic segmentation where no unknown class information is needed during
training. As a result, our approach shows consistent and substantial
performance improvement over four popular benchmarks compared with the
state-of-the-art unsupervised and language-driven semantic segmentation
methods. More importantly, our method outperforms them on unknown class recognition by a large margin.
Comment: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 202
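The two consistency terms can be sketched as follows; this is an illustrative reading of the abstract rather than the published code, and all tensor names (pixel_emb, clip_pixel_emb, logits, clip_logits) are assumptions:

    import torch.nn.functional as F

    def embedding_consistency(pixel_emb, clip_pixel_emb):
        # Pull the learned pixel embeddings toward CLIP's joint feature space.
        # pixel_emb:      (N, D) embeddings from the segmentation model
        # clip_pixel_emb: (N, D) CLIP features for the same pixels/regions
        a = F.normalize(pixel_emb, dim=-1)
        b = F.normalize(clip_pixel_emb, dim=-1)
        return (1 - (a * b).sum(-1)).mean()  # mean (1 - cosine similarity)

    def semantic_consistency(logits, clip_logits, temperature=1.0):
        # Encourage the same predictions as CLIP over the designed target
        # classes (known and unknown prototypes); CLIP acts as the teacher.
        teacher = F.softmax(clip_logits / temperature, dim=-1)
        student = F.log_softmax(logits / temperature, dim=-1)
        return F.kl_div(student, teacher, reduction="batchmean")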
Your representations are in the network: composable and parallel adaptation for large scale models
We propose InCA, a lightweight method for transfer learning that
cross-attends to any activation layer of a pre-trained model. During training,
InCA uses a single forward pass to extract multiple activations, which are
passed to external cross-attention adapters, trained anew and combined or
selected for downstream tasks. We show that, even when selecting a single
top-scoring adapter, InCA achieves performance comparable to full fine-tuning,
at a cost comparable to fine-tuning just the last layer. For example, with a
cross-attention probe 1.3% the size of a pre-trained ViT-L/16 model, we achieve
performance within 0.2% of the full fine-tuning paragon at a computational
training cost of 51% of the baseline, on average across 11 downstream classification tasks. Unlike other forms of efficient adaptation, InCA does not
require backpropagating through the pre-trained model, thus leaving its
execution unaltered at both training and inference. The versatility of InCA is
best illustrated in fine-grained tasks, which may require accessing information
absent in the last layer but accessible in intermediate layer activations.
Since the backbone is fixed, InCA allows parallel ensembling as well as
parallel execution of multiple tasks. InCA achieves state-of-the-art performance in the ImageNet-to-Sketch multi-task benchmark.
Comment: Accepted to NeurIPS 202
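A minimal sketch of what one such external cross-attention adapter might look like, assuming token activations have already been extracted from the frozen backbone; the module and its details are illustrative, not the paper's code:

    import torch
    import torch.nn as nn

    class CrossAttentionAdapter(nn.Module):
        # An external adapter that cross-attends to one activation layer of a
        # frozen backbone; only the adapter's parameters are trained.
        def __init__(self, act_dim, num_classes, num_heads=8):
            super().__init__()
            self.query = nn.Parameter(torch.randn(1, 1, act_dim))  # learnable probe token
            self.attn = nn.MultiheadAttention(act_dim, num_heads, batch_first=True)
            self.head = nn.Linear(act_dim, num_classes)

        def forward(self, activations):
            # activations: (B, T, D) tokens from one intermediate layer,
            # extracted in a single no-grad forward pass of the backbone.
            q = self.query.expand(activations.size(0), -1, -1)
            pooled, _ = self.attn(q, activations, activations)
            return self.head(pooled.squeeze(1))

Because the backbone never receives gradients, activations can be cached once and many such adapters trained, selected, or ensembled in parallel.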
Information Refinement Technologies for Crisis Informatics: User Expectations and Design Implications for Social Media and Mobile Apps in Crises
In the past 20 years, mobile technologies and social media have become established not only in everyday life, but also in crises, disasters, and emergencies. Large-scale events in particular, such as Hurricane Sandy in 2012 or the European Floods in 2013, showed that citizens are not passive victims but active participants utilizing mobile and social information and communication technologies (ICT) for crisis response (Reuter, Hughes, et al., 2018). Accordingly, the research field of crisis informatics emerged as a multidisciplinary field which combines computing and social science knowledge of disasters and is rooted in disciplines such as human-computer interaction (HCI), computer science (CS), computer-supported cooperative work (CSCW), and information systems (IS). While citizens use personal ICT to respond to a disaster and cope with uncertainty, emergency services such as fire and police departments have started using available online data to increase situational awareness and improve decision making for a better crisis response (Palen & Anderson, 2016).
When looking at even larger crises, such as the ongoing COVID-19 pandemic, it becomes apparent that the challenges of crisis informatics are amplified (Xie et al., 2020). Notably, information is often not available in perfect shape to assist crisis response: the dissemination of high-volume, heterogeneous, and highly semantic data by citizens, often referred to as big social data (Olshannikova et al., 2017), poses challenges for emergency services in terms of access, quality, and quantity of information. In order to achieve situational awareness or even actionable information, meaning the right information for the right person at the right time (Zade et al., 2018), information must be refined according to event-based factors, organizational requirements, societal boundary conditions, and technical feasibility.
In order to research the topic of information refinement, this dissertation combines the methodological framework of design case studies (Wulf et al., 2011) with principles of design science research (Hevner et al., 2004). These extended design case studies consist of four phases, each contributing to research with distinct results. This thesis first reviews existing research on use, role, and perception patterns in crisis informatics, emphasizing the increasing potential of public participation in crisis response using social media. Then, empirical studies conducted with the German population reveal positive attitudes towards and increasing use of mobile and social technologies during crises, but also highlight barriers to use and expectations towards emergency services to monitor and interact in social media.
The findings led to the design of innovative ICT artefacts, including visual guidelines for citizens’ use of social media in emergencies (SMG), an emergency service web interface for aggregating mobile and social data (ESI), an efficient algorithm for detecting relevant information in social media (SMO), and a mobile app for bidirectional communication between emergency services and citizens (112.social). The evaluation of the artefacts involved the participation of end-users in the application field of crisis management, pointing out potentials for future improvements and further research. The thesis concludes with a framework on information refinement for crisis informatics, integrating event-based, organizational, societal, and technological perspectives.
YEARBOOK 2019/2020. Arts Museology and Curatorship
Yearbook is the first collection of AMaC’s student projects developed during the first two years of the course. AMaC is a Master’s degree in Arts, Museology and Curatorship with a clear mission: to educate and train professionals with the creative and research skills essential to developing successful arts and cultural heritage strategies. This broad and demanding field requires engagement with the current debate on common goods, the identity of communities, access to heritage art, and the impact of the arts on society.
From backend to Dashmobile: expanding the horizons of the drone engineering ecosystem
The Drone Engineering Ecosystem (DEE) project is a groundbreaking initiative that aims to simplify access to the world of drones and promote their responsible use, particularly in the educational domain. Traditional methods of drone control and interaction have been complex and fragmented. The DEE project seeks to overcome these challenges by integrating various technologies, including Python, Tkinter, FastAPI, MongoDB, Flutter, and Dart, to create a cohesive and user-friendly ecosystem. The project began with a comprehensive analysis of the existing drone ecosystem, identifying its limitations and areas for improvement. This was followed by the formulation of clear objectives and a detailed work plan, visualized through a Gantt chart. The development process encompassed the creation of a robust backend, significant enhancements to the existing dashboard, and the development of a mobile application using Flutter. One of the main challenges was the integration of new technologies like Flutter and Dart, which were learned specifically for this project. Rigorous testing and user experience evaluation were integral to ensuring the system's functionality and usability. The project not only achieved most of its goals but also opened new avenues for future exploration and development in the drone technology field. My passion for programming and the application of my Telecommunications Engineering bachelor's degree were key drivers of the project's success. The project represents a significant contribution to the field, providing a platform for learning and promoting the responsible use of drones. It also reflects my commitment to challenging myself and applying my academic knowledge to real-world problems.
Sustainable Development Goals::4 - Quality Education
Scalable and fault-tolerant data stream processing on multi-core architectures
With increasing data volumes and velocity, many applications are shifting from the classical “process-after-store” paradigm to a stream processing model: data is produced and consumed as continuous streams. Stream processing captures latency-sensitive applications as diverse as credit card fraud detection and high-frequency trading. These applications are expressed as queries of algebraic operations (e.g., aggregation) over the most recent data using windows, i.e., finite evolving views over the input streams. To guarantee correct results, streaming applications require precise window semantics (e.g., temporal ordering) for operations that maintain state.
While high processing throughput and low latency are performance desiderata for stateful streaming applications, achieving both poses challenges. Computing the state of overlapping windows causes redundant aggregation operations: incremental execution (i.e., reusing previous results) reduces latency but prevents parallelization; at the same time, parallelizing window execution for stateful operations with precise semantics demands ordering guarantees and state access coordination. Finally, streams and state must be recovered to produce consistent and repeatable results in the event of failures.
Given the rise of shared-memory multi-core CPU architectures and high-speed networking, we argue that it is possible to address these challenges in a single node without compromising window semantics, performance, or fault-tolerance. In this thesis, we analyze, design, and implement stream processing engines (SPEs) that achieve high performance on multi-core architectures. To this end, we introduce new approaches for in-memory processing that address the previous challenges: (i) for overlapping windows, we provide a family of window aggregation techniques that enable computation sharing based on the algebraic properties of aggregation functions; (ii) for parallel window execution, we balance parallelism and incremental execution by developing abstractions for both and combining them into a novel design; and (iii) for reliable single-node execution, we enable strong fault-tolerance guarantees without sacrificing performance by reducing the required disk I/O bandwidth using a novel persistence model. We combine the above to implement an SPE that processes hundreds of millions of tuples per second with sub-second latencies. These results reveal the opportunity to reduce resource and maintenance footprint by replacing cluster-based SPEs with single-node deployments.
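As a minimal illustration of the incremental-execution idea for overlapping windows (a sketch, not the engine described in the thesis), a count-based sliding sum can reuse the previous result because summation is invertible:

    from collections import deque

    class SlidingSum:
        # Count-based sliding-window sum. Each new tuple costs O(1) because
        # summation is invertible: the aggregate is updated by adding the new
        # value and subtracting the evicted one, instead of re-aggregating
        # the whole window. Non-invertible functions (e.g. max) need other
        # sharing schemes, such as two-stacks or aggregate trees.
        def __init__(self, window_size):
            self.window = deque()
            self.size = window_size
            self.total = 0

        def insert(self, value):
            self.window.append(value)
            self.total += value
            if len(self.window) > self.size:
                self.total -= self.window.popleft()  # evict the oldest tuple
            return self.total  # current window aggregate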
Rethinking CycleGAN: Improving Quality of GANs for Unpaired Image-to-Image Translation
An unpaired image-to-image (I2I) translation technique seeks to find a
mapping between two domains of data in a fully unsupervised manner. While the
initial solutions to the I2I problem were provided by generative adversarial networks (GANs), diffusion models (DMs) currently hold state-of-the-art status on the I2I translation benchmarks in terms of FID. Yet,
they suffer from some limitations, such as not using data from the source
domain during the training, or maintaining consistency of the source and
translated images only via simple pixel-wise errors. This work revisits the
classic CycleGAN model and equips it with recent advancements in model
architectures and model training procedures. The revised model is shown to
significantly outperform other advanced GAN- and DM-based competitors on a
variety of benchmarks. In the case of Male2Female translation of CelebA, the
model achieves over 40% improvement in FID score compared to the
state-of-the-art results. This work also demonstrates the ineffectiveness of
the pixel-wise I2I translation faithfulness metrics and suggests their
revision. The code and trained models are available at
https://github.com/LS4GAN/uvcgan
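For reference, the pixel-wise cycle-consistency term that classic CycleGAN uses to keep source and translated images consistent can be sketched as below; G_ab and G_ba are the two generators, and this shows the classic formulation rather than the revised model's full objective:

    import torch.nn.functional as F

    def cycle_consistency_loss(real_a, real_b, G_ab, G_ba, lam=10.0):
        # Translating A -> B -> A (and B -> A -> B) should reconstruct the
        # input; consistency is enforced with a simple pixel-wise L1 error.
        rec_a = G_ba(G_ab(real_a))
        rec_b = G_ab(G_ba(real_b))
        return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))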
Computational Methods in Multi-Messenger Astrophysics using Gravitational Waves and High Energy Neutrinos
This dissertation seeks to describe advancements made in computational methods for multi-messenger astrophysics (MMA) using gravitational waves (GW) and neutrinos during Advanced LIGO (aLIGO)’s first through third observing runs (O1-O3) and, looking forward, to describe novel computational techniques suited to the challenges of both the burgeoning MMA field and high-performance computing as a whole.
The first two chapters provide an overview of MMA as it pertains to gravitational wave/high energy neutrino (GWHEN) searches, including a summary of expected astrophysical sources as well as GW, neutrino, and gamma-ray detectors used in their detection. These are followed in the third chapter by an in-depth discussion of LIGO’s timing system, particularly the diagnostic subsystem, describing both its role in MMA searches and the author’s contributions to the system itself.
The fourth chapter provides a detailed description of the Low-Latency Algorithm for Multi-messenger Astrophysics (LLAMA), the GWHEN pipeline developed by the author and used in O2 and O3. Relevant past multi-messenger searches are described first, followed by the O2 and O3 analysis methods, the pipeline’s performance, scientific results, and finally, an in-depth account of the library’s structure and functionality. In particular, the author’s high-performance multi-order coordinates (MOC) HEALPix image analysis library, HPMOC, is described. HPMOC increases performance of HEALPix image manipulations by several orders of magnitude vs. naive single-resolution approaches while presenting a simple high-level interface and should prove useful for diverse future MMA searches. The performance improvements it provides for LLAMA are also covered.
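HPMOC's internals are not detailed here, but the multi-order coordinates it builds on can be illustrated with the standard IVOA "UNIQ" packing, which lets pixels of different HEALPix resolutions share one index space (a sketch, not HPMOC's actual code):

    def uniq_encode(order, ipix):
        # IVOA multi-order-coverage (MOC) "UNIQ" index: packs a HEALPix
        # resolution level and pixel index into one integer, so pixels of
        # different resolutions can live in a single sorted array.
        assert 0 <= ipix < 12 * 4**order
        return 4 * 4**order + ipix

    def uniq_decode(uniq):
        order = (uniq.bit_length() - 3) // 2
        return order, uniq - 4 * 4**order

    # Pixels from different orders coexist in one index space:
    assert uniq_decode(uniq_encode(2, 7)) == (2, 7)
    assert uniq_decode(uniq_encode(5, 7)) == (5, 7)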
The final chapter of this dissertation builds on the approaches taken in developing HPMOC, presenting several novel methods for efficiently storing and analyzing large data sets, with applications to MMA and other data-intensive fields. A family of depth-first multi-resolution orderings of HEALPix images (DEPTH9, DEPTH19, and DEPTH40) is defined, along with algorithms and use cases where they can improve on current approaches, including high-speed streaming calculations suitable for serverless compute or FPGAs.
For performance-constrained analyses on HEALPix data (e.g. image analysis in multi-messenger search pipelines) using SIMD processors, breadth-first data structures can provide short-circuiting calculations in a data-parallel way on compressed data; a simple compression method is described with application to further improving LLAMA performance.
A new storage scheme and associated algorithms for efficiently compressing and contracting tensors of varying sparsity is presented; these demuxed tensors (D-Tensors) have equivalent asymptotic time and space complexity to optimal representations of both dense and sparse matrices, and could be used as a universal drop-in replacement to reduce code complexity and developer effort while improving the performance of existing non-optimized numerical code. Finally, the big bucket hash table (B-Table), a novel type of hash table making guarantees on data layout (vs. load factor), is described, along with optimizations it allows for (like hardware acceleration, online rebuilds, and hard real-time applications) that are not possible with existing hash table approaches. These innovations are presented in the hope that some will prove useful for improving future MMA searches and other data-intensive applications.