479 research outputs found
Interpretable Machine Learning Architectures for Efficient Signal Detection with Applications to Gravitational Wave Astronomy
Deep learning has seen rapid evolution in the past decade, accomplishing tasks that were previously unimaginable. At the same time, researchers strive to better understand and interpret the underlying mechanisms of the deep models, which are often justifiably regarded as "black boxes". Overcoming this deficiency will not only serve to suggest better learning architectures and training methods, but also extend deep learning to scenarios where interpretability is key to the application. One such scenario is signal detection and estimation, with gravitational wave detection as a specific example, where classic methods are often preferred for their interpretability. Nonetheless, while classic statistical detection methods such as matched filtering excel in their simplicity and intuitiveness, they can be suboptimal in terms of both accuracy and computational efficiency. Therefore, it is appealing to have methods that achieve "the best of both worlds", namely enjoying simultaneously excellent performance and interpretability.
In this thesis, we aim to bridge this gap between modern deep learning and classic statistical detection by revisiting the signal detection problem from a new perspective. First, to address the perceived distinction in interpretability between classic matched filtering and deep learning, we state the intrinsic connections between the two families of methods, and identify how trainable networks can address the structural limitations of matched filtering. Based on these ideas, we propose two trainable architectures that are constructed based on matched filtering, but with learnable templates and adaptivity to unknown noise distributions, and therefore higher detection accuracy. We next turn our attention toward improving the computational efficiency of detection, where we aim to design architectures that leverage structures within the problem for efficiency gains. By leveraging the statistical structure of class imbalance, we integrate hierarchical detection into trainable networks, and use a novel loss function which explicitly encodes both detection accuracy and efficiency. Furthermore, by leveraging the geometric structure of the signal set, we consider using signal space optimization as an alternative computational primitive for detection, which is intuitively more efficient than covering with a template bank. We theoretically prove the efficiency gain by analyzing Riemannian gradient descent on the signal manifold, which reveals an exponential improvement in efficiency over matched filtering. We also propose a practical trainable architecture for template optimization, which makes use of signal embedding and kernel interpolation.
We demonstrate the performance of all proposed architectures on the task of gravitational wave detection in astrophysics, where matched filtering is the current method of choice. The architectures are also widely applicable to general signal or pattern detection tasks, which we exemplify with the handwritten digit recognition task using the template optimization architecture. Together, we hope that this work is useful to scientists and engineers seeking machine learning architectures with high performance and interpretability, and that it contributes to our understanding of deep learning as a whole.
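For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of classic matched filtering against a template bank. It is not one of the trainable architectures proposed in the thesis. The function names, the sinusoidal templates, and the white-noise assumption (no whitening step) are illustrative assumptions.

```python
import numpy as np

def matched_filter_statistic(x, templates):
    """Largest normalized correlation between the data x and any template.

    Assumes white noise, so no whitening is applied (an illustrative simplification).
    """
    x = np.asarray(x, dtype=float)
    best = -np.inf
    for h in templates:
        h = np.asarray(h, dtype=float)
        h = h / np.linalg.norm(h)                 # unit-norm template
        corr = np.correlate(x, h, mode="valid")   # sliding inner product over all lags
        best = max(best, float(corr.max()))
    return best

def detect(x, templates, threshold):
    """Declare a detection when the statistic exceeds the threshold."""
    return matched_filter_statistic(x, templates) > threshold

# Hypothetical template bank: three sinusoids of different frequency.
t = np.arange(64)
bank = [np.sin(2 * np.pi * f * t / 64) for f in (2, 4, 8)]

# Noise-free data containing the second template at an unknown offset.
clean = np.zeros(512)
clean[100:164] += bank[1]
```

The cost of this approach grows with the number of templates in the bank, which is the inefficiency that the signal space optimization described above is meant to avoid.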
Advances and Applications of DSmT for Information Fusion. Collected Works, Volume 5
This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical and applied contributions of researchers working in different fields of applications and in mathematics, and is available in open access. The contributions collected in this volume are either new or were published or presented in international conferences, seminars, workshops and journals after the fourth volume was disseminated in 2015. The contributions of each part of this volume are chronologically ordered.
The first part of this book presents some theoretical advances on DSmT, dealing mainly with modified Proportional Conflict Redistribution Rules (PCR) of combination with degree of intersection, coarsening techniques, interval calculus for PCR based on set inversion via interval analysis (SIVIA), rough set classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of (quasi-)vacuous belief assignment in the fusion of sources of evidence, together with their Matlab codes.
Because more applications of DSmT have emerged since the publication of the fourth book of DSmT in 2015, the second part of this volume is about selected applications of DSmT, mainly in building change detection, object recognition, quality of data association in tracking, perception in robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image fusion, coarsening techniques, recommender systems, levee characterization and assessment, human heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria analysis, group decision, human activity recognition, storm prediction, data association for autonomous vehicles, identification of maritime vessels, fusion of support vector machines (SVM), the Silx-Furtif RUST code library for information fusion including PCR rules, and a network for ship classification.
Finally, the third part presents interesting contributions related to belief functions in general, published or presented since 2015. These contributions relate to decision-making under uncertainty, belief approximations, probability transformations, new distances between belief functions, non-classical multi-criteria decision-making problems with belief functions, generalization of Bayes theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence numbers, negator of belief mass, human activity recognition, information fusion for breast cancer therapy, imbalanced data classification, and hybrid techniques mixing deep learning with belief functions.
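As background for the PCR rules mentioned in this abstract, here is a minimal sketch of the classical two-source PCR5 combination, not the improved PCR5 and PCR6 variants the volume presents. It assumes Shafer's model (exclusive hypotheses, so conflict is an empty set intersection). The function name and the dictionary representation of belief assignments are illustrative assumptions.

```python
from itertools import product

def pcr5(m1, m2):
    """Two-source PCR5 fusion of belief assignments given as {frozenset: mass}.

    Assumes exclusive hypotheses: two focal elements conflict when their
    set intersection is empty.
    """
    fused = {}
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            # Conjunctive consensus: the product goes to the intersection.
            fused[inter] = fused.get(inter, 0.0) + a * b
        elif a + b > 0:
            # Conflicting product a*b is returned to x and y
            # in proportion to the masses a and b that produced it.
            fused[x] = fused.get(x, 0.0) + a * a * b / (a + b)
            fused[y] = fused.get(y, 0.0) + b * b * a / (a + b)
    return fused

A, B = frozenset("A"), frozenset("B")
AB = A | B  # ignorance A ∪ B

m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.3, AB: 0.7}
fused = pcr5(m1, m2)
```

In this example the only conflict is between A (mass 0.6) and B (mass 0.3), so the conflicting mass 0.18 is split 2:1 between A and B rather than being discarded or renormalized over all focal elements.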
Beam scanning by liquid-crystal biasing in a modified SIW structure
A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW) modified to work as a Groove Gap Waveguide, with radiating slots etched on the upper broad wall, so that the structure radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, so that several antennas can be laid in parallel to achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.
Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on the combined application of synthetic aperture radar and deep learning technology. It aims to further promote the development of intelligent SAR image interpretation technology. A synthetic aperture radar (SAR) is an important active microwave imaging sensor, whose all-day and all-weather working capacity gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is valuable and meaningful, therefore, to study SAR-based remote sensing applications. In recent years, deep learning represented by convolutional neural networks has promoted significant progress in the computer vision community, e.g., in face recognition, autonomous driving and the Internet of Things (IoT). Deep learning can enable computational models with multiple processing layers to learn data representations with multiple levels of abstraction. This can greatly improve the performance of various applications. This reprint provides a platform for researchers to address these challenges and present their innovative and cutting-edge research results when applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews and technical reports.
Novel neural architectures & algorithms for efficient inference
In the last decade, the machine learning universe embraced deep neural networks (DNNs) wholeheartedly with the advent of neural architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), transformers, etc. These models have empowered many applications, such as ChatGPT, Imagen, etc., and have achieved state-of-the-art (SOTA) performance on many vision, speech, and language modeling tasks. However, SOTA performance comes with various issues, such as large model size, compute-intensive training, increased inference latency, higher working memory, etc. This thesis aims at improving the resource efficiency of neural architectures, i.e., significantly reducing the computational, storage, and energy consumption of a DNN without any significant loss in performance.
Towards this goal, we explore novel neural architectures as well as training algorithms that allow low-capacity models to achieve near SOTA performance. We divide this thesis into two dimensions: Efficient Low Complexity Models, and Input Hardness Adaptive Models.
Along the first dimension, i.e., Efficient Low Complexity Models, we improve DNN performance by addressing instabilities in the existing architectures and training methods. We propose novel neural architectures inspired by ordinary differential equations (ODEs) to reinforce input signals and attend to salient feature regions. In addition, we show that carefully designed training schemes improve the performance of existing neural networks. We divide this exploration into two parts:
(a) Efficient Low Complexity RNNs. We improve RNN resource efficiency by addressing poor gradients, noise amplification, and BPTT training issues. First, we improve RNNs by solving ODEs that eliminate vanishing and exploding gradients during training. To do so, we present Incremental Recurrent Neural Networks (iRNNs) that keep track of increments in the equilibrium surface. Next, we propose Time Adaptive RNNs that mitigate the noise propagation issue in RNNs by modulating the time constants in the ODE-based transition function. We empirically demonstrate the superiority of ODE-based neural architectures over existing RNNs. Finally, we propose the Forward Propagation Through Time (FPTT) algorithm for training RNNs. We show that FPTT yields significant gains compared to the more conventional Backward Propagation Through Time (BPTT) scheme.
(b) Efficient Low Complexity CNNs. Next, we improve CNN architectures by reducing their resource usage. CNNs require greater depth to generate high-level features, resulting in computationally expensive models. We design a novel residual block, the Global layer, that constrains the input and output features by approximately solving partial differential equations (PDEs). It yields better receptive fields than traditional convolutional blocks and thus results in shallower networks. Further, we reduce the model footprint by enforcing a novel inductive bias that formulates the output of a residual block as a spatial interpolation between high-compute anchor pixels and low-compute cheaper pixels. This results in spatially interpolated convolutional blocks (SI-CNNs) that have better compute and performance trade-offs. Finally, we propose an algorithm that enforces various distributional constraints during training in order to achieve better generalization. We refer to this scheme as distributionally constrained learning (DCL).
In the second dimension, i.e., Input Hardness Adaptive Models, we introduce the notion of the hardness of any input relative to any architecture. In the first dimension, a neural network allocates the same resources, such as compute, storage, and working memory, to all inputs. It inherently assumes that all examples are equally hard for a model. In this dimension, we challenge this assumption, arguing that some inputs are relatively easy for a network to predict compared to others. Input hardness enables us to create selective classifiers wherein a low-capacity network handles simple inputs while abstaining from a prediction on the complex inputs. Next, we create hybrid models that route the hard inputs from the low-capacity abstaining network to a high-capacity expert model. We design various architectures that adhere to this hybrid inference style. Further, input hardness enables us to selectively distill the knowledge of a high-capacity model into a low-capacity model by discarding hard inputs during the distillation procedure.
Finally, we conclude this thesis by sketching out various interesting future research directions that emerge as an extension of different ideas explored in this work.
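The hybrid inference style described in this abstract, where a low-capacity model abstains on hard inputs and routes them to an expert, can be sketched as follows. This is an illustration under stated assumptions, not the thesis's architecture: the hardness signal here is simply the small model's maximum class probability, and `hybrid_predict`, `toy_small`, `toy_expert`, and the threshold value are hypothetical names and choices.

```python
import numpy as np

def hybrid_predict(x, small_model, expert_model, threshold=0.9):
    """Predict with the small model; route low-confidence inputs to the expert.

    Both models map an (n, d) array to (n, classes) class probabilities.
    Returns the predictions and a boolean mask of inputs sent to the expert.
    """
    probs = small_model(x)
    preds = probs.argmax(axis=1)
    hard = probs.max(axis=1) < threshold   # the small model abstains on these
    if hard.any():
        preds[hard] = expert_model(x[hard]).argmax(axis=1)
    return preds, hard

def toy_small(x):
    # Hypothetical low-capacity model: a logistic score on the first feature.
    p = 1.0 / (1.0 + np.exp(-4.0 * x[:, 0]))
    return np.stack([1.0 - p, p], axis=1)

def toy_expert(x):
    # Hypothetical high-capacity model: a hard decision on the sign.
    p = (x[:, 0] > 0).astype(float)
    return np.stack([1.0 - p, p], axis=1)

x = np.array([[-2.0], [-0.1], [0.1], [2.0]])
preds, routed = hybrid_predict(x, toy_small, toy_expert)
```

Only the two inputs near the decision boundary are sent to the expert, so the expensive model runs on a fraction of the inputs. The threshold trades accuracy against average compute.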
Point Cloud Processing for Environmental Analysis in Autonomous Driving using Deep Learning
One of the main objectives of leading automotive companies is autonomous self-driving cars. They need a very precise perception system of their environment, working for every conceivable scenario. Therefore, different kinds of sensor types are in use. Besides cameras, lidar scanners have become very important. The development in that field is significant for future applications and system integration because lidar offers a more accurate depth representation, independent of environmental illumination. In particular, algorithms and machine learning approaches, including deep learning methods that operate directly on raw laser scanner data, are very important because of the long range and three-dimensional resolution of the measured point clouds. Consequently, a broad field of research with many challenges and unsolved tasks has been established. This thesis aims to address this deficit and contribute highly efficient algorithms for 3D object recognition to the scientific community.
It provides a Deep Neural Network with specific layers and a novel loss to safely localize and estimate the orientation of objects from point clouds originating from lidar sensors. First, a single-shot 3D object detector is developed that outputs dense predictions in only one forward pass. Next, this detector is refined by fusing complementary semantic features from cameras and by joint probabilistic tracking to stabilize predictions and filter outliers. In the last part, a concept for deployment into an existing test vehicle focuses on the semi-automated generation of a suitable dataset. Subsequently, an evaluation of data from automotive-grade lidar scanners is presented. A Generative Adversarial Network is also developed as an alternative for target-specific artificial data generation. Experiments on the acquired application-specific and benchmark datasets show that the presented methods compete with a variety of state-of-the-art algorithms while being optimized for efficiency for use in self-driving cars. Furthermore, they include an extensive set of standard evaluation metrics and results to form a solid baseline for future research.
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Face Image and Video Analysis in Biometrics and Health Applications
Computer Vision (CV) enables computers and systems to derive meaningful information from acquired visual inputs, such as images and videos, and make decisions based on the extracted information. Its goal is to acquire, process, analyze, and understand the information by developing a theoretical and algorithmic model. Biometrics are distinctive and measurable human characteristics used to label or describe individuals by combining computer vision with knowledge of human physiology (e.g., face, iris, fingerprint) and behavior (e.g., gait, gaze, voice). The face is one of the most informative biometric traits. Many studies have investigated the human face from the perspectives of different disciplines, ranging from computer vision and deep learning to neuroscience and biometrics. In this work, we analyze face characteristics from digital images and videos in the areas of morphing attack and defense, and autism diagnosis. For face morphing attack generation, we propose a transformer-based generative adversarial network that generates more visually realistic morphing attacks by combining different losses, such as face matching distance, facial landmark based loss, perceptual loss and pixel-wise mean square error. In the face morphing attack detection study, we design a fusion-based few-shot learning (FSL) method to learn discriminative features from face images for few-shot morphing attack detection (FS-MAD), and extend the current binary detection into multiclass classification, namely, few-shot morphing attack fingerprinting (FS-MAF). In the autism diagnosis study, we develop a discriminative few-shot learning method to analyze hour-long video data and explore the fusion of facial dynamics for facial trait classification of autism spectrum disorder (ASD) in three severity levels. The results show outstanding performance of the proposed fusion-based few-shot framework on the dataset.
In addition, we explore the possibility of performing facial micro-expression spotting and feature analysis on autism video data to classify ASD and control groups. The results indicate that subtle facial expression changes are effective for autism diagnosis.
Transformer Models: From Model Inspection to Applications in Patents
Natural Language Processing is used to address several tasks, both linguistic ones, e.g. part-of-speech tagging and dependency parsing, and downstream ones, e.g. machine translation and sentiment analysis. To tackle these tasks, dedicated approaches have been developed over time. A methodology that increases performance on all tasks in a unified manner is language modeling: a model is pre-trained to replace masked tokens in large amounts of text, either randomly within chunks of text or sequentially one after the other, to develop general-purpose representations that can be used to improve performance in many downstream tasks at once. The neural network architecture that currently performs this task best is the transformer; moreover, model size and data scale are essential to the development of information-rich representations. The availability of large-scale datasets and the use of models with billions of parameters are currently the most effective path towards better representations of text. However, large models are more difficult to interpret in terms of the output they provide. Therefore, several studies have been carried out to investigate the representations provided by transformer models trained on large-scale datasets.
In this thesis I investigate these models from several perspectives. I study the linguistic properties of the representations provided by BERT, a language model mostly trained on the English Wikipedia, to understand whether the information it encodes is localized within specific entries of the vector representation. In doing so, I identify special weights that show high relevance to several distinct linguistic probing tasks. Subsequently, I investigate the cause of these special weights, and link them to token distribution and special tokens. To complement this general-purpose analysis and extend it to more specific use cases, given the wide range of applications for language models, I study their effectiveness on technical documentation, specifically patents. I use both general-purpose and dedicated models to identify domain-specific entities, such as users of the inventions and technologies, or to segment patent text. I always complement the performance analysis with careful measurements of data and model properties to understand whether the conclusions drawn for general-purpose models hold in this context as well.