Tensor-variate machine learning on graphs
Traditional machine learning algorithms are facing significant challenges as the world enters the era of big data, with a dramatic expansion in volume and range of applications and an increase in the variety of data sources. The large- and multi-dimensional nature of data often increases the computational costs associated with their processing and raises the risks of model over-fitting - a phenomenon known as the curse of dimensionality. To this end, tensors have become a subject of great interest in the data analytics community, owing to their remarkable ability to super-compress high-dimensional data into a low-rank format, while retaining the original data structure and interpretability. This leads to a significant reduction in computational costs, from an exponential complexity to a linear one in the data dimensions.
An additional challenge when processing modern big data is that they often reside on irregular domains and exhibit relational structures, which violates the regular grid assumptions of traditional machine learning models. To this end, there has been an increasing amount of research in generalizing traditional learning algorithms to graph data. This allows for the processing of graph signals while accounting for the underlying relational structure, such as user interactions in social networks, vehicle flows in traffic networks, transactions in supply chains, chemical bonds in proteins, and trading data in financial networks, to name a few.
Although promising results have been achieved in these fields, there is a void in the literature when it comes to the conjoint treatment of tensors and graphs for data analytics. Solutions in this area are increasingly urgent, as modern big data is both large-dimensional and irregular in structure. To this end, the goal of this thesis is to explore machine learning methods that can fully exploit the advantages of both tensors and graphs. In particular, the following approaches are introduced: (i) a graph-regularized tensor regression framework for modelling high-dimensional data while accounting for the underlying graph structure; (ii) a tensor-algebraic approach for computing efficient convolution on graphs; (iii) a graph tensor network framework for designing neural learning systems, which is both general enough to describe most existing neural network architectures and flexible enough to model large-dimensional data on any and many irregular domains. The considered frameworks were employed in several real-world applications, including air quality forecasting, protein classification, and financial modelling. Experimental results validate the advantages of the proposed methods, which achieved better or comparable performance against state-of-the-art models. Additionally, these methods benefit from increased interpretability and reduced computational costs, which are crucial for tackling the challenges posed by the era of big data.
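The reduction from exponential to linear storage mentioned in this abstract can be made concrete with a short worked count. This is only a sketch: it assumes a tensor-train (TT) format, which is one of several low-rank tensor formats and is not named in the abstract, and the symbols $N$ (tensor order), $I$ (size of each mode), and $R$ (maximum TT rank) are illustrative.

```latex
% Full storage of an order-N tensor with I entries per mode:
\mathcal{X} \in \mathbb{R}^{I \times I \times \cdots \times I}
\quad\Rightarrow\quad I^{N} \text{ entries (exponential in } N\text{)}.

% Tensor-train cores with boundary ranks R_0 = R_N = 1 and R_n \le R:
\mathcal{G}^{(n)} \in \mathbb{R}^{R_{n-1} \times I \times R_{n}}
\quad\Rightarrow\quad \sum_{n=1}^{N} R_{n-1}\, I\, R_{n} \;\le\; N I R^{2}
\text{ entries (linear in } N\text{)}.
```

Under these assumptions, with $N = 10$, $I = 10$, and $R = 5$, the full tensor has $10^{10}$ entries while the TT cores hold at most $2500$.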
Contributions to improve the technologies supporting unmanned aircraft operations
International Mention in the doctoral degree
Unmanned Aerial Vehicles (UAVs), in their smaller versions known as drones, are becoming increasingly important in today's societies. The systems that make them up present a multitude of challenges, of which error can be considered the common denominator. The perception of the environment is measured by sensors that have errors; the models that interpret the information and/or define behaviors are approximations of the world and therefore also have errors. Explaining error allows extending the limits of deterministic models to address real-world problems. The performance of the technologies embedded in drones depends on our ability to understand, model, and control the error of the systems that integrate them, as well as new technologies that may emerge.
Flight controllers integrate various subsystems that are generally dependent on other systems. One example is the guidance system, which provides the engine's propulsion controller with the information necessary to accomplish a desired mission. For this purpose, the flight controller is made up of a control law for the guidance system that reacts to the information perceived by the perception and navigation systems. The error of any of the subsystems propagates through the ecosystem of the controller, so the study of each of them is essential.
On the other hand, among the strategies for error control are state-space estimators, where the Kalman filter has been a great ally of engineers since its appearance in the 1960s. Kalman filters are at the heart of information fusion systems, minimizing the error covariance of the system and allowing the measured states to be filtered and estimated in the absence of observations. State Space Models (SSM) are developed based on a set of hypotheses for modeling the world. Among the assumptions are that the models of the world must be linear and Markovian, and that the error of their models must be Gaussian. In general, systems are not linear, so linearizations are performed on models that are already approximations of the world. In other cases, the noise to be controlled is not Gaussian, but it is approximated by that distribution in order to be able to deal with it. Moreover, many systems are not Markovian, i.e., their states do not depend only on the previous state, but there are other dependencies that state space models cannot handle.
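The predict and update cycle of a Kalman filter under the linear, Markovian, Gaussian assumptions listed above can be sketched for the simplest scalar case. This is a minimal illustration, not the thesis's implementation: the function name `kalman_step` and the parameters `a` (state transition), `h` (observation model), `q` (process noise variance), and `r` (measurement noise variance) are assumptions chosen for the example.

```python
from typing import Optional, Tuple


def kalman_step(
    x: float,
    p: float,
    z: Optional[float],
    a: float = 1.0,
    h: float = 1.0,
    q: float = 0.01,
    r: float = 1.0,
) -> Tuple[float, float]:
    """One step of a scalar linear Kalman filter.

    x, p : previous state estimate and its error variance
    z    : new measurement, or None when no observation is available
    Returns the new state estimate and error variance.
    """
    # Predict: propagate the estimate through the linear model.
    x_pred = a * x
    p_pred = a * p * a + q

    # Without an observation, the prediction is the best estimate.
    if z is None:
        return x_pred, p_pred

    # Update: the gain weighs the prediction against the measurement.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new
```

Passing `z=None` shows the behaviour the abstract describes as estimating in the absence of observations: the state is propagated by the model alone and its variance grows by `q`.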
This thesis presents a collection of studies in which error is formulated and reduced. First, the error in a computer vision-based precision landing system is studied; then estimation and filtering problems are addressed from the deep learning approach. Finally, classification concepts with deep learning over trajectories are studied. The first case of the collection studies the consequences of error propagation in a machine vision-based precision landing system. This work proposes a set of strategies to reduce the impact on the guidance system, and ultimately reduce the error. The next two studies approach the estimation and filtering problem from the deep learning perspective, where error is a function to be minimized by learning. The last case of the collection deals with a trajectory classification problem with real data. This work completes the two main fields in deep learning, regression and classification, where the error is considered as a probability function of class membership.
I would like to thank the Ministry of Science and Innovation for granting me the funding with reference PRE2018-086793, associated with the project TEC2017-88048-C2-2-R, which provided me the opportunity to carry out all my PhD activities, including completing an international research internship.
Doctoral Program in Computer Science and Technology, Universidad Carlos III de Madrid
President: Antonio Berlanga de Jesús. Secretary: Daniel Arias Medina. Member: Alejandro Martínez Cav
T-RECX: Tiny-Resource Efficient Convolutional neural networks with early-eXit
Deploying machine learning (ML) on milliwatt-scale edge devices (tinyML) is
gaining popularity due to recent breakthroughs in ML and Internet of Things
(IoT). Most tinyML research focuses on model compression techniques that trade
accuracy (and model capacity) for compact models to fit into the KB-sized
tiny-edge devices. In this paper, we show how such models can be enhanced by
the addition of an early exit intermediate classifier. If the intermediate
classifier exhibits sufficient confidence in its prediction, the network exits
early, thereby resulting in considerable savings in time. Although early exit
classifiers have been proposed in previous work, these previous proposals focus
on large networks, making their techniques suboptimal/impractical for tinyML
applications. Our technique is optimized specifically for tiny-CNN sized
models. In addition, we present a method to alleviate the effect of network
overthinking by leveraging the representations learned by the early exit. We
evaluate T-RecX on three CNNs from the MLPerf tiny benchmark suite for image
classification, keyword spotting and visual wake word detection tasks. Our
results show that T-RecX 1) improves the accuracy of the baseline network, and 2)
achieves a 31.58% average reduction in FLOPS in exchange for one percent of accuracy
across all evaluated models. Furthermore, we show that our methods consistently
outperform popular prior works on the tiny-CNNs we evaluate.
Comment: Accepted at 20th ACM International Conference on Computing Frontier
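The confidence-based early exit described in this abstract can be sketched as follows. This is a generic illustration and not the T-RecX architecture itself: the names `early_head` and `remaining_layers`, the use of the maximum softmax probability as the confidence measure, and the threshold of 0.9 are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Logits = Sequence[float]


def softmax(logits: Logits) -> List[float]:
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_early_exit(
    features: Sequence[float],
    early_head: Callable[[Sequence[float]], Logits],
    remaining_layers: Callable[[Sequence[float]], Logits],
    threshold: float = 0.9,
) -> Tuple[int, str]:
    """Return (predicted class, which exit produced it).

    features         : intermediate activations at the early-exit point
    early_head       : hypothetical small classifier attached at that point
    remaining_layers : hypothetical rest of the network ending in final logits
    """
    # Cheap path: ask the intermediate classifier first.
    probs = softmax(early_head(features))
    confidence = max(probs)
    if confidence >= threshold:
        return probs.index(confidence), "early"

    # Expensive path: run the rest of the network only when unsure.
    probs = softmax(remaining_layers(features))
    return probs.index(max(probs)), "final"
```

In this sketch the savings come from skipping `remaining_layers` whenever the early classifier is confident; a lower threshold exits more often and trades accuracy for compute.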
Rethinking FPGA Architectures for Deep Neural Network applications
The prominence of machine learning-powered solutions has driven an unprecedented trend of integration into virtually all applications, with a broad range of deployment constraints from tiny embedded systems to large-scale warehouse computing machines. While recent research confirms the advantages of using contemporary FPGAs to deploy or accelerate machine learning applications, especially where latency and energy consumption are strictly limited, their architectures, which were optimised before the rise of machine learning, remain a barrier to overall efficiency and performance.
Realizing this shortcoming, this thesis presents an architectural study aiming at solutions that unlock hidden potential in FPGA technology, primarily for machine learning algorithms. In particular, it shows how slight alterations to state-of-the-art architectures could make FPGAs significantly more machine learning-friendly while maintaining near-promised performance for the rest of the applications. Finally, it presents a novel systematic approach to deriving new block architectures, guided by design limitations and machine learning algorithm characteristics through benchmarking.
First, through three modifications to Xilinx DSP48E2 blocks, an enhanced digital signal processing (DSP) block for important computations in embedded deep neural network (DNN) accelerators is described. Then, two tiers of modifications to the FPGA logic cell architecture are explained that deliver a variety of performance and utilisation benefits with only minor area overheads. Finally, with the goal of exploring this new design space in a methodical manner, a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations is first proposed. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then suggested, together with a family of new embedded blocks called MLBlocks.
Synergies between Numerical Methods for Kinetic Equations and Neural Networks
The overarching theme of this work is the efficient computation of large-scale systems. Here we deal with two types of mathematical challenges, which are quite different at first glance but offer similar opportunities and challenges upon closer examination.
Physical descriptions of phenomena and their mathematical modeling are performed on diverse scales, ranging from nano-scale interactions of single atoms to the macroscopic dynamics of the Earth's atmosphere. We consider such systems of interacting particles and explore methods to simulate them efficiently and accurately, with a focus on the kinetic and macroscopic description of interacting particle systems.
Macroscopic governing equations describe the evolution of a system in time and space, whereas the more fine-grained kinetic description additionally takes the particle velocity into account.
Discretizing kinetic equations that depend on space, time, and velocity variables is a challenge because the discretization must preserve physical solution bounds (e.g. positivity), avoid spurious artifacts, and remain computationally efficient.
In the pursuit of overcoming the challenge of computability in both kinetic and multi-scale modeling, a wide variety of approximative methods have been established in the realm of reduced order and surrogate modeling, and model compression. For kinetic models, this may manifest in hybrid numerical solvers that switch between macroscopic and mesoscopic simulation, asymptotic preserving schemes that bridge the gap between both physical resolution levels, or surrogate models that operate on a kinetic level but replace computationally heavy operations of the simulation by fast approximations.
Thus, for the simulation of kinetic and multi-scale systems with a high spatial resolution and long temporal horizon, the quote by Paul Dirac is as relevant as it was almost a century ago.
The first goal of the dissertation is therefore the development of acceleration strategies for kinetic discretization methods that preserve the structure of their governing equations. In particular, we investigate the use of convex neural networks to accelerate the minimal entropy closure method. Further, we develop a neural network-based hybrid solver for multi-scale systems, where kinetic and macroscopic methods are chosen based on local flow conditions.
Furthermore, we deal with the compression and efficient computation of neural networks. Neural networks are now used successfully in different forms in countless scientific works and technical systems, with well-known applications in image recognition and computer-aided language translation, but also as surrogate models for numerical mathematics.
Although the first neural networks were already presented in the 1950s, the scientific discipline has enjoyed increasing popularity mainly during the last 15 years, since only now is sufficient computing capacity available. Remarkably, the increasing availability of computing resources is accompanied by a hunger for larger models, fueled by the common conception among machine learning practitioners and researchers that more trainable parameters equal higher performance and better generalization capabilities. The increase in model size exceeds the growth of available computing resources by orders of magnitude. Since […], the computational resources used in the largest neural network models have doubled every […] months (see https://openai.com/blog/ai-and-compute/), as opposed to Moore's Law, which proposes a […]-year doubling period in available computing power.
To some extent, Dirac's statement also applies to the recent computational challenges in the machine-learning community. The desire to evaluate and train on resource-limited devices sparked interest in model compression, where neural networks are sparsified or factorized, typically after training. The second goal of this dissertation is thus a low-rank method, originating from numerical methods for kinetic equations, to compress neural networks already during training by low-rank factorization.
This dissertation thus considers synergies between kinetic models, neural networks, and numerical methods in both disciplines to develop time-, memory- and energy-efficient computational methods for both research areas.
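The storage and compute argument behind low-rank compression of a neural network layer can be sketched as follows. This is a minimal illustration under stated assumptions: a single dense layer with an m×n weight matrix is replaced by two factors `U` (m×r) and `Vt` (r×n), and all function names are hypothetical. It does not reproduce the dissertation's training-time method; it only shows why a factorized layer is cheaper once the factors exist.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def dense_params(m: int, n: int) -> int:
    """Number of weights in a full m-by-n layer."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Number of weights when W is stored as U (m-by-r) times Vt (r-by-n)."""
    return r * (m + n)


def matvec(a: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def low_rank_matvec(u: Matrix, vt: Matrix, x: Sequence[float]) -> List[float]:
    """Apply the factorized layer as U @ (Vt @ x).

    The full matrix W = U @ Vt is never formed, so the cost is
    r * (m + n) multiplications instead of m * n.
    """
    z = matvec(vt, x)   # project the input down to r coordinates
    return matvec(u, z)  # expand back to m outputs


# For a 1024-by-1024 layer, rank 16 keeps about 3% of the weights.
full = dense_params(1024, 1024)            # 1048576
compressed = low_rank_params(1024, 1024, 16)  # 32768
```

The saving only holds when the rank `r` is well below `m * n / (m + n)`; above that point the factorized form stores more numbers than the dense one.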
LIPIcs, Volume 261, ICALP 2023, Complete Volume
ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation
The edge processing of deep neural networks (DNNs) is becoming increasingly
important due to its ability to extract valuable information directly at the
data source to minimize latency and energy consumption. Frequency-domain model
compression, such as with the Walsh-Hadamard transform (WHT), has been
identified as an efficient alternative. However, the benefits of
frequency-domain processing are often offset by the increased
multiply-accumulate (MAC) operations required. This paper proposes a novel
approach to an energy-efficient acceleration of frequency-domain neural
networks by utilizing analog-domain frequency-based tensor transformations. Our
approach offers unique opportunities to enhance computational efficiency,
resulting in several high-level advantages, including array micro-architecture
with parallelism, ADC/DAC-free analog computations, and increased output
sparsity. Our approach achieves more compact cells by eliminating the need for
trainable parameters in the transformation matrix. Moreover, our novel array
micro-architecture enables adaptive stitching of cells column-wise and
row-wise, thereby facilitating perfect parallelism in computations.
Additionally, our scheme enables ADC/DAC-free computations by training against
highly quantized matrix-vector products, leveraging the parameter-free nature
of matrix multiplications. Another crucial aspect of our design is its ability
to handle signed-bit processing for frequency-based transformations. This leads
to increased output sparsity and reduced digitization workload. On a
16×16 crossbar, for 8-bit input processing, the proposed approach
achieves an energy efficiency of 1602 tera operations per second per watt
(TOPS/W) without an early termination strategy and 5311 TOPS/W with an early
termination strategy at VDD = 0.8 V.
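The parameter-free nature of the Walsh-Hadamard transform mentioned in this abstract can be seen in a short software sketch. This is the standard digital fast WHT in natural (Hadamard) ordering, shown only to illustrate that the transform uses fixed additions and subtractions with no trainable weights; it is not the paper's analog crossbar scheme, and the function name `fwht` is an assumption.

```python
from typing import List, Sequence


def fwht(values: Sequence[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform (natural ordering).

    The implicit transform matrix contains only +1 and -1 entries,
    so the butterflies below need additions and subtractions only.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")

    out = list(values)
    half = 1
    while half < n:
        # Combine pairs of elements that are `half` apart.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b
        half *= 2
    return out


spectrum = fwht([1, 0, 1, 0])   # [2, 2, 0, 0]
restored = fwht(spectrum)       # [4, 0, 4, 0], i.e. n times the input
```

Applying the transform twice returns the input scaled by its length, so the inverse is the same routine followed by a division by `n`.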
Spherical and Hyperbolic Toric Topology-Based Codes On Graph Embedding for Ising MRF Models: Classical and Quantum Topology Machine Learning
The paper introduces the application of information geometry to describe the
ground states of Ising models by utilizing parity-check matrices of cyclic and
quasi-cyclic codes on toric and spherical topologies. The approach establishes
a connection between machine learning and error-correcting coding. This
proposed approach has implications for the development of new embedding methods
based on trapping sets. Statistical physics and number geometry are applied to
optimize error-correcting codes, leading to these embedding and sparse
factorization methods. The paper establishes a direct connection between DNN
architecture and error-correcting coding by demonstrating how state-of-the-art
architectures (ChordMixer, Mega, Mega-chunk, CDIL, ...) from the long-range
arena can be equivalent to block and convolutional LDPC codes (Cage-graph,
Repeat Accumulate). QC codes correspond to certain types of chemical elements,
with the carbon element being represented by the mixed automorphism
Shu-Lin-Fossorier QC-LDPC code. The connections between Belief Propagation and
the Permanent, Bethe-Permanent, Nishimori Temperature, and Bethe-Hessian Matrix
are elaborated upon in detail. The Quantum Approximate Optimization Algorithm
(QAOA) used in the Sherrington-Kirkpatrick Ising model can be seen as analogous
to the back-propagation loss function landscape in training DNNs. This
similarity creates a comparable problem with TS pseudo-codeword, resembling the
belief propagation method. Additionally, the layer depth in QAOA correlates to
the number of decoding belief propagation iterations in the Wiberg decoding
tree. Overall, this work has the potential to advance multiple fields, from
Information Theory, DNN architecture design (sparse and structured prior graph
topology), efficient hardware design for Quantum and Classical DPU/TPU (graph,
quantize and shift register architect.) to Materials Science and beyond.
Comment: 71 pages, 42 Figures, 1 Table, 1 Appendix. arXiv admin note: text
overlap with arXiv:2109.08184 by other author
Unsupervised space-time learning in primary visual cortex
The mammalian visual system is an incredibly complex computation device, capable of performing the various tasks of seeing: navigation, pattern and object recognition, motor coordination, trajectory extrapolation, among others. Decades of research have shown that experience-dependent plasticity of cortical circuitry underlies the impressive ability to rapidly learn many of these tasks and to adjust as required. One particular thread of investigation has focused on unsupervised learning, wherein changes to the visual environment lead to corresponding changes in cortical circuits. The most prominent example of unsupervised learning is ocular dominance plasticity, caused by visual deprivation to one eye and leading to a dramatic re-wiring of cortex. Other examples tend to make more subtle changes to the visual environment through passive exposure to novel visual stimuli. Here, we use one such unsupervised paradigm, sequence learning, to study experience-dependent plasticity in the mouse visual system. Through a combination of theory and experiment, we argue that the mammalian visual system is an unsupervised learning device.
Beginning with a mathematical exploration of unsupervised learning in biology, engineering, and machine learning, we seek a more precise expression of our fundamental hypothesis. We draw connections between information theory, efficient coding, and common unsupervised learning algorithms such as Hebbian plasticity and principal component analysis. Efficient coding suggests a simple rule for transmitting information in the nervous system: use more spikes to encode unexpected information, and fewer spikes to encode expected information. Therefore, expectation violations ought to produce prediction errors, or brief periods of heightened firing when an unexpected event occurs. Meanwhile, modern unsupervised learning algorithms show how such expectations can be learned.
Next, we review data from decades of visual neuroscience research, highlighting the computational principles and synaptic plasticity processes that support biological learning and seeing. By tracking the flow of visual information from the retina to thalamus and primary visual cortex, we discuss how the principle of efficient coding is evident in neural activity. One common example is predictive coding in the retina, where ganglion cells with canonical center-surround receptive fields compute a prediction error, sending spikes to the central nervous system only in response to locally-unpredictable visual stimuli. This behavior can be learned through simple Hebbian plasticity mechanisms. Similar models explain much of the activity of neurons in primary visual cortex, but we also discuss ways in which the theory fails to capture the rich biological complexity.
Finally, we present novel experimental results from physiological investigations of the mouse primary visual cortex. We trained mice by passively exposing them to complex spatiotemporal patterns of light: rapidly-flashed sequences of images. We find evidence that visual cortex learns these sequences in a manner consistent with efficient coding, such that unexpected stimuli tend to elicit more firing than expected ones. Overall, we observe dramatic changes in evoked neural activity across days of passive exposure. Neural responses to the first, unexpected sequence element increase with days of training while responses at other, expected time points either decrease or stay the same. Furthermore, substituting an unexpected element for an expected one or omitting an expected element both cause brief bursts of increased firing. Our results therefore provide evidence for unsupervised learning and efficient coding in the mouse visual system, especially because unexpected events drive prediction errors. Overall, our analysis suggests novel experiments, which could be performed in the near future, and provides a useful framework to understand visual perception and learning
Integrality and cutting planes in semidefinite programming approaches for combinatorial optimization
Many real-life decision problems are discrete in nature. To solve such problems as mathematical optimization problems, integrality constraints are commonly incorporated in the model to reflect the choice among finitely many alternatives. At the same time, it is known that semidefinite programming is very suitable for obtaining strong relaxations of combinatorial optimization problems. In this dissertation, we study the interplay between semidefinite programming and integrality, with a special focus on the use of cutting-plane methods. Although the notions of integrality and cutting planes are well-studied in linear programming, integer semidefinite programs (ISDPs) have been considered only recently. We show that many combinatorial optimization problems can be modeled as ISDPs. Several theoretical concepts, such as the Chvátal-Gomory closure, total dual integrality and integer Lagrangian duality, are studied for the case of integer semidefinite programming. On the practical side, we introduce an improved branch-and-cut approach for ISDPs and a cutting-plane augmented Lagrangian method for solving semidefinite programs with a large number of cutting planes. Throughout the thesis, we apply our results to a wide range of combinatorial optimization problems, including the quadratic cycle cover problem, the quadratic traveling salesman problem and the graph partition problem. Our approaches lead to novel, strong and efficient solution strategies for these problems, with the potential to be extended to other problem classes.