117 research outputs found

    Structure Regularization for Structured Prediction: Theories and Experiments

    Full text link
    While there are many studies on weight regularization, the study on structure regularization is rare. Many existing systems on structured prediction focus on increasing the level of structural dependencies within the model. However, this trend could have been misdirected, because our study suggests that complex structures are actually harmful to generalization ability in structured prediction. To control structure-based overfitting, we propose a structure regularization framework via \emph{structure decomposition}, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power. We show both theoretically and empirically that structure regularization can effectively control overfitting risk and lead to better accuracy. As a by-product, the proposed method can also substantially accelerate the training speed. The method and the theoretical results can apply to general graphical models with arbitrary structures. Experiments on well-known tasks demonstrate that our method can easily beat the benchmark systems on those highly-competitive tasks, achieving state-of-the-art accuracies yet with substantially faster training speed

    Exploiting Latent Features of Text and Graphs

    Get PDF
    As the size and scope of online data continues to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings --- semantically rich vectors of latent features --- to convert human constructs for machine understanding. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers, as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system, and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristically-based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs, and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state-of-the-art in hypergraph partitioning quality. Having newfound intuition of graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criteria derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation

    Autotuning wavefront patterns for heterogeneous architectures

    Get PDF
    Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizations are often not portable, and the whole process must be repeated when moving to a new system, or sometimes even to a different problem size. Pattern based parallel programming models were originally designed to provide programmers with an abstract layer, hiding tedious parallel boilerplate code, and allowing a focus on only application specific issues. However, the constrained algorithmic model associated with each pattern also enables the creation of pattern-specific optimization strategies. These can capture more complex variations than would be accessible by analysis of equivalent unstructured source code. These variations create complex optimization spaces. Machine learning offers well established techniques for exploring such spaces. In this thesis we use machine learning to create autotuning strategies for heterogeneous parallel implementations of applications which follow the wavefront pattern. In a wavefront, computation starts from one corner of the problem grid and proceeds diagonally like a wave to the opposite corner in either two or three dimensions. Our framework partitions and optimizes the work created by these applications across systems comprising multicore CPUs and multiple GPU accelerators. The tuning opportunities for a wavefront include controlling the amount of computation to be offloaded onto GPU accelerators, choosing the number of CPU and GPU threads to process tasks, tiling for both CPU and GPU memory structures, and trading redundant halo computation against communication for multiple GPUs. Our exhaustive search of the problem space shows that these parameters are very sensitive to the combination of architecture, wavefront instance and problem size. We design and investigate a family of autotuning strategies, targeting single and multiple CPU + GPU systems, and both two and three dimensional wavefront instances. These yield an average of 87% of the performance found by offline exhaustive search, with up to 99% in some cases

    GPU PERFORMANCE MODELLING AND OPTIMIZATION

    Get PDF
    Ph.DNUS-TU/E JOINT PH.D

    Improving decision tree and neural network learning for evolving data-streams

    Get PDF
    High-throughput real-time Big Data stream processing requires fast incremental algorithms that keep models consistent with most recent data. In this scenario, Hoeffding Trees are considered the state-of-the-art single classifier for processing data streams and they are widely used in ensemble combinations. This thesis is devoted to the improvement of the performance of algorithms for machine learning/artificial intelligence on evolving data streams. In particular, we focus on improving the Hoeffding Tree classifier and its ensemble combinations, in order to reduce its resource consumption and its response time latency, achieving better throughput when processing evolving data streams. First, this thesis presents a study on using Neural Networks (NN) as an alternative method for processing data streams. The use of random features for improving NNs training speed is proposed and important issues are highlighted about the use of NN on a data stream setup. These issues motivated this thesis to go in the direction of improving the current state-of-the-art methods: Hoeffding Trees and their ensemble combinations. Second, this thesis proposes the Echo State Hoeffding Tree (ESHT), as an extension of the Hoeffding Tree to model time-dependencies typically present in data streams. The capabilities of the new proposed architecture on both regression and classification problems are evaluated. Third, a new methodology to improve the Adaptive Random Forest (ARF) is developed. ARF has been introduced recently, and it is considered the state-of-the-art classifier in the MOA framework (a popular framework for processing evolving data streams). This thesis proposes the Elastic Swap Random Forest, an extension to ARF that reduces the number of base learners in the ensemble down to one third on average, while providing similar accuracy than the standard ARF with 100 trees. And finally, a last contribution on a multi-threaded high performance scalable ensemble design that is highly adaptable to a variety of hardware platforms, ranging from server-class to edge computing. The proposed design achieves throughput improvements of 85x (Intel i7), 143x (Intel Xeon parsing from memory), 10x (Jetson TX1, ARM) and 23x (X-Gene2, ARM) compared to single-threaded MOA on i7. In addition, the proposal achieves 75% parallel efficiency when using 24 cores on the Intel Xeon.Procesar grandes flujos de datos (Big Data Streams, BDS) en tiempo real requiere el uso de algoritmos incrementales rápidos que mantengan los modelos consistentes con los datos más recientes. En este escenario, los Hoeffding Trees (HT) se consideran el clasificador simple más avanzado para procesar BDS, razon por la cual son ampliamente usados como base a la hora de combinar clasificadores en Ensembles. Esta tesis está dedicada a la mejora del rendimiento de algoritmos para Machine Learning/Iteligencia Artificial en BDS que evolucionan con el tiempo (es decir, BDS cuya distribución estadística cambia con el tiempo). En particular, nuestro objetivo es mejorar el Hoeffding Tree y sus combinaciones en Ensembles, con el objetivo de reducir el consumo de recursos y la latencia en el tiempo de respuesta, logrando un mejor rendimiento al procesar BDS que evolucionan en el tiempo. Primero, se presenta un estudio sobre el uso de redes neuronales (NN) con parámetros aleatorios como un método alternativo para procesar BDS con el objetivo de mejorar la velocidad de entrenamiento de Nns. También se destacan problemas importantes derivados del uso de NN para BDS. Como consecuencia, esta tesis tomo la dirección de mejorar los métodos de vanguardia en BDS: Hoeffding Trees y sus combinaciones en Ensembles. Segundo, se propone el Echo State Hoeffding Tree (ESHT), como una extensión del HT para modelar las dependencias temporales típicamente presentes en BDS. La nueva arquitectura propuesta se evalúa tanto en problemas de regresión como de clasificación. Tercero, se propone una extensión para el Adaptive Random Forest (ARF), publicado recientemente y considerado como el clasificador mas potente implementado en MOA (un framework muy popular para procesar BDS). Proponemos el Elastic Swap Random Forest para reducir el número de clasificadores en el ensemble a un tercio en promedio, al tiempo se mantiene un accuracy similar a la de un ARF estándar con 100 árboles. Finalmente, la última contribución de esta tesis es una arquitectura de Ensembles multi hilo para procesar BDS. Nuestro diseño es altamente adaptable a una variedad de plataformas de hardware, que van desde servidores hasta pequeños dispositivos en el Edge Computing (pej, Internet de las Cosas). El diseño propuesto logra mejoras de rendimiento de 85x (Intel i7), 143x (análisis de Intel Xeon desde la memoria), 10x (Jetson TX1, ARM) y 23x (X-Gene2, ARM) en comparación con MOA (un solo proceso) en un Intel i7. Además, la propuesta logra una eficiencia paralela del 75 \% cuando se usan 24 núcleos en el Intel Xeon.Postprint (published version

    Towards Closing the Programmability-Efficiency Gap using Software-Defined Hardware

    Full text link
    The past decade has seen the breakdown of two important trends in the computing industry: Moore’s law, an observation that the number of transistors in a chip roughly doubles every eighteen months, and Dennard scaling, that enabled the use of these transistors within a constant power budget. This has caused a surge in domain-specific accelerators, i.e. specialized hardware that deliver significantly better energy efficiency than general-purpose processors, such as CPUs. While the performance and efficiency of such accelerators are highly desirable, the fast pace of algorithmic innovation and non-recurring engineering costs have deterred their widespread use, since they are only programmable across a narrow set of applications. This has engendered a programmability-efficiency gap across contemporary platforms. A practical solution that can close this gap is thus lucrative and is likely to engender broad impact in both academic research and the industry. This dissertation proposes such a solution with a reconfigurable Software-Defined Hardware (SDH) system that morphs parts of the hardware on-the-fly to tailor to the requirements of each application phase. This system is designed to deliver near-accelerator-level efficiency across a broad set of applications, while retaining CPU-like programmability. The dissertation first presents a fixed-function solution to accelerate sparse matrix multiplication, which forms the basis of many applications in graph analytics and scientific computing. The solution consists of a tiled hardware architecture, co-designed with the outer product algorithm for Sparse Matrix-Matrix multiplication (SpMM), that uses on-chip memory reconfiguration to accelerate each phase of the algorithm. A proof-of-concept is then presented in the form of a prototyped 40 nm Complimentary Metal-Oxide Semiconductor (CMOS) chip that demonstrates energy efficiency and performance per die area improvements of 12.6x and 17.1x over a high-end CPU, and serves as a stepping stone towards a full SDH system. The next piece of the dissertation enhances the proposed hardware with reconfigurability of the dataflow and resource sharing modes, in order to extend acceleration support to a set of common parallelizable workloads. This reconfigurability lends the system the ability to cater to discrete data access and compute patterns, such as workloads with extensive data sharing and reuse, workloads with limited reuse and streaming access patterns, among others. Moreover, this system incorporates commercial cores and a prototyped software stack for CPU-level programmability. The proposed system is evaluated on a diverse set of compute-bound and memory-bound kernels that compose applications in the domains of graph analytics, machine learning, image and language processing. The evaluation shows average performance and energy-efficiency gains of 5.0x and 18.4x over the CPU. The final part of the dissertation proposes a runtime control framework that uses low-cost monitoring of hardware performance counters to predict the next best configuration and reconfigure the hardware, upon detecting a change in phase or nature of data within the application. In comparison to prior work, this contribution targets multicore CGRAs, uses low-overhead decision tree based predictive models, and incorporates reconfiguration cost-awareness into its policies. Compared to the best-average static (non-reconfiguring) configuration, the dynamically reconfigurable system achieves a 1.6x improvement in performance-per-Watt in the Energy-Efficient mode of operation, or the same performance with 23% lower energy in the Power-Performance mode, for SpMM across a suite of real-world inputs. The proposed reconfiguration mechanism itself outperforms the state-of-the-art approach for dynamic runtime control by up to 2.9x in terms of energy-efficiency.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/169859/1/subh_1.pd

    Adaptive networks for robotics and the emergence of reward anticipatory circuits

    Get PDF
    Currently the central challenge facing evolutionary robotics is to determine how best to extend the range and complexity of behaviour supported by evolved neural systems. Implicit in the work described in this thesis is the idea that this might best be achieved through devising neural circuits (tractable to evolutionary exploration) that exhibit complementary functional characteristics. We concentrate on two problem domains; locomotion and sequence learning. For locomotion we compare the use of GasNets and other adaptive networks. For sequence learning we introduce a novel connectionist model inspired by the role of dopamine in the basal ganglia (commonly interpreted as a form of reinforcement learning). This connectionist approach relies upon a new neuron model inspired by notions of energy efficient signalling. Two reward adaptive circuit variants were investigated. These were applied respectively to two learning problems; where action sequences are required to take place in a strict order, and secondly, where action sequences are robust to intermediate arbitrary states. We conclude the thesis by proposing a formal model of functional integration, encompassing locomotion and sequence learning, extending ideas proposed by W. Ross Ashby. A general model of the adaptive replicator is presented, incoporating subsystems that are tuned to continuous variation and discrete or conditional events. Comparisons are made with Ross W. Ashby's model of ultrastability and his ideas on adaptive behaviour. This model is intended to support our assertion that, GasNets (and similar networks) and reward adaptive circuits of the type presented here, are intrinsically complementary. In conclusion we present some ideas on how the co-evolution of GasNet and reward adaptive circuits might lead us to significant improvements in the synthesis of agents capable of exhibiting complex adaptive behaviour

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
    corecore