
    Improving A*OMP: Theoretical and Empirical Analyses With a Novel Dynamic Cost Model

    Best-first search has recently been utilized for compressed sensing (CS) by the A* orthogonal matching pursuit (A*OMP) algorithm. In this work, we concentrate on theoretical and empirical analyses of A*OMP. We present a restricted isometry property (RIP) based general condition for exact recovery of sparse signals via A*OMP. In addition, we develop online guarantees which promise improved recovery performance with the residue-based termination instead of the sparsity-based one. We demonstrate the recovery capabilities of A*OMP with extensive recovery simulations using the adaptive-multiplicative (AMul) cost model, which effectively compensates for the path length differences in the search tree. The presented results, involving phase transitions for different nonzero element distributions as well as recovery rates and average error, reveal not only the superior recovery accuracy of A*OMP, but also the improvements with the residue-based termination and the AMul cost model. Comparison of run times indicates the speedup provided by the AMul cost model. We also demonstrate a hybrid of OMP and A*OMP to accelerate the search further. Finally, we run A*OMP on a sparse image to illustrate its recovery performance for more realistic coefficient distributions.
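    As background for the termination criteria discussed above, the sketch below shows plain orthogonal matching pursuit with residue-based termination. It is a hedged illustration rather than the authors' A*OMP code: A*OMP additionally maintains a tree of candidate supports explored in best-first order under a cost model such as AMul, which is omitted here, and the function name and tolerance are illustrative.

```python
# Minimal sketch: plain OMP with the residue-based termination the abstract
# refers to, not the A*OMP best-first search itself.
import numpy as np

def omp_residue_terminated(A, y, residue_tol=1e-6, max_iter=None):
    """Recover a sparse x with y ~= A @ x, stopping when ||residue|| <= tol."""
    m, n = A.shape
    max_iter = max_iter or m
    support = []
    residue = y.copy()
    x = np.zeros(n)
    for _ in range(max_iter):
        if np.linalg.norm(residue) <= residue_tol:
            break  # residue-based termination instead of a fixed sparsity level
        # pick the column most correlated with the current residue
        idx = int(np.argmax(np.abs(A.T @ residue)))
        if idx in support:
            break  # numerical stall: no new atom improves the fit
        support.append(idx)
        # least-squares fit on the current support, then update the residue
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coeffs
        residue = y - A @ x
    return x, support
```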

    MIMOPack: A High Performance Computing Library for MIMO Communication Systems

    Several communication standards are emerging and evolving in search of higher transmission rates, reliability and coverage. This expansion is primarily driven by the continued growth in the consumption of mobile multimedia services following the emergence of new handheld devices such as smartphones and tablets. One of the most significant techniques employed to meet these demands is the use of multiple transmit and receive antennas, known as MIMO systems. This technology increases the transmission rate and the quality of the transmission through the use of multiple antennas at the transmitter and receiver sides. MIMO technologies have become an essential component of several wireless standards such as WLAN, WiMAX and LTE, and they will also be incorporated into future standards, so a great deal of research in this field is expected in the coming years. The study of MIMO systems is therefore critical, but the problems that arise from this technology are very complex. High Performance Computing (HPC) systems, and specifically modern hardware architectures such as multi-core and many-core processors (e.g. Graphics Processing Units, GPUs), are playing a key role in the development of efficient, low-complexity algorithms for MIMO transmission. Proof of this is that the number of scientific contributions and research projects related to their use has increased in recent years. Several high-performance libraries have also been implemented as tools for researchers involved in the development of future communication standards. Two of the most popular are IT++, a library built on optimized libraries for multi-core processors, and the Communications System Toolbox designed for use with MATLAB, which uses GPU computing. However, no existing library is able to run on a heterogeneous platform using all the available resources. In view of the high computational requirements of MIMO research and the shortage of tools able to satisfy them, we have developed a library that eases the development of parallel applications that adapt to the different architectures of the executing platform. The library, called MIMOPack, efficiently implements, using parallel computing, a set of functions that perform some of the critical stages of MIMO communication system simulation. The main contribution of the thesis is the implementation of efficient hard- and soft-output detectors, since the detection stage is considered the most complex part of the communication process. These detectors are highly configurable, and many of them include preprocessing techniques that reduce the computational cost and increase performance. The proposed library has three important features: portability, efficiency and ease of use. The current release allows GPU and multi-core computation, even simultaneously, since it is designed for use on heterogeneous machines. The interfaces of the functions are common to all environments in order to simplify the use of the library. Moreover, some of the functions are callable from MATLAB, increasing the portability of developed codes between different computing environments. Given the library design and the performance assessment, we consider that MIMOPack can help industrial and academic researchers implement scientific codes without having to know different programming languages and machine architectures. This will allow them to include more complex algorithms in their simulations and to obtain results faster. This is particularly important in industry, since manufacturers analyze and propose their own technologies with the aim of having them approved as standards, thereby allowing them to enforce their intellectual property rights over competitors, who must then obtain the corresponding licenses to include these technologies in their products.
    Ramiro Sánchez, C. (2015). MIMOPack: A High Performance Computing Library for MIMO Communication Systems [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/53930
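    For context on what a hard-output detector does, the following is a minimal zero-forcing sketch. It is not MIMOPack code and does not reflect the library's interface; the function name, the QPSK example and all parameters are hypothetical, and MIMOPack's detectors use more sophisticated algorithms with preprocessing.

```python
# Illustrative zero-forcing hard-output MIMO detector (hypothetical names).
import numpy as np

def zf_hard_detect(H, y, constellation):
    """Equalize with the pseudo-inverse of channel H, then slice each stream
    to the nearest constellation symbol."""
    x_eq = np.linalg.pinv(H) @ y                       # zero-forcing equalization
    constellation = np.asarray(constellation)
    # nearest-symbol slicing, one transmit stream at a time
    idx = np.argmin(np.abs(x_eq[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

# Example: 4x4 Rayleigh channel, QPSK symbols
rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
x = rng.choice(qpsk, size=4)
y = H @ x + 0.05 * (rng.standard_normal(4) + 1j * rng.standard_normal(4))
print(zf_hard_detect(H, y, qpsk))
```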

    Engineering Algorithms for Dynamic and Time-Dependent Route Planning

    Efficiently computing shortest paths is an essential building block of many mobility applications, most prominently route planning/navigation devices and applications. In this thesis, we apply the algorithm engineering methodology to design algorithms for route planning in dynamic (for example, considering real-time traffic) and time-dependent (for example, considering traffic predictions) problem settings. We build on and extend the popular Contraction Hierarchies (CH) speedup technique. With a few minutes of preprocessing, CH can optimally answer shortest path queries on continental-sized road networks with tens of millions of vertices and edges in less than a millisecond, i.e. around four orders of magnitude faster than Dijkstra’s algorithm. CH has already been extended to dynamic and time-dependent problem settings. However, these adaptations suffer from limitations. For example, the time-dependent variant of CH exhibits prohibitive memory consumption on large road networks with detailed traffic predictions. This thesis contains the following key contributions: First, we introduce CH-Potentials, an A*-based routing framework. CH-Potentials computes optimal distance estimates for A* using CH with a lower bound weight function derived at preprocessing time. The framework can be applied to any routing problem where appropriate lower bounds can be obtained. The achieved speedups range between one and three orders of magnitude over Dijkstra’s algorithm, depending on how tight the lower bounds are. Second, we propose several improvements to Customizable Contraction Hierarchies (CCH), the CH adaptation for dynamic route planning. Our improvements yield speedups of up to an order of magnitude. Further, we augment CCH to efficiently support essential extensions such as turn costs, alternative route computation and point-of-interest queries. Third, we present the first space-efficient, fast and exact speedup technique for time-dependent routing. Compared to the previous time-dependent variant of CH, our technique requires up to 40 times less memory, needs at most a third of the preprocessing time, and achieves only marginally slower query running times. Fourth, we generalize A* and introduce time-dependent A* potentials. This allows us to design the first approach for routing with combined live and predicted traffic, which achieves interactive running times for exact queries while allowing live traffic updates in a fraction of a minute. Fifth, we study extended problem models for routing with imperfect data and routing for truck drivers and present efficient algorithms for these variants. Sixth and finally, we present various complexity results for non-FIFO time-dependent routing and the extended problem models.
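    The CH-Potentials idea of feeding A* with precomputed lower bounds can be sketched as follows. This is a simplified illustration under the assumption that a feasible (consistent) lower-bound table is already available; in the thesis the bounds come from a Contraction Hierarchies query on a lower-bound weight function, and that preprocessing, as well as all engineering details, is omitted.

```python
# A* where the heuristic ("potential") is a precomputed lower bound on the
# remaining distance to the target; data structures are simplified.
import heapq

def astar_with_potential(graph, source, target, potential):
    """graph: {u: [(v, weight), ...]}; potential[v] is a feasible (consistent)
    lower bound on the distance from v to target."""
    dist = {source: 0}
    queue = [(potential[source], source)]            # key = dist + lower bound
    while queue:
        key, u = heapq.heappop(queue)
        if u == target:
            return dist[u]
        if key > dist[u] + potential[u]:
            continue                                  # stale queue entry
        for v, w in graph[u]:
            cand = dist[u] + w
            if cand < dist.get(v, float("inf")):
                dist[v] = cand
                heapq.heappush(queue, (cand + potential[v], v))
    return float("inf")
```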

    Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference

    Over the past decade, Deep Learning (DL) and Deep Neural Networks (DNN) have gone through rapid development. They are now widely applied to various applications and have profoundly changed the lives of human beings. As an essential element of DNNs, Recurrent Neural Networks (RNN) are helpful in processing time-sequential data and are widely used in applications such as speech recognition and machine translation. RNNs are difficult to compute because of their massive arithmetic operations and large memory footprint. RNN inference workloads used to be executed on conventional general-purpose processors, including Central Processing Units (CPU) and Graphics Processing Units (GPU); however, these processors contain hardware blocks that are unnecessary for RNN computation, such as branch predictors and caching systems, making them suboptimal for RNN processing. To accelerate RNN computations beyond the performance of conventional processors, previous work focused on optimization methods in both software and hardware. On the software side, previous works mainly used model compression to reduce the memory footprint and the arithmetic operations of RNNs. On the hardware side, previous works designed domain-specific hardware accelerators based on Field Programmable Gate Arrays (FPGA) or Application Specific Integrated Circuits (ASIC), with customized hardware pipelines optimized for efficient processing of RNNs. By following this software-hardware co-design strategy, previous works achieved at least 10X speedup over conventional processors. Many previous works focused on achieving high throughput with a large batch of input streams. However, in real-time applications such as gaming Artificial Intelligence (AI) and dynamical system control, low latency is more critical. Moreover, there is a trend of offloading neural network workloads to edge devices to provide a better user experience and privacy protection. Edge devices, such as mobile phones and wearable devices, are usually resource-constrained with a tight power budget. They require RNN hardware that is more energy-efficient to realize both low-latency inference and long battery life. Brain neurons exhibit sparsity in both the spatial domain and the time domain. Inspired by this biological property, previous work mainly explored model compression to induce spatial sparsity in RNNs. The delta network algorithm instead induces temporal sparsity in RNNs and, as shown in previous works, can save over 10X of the arithmetic operations. In this work, we have proposed customized hardware accelerators that exploit temporal sparsity in Gated Recurrent Unit (GRU)-RNNs and Long Short-Term Memory (LSTM)-RNNs to achieve energy-efficient real-time RNN inference. First, we have proposed DeltaRNN, the first-ever RNN accelerator to exploit temporal sparsity in GRU-RNNs. DeltaRNN has achieved 1.2 TOp/s effective throughput with a batch size of 1, which is 15X higher than related works. Second, we have designed EdgeDRNN to accelerate GRU-RNN edge inference. Compared to DeltaRNN, EdgeDRNN does not rely on on-chip memory to store RNN weights and focuses on reducing off-chip Dynamic Random Access Memory (DRAM) data traffic using a more scalable architecture. EdgeDRNN has realized real-time inference of large GRU-RNNs with submillisecond latency and only 2.3 W wall-plug power consumption, achieving 4X higher energy efficiency than commercial edge AI platforms like the NVIDIA Jetson Nano. Third, we have used DeltaRNN to realize the first-ever continuous speech recognition system with the Dynamic Audio Sensor (DAS) as the front-end. The DAS is a neuromorphic event-driven sensor that produces a stream of asynchronous events instead of audio data sampled at a fixed sample rate. We have also showcased how an RNN accelerator can be integrated with an event-driven sensor on the same chip to realize ultra-low-power Keyword Spotting (KWS) on the extreme edge. Fourth, we have used EdgeDRNN to control a powered robotic prosthesis, replacing a conventional proportional–derivative (PD) controller with an RNN controller. EdgeDRNN has achieved 21 μs latency when running the RNN controller and maintained stable control of the prosthesis. These applications demonstrate the value of DeltaRNN and EdgeDRNN in solving real-world problems. Finally, we have applied the delta network algorithm to LSTM-RNNs and combined it with a customized structured pruning method, called Column-Balanced Targeted Dropout (CBTD), to induce spatio-temporal sparsity in LSTM-RNNs. We have then proposed another FPGA-based accelerator called Spartus, the first RNN accelerator that exploits spatio-temporal sparsity. Spartus achieved 9.4 TOp/s effective throughput with a batch size of 1, the highest among present FPGA-based RNN accelerators with a power budget around 10 W. Spartus can complete the inference of an LSTM layer with 5 million parameters within 1 μs.
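    The temporal-sparsity principle that DeltaRNN, EdgeDRNN and Spartus build on can be illustrated with a short sketch. This is a hedged NumPy illustration of the delta-update idea, not the accelerators' datapath; the class name, threshold value and dense-matrix formulation are assumptions made for clarity.

```python
# Only inputs whose change since the last processed value exceeds a threshold
# contribute matrix-vector work at this timestep (temporal sparsity).
import numpy as np

class DeltaLayer:
    def __init__(self, W, threshold=0.05):
        self.W = W                                    # weight matrix (out x in)
        self.threshold = threshold
        self.x_prev = np.zeros(W.shape[1])            # last *processed* inputs
        self.acc = np.zeros(W.shape[0])               # running pre-activation

    def step(self, x):
        delta = x - self.x_prev
        active = np.abs(delta) > self.threshold       # temporally sparse mask
        # only the columns of W for changed inputs contribute this timestep
        self.acc += self.W[:, active] @ delta[active]
        # update the memory only for inputs that were actually processed
        self.x_prev[active] = x[active]
        return self.acc                               # fed into the gate nonlinearity
```

    In a GRU or LSTM the same masking is applied to both the new input and the recurrent state before each gate's matrix-vector product, so skipped columns translate directly into skipped weight fetches and multiply-accumulate operations on the accelerator.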

    Angiogenesis in Spontaneous Tumors and Implications for Comparative Tumor Biology.

    Blood supply is essential for the development and growth of tumors, and angiogenesis is the fundamental process of new blood vessel formation from preexisting ones. Angiogenesis is a prognostic indicator for a variety of tumors, and it coincides with increased shedding of neoplastic cells into the circulation and metastasis. Several molecules such as cell surface receptors, growth factors, and enzymes are involved in this process. While antiangiogenic therapy for cancer was proposed over 20 years ago, it has garnered much controversy in recent years within the scientific community. The complex relationships between the angiogenic signaling cascade and antiangiogenic substances have indicated the angiogenic pathway as a valid target for anticancer drug development, and VEGF has become the primary antiangiogenic drug target. This review discusses the basic and clinical perspectives of angiogenesis, highlighting the importance of comparative biology in understanding tumor angiogenesis and the integration of these model systems for future drug development.

    Fast and exact geodesic computation using Edge-based Windows Grouping.

    Computing discrete geodesic distance over triangle meshes is one of the fundamental problems in computational geometry and computer graphics. As the “Big Data Era” arrives, a fast and accurate solution to the geodesic computation problem on large scale models with constantly increasing resolutions is desired. However, it is still challenging to deal with the speed, memory cost and accuracy of the geodesic computation at the same time. This thesis addresses the aforementioned challenge by proposing the Edge-based Windows Grouping (EWG) technique. With the local geodesic information encoded in a “window”, EWG groups the windows based on the mesh edges and processes them together. Thus, the interrelationships among the grouped windows can be utilized to improve the performance of geodesic computation on triangle meshes. Based on EWG, a novel exact geodesic algorithm is proposed in this thesis, which is fast, accurate and memory-efficient. This algorithm computes the geodesic distances at mesh vertices by propagating the geodesic information from the source over the entire mesh. Its high performance comes from its low computational redundancy and management overhead, both of which result from EWG. First, redundant windows on an edge can be removed by comparing their distances with those of the other windows on the same edge. Second, the windows grouped on an edge usually have similar geodesic distances and can be propagated in batches efficiently. To the best of my knowledge, the proposed exact geodesic algorithm is the fastest and most memory-efficient one among all existing methods. In addition, the proposed exact geodesic algorithm is revised and employed to construct the geodesic-metric-based Voronoi diagram on triangle meshes. In this application, the geodesic computation is the bottleneck in both the time and memory costs. The proposed method achieves low memory cost from the key observation that the Voronoi diagram boundaries usually only cross a minority of the mesh’s triangles and most of the windows stored on edges are redundant. As a result, the proposed method resolves the memory bottleneck of the Voronoi diagram construction without sacrificing its speed.
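    The first benefit mentioned above, removing redundant windows grouped on an edge, can be illustrated with a deliberately simplified sketch: here a window is summarized by its interval on the edge and the distances at the interval endpoints, and it is dropped when another window covers its interval with distances that are no larger at both endpoints. The exact algorithm uses the full distance functions along the interval and further geometric tests; the Window type and its fields below are hypothetical.

```python
# Edge-level pruning sketch: windows collected on the same edge are compared
# against each other and dominated windows are discarded.
from dataclasses import dataclass

@dataclass
class Window:
    left: float     # interval on the edge, 0 <= left < right <= edge length
    right: float
    d_left: float   # geodesic distance (through this window) at interval ends
    d_right: float

def prune_dominated(windows):
    """Keep only windows not dominated by another window on the same edge."""
    kept = []
    for w in windows:
        dominated = any(
            o is not w
            and o.left <= w.left and w.right <= o.right     # o covers w's interval
            and o.d_left <= w.d_left and o.d_right <= w.d_right
            for o in windows
        )
        if not dominated:
            kept.append(w)
    return kept
```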

    Arteriogenesis – Molecular Regulation, Pathophysiology and Therapeutics I

    Get PDF

    Connecting the dots between PubMed abstracts

    Background: There are now a multitude of articles published in a diversity of journals providing information about genes, proteins, pathways, and diseases. Each article investigates subsets of a biological process, but to gain insight into the functioning of a system as a whole, we must integrate information from multiple publications. In particular, unraveling relationships between extra-cellular inputs and downstream molecular response mechanisms requires integrating conclusions from diverse publications. Methodology: We present an automated approach to biological knowledge discovery from PubMed abstracts, suitable for "connecting the dots" across the literature. We describe a storytelling algorithm that, given a start and end publication, typically with little or no overlap in content, identifies a chain of intermediate publications from one to the other, such that neighboring publications have significant content similarity. The quality of discovered stories is measured using local criteria such as the size of supporting neighborhoods for each link and the strength of individual links connecting publications, as well as global metrics of dispersion. To ensure that the story stays coherent as it meanders from one publication to another, we demonstrate the design of novel coherence and overlap filters for use as post-processing steps. Conclusions: We demonstrate the application of our storytelling algorithm to three case studies: i) a many-to-one study exploring relationships between multiple cellular inputs and a molecule responsible for cell-fate decisions, ii) a many-to-many study exploring the relationships between multiple cytokines and multiple downstream transcription factors, and iii) a one-to-one study showcasing the ability to recover a cancer-related association, viz. the Warburg effect, from past literature. The storytelling pipeline helps narrow a scientist's focus from several hundred thousand relevant documents to only around a hundred stories. We argue that our approach can serve as a valuable discovery aid for hypothesis generation and connection exploration in large unstructured biological knowledge bases. Funding: Institute for Critical Technology and Applied Science, Virginia Tech, and the US National Science Foundation through grant CCF-0937133.
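    A toy version of the chain-building step can convey the idea. The sketch below finds a chain of documents between a start and an end abstract such that consecutive documents exceed a similarity threshold, using plain Jaccard word overlap and breadth-first search; the published storytelling algorithm uses richer term weighting, neighborhood-size criteria and the coherence and overlap filters described above, none of which are reproduced here.

```python
# "Connecting the dots": shortest chain of documents whose consecutive pairs
# exceed a content-similarity threshold.
from collections import deque

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def find_story(docs, start, end, min_sim=0.15):
    """docs: {doc_id: text}. Returns a list of doc_ids from start to end,
    or None if no sufficiently coherent chain exists at this threshold."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        last = path[-1]
        if last == end:
            return path
        for doc_id, text in docs.items():
            if doc_id not in visited and jaccard(docs[last], text) >= min_sim:
                visited.add(doc_id)
                frontier.append(path + [doc_id])
    return None
```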

    Cell migration and capillary plexus formation in wounds and retinae

    Cell migration is a fundamental biological phenomenon that is critical to the development and maintenance of tissues in multi-cellular organisms. This thesis presents a series of discrete mathematical models designed to study the migratory response of cells when exposed to a variety of environmental stimuli. By applying these models to pertinent biological scenarios and benchmarking results against experimental data, novel insights are gained into the underlying cell behaviour. The process of angiogenesis is investigated first, and models are developed for simulating capillary plexus expansion during both wound healing and retinal vascular development. The simulated cell migration is coupled to a detailed model of blood perfusion that allows prediction of dynamic flow-induced evolution of the nascent vascular architectures – the network topologies generated in each case are found to successfully reproduce a number of longitudinal experimental metrics. Moreover, in the case of retinal development, the resultant distributions of haematocrit and oxygen are found to be essential in generating vasculatures that resemble those observed in vivo. An alternative cell migration model is then derived that is capable of more accurately describing both individual and collective cell movement. The general model framework, which allows for biophysical cell-cell interactions and adaptive cell morphologies, is seen to have the potential for a range of applications. The value of the modelling approach is well demonstrated by benchmarking in silico cell movement against experimental data from an in vitro fibroblast scrape wound assay. The results subsequently reveal an unexplained discrepancy that provides an intriguing challenge for future studies.
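    As a generic illustration of the kind of discrete movement rule such models are built from (not the models developed in the thesis), the sketch below moves lattice-based cells with a random walk biased by a chemoattractant field; the coupling to blood perfusion, cell-cell interactions and adaptive morphologies described above is not represented, and all parameter values are invented for the example.

```python
# Biased random walk of cells on a 2-D lattice, with step probabilities
# weighted by the local chemoattractant concentration.
import numpy as np

def migrate(cells, chemo, steps=100, bias=5.0, rng=None):
    """cells: (N, 2) integer lattice positions (row, col); chemo: 2-D field."""
    rng = rng or np.random.default_rng()
    moves = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])
    ny, nx = chemo.shape
    cells = cells.copy()
    for _ in range(steps):
        for i in range(len(cells)):
            # candidate positions, clamped to the domain
            targets = np.clip(cells[i] + moves, 0, (ny - 1, nx - 1))
            # probability of each move grows with the local chemoattractant level
            weights = np.exp(bias * chemo[targets[:, 0], targets[:, 1]])
            cells[i] = targets[rng.choice(4, p=weights / weights.sum())]
    return cells
```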