Accelerating Iterative Computations for Large-Scale Data Processing
Recent advances in sensing, storage, and networking technologies are creating massive amounts of data at an unprecedented scale and pace. Large-scale data processing is commonly leveraged to make sense of these data, enabling companies, governments, and organizations to make better decisions and bringing convenience to our daily lives. However, the massive amount of data involved makes it challenging to perform data processing in a timely manner. On the one hand, huge volumes of data might not even fit into the disk of a single machine. On the other hand, the data mining and machine learning algorithms usually involved in large-scale data processing typically require time-consuming iterative computations. Therefore, it is imperative to perform iterative computations efficiently on large computer clusters or in the cloud using highly parallel, shared-nothing distributed systems.
This research aims to explore new forms of iterative computation that reduce unnecessary work so as to accelerate large-scale data processing in a distributed environment. We propose iterative computation transformations for well-known data mining and machine learning algorithms, such as expectation-maximization, nonnegative matrix factorization, belief propagation, and graph algorithms (e.g., PageRank), which are used across a wide range of application domains. First, we show how to accelerate expectation-maximization algorithms with frequent updates in a distributed environment. Then, we show how to efficiently scale distributed nonnegative matrix factorization with block-wise updates. Next, we present our approach for scaling distributed belief propagation with prioritized block updates. Last, we illustrate how to efficiently perform distributed incremental computation on evolving graphs.
We elaborate on how to implement these transformed iterative computations on existing distributed programming models, such as the MapReduce-based model, and develop new scalable and efficient distributed programming models and frameworks where necessary. The goal of these supporting frameworks is to relieve programmers of the burden of specifying the transformation of iterative computations and the communication mechanisms, and to automatically optimize the execution of the computation. Our techniques are evaluated extensively to demonstrate their efficiency. While we present these techniques in the context of specific algorithms, they address challenges commonly faced by many other algorithms.
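The "frequent updates" transformation is easiest to picture on a single machine. In classic batch EM the parameters change once per full pass over the data; with frequent (stepwise) updates, running sufficient statistics are refreshed after every block, so later blocks in the same pass already see fresher parameters. Below is a minimal illustrative sketch in Python/NumPy for a 1-D Gaussian mixture; the block size, step-size schedule, and function names are our own assumptions, not the dissertation's distributed implementation.

```python
import numpy as np

def responsibilities(x, means, stds, weights):
    """E-step for a 1-D Gaussian mixture: posterior component weights."""
    dens = weights * np.exp(-0.5 * ((x[:, None] - means) / stds) ** 2) / stds
    return dens / dens.sum(axis=1, keepdims=True)

def em_frequent_updates(x, k=2, block_size=100, sweeps=5, seed=0):
    """Stepwise (frequent-update) EM: parameters are refreshed after every
    data block instead of once per full pass over the data."""
    rng = np.random.default_rng(seed)
    means = rng.choice(x, size=k, replace=False)
    stds = np.full(k, x.std())
    weights = np.full(k, 1.0 / k)
    # Running sufficient statistics (per-point averages).
    s0, s1 = weights.copy(), means * weights
    s2 = (stds ** 2 + means ** 2) * weights
    step = 0
    for _ in range(sweeps):
        for block in np.array_split(rng.permutation(x),
                                    max(len(x) // block_size, 1)):
            r = responsibilities(block, means, stds, weights)
            eta = (step + 2) ** -0.6  # decaying step size in (0, 1)
            s0 = (1 - eta) * s0 + eta * r.mean(axis=0)
            s1 = (1 - eta) * s1 + eta * (r * block[:, None]).mean(axis=0)
            s2 = (1 - eta) * s2 + eta * (r * block[:, None] ** 2).mean(axis=0)
            weights, means = s0, s1 / s0
            stds = np.sqrt(np.maximum(s2 / s0 - means ** 2, 1e-12))
            step += 1
    return means, stds, weights

# Toy usage: recover a bimodal mixture in a handful of passes.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0, 1, 5000), rng.normal(5, 1, 5000)])
print(em_frequent_updates(x))
```

Because every block triggers a parameter refresh, such variants typically need fewer passes over the data than one-update-per-pass EM, which is the effect the distributed frequent-update scheme exploits at scale.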
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis
Profile hidden Markov models (pHMMs) are widely employed in various
bioinformatics applications to identify similarities between biological
sequences, such as DNA or protein sequences. In pHMMs, sequences are
represented as graph structures. These probabilities are subsequently used to
compute the similarity score between a sequence and a pHMM graph. The
Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these
probabilities to optimize and compute similarity scores. However, the
Baum-Welch algorithm is computationally intensive, and existing solutions offer
either software-only or hardware-only approaches with fixed pHMM designs. We
identify an urgent need for a flexible, high-performance, and energy-efficient
HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm
for pHMMs.
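For context, the heart of Baum-Welch is a forward pass (plus a symmetric backward pass) over each sequence, and this is the cost ApHMM targets. A minimal scaled forward pass for a generic discrete HMM is sketched below in Python/NumPy; it is a textbook illustration, not the pHMM graph computation or the paper's hardware design.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward pass: log P(obs | HMM).
    pi: (S,)   initial state probabilities
    A:  (S, S) transitions, A[i, j] = P(state j | state i)
    B:  (S, V) emissions,   B[i, o] = P(symbol o | state i)
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()          # rescaling avoids floating-point underflow
    alpha = alpha / c
    log_like = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        alpha = alpha / c
        log_like += np.log(c)
    return log_like
```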
We introduce ApHMM, the first flexible acceleration framework designed to
significantly reduce both computational and energy overheads associated with
the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in
the Baum-Welch algorithm by 1) designing flexible hardware to accommodate
various pHMM designs, 2) exploiting predictable data dependency patterns
through on-chip memory with memoization techniques, 3) rapidly filtering out
negligible computations using a hardware-based filter, and 4) minimizing
redundant computations.
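Technique 3 can be pictured in software as pruning: before propagating the forward probabilities, states whose scaled probability falls below a threshold are dropped. The sketch below is our own rough software analogue of the hardware filter, with an assumed threshold eps; the paper's actual mechanism is implemented in hardware.

```python
import numpy as np

def forward_with_pruning(obs, pi, A, B, eps=1e-4):
    """Forward pass that drops states whose scaled probability is below
    eps before propagating. The discarded probability mass introduces a
    small, tunable approximation error in exchange for skipping most of
    the work when alpha is concentrated on a few states."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha = alpha / c
    log_like = np.log(c)
    for o in obs[1:]:
        live = np.flatnonzero(alpha > eps)       # states worth propagating
        alpha = (alpha[live] @ A[live]) * B[:, o]  # only live rows of A
        c = alpha.sum()
        alpha = alpha / c
        log_like += np.log(c)
    return log_like
```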
ApHMM achieves substantial speedups of 15.55x - 260.03x, 1.83x - 5.34x, and
27.97x when compared to CPU, GPU, and FPGA implementations of the Baum-Welch
algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations
in three key bioinformatics applications: 1) error correction, 2) protein
family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x -
1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency
by 64.24x - 115.46x, 1.75x, and 1.96x, respectively.
Comment: Accepted to ACM TACO
Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements
Continuous-time birth-death-shift (BDS) processes are frequently used in
stochastic modeling, with many applications in ecology and epidemiology. In
particular, such processes can model evolutionary dynamics of transposable
elements - important genetic markers in molecular epidemiology. Estimation of
the effects of individual covariates on the birth, death, and shift rates of
the process can be accomplished by analyzing patient data, but inferring these
rates in a discretely and unevenly observed setting presents computational
challenges. We propose a multi-type branching process approximation to BDS
processes and develop a corresponding expectation maximization (EM) algorithm,
where we use spectral techniques to reduce calculation of expected sufficient
statistics to low dimensional integration. These techniques yield an efficient
and robust optimization routine for inferring the rates of the BDS process, and
apply more broadly to multi-type branching processes where rates can depend on
many covariates. After rigorously testing our methodology in simulation
studies, we apply our method to study the intrapatient time evolution of the
IS6110 transposable element, a marker frequently used in estimating
epidemiological clusters of Mycobacterium tuberculosis infections.
Comment: 31 pages, 7 figures, 1 table
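As background for the computational challenge, the brute-force likelihood for a discretely observed process evaluates a matrix exponential of the (truncated) generator for every observation interval, which is exactly the cost the paper's branching-process approximation and spectral techniques are designed to avoid. A minimal sketch for a plain linear birth-death process (the shift component is omitted here), with illustrative parameter names:

```python
import numpy as np
from scipy.linalg import expm

def bd_rate_matrix(birth, death, n_max):
    """Generator Q of a linear birth-death process, truncated at n_max
    copies of the element."""
    Q = np.zeros((n_max + 1, n_max + 1))
    for n in range(n_max + 1):
        if n < n_max:
            Q[n, n + 1] = birth * n   # per-copy birth rate
        if n > 0:
            Q[n, n - 1] = death * n   # per-copy death rate
        Q[n, n] = -Q[n].sum()
    return Q

def panel_log_likelihood(counts, times, birth, death, n_max=100):
    """Log-likelihood of discretely observed copy-number counts; the
    interval transition probabilities are P(dt) = expm(Q * dt)."""
    Q = bd_rate_matrix(birth, death, n_max)
    ll = 0.0
    for n0, n1, dt in zip(counts[:-1], counts[1:], np.diff(times)):
        P = expm(Q * dt)
        ll += np.log(P[n0, n1])
    return ll

# A hypothetical patient observed at three visits with 3, 4, then 4 copies.
print(panel_log_likelihood([3, 4, 4], np.array([0.0, 1.0, 2.5]),
                           birth=0.2, death=0.1))
```

Maximizing this likelihood over the rates (e.g., with scipy.optimize) works for small state spaces, but each expm call scales poorly, motivating the EM algorithm with low-dimensional integration described above.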
Quantitative PET image reconstruction employing nested expectation-maximization deconvolution for motion compensation
Bulk body motion may randomly occur during PET acquisitions, introducing blurring, attenuation-emission mismatches and, in dynamic PET, discontinuities in the measured time-activity curves between consecutive frames. Meanwhile, dynamic PET scans are longer, thus increasing the probability of bulk motion. In this study, we propose a streamlined 3D PET motion-compensated image reconstruction (3D-MCIR) framework, capable of robustly deconvolving intra-frame motion from a static or dynamic 3D sinogram. The presented 3D-MCIR methods need not partition the data into multiple gates, as 4D MCIR algorithms do, or access list-mode (LM) data, as LM MCIR methods do, both of which are associated with increased computation or memory resources. The proposed algorithms can support compensation for any periodic or non-periodic motion, such as cardio-respiratory or bulk motion, the latter including rolling, twisting, or drifting. Inspired by the widely adopted point-spread function (PSF) deconvolution techniques in 3D PET reconstruction, we introduce an image-based 3D generalized motion deconvolution method within the standard 3D maximum-likelihood expectation-maximization (ML-EM) reconstruction framework. In particular, we first integrate a motion blurring kernel, accounting for every tracked motion within a frame, as an additional ML-EM modeling component in the image space (integrated 3D-MCIR). Subsequently, we replace the integrated model component with a nested iterative Richardson-Lucy (RL) image-based deconvolution method to accelerate the convergence rate of the ML-EM algorithm (RL-3D-MCIR). The final method was evaluated with realistic simulations of whole-body dynamic PET data employing the XCAT phantom and real human bulk-motion profiles, the latter estimated from volunteer dynamic MRI scans. In addition, metabolic uptake rate (Ki) parametric images were generated with the standard Patlak method. Our results demonstrate significant improvement in contrast-to-noise ratio (CNR) and noise-bias performance in both dynamic and parametric images. The proposed nested RL-3D-MCIR method is implemented on the Software for Tomographic Image Reconstruction (STIR) open-source platform and is scheduled for public release.
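The nested deconvolution step is the classical Richardson-Lucy update. A minimal image-space sketch in Python/NumPy is given below; here the kernel is a generic blurring kernel supplied by the caller, whereas the paper constructs a motion kernel from the tracked intra-frame motion, so this illustrates the update rule rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def richardson_lucy(blurred, kernel, n_iter=20):
    """Richardson-Lucy deconvolution of a nonnegative image.
    Each iteration re-blurs the current estimate, compares it to the
    measured image, and corrects the estimate multiplicatively."""
    estimate = np.full(blurred.shape, blurred.mean())  # flat positive start
    kernel_mirror = np.flip(kernel)                    # adjoint of the blur
    for _ in range(n_iter):
        reblurred = convolve(estimate, kernel, mode="reflect")
        ratio = blurred / np.maximum(reblurred, 1e-12)  # avoid divide-by-0
        estimate = estimate * convolve(ratio, kernel_mirror, mode="reflect")
    return estimate
```

The multiplicative form preserves nonnegativity, which is why an RL inner loop nests naturally inside ML-EM reconstruction.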
Fitting and Interpreting Continuous-Time Latent Markov Models for Panel Data
Multistate models are used to characterize disease processes within an individual. Clinical studies often observe the disease status of individuals at discrete time points, making the exact times of transitions between disease states unknown. Such panel data pose considerable modeling challenges. Assuming the disease process progresses according to a standard continuous-time Markov chain (CTMC) yields tractable likelihoods, but the assumption of exponential sojourn time distributions is typically unrealistic. More flexible semi-Markov models permit generic sojourn distributions yet yield intractable likelihoods for panel data in the presence of reversible transitions. One attractive alternative is to assume that the disease process is characterized by an underlying latent CTMC, with multiple latent states mapping to each disease state. These models retain analytic tractability due to the CTMC framework but allow for flexible, duration-dependent disease state sojourn distributions. We have developed a robust and efficient expectation-maximization (EM) algorithm in this context. Our complete-data state space consists of the observed data and the underlying latent trajectory, yielding computationally efficient expectation and maximization steps. Our algorithm outperforms alternative methods in terms of time to convergence and robustness. We also examine the frequentist performance of latent CTMC point and interval estimates of disease process functionals based on simulated data. The performance of estimates depends on time, functional, and data-generating scenario. Finally, we illustrate the interpretive power of latent CTMC models for describing disease processes on a dataset of lung transplant patients. We hope our work will encourage wider use of these models in the biomedical setting.
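The flexibility bought by mapping multiple latent states to one disease state is that the sojourn time in an observed state becomes phase-type distributed (the absorption time of the latent sub-chain) rather than exponential. A small sketch with an assumed two-phase toy example, using the standard phase-type density f(t) = alpha' expm(S t) s:

```python
import numpy as np
from scipy.linalg import expm

def phase_type_density(t, alpha, S):
    """Sojourn-time density for the latent phases behind one observed
    disease state: S is the sub-generator restricted to those phases
    and s = -S @ 1 collects the exit rates."""
    s = -S.sum(axis=1)
    return float(alpha @ expm(S * t) @ s)

# Two latent phases behind one disease state: the sojourn is a sum of
# two exponentials, so its density rises then falls -- impossible under
# the single exponential sojourn a plain CTMC imposes.
alpha = np.array([1.0, 0.0])          # always enter through phase 1
S = np.array([[-2.0, 2.0],
              [0.0, -1.0]])           # phase 1 -> phase 2 -> exit
for t in (0.1, 0.5, 1.0, 2.0):
    print(t, round(phase_type_density(t, alpha, S), 4))
```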
Estimation and Control of Traffic Relying on Vehicular Connectivity
Vehicular traffic flow is essential, yet complicated to analyze. It describes the interplay among vehicles and with the infrastructure. A better understanding of traffic would benefit both individuals and the whole society in terms of improving safety and energy efficiency and reducing environmental impacts. A large body of research exists on estimation and control of vehicular traffic in which, however, vehicles were assumed not to be able to share information due to the limits of technology. With the development of wireless communication and various sensor devices, Connected Vehicles (CV) are emerging, able to detect, access, and share information with each other and with the infrastructure in real time. Connected Vehicle Technology (CVT) has been attracting more and more attention from different fields. The goal of this dissertation is to develop approaches to estimate and control vehicular traffic as well as individual vehicles relying on CVT. On the one hand, CVT significantly enriches the data from individuals and the traffic, which contributes to the accuracy of traffic estimation algorithms. On the other hand, CVT enables communication and information sharing between vehicles and infrastructure, and therefore allows vehicles to achieve better control and/or coordination among themselves and with smart infrastructure.
The first part of this dissertation focuses on the estimation of traffic on freeways and city streets, using data available from on-road sensors and from probe vehicles. One of the most important traffic performance measures is travel time; however, it is affected by various factors, and freeways and arterials have different travel-time characteristics. We first propose a stochastic model-based approach to freeway travel-time prediction. The approach uses the Link-Node Cell Transmission Model (LN-CTM) to model traffic and provides a probability distribution for travel time. The distribution is generated using Monte Carlo simulation and an online Expectation-Maximization clustering algorithm. Results show that the approach is able to generate a reasonable multimodal distribution for travel time. For arterials, this dissertation presents methods for estimating travel-time statistics from sparse vehicular probe data, using a public data feed from transit buses in the City of San Francisco. We divide each link into shorter segments and propose iterative methods for allocating travel-time statistics to each segment. Inspired by the K-means and Expectation-Maximization (EM) algorithms, we iteratively update the mean and variance of travel time for each segment based on historical probe data until convergence. Based on the segment travel-time statistics, we then propose a method to estimate the maximum-likelihood trajectory (MLT) of a probe vehicle between two data updates on arterial roads. The results are compared to high-frequency ground-truth data in multiple scenarios, which demonstrates the effectiveness of the proposed approach.
The second part of this dissertation focuses on control approaches enabled by vehicular connectivity. Estimation and prediction of surrounding vehicle behaviors and of upcoming traffic make it possible to improve driving performance. We first propose a Speed Advisory System for arterial roads, which utilizes upcoming traffic …
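The segment-allocation idea for arterial travel times can be pictured as a small fixed-point iteration: each probe report's total time over a span of segments is split in proportion to the current per-segment means, and the means are then re-estimated from those splits. The sketch below is EM-flavored in the same spirit, but its details (uniform initialization, means only, no variances or historical weighting) are our own illustrative assumptions:

```python
import numpy as np

def allocate_segment_times(spans, totals, n_segments, n_iter=50):
    """Iteratively allocate probe travel times to road segments.
    spans:  list of (first_seg, last_seg) covered by each probe report
    totals: total travel time of each report across its span
    """
    means = np.ones(n_segments)                 # uniform starting guess
    for _ in range(n_iter):
        time_sum = np.zeros(n_segments)
        count = np.zeros(n_segments)
        for (a, b), total in zip(spans, totals):
            w = means[a:b + 1] / means[a:b + 1].sum()  # allocation weights
            time_sum[a:b + 1] += total * w
            count[a:b + 1] += 1
        # Re-estimate each segment's mean from its allocated times.
        means = np.where(count > 0, time_sum / np.maximum(count, 1), means)
    return means

# Toy example: 4 segments with true means [10, 20, 5, 15]; the probe
# spans overlap, so the per-segment allocation is identifiable.
spans = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
totals = [30.0, 25.0, 20.0, 50.0, 40.0]
print(allocate_segment_times(spans, totals, 4))
```

The true per-segment means are a fixed point of this iteration when the reports are consistent with them, which is the property such allocation schemes rely on.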