Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

Author: Canon Shane
Chhugani Jatin
Demmel James
Devarakonda Aditya
Gerhardt Lisa
Gittens Alex
Harrell Jim
Kottalam Jey
Krishnamurthy Venkat
Liu Jialin
Mahoney Michael W.
Maschhoff Kristyn
Prabhat
Racah Evan
Ringenburg Michael
Sharma Pramod
Yang Jiyan
Publication date: 12/05/2016

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall-and-skinny, which enables the algorithms to map conveniently onto Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.

arXiv.org e-Print Archive

eScholarship - University of California
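The abstract notes that tall-and-skinny matrices map conveniently onto a data-parallel model. A minimal sketch of why, assuming PCA is computed from the small d x d Gram matrix A^T A (a common textbook route, not necessarily what the paper implements): each row block, standing in for one data partition, contributes an independent partial sum, and only d x d results need to be combined before a local eigendecomposition. The function names are hypothetical, and single-machine NumPy stands in for any Spark or MPI machinery.

```python
import numpy as np

def gram_from_row_blocks(blocks):
    """Accumulate A^T A, column sums and the row count from row blocks of A.

    Each block stands in for one data partition. The per-block products are
    independent; in a distributed setting each could run on a separate worker,
    and only the small d x d and length-d results would need to be summed.
    """
    d = blocks[0].shape[1]
    gram = np.zeros((d, d))
    col_sum = np.zeros(d)
    n = 0
    for block in blocks:
        gram += block.T @ block          # partial Gram matrix for this block
        col_sum += block.sum(axis=0)     # partial column sums for the mean
        n += block.shape[0]
    return gram, col_sum, n

def pca_from_row_blocks(blocks, k):
    """Top-k eigenvalues and principal directions of the row-partitioned matrix."""
    gram, col_sum, n = gram_from_row_blocks(blocks)
    mean = col_sum / n
    # Sample covariance of the centered data: (A^T A - n * mean mean^T) / (n - 1)
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    return eigvals[order], eigvecs[:, order]
```

Because each partition only contributes d^2 + d numbers, communication stays small when d is modest. Forming A^T A explicitly squares the condition number, so numerically more careful factorizations (such as a tall-skinny QR) are often preferred in practice.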

The Data Science Design Manual

Author: Steven S. Skiena
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 22/04/2020

Open Library

Advances in knowledge discovery and data mining Part II

Author: CAO Tru
CHEUNG David Wai-Lok
HO Tu-Bao
LIM Ee Peng
MOTODA Hiroshi
ZHOU Zhi-Hua
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

Institutional Knowledge at Singapore Management University

HKU Scholars Hub

Proceedings, MSVSCC 2014

Author: Old Dominion University Department of Modeling, Simulation & Visualization Engineering
Old Dominion University Virginia Modeling, Analysis & Simulation Center
Publication venue: ODU Digital Commons
Publication date: 11/04/2013

Proceedings of the 8th Annual Modeling, Simulation & Visualization Student Capstone Conference, held on April 17, 2014, at VMASC in Suffolk, Virginia.

Old Dominion University

Deep Model for Improved Operator Function State Assessment

Author: Li Feng
Li Jiang
Schnell Tom
Wen Jonathan
Xu Roger
Zhang Guangfan
Publication venue: ODU Digital Commons
Publication date: 01/01/2014

A deep learning framework is presented for engagement assessment using EEG signals. Deep learning is a recently developed machine learning technique that has been applied in many domains. In this paper, we propose a deep learning strategy for operator function state (OFS) assessment. Fifteen pilots participated in a flight simulation from Seattle to Chicago, and EEG signals were recorded for each pilot during the four-hour simulation. We labeled 20 minutes of data as engaged or disengaged to fine-tune the deep network and used the remaining, much larger body of unlabeled data to initialize the network. The trained deep network was then used to assess whether a pilot was engaged throughout the four-hour simulation.

Old Dominion University

Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

Publication venue: AUAI Press
Publication date: 01/09/2018

UCL Discovery

Large-scale and Deep Spatiotemporal Point-Process Models

Author: Yuan Baichuan
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Many accurate spatiotemporal data sets have recently become available for research, and real-world applications create strong demand for better multivariate point-process models. In this thesis, we develop new multivariate models with generalization ability and scalability. The first two chapters provide research background, real-world problems and a mathematical introduction to point-process models. In chapter 3, we develop a nonparametric method for multivariate spatiotemporal Hawkes processes with applications to network reconstruction. In contrast to prior work, which has often focused exclusively on temporal information, our approach uses spatiotemporal information and does not assume a specific parametric form. Our results demonstrate that, compared with using only temporal data, our approach yields improved network reconstruction, providing a basis for meaningful subsequent analysis of the reconstructed networks, such as examining their community structure and motifs. In chapter 4, we present a fast and accurate estimation method for multivariate Hawkes processes. Our method, which has guaranteed consistency, combines two estimation approaches. Extensive numerical experiments with synthetic data and real-world social network data show that our method improves on the accuracy, scalability and computational efficiency of prevailing estimation approaches. Moreover, it greatly boosts the performance of Hawkes-process-based models for social network reconstruction and helps to explain the spatiotemporal triggering dynamics on social media. In chapter 5, we focus on multivariate spatial point processes, which can describe heterotopic data over space. However, highly multivariate intensities are computationally challenging due to the curse of dimensionality. To bridge this gap, we introduce a declustering-based hidden-variable model that enables efficient inference via a variational autoencoder (VAE).
We also prove that this model is a generalization of the VAE-based model for collaborative filtering, which leads to an interesting application of spatial point-process models to recommender systems. Experimental results show the method's utility on both synthetic and real-world data. In chapter 6, we show how multivariate point processes can be applied to opioid overdose events and to real-time prediction of the hourly crime rate. Finally, in chapter 7, we discuss future directions and conclude the thesis.

eScholarship - University of California
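For readers unfamiliar with Hawkes processes, a minimal sketch of the conditional intensity of a multivariate, purely temporal Hawkes process may help: each dimension i has a baseline rate, and every past event in dimension j raises the rate of dimension i by an amount that decays over time. The exponential kernel, the parameter names `mu`, `alpha` and `beta`, and the function `hawkes_intensity` are illustrative assumptions; the thesis describes a nonparametric spatiotemporal method, which this parametric temporal example does not reproduce.

```python
import math

def hawkes_intensity(t, mu, alpha, beta, events):
    """Conditional intensity of every dimension of a multivariate Hawkes process.

    Assumed exponential-kernel form:
        lambda_i(t) = mu[i] + sum over past events (t_k, j) of
                      alpha[i][j] * beta * exp(-beta * (t - t_k))

    mu[i]       baseline rate of dimension i
    alpha[i][j] excitation of dimension i by one event in dimension j
    beta        decay rate of the kernel
    events      list of (time, dimension) pairs
    """
    lam = list(mu)
    for t_k, j in events:
        if t_k < t:  # only strictly earlier events excite the process
            decay = beta * math.exp(-beta * (t - t_k))
            for i in range(len(mu)):
                lam[i] += alpha[i][j] * decay
    return lam
```

With no past events the intensity reduces to the baseline rates `mu`; a larger `alpha[i][j]` means events in dimension j trigger more follow-up activity in dimension i, which is the kind of directed influence that network reconstruction tries to estimate.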

Empowering users to communicate their preferences to machine learning models in Visual Analytics

Author: Das Subhajit
Publication venue: Georgia Institute of Technology
Publication date: 10/06/2021

Recent visual analytics (VA) systems rely on machine learning (ML) to let users perform a variety of data analysis tasks, e.g., biologists clustering genome samples, medical practitioners predicting the diagnosis for a new patient, and ML practitioners tuning models' hyperparameter settings. These VA systems support interactive model construction for people with diverse levels of ML expertise, from non-experts to intermediates to expert ML users; I call them power users. Through my research, I designed and developed VA systems that empower power users to communicate their preferences so they can interactively construct machine learning models for their analytical tasks. In this process, I designed algorithms to incorporate user interaction data into machine learning modeling pipelines. Specifically, I deployed and tested two main interaction techniques, multi-model steering and interactive objective functions, which help users specify their goals and objectives to the underlying model(s) in VA; the evaluations measured, e.g., task completion times, user satisfaction ratings, success rate in finding user-preferred models, and model accuracies. However, designing these VA systems for power users poses various challenges, such as addressing diversity in user expertise, metric selection, user modeling to automatically infer preferences, and evaluating the success of these systems. Through this work, I contribute a set of VA systems that support interactive construction and selection of supervised and unsupervised models using tabular data. In addition, I present findings from a design study of interactive ML in a specific domain with real users and real data.

Ph.D.

Scholarly Materials And Research @ Georgia Tech