Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
We explore the trade-offs of performing linear algebra using Apache Spark,
compared to traditional C and MPI implementations on HPC platforms. Spark is
designed for data analytics on cluster computing platforms with access to local
disks and is optimized for data-parallel tasks. We examine three widely-used
and important matrix factorizations: NMF (for physical plausibility), PCA (for
its ubiquity) and CX (for data interpretability). We apply these methods to
TB-sized problems in particle physics, climate modeling and bioimaging. The
data matrices are tall-and-skinny, which enables the algorithms to map
conveniently onto Spark's data-parallel model. We perform scaling experiments
on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide
tuning guidance to obtain high performance.
Advances in knowledge discovery and data mining Part II
19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II
Proceedings, MSVSCC 2014
Proceedings of the 8th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 17, 2014 at VMASC in Suffolk, Virginia
Deep Model for Improved Operator Function State Assessment
A deep learning framework is presented for engagement assessment using EEG signals. Deep learning is a recently developed machine learning technique that has been applied in many domains. In this paper, we propose a deep learning strategy for operator function state (OFS) assessment. Fifteen pilots participated in a flight simulation from Seattle to Chicago. During the four-hour simulation, EEG signals were recorded for each pilot. We labeled 20 minutes of data as engaged or disengaged to fine-tune the deep network and used the remaining large volume of unlabeled data to initialize the network. The trained deep network was then used to assess whether a pilot was engaged during the four-hour simulation.
Large-scale and Deep Spatiotemporal Point-Process Models
Many accurate spatiotemporal data sets have recently become available for research. Real-world applications create strong demand for better multivariate point-process modeling. In this thesis, we develop new multivariate models with generalization ability and scalability. The first two chapters provide a research background, real-world problems and a mathematical introduction to point-process models. In chapter 3, we develop a nonparametric method for multivariate spatiotemporal Hawkes processes with applications to network reconstruction. In contrast to prior work, which has often focused exclusively on temporal information, our approach uses spatiotemporal information and does not assume a specific parametric form. Our results demonstrate that, in comparison to using only temporal data, our approach yields improved network reconstruction, providing a basis for meaningful subsequent analysis of the reconstructed networks, such as examinations of community structure and motifs. In chapter 4, we present a fast and accurate estimation method for multivariate Hawkes processes. Our method, with guaranteed consistency, combines two estimation approaches. Extensive numerical experiments, with synthetic data and real-world social network data, show that our method improves on the accuracy, scalability and computational efficiency of prevailing estimation approaches. Moreover, it greatly boosts the performance of Hawkes process-based models on social network reconstruction and helps to understand the spatiotemporal triggering dynamics over social media. In chapter 5, we focus on multivariate spatial point processes, which can describe heterotopic data over space. However, highly multivariate intensities are computationally challenging due to the curse of dimensionality. To bridge this gap, we introduce a declustering-based hidden-variable model that leads to efficient inference via a variational autoencoder (VAE).
We also prove that this model is a generalization of the VAE-based model for collaborative filtering. This leads to an interesting application of spatial point-process models to recommender systems. Experimental results show the method's utility on both synthetic data and real-world data. Finally, in chapter 6, we show how multivariate point processes can be applied to opioid overdose events and real-time prediction of the hourly crime rate. In chapter 7, we discuss future directions and conclude the thesis.
Empowering users to communicate their preferences to machine learning models in Visual Analytics
Recent visual analytic (VA) systems rely on machine learning (ML) to allow users to perform a variety of data analytic tasks, e.g., biologists clustering genome samples, medical practitioners predicting the diagnosis for a new patient, ML practitioners tuning models' hyperparameter settings, etc. These VA systems support interactive construction of models for people (I call them power users) with diverse levels of ML expertise, from non-experts to intermediates to expert ML users. Through my research, I designed and developed VA systems that empower power users to communicate their preferences and interactively construct machine learning models for their analytical tasks. In this process, I designed algorithms to incorporate user interaction data into machine learning modeling pipelines. Specifically, I deployed and tested (using, e.g., task completion times, user satisfaction ratings, success rate in finding user-preferred models, and model accuracies) two main interaction techniques, multi-model steering and interactive objective functions, to facilitate the specification of user goals and objectives to the underlying model(s) in VA. However, designing these VA systems for power users poses various challenges, such as addressing diversity in user expertise, metric selection, user modeling to automatically infer preferences, evaluating the success of these systems, etc. Through this work I contribute a set of VA systems that support interactive construction and selection of supervised and unsupervised models using tabular data. In addition, I present findings from a design study of interactive ML in a specific domain with real users and real data.
Ph.D.