Search CORE

19,284 research outputs found

Topic Similarity Networks: Visual Analytics for Large Document Sets

Author: Maiya Arun S.
Rolfe Robert M.
Publication venue
Publication date: 26/09/2014
Field of study

We investigate ways in which to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on examining what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14 year period and 2) the entire English portion of Wikipedia.Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014

arXiv.org e-Print Archive

Crossref

A network approach for managing and processing big cancer data in clouds

Author: D Hanahan
Dimitrios Tsoumakos
EM Zdobnov
L Wang
L Wang
L Wang
M Lawrence
Moustafa Ghanem
R Chen
RA Weinberg
Wei Jie
Wei Xing
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2015
Field of study

Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. In order to address the issues that data is decentralised, growing and continually being updated, and the content living or archiving on different information sources partially overlaps creating redundancies as well as contradictions and inconsistencies, we develop a data network model and technology for constructing and managing big cancer data. To support our data network approach for data process and analysis, we employ a semantic content network approach and adopt the CELAR cloud platform. The prototype implementation shows that the CELAR cloud can satisfy the on-demanding needs of various data resources for management and process of big cancer data

Crossref

UWL Repository

Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU

Author: Buolamwini Joy
Caruana Rich
Ghassemi Marzyeh
Ngufor C.
Shankar Shreya
Wang Xiang
Xu Kelvin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/06/2018
Field of study

Machine learning approaches have been effective in predicting adverse outcomes in different clinical settings. These models are often developed and evaluated on datasets with heterogeneous patient populations. However, good predictive performance on the aggregate population does not imply good performance for specific groups. In this work, we present a two-step framework to 1) learn relevant patient subgroups, and 2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task. We demonstrate how to discover relevant groups in an unsupervised way with a sequence-to-sequence autoencoder. We show that using these groups in a multi-task framework leads to better predictive performance of in-hospital mortality both across groups and overall. We also highlight the need for more granular evaluation of performance when dealing with heterogeneous populations.Comment: KDD 201

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Analytical Challenges in Modern Tax Administration: A Brief History of Analytics at the IRS

Author: Butler Jeff
Publication venue: Ohio State University. Moritz College of Law
Publication date: 01/01/2020
Field of study

KnowledgeBank at OSU

Learning from Multi-View Multi-Way Data via Structural Factorization Machines

Author: Cao Bokai
Ding Hao
He Lifang
Lu Chun-Ta
Yu Philip S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

Real-world relations among entities can often be observed and determined by different perspectives/views. For example, the decision made by a user on whether to adopt an item relies on multiple aspects such as the contextual information of the decision, the item's attributes, the user's profile and the reviews given by other users. Different views may exhibit multi-way interactions among entities and provide complementary information. In this paper, we introduce a multi-tensor-based approach that can preserve the underlying structure of multi-view data in a generic predictive model. Specifically, we propose structural factorization machines (SFMs) that learn the common latent spaces shared by multi-view tensors and automatically adjust the importance of each view in the predictive model. Furthermore, the complexity of SFMs is linear in the number of parameters, which make SFMs suitable to large-scale problems. Extensive experiments on real-world datasets demonstrate that the proposed SFMs outperform several state-of-the-art methods in terms of prediction accuracy and computational cost.Comment: 10 page

arXiv.org e-Print Archive

Crossref