939 research outputs found
Multilayer Networks
In most natural and engineered systems, a set of entities interact with each
other in complicated patterns that can encompass multiple types of
relationships, change in time, and include other types of complications. Such
systems include multiple subsystems and layers of connectivity, and it is
important to take such "multilayer" features into account to try to improve our
understanding of complex systems. Consequently, it is necessary to generalize
"traditional" network theory by developing (and validating) a framework and
associated tools to study multilayer systems in a comprehensive fashion. The
origins of such efforts date back several decades and arose in multiple
disciplines, and now the study of multilayer networks has become one of the
most important directions in network science. In this paper, we discuss the
history of multilayer networks (and related concepts) and review the exploding
body of work on such networks. To unify the disparate terminology in the large
body of recent work, we discuss a general framework for multilayer networks,
construct a dictionary of terminology to relate the numerous existing concepts
to each other, and provide a thorough discussion that compares, contrasts, and
translates between related notions such as multilayer networks, multiplex
networks, interdependent networks, networks of networks, and many others. We
also survey and discuss existing data sets that can be represented as
multilayer networks. We review attempts to generalize single-layer-network
diagnostics to multilayer networks. We also discuss the rapidly expanding
research on multilayer-network models and notions like community structure,
connected components, tensor decompositions, and various types of dynamical
processes on multilayer networks. We conclude with a summary and an outlook.Comment: Working paper; 59 pages, 8 figure
Algorithms, applications and systems towards interpretable pattern mining from multi-aspect data
How do humans move around in the urban space and how do they differ when the city undergoes terrorist attacks? How do users behave in Massive Open Online courses~(MOOCs) and how do they differ if some of them achieve certificates while some of them not? What areas in the court elite players, such as Stephen Curry, LeBron James, like to make their shots in the course of the game? How can we uncover the hidden habits that govern our online purchases? Are there unspoken agendas in how different states pass legislation of certain kinds? At the heart of these seemingly unconnected puzzles is this same mystery of multi-aspect mining, i.g., how can we mine and interpret the hidden pattern from a dataset that simultaneously reveals the associations, or changes of the associations, among various aspects of the data (e.g., a shot could be described with three aspects, player, time of the game, and area in the court)? Solving this problem could open gates to a deep understanding of underlying mechanisms for many real-world phenomena. While much of the research in multi-aspect mining contribute broad scope of innovations in the mining part, interpretation of patterns from the perspective of users (or domain experts) is often overlooked. Questions like what do they require for patterns, how good are the patterns, or how to read them, have barely been addressed. Without efficient and effective ways of involving users in the process of multi-aspect mining, the results are likely to lead to something difficult for them to comprehend.
This dissertation proposes the M^3 framework, which consists of multiplex pattern discovery, multifaceted pattern evaluation, and multipurpose pattern presentation, to tackle the challenges of multi-aspect pattern discovery. Based on this framework, we develop algorithms, applications, and analytic systems to enable interpretable pattern discovery from multi-aspect data. Following the concept of meaningful multiplex pattern discovery, we propose PairFac to close the gap between human information needs and naive mining optimization. We demonstrate its effectiveness in the context of impact discovery in the aftermath of urban disasters. We develop iDisc to target the crossing of multiplex pattern discovery with multifaceted pattern evaluation. iDisc meets the specific information need in understanding multi-level, contrastive behavior patterns. As an example, we use iDisc to predict student performance outcomes in Massive Open Online Courses given users' latent behaviors. FacIt is an interactive visual analytic system that sits at the intersection of all three components and enables for interpretable, fine-tunable, and scrutinizable pattern discovery from multi-aspect data. We demonstrate each work's significance and implications in its respective problem context. As a whole, this series of studies is an effort to instantiate the M^3 framework and push the field of multi-aspect mining towards a more human-centric process in real-world applications
Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline
From medical charts to national census, healthcare has traditionally operated
under a paper-based paradigm. However, the past decade has marked a long and
arduous transformation bringing healthcare into the digital age. Ranging from
electronic health records, to digitized imaging and laboratory reports, to
public health datasets, today, healthcare now generates an incredible amount of
digital information. Such a wealth of data presents an exciting opportunity for
integrated machine learning solutions to address problems across multiple
facets of healthcare practice and administration. Unfortunately, the ability to
derive accurate and informative insights requires more than the ability to
execute machine learning models. Rather, a deeper understanding of the data on
which the models are run is imperative for their success. While a significant
effort has been undertaken to develop models able to process the volume of data
obtained during the analysis of millions of digitalized patient records, it is
important to remember that volume represents only one aspect of the data. In
fact, drawing on data from an increasingly diverse set of sources, healthcare
data presents an incredibly complex set of attributes that must be accounted
for throughout the machine learning pipeline. This chapter focuses on
highlighting such challenges, and is broken down into three distinct
components, each representing a phase of the pipeline. We begin with attributes
of the data accounted for during preprocessing, then move to considerations
during model building, and end with challenges to the interpretation of model
output. For each component, we present a discussion around data as it relates
to the healthcare domain and offer insight into the challenges each may impose
on the efficiency of machine learning techniques.Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20
Pages, 1 Figur
Intelligent system for time series pattern identification and prediction
Mestrado em Gestão de Sistemas de InformaçãoOs crescentes volumes de dados representam uma fonte de informação potencialmente valiosa para as empresas, mas também implicam desafios nunca antes enfrentados. Apesar da sua complexidade intrínseca, as séries temporais são um tipo de dados notavelmente relevantes para o contexto empresarial, especialmente para tarefas preditivas. Os modelos Autorregressivos Integrados de Médias Móveis (ARIMA), têm sido a abordagem mais popular para tais tarefas, porém, não estão preparados para lidar com as cada vez mais comuns séries temporais de maior dimensão ou granularidade. Assim, novas tendências de investigação envolvem a aplicação de modelos orientados a dados, como Redes Neuronais Recorrentes (RNNs), à previsão. Dada a dificuldade da previsão de séries temporais e a necessidade de ferramentas aprimoradas, o objetivo deste projeto foi a implementação dos modelos clássicos ARIMA e as arquiteturas RNN mais proeminentes, de forma automática, e o posterior uso desses modelos como base para o desenvolvimento de um sistema modular capaz de apoiar o utilizador em todo o processo de previsão. Design science research foi a abordagem metodológica adotada para alcançar os objetivos propostos e envolveu, para além da identificação dos objetivos, uma revisão aprofundada da literatura que viria a servir de suporte teórico à etapa seguinte, designadamente a execução do projeto e findou com a avaliação meticulosa do artefacto produzido. No geral todos os objetivos propostos foram alcançados, sendo os principais contributos do projeto o próprio sistema desenvolvido devido à sua utilidade prática e ainda algumas evidências empíricas que apoiam a aplicabilidade das RNNs à previsão de séries temporais.The current growing volumes of data present a source of potentially valuable information for companies, but they also pose new challenges never faced before. Despite their intrinsic complexity, time series are a notably relevant kind of data in the entrepreneurial context, especially regarding prediction tasks. The Autoregressive Integrated Moving Average (ARIMA) models have been the most popular approach for such tasks, but they do not scale well to bigger and more granular time series which are becoming increasingly common. Hence, newer research trends involve the application of data-driven models, such as Recurrent Neural Networks (RNNs), to forecasting. Therefore, given the difficulty of time series prediction and the need for improved tools, the purpose of this project was to implement the classical ARIMA models and the most prominent RNN architectures in an automated fashion and posteriorly to use such models as foundation for the development of a modular system capable of supporting the common user along the entire forecasting process. Design science research was the adopted methodology to achieve the proposed goals and it comprised the activities of goal definition, followed by a thorough literature review aimed at providing the theoretical background necessary to the subsequent step that involved the actual project execution and, finally, the careful evaluation of the produced artifact. In general, each the established goals were accomplished, and the main contributions of the project were the developed system itself due to its practical usefulness along with some empirical evidence supporting the suitability of RNNs to time series forecasting.info:eu-repo/semantics/publishedVersio
Fairness-enhancing deep learning for ride-hailing demand prediction
Short-term demand forecasting for on-demand ride-hailing services is one of
the fundamental issues in intelligent transportation systems. However, previous
travel demand forecasting research predominantly focused on improving
prediction accuracy, ignoring fairness issues such as systematic
underestimations of travel demand in disadvantaged neighborhoods. This study
investigates how to measure, evaluate, and enhance prediction fairness between
disadvantaged and privileged communities in spatial-temporal demand forecasting
of ride-hailing services. A two-pronged approach is taken to reduce the demand
prediction bias. First, we develop a novel deep learning model architecture,
named socially aware neural network (SA-Net), to integrate the
socio-demographics and ridership information for fair demand prediction through
an innovative socially-aware convolution operation. Second, we propose a
bias-mitigation regularization method to mitigate the mean percentage
prediction error gap between different groups. The experimental results,
validated on the real-world Chicago Transportation Network Company (TNC) data,
show that the de-biasing SA-Net can achieve better predictive performance in
both prediction accuracy and fairness. Specifically, the SA-Net improves
prediction accuracy for both the disadvantaged and privileged groups compared
with the state-of-the-art models. When coupled with the bias mitigation
regularization method, the de-biasing SA-Net effectively bridges the mean
percentage prediction error gap between the disadvantaged and privileged
groups, and also protects the disadvantaged regions against systematic
underestimation of TNC demand. Our proposed de-biasing method can be adopted in
many existing short-term travel demand estimation models, and can be utilized
for various other spatial-temporal prediction tasks such as crime incidents
predictions
Relation Prediction over Biomedical Knowledge Bases for Drug Repositioning
Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understand biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task using existing biomedical knowledge bases. Our approaches can be broadly labeled as link prediction or knowledge base completion in computer science literature. Specifically, first we investigate the predictive power of graph paths connecting entities in the publicly available biomedical knowledge base, SemMedDB (the entities and relations constitute a large knowledge graph as a whole). To that end, we build logistic regression models utilizing semantic graph pattern features extracted from the SemMedDB to predict treatment and causative relations in Unified Medical Language System (UMLS) Metathesaurus. Second, we study matrix and tensor factorization algorithms for predicting drug repositioning pairs in repoDB, a general purpose gold standard database of approved and failed drug–disease indications. The idea here is to predict repoDB pairs by approximating the given input matrix/tensor structure where the value of a cell represents the existence of a relation coming from SemMedDB and UMLS knowledge bases. The essential goal is to predict the test pairs that have a blank cell in the input matrix/tensor based on the shared biomedical context among existing non-blank cells. Our final approach involves graph convolutional neural networks where entities and relation types are embedded in a vector space involving neighborhood information. Basically, we minimize an objective function to guide our model to concept/relation embeddings such that distance scores for positive relation pairs are lower than those for the negative ones. Overall, our results demonstrate that recent link prediction methods applied to automatically curated, and hence imprecise, knowledge bases can nevertheless result in high accuracy drug candidate prediction with appropriate configuration of both the methods and datasets used
Towards Efficient Lifelong Machine Learning in Deep Neural Networks
Humans continually learn and adapt to new knowledge and environments throughout their lifetimes. Rarely does learning new information cause humans to catastrophically forget previous knowledge. While deep neural networks (DNNs) now rival human performance on several supervised machine perception tasks, when updated on changing data distributions, they catastrophically forget previous knowledge. Enabling DNNs to learn new information over time opens the door for new applications such as self-driving cars that adapt to seasonal changes or smartphones that adapt to changing user preferences. In this dissertation, we propose new methods and experimental paradigms for efficiently training continual DNNs without forgetting. We then apply these methods to several visual and multi-modal perception tasks including image classification, visual question answering, analogical reasoning, and attribute and relationship prediction in visual scenes
Dagstuhl News January - December 2011
"Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic
- …