Numeric Input Relations for Relational Learning with Applications to Community Structure Analysis
Most work in the area of statistical relational learning (SRL) is focussed on
discrete data, even though a few approaches for hybrid SRL models have been
proposed that combine numerical and discrete variables. In this paper we
distinguish numerical random variables for which a probability distribution is
defined by the model from numerical input variables that are only used for
conditioning the distribution of discrete response variables. We show how
numerical input relations can very easily be used in the Relational Bayesian
Network framework, and that existing inference and learning methods need only
minor adjustments to be applied in this generalized setting. The resulting
framework provides natural relational extensions of classical probabilistic
models for categorical data. We demonstrate the usefulness of RBN models with
numeric input relations by several examples.
In particular, we use the augmented RBN framework to define probabilistic
models for multi-relational (social) networks in which the probability of a
link between two nodes depends on numeric latent feature vectors associated
with the nodes. A generic learning procedure can be used to obtain a
maximum-likelihood fit of model parameters and latent feature values for a
variety of models that can be expressed in the high-level RBN representation.
Specifically, we propose a model that allows us to interpret learned latent
feature values as community centrality degrees by which we can identify nodes
that are central for one community, that are hubs between communities, or that
are isolated nodes. In a multi-relational setting, the model also provides a
characterization of how different relations are associated with each community.
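As a rough illustration of the latent-feature link model described in this abstract, the sketch below fits numeric node features by gradient ascent on the log-likelihood of an observed graph. It assumes a plain logistic link on the dot product of two feature vectors with a fixed bias; that functional form, the toy graph, and all names (`link_probability`, `log_likelihood`, `fit_features`) are hypothetical choices for this example and are not the RBN representation or the learning procedure used in the paper.

```python
import math
import random

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow and probabilities stay in (0, 1).
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def link_probability(u, v, bias):
    # Assumed model: P(link) = sigmoid(bias + <u, v>) for latent vectors u and v.
    return sigmoid(bias + sum(a * b for a, b in zip(u, v)))

def log_likelihood(features, edges, bias):
    # Sum the Bernoulli log-likelihood over all unordered node pairs (i < j).
    total = 0.0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            p = link_probability(features[i], features[j], bias)
            total += math.log(p) if (i, j) in edges else math.log(1.0 - p)
    return total

def fit_features(features, edges, bias=-2.0, lr=0.05, steps=200):
    # Plain gradient ascent on the log-likelihood with respect to the features.
    n, dim = len(features), len(features[0])
    feats = [row[:] for row in features]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                y = 1.0 if (i, j) in edges else 0.0
                err = y - link_probability(feats[i], feats[j], bias)
                for k in range(dim):
                    grads[i][k] += err * feats[j][k]
                    grads[j][k] += err * feats[i][k]
        for i in range(n):
            for k in range(dim):
                feats[i][k] += lr * grads[i][k]
    return feats

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
rng = random.Random(0)
start = [[rng.uniform(0.1, 0.5) for _ in range(2)] for _ in range(6)]
fitted = fit_features(start, edges)
print(log_likelihood(start, edges, -2.0), log_likelihood(fitted, edges, -2.0))
```

The two printed values are the log-likelihood before and after fitting; interpreting the fitted values as community centrality degrees would need the specific model proposed in the paper, which this sketch does not reproduce.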
MildInt: Deep Learning-Based Multimodal Longitudinal Data Integration Framework
As large amounts of heterogeneous biomedical data become available, numerous methods for integrating such datasets have been developed to extract complementary knowledge from multiple domains of sources. Recently, deep learning approaches have shown promising results in a variety of research areas. However, applying deep learning requires expertise in constructing a deep architecture that can take multimodal longitudinal data. Thus, in this paper, a deep learning-based Python package for data integration is developed. The Python package, the deep learning-based multimodal longitudinal data integration framework (MildInt), provides a preconstructed deep learning architecture for a classification task. MildInt contains two learning phases: learning a feature representation from each modality of data and training a classifier for the final decision. Adopting a deep architecture in the first phase leads to learning a more task-relevant feature representation than a linear model. In the second phase, a linear regression classifier is used for detecting and investigating biomarkers from multimodal data. Thus, by combining the linear model and the deep learning model, higher accuracy and better interpretability can be achieved. We validated the performance of our package using simulation data and real data. For the real data, as a pilot study, we used clinical and multimodal neuroimaging datasets in Alzheimer's disease to predict the disease progression. MildInt is capable of integrating multiple forms of numerical data, including time series and non-time series data, for extracting complementary features from the multimodal dataset.
Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning
In biomedical research, many different types of patient data can be
collected, such as various types of omics data and medical imaging modalities.
Applying multi-view learning to these different sources of information can
increase the accuracy of medical classification models compared with
single-view procedures. However, collecting biomedical data can be expensive
and/or burdensome for patients, so it is important to reduce the amount of
required data collection. It is therefore necessary to develop multi-view
learning methods which can accurately identify those views that are most
important for prediction. In recent years, several biomedical studies have used
an approach known as multi-view stacking (MVS), where a model is trained on
each view separately and the resulting predictions are combined through
stacking. In these studies, MVS has been shown to increase classification
accuracy. However, the MVS framework can also be used for selecting a subset of
important views. To study the view selection potential of MVS, we develop a
special case called stacked penalized logistic regression (StaPLR). Compared
with existing view-selection methods, StaPLR can make use of faster
optimization algorithms and is easily parallelized. We show that nonnegativity
constraints on the parameters of the function which combines the views play an
important role in preventing unimportant views from entering the model. We
investigate the performance of StaPLR through simulations, and consider two
real data examples. We compare the performance of StaPLR with an existing view
selection method called the group lasso and observe that, in terms of view
selection, StaPLR is often more conservative and has a consistently lower false
positive rate. Comment: 26 pages, 9 figures. Accepted manuscript
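The view-selection idea in this abstract can be sketched as follows: fit one penalized logistic regression per view, collect out-of-fold predictions, and combine them with a meta-learner whose weights are constrained to be nonnegative. The sketch assumes ridge-penalized base learners and an L1-penalized meta-learner fitted by projected gradient descent; those choices, the synthetic data, and every function name are assumptions made for illustration and should not be read as the authors' implementation.

```python
import math
import random

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(X, w, b):
    return [sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) for row in X]

def fit_logistic(X, y, l2=0.0, nonneg_l1=None, lr=0.1, steps=300):
    # Gradient descent on the mean logistic loss with an L2 penalty.
    # If nonneg_l1 is set, weights are projected onto w >= 0 after each step
    # and an L1 penalty of that size is added (a constant slope when w >= 0).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        err = [p - yi for p, yi in zip(predict_proba(X, w, b), y)]
        for k in range(d):
            grad = sum(e * row[k] for e, row in zip(err, X)) / n + l2 * w[k]
            if nonneg_l1 is not None:
                grad += nonneg_l1
            w[k] -= lr * grad
            if nonneg_l1 is not None:
                w[k] = max(0.0, w[k])
        b -= lr * sum(err) / n
    return w, b

def out_of_fold_predictions(X, y, folds=5, l2=0.1):
    # Each sample is predicted by a base model that never saw it in training.
    n = len(X)
    z = [0.0] * n
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train], l2=l2)
        for i, p in zip(test, predict_proba([X[i] for i in test], w, b)):
            z[i] = p
    return z

def stacked_view_weights(views, y, l1=0.01):
    # One column of out-of-fold predictions per view feeds the meta-learner.
    columns = [out_of_fold_predictions(X, y) for X in views]
    Z = [list(row) for row in zip(*columns)]
    weights, _ = fit_logistic(Z, y, nonneg_l1=l1)
    return weights

# Synthetic example (hypothetical): one informative view and one pure-noise view.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
signal_view = [[2.0 * yi - 1.0 + rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
noise_view = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in y]
weights = stacked_view_weights([signal_view, noise_view], y)
print(weights)
```

A view whose meta-level weight is driven to zero is effectively dropped from the combined model, which is the sense in which the nonnegativity constraint supports view selection here.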
Data-driven design of intelligent wireless networks: an overview and tutorial
Data science or "data-driven research" is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple systems as well as large, more complex ones in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware-specific influences. These interactions can lead to a difference between real-world functioning and design-time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader with the necessary datasets and scripts to go through the tutorial steps themselves.