Impact of imperfect test sensitivity on determining risk factors: the case of bovine tuberculosis
Background
Imperfect diagnostic testing reduces the power to detect significant predictors in classical cross-sectional studies. Assuming that the misclassification in diagnosis is random, this can be dealt with by increasing the sample size of a study. However, the effects of imperfect tests in longitudinal data analyses are not as straightforward to anticipate, especially if the outcome of the test influences behaviour. The aim of this paper is to investigate the impact of imperfect test sensitivity on the determination of predictor variables in a longitudinal study.
Methodology/Principal Findings
To deal with imperfect test sensitivity affecting the response variable, we transformed the observed response variable into a set of possible temporal patterns of true disease status, whose prior probability was a function of the test sensitivity. We fitted a Bayesian discrete time survival model using an MCMC algorithm that treats the true response patterns as unknown parameters in the model. We applied our approach to epidemiological data of bovine tuberculosis outbreaks in England and investigated the effect of reduced test sensitivity in the determination of risk factors for the disease. We found that reduced test sensitivity led to changes to the collection of risk factors associated with the probability of an outbreak that were chosen in the ‘best’ model and to an increase in the uncertainty surrounding the parameter estimates for a model with a fixed set of risk factors that were associated with the response variable.
Conclusions/Significance
We propose a novel algorithm to fit discrete survival models for longitudinal data where values of the response variable are uncertain. When analysing longitudinal data, uncertainty surrounding the response variable will affect the significance of the predictors and should therefore be accounted for either at the design stage, by increasing the sample size, or at the post-analysis stage, by conducting appropriate sensitivity analyses.
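The idea of turning one observed test history into a set of possible true disease patterns, each weighted by a function of the test sensitivity, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration and not the paper's model: the function name `onset_pattern_weights` is hypothetical, specificity is taken to be perfect, the test is applied once per period, and the candidate onset periods are given equal weight before the test results are considered.

```python
def onset_pattern_weights(first_positive, sensitivity):
    """Weight each candidate true onset period for one observed test history.

    Assumptions (illustrative only): one test per period, perfect specificity,
    sensitivity > 0, and equal weight on each onset period before testing.
    An onset at period k followed by a first positive test at period
    `first_positive` implies (first_positive - k) missed detections.
    """
    raw = {
        onset: (1 - sensitivity) ** (first_positive - onset) * sensitivity
        for onset in range(1, first_positive + 1)
    }
    total = sum(raw.values())
    # Normalise so the weights over candidate patterns sum to one.
    return {onset: w / total for onset, w in raw.items()}


weights = onset_pattern_weights(first_positive=3, sensitivity=0.8)
for onset, w in sorted(weights.items()):
    print(f"true onset in period {onset}: weight {w:.3f}")
```

With a perfect test all the weight falls on the observed period; as sensitivity drops, earlier onset periods become more plausible, which is the source of the extra uncertainty described above.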
Transfer Learning for Content-Based Recommender Systems using Tree Matching
In this paper we present a new approach to content-based transfer learning for solving the data sparsity problem in cases when the users' preferences in the target domain are either scarce or unavailable, but the necessary information on the preferences exists in another domain. We show that training a system to use such information across domains can produce better performance. Specifically, we represent users' behavior patterns based on topological graph structures. Each behavior pattern represents the behavior of a set of users, where a user's behavior is defined as the items they rated and the items' rating values. In the next step we find a correlation between behavior patterns in the source domain and behavior patterns in the target domain. This mapping is considered a bridge between the two domains. Based on the correlation and content-attributes of the items, we train a machine learning model to predict users' ratings in the target domain. When we compare our approach to the popularity approach and KNN-cross-domain on a real-world dataset, the results show that our approach outperforms both methods in 83% of the cases on average.
Connections between Classical and Parametric Network Entropies
This paper explores relationships between classical and parametric measures of graph (or network) complexity. Classical measures are based on vertex decompositions induced by equivalence relations. Parametric measures, on the other hand, are constructed by using information functions to assign probabilities to the vertices. The inequalities established in this paper relating classical and parametric measures lay a foundation for systematic classification of entropy-based measures of graph complexity
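To make the two families of measures concrete, here is a minimal Python sketch. The classical measure uses the vertex partition induced by an equivalence relation (assumed here: two vertices are equivalent if they have the same degree), and the parametric measure uses an information function to assign probabilities to vertices (assumed here: the vertex degree). Both choices and both function names are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def classical_entropy(adj):
    """Entropy of the vertex partition induced by an equivalence relation.

    Assumed relation (illustrative): vertices with equal degree are equivalent.
    """
    n = len(adj)
    class_sizes = Counter(len(nbrs) for nbrs in adj.values()).values()
    return -sum((s / n) * math.log2(s / n) for s in class_sizes)


def parametric_entropy(adj, info=lambda adj, v: len(adj[v])):
    """Entropy of vertex probabilities p(v) = f(v) / sum_u f(u).

    `info` is the information function f; the default (vertex degree)
    is an illustrative choice.
    """
    f = {v: info(adj, v) for v in adj}
    total = sum(f.values())
    return -sum((x / total) * math.log2(x / total) for x in f.values() if x > 0)


# Path graph 0 - 1 - 2 - 3 and a 4-cycle, as adjacency lists.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

print(classical_entropy(path), parametric_entropy(path))
print(classical_entropy(cycle), parametric_entropy(cycle))
```

On the 4-cycle every vertex has the same degree, so the partition-based entropy is zero while the degree-based parametric entropy reaches its maximum for four vertices; the two kinds of measure can therefore rank the same graph very differently.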
Integrative Network Biology: Graph Prototyping for Co-Expression Cancer Networks
Network-based analysis has proven useful in biologically oriented areas, e.g., to explore the dynamics and complexity of biological networks. Investigating a set of networks allows deriving general knowledge about the underlying topological and functional properties. The integrative analysis of networks typically combines networks from different studies that investigate the same or similar research questions. In order to perform an integrative analysis it is often necessary to compare the properties of matching edges across the data set. This identification of common edges is often burdensome and computationally intensive. Here, we present an approach that is different from inferring a new network based on common features. Instead, we select one network as a graph prototype, which then represents a set of comparable network objects, as it has the least average distance to all other networks in the same set. We demonstrate the usefulness of the graph prototyping approach on a set of prostate cancer networks and a set of corresponding benign networks. We further show that the distances within the cancer group and the benign group are statistically different depending on the utilized distance measure.
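Selecting the network with the least average distance to all others can be sketched in a few lines of Python. The distance used below (size of the symmetric difference of the edge sets) and the names `edge_distance` and `graph_prototype` are assumptions for illustration; the abstract does not say which distance measures the authors use.

```python
def edge_distance(g1, g2):
    """Illustrative distance: number of edges present in exactly one network.

    Networks are sets of edges; each undirected edge is assumed to be stored
    as a tuple with its endpoints in a fixed (sorted) order.
    """
    return len(g1 ^ g2)


def graph_prototype(networks, distance=edge_distance):
    """Return the index of the network with the least average distance
    to all other networks in the list (requires at least two networks)."""
    def average_distance(i):
        others = [distance(networks[i], g)
                  for j, g in enumerate(networks) if j != i]
        return sum(others) / len(others)

    return min(range(len(networks)), key=average_distance)


networks = [
    frozenset({("a", "b"), ("b", "c")}),
    frozenset({("a", "b"), ("b", "c"), ("c", "d")}),
    frozenset({("a", "b"), ("c", "d"), ("d", "e")}),
]
print(graph_prototype(networks))
```

Because the prototype is one of the observed networks, no new network has to be inferred and no edge matching across the whole set is needed beyond the pairwise distance computation.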
Predicting Cell Cycle Regulated Genes by Causal Interactions
The fundamental difference between classic and modern biology is that technological innovations make it possible to generate high-throughput data that give insights into molecular interactions on a genomic scale. These high-throughput data can be used to infer gene networks, e.g., the transcriptional regulatory or signaling network, representing a blueprint of the current dynamical state of the cellular system. However, gene networks do not provide direct answers to biological questions; instead, they need to be analyzed to reveal functional information about molecular working mechanisms. In this paper we propose a new approach to analyze the transcriptional regulatory network of yeast to predict cell cycle regulated genes. The novelty of our approach is that, in contrast to all other approaches aiming to predict cell cycle regulated genes, we do not use time series data but base our analysis on the prior information of causal interactions among genes. The major purpose of the present paper is to predict cell cycle regulated genes in S. cerevisiae. Our analysis is based on the transcriptional regulatory network, representing causal interactions between genes, and a list of known periodic genes. No further data are used. Our approach utilizes the causal membership of genes and the hierarchical organization of the transcriptional regulatory network, leading to two groups of periodic genes with a well-defined direction of information flow. We predict genes as periodic if they appear on unique shortest paths connecting two periodic genes from different hierarchy levels. Our results demonstrate that a classical problem such as the prediction of cell cycle regulated genes can be seen in a new light if the concept of a causal membership of a gene is applied consistently. This also shows that there is a wealth of information buried in the transcriptional regulatory network whose unraveling may require more elaborate concepts than it might seem at first.
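The prediction rule (a gene is predicted as periodic if it lies on a unique shortest path between two periodic genes from different hierarchy levels) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the network is a directed adjacency dictionary, the hierarchy level of each known periodic gene is supplied as input, and the function names and the toy gene labels are hypothetical. How the authors derive the hierarchy is not covered here.

```python
from collections import deque


def unique_shortest_path(graph, source, target):
    """Return the shortest directed path source -> target if exactly one
    shortest path exists, otherwise None (breadth-first search with
    shortest-path counting)."""
    dist = {source: 0}
    count = {source: 1}
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = count[u]
                parent[v] = u
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest route into v has been found.
                count[v] += count[u]
    if count.get(target) != 1:
        return None
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def predict_periodic(graph, periodic_level):
    """`periodic_level` maps each known periodic gene to its hierarchy level
    (assumed to be given). Returns the intermediate genes found on unique
    shortest paths between periodic genes of different levels."""
    predicted = set()
    for s, level_s in periodic_level.items():
        for t, level_t in periodic_level.items():
            if level_s == level_t:
                continue
            path = unique_shortest_path(graph, s, t)
            if path:
                predicted.update(g for g in path[1:-1]
                                 if g not in periodic_level)
    return predicted


# Toy network: A -> x -> B is unique; A reaches C by two equally short routes.
graph = {"A": ["x", "y", "w"], "x": ["B"], "y": ["C"], "w": ["C"]}
periodic_level = {"A": 1, "B": 2, "C": 2}
print(predict_periodic(graph, periodic_level))
```

In the toy network only `x` is predicted, because the two equally short routes from `A` to `C` make the genes on them ambiguous.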
Hierarchical coordination of periodic genes in the cell cycle of Saccharomyces cerevisiae
Background
Gene networks are a representation of molecular interactions among genes or products thereof and, hence, form causal networks. Despite intense study in recent years, most investigations have so far focused on inferential methods to reconstruct gene networks from experimental data or on their structural properties, e.g., degree distributions. Their structural analysis to gain functional insights into organizational principles of, e.g., pathways has so far remained underappreciated.
Results
In the present paper we analyze cell cycle regulated genes in S. cerevisiae. Our analysis is based on the transcriptional regulatory network, representing causal interactions and not just associations or correlations between genes, and a list of known periodic genes. No further data are used. Partitioning the transcriptional regulatory network according to a graph theoretical property leads to a hierarchy in the network and, hence, in the information flow, allowing us to identify two groups of periodic genes. This reveals a novel conceptual interpretation of the working mechanism of the cell cycle and the genes regulated by this pathway.
Conclusion
Aside from the results obtained for the cell cycle of yeast, our approach could be exemplary for the analysis of general pathways by exploiting the rich causal structure of inferred and/or curated gene networks, including protein or signaling networks.
Inferring the conservative causal core of gene regulatory networks
Background
Inferring gene regulatory networks from large-scale expression data is an important problem that has received much attention in recent years. These networks have the potential to provide insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically.
Results
In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well-known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations, and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it also has a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently.
Conclusions
For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also yield superior results.
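One reading of "mutual information values in conjunction with a maximization step" is sketched below in Python: each gene keeps at most one edge, namely the one to the partner with which it shares the largest mutual information. This is not the authors' implementation. The function name `max_mi_network` is hypothetical, the mutual information matrix is assumed to be precomputed, and a fixed threshold stands in for whatever significance test is applied to the estimates.

```python
def max_mi_network(mi, threshold):
    """Build an undirected network from a symmetric mutual information matrix.

    For every gene i, keep only the edge to the gene j with the largest
    mutual information, provided that value reaches `threshold`
    (an assumed stand-in for a significance test).
    Returns a set of edges (i, j) with i < j.
    """
    n = len(mi)
    edges = set()
    for i in range(n):
        candidates = [(mi[i][j], j) for j in range(n)
                      if j != i and mi[i][j] >= threshold]
        if candidates:
            _, best = max(candidates)  # maximization step for gene i
            edges.add((min(i, best), max(i, best)))
    return edges


# Toy mutual information matrix for three genes.
mi = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.5],
    [0.2, 0.5, 0.0],
]
print(sorted(max_mi_network(mi, threshold=0.3)))
```

Since every gene contributes at most one edge, the resulting network has at most as many edges as genes, which is consistent with the low computational cost and conservative output described in the abstract.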
Exploring Statistical and Population Aspects of Network Complexity
The characterization and the definition of the complexity of objects is an important but very difficult problem that has attracted much interest in many different fields. In this paper we introduce a new measure, called network diversity score (NDS), which allows us to quantify structural properties of networks. We demonstrate numerically that our diversity score is capable of distinguishing ordered, random and complex networks from each other and, hence, allows us to categorize networks with respect to their structural complexity. We study 16 additional network complexity measures and find that none of these measures has similarly good categorization capabilities. Our score differs from many other measures suggested so far for characterizing the structural complexity of networks in two respects. First, our score is multiplicatively composed of four individual scores, each assessing different structural properties of a network. That means our composite score reflects the structural diversity of a network. Second, our score is defined for a population of networks instead of individual networks. We will show that this removes an unwanted ambiguity, inherently present in measures that are based on single networks. In order to apply our measure practically, we provide a statistical estimator for the diversity score, which is based on a finite number of samples.