
    Dynamics of gene expression and the regulatory inference problem

    From the response to external stimuli to cell division and death, the dynamics of living cells is based on the expression of specific genes at specific times. The decision when to express a gene is implemented by the binding and unbinding of transcription factor molecules to regulatory DNA. Here, we construct stochastic models of gene expression dynamics and test them on experimental time-series data of messenger-RNA concentrations. The models are used to infer biophysical parameters of gene transcription, including the statistics of transcription factor-DNA binding and the target genes controlled by a given transcription factor. (Comment: revised version to appear in Europhys. Lett., new title.)
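    As an illustration only (not the authors' model or code), the sketch below simulates a standard two-state "telegraph" gene with a Gillespie algorithm: a transcription factor binds and unbinds regulatory DNA, and mRNA is transcribed only while the factor is bound. All rate constants are invented placeholders.

    # Minimal sketch, assuming illustrative rates: Gillespie simulation of a
    # two-state "telegraph" gene driven by TF binding/unbinding.
    import numpy as np

    def simulate_telegraph_gene(k_on=0.1, k_off=0.05, k_tx=2.0, k_deg=0.2,
                                t_max=500.0, seed=0):
        rng = np.random.default_rng(seed)
        t, state, mrna = 0.0, 0, 0          # state: 0 = TF unbound, 1 = TF bound
        times, counts = [t], [mrna]
        while t < t_max:
            rates = np.array([
                k_on if state == 0 else 0.0,   # TF binding
                k_off if state == 1 else 0.0,  # TF unbinding
                k_tx if state == 1 else 0.0,   # transcription (only when bound)
                k_deg * mrna,                  # mRNA degradation
            ])
            total = rates.sum()
            if total == 0:
                break
            t += rng.exponential(1.0 / total)           # waiting time to next reaction
            event = rng.choice(4, p=rates / total)      # which reaction fires
            if event == 0:   state = 1
            elif event == 1: state = 0
            elif event == 2: mrna += 1
            else:            mrna -= 1
            times.append(t); counts.append(mrna)
        return np.array(times), np.array(counts)

    times, mrna = simulate_telegraph_gene()
    print(f"mean mRNA copy number: {mrna.mean():.1f}")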

    Inferring dynamic genetic networks with low order independencies

    In this paper, we propose a novel inference method for dynamic genetic networks which makes it possible to handle a number of time measurements n much smaller than the number of genes p. The approach is based on the concept of a low order conditional dependence graph, which we extend here to the case of Dynamic Bayesian Networks. Most of our results are based on the theory of graphical models associated with Directed Acyclic Graphs (DAGs). In this way, we define a minimal DAG G which describes exactly the full order conditional dependencies given the past of the process. Then, to address the large p, small n estimation case, we propose to approximate DAG G by considering low order conditional independencies. We introduce partial qth order conditional dependence DAGs G(q) and analyze their probabilistic properties. In general, DAGs G(q) differ from DAG G but still reflect relevant dependence facts for sparse networks such as genetic networks. Using this approximation, we set out a non-Bayesian inference method and demonstrate the effectiveness of this approach on both simulated and real data. The inference procedure is implemented in the R package 'G1DBN', freely available from the CRAN archive.
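    As a rough illustration of the idea (not the G1DBN implementation), the sketch below screens lagged edges of a dynamic network with first-order (q = 1) conditional dependencies estimated via partial correlations; the threshold and the toy data are assumptions.

    # Minimal sketch: keep edge i -> j only if the correlation between gene i at
    # time t-1 and gene j at time t survives conditioning on every single other
    # gene k at time t-1.
    import numpy as np

    def partial_corr(x, y, z):
        """Correlation of x and y after regressing out z (all 1-D arrays)."""
        Z = np.column_stack([np.ones_like(z), z])
        rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        return np.corrcoef(rx, ry)[0, 1]

    def first_order_edges(X, threshold=0.3):
        """X: (T, p) expression matrix. edges[i, j] means 'i at t-1 influences j at t'."""
        T, p = X.shape
        past, present = X[:-1], X[1:]
        edges = np.zeros((p, p), dtype=bool)
        for i in range(p):
            for j in range(p):
                if i == j:
                    continue
                # the edge survives only if no single conditioning gene k
                # explains the dependence away (min over k of |partial corr|)
                pcors = [abs(partial_corr(past[:, i], present[:, j], past[:, k]))
                         for k in range(p) if k not in (i, j)]
                if not pcors:   # p = 2: fall back to the unconditional correlation
                    pcors = [abs(np.corrcoef(past[:, i], present[:, j])[0, 1])]
                edges[i, j] = min(pcors) > threshold
        return edges

    # toy usage on simulated data: 20 time points, 5 genes
    X = np.random.default_rng(1).normal(size=(20, 5))
    print(first_order_edges(X).sum(), "edges retained")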

    Discovering new kinds of patient safety incidents

    Every year, large numbers of patients in National Health Service (NHS) care suffer because of a patient safety incident. The National Patient Safety Agency (NPSA) collects large amounts of data describing individual incidents. As well as being described by categorical and numerical variables, each incident is described using free text. The aim of the work was to find quite small groups of similar incidents, of types that were previously unknown to the NPSA. A model of the text was produced, such that the position of each incident reflected its meaning to the greatest extent possible. The basic model was the vector space model. Dimensionality reduction was carried out in two stages: unsupervised dimensionality reduction was carried out using principal component analysis, and supervised dimensionality reduction using linear discriminant analysis. It was then possible to look for groups of incidents that were more tightly packed than would be expected given the overall distribution of the incidents. The process for assessing these groups had three stages. Firstly, a quantitative measure was used, allowing a large number of parameter combinations to be examined. The groups found for an ‘optimum’ parameter combination were then divided into categories using a qualitative filtering method. Finally, clinical experts assessed the groups qualitatively. The transition probabilities model was also examined: this model was based on the empirical probabilities that two-word sequences were seen in the text. An alternative method for dimensionality reduction was to use information about the subjective meaning of a small sample of incidents elicited from experts, producing a mapping between high and low dimensional models of the text. The analysis also included the direct use of the categorical variables to model the incidents, and empirical analysis of the behaviour of high dimensional spaces.
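    A hypothetical end-to-end sketch of such a pipeline (not the NPSA system) is given below: a TF-IDF vector space model, unsupervised reduction with truncated SVD standing in for PCA, supervised reduction with linear discriminant analysis guided by the existing incident categories, and a simple density screen for small, unusually tight groups. The example texts, labels and thresholds are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.cluster import DBSCAN

    texts = ["patient fell from bed", "wrong drug dose given",
             "fall in bathroom", "medication given twice",
             "unexpected equipment alarm failure", "alarm failed during transfer"]
    categories = ["falls", "medication", "falls", "medication", "other", "other"]

    # stage 1: vector space model of the free text
    X = TfidfVectorizer().fit_transform(texts)

    # stage 2: unsupervised dimensionality reduction (SVD plays the role of PCA)
    X_red = TruncatedSVD(n_components=3, random_state=0).fit_transform(X)

    # stage 3: supervised reduction guided by the known incident categories
    X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_red, categories)

    # stage 4: look for small, unusually dense groups of incidents
    labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X_lda)
    print(labels)   # incidents sharing a non -1 label form a candidate new group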

    Identifying Asset Poverty Thresholds: New methods with an application to Pakistan and Ethiopia

    Understanding how households escape poverty depends on understanding how they accumulate assets over time. Therefore, identifying the degree of linearity in household asset dynamics, and specifically any potential asset poverty thresholds, is of fundamental interest to the design of poverty reduction policies. If household asset holdings converged unconditionally to a single long run equilibrium, then all poor could be expected to escape poverty over time. In contrast, if there are critical asset thresholds that trap households below the poverty line, then households would need specific assistance to escape poverty. Similarly, the presence of asset poverty thresholds would mean that short term asset shocks could lead to long term destitution, thus highlighting the need for social safety nets. In addition to the direct policy relevance, identifying household asset dynamics and potential asset thresholds presents an interesting methodological challenge to researchers. Potential asset poverty thresholds can only be identified in a framework that allows multiple dynamic equilibria. Any unstable equilibrium points would indicate a potential poverty threshold, above which households are expected to accumulate further and below which households are on a trajectory that makes them poorer over time. The key empirical issue addressed in the paper is whether such threshold points exist in Pakistan and Ethiopia and, if so, where they are located. Methodologically, the paper explores what econometric technique is best suited for this type of analysis. The paper contributes to the small current literature on modeling nonlinear household welfare dynamics in three ways. First, it compares previously used techniques for identifying asset poverty traps by applying them to the same dataset, and examines whether, and how, the choice of estimation technique affects the result. Second, it explores whether other estimation techniques may be more suitable to locate poverty thresholds. Third, it adds the first study for a South Asian country and makes a comparison with Ethiopia. Household assets are combined into a single asset index using two techniques: factor analysis and regression. These indices are used to estimate asset dynamics and locate dynamic asset equilibria, first by nonparametric methods including LOWESS, kernel weighted local regression and spline smoothers, and then by global polynomial parametric techniques. To combine the advantages of nonparametric and parametric techniques - a flexible functional form and the ability to control for covariates, respectively - the paper adapts a mixed model representation of a penalized spline to estimate asset dynamics through a semiparametric partially linear model. This paper identifies a single dynamic asset equilibrium with a slightly concave dynamic asset accumulation path in each country. There is no evidence for multiple dynamic equilibria. This result is robust across econometric methods and across different ways of constructing the asset index. The concave accumulation path means that poorer households recover more slowly from asset shocks. Concavity also implies that greater initial equality of assets would lead to higher growth. Moreover, the dynamic asset equilibria are very low. In Pakistan it is below the average asset holdings of the poor households in the sample. In Ethiopia, the equilibrium is barely above the very low mean. 
    This, together with the slow speed of asset accumulation for the poorest households, suggests that convergence towards the long run equilibrium may be slow and insufficient for rural households in Pakistan and Ethiopia to escape poverty.
    Keywords: Poverty dynamics, Semiparametric Estimation, Penalized Splines, Pakistan, Ethiopia, Consumer/Household Economics. JEL classification: I32, C14, O12.
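    For illustration only (not the paper's estimator or data), the sketch below fits a global cubic polynomial to a simulated asset transition A_{t+1} = f(A_t), locates dynamic equilibria as fixed points f(A*) = A*, and classifies each from the slope f'(A*) as stable or as a potential poverty threshold.

    import numpy as np

    rng = np.random.default_rng(0)
    A_t = rng.uniform(0, 10, size=500)                                    # assets in period t
    A_next = 2.0 + 0.8 * A_t - 0.01 * A_t**2 + rng.normal(0, 0.5, 500)    # assets in period t+1

    # global polynomial fit of the asset accumulation path
    coeffs = np.polyfit(A_t, A_next, deg=3)
    f = np.poly1d(coeffs)

    # fixed points solve f(A) - A = 0; keep real roots inside the data range
    roots = np.roots(coeffs - np.array([0, 0, 1, 0]))   # subtract A from the cubic
    equilibria = [r.real for r in roots
                  if abs(r.imag) < 1e-8 and A_t.min() <= r.real <= A_t.max()]

    for a_star in equilibria:
        slope = f.deriv()(a_star)
        kind = "stable" if abs(slope) < 1 else "unstable (potential poverty threshold)"
        print(f"equilibrium at A* = {a_star:.2f}: {kind}")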

    Probabilistic Dialogue Models with Prior Domain Knowledge

    Probabilistic models such as Bayesian Networks are now in widespread use in spoken dialogue systems, but their scalability to complex interaction domains remains a challenge. One central limitation is that the state space of such models grows exponentially with the problem size, which makes parameter estimation increasingly difficult, especially for domains where only limited training data is available. In this paper, we show how to capture the underlying structure of a dialogue domain in terms of probabilistic rules operating on the dialogue state. The probabilistic rules are associated with a small, compact set of parameters that can be directly estimated from data. We argue that the introduction of this abstraction mechanism yields probabilistic models that are easier to learn and generalise better than their unstructured counterparts. We empirically demonstrate the benefits of such an approach by learning a dialogue policy for a human-robot interaction domain based on a Wizard-of-Oz data set. Published in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 179–188, Seoul, South Korea, 5-6 July 2012.
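    A minimal sketch of what a probabilistic rule of this kind might look like is given below; it is not the paper's formalism, and the state variables, effects and probabilities are invented for illustration. The point is that a couple of rule parameters stand in for a large raw probability table over dialogue states.

    import random

    def rule_confirm_request(state):
        """Hypothetical rule: if the last user act was a request heard with low
        ASR confidence, prefer asking for confirmation; otherwise mostly execute."""
        if state["last_user_act"] == "Request" and state["asr_confidence"] < 0.7:
            return {"AskConfirmation": 0.9, "Execute": 0.1}    # one learnable parameter
        return {"Execute": 0.95, "AskConfirmation": 0.05}       # one learnable parameter

    def sample_action(distribution, rng=random):
        actions, probs = zip(*distribution.items())
        return rng.choices(actions, weights=probs, k=1)[0]

    state = {"last_user_act": "Request", "asr_confidence": 0.55}
    print(sample_action(rule_confirm_request(state)))   # usually 'AskConfirmation'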

    Evaluating Distributed Word Representations for Predicting Missing Words in Sentences

    In recent years, distributed representations of words in vector space, or word embeddings, have become very popular as they have shown significant improvements in many statistical natural language processing (NLP) tasks compared to traditional language models such as n-gram models. In this thesis, we explored various state-of-the-art methods, namely Latent Semantic Analysis, word2vec, and GloVe, to learn the distributed representation of words. Their performance was compared based on the accuracy achieved when tasked with selecting the right missing word in a sentence, given five possible options. For this NLP task we trained each of these methods on a training corpus that contained the texts of around five hundred 19th century novels from Project Gutenberg. The test set contained 1040 sentences where one word was missing from each sentence. The training and test sets were part of the Microsoft Research Sentence Completion Challenge data set. In this work, word vectors obtained by training the skip-gram model of word2vec showed the highest accuracy in finding the missing word among all the methods tested. We also found that tuning the hyperparameters of the models helped in capturing greater syntactic and semantic regularities among words.
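    As a toy illustration of the scoring scheme (not the thesis code), the sketch below trains a skip-gram word2vec model on a few sentences and, for a completion item, picks the candidate word with the highest average similarity to the remaining context words. The corpus, sentence and candidates are stand-ins for the Project Gutenberg and Microsoft Research data.

    from gensim.models import Word2Vec

    corpus = [
        ["the", "carriage", "rolled", "along", "the", "muddy", "road"],
        ["she", "read", "the", "letter", "by", "candle", "light"],
        ["the", "horse", "pulled", "the", "carriage", "through", "the", "village"],
        ["he", "wrote", "a", "long", "letter", "to", "his", "brother"],
    ]
    model = Word2Vec(corpus, vector_size=50, window=3, sg=1, min_count=1, seed=1)

    def best_candidate(context_words, candidates):
        def score(candidate):
            if candidate not in model.wv:
                return float("-inf")
            sims = [model.wv.similarity(candidate, w)
                    for w in context_words if w in model.wv]
            return sum(sims) / len(sims) if sims else float("-inf")
        return max(candidates, key=score)

    # 'the horse pulled the ____ through the village'
    context = ["the", "horse", "pulled", "through", "the", "village"]
    print(best_candidate(context, ["carriage", "letter", "candle", "brother"]))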

    Non-metric similarity search of tandem mass spectra including posttranslational modifications

    In biological applications, tandem mass spectrometry is a widely used method for determining protein and peptide sequences from an “in vitro” sample. The sequences are not determined directly, but must be interpreted from the mass spectra, which are the output of the mass spectrometer. This work is focused on a similarity-search approach to mass spectra interpretation, where the parameterized Hausdorff distance (dHP) is used as the similarity measure. In order to provide an efficient similarity search under dHP, metric access methods and the TriGen algorithm (controlling the metricity of dHP) are employed. Moreover, the search model based on dHP supports posttranslational modifications (PTMs) in the query mass spectra, which is typically a problem when an indexing approach is used. Our approach can be utilized as a coarse filter by any other database approach for mass spectra interpretation.
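    As an illustration only, the sketch below computes a Hausdorff-style distance between two spectra given as lists of peak m/z values; it is not the paper's exact dHP, and the averaging of per-peak gaps and the exponent p used to "parameterize" it are assumptions.

    import bisect

    def nearest_gap(peak, peaks):
        """Distance from one peak to the closest peak in a sorted list."""
        i = bisect.bisect_left(peaks, peak)
        candidates = peaks[max(i - 1, 0):i + 1]
        return min(abs(peak - q) for q in candidates)

    def hausdorff_like(spec_a, spec_b, p=1.0):
        a, b = sorted(spec_a), sorted(spec_b)
        # match every peak to its nearest counterpart in the other spectrum
        gaps = [nearest_gap(x, b) ** p for x in a] + [nearest_gap(y, a) ** p for y in b]
        return (sum(gaps) / len(gaps)) ** (1.0 / p)

    query = [147.1, 263.2, 376.3, 475.4]        # peak m/z values of a query spectrum
    candidate = [147.1, 283.2, 376.3, 475.4]    # candidate differing by one shifted peak
    print(f"distance: {hausdorff_like(query, candidate):.2f}")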