136 research outputs found

    AN IMPROVED APPROXIMATION FOR ASSESSING THE STATISTICAL SIGNIFICANCE OF MOLECULAR SEQUENCE FEATURES

    Using random walk theory, we first establish explicitly the exact distribution of the maximal partial sum of a sequence of independent and identically distributed random variables. This result allows us to obtain a new approximation of the distribution of the local score of one sequence. This approximation improves on the one given by Karlin et al., which can be deduced from the new formula. We obtain a more accurate asymptotic expression with additional terms. Examples of application are given.
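The two quantities named in this abstract can be illustrated with a short sketch. It is not taken from the paper: it assumes the usual definitions, where the maximal partial sum is the largest prefix sum (with the empty prefix counted as 0) and the local score is the largest sum over any contiguous segment, computed with the Lindley recursion. The function names are hypothetical.

```python
from typing import Sequence


def max_partial_sum(scores: Sequence[float]) -> float:
    """Largest prefix sum S_k = X_1 + ... + X_k, taking S_0 = 0 (assumption)."""
    best = 0.0
    total = 0.0
    for x in scores:
        total += x
        best = max(best, total)
    return best


def local_score(scores: Sequence[float]) -> float:
    """Largest sum over any contiguous segment, never below 0.

    Uses the Lindley recursion W_k = max(0, W_{k-1} + X_k) and
    returns the maximum of W_k over the sequence.
    """
    best = 0.0
    current = 0.0
    for x in scores:
        current = max(0.0, current + x)
        best = max(best, current)
    return best
```

For the scores `[1, -2, 3, 1, -5, 2]`, the best segment is `3, 1`, so the local score is 4, while the largest prefix sum is 3.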

    Graph Mining under Linguistic Constraints to Explore Large Texts

    https://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/view/1529
    In this paper, we propose an approach to explore large texts by highlighting coherent sub-parts. The exploration method relies on a graph representation of the text according to Hoey's linguistic model, which allows the selection and binding of adjacent and non-adjacent sentences. The main contribution of our work is a method based on both Hoey's linguistic model and a special graph mining technique, called CoHoP mining, to extract coherent sub-parts of the graph representation of the text. We have conducted experiments on several English texts that demonstrate the value of the proposed approach.

    SMM: An R Package for Estimation and Simulation of Discrete-time semi-Markov Models

    Semi-Markov models, independently introduced by Lévy (1954), Smith (1955) and Takacs (1954), are a generalization of the well-known Markov models. For semi-Markov models, sojourn times can be arbitrarily distributed, while sojourn times of Markov models are constrained to be exponentially distributed (in continuous time) or geometrically distributed (in discrete time). The aim of this paper is to present the R package SMM, devoted to the simulation and estimation of discrete-time multi-state semi-Markov and Markov models. For the semi-Markov case we have considered: parametric and non-parametric estimation; with and without censoring at the beginning and/or at the end of sample paths; one or several independent sample paths. Several discrete-time distributions are considered for the parametric estimation of sojourn time distributions of semi-Markov chains: Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial.
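The difference between Markov and semi-Markov sojourn times can be made concrete with a minimal simulation sketch. It is written in Python and does not use or reproduce the API of the SMM package; every name below (`simulate_semi_markov`, `geometric`, `uniform`) is hypothetical. It assumes an embedded transition matrix with zero diagonal and one sojourn-time sampler per state, and it truncates the last sojourn when the requested length is reached, which corresponds to censoring at the end of the sample path.

```python
import random
from typing import Callable, Dict, List, Sequence

Sampler = Callable[[random.Random], int]


def geometric(p: float) -> Sampler:
    """Sojourn time on {1, 2, ...} with success probability p."""
    def draw(rng: random.Random) -> int:
        k = 1
        while rng.random() >= p:
            k += 1
        return k
    return draw


def uniform(n: int) -> Sampler:
    """Sojourn time uniform on {1, ..., n}."""
    return lambda rng: rng.randint(1, n)


def simulate_semi_markov(
    states: Sequence[str],
    init: Sequence[float],
    trans: Dict[str, Sequence[float]],  # row per state, zero weight on itself
    sojourn: Dict[str, Sampler],
    length: int,
    seed: int = 0,
) -> List[str]:
    """Simulate one sample path of a discrete-time semi-Markov chain."""
    rng = random.Random(seed)
    state = rng.choices(states, weights=init)[0]
    path: List[str] = []
    while len(path) < length:
        duration = sojourn[state](rng)       # time spent in the current state
        path.extend([state] * duration)
        state = rng.choices(states, weights=trans[state])[0]  # jump elsewhere
    return path[:length]                      # last sojourn may be censored


path = simulate_semi_markov(
    states=["a", "b"],
    init=[0.5, 0.5],
    trans={"a": [0.0, 1.0], "b": [1.0, 0.0]},
    sojourn={"a": uniform(3), "b": uniform(3)},
    length=50,
)
```

If every state used a geometric sampler, the path would be an ordinary discrete-time Markov chain; swapping in another sampler, such as `uniform`, is what makes it semi-Markov.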

    Calcul de réseaux phrastiques pour l'analyse et la navigation textuelle

    In this paper, we present a method for navigating texts based on lexical repetition, following the approach developed by the linguist Hoey. Applying this kind of approach by hand to large texts is problematic. We propose an automatic process that makes it possible to analyse large texts with this method. We have conducted experiments on different kinds of texts (narrative, expository) to show the benefits of the approach.

    Fouille de données pour la stylistique : cas des motifs séquentiels émergents

    Editors: Anne Dister, Dominique Longrée, Gérald Purnelle. ISBN: 978-2-9601246-0-6.
    In this paper, we study the use of data mining techniques for stylistic analysis, from a linguistic point of view, by considering emerging sequential patterns. First, we show that mining sequential patterns of words with gap constraints gives new relevant linguistic patterns with respect to patterns built on state-of-the-art n-grams. Then, we investigate how sequential patterns of itemsets can provide more generic linguistic patterns. We validate our approach both from a quantitative and a linguistic point of view by conducting experiments on three corpora of various types of French texts (poetry, letters, and fiction, respectively). By considering more particularly poetic texts, we show that characteristic linguistic patterns can be identified using data mining techniques.

    Modeling of nitric oxide emissions from temperate agricultural ecosystems.

    48 p.
    Arable soils are a significant source of nitric oxide (NO), most of which is derived from nitrogen fertilizers. Precise estimates of NO emissions from these soils are thus essential to devise strategies to mitigate the impact of agriculture on tropospheric ozone regulation. This paper presents the implementation of a soil NO emissions submodel within the environmentally oriented soil crop model, CERES-EGC. The submodel simulates NO production via the nitrification pathway, as modulated by soil environmental drivers. The resulting model was tested with data from 4 field experiments on wheat- and maize-cropped soils representative of two agricultural regions of France, and for three years encompassing various climatic conditions. Overall, the model gave correct predictions of NO emissions, but shortcomings arose from an inadequate vertical distribution of fertilizer N in the soil surface. Inclusion of a 2-cm thick topsoil layer in a 'micro-layer' version of CERES-EGC gave more realistic simulations of NO emissions and of the underlying microbiological process. From a statistical standpoint, both versions of the model achieved a similar fit to the experimental data, with a mean deviation (MD) ranging from 1.8 to 6.2 g N-NO ha−1 d−1 and a root mean square error (RMSE) ranging from 22.8 to 25.2 g N-NO ha−1 d−1 across the 4 experiments. The cumulative NO losses represented 1 to 2% of NH4+ fertilizer applied for the maize crops, and about 1% for the wheat crops. The 'micro-layer' version may be used for spatialized inventories of biogenic NO emissions to identify mitigation strategies and to improve air quality prediction in chemistry transport models.
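The two goodness-of-fit statistics quoted in this abstract can be sketched as follows. This is an illustration only: it assumes MD stands for mean deviation, computed as the average of observed minus simulated values (the sign convention is an assumption, since the abstract does not state it), and RMSE for root mean square error. The function names are hypothetical.

```python
import math
from typing import Sequence


def mean_deviation(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Average signed difference, observed minus simulated (assumed convention)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(o - s for o, s in zip(observed, simulated)) / len(observed)


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square error between observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))
```

A small MD alongside a much larger RMSE, as reported here, indicates that positive and negative errors largely cancel on average while individual daily errors remain sizeable.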

    Short-Term cost impact of compliance with clinical practice guidelines for initial sarcoma treatment

    Background: The impact of compliance with clinical practice guidelines (CPG) on outcomes and/or costs of care has not been completely clarified.
    Objective: To estimate relationships between medical expenditures and compliance with CPG for initial sarcoma treatment.
    Research design: Selected cohorts of patients diagnosed with sarcoma in 2005 and 2006, and treated at the University hospital and/or the cancer centre of the Rhône-Alpes region, France (n=90). Main outcome measurements were: patient characteristics, compliance with CPG, health outcomes, and costs. Data were mainly extracted from patient records. The logarithm of treatment costs was modelled using linear and Tobit regressions.
    Results: Rates of compliance with CPG were 86%, 66%, 88%, 89%, and 95% for initial diagnosis, primary surgical excision, wide surgical excision, chemotherapy, and radiotherapy, respectively. Total average costs reached €24,439, with €1,784, €11,225, €10,360, and €1,016 for diagnosis, surgery (primary and wide surgical excisions), chemotherapy, and radiotherapy, respectively. Compliance of diagnosis with CPG decreased the cost of diagnosis, whereas compliance of primary surgical excision increased the cost of chemotherapy. Compliance of chemotherapy with CPG decreased the cost of radiotherapy.
    Conclusion: Since chemotherapy is one of the major cost drivers, these results suggest that compliance with guidelines increases medical care expenditures in the short term.
    Keywords: Oncology; Sarcoma; Cost; Clinical guidelines; Efficacy; Medical Practices; Government Policy; Regulation; Public Health

    Secretory IgA mediates retrotranscytosis of intact gliadin peptides via the transferrin receptor in celiac disease

    Celiac disease (CD) is an enteropathy resulting from an abnormal immune response to gluten-derived peptides in genetically susceptible individuals. This immune response is initiated by intestinal transport of intact peptide 31-49 (p31-49) and 33-mer gliadin peptides through an unknown mechanism. We show that the transferrin receptor CD71 is responsible for apical to basal retrotranscytosis of gliadin peptides, a process during which p31-49 and 33-mer peptides are protected from degradation. In patients with active CD, CD71 is overexpressed in the intestinal epithelium and colocalizes with immunoglobulin (Ig) A. Intestinal transport of intact p31-49 and 33-mer peptides was blocked by polymeric and secretory IgA (SIgA) and by soluble CD71 receptors, pointing to a role of SIgA–gliadin complexes in this abnormal intestinal transport. This retrotranscytosis of SIgA–gliadin complexes may promote the entry of harmful gliadin peptides into the intestinal mucosa, thereby triggering an immune response and perpetuating intestinal inflammation. Our findings strongly implicate CD71 in the pathogenesis of CD