33 research outputs found

    Quelques modèles de langage statistiques et graphiques lissés avec WordNet [Some statistical and graphical language models smoothed with WordNet]

    Master's thesis digitized by the Direction des bibliothèques de l'Université de Montréal

    Towards probabilistic decision support in public health practice: Predicting recent transmission of tuberculosis from patient attributes

    Objective: Investigating the contacts of a newly diagnosed tuberculosis (TB) case to prevent TB transmission is a core public health activity. In the context of limited resources, it is often necessary to prioritize investigation when multiple cases are reported. Public health personnel currently prioritize contact investigation intuitively, based on past experience. Decision-support software using patient attributes to predict the probability of a TB case being involved in recent transmission could aid in this prioritization, but a prediction model is needed to drive such software. Methods: We developed a logistic regression model using the clinical and demographic information of TB cases reported to Montreal Public Health between 1997 and 2007. The reference standard for transmission was DNA fingerprint analysis. We measured predictive performance in terms of sensitivity, specificity, negative predictive value, positive predictive value, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). Results: Among 1552 TB cases enrolled in the study, 314 (20.2%) were involved in recent transmission. The AUC of the model was 0.65 (95% confidence interval: 0.61–0.68), significantly better than random prediction. The maximized values of sensitivity and specificity on the ROC curve were 0.53 and 0.67, respectively. Conclusions: The characteristics of a TB patient reported to public health can be used to predict whether the newly diagnosed case is associated with recent transmission as opposed to reactivation of latent infection.
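
    The abstract describes a logistic regression classifier evaluated by ROC/AUC, with a decision threshold chosen to jointly maximize sensitivity and specificity. The sketch below illustrates that general workflow in Python with scikit-learn; the features and labels are synthetic stand-ins, not the study's actual clinical or demographic variables.

```python
# Hedged sketch of the workflow described above; data and features are
# synthetic placeholders, not the study's clinical/demographic variables.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1552                              # cohort size reported in the abstract
X = rng.normal(size=(n, 5))           # stand-in patient attributes
y = rng.binomial(1, 0.2, size=n)      # ~20% involved in recent transmission

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]

auc = roc_auc_score(y_te, probs)
fpr, tpr, _ = roc_curve(y_te, probs)
best = np.argmax(tpr - fpr)           # threshold maximizing Youden's J = sens + spec - 1
print(f"AUC={auc:.2f} sensitivity={tpr[best]:.2f} specificity={1 - fpr[best]:.2f}")
```

    With real predictors the fitted coefficients would indicate which patient attributes drive the transmission probability, which is what a decision-support tool would surface to public health staff.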

    A Neural Probabilistic Language Model

    A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word and (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to the words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models and that it allows the model to take advantage of longer contexts.
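
    The architecture the abstract sketches maps each context word to a shared learned representation, feeds the concatenated representations through a nonlinear layer, and applies a softmax over the vocabulary to produce next-word probabilities. Below is a minimal NumPy forward pass of that idea; dimensions and random weights are purely illustrative, and the optional direct input-to-output connections of the original model are omitted.

```python
# Minimal forward pass of a neural probabilistic language model:
# shared word embeddings -> concatenation -> tanh layer -> softmax.
# Dimensions and random weights are illustrative only; the original model
# also allows direct input-to-output connections, omitted here.
import numpy as np

rng = np.random.default_rng(0)
V, m, h, n_ctx = 1000, 30, 50, 3      # vocab size, embedding dim, hidden units, context words

C = rng.normal(scale=0.1, size=(V, m))           # shared distributed representations
H = rng.normal(scale=0.1, size=(h, n_ctx * m))   # hidden-layer weights
U = rng.normal(scale=0.1, size=(V, h))           # output weights

def next_word_probs(context_ids):
    """Probability of every vocabulary word following the given context."""
    x = np.concatenate([C[i] for i in context_ids])  # concatenated context embeddings
    a = np.tanh(H @ x)                               # nonlinear hidden layer
    logits = U @ a
    e = np.exp(logits - logits.max())                # numerically stable softmax
    return e / e.sum()

p = next_word_probs([12, 7, 301])     # arbitrary word indices
print(p.shape, round(p.sum(), 6))     # (1000,) 1.0
```

    Because the representations in C are shared across all contexts, gradient updates from one training sentence move the probabilities of every semantically neighboring sentence, which is the source of the generalization claimed above.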

    bird-house/birdhouse-deploy: Update Zenodo config

    What's Changed:
    - LICENCE: update copyright line with year and ownership, by @tlvu in https://github.com/bird-house/birdhouse-deploy/pull/326
    - Update .zenodo.json, by @huard in https://github.com/bird-house/birdhouse-deploy/pull/327
    Full Changelog: https://github.com/bird-house/birdhouse-deploy/compare/1.26.1...1.26.2

    PypeTree: A Tool for Reconstructing Tree Perennial Tissues from Point Clouds

    The reconstruction of trees from point clouds acquired with terrestrial LiDAR scanning (TLS) may become a significant breakthrough in the study and modelling of tree development. Here, we develop an efficient method and a tool based on extensive modifications to the skeletal extraction method first introduced by Verroust and Lazarus in 2000. PypeTree, a user-friendly and open-source visual modelling environment, incorporates a number of improvements into the original skeletal extraction technique, making it better adapted to the challenge of reconstructing tree perennial tissues. Within PypeTree, we also introduce semi-supervised adjustment tools to address the methodological challenges associated with imperfect point cloud datasets, which further improve reconstruction accuracy. The performance of these automatic and semi-supervised approaches was tested with the help of synthetic models and subsequently validated on real trees. The accuracy of automatic reconstruction varied greatly in terms of axis detection because small branches (length < 3.5 cm) were difficult to detect. However, as small branches account for little of the total skeleton length, the mean reconstruction error for cumulative skeleton length reached only 5.1% with automatic and 1.8% with semi-supervised reconstruction. In some cases, using the supervised tools, a perfect reconstruction of the perennial tissue could be achieved.
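
    The Verroust-Lazarus style of skeletal extraction that PypeTree builds on works, roughly, by connecting nearby points into a graph, computing geodesic distances from a root point, slicing the cloud into distance level sets, and chaining the level-set centroids into a skeletal curve. The Python sketch below illustrates that core idea on a synthetic trunk; it is not PypeTree's actual implementation, and a real branching tree would additionally split each level set into connected components so that each branch gets its own centroid chain.

```python
# Level-set skeletonization sketch (illustrative, not PypeTree's code):
# k-NN graph -> geodesic distances from a root -> distance bins -> centroids.
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

rng = np.random.default_rng(0)
# Synthetic "trunk": noisy points scattered along a vertical segment.
t = rng.uniform(0, 1, 500)
pts = np.c_[0.02 * rng.normal(size=500), 0.02 * rng.normal(size=500), t]

# k-nearest-neighbour graph with Euclidean edge weights.
k = 8
tree = cKDTree(pts)
dist, idx = tree.query(pts, k=k + 1)          # first neighbour is the point itself
rows = np.repeat(np.arange(len(pts)), k)
graph = csr_matrix((dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())),
                   shape=(len(pts), len(pts)))

# Geodesic distances from the lowest point, taken as the root.
root = int(np.argmin(pts[:, 2]))
geo = dijkstra(graph, directed=False, indices=root)

# Bin points by geodesic distance; each bin's centroid is a skeleton node.
n_bins = 12
edges = np.linspace(0, geo[np.isfinite(geo)].max(), n_bins)
bins = np.digitize(geo, edges)
skeleton = np.array([pts[bins == b].mean(axis=0)
                     for b in range(1, n_bins + 1) if np.any(bins == b)])
print(skeleton.round(2))  # ordered centroids approximating the trunk axis
```

    The semi-supervised step the abstract mentions would intervene after this automatic pass, letting a user reconnect or prune skeleton nodes where noisy or occluded points misled the automatic detection.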