
    NetiNeti: Discovery of Scientific Names from Text Using Machine Learning Methods, Figure 1

    Figure 1 shows a series of training experiments with the Naïve Bayes classifier, using different neighborhoods for contextual features and different sizes of positive and negative training example sets; the resulting classifiers were evaluated against our annotated gold standard corpus. The data sets are the result of running NetiNeti on a subset of 136 PubMed Central tagged open access articles, with no stop list.
    A scientific name for an organism can be associated with almost all biological data, and name identification is an important step in many text mining tasks that aim to extract useful information from biological, biomedical and biodiversity text sources. A scientific name also acts as an important metadata element for linking biological information. We present NetiNeti, a machine learning based approach for the identification and discovery of scientific names; the system implementing the approach can be accessed at http://namefinding.ubio.org. We also present comparison results for various machine learning algorithms on our annotated corpus: Naïve Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the two top-performing algorithms.
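    The following is a minimal sketch of the kind of experiment described above, not the NetiNeti implementation itself: a Naïve Bayes classifier trained on candidate tokens with contextual features drawn from a configurable word neighborhood and scored against a held-out gold standard. The feature names, the toy data, and the NAME/OTHER label scheme are illustrative assumptions; only NLTK's standard NaiveBayesClassifier API is used.

```python
import nltk

def context_features(tokens, i, neighborhood=2):
    """Featurize the token at position i using itself and its +/- k word neighborhood."""
    word = tokens[i]
    feats = {
        "is_capitalized": word[0].isupper(),
        "latinate_suffix": word.endswith(("us", "a", "um", "is")),
        "word_len": len(word),
    }
    for k in range(1, neighborhood + 1):
        feats[f"prev_{k}"] = tokens[i - k].lower() if i - k >= 0 else "<s>"
        feats[f"next_{k}"] = tokens[i + k].lower() if i + k < len(tokens) else "</s>"
    return feats

def build_dataset(sentences, neighborhood=2):
    """sentences: iterable of (token_list, label_list) pairs with labels 'NAME' / 'OTHER'.
    This gold-standard format is hypothetical."""
    data = []
    for tokens, labels in sentences:
        for i, label in enumerate(labels):
            data.append((context_features(tokens, i, neighborhood), label))
    return data

# Toy placeholder corpora; in the experiments above these would come from
# the annotated PubMed Central articles.
train_sents = [(["Homo", "sapiens", "is", "a", "primate"],
                ["NAME", "NAME", "OTHER", "OTHER", "OTHER"])]
test_sents = train_sents  # stand-in for the held-out gold standard corpus

classifier = nltk.NaiveBayesClassifier.train(build_dataset(train_sents, neighborhood=2))
print(nltk.classify.accuracy(classifier, build_dataset(test_sents, neighborhood=2)))
```

    Varying the neighborhood size and the number of positive and negative examples passed to build_dataset mirrors the experimental dimensions reported for Figure 1.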

    A Maximum Entropy Procedure to Solve Likelihood Equations

    In this article, we provide initial findings on the problem of solving likelihood equations by means of a maximum entropy (ME) approach. Unlike standard procedures, which require setting the score function of the maximum likelihood problem equal to zero, we propose an alternative strategy in which the score is instead used as an external informative constraint on the maximization of the convex Shannon's entropy function. The problem involves the reparameterization of the score parameters as expected values of discrete probability distributions whose probabilities need to be estimated. This leads to a simpler situation in which the parameters are searched for in a smaller (hyper)simplex space. We assessed our proposal by means of empirical case studies and a simulation study, the latter involving the most critical case of logistic regression under data separation. The results suggest that the maximum entropy reformulation of the score problem solves the likelihood equation problem. Similarly, when maximum likelihood estimation is difficult, as in the case of logistic regression under separation, the maximum entropy proposal achieved results numerically comparable to those obtained by Firth's bias-corrected approach. Overall, these first findings indicate that a maximum entropy solution can be considered an alternative technique for solving the likelihood equations.
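    As a rough sketch of the reparameterization described above (the notation is illustrative and not taken from the paper): each parameter is rewritten as the expectation of a discrete probability distribution over a fixed support, and those probabilities are obtained by maximizing Shannon's entropy with the score equations imposed as constraints rather than solved directly.

```latex
% Illustrative formulation only; the symbols z_{jm}, p_{jm}, U and \ell are ours.
% Each parameter \theta_j is the expected value of a discrete distribution
% p_j = (p_{j1}, \dots, p_{jM}) over a fixed support z_j = (z_{j1}, \dots, z_{jM}):
\theta_j(p) = \sum_{m=1}^{M} p_{jm}\, z_{jm},
\qquad p_{jm} \ge 0, \quad \sum_{m=1}^{M} p_{jm} = 1 .
% The probabilities maximize Shannon's entropy subject to the score equations
% of the log-likelihood \ell, used as informative constraints:
\max_{p} \; -\sum_{j}\sum_{m} p_{jm} \log p_{jm}
\quad \text{subject to} \quad
U\bigl(\theta(p)\bigr) \equiv
\left.\frac{\partial \ell(\theta)}{\partial \theta}\right|_{\theta = \theta(p)} = 0 .
```

    Because each p_j lives on a probability simplex, the search space is the smaller (hyper)simplex mentioned in the abstract rather than the unconstrained parameter space.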