
    The glass ceiling in NLP


    The Word Analogy Testing Caveat


    Recurrent models and lower bounds for projective syntactic decoding


    On the Problem of Inference for Inequality Measures for Heavy-Tailed Distributions

    The received wisdom about inference problems for inequality measures is that they are caused by the presence of extremes in samples drawn from heavy-tailed distributions. We show that this is incorrect: the density of the studentised inequality measure is heavily skewed to the left, and the excessive coverage failures of the usual confidence intervals are associated with low estimates of both the point measure and the variance. For further diagnostics, the coefficients of bias, skewness and kurtosis are derived for both studentised and standardised inequality measures, and the explicit cumulant expansions also make available Edgeworth expansions and saddlepoint approximations. In view of the key role played by the estimated variance of the measure, variance stabilising transforms are considered and shown to improve inference.

    Keywords: inequality measures, inference, statistical performance, asymptotic expansions, variance stabilisation.

    JEL Classification: C10, D31, D63.
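    The left skew of the studentised measure described above can be checked by simulation. The following is an illustrative Monte Carlo sketch, not the paper's method: it draws heavy-tailed Pareto samples, studentises the Theil index with a bootstrap standard error, and computes the empirical skewness of the resulting statistic. All parameter choices (tail index, sample size, bootstrap replications) are assumptions made here for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def theil(x):
        """Theil inequality index T = mean((x/mu) * log(x/mu))."""
        r = x / x.mean()
        return np.mean(r * np.log(r))

    alpha = 3.0  # Pareto tail index (illustrative assumption)
    # Closed-form Theil index for Pareto(alpha) with x_min = 1
    T_true = 1.0 / (alpha - 1.0) - np.log(alpha / (alpha - 1.0))

    n, B, reps = 100, 200, 500  # sample size, bootstrap reps, Monte Carlo reps
    stats = []
    for _ in range(reps):
        x = rng.pareto(alpha, n) + 1.0  # Pareto sample with x_min = 1
        t_hat = theil(x)
        # bootstrap standard error of the Theil estimate
        boots = [theil(rng.choice(x, n)) for _ in range(B)]
        se = np.std(boots, ddof=1)
        stats.append((t_hat - T_true) / se)

    stats = np.asarray(stats)
    skew = np.mean((stats - stats.mean()) ** 3) / stats.std() ** 3
    print(f"empirical skewness of studentised Theil: {skew:.2f}")
    ```

    With heavy tails, samples that happen to contain no extreme observations understate both the index and its variance, which is what drags the left tail of the studentised statistic out, consistent with the coverage failures the abstract describes.
    
    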

    Preparing, restructuring, and augmenting a French treebank: lexicalised parsers or coherent treebanks?

    We present the Modified French Treebank (MFT), a completely revamped French Treebank derived from the Paris 7 Treebank (P7T), which is cleaner, more coherent, has several transformed structures, and introduces new linguistic analyses. To determine the effect of these changes, we investigate how the MFT fares in statistical parsing. Probabilistic parsers trained on the MFT training set (currently 3,800 trees) already perform better than their counterparts trained on five times as much P7T data (18,548 trees), providing an extreme example of the importance of data quality over quantity in statistical parsing. Moreover, regression analysis on the learning curve of parsers trained on the MFT leads to the prediction that parsers trained on the full projected 18,548-tree MFT training set will far outscore their counterparts trained on the full P7T. These analyses also show how problematic data can lead to problematic conclusions; in particular, we find that lexicalisation in the probabilistic parsing of French is probably not as crucial as was once thought (Arun and Keller (2005)).

    Treebank-based acquisition of LFG parsing resources for French

    Motivated by the expense in time and other resources of producing hand-crafted grammars, there has been increased interest in automatically acquiring wide-coverage grammars from treebanks for natural language processing. In particular, recent years have seen growing interest in automatically acquired deep resources that can represent information absent from simple CFG-type structured treebanks and that are considered to produce more language-neutral linguistic representations, such as dependency syntactic trees. As is often the case in early pioneering work on natural language processing, English provided the focus of the first efforts towards acquiring deep-grammar resources, followed by successful treatments of, for example, German, Japanese, Chinese and Spanish. However, no comparable large-scale automatically acquired deep-grammar resources have been obtained for French to date. The goal of this paper is to present the application of treebank-based grammar acquisition to the case of French. We show that with modest changes to the established parsing architectures, encouraging results can be obtained for French, with a best dependency structure f-score of 86.73%.