
    Reliability measurement without limits

    In computational linguistics, a reliability measurement of 0.8 on some statistic such as κ is widely thought to guarantee that hand-coded data is fit for purpose, with lower values suspect. We demonstrate that the main use of such data, machine learning, can tolerate data with a low reliability as long as any disagreement among human coders looks like random noise. When it does not, however, data can have a reliability of more than 0.8 and still be unsuitable for use: the disagreement may indicate erroneous patterns that machine-learning can learn, and evaluation against test data that contain these same erroneous patterns may lead us to draw wrong conclusions about our machine-learning algorithms. Furthermore, lower reliability values still held as acceptable by many researchers, between 0.67 and 0.8, may even yield inflated performance figures in some circumstances. Although this is a common sense result, it has implications for how we work that are likely to reach beyond the machine-learning applications we discuss. At the very least, computational linguists should look for any patterns in the disagreement among coders and assess what impact they will have.
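    The statistic in question, Cohen's κ for two coders, compares observed agreement against the agreement expected by chance from each coder's label distribution. A minimal sketch, with invented labels rather than any data from the paper:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' parallel label sequences."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items the coders label identically.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: expected overlap given each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["pos", "pos", "neg", "neg", "pos", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(a, b), 3))  # 0.333
```

    Note that κ summarizes *how much* coders disagree, not *how* they disagree; the paper's point is that the structure of the disagreement matters as much as its magnitude.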

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing “deep” hand-crafted wide-coverage parsers with “shallow” treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert the tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
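    The significance test named above works by repeatedly shuffling which system each paired score is attributed to. A minimal sketch of a paired, two-sided approximate randomization test in the spirit of Noreen (1989); the scores, trial count, and test statistic here are illustrative, not the paper's:

```python
import random

def approx_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Paired two-sided approximate randomization test.

    Under the null hypothesis the two systems are interchangeable, so for
    each item we randomly swap its pair of scores and count how often the
    shuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    hits = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:   # swap this pair's attribution
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (trials + 1)
```

    For per-sentence f-scores from two parsers, a small returned p-value (e.g. below 0.05) indicates the observed difference is unlikely under random reassignment.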

    Pilonidal sinus


    Similarity of Semantic Relations

    There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.
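    The pattern-vector representation and the SVD smoothing step (point 2 above) can be sketched as follows. The pattern counts here are invented for illustration; real LRA derives both the patterns and the frequencies from a large corpus:

```python
import numpy as np

# Toy pattern-frequency matrix: each row represents a word pair, each
# column the count of a joining pattern such as "X cuts Y" (invented data).
pairs = ["mason:stone", "carpenter:wood", "doctor:patient"]
X = np.array([
    [8.0, 5.0, 0.0, 1.0],   # mason:stone
    [7.0, 6.0, 1.0, 0.0],   # carpenter:wood
    [0.0, 1.0, 9.0, 6.0],   # doctor:patient
])

# SVD smoothing: keep only the top-k singular values to reduce noise
# in the sparse frequency counts.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_smooth = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def cosine(u, v):
    """Cosine similarity between two pattern vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

analogous = cosine(X_smooth[0], X_smooth[1])  # mason:stone vs carpenter:wood
unrelated = cosine(X_smooth[0], X_smooth[2])  # mason:stone vs doctor:patient
print(analogous > unrelated)  # the analogous pair scores higher
```

    Ranking candidate answers by this cosine score is, in outline, how the multiple-choice analogy questions are answered.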

    Features of the hidden labour in Italy. An empirical analysis based on the matching of survey and administrative data

    In this paper, record linkage is applied to detect hidden (or irregular) workers in the Italian labour market. The idea is to trace each single worker in a survey as well as in a set of administrative registers, where the worker should appear if regular. When a worker is sampled by the survey but does not appear in any of the administrative registers, that worker is identified as irregular. Subsequently, we estimate a logistic regression model to identify the individual and household features that characterize the typical hidden Italian worker.
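    The matching step can be sketched as below, assuming each worker carries a stable identifier (such as a fiscal code) that appears in both the survey and the registers; all identifiers and register contents here are invented:

```python
# Toy survey sample and administrative registers (illustrative data only).
survey = [
    {"id": "A01", "age": 34, "household": 1},
    {"id": "B02", "age": 51, "household": 2},
    {"id": "C03", "age": 27, "household": 2},
]
registers = [
    {"A01", "D04"},          # e.g. a social-security register
    {"A01", "B02", "E05"},   # e.g. an employer register
]

# A worker appearing in any register is presumed regular.
regular_ids = set().union(*registers)
for worker in survey:
    # Surveyed workers absent from every register are flagged as irregular;
    # this flag then serves as the outcome of the logistic regression.
    worker["irregular"] = worker["id"] not in regular_ids

print([w["id"] for w in survey if w["irregular"]])  # ['C03']
```

    The individual and household covariates attached to each survey record then enter the logistic model as predictors of the irregular flag.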

    European welfare states. Does decentralization affect poverty?

    The shifting of welfare systems to the local level may have positive or negative consequences. On the one hand, it may be argued that local governments better tailor welfare policies to the specific needs of the population, given their direct knowledge of the territory; on the other hand, especially in the presence of weak supervision by the central government, decentralization may cause inequalities and territorial fragmentation. The main objective of the paper is to explore the relationship between welfare state typologies - with different degrees of decentralization - and the level of monetary poverty and material deprivation of citizens. Using data from official statistics, we model individual binary outcomes (living or not under the poverty line, being or not able to make ends meet) as a function of both family-level and country-level characteristics. The empirical analysis is run on a selection of European countries for the year 2013.

    One Health in the EU: The Next Future?

    The article investigates how the One Health concept is used in the European Union and what functions are attributed to it in EU laws and policies. To this end we conduct a systematic analysis of EU laws and policy documents, with specific emphasis on the European Green Deal and its actions. The first section outlines the main conceptual features of the evolving One Health approach over time. The second section analyses how European laws and policies have considered One Health over time, showing its erratic use. The third section is dedicated to analysing how One Health is taken into account by the Green Deal’s actions. The conclusion recognises that the EU conceptualization and operationalization of One Health is far from being clear, coherent or concrete. However, we argue that a transition may be underway and One Health has the potential to become a new political and legal principle capable of permeating future EU actions towards a new phase of policy integration and sustainability.
