
    A comparative evaluation of deep and shallow approaches to the automatic detection of common grammatical errors

    This paper compares a deep and a shallow processing approach to the problem of classifying a sentence as grammatically well-formed or ill-formed. The deep processing approach uses the XLE LFG parser and English grammar: two versions are presented, one which uses the XLE directly to perform the classification, and another which uses a decision tree trained on features consisting of the XLE’s output statistics. The shallow processing approach predicts grammaticality based on n-gram frequency statistics: we present two versions, one which uses frequency thresholds and one which uses a decision tree trained on the frequencies of the rarest n-grams in the input sentence. We find that the use of a decision tree improves on the basic approach only for the deep parser-based approach. We also show that combining both the shallow and deep decision tree features is effective. Our evaluation is carried out using a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting grammatical errors into well-formed BNC sentences.
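
    The frequency-threshold version of the shallow approach can be illustrated with a minimal sketch. The abstract gives neither the n-gram order, the reference corpus, nor the threshold, so the bigram order, the two-sentence toy corpus, the threshold of 1, and all function names below are assumptions made purely for illustration, not the authors' implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train_counts(corpus, n):
    """Count n-gram frequencies over a reference corpus of tokenised sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence, n))
    return counts

def rarest_frequency(tokens, counts, n):
    """Frequency of the rarest n-gram in the sentence (0 if any n-gram is unseen)."""
    grams = ngrams(tokens, n)
    if not grams:
        # Sentence shorter than n tokens: no evidence, treated as frequency 0.
        return 0
    return min(counts[g] for g in grams)

def is_grammatical(tokens, counts, n=2, threshold=1):
    """Hypothetical rule: accept the sentence if its rarest n-gram meets the threshold."""
    return rarest_frequency(tokens, counts, n) >= threshold

# Toy reference corpus (an assumption; the paper uses BNC-derived data).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
counts = train_counts(corpus, 2)

good = is_grammatical("the cat sat on the rug".split(), counts)
bad = is_grammatical("the cat sat the on rug".split(), counts)
```

    Here the second sentence is rejected because it contains a bigram that never occurs in the reference corpus. The decision-tree version described in the abstract would replace the fixed threshold with a learned model over the same rarest-n-gram frequencies.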

    Comparing the use of edited and unedited text in parser self-training

    We compare the use of edited text in the form of newswire and unedited text in the form of discussion forum posts as sources for training material in a self-training experiment involving the Brown reranking parser and a test set of sentences from an online sports discussion forum. We find that grammars induced from the two automatically parsed corpora achieve similar Parseval f-scores, with the grammars induced from the discussion forum material being slightly superior. An error analysis reveals that the two types of grammars do behave differently.

    C-structures and f-structures for the British National Corpus

    We describe how the British National Corpus (BNC), a one hundred million word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures, using a treebank-based parsing architecture. The parsing architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase structure trees, and an annotation algorithm to automatically annotate these trees into LFG f-structures. We describe the pre-processing steps which were taken to accommodate the differences between the Penn Treebank and the BNC. Some of the issues encountered in applying the parsing architecture on such a large scale are discussed. The process of annotating a gold standard set of 1,000 parse trees is described. We present evaluation results obtained by evaluating the c-structures produced by the statistical parser against the c-structure gold standard. We also present the results obtained by evaluating the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%.
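
    For readers unfamiliar with the f-score reported for the c-structures, the following is a small sketch of a labelled-bracket f-score computed against a gold tree. The abstract does not state the exact scoring procedure, so the Parseval-style bracket matching, the `(label, start, end)` span representation, and the function name are assumptions for illustration only.

```python
from collections import Counter

def bracket_f_score(gold, predicted):
    """Harmonic mean of labelled-bracket precision and recall.

    Each bracket is a (label, start, end) tuple. Counters are used so that
    duplicate brackets are matched at most as often as they occur in both trees.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)

# Toy example: the parser mislabels one constituent (PP instead of NP).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
score = bracket_f_score(gold, predicted)
```

    With three of four brackets matching on each side, precision and recall are both 0.75, and so is the f-score. The f-structure evaluation mentioned in the abstract compares a different kind of structure and is not covered by this sketch.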

    Rank-aware, Approximate Query Processing on the Semantic Web

    Search over the Semantic Web corpus frequently leads to queries having large result sets. To discover relevant data elements, users must therefore rely on ranking techniques to sort results according to their relevance. At the same time, applications often deal with information needs that do not require complete and exact results. In this thesis, we address the problem of how to process queries over Web data in an approximate and rank-aware fashion.

    Transmission Electron Microscopy of Platelets from Apheresis and Buffy-Coat-Derived Platelet Concentrates

    Platelet concentrates are produced in order to treat bleeding disorders. They can be provided by apheresis machines or by pooling of buffy coats from four blood donations. During their manufacturing and storage, morphological alterations of platelets occur which can be demonstrated by transmission electron microscopy. Alterations range from slight and reversible changes, such as formation of small cell protrusions and swelling of the surface-connected open canalicular system, to severe structural changes, where platelets undergo transitions from discoid to ameboid shapes as a consequence of platelet activation. These alterations end in delivery of the contents of platelet granules as well as platelet involution caused by apoptosis and necrosis processes denoted as the platelet release reaction. Hereby, the involvement of the network of the open canalicular system, helping to deliver the contents of platelet granules into the surrounding milieu via pores, is distinctly shown by electron tomography. As a consequence of platelet activation, a delivery of differently sized microparticles takes place which is thought to play an important role in the adverse reactions in some recipients of platelet concentrates. In this article, the formation and delivery of platelet microparticles is illustrated by electron tomography. Above all, the ultrastructural features of platelets and megakaryocytes are discussed in the context of the molecular characteristics of the plasma membrane and organelles including the different granules and the expression of receptors engaged in signaling during platelet activation. Starting from the knowledge of the ultrastructure of resting and activated platelets, a score classification is presented, allowing the evaluation of different activation stages in a reproducible manner. Examples of evaluations of platelet concentrates using electron microscopy are briefly reviewed. 
    In the last part, experiments showing the interaction of platelets with bacteria are presented. Using the tracer ruthenium red to stain the plasma membrane and the open canalicular system of platelets as well as the bacterial wall, we show that platelets can adhere to and sequester bacteria by forming small aggregates, and can also incorporate them into compartments of the open canalicular system that are separated from the surrounding milieu. In conclusion, electron microscopy is an appropriate tool for investigating the quality of platelet concentrates. It can efficiently support results on the functional state of platelets obtained by other methods such as flow cytometry and aggregometry.

    Judging grammaticality: experiments in sentence classification

    A classifier which is capable of distinguishing a syntactically well-formed sentence from a syntactically ill-formed one has the potential to be useful in an L2 language-learning context. In this article, we describe a classifier which classifies English sentences as either well formed or ill formed using information gleaned from three different natural language processing techniques. We describe the issues involved in acquiring data to train such a classifier and present experimental results for this classifier on a variety of ill-formed sentences. We demonstrate that (a) the combination of information from a variety of linguistic sources is helpful, (b) the trade-off between accuracy on well-formed sentences and accuracy on ill-formed sentences can be fine-tuned by training multiple classifiers in a voting scheme, and (c) the performance of the classifier is varied, with better performance on transcribed spoken sentences produced by less advanced language learners.
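
    Point (b) can be pictured with a small sketch of a voting scheme. The abstract does not describe how the votes are combined, so the rule below (flag a sentence as ill formed when at least `min_votes` classifiers say so), the three hypothetical classifiers, and the toy votes are all assumptions for illustration.

```python
def vote_ill_formed(votes, min_votes):
    """Flag a sentence as ill formed if at least `min_votes` classifiers vote so.

    `votes` is a list of booleans, one per classifier (True = ill formed).
    """
    return sum(votes) >= min_votes

# Toy votes from three hypothetical classifiers for three sentences.
votes_per_sentence = {
    "s1": [True, False, False],
    "s2": [True, True, True],
    "s3": [False, False, False],
}

# A low vote requirement flags more sentences (favouring accuracy on
# ill-formed input); a high requirement flags fewer (favouring accuracy
# on well-formed input).
flagged_lenient = [s for s, v in votes_per_sentence.items() if vote_ill_formed(v, 1)]
flagged_strict = [s for s, v in votes_per_sentence.items() if vote_ill_formed(v, 3)]
```

    Moving `min_votes` between 1 and the number of classifiers is one simple way to shift the balance between the two kinds of accuracy.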

    Using NLP technology in CALL

    This paper outlines the research and guiding research principles of the (I)CALL group at Dublin City University, Ireland. Our research activities include the development of (I)CALL systems targeted at a variety of user groups, including advanced Romance language learners, intermediate to advanced German learners, primary and secondary school students, and students with L1 learning disabilities. These groups require a variety of system types that cater to individual user needs and abilities. Suitable CL/NLP technology is incorporated where appropriate for the learner.

    From news to comment: Resources and benchmarks for parsing the language of web 2.0

    We investigate the problem of parsing the noisy language of social media. We evaluate four Wall-Street-Journal-trained statistical parsers (Berkeley, Brown, Malt and MST) on a new dataset containing 1,000 phrase structure trees for sentences from microblogs (tweets) and discussion forum posts. We compare the four parsers on their ability to produce Stanford dependencies for these Web 2.0 sentences. We find that the parsers have a particular problem with tweets and that a substantial part of this problem is related to POS tagging accuracy. We attempt three retraining experiments involving Malt, Brown and an in-house Berkeley-style parser and obtain a statistically significant improvement for all three parsers.