    Detection is the central problem in real-word spelling correction

    Real-word spelling correction differs from non-word spelling correction in its aims and its challenges. Here we show that the central problem in real-word spelling correction is detection. Methods from non-word spelling correction, which focus instead on selection among candidate corrections, do not address detection adequately, because detection is either assumed in advance or heavily constrained. As we demonstrate in this paper, merely discriminating between the intended word and a random close variation of it within the context of a sentence is a task that can be performed with high accuracy using straightforward models. Trigram models are sufficient in almost all cases. The difficulty comes when every word in the sentence is a potential error, with a large set of possible candidate corrections. Despite their strengths, trigram models cannot reliably find true errors without introducing many more, at least not when used in the obvious sequential way without added structure. The detection task exposes weaknesses not visible in the selection task.
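
    A minimal sketch of the trigram comparison described above, assuming a precomputed trigram count table; the counts and confusion sets below are invented stand-ins for illustration, not the paper's model or data.

        # Python sketch: trigram-based real-word error detection.
        from collections import defaultdict

        # Toy trigram counts (w1, w2, w3) -> count, as if from a large corpus.
        TRIGRAMS = defaultdict(int, {
            ("it", "is", "their"): 40,
            ("it", "is", "there"): 5,
            ("is", "their", "turn"): 30,
            ("is", "there", "turn"): 0,
        })

        # Hypothetical confusion sets of close real-word variants.
        CONFUSABLES = {"their": {"there"}, "there": {"their"}}

        def trigram_score(words, i):
            """Sum of counts over the trigrams that contain words[i]."""
            score = 0
            for start in range(max(0, i - 2), min(i, len(words) - 3) + 1):
                score += TRIGRAMS[tuple(words[start:start + 3])]
            return score

        def detect(words):
            """Flag positions where a close variant outscores the word used.

            This is the 'obvious sequential' use of a trigram model that
            the abstract argues becomes unreliable once every word is a
            potential error.
            """
            flags = []
            for i, word in enumerate(words):
                for alt in CONFUSABLES.get(word, ()):
                    variant = words[:i] + [alt] + words[i + 1:]
                    if trigram_score(variant, i) > trigram_score(words, i):
                        flags.append((i, word, alt))
            return flags

        print(detect("it is there turn".split()))  # [(2, 'there', 'their')]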

    Rethinking a Reinvigorated Right To Assemble

    Revived after a decades-long slumber, the First Amendment’s Assembly Clause has garnered robust attention of late. Endeavoring to reinvigorate this forgotten clause, legal scholars have outlined a normative vision of the assembly right that would better safeguard the freedom of association. This Note argues that such an approach, no matter its merits or its deficiencies, overlooks the Clause’s central aim. The assembly right is in fact best understood as an assembly right, not as a right about associations. This Note advances that proposition by closely analyzing the text and the history of the Assembly Clause, a project that has not yet been systematically undertaken. The evidence unearthed from this inquiry demonstrates that the Assembly Clause seeks, as its first-order concern, to protect in-person, flesh-and-blood gatherings. Such protection is thus ultimately of great import in rethinking both the freedoms afforded and the constraints imposed on dissent within our constitutional framework.

    Cognitive apprenticeship: teaching the craft of reading, writing, and mathematics

    Includes bibliographical references (p. 25-27). This research was supported by the National Institute of Education under Contract No. US-NIE-C-400-81-0030 and the Office of Naval Research under Contract No. N00014-85-C-002.

    Judicial Attention as a Scarce Resource: A Preliminary Defense of How Judges Allocate Time Across Cases in the Federal Courts of Appeals

    Federal appellate judges no longer have the time to hear argument and draft opinions in all of their cases. The average filing rate per active judgeship now stands at 330 cases per year, more than four times what it was sixty years ago. In response, judges have adopted case management strategies that effectively involve spending significantly less time on certain classes of cases than on others. Various scholars have decried this state of affairs, suggesting that the courts have created a “bifurcated” system of justice with “separate and unequal tracks.” These reformers propose altering the relevant constraints of the courts, primarily by increasing the number of judges or decreasing the judiciary’s caseload. These approaches, however, have not gained political traction thus far and seem unlikely to in the foreseeable future. This Article takes a realist approach and argues that we should recognize judicial attention for what it is, a scarce resource, and assess whether there is evidence that the courts are allocating that resource improperly. Loosely borrowing the framework of resource allocation from the political science and economics literatures, this Article considers how to apply the concepts of inputs and outputs to the work of the federal appellate courts, suggesting judicial attention as the input and a combination of error correction and law development as the output. It then makes the preliminary case that the courts’ case management techniques in fact largely comport with an output-maximization approach, while still limiting inequality of outputs across cases. This Article concludes that the courts’ overall strategy nevertheless presents opportunities for enhancement. It suggests several improvements, focusing on the review structure of cases that receive the least amount of judicial attention, to help ensure that all federal cases receive an appropriate form of appellate review.
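
    To make the input/output framing concrete, the toy sketch below allocates a fixed budget of judge-hours greedily by output produced per hour. All numbers are invented, and the sketch is only a numerical illustration of output maximization, not the Article's actual model.

        # Python sketch: judicial attention (hours) as the scarce input,
        # converted by each case into "output" (error correction plus law
        # development) at a different, invented rate.
        CASES = [  # (case_id, output per hour of attention, hours needed)
            ("routine-A", 1.0, 2),
            ("routine-B", 1.0, 2),
            ("novel-C", 4.0, 10),
            ("novel-D", 3.0, 8),
        ]
        BUDGET = 16  # total judge-hours available

        def allocate(cases, budget):
            """Greedy output maximization: fund cases by output per hour."""
            plan = []
            for cid, rate, hours in sorted(cases, key=lambda c: -c[1]):
                spent = min(hours, budget)
                if spent:
                    plan.append((cid, spent, rate * spent))
                budget -= spent
            return plan

        for cid, hours, output in allocate(CASES, BUDGET):
            print(f"{cid}: {hours}h -> output {output:.1f}")

    Under these invented numbers the routine cases receive no hours at all, which is why the Article pairs output maximization with limits on inequality of outputs across cases.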

    DCU@TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval

    This paper describes the first participation of DCU in the TREC Medical Records Track (TRECMed). We performed some initial experiments on the 2011 TRECMed data based on the BM25 retrieval model. Surprisingly, we found that the standard BM25 model with default parameters performs comparably to the best automatic runs submitted to TRECMed 2011 and would have ranked fourth out of 29 participating groups. We expected that some form of domain adaptation would increase performance. However, results on the 2011 data proved otherwise: concept-based query expansion decreased performance, and filtering and reranking by term proximity also decreased performance slightly. We submitted four runs based on the BM25 retrieval model to TRECMed 2012, using standard BM25, standard query expansion, result filtering, and concept-based query expansion. Official results for 2012 confirm that domain-specific knowledge, as we applied it, does not increase performance over the BM25 baseline.
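
    For concreteness, the sketch below scores a toy query against toy documents with a standard BM25 formula, using the commonly cited defaults k1 = 1.2 and b = 0.75; the documents and query are invented, and the exact defaults depend on the retrieval system used.

        # Python sketch: minimal BM25 scoring over a toy "medical" corpus.
        import math
        from collections import Counter

        DOCS = [
            "patient admitted with acute chest pain".split(),
            "chest x-ray showed no acute disease".split(),
            "patient discharged in stable condition".split(),
        ]
        N = len(DOCS)
        AVGDL = sum(len(d) for d in DOCS) / N
        DF = Counter(term for doc in DOCS for term in set(doc))

        def bm25(query, doc, k1=1.2, b=0.75):
            """Score one document against a query with standard BM25."""
            tf = Counter(doc)
            score = 0.0
            for term in query:
                if term not in tf:
                    continue
                idf = math.log(1 + (N - DF[term] + 0.5) / (DF[term] + 0.5))
                norm = tf[term] * (k1 + 1) / (
                    tf[term] + k1 * (1 - b + b * len(doc) / AVGDL))
                score += idf * norm
            return score

        query = "acute chest pain".split()
        for i, doc in enumerate(DOCS):
            print(i, round(bm25(query, doc), 3))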

    Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench

    Challenges in creating comprehensive text-processing workflows include a lack of interoperability among individual components coming from different providers and/or a requirement that end users know programming techniques to compose such workflows. In this paper we demonstrate Argo, a web-based system that addresses these issues in several ways. It supports the widely adopted Unstructured Information Management Architecture (UIMA), which handles the problem of interoperability; it provides a web browser-based interface for developing workflows by drawing diagrams composed of a selection of available processing components; and it provides novel user-interactive analytics such as the annotation editor, which constitutes a bridge between automatic processing and manual correction. These features extend the target audience of Argo to users with limited or no technical background. Here, we focus specifically on the construction of advanced workflows, involving multiple branching and merging points, to facilitate various comparative evaluations. Together with the use of the user-collaboration capabilities supported in Argo, we demonstrate several use cases including visual inspections, comparisons of multiple processing segments or complete solutions against a reference standard, inter-annotator agreement, and shared-task mass evaluations. Ultimately, Argo emerges as a one-stop workbench for defining, processing, editing and evaluating text-processing tasks.
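
    The branch-and-merge pattern described above can be sketched in a language-agnostic way. The Python below is an illustrative stand-in only (Argo itself is a UIMA-based, browser-driven system, not this API): one input fans out to two hypothetical tagger components, and a merge point compares the results against a gold standard and against each other.

        # Python sketch: a branching/merging evaluation workflow.
        def tagger_a(tokens):  # stand-in processing component A
            return ["NOUN" if t.istitle() else "OTHER" for t in tokens]

        def tagger_b(tokens):  # stand-in processing component B
            return ["NOUN" if t.istitle() or t.endswith("s") else "OTHER"
                    for t in tokens]

        def agreement(tags1, tags2):
            """Observed agreement between two annotation streams."""
            return sum(a == b for a, b in zip(tags1, tags2)) / len(tags1)

        tokens = "Argo supports branching workflows in UIMA".split()
        gold = ["NOUN", "OTHER", "OTHER", "OTHER", "OTHER", "NOUN"]

        branch_a = tagger_a(tokens)  # branch point: same input, two paths
        branch_b = tagger_b(tokens)
        for name, tags in (("A", branch_a), ("B", branch_b)):  # merge point
            print(f"tagger {name} vs gold: {agreement(tags, gold):.2f}")
        print(f"inter-annotator agreement: {agreement(branch_a, branch_b):.2f}")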

    An Analysis of Source-Side Grammatical Errors in NMT

    The quality of Neural Machine Translation (NMT) has been shown to significantly degrade when confronted with source-side noise. We present the first large-scale study of state-of-the-art English-to-German NMT on real grammatical noise, by evaluating on several Grammar Correction corpora. We present methods for evaluating NMT robustness without true references, and we use them for extensive analysis of the effects that different grammatical errors have on the NMT output. We also introduce a technique for visualizing the divergence distribution caused by a source-side error, which allows for additional insights.
    Comment: Accepted and to be presented at BlackboxNLP 201
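
    The sketch below illustrates the general idea of reference-free robustness measurement: translate a clean source and a grammatically noised variant, then profile where the two outputs diverge. The translate function is a hypothetical stand-in for an NMT system, and the token-level divergence profile is illustrative, not the paper's exact method.

        # Python sketch: reference-free divergence between NMT outputs for
        # a clean source and a noised variant of the same source.
        def translate(source):
            # Canned outputs standing in for a hypothetical en-de NMT model.
            canned = {
                "she has seen the film": "sie hat den Film gesehen",
                "she have seen the film": "sie haben den Film gesehen",
            }
            return canned[source].split()

        def divergence_profile(hyp_clean, hyp_noisy):
            """Per-position mismatch flags between the two output streams."""
            length = max(len(hyp_clean), len(hyp_noisy))
            pad = lambda h: h + ["<pad>"] * (length - len(h))
            return [int(a != b) for a, b in zip(pad(hyp_clean), pad(hyp_noisy))]

        clean = translate("she has seen the film")
        noisy = translate("she have seen the film")  # subject-verb error
        profile = divergence_profile(clean, noisy)   # -> [0, 1, 0, 0, 0]
        print("divergence rate:", sum(profile) / len(profile))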