Detection is the central problem in real-word spelling correction
Real-word spelling correction differs from non-word spelling correction in
its aims and its challenges. Here we show that the central problem in real-word
spelling correction is detection. Methods from non-word spelling correction,
which focus instead on selection among candidate corrections, do not address
detection adequately, because detection is either assumed in advance or heavily
constrained. As we demonstrate in this paper, merely discriminating between the
intended word and a random close variation of it within the context of a
sentence is a task that can be performed with high accuracy using
straightforward models. Trigram models are sufficient in almost all cases. The
difficulty comes when every word in the sentence is a potential error, with a
large set of possible candidate corrections. Despite their strengths, trigram
models cannot reliably find true errors without introducing many more, at least
not when used in the obvious sequential way without added structure. The
detection task exposes weaknesses not visible in the selection task.
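The discrimination task described above (choosing between the intended word and a close variant in sentence context) can be sketched with a toy trigram model. This is an illustrative assumption-laden sketch, not the paper's implementation: the raw-count scoring stands in for a properly smoothed log-probability, and all names and the toy corpus are invented for demonstration.

```python
from collections import defaultdict

def train_trigrams(corpus):
    """Count trigrams over tokenized sentences, padded with boundary markers."""
    counts = defaultdict(int)
    for sentence in corpus:
        tokens = ["<s>", "<s>"] + sentence + ["</s>"]
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts

def sentence_score(counts, sentence):
    """Sum of trigram counts: a crude stand-in for a smoothed log-probability."""
    tokens = ["<s>", "<s>"] + sentence + ["</s>"]
    return sum(counts[tuple(tokens[i:i + 3])] for i in range(len(tokens) - 2))

def prefer_intended(counts, sentence, intended, variant, position):
    """True if the model scores the intended word above a close variant of it."""
    with_intended = sentence[:position] + [intended] + sentence[position + 1:]
    with_variant = sentence[:position] + [variant] + sentence[position + 1:]
    return sentence_score(counts, with_intended) > sentence_score(counts, with_variant)
```

Even this crude model easily prefers "the cat sat" over "the cot sat" given matching training data, which mirrors the abstract's point: pairwise discrimination is easy, while open-ended detection (every word a potential error) is where trigram models break down.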
Rethinking a Reinvigorated Right To Assemble
Revived after a decades-long slumber, the First Amendment's Assembly Clause has garnered robust attention of late. Endeavoring to reinvigorate this forgotten clause, legal scholars have outlined a normative vision of the assembly right that would better safeguard the freedom of association. This Note argues that such an approach, whatever its merits or its deficiencies, overlooks the Clause's central aim. The assembly right is in fact best understood as an assembly right, not as a right about associations. This Note advances that proposition by closely analyzing the text and the history of the Assembly Clause, a project that has not yet been systematically undertaken. The evidence unearthed from this inquiry demonstrates that the Assembly Clause seeks, as its first-order concern, to protect in-person, flesh-and-blood gatherings. Such protection is thus ultimately of great import in rethinking both the freedoms afforded and the constraints imposed on dissent within our constitutional framework.
Cognitive apprenticeship: teaching the craft of reading, writing, and mathematics
Includes bibliographical references (p. 25-27). This research was supported by the National Institute of Education under Contract No. US-NIE-C-400-81-0030 and the Office of Naval Research under Contract No. N00014-85-C-002
Judicial Attention as a Scarce Resource: A Preliminary Defense of How Judges Allocate Time Across Cases in the Federal Courts of Appeals
Federal appellate judges no longer have the time to hear argument and draft opinions in all of their cases. The average annual filing per active judgeship now stands at 330 filed cases per year, more than four times what it was sixty years ago. In response, judges have adopted case management strategies that effectively involve spending significantly less time on certain classes of cases than on others. Various scholars have decried this state of affairs, suggesting that the courts have created a "bifurcated" system of justice with "separate and unequal tracks." These reformers propose altering the relevant constraints of the courts, primarily by increasing the number of judges or decreasing the judiciary's caseload. These approaches, however, have not gained political traction thus far and seem unlikely to in the foreseeable future.
This Article takes a realist approach and argues that we should recognize judicial attention for what it is, a scarce resource, and assess whether there is evidence that the courts are allocating that resource improperly. Loosely borrowing the framework of resource allocation from the political science and economics literatures, this Article considers how to apply the concepts of inputs and outputs to the work of the federal appellate courts, suggesting judicial attention as the input and a combination of error correction and law development as the output. It then makes the preliminary case that the courts' case management techniques in fact largely comport with an output-maximization approach, while still limiting inequality of outputs across cases. This Article concludes that the courts' overall strategy nevertheless presents opportunities for enhancement. It suggests several improvements, focusing on the review structure of cases that receive the least amount of judicial attention, to help ensure that all federal cases receive an appropriate form of appellate review.
DCU@TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval
This paper describes the first participation of DCU in the TREC Medical Records Track (TRECMed). We performed some initial experiments on the 2011 TRECMed data based on the BM25 retrieval model. Surprisingly, we found that the standard BM25 model with default parameters performs comparably to the best automatic runs submitted to TRECMed 2011 and would have resulted in rank four out of 29 participating groups. We expected that some form of domain adaptation would increase performance. However, results on the 2011 data proved otherwise: concept-based query expansion decreased performance, and filtering and reranking by term proximity also decreased performance slightly. We submitted four runs based on the BM25 retrieval model to TRECMed 2012 using standard BM25, standard query expansion, result filtering, and concept-based query expansion. Official results for 2012 confirm that domain-specific knowledge does not increase performance compared to the BM25 baseline as applied by us.
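The baseline described above can be sketched as a standalone Okapi BM25 scorer. The values k1=1.2 and b=0.75 are the commonly cited defaults; the function name and toy inputs are illustrative assumptions, not the DCU system's actual code.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query.

    doc_freqs maps a term to the number of documents containing it;
    k1 and b are the commonly used default parameters.
    """
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = doc_freqs.get(term, 0)
        if df == 0 or tf[term] == 0:
            continue  # term absent from collection or document: no contribution
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1)
        numerator = tf[term] * (k1 + 1)
        denominator = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len)
        score += idf * numerator / denominator
    return score
```

Ranking every record in the collection by this score and returning the top hits is the entire "ad-hoc baseline" strategy the abstract refers to; the paper's finding is that layering domain-specific expansion and filtering on top of such a scorer did not help.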
Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench
Challenges in creating comprehensive text-processing workflows include a lack of interoperability between individual components coming from different providers and/or a requirement imposed on the end users to know programming techniques to compose such workflows. In this paper we demonstrate Argo, a web-based system that addresses these issues in several ways. It supports the widely adopted Unstructured Information Management Architecture (UIMA), which handles the problem of interoperability; it provides a web browser-based interface for developing workflows by drawing diagrams composed of a selection of available processing components; and it provides novel user-interactive analytics such as the annotation editor, which constitutes a bridge between automatic processing and manual correction. These features extend the target audience of Argo to users with a limited or no technical background. Here, we focus specifically on the construction of advanced workflows, involving multiple branching and merging points, to facilitate various comparative evaluations. Together with the use of user-collaboration capabilities supported in Argo, we demonstrate several use cases including visual inspections, comparisons of multiple processing segments or complete solutions against a reference standard, inter-annotator agreement, and shared task mass evaluations. Ultimately, Argo emerges as a one-stop workbench for defining, processing, editing and evaluating text processing tasks.
An Analysis of Source-Side Grammatical Errors in NMT
The quality of Neural Machine Translation (NMT) has been shown to
significantly degrade when confronted with source-side noise. We present the
first large-scale study of state-of-the-art English-to-German NMT on real
grammatical noise, by evaluating on several Grammar Correction corpora. We
present methods for evaluating NMT robustness without true references, and we
use them for extensive analysis of the effects that different grammatical
errors have on the NMT output. We also introduce a technique for visualizing
the divergence distribution caused by a source-side error, which allows for
additional insights. Comment: Accepted and to be presented at BlackboxNLP 201