    A large list of confusion sets for spellchecking assessed against a corpus of real-word errors

    One of the methods that has been proposed for dealing with real-word errors (errors that occur when a correctly spelled word is substituted for the one intended) is the "confusion-set" approach - a confusion set being a small group of words that are likely to be confused with one another. Using a list of confusion sets drawn up in advance, a spellchecker, on finding one of these words in a text, can assess whether one of the other members of its set would be a better fit and, if it appears to be so, propose that word as a correction. Much of the research using this approach has suffered from two weaknesses. The first is the small number of confusion sets used. The second is that systems have largely been tested on artificial errors. In this paper we address these two weaknesses. We describe the creation of a realistically sized list of confusion sets, then the assembling of a corpus of real-word errors, and then we assess the potential of that list in relation to that corpus
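
    The confusion-set idea described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three confusion sets, the invented bigram counts, the add-one smoothing, and the `margin` threshold are hypothetical and are not taken from the paper, which does not specify its scoring method here.

```python
from collections import Counter

# Hypothetical confusion sets; a realistically sized list would be far larger.
CONFUSION_SETS = [{"loose", "lose"}, {"their", "there"}, {"quiet", "quite"}]

# Invented context counts: (previous word, candidate word) -> frequency.
BIGRAM_COUNTS = Counter({
    ("you", "lose"): 50, ("you", "loose"): 1,
    ("over", "there"): 40, ("over", "their"): 2,
    ("is", "quite"): 30, ("is", "quiet"): 12,
})


def alternatives(word):
    """Return the other members of the confusion set containing `word`."""
    for members in CONFUSION_SETS:
        if word in members:
            return members - {word}
    return set()


def check(tokens, margin=5.0):
    """Flag (index, word, suggestion) where a confusable fits the context clearly better."""
    flagged = []
    for i in range(1, len(tokens)):
        prev, word = tokens[i - 1], tokens[i]
        own = BIGRAM_COUNTS[(prev, word)] + 1  # add-one smoothing
        for alt in sorted(alternatives(word)):
            score = BIGRAM_COUNTS[(prev, alt)] + 1
            # Propose the alternative only when it beats the written word by `margin`.
            if score / own >= margin:
                flagged.append((i, word, alt))
    return flagged
```

    Calling `check("you loose your bonus".split())` flags "loose" and proposes "lose", while `check("it is quiet here".split())` flags nothing because the alternative does not beat the written word by the required margin. The margin reflects a common design choice in this kind of checker: a correction is proposed only when the evidence is strong, to limit false alarms on correctly used words.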

    BNC! Handle with care! Spelling and tagging errors in the BNC

    "You loose your no-claims bonus," instead of "You lose your no-claims bonus," is an example of a real-word spelling error. One way to enable a spellchecker to detect such errors is to prime it with information about likely features of the context for "loose" (verb) as compared with "lose". To this end, we extracted all the examples of "loose" used as a verb from the BNC (World edition, text). There were, apparently, 159 occurrences of "loose" (VVB or VVI). However, on inspection, well over half of these were not verbs at all (tagging errors) and over half of the rest were misspellings of "lose". Only about 15% were actual occurrences of "loose" as a verb. This prompted us to undertake a small investigation into errors in the BNC. We report on some words that occur more often as misspellings than in their own right - only one of the 63 occurrences of "ail", for example, is correct (possibly OCR errors) - and some words that are always mistagged, such as "haulier" and "glazier" (never NN), and "hanker" and "loiter" (never VV). We note in particular that, if a rare word resembles a common word (in spelling), it is more likely to appear as a misspelling of the common word than as a correct spelling of the rare word. These cases require some modification of an earlier conclusion (Damerau and Mays, 1989) on misspellings of rare words. We conclude with a discussion of the desirability, or otherwise, of correcting errors in corpora such as the BNC. The results may be of interest to people who use the BNC as training data or for teaching


    The NEBLINE, June 2005

    Contents: Meth Production is Toxic to Communities; 2005 Perennial Plant of the Year; Climbing, Twining and Vining; Plant a Moss and Wire Hanging Basket; Itch Mite Update: Extension Will Warn When “Mite Showers” May Happen; Spider Bites? Look for Yellow Sac Spiders; Spider Bites or Skin Infection?; Scouting and Treating for Soybean Rust; The Nebraska LEAD Program; June is Noxious Weed Awareness Month; President’s Notes — Janet’s Jargon; Household Hints: Cleaning Dirty Socks; Take Time for Family Activities; FCE News & Events; Cleaning Supplies Checklist; MyPyramid: The Basics; MVP Pudding with Milk Recipe; Water is a Nutrient, Too; June is Dairy Month: MyPyramid Recommendations for Dairy Foods; Open Burning and Fire Safety; Emergency Water Purification; Obtaining Burn Permits from Lancaster County Fire Districts; Grasshopper Control; 2005 Lancaster County Fair; Clover College; Board Members of Nonprofits Have Important Responsibilities; Community CROPS Seeks Executive Director; Explore Careers at Big Red Academic Camps; Extension Calendar; Household Hazardous Waste Collections for 2005; Choose from More than 40 Nebraska 4-H Summer Camps; Spring 4-H Chess Tournament Results; Donna Bundy; Fourth Graders Learn about Agriculture at 5th Ag Awareness Festival in Lincoln; U.S. Drought Monitor Ma

    Mental Health and Compulsion

    This article looks at the role of compulsion in mental health law as it applies to civil patients. It starts by setting out the existing position and the Government’s proposals for reform as set out in the current Green Paper “Reform of the Mental Health Act 1983”. It goes on to consider principles which might be relevant to this area of law and the application of these to the Government proposals. Finally, it looks at the relevance of the European Convention on Human Rights

    Measurement precision test construction and best test design

    This article examines the precision of measurements obtained from using the Rasch Dichotomous Model to analyse test data. Considering tests in which the item difficulties are uniformly spaced from easiest to most difficult permits the derivation of an alternative expression for the standard error of measurement. This expression is sufficiently simple to enable the precision properties of uniform tests to be readily described and to enable a variety of problems of test construction to be solved. One particular problem is that of best test design. Regarding measurement precision as a property of the test only, we show that the best uniform test of a given length and a given target interval is the one that satisfies a minimax condition on the standard error. We illustrate the solution to this problem and describe properties of best tests.
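
    The minimax idea can be made concrete with a small numerical sketch. It uses the standard Rasch result that the standard error of measurement at ability b is 1/sqrt(sum of p_i(1 - p_i)), where p_i is the probability of success on item i. It does not reproduce the article's alternative closed-form expression, which the abstract does not give. The function names, the grid search over candidate widths, and the choice to centre the test on the midpoint of the target interval are assumptions made for illustration only.

```python
import math


def uniform_difficulties(length, centre, width):
    """Item difficulties evenly spaced over [centre - width/2, centre + width/2]."""
    if length == 1:
        return [centre]
    step = width / (length - 1)
    return [centre - width / 2 + i * step for i in range(length)]


def rasch_sem(ability, difficulties):
    """Standard error of measurement at `ability` under the dichotomous Rasch model."""
    information = 0.0
    for d in difficulties:
        p = 1.0 / (1.0 + math.exp(-(ability - d)))  # probability of a correct response
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)


def worst_case_sem(difficulties, lo, hi, steps=200):
    """Largest SEM over the target interval [lo, hi], evaluated on a grid."""
    return max(
        rasch_sem(lo + (hi - lo) * k / steps, difficulties)
        for k in range(steps + 1)
    )


def best_uniform_width(length, lo, hi, widths):
    """Pick the candidate width whose worst-case SEM over [lo, hi] is smallest."""
    centre = (lo + hi) / 2  # assumption: test centred on the target interval
    return min(
        widths,
        key=lambda w: worst_case_sem(uniform_difficulties(length, centre, w), lo, hi),
    )
```

    For example, `best_uniform_width(20, -2.0, 2.0, [0.5 * k for k in range(13)])` returns the candidate width of a 20-item uniform test that minimises the largest standard error over the target interval from -2 to 2 logits. A brute-force search like this only approximates the minimax solution on the chosen grids.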

    New technologies and their users: A survey of French library patrons

    This article presents the first results of a study of how new technologies are used in French libraries open to a broad public. The study is based on a survey conducted in 1998-1999 at sites in three regions: Ile-de-France, Rhône-Alpes and Provence-Alpes-Côte-d'Azur. The text singles out the two "profiles" that contrast most sharply in their familiarity with CD-ROMs and the Internet: respondents among the users of the Bibliothèque nationale de France and those of the Miramas library (Bouches-du-Rhône).

    Cultural legitimacy in question

    In the marketplace of ideas, global explanatory theories that account for human history, events, situations, behaviours or attitudes in their social and cultural dimension surface regularly, hold sway for a longer or shorter period, and then come under a reasoned critique that puts their scope into perspective. To be convincing, they require of their authors a combination of rare qualities. Rarer qualities still are needed when the gains in intelligibility thus obtained rest on a description that respects the complexity and diversity of the social world. This intellectual state of grace does sometimes occur, but seldom enough to arouse the fascination that any great virtuosity commands; it belongs to a genre we shall call "heroic", one that departs from the ordinary rules governing research in the social sciences, under which the complexity of the statement mirrors the motley universe of things. We shall not debate here (we lack the means to do so) whether there can be an optimum at which maximal intelligibility and fine-grained description coexist. Let us simply note that in the field that concerns us, the sociology of culture, totalizing theories have most often been found wanting and have met with various objections. It is therefore no surprise that the theory of legitimacy now shares the common fate, including at the hands of those who were nourished by it, and is subjected to criticism. The objections have taken different paths: that of pure theory (we have in mind here the theory of action and, for example, the work undertaken in the wake of the Economies de la grandeur); and that of empirical research, which, confronted with the most diverse cultural objects, tests the plasticity and explanatory power of an apparently very adaptable theoretical framework. It is along this latter path that we shall pursue our discussion. In doing so, and without needing to present here a synthesis retracing the well-known thread that runs from Les Héritiers, La Reproduction and L'Amour de l'art to Les Règles de l'art by way of La Distinction, we wish to stress two orders of facts: (1) the problematic link between the variables meant to account for cultural behaviours (such as schooling or belonging to a "milieu") and those behaviours; (2) the nature of the indicators through which the theory of legitimacy apprehends those behaviours: declaring a cultural practice, formulating a judgement, or "revealing it" through one's acts and attitudes. We shall refer here to studies that we summarize briefly in order to indicate the nature of the objections that seem to us to make sense, since a short article such as this one does not allow us to go further into the detail of the arguments put forward. Let us add, finally, that we shall not seek here to place the paradoxes and limits of a theory within an encompassing theoretical framework. Our preference goes to constructions of more modest size.
