1,483 research outputs found

Using Global Constraints and Reranking to Improve Cognates Detection

Author: Bloodgood Michael
Strauss Benjamin
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

Global constraints and reranking have not been used in cognates detection research to date. We propose methods for using global constraints by rescoring the score matrices produced by state-of-the-art cognates detection systems. Using global constraints to perform rescoring is complementary to state-of-the-art methods for cognates detection and results in significant performance improvements beyond current state-of-the-art performance on publicly available datasets. The improvements hold across different language pairs and various conditions, such as different levels of baseline state-of-the-art performance and different data sizes, including more realistic large data sizes than have been evaluated in the past.

Comment: 10 pages, 6 figures, 6 tables; published in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 201

arXiv.org e-Print Archive

Crossref

Translation Memory Retrieval Methods

Author: Bloodgood Michael
Strauss Benjamin
Publication date: 01/01/2014

Translation Memory (TM) systems are one of the most widely used translation technologies. An important part of a TM system is the matching algorithm that determines which translations are retrieved from the bank of available translations to assist the human translator. Although detailed accounts of the matching algorithms used in commercial systems cannot be found in the literature, it is widely believed that edit distance algorithms are used. This paper investigates and evaluates several matching algorithms, including the edit distance algorithm that is believed to be at the heart of most modern commercial TM systems. It presents results showing how well various matching algorithms correlate with human judgments of helpfulness (collected via crowdsourcing with Amazon's Mechanical Turk). A new algorithm based on weighted n-gram precision, which can be adjusted for translator length preferences, consistently returns the translations judged most helpful by translators for multiple domains and language pairs.

Comment: 9 pages, 6 tables, 3 figures; appeared in Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, April 201
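
To illustrate the edit-distance matching that the abstract says is widely believed to underlie commercial TM systems, here is a minimal sketch of fuzzy-match retrieval. It is not the paper's implementation: the whitespace tokenization, the normalization by the longer segment's length, and the names edit_distance, fuzzy_match_score, and retrieve are all assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(query, candidate):
    """Assumed scoring: 1 - word-level edit distance / longer segment length."""
    q, c = query.split(), candidate.split()
    longest = max(len(q), len(c))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(q, c) / longest


def retrieve(query, memory):
    """Return the (source, translation) pair whose source best matches query."""
    return max(memory, key=lambda pair: fuzzy_match_score(query, pair[0]))
```

With a hypothetical memory such as [("the cat sat", "le chat s'est assis"), ("open the door", "ouvrez la porte")], calling retrieve("the dog sat", memory) returns the first pair, because its source segment differs from the query by a single word.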

arXiv.org e-Print Archive

CiteSeerX

Crossref

Digital Repository at the University of Maryland

Data Cleaning for XML Electronic Dictionaries via Statistical Anomaly Detection

Author: Bloodgood Michael
Strauss Benjamin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Many important forms of data are stored digitally in XML format. Errors can occur in the textual content of the XML fields, and fixing them manually is time-consuming and expensive, especially for large amounts of data. There is increasing interest in the research, development, and use of automated techniques for assisting with data cleaning. Electronic dictionaries are an important form of data often stored in XML format, and they frequently contain errors introduced through a mixture of manual typographical entry errors and optical character recognition errors. In this paper we describe methods for flagging statistical anomalies as likely errors in electronic dictionaries stored in XML format. We describe six systems based on different sources of information. The systems detect errors using various signals in the data, including uncommon characters, text length, character-based language models, word-based language models, tied-field length ratios, and tied-field transliteration models. Four of the systems detect errors based on expectations automatically inferred from content within elements of a single field type. We call these single-field systems. Two of the systems detect errors based on correspondence expectations automatically inferred from content within elements of multiple related field types. We call these tied-field systems. For each system, we provide an intuitive analysis of the type of error that it is successful at detecting. Finally, we describe two larger-scale evaluations, one using crowdsourcing with Amazon's Mechanical Turk platform and one using the annotations of a domain expert. The evaluations consistently show that the systems are useful for improving the efficiency with which errors in XML electronic dictionaries can be detected.

Comment: 8 pages, 4 figures, 5 tables; published in Proceedings of the 2016 IEEE Tenth International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, pages 79-86, February 201
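
As a rough illustration of the simplest signal listed in the abstract, uncommon characters, the sketch below flags values of a single field that contain characters rarely seen elsewhere in that field. It is not the paper's system: the relative-frequency threshold, the function name flag_uncommon_characters, and the assumption that field values have already been extracted from the XML into a list of strings are all hypothetical.

```python
from collections import Counter


def flag_uncommon_characters(values, min_rel_freq=0.01):
    """Return (index, value) pairs containing a character that is rare in the field.

    values: text contents of all elements of one field type (assumed input).
    min_rel_freq: assumed cutoff; characters below this share of all
                  characters in the field are treated as uncommon.
    """
    counts = Counter(ch for value in values for ch in value)
    total = sum(counts.values())
    if total == 0:
        return []
    rare = {ch for ch, n in counts.items() if n / total < min_rel_freq}
    return [(i, v) for i, v in enumerate(values) if any(ch in rare for ch in v)]
```

Flagged entries are only candidates for review; a rare character can also be legitimate, so the threshold would need tuning per field.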

arXiv.org e-Print Archive

Crossref

Digital Repository at the University of Maryland

Private Sector Participation in the Water and Wastewater Services Industry

Author: Baumert Jennifer
Bloodgood Laura

Countries introduce private sector participation into the water and wastewater utilities sector for a number of reasons. The introduction of a profit motive may increase efficiency compared with public management of the water system, and private firms have been noted for customer service improvements. Financial considerations, including revenues from the sale of assets and reductions in the direct cost of providing water services, may also motivate governments to introduce private sector participation in this industry. However, because water is a basic human necessity, private participation in this sector may raise social, economic, and national security concerns. Private participation in the global water and wastewater industry can take a number of forms, including privatization, greenfield projects, concessions, leases, operation and management contracts, and outsourcing, and most countries employ a mix of methods. A handful of European firms dominate trade and investment in the global water and wastewater utilities market.

Keywords: water, wastewater, environmental services, private sector participation, Public Economics

Research Papers in Economics

G-l-o-r-y Spells Glory

Author: Bloodgood L. Mauran
EsFisher
Publication venue: DigitalCommons@UMaine
Publication date: 01/01/1903

https://digitalcommons.library.umaine.edu/mmb-vp/4863/thumbnail.jp

University of Maine