A random forest system combination approach for error detection in digital dictionaries
When digitizing a print bilingual dictionary, whether via optical character
recognition or manual entry, it is inevitable that errors are introduced into
the electronic version that is created. We investigate automating the process
of detecting errors in an XML representation of a digitized print dictionary
using a hybrid approach that combines rule-based, feature-based, and language
model-based methods. We investigate combining methods and show that using
random forests is a promising approach. We find that in isolation, unsupervised
methods rival the performance of supervised methods. Random forests typically
require training data, so we investigate how we can apply random forests to
combine individual base methods that are themselves unsupervised without
requiring large amounts of training data. Experiments reveal empirically that a
relatively small amount of data is sufficient and can potentially be further
reduced through specific selection criteria.
Comment: 9 pages, 7 figures, 10 tables; appeared in Proceedings of the
Workshop on Innovative Hybrid Approaches to the Processing of Textual Data,
April 201
Full Carbon Account for Russia.
The Forestry Project (FOR) at IIASA has produced a full carbon account (FCA) for Russia for 1990, together with scenarios for 2010. Existing carbon accounts for Russia are subject to substantial uncertainty, and Russia is critical to the global carbon balance because of its size. IIASA is well positioned to analyze Russia in depth because of the databases that the Institute has built over the years. FOR based this work on a comprehensive geographic information system comprising georeferenced descriptions of the environment and land of Russia, which in turn are based on a number of thematic, digitized maps and databases. For the Russian energy sector and other industrial sectors (except the forest industry), the project used emissions estimates from the recent IIASA study "Global Energy Perspectives" (1998). The project carried out a separate substudy for the Russian forest industry sector. According to FOR's estimate, the total fluxes (including energy and industry sectors) in Russia were a net source of 527 teragrams of carbon (Tg C) in 1990. To illustrate the possible development of the carbon pools and fluxes, FOR developed three scenarios for the period 1990-2010, reflecting different assumptions regarding Russia's GDP growth. According to these scenarios, Russia will continue to be a net source of carbon to the atmosphere, at 156-385 Tg C in 2010, including the emissions from energy and other industrial sectors. However, analysis of the FCA also shows that the carbon accounting involves considerable uncertainties. These uncertainties exceed the calculated changes in the full flux balance for the period 1990-2010. At present, this raises serious questions regarding the reliability of any accounting system used to measure terrestrial ecosystems for compliance with the Kyoto Protocol.
Machine learning assists the classification of reports by citizens on disease-carrying mosquitoes
Mosquito Alert (www.mosquitoalert.com/en) is an expert-validated citizen science platform for tracking and controlling disease-carrying mosquitoes. Citizens download a free app and use their phones to send reports of presumed sightings of two worldwide disease vector mosquito species (the Asian Tiger and the Yellow Fever mosquito). These reports are then reviewed by a team of entomologists and, once validated, added to a database. As the platform prepares to scale to much larger geographical areas and user bases, the expert validation by entomologists becomes the main bottleneck. In this paper we describe the use of machine learning on the citizen reports to automatically validate a fraction of them, thereby allowing the entomologists either to deal with larger report streams or to concentrate on those that are more strategic, such as reports from new areas (so that early warning protocols are activated) or from areas with high epidemiological risks (so that control actions to reduce mosquito populations are activated). The current prototype flags a third of the reports as “almost certainly positive” with high confidence. It is currently being integrated into the main workflow of the Mosquito Alert platform.
Postprint (published version
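The triage idea in this abstract, automatically validating only the reports a classifier is very sure about and leaving the rest to the entomologists, can be illustrated with a minimal Python sketch. This is not the Mosquito Alert implementation: the `Report` type, the `score` field (a hypothetical classifier probability), and the 0.95 threshold are all assumptions made for illustration, since the abstract does not say how confidence is computed.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    # Hypothetical classifier probability that the report shows a target species.
    score: float


def triage(reports, threshold=0.95):
    """Split reports into an auto-validated queue and an expert-review queue.

    The threshold is an assumed value, not one taken from the paper.
    """
    auto_validated = [r for r in reports if r.score >= threshold]
    needs_expert = [r for r in reports if r.score < threshold]
    return auto_validated, needs_expert


# Illustrative data only; these are not real reports or real scores.
reports = [Report("r1", 0.99), Report("r2", 0.40), Report("r3", 0.97)]
auto, manual = triage(reports)
print([r.report_id for r in auto])
print([r.report_id for r in manual])
```

Raising the threshold shrinks the auto-validated share but lowers the risk of wrongly accepting a report, which is the trade-off behind flagging only a fraction of the stream.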