Automated scholarly paper review: Technologies and challenges
Peer review is a widely accepted mechanism for research evaluation, playing a
pivotal role in scholarly publishing. However, criticisms have long been
leveled at this mechanism, mostly because of its inefficiency and subjectivity.
Recent years have seen the application of artificial intelligence (AI) in
assisting the peer review process. Nonetheless, as long as humans are involved,
such limitations remain inevitable. In this review paper, we propose the
concept and pipeline of automated scholarly paper review (ASPR) and review the
relevant literature and technologies of achieving a full-scale computerized
review process. On the basis of the review and discussion, we conclude that
there is already corresponding research and implementation at each stage of
ASPR. We further look into the challenges in ASPR with the existing
technologies. The major difficulties lie in imperfect document parsing and
representation, inadequate data, defective human-computer interaction and
flawed deep logical reasoning. Moreover, we discuss the possible moral and
ethical issues and point out the future directions of ASPR. In the foreseeable
future, ASPR and peer review will coexist in a reinforcing manner before ASPR
is able to fully take over the reviewing workload from humans.
Linguistic errors in the biomedical domain: Towards a typology of errors for Spanish
The objective of this work is the analysis of errors contained in a corpus of medical reports in
natural language and the design of a typology of errors, as there was no systematic review on
verification and correction of errors in clinical documentation in Spanish. In the development
of automatic detection and correction systems, it is of great interest to delve into the nature of
the linguistic errors that occur in clinical reports, in order to detect and treat them properly.
The results show that omission errors are the most frequent ones in the analyzed sample, and
that word length certainly influences error frequency. The typification of error patterns provided
is enabling the development of a module based on linguistic knowledge, which is currently in
progress. This will help to improve the performance of error detection and correction systems for
the biomedical domain.

This work was supported by the Spanish National Research Agency (AEI) through project LaTe4PSP (PID2019-107652RB-I00/AEI/10.13039/501100011033). Furthermore, the main author is supported by the Ministerio de Universidades of Spain through the national program Ayudas para la formación de profesorado universitario (FPU), with reference FPU16/0332.
Hybrid model of post-processing techniques for Arabic optical character recognition
Optical character recognition (OCR) is used to extract text contained in an image. One of the stages in OCR is post-processing, which corrects the errors in the OCR output text. The OCR multiple-outputs approach consists of three processes: differentiation, alignment, and voting. Existing differentiation techniques suffer from the loss of important features because they use N versions of input images. On the other hand, alignment techniques in the literature are based on approximation, while the voting process is not context-aware. These drawbacks lead to a high error rate in OCR. This research proposed three improved techniques of differentiation, alignment, and voting to overcome the identified drawbacks. These techniques were later combined into a hybrid model that can recognize the optical characters in the Arabic language. Each of the proposed techniques was separately evaluated against three other relevant existing techniques. The performance measurements used in this study were Word Error Rate (WER), Character Error Rate (CER), and Non-word Error Rate (NWER). Experimental results showed a relative decrease in error rate on all measurements for the evaluated techniques. Similarly, the hybrid model also obtained lower WER, CER, and NWER by 30.35%, 52.42%, and 47.86% respectively when compared to the three relevant existing models. This study contributes to the OCR domain, as the proposed hybrid model of post-processing techniques could facilitate the automatic recognition of Arabic text and thus lead to better information retrieval.
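The voting step in a multiple-outputs approach can be sketched as a character-level majority vote over aligned OCR hypotheses. This is a minimal illustration, not the thesis's implementation: it assumes the hypotheses have already been aligned to equal length, with gaps padded as '-'.

```python
from collections import Counter

def vote(aligned_outputs):
    """Character-level majority vote across aligned OCR hypotheses.

    Assumes the hypotheses were already aligned to equal length,
    with gaps padded as '-'. On a tie, Counter.most_common keeps
    insertion order, so the first engine's character wins.
    """
    voted = []
    for chars in zip(*aligned_outputs):
        best, _ = Counter(chars).most_common(1)[0]
        if best != "-":          # drop alignment gaps from the result
            voted.append(best)
    return "".join(voted)

# Three aligned hypotheses for the same word, from three OCR runs
print(vote(["recogn1tion", "recognition", "recoqnition"]))  # -> recognition
```

A context-aware variant, as the thesis argues for, would additionally weight each candidate character by the surrounding n-gram statistics rather than by raw counts alone.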
Evaluating automatic detection of misspellings in German
This study investigates the performance of a spell checker designed for native writers on misspellings made by second language (L2) learners. It addresses two research questions: 1) What is the correction rate of a generic spell checker for L2 misspellings? 2) What factors influence the correction rate of a generic spell checker for L2 misspellings? To explore these questions, the study considers a corpus of 1,027 unique misspellings from 48 Anglophone learners of German and classifies these along three error taxonomies: linguistic competence (competence versus performance misspellings), linguistic subsystem (lexical, morphological or phonological misspellings), and target modification (single-edit misspellings (edit distance = one) versus multiple-edit misspellings (edit distance > 1)). The study then evaluates the performance of the Microsoft Word® spell checker on these misspellings. Results indicate that only 62% of the L2 misspellings are corrected and that the spell checker, independent of other factors, generally cannot correct multiple-edit misspellings, although it is quite successful in correcting single-edit errors. In contrast to most misspellings by native writers, many L2 misspellings are multiple-edit errors and are thus not corrected by a spell checker designed for native writers. The study concludes with computational and pedagogical suggestions to enhance spell checking in CALL.
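The target-modification taxonomy above hinges on Levenshtein edit distance. A minimal sketch of classifying a misspelling as single-edit or multiple-edit follows; the German word pairs are illustrative, not drawn from the study's corpus.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def classify(misspelling, target):
    """Label a misspelling by the target-modification taxonomy."""
    return "single-edit" if edit_distance(misspelling, target) == 1 else "multiple-edit"

print(classify("Hauss", "Haus"))           # -> single-edit
print(classify("Gesichte", "Geschichte"))  # -> multiple-edit
```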
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
Automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu
Automatic speech recognition (ASR) is potentially helpful for children who suffer from dyslexia. Highly phonetically similar errors in dyslexic children's reading affect the accuracy of ASR. Thus, this study aims to evaluate the accuracy of ASR using automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu (BM). To that end, three objectives were set: first, to produce manual transcription and phonetic labelling; second, to construct automatic transcription and phonetic labelling using forced alignment; and third, to compare the accuracy obtained with automatic transcription and phonetic labelling against that obtained with manual transcription and phonetic labelling. To accomplish these goals, the methods used included manual speech labelling and segmentation, forced alignment, Hidden Markov Models (HMM) and Artificial Neural Networks (ANN) for training, and, to measure the accuracy of ASR, Word Error Rate (WER) and False Alarm Rate (FAR). A total of 585 speech files were used for the manual transcription, forced alignment, and training experiments. The ASR engine using automatic transcription and phonetic labelling obtained an optimum accuracy of 76.04%, with a WER as low as 23.96% and a FAR of 17.9%. These results are very similar to those of the ASR engine using manual transcription, namely 76.26% accuracy, a WER of 23.97%, and a FAR of 17.9%. In conclusion, the accuracy of automatic transcription and phonetic labelling is acceptable for use in helping dyslexic children learn with ASR in Bahasa Melayu (BM).
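WER, the headline metric in this and several of the surrounding abstracts, is the word-level Levenshtein distance between hypothesis and reference, normalised by the reference length. A small self-contained sketch (the Malay sentence pair is an invented example, not from the study's data):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance divided by
    the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        curr = [i]
        for j, hw in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (rw != hw)))   # substitution
        prev = curr
    return prev[-1] / len(ref)

# One substituted word out of four -> WER = 0.25
print(wer("saya suka membaca buku", "saya suka membawa buku"))
```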
A post processing system for global correction of Ocr generated errors
This thesis discusses the design and implementation of an OCR post-processing system. The system is used to perform automatic spelling detection and correction on noisy, OCR-generated text. Unlike previous post-processing systems, this system works in conjunction with an inverted file database system. The initial results obtained from post-processing 10,000 pages of OCR'ed text are encouraging. These results indicate that global and local document information extracted from the inverted file system can be effectively used to correct OCR-generated spelling errors.
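The idea of exploiting global collection statistics can be sketched as picking, among vocabulary words within one edit of an out-of-vocabulary OCR token, the one with the highest collection frequency. The frequency table below is a hypothetical stand-in for what the inverted file system would supply; it is not the thesis's actual algorithm.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def correct(token, collection_freq, max_dist=1):
    """Return the most frequent vocabulary word within max_dist edits,
    or the token itself if it is known or has no close neighbour."""
    if token in collection_freq:
        return token
    candidates = [(w, f) for w, f in collection_freq.items()
                  if edit_distance(token, w) <= max_dist]
    return max(candidates, key=lambda wf: wf[1])[0] if candidates else token

# Hypothetical collection frequencies from the inverted file
freqs = {"database": 120, "databases": 45, "date": 300}
print(correct("datebase", freqs))  # -> database
```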
OCRspell: An interactive spelling correction system for OCR errors in text
In this thesis we describe a spelling correction system designed specifically for OCR (Optical Character Recognition) generated text that selects candidate words through the use of information gathered from multiple knowledge sources. This system for text correction is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of the new system is presented as well.
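Confusion-based candidate ranking of the kind described can be sketched as a noisy-channel score: a word prior combined with learned confusion probabilities. Both the confusion table and the priors below are invented for illustration; OCRspell gathers such statistics at the collection and document levels.

```python
import math

# Hypothetical character confusion probabilities P(ocr_char | true_char);
# a system like OCRspell would learn these per collection and document.
CONFUSION = {("i", "1"): 0.2, ("o", "0"): 0.15}

def score(candidate, ocr_token, prior):
    """Noisy-channel score: log prior plus per-character confusion
    log-likelihoods (equal-length candidates only, for brevity)."""
    if len(candidate) != len(ocr_token):
        return float("-inf")
    logp = math.log(prior)
    for true_ch, ocr_ch in zip(candidate, ocr_token):
        if true_ch == ocr_ch:
            continue
        p = CONFUSION.get((true_ch, ocr_ch))
        if p is None:            # substitution never observed
            return float("-inf")
        logp += math.log(p)
    return logp

# Invented unigram priors for two candidate corrections of "f1rst"
priors = {"first": 1e-4, "frost": 1e-5}
best = max(priors, key=lambda w: score(w, "f1rst", priors[w]))
print(best)  # -> first
```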