11 research outputs found
The use of new technologies to access handwritten historical information in digital form. Galeón Project
Historical research in archives requires an extensive effort of reviewing thousands of documents that, in many cases, have no connection to the subject under study, generating a significant expenditure of time and resources. To address this problem in relation to the study of underwater archaeological heritage, the CAS-IAPH devised the Galleon Project, whose goal is to develop innovative solutions for querying large digitized collections of handwritten historical documents. Automated transcription of a large volume of handwritten document images is not currently feasible, but technological progress in the field of word shape recognition can simplify this process. To that end, a theoretical Keyword Spotting model based on Word Graphs has been designed which, beyond maritime cultural heritage, could be applied to other research topics.
Spotting Keywords in Offline Handwritten Documents Using Hausdorff Edit Distance
Keyword spotting has become a crucial topic in handwritten document recognition, enabling content-based retrieval of scanned documents using search terms. Given a query keyword, one can search and index the digitized handwriting, which in turn facilitates the understanding of manuscripts. Common automated techniques address the keyword spotting problem through statistical representations.
Structural representations such as graphs capture the complex structure of handwriting. However, they are rarely used, particularly in keyword spotting, due to high computational costs. The graph edit distance, a powerful and versatile method for matching any type of labeled graph, has exponential time complexity for computing graph similarity. Hence, its use is constrained to small graphs.
The recently developed Hausdorff edit distance algorithm approximates the graph edit distance with quadratic time complexity by efficiently matching local substructures. This dissertation posits that the Hausdorff edit distance could be a promising alternative to other template-based keyword spotting approaches in terms of computational time and accuracy. Accordingly, the core contribution of this thesis is the investigation and development of a graph-based keyword spotting technique built on the Hausdorff edit distance algorithm. The high representational power of graphs combined with the efficiency of the Hausdorff edit distance for graph matching achieves a remarkable speedup as well as high accuracy. In a comprehensive experimental evaluation, we demonstrate the solid performance of the proposed graph-based method compared with the state of the art, concerning both precision and speed.
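To illustrate the quadratic-time idea, a simplified, node-only variant of the Hausdorff edit distance can be sketched as follows. Real systems also charge costs for edges and labels; the coordinate-labelled nodes and the unit deletion cost here are assumptions made only for the example.

```python
import math

def hausdorff_edit_distance(nodes_a, nodes_b, node_cost=1.0):
    """Simplified, node-only sketch of the Hausdorff edit distance.

    nodes_a, nodes_b: lists of (x, y) coordinates of graph nodes
    (e.g. handwriting keypoints). Each node is matched to its cheapest
    counterpart in the other graph or deleted at cost `node_cost`;
    substitution cost is half the Euclidean distance so each matched
    pair is charged once from either side. Quadratic in the node count,
    unlike the exponential exact graph edit distance.
    """
    def directed(src, dst):
        total = 0.0
        for u in src:
            best = node_cost  # fall back to deleting u
            for v in dst:
                d = math.dist(u, v) / 2.0  # half-cost substitution
                if d < best:
                    best = d
            total += best
        return total

    return directed(nodes_a, nodes_b) + directed(nodes_b, nodes_a)
```

For identical node sets the distance is zero, and for two far-apart single-node graphs it degenerates to one deletion plus one insertion, matching the lower-bound character of the approximation.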
The second contribution of this thesis is a keyword spotting technique that combines dynamic time warping with the Hausdorff edit distance. The structural representation of the graph-based approach and the statistical representation of geometric features complement each other, yielding a more accurate system. The proposed system has been extensively evaluated with four types of handwriting graphs and geometric feature vectors on benchmark datasets. The experiments demonstrate a performance boost that outperforms the individual systems.
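The statistical half of such a combined system rests on dynamic time warping. A minimal sketch of DTW over 1-D feature sequences, with a hypothetical linear fusion function standing in for the thesis's actual combination rule, might look like:

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Classic O(len(a) * len(b)) dynamic program; the local cost is the
    absolute difference between frame features.
    """
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            # extend the cheapest of the three admissible warping steps
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def fused_score(dtw_score, graph_score, alpha=0.5):
    """Hypothetical linear fusion of two pre-normalised distances."""
    return alpha * dtw_score + (1 - alpha) * graph_score
```

Note how warping absorbs local stretching: a repeated frame in one sequence adds no cost, which is exactly why DTW suits handwriting of varying width.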
Proceedings of the 4th International Workshop on Reading Music Systems
The International Workshop on Reading Music Systems (WoRMS) is a workshop
that tries to connect researchers who develop systems for reading music, such
as in the field of Optical Music Recognition, with other researchers and
practitioners that could benefit from such systems, like librarians or
musicologists.
The relevant topics of interest for the workshop include, but are not limited
to: Music reading systems; Optical music recognition; Datasets and performance
evaluation; Image processing on music scores; Writer identification; Authoring,
editing, storing and presentation systems for music scores; Multi-modal
systems; Novel input-methods for music to produce written music; Web-based
Music Information Retrieval services; Applications and projects; Use-cases
related to written music.
These are the proceedings of the 4th International Workshop on Reading Music
Systems, held online on Nov. 18th, 2022. Comment: Proceedings edited by Jorge
Calvo-Zaragoza, Alexander Pacha and Elona Shatri.
Mapping (Dis-)Information Flow about the MH17 Plane Crash
Digital media enables not only fast sharing of information, but also
disinformation. One prominent case of an event leading to circulation of
disinformation on social media is the MH17 plane crash. Studies analysing the
spread of information about this event on Twitter have focused on small,
manually annotated datasets, or used proxies for data annotation. In this work,
we examine to what extent text classifiers can be used to label data for
subsequent content analysis; in particular, we focus on predicting pro-Russian
and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though
we find that a neural classifier improves over a hashtag based baseline,
labeling pro-Russian and pro-Ukrainian content with high precision remains a
challenging problem. We provide an error analysis underlining the difficulty of
the task and identify factors that might help improve classification in future
work. Finally, we show how the classifier can facilitate the annotation task
for human annotators.
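To illustrate why a hashtag-based baseline is brittle compared with a trained classifier, such a baseline can be sketched as follows. The hashtag lists are invented for the example and are not the lexicon used in the study.

```python
# Illustrative hashtag-based stance baseline for tweets.
# The hashtag sets below are made-up examples, not the study's lexicon.
PRO_RUSSIAN = {"#mh17lies", "#kievshotdownmh17"}
PRO_UKRAINIAN = {"#russiainvadedukraine", "#putinsplane"}

def hashtag_baseline(tweet_text):
    """Label a tweet by counting stance-bearing hashtags; 'neutral' on ties.

    Tweets that carry no listed hashtag get no signal at all, which is
    the main weakness such a baseline has against a content classifier.
    """
    tokens = {t.lower() for t in tweet_text.split()}
    pro_ru = len(tokens & PRO_RUSSIAN)
    pro_ua = len(tokens & PRO_UKRAINIAN)
    if pro_ru > pro_ua:
        return "pro-russian"
    if pro_ua > pro_ru:
        return "pro-ukrainian"
    return "neutral"
```

A neural text classifier, by contrast, can score the wording of the tweet itself, at the cost of needing labeled training data.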
Data-Driven Analytical Models for Identification and Prediction of Opportunities and Threats
During the lifecycle of mega engineering projects such as energy facilities,
infrastructure projects, or data centers, the executives in charge should take into
account the potential opportunities and threats that could affect the execution of such
projects. These opportunities and threats can arise from different domains, including,
for example, geopolitical, economic, or financial, and can have an impact on different
entities, such as countries, cities, or companies. The goal of this research is to provide
a new approach to identify and predict opportunities and threats using large and diverse
data sets, and ensemble Long-Short Term Memory (LSTM) neural network models to
inform domain specific foresights. In addition to predicting the opportunities and
threats, this research proposes new techniques to help decision-makers for deduction
and reasoning purposes. The proposed models and results provide structured output to
inform the executive decision-making process concerning large engineering projects
(LEPs). This research proposes new techniques that provide not only reliable time-series
predictions but also uncertainty quantification to help make more informed decisions.
The proposed ensemble framework consists of the following components: first,
processed domain knowledge is used to extract a set of entity-domain features; second,
structured learning, based on Dynamic Time Warping (DTW) to learn similarity
between sequences and on Hierarchical Clustering Analysis (HCA), is used to determine
which features are relevant for a given prediction problem; and finally, an automated
decision based on the input and structured learning from the DTW-HCA is used to
build a training data-set which is fed into a deep LSTM neural network for time-series
predictions. A set of deeper ensemble programs is proposed, such as Monte Carlo
Simulations and Time Label Assignment, to offer a controlled setting for assessing the
impact of external shocks and a temporal alert system, respectively. The developed
model can be used to inform decision makers about the set of opportunities and threats
that their entities and assets face as a result of being engaged in an LEP accounting for
epistemic uncertainty.
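The Monte Carlo component for assessing external shocks can be illustrated with a minimal sketch. The additive Gaussian shock model and the 95% interval below are assumptions made for the example, not the thesis's exact procedure, which wraps a trained LSTM forecaster.

```python
import random

def monte_carlo_interval(point_forecast, shock_std, n_sims=10000, seed=7):
    """Monte Carlo sketch of uncertainty quantification around a forecast.

    Applies an additive Gaussian 'external shock' to each forecast step
    and returns per-step (low, high) bounds covering roughly 95% of the
    simulated outcomes. The shock model is an illustrative assumption.
    """
    rng = random.Random(seed)
    sims = [
        [x + rng.gauss(0.0, shock_std) for x in point_forecast]
        for _ in range(n_sims)
    ]
    intervals = []
    for step in range(len(point_forecast)):
        draws = sorted(s[step] for s in sims)
        lo = draws[int(0.025 * n_sims)]   # 2.5th percentile
        hi = draws[int(0.975 * n_sims)]   # 97.5th percentile
        intervals.append((lo, hi))
    return intervals
```

Decision-makers then read each interval as the range an entity's metric could plausibly take under shocks of the assumed magnitude, rather than relying on the point forecast alone.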
Adapting BLSTM Neural Network Based Keyword Spotting Trained on Modern Data to Historical Documents
Being able to search for words or phrases in historic handwritten documents is of paramount importance when preserving cultural heritage. Storing scanned pages of written text can save the information from degradation, but it does not make the textual information readily available. Automatic keyword spotting systems for handwritten historic documents can fill this gap. However, most such systems have trouble with the great variety of writing styles; it is not uncommon for handwriting processing systems to be built for just a single book. In this paper we show that neural network based keyword spotting systems are flexible enough to be used successfully on historic data, even when they are trained on a modern handwriting database. We demonstrate that with a small amount of transcribed historic text added to the training set, the performance can be further enhanced.
EVALITA Evaluation of NLP and Speech Tools for Italian Proceedings of the Final Workshop
Editor of the proceedings of EVALITA 2016