587 research outputs found
Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts
This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging unsolved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but can also be used for almost any alphabetic writing system. The framework has been rigorously tested on Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how it can be adapted to other scripts.
The base of this design is a character spotting network which detects the location of different script elements (such as characters, diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution.
Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla, and it has been made publicly accessible to accelerate research progress. Many other tools were developed and experiments were conducted to validate this framework more rigorously by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead.
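The transcript-forming step described in this abstract (detected classes ordered by their location) can be illustrated with a minimal sketch. The abstract does not specify the ordering rule, so the left-to-right sort, the `Detection` fields, and the `min_score` threshold below are all assumptions made for illustration; a script with a two-dimensional layout such as Korean would need a different ordering.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One hypothetical output of a character spotting network."""
    label: str       # predicted script element (character or diacritic)
    x_center: float  # horizontal position in the word image, in pixels
    score: float     # detector confidence in [0, 1]


def build_transcript(detections, min_score=0.5):
    """Keep confident detections, order them left to right, join the labels.

    Assumption: reading order equals ascending x_center.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


# Hypothetical detections for a three-character word
detections = [
    Detection("c", 40.0, 0.91),
    Detection("a", 12.5, 0.97),
    Detection("x", 25.0, 0.20),  # below the threshold, so it is dropped
    Detection("b", 27.0, 0.88),
]
print(build_transcript(detections))  # prints "abc"
```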
Studies in Historical Documents from Nepal and India
study of religion|indology|anthropology|history|tibetolog
A High-Resolution Paleoenvironmental and Paleoclimatic History of Extreme Events on the Laminated Sediment Record from Basin Pond, Fayette, Maine, U.S.A.
Future impacts from climate change can be better understood by placing modern climate trends into perspective through extension of the short instrumental records of climate variability. This is especially true for extreme climatic events, such as extreme precipitation and wildfires, as the period of instrumental records provides only a few examples and these have likely been influenced by anthropogenic warming. Multi-parameter records showing the past range of climate variability can be obtained from lakes. Lakes are particularly good recorders of climate variability because sediment from the surrounding environment accumulates in them, providing high-resolution histories of past local environmental conditions. In some cases, such as at Basin Pond, sediment is preserved efficiently enough to produce distinguishable annual laminations (varves) in the sedimentary record. The varved record at Basin Pond was used to construct an accurate, highly resolved age-to-depth model over the past 300 years.
Using a multi-proxy analysis, including organic biomarker analysis of molecular compounds and sedimentological features preserved in the sediment record, a history of environmental and climatic change at Basin Pond was constructed. These analyses were compared with the record of known extreme events (from instrumental measurements and historical documents), including 129 years of high-resolution precipitation and temperature meteorological data, 19 tropical systems over the past 145 years, and two known wildfire events over the past 200 years. Long-term trends in precipitation, including the increase in precipitation seen throughout the last half of the 20th century and the drought of the 1940s, were captured in the analysis of long-chain n-alkane distributions and through varve thickness measurements obtained through X-Ray Fluorescence analysis. Furthermore, Polycyclic Aromatic Hydrocarbons (PAHs), a class of organic compounds that can be used to trace combustion activity, were found in abundance in the Basin Pond sedimentary record. Peaks in the abundances of two PAHs (retene and chrysene) and in the ratio retene/(retene + chrysene) were found to be highly correlated with the known wildfire events of the historical period, showing promise for using these compounds and this ratio as a robust proxy for regional wildfire events in the northeastern U.S.
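The ratio retene/(retene + chrysene) named in this abstract is simple to compute per sample. The sketch below only illustrates the arithmetic; the sample names and abundance values are hypothetical and do not come from the Basin Pond record.

```python
def retene_ratio(retene, chrysene):
    """Return retene / (retene + chrysene), or None if both abundances are zero."""
    total = retene + chrysene
    if total == 0:
        return None  # ratio is undefined when neither compound was detected
    return retene / total


# Hypothetical abundances (arbitrary units) as (retene, chrysene) pairs
samples = {
    "sample_a": (3.0, 1.0),
    "sample_b": (0.5, 1.5),
    "sample_c": (0.0, 0.0),
}
ratios = {name: retene_ratio(r, c) for name, (r, c) in samples.items()}
print(ratios)
```

Because the ratio is bounded between 0 and 1, it can be compared across samples with different total PAH abundances.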
Knowledge-based and data-driven approaches for geographical information access
Geographical Information Access (GeoIA) can be defined as a way of retrieving information from textual collections that includes the automatic analysis and interpretation of the geographical constraints and terms present in queries and documents. This PhD thesis presents, describes and evaluates several heterogeneous approaches for the following three GeoIA tasks: Geographical Information Retrieval (GIR), Geographical Question Answering (GeoQA), and Textual Georeferencing (TG). The GIR task deals with user queries that search over documents (e.g. "vineyards in California") and the GeoQA task treats questions that retrieve answers (e.g. "What is the capital of France?"). TG, on the other hand, is the task of associating one or more georeferences (such as polygons or coordinates in a geodetic reference system) with electronic documents.
Current state-of-the-art AI algorithms do not yet fully understand the semantic meaning and the geographical constraints and terms present in queries and document collections. This thesis attempts to improve the effectiveness of GeoIA tasks by: 1) improving the detection, understanding, and use of a part of the geographical and the thematic content of queries and documents with Toponym Recognition, Toponym Disambiguation and Natural Language Processing (NLP) techniques, and 2) combining Geographical Knowledge-Based Heuristics based on common sense with Data-Driven IR algorithms.
The main contributions of this thesis to the state-of-the-art of GeoIA tasks are:
1) The presentation of 10 novel approaches for GeoIA tasks: 3 approaches for GIR, 3 for GeoQA, and 4 for Textual Georeferencing (TG).
2) The evaluation of these novel approaches in three contexts: within official evaluation benchmarks, in post-benchmark experiments with the test collections, and with other specific datasets. Most of these algorithms have been evaluated in international evaluations and some of them achieved top-ranked state-of-the-art results, including top-performing results in GIR (GeoCLEF 2007) and TG (MediaEval 2014) benchmarks.
3) The experiments reported in this PhD thesis show that the approaches can effectively combine Geographical Knowledge and NLP with Data-Driven techniques to improve the effectiveness measures of the three Geographical Information Access tasks investigated.
4) TALPGeoIR: a novel GIR approach that combines Geographical Knowledge ReRanking (GeoKR), NLP and Relevance Feedback (RF) and that achieved state-of-the-art results in official GeoCLEF benchmarks (Ferrés and Rodríguez, 2008; Mandl et al., 2008) and subsequent experiments (Ferrés and Rodríguez, 2015a). This approach has been evaluated with the full GeoCLEF corpus (100 topics) and showed that GeoKR, NLP, and RF techniques, evaluated separately or in combination, improve the MAP and R-Precision effectiveness measures of the state-of-the-art IR algorithms TF-IDF, BM25 and InL2, with statistical significance in most of the experiments.
5) GeoTALP-QA: a scope-based GeoQA approach for Spanish and English and its evaluation with a set of questions on Spanish geography (Ferrés and Rodríguez, 2006).
6) Four Textual Georeferencing approaches for informal and formal documents that achieved state-of-the-art results in evaluation benchmarks (Ferrés and Rodríguez, 2014) and subsequent experiments (Ferrés and Rodríguez, 2011; Ferrés and Rodríguez, 2015b).
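The MAP and R-Precision effectiveness measures mentioned in this abstract follow standard information retrieval definitions, sketched below. The document identifiers and relevance judgments are hypothetical and are not taken from the GeoCLEF collections.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked, relevant):
    """Precision within the top R results, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def mean_average_precision(runs):
    """Average of average_precision over (ranking, relevant set) pairs, one per topic."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical rankings for two topics
topic_1 = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
topic_2 = (["d5", "d6"], {"d6"})

print(average_precision(*topic_1))
print(r_precision(*topic_1))
print(mean_average_precision([topic_1, topic_2]))
```

For the first topic the relevant documents sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2, while R-Precision looks only at the top two results and finds one relevant document.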
Postprint (published version
Exploring Written Artefacts
This collection, presented to Michael Friedrich in honour of his academic career at the Centre for the Study of Manuscript Cultures, traces key concepts that scholars associated with the Centre have developed and refined for the systematic study of manuscript cultures. At the same time, the contributions showcase the possibilities of expanding the traditional subject of ‘manuscripts’ to the larger perspective of ‘written artefacts’.
Methods in Contemporary Linguistics
The present volume is a broad overview of methods and methodologies in linguistics, illustrated with examples from concrete research. It collects insights gained from a wide range of linguistic sub-disciplines, from core disciplines to topics in cross-linguistic and language-internal diversity and to contributions on language, space and society. Given its critical and innovative nature, the volume is a valuable source for students and researchers with a wide variety of linguistic interests.
Arcticness
Climate change and globalisation are opening up the Arctic for exploitation by the world – or so we are told. But what about the views, interests and needs of the peoples who live in the region? This volume explores the opportunities and limitations in engaging with the Arctic under change, and the Arctic peoples experiencing the changes, socially and physically.
With essays by both academics and Arctic peoples, integrating multiple perspectives and multiple disciplines, the book covers social, legal, political, geographical, scientific and creative questions related to Arcticness, to address the challenges faced by the Arctic as a region and specifically by local communities. As well as academic essays, the contributions to the book include personal reflections, a graphic essay, and poetry, to ensure wide and varied coverage of the Arctic experience – what the contributions all have in common is the fundamental human perspective.
Topics covered in the essays include indigenous identity and livelihoods such as reindeer herding, and adapting to modern identities; a graphic essay on the experience of Arctic indigenous peoples in residential schools; the effects of climate change; energy in the Arctic; and extractive industries and their impacts on local communities.
The book includes reflections on the future of Arcticness, engaging with communities to ensure meaningful representation and as a counterpoint to the primacy of environmental, national and global issues.