Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain-specific, textual resources that report on facts and events from the
past; among these, memoirs are a very common type of primary source. In this
paper, we present an approach for extracting information from Italian
historical war memoirs and turning it into structured knowledge, based on the
semantic notions of events, participants, and roles. We evaluate each of the
key steps of our approach quantitatively and provide a graph-based
representation of the extracted knowledge, which allows readers to move
between a Close and a Distant Reading of the collection.
Comment: 23 pages, 6 figures
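The abstract does not include the data model itself; as a rough illustration of how event-participant-role triples might be stored for graph-based exploration, one could use a structure like the following (all names and the example mention are hypothetical, not taken from the paper):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Participant:
    name: str

@dataclass
class Event:
    trigger: str  # e.g. the verb anchoring the event mention
    roles: dict = field(default_factory=dict)  # role label -> Participant

    def add_role(self, role: str, participant: Participant) -> None:
        self.roles[role] = participant

# One extracted mention, e.g. "the battalion crossed the Piave"
event = Event(trigger="attraversare")
event.add_role("AGENT", Participant("battaglione"))
event.add_role("LOCATION", Participant("Piave"))

# Distant Reading: aggregate over all events; Close Reading: inspect one.
print(event.roles)
```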
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
The goal of this work is to design a machine translation (MT) system for a
low-resource family of dialects, collectively known as Swiss German, which are
widely spoken in Switzerland but seldom written. We collected a significant
number of parallel written resources to start with, totalling about 60k words,
and identified several other promising data sources for Swiss German. We then
designed and compared three strategies for normalizing Swiss German input in
order to address its regional diversity, and found that character-based neural
MT was the best solution for text normalization. In combination with
phrase-based statistical MT, our solution reached a BLEU score of 36% when
translating from the Bernese dialect. This score, however, decreases as the
test data becomes more distant from the training data, both geographically and
topically. These resources and normalization techniques are a first step
towards full MT of Swiss German dialects.
Comment: 11th Language Resources and Evaluation Conference (LREC), 7-12 May
2018, Miyazaki (Japan)
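The normalize-then-translate pipeline is only named in the abstract; the sketch below shows its overall shape, with a toy lookup table standing in for the character-based neural normalizer and a stub for the phrase-based MT step. The table entries and the `translate` stub are invented for illustration; only the BLEU evaluation uses a real library (sacrebleu).

```python
import sacrebleu

# Toy stand-in for the character-based neural normalizer: map dialectal
# spellings to normalized forms before translation (entries are hypothetical).
NORM_TABLE = {"chli": "klein", "hüt": "heute", "nöd": "nicht"}

def normalize(sentence: str) -> str:
    """Normalize dialectal tokens before handing them to the MT system."""
    return " ".join(NORM_TABLE.get(tok, tok) for tok in sentence.split())

def translate(sentence: str) -> str:
    """Placeholder for the phrase-based statistical MT step."""
    return sentence  # a real system would translate the normalized text

dialect_input = ["hüt isch es chli chalt"]
references = [["heute ist es ein wenig kalt"]]  # one stream, one ref per input

hypotheses = [translate(normalize(s)) for s in dialect_input]
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")
```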
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
On-line Handwritten Character Recognition: An Implementation of Counterpropagation Neural Net
On-line handwritten scripts are usually captured as pen-tip traces from pen-down to pen-up positions. The temporal evolution of the pen coordinates is also considered along with trajectory information. However, the data obtained need extensive preprocessing, including filtering, smoothing, slant removal, and size normalization, before the recognition process. Instead of such lengthy preprocessing, this paper presents a simple approach to extracting the useful character information. This work evaluates the use of the counterpropagation neural network (CPN) and presents the feature extraction mechanism in full detail for on-line handwriting recognition. The obtained recognition rates were 60% to 94% using the CPN for different sets of character samples. This paper also describes a performance study in which a recognition mechanism with multiple thresholds is evaluated for the counterpropagation architecture. The results indicate that the application of multiple thresholds has a significant effect on the recognition mechanism. The method is applicable to off-line character recognition as well. The technique was tested on upper-case English alphabets in a number of different styles from different writers.
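The abstract does not define the CPN; in its standard textbook form, a counterpropagation network pairs a winner-take-all Kohonen layer with a Grossberg outstar layer, and a distance threshold can reject uncertain inputs, which is one way to read the "multiple thresholds" idea. The sketch below follows that standard form; layer sizes, learning rates, and the rejection rule are illustrative, not the paper's settings.

```python
import numpy as np

class CPN:
    """Minimal forward-only counterpropagation network (textbook form)."""

    def __init__(self, n_in, n_hidden, n_out, rng=None):
        rng = rng or np.random.default_rng(0)
        self.kohonen = rng.normal(size=(n_hidden, n_in))  # competitive layer
        self.grossberg = np.zeros((n_hidden, n_out))      # outstar layer

    def winner(self, x):
        # Winner-take-all: the Kohonen unit closest to the input wins.
        return int(np.argmin(np.linalg.norm(self.kohonen - x, axis=1)))

    def train_step(self, x, target, a=0.1, b=0.1):
        j = self.winner(x)
        self.kohonen[j] += a * (x - self.kohonen[j])         # move toward input
        self.grossberg[j] += b * (target - self.grossberg[j])  # learn mapping

    def predict(self, x, reject_threshold=None):
        j = self.winner(x)
        # Threshold-based rejection (illustrative): refuse to classify inputs
        # that lie too far from the winning prototype.
        if reject_threshold is not None:
            if np.linalg.norm(self.kohonen[j] - x) > reject_threshold:
                return None  # "unrecognized"
        return self.grossberg[j]
```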
A Multiple-Expert Binarization Framework for Multispectral Images
In this work, a multiple-expert binarization framework for multispectral
images is proposed. The framework is based on a constrained subspace
selection, limited to the spectral bands, combined with state-of-the-art
gray-level binarization methods. The framework uses a binarization wrapper to
enhance the performance of the gray-level binarization, and nonlinear
preprocessing of the individual spectral bands to enhance the textual
information. An evolutionary optimizer is used to obtain the optimal and some
suboptimal 3-band subspaces, from which an ensemble of experts is then formed.
The framework is applied to a ground-truth multispectral dataset with
promising results. In addition, a generalization of the cross-validation
approach is developed that not only evaluates the generalizability of the
framework but also provides a practical instance of the selected experts that
can then be applied to unseen inputs, despite the small size of the given
ground-truth dataset.
Comment: 12 pages, 8 figures, 6 tables. Presented at ICDAR'1
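As a rough illustration of the band-subspace ensemble idea, and not the authors' method, the sketch below replaces the evolutionary optimizer with exhaustive enumeration under an invented variance-based fitness, uses Otsu as a stand-in for the gray-level binarizers, and combines experts by majority vote.

```python
from itertools import combinations
import numpy as np
from skimage.filters import threshold_otsu

def binarize_subspace(cube, bands):
    """Project a multispectral cube (H, W, B) onto a 3-band subspace,
    reduce to gray level, and apply a standard binarizer (Otsu here)."""
    gray = cube[..., list(bands)].mean(axis=-1)
    return gray > threshold_otsu(gray)

def ensemble_binarize(cube, n_experts=5):
    h, w, n_bands = cube.shape
    # Stand-in for the evolutionary optimizer: score every 3-band subspace
    # by gray-level variance (a purely illustrative fitness) and keep the
    # top n_experts as the ensemble.
    scored = sorted(
        combinations(range(n_bands), 3),
        key=lambda b: cube[..., list(b)].mean(axis=-1).var(),
        reverse=True,
    )
    experts = [binarize_subspace(cube, b) for b in scored[:n_experts]]
    # Combine the experts by majority vote.
    return np.mean(experts, axis=0) > 0.5

cube = np.random.rand(64, 64, 8)  # toy multispectral image
mask = ensemble_binarize(cube)
print(mask.shape, mask.dtype)
```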
A proposal for a coordinated effort for the determination of brainwide neuroanatomical connectivity in model organisms at a mesoscopic scale
In this era of complete genomes, our knowledge of neuroanatomical circuitry
remains surprisingly sparse. Such knowledge is, however, critical for both
basic and clinical research into brain function. Here we advocate a concerted
effort to fill this gap, through systematic, experimental mapping of neural
circuits at a mesoscopic scale of resolution suitable for comprehensive,
brain-wide coverage, using injections of tracers or viral vectors. We detail
the scientific and medical rationale and briefly review existing knowledge and
experimental techniques. We define a set of desiderata, including brain-wide
coverage; validated and extensible experimental techniques suitable for
standardization and automation; a centralized, open-access data repository;
compatibility with existing resources; and tractability with current
informatics technology. We discuss a hypothetical but tractable plan for the
mouse, additional efforts for the macaque, and technique development for
humans. We estimate that the mouse connectivity project could be completed
within five years with a comparatively modest budget.
Comment: 41 pages
Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities
Over the last decade we have made great progress in entity linking (EL)
systems, but performance may vary depending on the context and, arguably, there
are even principled limitations preventing a "perfect" EL system. This also
suggests that there may be applications for which current "imperfect" EL is
already very useful, and makes finding the "right" application as important as
building the "right" EL system. We investigate the Digital Humanities use case,
where scholars spend a considerable amount of time selecting relevant source
texts. We developed WideNet; a semantically-enhanced search tool which
leverages the strengths of (imperfect) EL without getting in the way of its
expert users. We evaluate this tool in two historical case-studies aiming to
collect a set of references to historical periods in parliamentary debates from
the last two decades; the first targeted the Dutch Golden Age, and the second
World War II. The case-studies conclude with a critical reflection on the
utility of WideNet for this kind of research, after which we outline how such a
real-world application can help to improve EL technology in general.Comment: Accepted for presentation at SEMANTiCS '1
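WideNet itself is not described beyond this in the abstract; as a toy illustration of EL-driven corpus selection, one might keep only documents whose linked entities fall inside a target period and leave the final vetting to the expert user. All identifiers, dates, and documents below are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class LinkedEntity:
    surface: str        # text span as it appears in the debate
    uri: str            # knowledge-base identifier from the (imperfect) linker
    year_range: tuple   # (start, end) years associated with the entity

# Hypothetical EL output for two debate fragments.
documents = {
    "debate-001": [LinkedEntity("Rembrandt", "kb:rembrandt", (1606, 1669))],
    "debate-002": [LinkedEntity("Marshall Plan", "kb:marshall_plan", (1948, 1952))],
}

def select_corpus(documents, period):
    """Keep documents with at least one entity inside the target period;
    an expert user would then vet the (possibly noisy) matches."""
    start, end = period
    return [
        doc_id
        for doc_id, entities in documents.items()
        if any(start <= e.year_range[0] and e.year_range[1] <= end
               for e in entities)
    ]

print(select_corpus(documents, (1588, 1672)))  # Dutch Golden Age -> debate-001
```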
"Q i-jtb the Raven": Taking Dirty OCR Seriously
This article argues that scholars must understand mass digitized texts as assemblages of new editions, subsidiary editions, and impressions of their historical sources, and that these various parts require sustained bibliographic analysis and description. To adequately theorize any research conducted in large-scale text archives—including research that includes primary or secondary sources discovered through keyword search—we must avoid the myth of surrogacy proffered by page images and instead consider directly the text files they overlay. Focusing on the OCR (optical character recognition) from which most large-scale historical text data derives, this article argues that the results of this "automatic" process are in fact new editions of their source texts that offer unique insights into both the historical texts they remediate and the more recent era of their remediation. The constitution and provenance of digitized archives are, to some extent at least, knowable and describable. Just as details of type, ink, or paper, or paratext such as printer's records, can help us establish the histories under which a printed book was created, details of format, interface, and even grant proposals can help us establish the histories of corpora created under conditions of mass digitization.
Multi-community command and control systems in law enforcement: An introductory planning guide
A set of planning guidelines for multi-community command and control systems in law enforcement is presented. Essential characteristics and applications of these systems are outlined. Requirements analysis, system concept design, implementation planning, and performance and cost modeling are described and demonstrated with numerous examples. Program management techniques and joint powers agreements for multi-community programs are discussed in detail. A description of a typical multi-community computer-aided dispatch system is appended.