1,233 research outputs found

Learning morphology with Morfette

Author: Chrupała Grzegorz
Dinu Georgiana
van Genabith Josef
Publication date: 01/01/2008

Morfette is a modular, data-driven, probabilistic system which learns to perform joint morphological tagging and lemmatization from morphologically annotated corpora. The system is composed of two learning modules which are trained to predict morphological tags and lemmas using the Maximum Entropy classifier. The third module dynamically combines the predictions of the Maximum-Entropy models and outputs a probability distribution over tag-lemma pair sequences. The lemmatization module exploits the idea of recasting lemmatization as a classification task by using class labels which encode mappings from wordforms to lemmas. Experimental evaluation results and error analysis on three morphologically rich languages show that the system achieves high accuracy with no language-specific feature engineering or additional resources.

CiteSeerX

DCU Online Research Access Service
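
The idea in the Morfette abstract above, treating lemmatization as classification over labels that encode wordform-to-lemma mappings, can be illustrated with a small sketch. This is a deliberately simplified suffix-rewrite encoding, not Morfette's own label scheme; the function names `suffix_rule` and `apply_rule` are hypothetical.

```python
def suffix_rule(form: str, lemma: str) -> tuple[int, str]:
    """Encode form -> lemma as (chars to strip from the end, suffix to append).

    Assumption: a plain suffix rewrite, used here only as an illustration.
    """
    # Length of the longest common prefix of form and lemma.
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return (len(form) - i, lemma[i:])


def apply_rule(form: str, rule: tuple[int, str]) -> str:
    """Apply a (strip, suffix) label to a wordform to produce a lemma."""
    strip, suffix = rule
    stem = form[:len(form) - strip] if strip else form
    return stem + suffix


# The label learned from one pair transfers to an unseen wordform.
rule = suffix_rule("studies", "study")
lemma = apply_rule("carries", rule)
```

Because many wordforms share the same rule, a classifier only has to choose among a modest set of labels rather than generate lemmas character by character.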

Tagset Reductions in Morphosyntactic Tagging of Croatian Texts

Author: Agić Željko
Dovedan Zdravko
Tadić Marko
Publication venue: Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb
Publication date: 01/11/2009

Morphosyntactic tagging of Croatian texts is performed with stochastic taggers by using a language model built on a manually annotated corpus implementing the Multext East version 3 specifications for Croatian. Tagging accuracy in this framework is basically predefined, i.e. proportionally dependent on two things: the size of the training corpus and the number of different morphosyntactic tags encompassed by that corpus. Since the 100 kw Croatia Weekly newspaper corpus by definition makes a rather small language model in terms of stochastic tagging of free domain texts, the paper presents an approach dealing with tagset reductions. Several meaningful subsets of the Croatian Multext-East version 3 morphosyntactic tagset specifications are created and applied on Croatian texts with the CroTag stochastic tagger, measuring overall tagging accuracy and F1-measures. Obtained results are discussed in terms of applying different reductions in different natural language processing systems and specific tasks defined by specific user requirements.

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb

Digitalni arhiv Filozofskog fakulteta u Zagrebu
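
The effect of a tagset reduction described in the abstract above can be sketched as follows. The sketch assumes positional morphosyntactic tags in which the first character is the part of speech and later characters are finer attributes, and it reduces a tag by truncation. The sample tags, the truncation strategy, and the names `reduce_tag` and `tagset_size` are illustrative assumptions, not the subsets defined in the paper.

```python
# Illustrative positional tags (assumed examples, not taken from the paper).
TAGS = ["Ncmsn", "Ncmsg", "Ncfsn", "Afpmsn", "Vmip3s"]


def reduce_tag(msd: str, keep: int) -> str:
    """Keep only the first `keep` positions of a positional tag."""
    return msd[:keep]


def tagset_size(tags: list[str], keep: int) -> int:
    """Number of distinct tags left after the reduction."""
    return len({reduce_tag(t, keep) for t in tags})


# Coarser tags leave fewer classes for the tagger to distinguish.
sizes = {keep: tagset_size(TAGS, keep) for keep in (1, 3, 5)}
```

A smaller tagset gives the stochastic tagger more training examples per tag, at the cost of less detailed output, which is the trade-off the abstract evaluates.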

Conference Program [2006]

Author: Georgia International Conference on Information Literacy
Publication venue: Digital Commons@Georgia Southern
Publication date: 01/01/2006

Georgia Southern University: Digital Commons@Georgia Southern

Data sparsity in highly inflected languages: the case of morphosyntactic tagging in Polish

Author: Ustaszewski Michael
Publication date: 01/01/2016

In morphologically complex languages, many high-level tasks in natural language processing rely on accurate morphosyntactic analyses of the input. However, in light of the risk of error propagation in present-day pipeline architectures for basic linguistic pre-processing, the state of the art for morphosyntactic tagging is still not satisfactory. The main obstacle here is data sparsity inherent to natural language in general and highly inflected languages in particular. In this work, we investigate whether semi-supervised systems may alleviate the data sparsity problem. Our approach uses word clusters obtained from large amounts of unlabelled text in an unsupervised manner in order to provide a supervised probabilistic tagger with morphologically informed features. Our evaluations on a number of datasets for the Polish language suggest that this simple technique improves tagging accuracy, especially with regard to out-of-vocabulary words. This may prove useful to increase cross-domain performance of taggers, and to alleviate the dependency on large amounts of supervised training data, which is especially important from the perspective of less-resourced languages.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación
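
The semi-supervised approach in the Polish tagging abstract above, feeding a supervised tagger with word-cluster features learned from unlabelled text, can be sketched as a feature extractor. The cluster table, the cluster IDs, and the function name `token_features` are hypothetical; the abstract does not specify the clustering method or the tagger's feature set.

```python
def token_features(tokens: list[str], i: int, clusters: dict[str, str]) -> dict[str, str]:
    """Build a feature dict for token i, including a word-cluster feature.

    `clusters` maps lowercased words to cluster IDs obtained beforehand
    from unlabelled text (assumed to be given).
    """
    word = tokens[i].lower()
    return {
        "word": word,
        "suffix3": word[-3:],
        # Words absent from the labelled data may still have a cluster ID,
        # which lets the tagger generalise to them.
        "cluster": clusters.get(word, "UNK"),
    }


# Hypothetical cluster table: two inflected forms share one cluster.
clusters = {"domu": "0110", "domem": "0110"}
feats = token_features(["w", "domem"], 1, clusters)
```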

Digital Museum Consortia: A Prototype for Interconnected and Accessible Database Design

Author: Heller Ben
Publication venue: RIT Scholar Works
Publication date: 19/05/2015

The evolution of the internet and of the devices allowing access to it indicates that users trend toward networking and interconnectivity in their daily lives. Museums have started to tread into this territory on their own, that is, crafting, managing, and maintaining an effective internet presence and ancillary content tools. However, many museums still rely upon the earliest types of education and interpretation tools, such as audio tours and recordings that address content from one collection. Moving beyond a single institution’s holdings, a shared database of museum content including photos of artifacts and objects, historic documents, and videos would allow users to examine pieces they enjoy and to find similar works at other locations. A single application providing museum collection capabilities and visitor access would benefit both sides. To support this claim, this thesis first provides a literature review of application use in museums that is supplemented by statistics of visitor use of museum mobile offerings. This historical overview yields a list of needs, interests, and obstacles to such an interconnective model. The third section constitutes the building blocks of such a model: database design, application design, and a web-accessible mirror site, which are visualized in the prototyped content. The fourth section hypothesizes the future and expected impact of a shared network topology.

RIT Scholar Works

Towards a People's Social Epidemiology: Envisioning a More Inclusive and Equitable Future for Social Epi Research and Practice in the 21st Century.

Author: Allen Amani
Morello-Frosch Rachel
Mujahid Mahasin
Petteway Ryan
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

Social epidemiology has made critical contributions to understanding population health. However, translation of social epidemiology science into action remains a challenge, raising concerns about the impacts of the field beyond academia. With so much focus on issues related to social position, discrimination, racism, power, and privilege, there has been surprisingly little deliberation about the extent and value of social inclusion and equity within the field itself. Indeed, the challenge of translation/action might be more readily met through re-envisioning the role of the people within the research/practice enterprise, reimagining what "social" could, or even should, mean for the future of the field. A potential path forward rests at the nexus of social epidemiology, community-based participatory research (CBPR), and information and communication technology (ICT). Here, we draw from social epidemiology, CBPR, and ICT literatures to introduce A People's Social Epi, a multi-tiered framework for guiding social epidemiology in becoming more inclusive, equitable, and actionable for 21st century practice. In presenting this framework, we suggest the value of taking participatory, collaborative approaches anchored in CBPR and ICT principles and technological affordances, especially within the context of place-based and environmental research. We believe that such approaches present opportunities to create a social epidemiology that is of, with, and by the people, not simply about them. In this spirit, we suggest 10 ICT tools to "socialize" social epidemiology and outline 10 ways to move towards A People's Social Epi in practice.

eScholarship - University of California

PDXScholar (Portland State University)

Results from the Relativistic Heavy Ion Collider

Author: Berndt Müller
Blaizot JP
Hagedorn R
Huovinen P
Ichimaru S
James L. Nagle
Kharzeev D
Klimov VV
Landau LD
Morrin R
Shuryak EV
Publication venue: Annual Reviews
Publication date: 09/02/2006

We describe the current status of the heavy ion research program at the Relativistic Heavy Ion Collider (RHIC). The new suite of experiments and the collider energies have opened up new probes of the medium created in the collisions. Our review focuses on the experimental discoveries to date at RHIC and their interpretation in the light of our present theoretical understanding of the dynamics of relativistic heavy ion collisions and of the structure of strongly interacting matter at high energy density. Comment: 47 pages, 10 figures, submitted to Annual Review of Nuclear and Particle Science. The authors invite and appreciate feedback about possible errors and/or inconsistencies in the manuscript.

arXiv.org e-Print Archive

Crossref

CERN Document Server

A gloss composition and context clustering based distributed word sense representation model

Author: Bengio
Collobert
Ruifeng Xu
Shaoul
Tao Chen
Xuan Wang
Yulan He
Publication venue: MDPI AG
Publication date: 01/08/2015

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet glosses using a convolutional neural network. The initialized word sense embeddings are used by a context clustering based model to generate the distributed representations of word senses. Our learned representations outperform the publicly available embeddings on half of the metrics in the word similarity task and on 6 out of 13 sub tasks in the analogical reasoning task, and give the best overall accuracy in the word sense effect classification task, which shows the effectiveness of our proposed distributed distribution learning model.

Multidisciplinary Digital Publishing Institute

Crossref

Directory of Open Access Journals

Aston Publications Explorer