Programming Language Techniques for Natural Language Applications

Author: Bringert Björn
Publication date: 01/01/2008

It is easy to imagine machines that can communicate in natural language. Constructing such machines is more difficult. The aim of this thesis is to demonstrate how declarative grammar formalisms that distinguish between abstract and concrete syntax make it easier to develop natural language applications. We describe how the type-theoretical grammar formalism Grammatical Framework (GF) can be used as a high-level language for natural language applications. By taking advantage of techniques from the field of programming language implementation, we can use GF grammars to perform portable and efficient parsing and linearization, generate speech recognition language models, implement multimodal fusion and fission, generate support code for abstract syntax transformations, generate dialogue managers, and implement speech translators and web-based syntax-aware editors. By generating application components from a declarative grammar, we can reduce duplicated work, ensure consistency, make it easier to build multilingual systems, improve linguistic quality, enable re-use across system domains, and make systems more portable.
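The split between abstract and concrete syntax described in this abstract can be illustrated with a minimal sketch. It is plain Python, not GF code, and every name in it (`Is`, `LEXICON`, `TEMPLATE`, `linearize`, `parse`, `translate`) is hypothetical. One language-neutral tree type plays the role of the abstract syntax, and a per-language lexicon with a template plays the role of a concrete syntax.

```python
from dataclasses import dataclass

# Abstract syntax: a language-neutral tree (hypothetical, for illustration).
@dataclass(frozen=True)
class Is:
    item: str      # e.g. "wine"
    quality: str   # e.g. "good"

ITEMS = ["wine", "cheese"]
QUALITIES = ["good", "expensive"]

# Concrete syntaxes: one lexicon and one sentence template per language.
# Only masculine Italian nouns are used so that no agreement rule is needed.
LEXICON = {
    "eng": {"wine": "wine", "cheese": "cheese",
            "good": "good", "expensive": "expensive"},
    "ita": {"wine": "vino", "cheese": "formaggio",
            "good": "buono", "expensive": "caro"},
}
TEMPLATE = {
    "eng": "this {item} is {quality}",
    "ita": "questo {item} è {quality}",
}

def linearize(tree, lang):
    """Map an abstract tree to a sentence in the given language."""
    lex = LEXICON[lang]
    return TEMPLATE[lang].format(item=lex[tree.item], quality=lex[tree.quality])

def parse(sentence, lang):
    """Brute-force parser: find the tree whose linearization matches."""
    for item in ITEMS:
        for quality in QUALITIES:
            tree = Is(item, quality)
            if linearize(tree, lang) == sentence:
                return tree
    return None

def translate(sentence, src, dst):
    """Translate by parsing in one language and linearizing in another."""
    tree = parse(sentence, src)
    return linearize(tree, dst) if tree is not None else None
```

Because both languages share the same tree type, adding a language in this sketch means adding one lexicon and one template, while `parse` and `translate` stay unchanged.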

CiteSeerX

Göteborgs universitets publikationer - e-publicering och e-arkiv

Türkçe için Bağlam Tabanlı Otomatik Yazım Düzeltme (Context-Based Automatic Spelling Correction for Turkish)

Author: Bolucu Necva
Can Burcu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/03/2019

This is an accepted manuscript of an article published by IEEE in 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT) on 20/06/2019, available online: https://ieeexplore.ieee.org/document/8742067 The accepted version of the publication may differ from the final published version.

Spelling errors are one of the crucial problems to be addressed in Natural Language Processing tasks. In this study, a context-based automatic spell correction method for Turkish texts is presented. The method combines the Noisy Channel Model with Hidden Markov Models to correct a given word. This study differs from other studies by also considering the contextual information of the word within the sentence. The proposed method is intended to be integrated into other word-based spelling correction models.
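The idea of combining a noisy channel model with sentence context can be sketched as follows. This is a simplified illustration, not the authors' method: it substitutes a bigram context model for the Hidden Markov Model, uses English toy words, and all probabilities and names (`UNIGRAM`, `BIGRAM`, `edits1`, `correct`) are invented for the example.

```python
import math

# Toy language model: every number below is invented for illustration.
UNIGRAM = {"the": 0.05, "cat": 0.01, "cart": 0.002, "sat": 0.008}
BIGRAM = {("the", "cat"): 0.02, ("the", "cart"): 0.001}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def channel_logprob(observed, candidate):
    """Crude channel model: P(observed | candidate) depends only on whether
    an edit was needed (assumed values, not estimated from data)."""
    return math.log(0.95) if observed == candidate else math.log(0.05)

def context_logprob(prev, candidate):
    """Bigram probability, backing off to a scaled unigram probability."""
    p = BIGRAM.get((prev, candidate))
    if p is None:
        p = 0.1 * UNIGRAM[candidate]
    return math.log(p)

def correct(prev, observed):
    """Pick the in-vocabulary candidate that maximizes channel + context score."""
    candidates = {w for w in edits1(observed) | {observed} if w in UNIGRAM}
    if not candidates:
        return observed  # nothing known within one edit: leave the word as typed
    return max(candidates,
               key=lambda w: channel_logprob(observed, w) + context_logprob(prev, w))
```

For example, `correct("the", "caat")` considers both "cat" and "cart" as single-edit candidates; the channel score is the same for both, so the context term decides between them.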

Wolverhampton Intellectual Repository and E-theses

A real time Named Entity Recognition system for Arabic text mining

Author: Aljumaily Harith Taha Abdulla
Martínez Fernández José Luis
Martínez Paloma
Van Der Goot E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Arabic is the most widely spoken language in the Arab World. Most people of the Islamic World understand Classical Arabic because it is the language of the Qur'an. Although the number of Arabic Internet users (Middle East, North Africa and East Africa) has increased considerably in the last decade, systems to analyze Arabic digital resources automatically are not as easily available as they are for English. Therefore, in this work, an attempt is made to build a real time Named Entity Recognition system that can be used in web applications to detect the appearance of specific named entities and events in news written in Arabic. Arabic is a highly inflectional language, thus we will try to minimize the impact of Arabic affixes on the quality of the pattern recognition model applied to identify named entities. These patterns are built up by processing and integrating different gazetteers, from DBPedia (http://dbpedia.org/About, 2009) to GATE (A general architecture for text engineering, 2009) and ANERGazet (http://users.dsic.upv.es/grupos/nle/?file=kop4.php).

This work has been partially supported by the Spanish Center for Industry Technological Development (CDTI, Ministry of Industry, Tourism and Trade), through the BUSCAMEDIA Project (CEN-20091026), and also by the Spanish research projects: MA2VICMR: Improving the access, analysis and visibility of the multilingual and multimedia information in web for the Region of Madrid (S2009/TIC-1542), and MULTIMEDICA: Multilingual Information Extraction in Health domain and application to scientific and informative documents (TIN2010-20644-C03-01). The authors would also like to thank the IPSC of the European Commission’s Joint Research Centre for allowing us to include the EMM search engine in our system.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Um Sistema de Realização Superficial para Geração de Textos em Português (A Surface Realization System for Text Generation in Portuguese)

Author: da Silva Junior Douglas Fernandes Pereira
Novais Eder Miranda de
Paraboni Ivandre
Publication venue: 'Universidade Federal do Rio Grande do Sul'
Publication date: 04/11/2013

Natural language generation (NLG) systems, which produce text from non-linguistic data, have a wide range of applications in the textual visualization of complex and/or high-volume content. This work focuses on the implementation of a rule-based surface realization module for Brazilian Portuguese, called PortNLG, which handles sentence linearization for computational applications that need to present output data in textual form. PortNLG is provided as a Java library, and its results are superior to those of n-gram models in the task of generating newspaper headlines.

Em Questao

Archives of the Faculty of Veterinary Medicine UFRGS

Collaborative multilingual knowledge management based on controlled natural language

Author: Canedo L.
Kaljurand K.
Kuhn T.

Unsupervised joint PoS tagging and stemming for agglutinative languages

Author: B Merialdo
G Adam
J Goldsmith
J Xu
JH Paik
MF Porter
MP Marcus
PF Brown
S Geman
T Brychcín
U Mishra
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/05/2017

This is an accepted manuscript of an article published by Association for Computing Machinery (ACM) in ACM Transactions on Asian and Low-Resource Language Information Processing on 25/01/2019, available online: https://doi.org/10.1145/3292398 The accepted version of the publication may differ from the final published version.

The number of possible word forms is theoretically infinite in agglutinative languages. This raises the out-of-vocabulary (OOV) issue for part-of-speech (PoS) tagging in agglutinative languages. Since inflectional morphology does not change the PoS tag of a word, we propose to learn stems and PoS tags simultaneously. We thereby aim to overcome the sparsity problem by reducing word forms to their stems. We adopt a Bayesian model that is fully unsupervised. We build a Hidden Markov Model for PoS tagging where the stems are emitted through hidden states. Several versions of the model are introduced in order to observe the effects of different dependencies throughout the corpus, such as the dependency between stems and PoS tags or between PoS tags and affixes. Additionally, we use neural word embeddings to estimate the semantic similarity between the word form and stem. We use the semantic similarity as prior information to discover the actual stem of a word since inflection does not change the meaning of a word. We compare our models with other unsupervised stemming and PoS tagging models on Turkish, Hungarian, Finnish, Basque, and English. The results show that a joint model for PoS tagging and stemming improves on an independent PoS tagger and stemmer in agglutinative languages.

This research is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) with the project number EEEAG-115E464.
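A Hidden Markov Model in which hidden PoS states emit stems can be decoded with the Viterbi algorithm, sketched below. This is only an illustration of HMM decoding under assumed, hand-set parameters: the tag set, the English toy stems, and all probabilities are invented, and the unsupervised Bayesian learning described in the abstract is not shown.

```python
import math

# Hand-set toy parameters (assumptions for illustration, not learned values).
STATES = ["NOUN", "VERB"]
START = {"NOUN": 0.6, "VERB": 0.4}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
# Emission probabilities of stems given the hidden PoS state.
EMIT = {"NOUN": {"dog": 0.5, "walk": 0.1, "park": 0.4},
        "VERB": {"dog": 0.05, "walk": 0.9, "park": 0.05}}

def viterbi(stems):
    """Return the most probable PoS tag sequence for a list of known stems."""
    # Each column maps a state to (best log-probability so far, backpointer).
    best = [{s: (math.log(START[s]) + math.log(EMIT[s][stems[0]]), None)
             for s in STATES}]
    for stem in stems[1:]:
        column = {}
        for s in STATES:
            # Best predecessor state for reaching s at this position.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1])
            column[s] = (score + math.log(EMIT[s][stem]), prev)
        best.append(column)
    # Follow backpointers from the best final state to recover the path.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]
```

Calling `viterbi(["dog", "walk", "park"])` returns one tag per stem. Emitting stems rather than full word forms keeps the emission table small, which is the sparsity argument made in the abstract.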

arXiv.org e-Print Archive

Crossref

Wolverhampton Intellectual Repository and E-theses