56 research outputs found
Inquiries into the lexicon-syntax relations in Basque
Index:- Foreword. B. Oyharçabal.- Morphosyntactic disambiguation and shallow parsing in computational processing in Basque. I. Aduriz, A. Díaz de Ilarraza.- The transitivity of borrowed verbs in Basque: an outline. X. Alberdi.- Patrixa: a unification-based parser for Basque and its application to the automatic analysis of verbs. I. Aldezabal, M. J. Aranzabe, A. Atutxa, K. Gojenola, K. Sarasola.- Learning argument/adjunct distinction for Basque. I. Aldezabal, M. J. Aranzabe, K. Gojenola, K. Sarasola, A. Atutxa.- Analyzing verbal subcategorization aimed at its computational application. I. Aldezabal, P. Goenaga.- Automatic extraction of verb patterns from “hauta-lanerako euskal hiztegia”. J. M. Arriola, X. Artola, A. Soroa.- The case of an enlightening, provoking and admirable Basque derivational suffix with implications for the theory of argument structure. X. Artiagoitia.- Verb-deriving processes in Basque. J. C. Odriozola.- Lexical causatives and causative alternation in Basque. B. Oyharçabal.- Causation and semantic control; diagnosis of incorrect use in minorized languages. I. Zabala.- Subject index.- Contributions
Borrowings, Derivational Morphology, and Perceived Productivity in English, 1300-1600.
This dissertation examines how borrowed derivational morphemes such as -age, -ity, -cion, and -ment became productive in the English language, particularly in the fourteenth through sixteenth centuries. It endeavors to expand our current understanding of morphological productivity as a historical phenomenon--to account for not only aggregate quantitative measures of the products of morphological processes, but also some of the linguistic mechanisms that made those processes more productive for language users. Judgments about the productivity of different suffixes in the late ME period cannot be made on counts of frequency alone, since the vast majority of uses were not neologisms or newly coined hybrid forms but rather borrowings from Latin and French. It is not immediately clear to the historical linguist if Middle English speakers perceived a derivative such as enformacion as an undecomposable word or as a morphologically complex word. By examining usage patterns of these derivatives in guild records, the Wycliffite Bible, end-rhymed poetry, medical texts, and personal correspondence, this project argues that several mechanisms helped contribute to the increased transparency and perceived productivity of these affixes. These mechanisms include the following: the use of rhetorical sequences of derivatives with the same base or derivatives ending in the same suffix; the frequent use of derivatives as end rhymes in poetry; the lexical variety of derivatives ending in the same suffix; and the more frequent use of certain bases compared to their derivatives. All of these textual and linguistic features increased readers' and listeners' ability to analyze borrowed derivatives as suffixed words. Ultimately, the dissertation finds that several borrowed affixes were seen as potentially productive units of language in the late ME period, though some were seen as more productive than others in different discourses and contexts.
It also emphasizes the value of register studies for understanding the specific motivations for the use of borrowed derivatives in different discourses, as well as the morphological consequences of salient usage patterns within different registers.
Ph.D., English Language & Literature, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/64624/1/palmercc_1.pd
Ontological Approach for Semantic Modelling of Malay Translated Qur’an
This thesis contributes to the areas of ontology development and analysis, natural language processing (NLP), information retrieval (IR), and language resource and corpus development. Research in NLP and semantic search for English has shown successful results for more than a decade. However, it is difficult to adapt those techniques to the Malay language, because its complex morphology and orthographic forms are very different from those of English. Moreover, only limited resources and tools for computational linguistic analysis are available for Malay. In this thesis, we address those issues and challenges by proposing MyQOS, the Malay Qur’an Ontology System, a prototype ontology-based IR system with semantics for representing and accessing a Malay translation of the Qur’an. This supports the development of a semantic search engine and a question answering system, and provides a framework for storing and accessing a Malay language corpus and for providing computational linguistics resources. The primary use of MyQOS in the current research is to improve the quality and accuracy of the query mechanism that retrieves information embedded in the Malay text of the Qur’an translation. To demonstrate the feasibility of this approach, we describe a new architecture of morphological analysis for MyQOS and query algorithms based on MyQOS. The evaluation used two measures, precision and recall, computed on data from the MyQOS Corpus across three search engines. The precision and recall for semantic search are 0.8409 (84%) and 0.8043 (80%), well above those of the question-answer search, which are 0.4971 (50%) for precision and 0.6027 (60%) for recall. Semantic search gives high precision and high recall compared with the other two methods. This indicates that semantic search returns more relevant results than irrelevant ones.
To conclude, this research is among the studies on the retrieval of Qur’an texts in the Malay language that have managed to outline state-of-the-art information retrieval system models. Thus, the use of MyQOS will help Malay readers to understand the Qur’an better. Furthermore, the creation of a Malay language corpus and computational linguistics resources will benefit other researchers, especially in religious texts, morphological analysis, and semantic modelling.
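As an illustration of the precision and recall measures this abstract reports, the sketch below computes both for a single query. It is not the MyQOS evaluation code. The function name `precision_recall` and the verse identifiers are hypothetical, chosen only to show the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical verse identifiers, for illustration only.
retrieved = ["2:255", "2:256", "3:18", "59:22"]
relevant = ["2:255", "3:18", "59:22", "59:23", "112:1"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

A system that returns few but accurate results scores high on precision and low on recall, which is why abstracts of this kind usually report the two together.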
FrameNet annotation for multimodal corpora: devising a methodology for the semantic representation of text-image interactions in audiovisual productions
Multimodal analyses have been growing in importance within several approaches to
Cognitive Linguistics and applied fields such as Natural Language Understanding. Nonetheless,
fine-grained semantic representations of multimodal objects are still lacking, especially in terms
of integrating areas such as Natural Language Processing and Computer Vision, which are key
for the implementation of multimodality in Computational Linguistics. In this dissertation, we
propose a methodology for extending FrameNet annotation to the multimodal domain, since
FrameNet can provide fine-grained semantic representations, particularly with a database
enriched by Qualia and other interframal and intraframal relations, as is the case with FrameNet
Brasil. To enable FrameNet Brasil to conduct multimodal analysis, we hypothesize that,
just as words in a sentence evoke frames and organize their elements in the syntactic
locality accompanying them, visual elements in video shots may also evoke frames and
organize their elements on the screen, or work complementarily with the frame evocation
patterns of the sentences narrated simultaneously with their appearance on screen,
providing different profiling and perspective options for meaning construction. The corpus
annotated for testing the hypothesis is composed of episodes of a Brazilian TV Travel Series
critically acclaimed as an exemplar of good practices in audiovisual composition. The TV genre
chosen also configures a novel experimental setting for research on integrated image and text
comprehension, since, in this corpus, text is not a direct description of the image sequence but
correlates with it indirectly in a myriad of ways. The dissertation also reports on an eye-tracking
experiment conducted to validate the approach proposed for text-oriented annotation. The
experiment showed that it is not possible to determine that text impacts gaze directly, a result
taken as support for the approach of valuing the combination of modes. Last, we present
the Frame2 dataset, the product of the annotation task carried out for the corpus following both
the methodology and guidelines proposed. The results achieved demonstrate that, at least for
this TV genre but possibly also for others, a fine-grained semantic annotation tackling the
diverse correlations that take place in a multimodal setting provides a new perspective on
multimodal comprehension modeling. Moreover, multimodal annotation also enriches the
development of FrameNets, to the extent that correlations found between modalities can attest to
the modeling choices made by those building frame-based resources.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Proceedings of the VIIth GSCP International Conference
The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The 235 authors, from 21 countries and 95 institutions, contributed papers on many different languages. The 89 papers in this volume reflect the themes of the conference: spoken corpora compilation and annotation, together with related technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics, and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to the sound and video linked to the papers (when downloaded).
Exploring formal models of linguistic data structuring. Enhanced solutions for knowledge management systems based on NLP applications
2010 - 2011
The principal aim of this research is to describe the extent to which formal models for linguistic data structuring are crucial in Natural Language Processing (NLP) applications. In this sense, we will pay particular attention to those Knowledge Management Systems (KMS) which are designed for the Internet, and also to the enhanced solutions they may require. In order to deal appropriately with these topics, we will describe how to build computational linguistics applications that help humans establish and maintain an advantageous relationship with technologies, especially those that are based on or produce man-machine interactions in natural language.
We will explore the positive relationship which may exist between well-structured Linguistic Resources (LR) and KMS, and argue that if the information architecture of a KMS is based on the formalization of linguistic data, then the system works better and is more consistent.
As for the topics we want to deal with, first of all it is essential to state that, in order to build efficient and effective Information Retrieval (IR) tools, understanding and formalizing natural language combinatory mechanisms seems to be the first step, not least because any piece of information produced by humans on the Internet is necessarily a linguistic act. Therefore, in this research work we will also discuss the NLP structuring of a linguistic formalization Hybrid Model, which we hope will prove to be a useful tool to support, improve and refine KMSs.
More specifically, in section 1 we will describe how to structure language resources implementable inside KMSs, to what extent they can improve the performance of these systems and how the problem of linguistic data structuring is dealt with by natural language formalization methods.
In section 2 we will proceed with a brief review of computational linguistics, paying particular attention to specific software packages such as Intex, Unitex, NooJ, and Cataloga, which are developed according to the Lexicon-Grammar (LG) method, a linguistic theory established during the 1960s by Maurice Gross.
In section 3 we will describe some specific works useful to monitor the state of the art in Linguistic Data Structuring Models, Enhanced Solutions for KMSs, and NLP Applications for KMSs.
In section 4 we will address problems related to natural language formalization methods, describing mainly Transformational-Generative Grammar (TGG) and LG, plus other methods based on statistical approaches and ontologies.
In section 5 we will propose a Hybrid Model usable in NLP applications in order to create effective enhanced solutions for KMSs. Specific features and elements of our hybrid model will be shown through some results of experimental research work. The case study we will present is a very complex NLP problem that has nonetheless been little explored in recent years, i.e. the treatment of Multi-Word Units (MWUs).
In section 6 we will close our research by evaluating its results and presenting possible perspectives for future work. [edited by author]
X n.s
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European
Community’s Horizon 2020 Program (project reference:
654021 - OpenMinted). M.K. additionally acknowledges the
Encomienda MINETAD-CNIO as part of the Plan for the
Advancement of Language Technology. O.R. and J.O. thank
the Foundation for Applied Medical Research (FIMA),
University of Navarra (Pamplona, Spain). This work was
partially funded by Consellería
de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic
funding of UID/BIO/04469/2013 unit and COMPETE 2020
(POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi
for useful feedback and discussions during the preparation of
the manuscript.