257 research outputs found

    Current trends

    Deep parsing is the fundamental process that aims to represent the syntactic structure of phrases and sentences. In the traditional methodology, this process relies on lexicons and grammars that roughly represent the properties of words and the interactions of words and structures in sentences. Several linguistic frameworks, such as Head-driven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining Grammar (TAG) and Combinatory Categorial Grammar (CCG), offer different structures and combining operations for building grammar rules. These already contain mechanisms for expressing properties of Multiword Expressions (MWEs), which, however, need improvement in how they account for the idiosyncrasies of MWEs on the one hand and their similarities to regular structures on the other. This collaborative book surveys various attempts at representing and parsing MWEs in the context of linguistic theories and applications.

    Representation and parsing of multiword expressions

    This book consists of contributions related to the definition, representation and parsing of MWEs. These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs (verbal, adverbial and nominal), various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages (including English, French, Modern Greek, Hebrew and Norwegian), and various applications (namely MWE detection, parsing and automatic translation), using both symbolic and statistical approaches.

    PARSEME Survey on MWE Resources

    This paper summarizes the first results of an ongoing survey on multiword resources carried out within the IC1207 COST Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogues and the inventory of multiword data sets available at the SIGLEX-MWE website, multiword resources are scattered and difficult to find. In many cases, language resources such as corpora, treebanks or lexical databases include multiwords as part of their data or take them into consideration in their annotations. However, these resources need to be centralized so that other researchers can subsequently use them. The final aim of this survey is thus to create a portal where researchers can find multiword resources or multiword-aware language resources for their research. We report on how the survey was designed and analyze the data gathered so far. We also discuss the problems we detected on examining the data and possible ways of enhancing the survey.

    Doctoral thesis (Promocijas darbs)

    The electronic version does not include the appendices. The goal of the dissertation ‘Construction and Representation of National Identity in the Speeches of the Presidents of the Baltic States: Corpus-Assisted Critical Discourse Analysis’ is to investigate the discursive construction of national identities in the presidential speeches of the Baltic States: which macro- and micro-structural elements of language and discourse are used in presidential rhetoric, what their functions are, and what potential impact they have on the target audience. By combining qualitative and quantitative methods, namely a corpus approach and the Discourse-Historical Approach to Critical Discourse Analysis, the study not only analyses the content of the speeches, including their thematic areas, discursive strategies and the linguistic means of realising these strategies, but also presents corpus-based statistical data that offer a detailed and objective insight into the individual linguistic profile and lexical choices of each President. The component analysis points to the linguistic features of the multiple identities constructed in the speeches. Additionally, theoretical sources on the topics that form the socio-political and historical context of the speeches have been analysed; interviews with the Presidents and their advisors have been conducted; and opinion surveys of Latvian residents have been carried out and compiled to investigate the explicit and implicit goals of the speeches as well as their potential effect on the construction of the listeners' national identity.
    Key words: presidential speeches, Baltic States, national identity, Critical Discourse Studies, Corpus Linguistics

    Formulaic language

    The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have been flourishing and the very notion of formulaicity has been approached from various methodological and theoretical perspectives and with various purposes in mind. The linguistic approach to formulaicity is still in a state of rapid development and the objective of the current volume is to present the current explorations in the field. Papers collected in the volume make numerous suggestions for further development of the field and they are arranged into three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights as well as their practical application in the development of custom-designed software tools for identification and exploration of formulaic language in texts. Two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone involved in the study of formulaic language either from a theoretical or a practical perspective

    Theories and methods

    The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have been flourishing and the very notion of formulaicity has been approached from various methodological and theoretical perspectives and with various purposes in mind. The linguistic approach to formulaicity is still in a state of rapid development and the objective of the current volume is to present the current explorations in the field. Papers collected in the volume make numerous suggestions for further development of the field and they are arranged into three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights as well as their practical application in the development of custom-designed software tools for identification and exploration of formulaic language in texts. Two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone involved in the study of formulaic language either from a theoretical or a practical perspective

    Automatic text summarization with Maximal Frequent Sequences

    In the last two decades, an exponential increase in available electronic information has created a pressing need to understand large volumes of information quickly. This raises the importance of developing automatic methods for detecting the most relevant content of a document in order to produce a shorter text. Automatic Text Summarization (ATS) is an active research area dedicated to generating abstractive and extractive summaries, not only for a single document but also for a collection of documents. A further need is to find ATS methods that are independent of language and domain. In this book we consider the extractive text summarization task for a single document. We have identified that a typical extractive summarization method consists of four steps. The first step is term selection, where one decides what units count as individual terms. The process of estimating the usefulness of the individual terms is called the term weighting step. The next step is sentence weighting, where every sentence receives a numerical measure according to the usefulness of its terms. Finally, the process of selecting the most relevant sentences is called sentence selection. Different extractive summarization methods can be characterized by how they perform these steps. For the term selection step, we describe how to detect multiword descriptions using Maximal Frequent Sequences (MFSs), which bear important meaning, while non-maximal frequent sequences (FSs), those that are parts of another FS, should not be considered. An additional motivation was cost versus benefit: there are too many non-maximal FSs, while their probability of bearing important meaning is lower. In any case, MFSs represent all FSs in a compact way: all FSs can be obtained from the MFSs by expanding each MFS into the set of all its subsequences. New methods based on graph algorithms, genetic algorithms and clustering algorithms, which facilitate the text summarization task, are presented. We tested different combinations of term selection, term weighting, sentence weighting and sentence selection options for language- and domain-independent extractive single-document text summarization on a collection of news reports. We analyzed several options based on MFSs, considering them with graph, genetic and clustering algorithms. We obtained results superior to the existing state-of-the-art methods. This book is addressed to students and scientists in the area of Computational Linguistics, and to anyone who wants to learn about recent developments in automatic text summarization.
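
The four steps named in this abstract (term selection, term weighting, sentence weighting, sentence selection) can be illustrated with a minimal Python sketch. It is not the book's method: it assumes that sequences are contiguous word n-grams, that frequency is a raw occurrence count, that a term's weight is simply its frequency, and that "maximal" is only checked up to a hypothetical `max_len` cap. All function names and parameters are illustrative.

```python
from collections import Counter


def frequent_sequences(sentences, min_freq=2, max_len=4):
    """Count contiguous word n-grams (n <= max_len); keep those seen >= min_freq times."""
    counts = Counter()
    for words in sentences:
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_freq}


def is_contiguous_part(short, long):
    """True if tuple `short` occurs as a contiguous run inside tuple `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_frequent_sequences(fss):
    """Drop every frequent sequence that is part of a longer frequent sequence."""
    return {
        seq: c
        for seq, c in fss.items()
        if not any(
            len(other) > len(seq) and is_contiguous_part(seq, other)
            for other in fss
        )
    }


def summarize(text_sentences, n_sentences=1, min_freq=2):
    """Return the n_sentences highest-scoring sentences, in document order."""
    tokenized = [s.lower().split() for s in text_sentences]
    # Step 1, term selection: maximal frequent sequences act as the terms.
    # Step 2, term weighting: assumed here to be the raw frequency of each term.
    terms = maximal_frequent_sequences(frequent_sequences(tokenized, min_freq))
    # Step 3, sentence weighting: sum the weights of the terms a sentence contains.
    scores = [
        sum(w for seq, w in terms.items() if is_contiguous_part(seq, tuple(words)))
        for words in tokenized
    ]
    # Step 4, sentence selection: take the top-scoring sentences (ties go to the earlier one).
    ranked = sorted(range(len(text_sentences)), key=lambda i: (-scores[i], i))
    return [text_sentences[i] for i in sorted(ranked[:n_sentences])]
```

Calling `summarize(sentences, n_sentences=2)` on a list of plain sentence strings returns two of them in their original order. Swapping in a different weighting scheme only requires changing the line that builds `scores`, which mirrors the abstract's point that methods differ in how they perform each step.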

    Proximity and impact of university-industry collaborations. A topic detection analysis of impact reports

    The probability of initiating university-industry collaborations (UICs), as well as their intensity and quality, is influenced by the proximity between the collaboration partners. However, little is known about the relationship between collaborators' proximity and the impact of UICs. Building on an original database of 415 UICs in the United Kingdom, we analyse the association between collaborators' proximity and the extent to which UICs generate economic, social and knowledge impact. We find that geographical and institutional proximity are substitutes in relation to economic impact, that cognitive and institutional proximity are substitutes in relation to knowledge impact, and that social impact is associated with cognitive and institutional distance.

    Full Issue
