13 research outputs found

    In search of knowledge: text mining dedicated to technical translation

    Get PDF
    Article published on CD and sold directly by ASLIB (http://shop.emeraldinsight.com/product_info.htm/cPath/56_59/products_id/431). Conference programme at http://aslib.co.uk/conferences/tc_2011/programme.htm

    Exploring formal models of linguistic data structuring. Enhanced solutions for knowledge management systems based on NLP applications

    Get PDF
    2010 - 2011. The principal aim of this research is to describe the extent to which formal models for linguistic data structuring are crucial in Natural Language Processing (NLP) applications. In this sense, we will pay particular attention to those Knowledge Management Systems (KMS) which are designed for the Internet, and also to the enhanced solutions they may require. In order to deal appropriately with these topics, we will describe how to build computational linguistics applications that help humans establish and maintain an advantageous relationship with technologies, especially those which are based on, or produce, man-machine interaction in natural language. We will explore the positive relationship which may exist between well-structured Linguistic Resources (LR) and KMSs, in order to argue that if the information architecture of a KMS is based on the formalization of linguistic data, then the system works better and is more consistent. As for the topics we want to deal with, first of all it is indispensable to state that in order to build efficient and effective Information Retrieval (IR) tools, understanding and formalizing the combinatory mechanisms of natural language is the first operation to achieve, not least because any piece of information produced by humans on the Internet is necessarily a linguistic act. Therefore, in this research work we will also discuss the NLP structuring of a Hybrid Model of linguistic formalization, which we hope will prove to be a useful tool to support, improve and refine KMSs. More specifically, in section 1 we will describe how to structure language resources implementable inside KMSs, to what extent they can improve the performance of these systems, and how the problem of linguistic data structuring is dealt with by natural language formalization methods.
In section 2 we will proceed with a brief review of computational linguistics, paying particular attention to specific software packages such as Intex, Unitex, NooJ and Cataloga, which are developed according to the Lexicon-Grammar (LG) method, a linguistic theory established during the 1960s by Maurice Gross. In section 3 we will describe some specific works useful to monitor the state of the art in linguistic data structuring models, enhanced solutions for KMSs, and NLP applications for KMSs. In section 4 we will cope with problems related to natural language formalization methods, describing mainly Transformational-Generative Grammar (TGG) and LG, plus other methods based on statistical approaches and ontologies. In section 5 we will propose a Hybrid Model usable in NLP applications in order to create effective enhanced solutions for KMSs. Specific features and elements of our Hybrid Model will be shown through some results of experimental research work. The case study we will present is a very complex NLP problem still little explored in recent years, i.e. the treatment of Multi Word Units (MWUs). In section 6 we will close our research by evaluating its results and presenting possible future work perspectives. [edited by author]
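The MWU case study mentioned above lends itself to a compact illustration. The sketch below is a hypothetical, minimal longest-match lookup of multiword units against an electronic dictionary, in the spirit of Lexicon-Grammar resources; the dictionary entries, the tag labels and the sample sentence are invented for illustration and do not reproduce the thesis's actual model.

```python
# Hypothetical sketch: longest-match recognition of Multi Word Units (MWUs)
# against a small electronic dictionary. Entries and tags are invented.

MWU_DICTIONARY = {
    ("knowledge", "management", "system"): "N+Tech",
    ("natural", "language", "processing"): "N+Tech",
    ("information", "retrieval"): "N+Tech",
}

def tag_mwus(tokens):
    """Greedy longest-match lookup: prefer the longest MWU starting at i."""
    max_len = max(len(key) for key in MWU_DICTIONARY)
    i, out = 0, []
    while i < len(tokens):
        # Try spans from longest to shortest (length >= 2) at position i.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in MWU_DICTIONARY:
                out.append(("_".join(tokens[i:i + n]), MWU_DICTIONARY[key]))
                i += n
                break
        else:
            # No MWU starts here: emit the simple token untagged.
            out.append((tokens[i], None))
            i += 1
    return out

tokens = "A knowledge management system supports information retrieval".split()
print(tag_mwus(tokens))
```

The greedy longest-match policy reflects the observation in the abstract that MWUs behave as fixed meaning units: once a dictionary entry matches, its tokens are consumed as a single lexical unit rather than as free combinations.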

    Semantic Technologies for Business Decision Support

    Get PDF
    2015 - 2016. In order to improve and remain competitive, enterprises should know how to seize the opportunities offered by data coming from the Web. This strategic vision implies a high level of communication sharing and the integration of practices across every business level. This does not mean that enterprises need a disruptive change in their information systems, but rather their conversion, reusing existing business data and integrating new data. However, data is heterogeneous, so to maximise its value it is necessary to extract meaning from it, considering the context in which it evolves. The proliferation of new linguistic data linked to the growth of textual resources on the Web makes the analysis and integration phases of data in the enterprise inadequate. Thus, the use of Semantic Technologies based on Natural Language Processing (NLP) applications is required. This study arises as a first approach to the development of a document-driven Decision Support System (DSS) based on NLP technology within the theoretical framework of the Lexicon-Grammar of Maurice Gross. Our research project has two main objectives: the first is to recognize and codify the innovative language with which companies express and describe their business, in order to standardize it and make it actionable by machines. The second is to use the information resulting from text analysis to support strategic decisions, considering that through Text Mining we can capture the hidden meaning in business documents. In the first chapter we examine the concept, characteristics and different types of DSS (with particular reference to document-driven systems) and the changes that these systems have experienced with the development of the Web and, consequently, of information systems within companies. In the second chapter, we proceed with a brief review of Computational Linguistics, paying particular attention to goals, resources and applications.
In the third chapter, we provide a state of the art of Semantic Technology Enterprises (STEs) and their process of integration in the innovation market, analysing the diffusion, the types of technologies and the main sectors in which they operate. In the fourth chapter, we propose a model of linguistic support and analysis, in accordance with Lexicon-Grammar, in order to create an enriched solution for document-driven decision systems: we provide specific features of business language, resulting from experimental research work in the startup ecosystem. Finally, we recognize that the formalization of all linguistic phenomena is extremely complex, but the results of the analysis make us hopeful about continuing this line of research. Applying linguistic support to the business technological environment yields results that are more efficient and constantly updated, even under conditions of strong resistance to change. [edited by author]

    A particle-centred approach on Italian verb-particle constructions

    Get PDF
    2010 - 2011. The following doctoral thesis, titled “A particle-centred approach on Italian Verb Particle Constructions” (hereinafter VPCs), aims at showing that the particle characterizing Italian phrasal verbs, such as su (up), giù (down), fuori (out), dentro (in/inside) and so on, plays a key role in these constructions both syntactically and semantically. The framework adopted is based on the main syntactic theories developed by Z. Harris (1976) as well as on the Lexicon-Grammar method as pointed out by M. Gross (1981). I will suggest, gradually during the dissertation, that the spatial, aspectual and metaphorical meaning of a large portion of Italian VPCs - such as scappare fuori di casa (to escape out of the house), tirare via un chiodo (to pull a nail out), mettere dentro il ladro (to put the thief inside), portare avanti un progetto (to carry out a project), tagliare fuori qualcuno da un discorso (to cut somebody out of a discussion) - is embedded only in the particle slot, as the head verb can vary within a finite range of possibilities or not occur at all. The head verb is, in other words, ‘weak’, while the particle represents the powerful element (or ‘operator’ in Lexicon-Grammar terms), so that it cannot be considered a small added element (Lat. ‘particula’): the particle affects the argument structure of the verb and carries the aspectual, spatial or idiomatic meaning. Moreover, its syntactic autonomy is demonstrated by the fact that it can also occur without the head verb, in sentences such as su le mani (hands up), via di qui (away from here), fuori i soldi (money out), giù il governo (down with the government), Lazio avanti (Lazio ahead), which are defined “verbless particle constructions”. The thesis provides an in-depth syntactic and semantic analysis of Italian VPCs, with interesting evidence from dictionaries and corpora, stressing the need to replace the traditional “verbocentrism” with an original particle-centred approach.
Finally, the theoretical and applicative implications of such a change of perspective are pointed out. [edited by author] The thesis analyses Italian phrasal verbs - such as tirare su, andare avanti, fare fuori - on the basis of a transformationalist and distributionalist approach of Harrisian origin (Harris 1976) and an empirical methodology clearly derived from Gross (Gross 1992, Elia 2013). A first lemmary of more than 700 Verb + locative particle lemmas was collected from about ten lexicographic works. The notion of lemma was then replaced with that of ‘lexical use’, making it possible, through a constant process of ‘multiplication of entries’, to distinguish two macroclasses of constructions, compositional and idiomatic, for a total of about 800 different lexical uses. The idiomatic constructions of transitive type (213 entries) were then classified into nine distinct lexicon-grammar classes. These constructions were projected onto the LIP corpus in order to verify empirically the presence and frequency distribution of Verb + particle constructions (both compositional and idiomatic) in spoken Italian. The explosion of idiomatic uses prompted a sharper investigation of the phenomenon of ambiguity, until the verbocentrism of the first analyses was replaced with a fully particle-centred approach: the particle, far from being considered a small inert element (Lat. ‘particula’), plays a central role within the utterance, behaving in Harrisian terms as an operator, that is, a fully predicative element which selects the number and type of arguments and determines the meaning of the whole construction. The final part of the thesis is devoted to a specific set of absolute constructions with a predicative particle but no verb, such as su le mani, via di qui, avanti il prossimo, giù il Governo, defined ‘verbless particle constructions’. [edited by author]

    CATALOGA®: a Software for Semantic and Terminological Information Retrieval

    No full text
    One of the most relevant problems with Information Retrieval (IR) software is the correct processing of complex lexical units, today also known as multiword units. The shortcomings are mainly due to the fact that such units are often treated as extemporaneous combinations of words retrievable by means of statistical routines. On the contrary, several linguistic studies, some dating back to the 1960s, show that multiword units, and mainly compound nouns, are almost always fixed-meaning units, with specific formal, morphological, grammatical and semantic characteristics. Furthermore, these units can be processed as dictionary entries, thus becoming concrete lingware tools useful for achieving efficient semantic Information Retrieval. Therefore, in this paper we will focus on CATALOGA®, an automatic IR software tool which retrieves terminological information from digitized texts without any human intervention. CATALOGA® is currently configured as a stand-alone software package which can be integrated into Web sites and portals to be used online. More specifically, we will describe its lingware and software characteristics, discussing their usage as a possible solution to current IR software limitations. The analytical procedure described here will prove appropriate for any type of digitized text, and will also represent a relevant support for the building and implementation of interactive Semantic Web (SW) platforms.
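The dictionary-based terminological retrieval described above can be illustrated schematically. The following sketch is not CATALOGA®'s actual code: the mini-dictionary, the domain tags and the sample text are invented. It simply groups the compound nouns found in a digitized text under their domain tags, the kind of terminological index such a tool produces.

```python
import re
from collections import defaultdict

# Hypothetical terminological dictionary: compound noun -> domain tag.
# Real electronic dictionaries are far larger; these entries are invented.
TERM_DICT = {
    "magnetic resonance imaging": "MED",
    "bone marrow": "MED",
    "interest rate": "ECON",
}

def terminological_index(text):
    """Group the dictionary terms found in the text by their domain tag."""
    lowered = text.lower()
    index = defaultdict(list)
    for term, tag in TERM_DICT.items():
        # Whole-phrase, case-insensitive match of the compound noun.
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        if hits:
            index[tag].append((term, hits))
    return dict(index)

sample = ("Magnetic resonance imaging of the bone marrow was repeated; "
          "the interest rate was unchanged.")
print(terminological_index(sample))
```

Because every entry is a fixed compound noun, no statistical guessing is needed: a term either matches the dictionary or it does not, which is the core of the dictionary-entry approach advocated in the abstract.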

    Economic language in times of the global crisis: a longitudinal sentiment analysis study

    Get PDF
    This thesis focuses on the longitudinal study of the influence of events on the semantic orientation of economic terminology. The period studied is the Great Recession, a first-order event which generated a large amount of textual information that has been exploited as a source of data suitable for automatic analysis. Sentiment analysis is a discipline of natural language processing concerned with the computational treatment of opinion and subjectivity in texts. The general objective of this thesis is therefore to analyse the fluctuations in the semantic orientation of a series of economic terms within the period 2007-2015, by characterizing the impact of major events on the semantic variations of lexical units. Its specific objectives are: (1) to compile an English-language sentiment lexicon for the economic-financial domain from an ad-hoc corpus of economic news, (2) to define a longitudinal dataset of sentences containing the terms under study, to serve as input for the sentiment analysis, (3) after analysing a series of economic-financial terms, to identify the events that accompanied changes in their semantic orientation, and (4) to analyse possible variations in semantic prosody. To carry out the automatic analysis, LexiEcon was developed, a domain-specific plug-in lexicon for English adapted to the Lingmotif suite. Given its breadth, the coverage and recall results of its evaluation were very satisfactory (F1 0.735), around 20% higher than the results Lingmotif offers without a specific lexicon when classifying texts from the economic-financial domain.
The next step was data analysis, in which sentiment analysis was performed on the datasets. The analysis consists of three parts: (a) a table of longitudinal descriptive statistics on the sentiment scores, (b) an annual table of collocations and (c) a discussion of the findings in the corpus based on the observation of annual collocation rankings, with the intention of triangulating the data obtained. Two main facts emerge: (1) terms become event words, given the enormous increase in their frequency of use caused by the key events of the crisis; from this phenomenon, significant changes in use arise (the semantic orientation of collocations varies) and the terms frequently show a lower level of specialization. (2) The annual means of the semantic orientation of a term in context reveal important fluctuations in the sentiment embedded in the discourse. A triangulation of the quantitative data with their most significant collocations and the events related to the Great Recession allows us to conclude that the semantic orientation of terms in the economic-financial domain is highly liable to vary as the events of the financial crisis unfolded. Date of doctoral thesis defence: 20 September 2019.
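The longitudinal aggregation described above (annual means of contextual sentiment scores, and the shifts between consecutive years) can be sketched as follows. The scores are invented placeholders, not data from the thesis, and the function names are hypothetical; real scores would come from a tool such as Lingmotif equipped with a domain lexicon.

```python
from statistics import mean

# Hypothetical sentence-level sentiment scores for one economic term,
# grouped by year. Values are invented for illustration.
scores_by_year = {
    2007: [0.3, 0.1, 0.2],
    2008: [-0.4, -0.6, -0.2, -0.5],
    2009: [-0.1, 0.0, -0.2],
}

def yearly_orientation(scores_by_year):
    """Mean semantic orientation per year, in chronological order."""
    return {y: round(mean(v), 2) for y, v in sorted(scores_by_year.items())}

def largest_shift(yearly):
    """Pair of consecutive years with the biggest change in mean orientation."""
    years = sorted(yearly)
    return max(zip(years, years[1:]),
               key=lambda pair: abs(yearly[pair[1]] - yearly[pair[0]]))

means = yearly_orientation(scores_by_year)
print(means)                 # {2007: 0.2, 2008: -0.42, 2009: -0.1}
print(largest_shift(means))  # (2007, 2008)
```

Flagging the year pair with the largest change is one simple way to align orientation fluctuations with key events of the crisis, as the triangulation in the abstract does at a much larger scale.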

    Information and/or knowledge: the two faces of Janus: proceedings

    Get PDF

    Development of a computational tool for the study of discourse markers

    Get PDF
    Discourse markers are a heterogeneous set of units with subjective, idiosyncratic and untranslatable meaning (Martí Sánchez, 2009). They introduce information related to the communicative activity, contribute to constructing text or discourse, thus facilitating the interpretation of messages, and always communicate more than they explicitly express. Moreover, their use and understanding imply linguistic maturity. According to this definition, the group of discourse markers includes terms of different kinds, whose meaning may vary according to use and context, and which, in addition, fall into types that can be distinctive features of different text types. Likewise, the discourse markers of each language are elements specific to it. All of the above is why we have chosen discourse markers as the object of study for our contrastive analysis of texts. They are a linguistic element that characterizes the text type and the discursive uses occurring in it; therefore, a competent speaker of a given language will conform to the characteristic use of markers according to the text type. Their study, and knowledge of the frequency of use of each type of marker according to context, is of great utility for professionals in translation, teaching or writing.
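The kind of frequency count such a tool performs can be sketched minimally, assuming a tiny invented inventory of Spanish discourse markers; a real tool would rely on a much larger, linguistically validated list and on per-text-type comparisons.

```python
import re
from collections import Counter

# Hypothetical mini-inventory of Spanish discourse markers (invented subset;
# not the inventory used by the tool described in the thesis).
MARKERS = ["sin embargo", "por lo tanto", "es decir", "además", "o sea"]

def marker_frequencies(text):
    """Count each discourse marker (case-insensitive, whole-phrase match)."""
    counts = Counter()
    lowered = text.lower()
    for marker in MARKERS:
        counts[marker] = len(
            re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
    return counts

text = ("El resultado, sin embargo, fue positivo. Además, es decir, "
        "por lo tanto concluimos. Sin embargo, quedan dudas.")
print(marker_frequencies(text))
```

Comparing such counts across corpora of different text types is what lets the frequency of each marker class serve as a distinctive feature of a text type, as the abstract argues.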