37 research outputs found

    16th International NooJ 2022 Conference: Book of Abstracts

    Book of abstracts presented at the 16th International NooJ 2022 Conference, held in hybrid mode at the ECU (Espacio Cultural Universitario, UNR) in Rosario, Santa Fe, Argentina, on 14-15 June 2022. Fil: Reyes, Silvia Susana. Universidad Nacional de Rosario. Facultad de Humanidades y Artes; Argentina

    AmAMorph: Finite State Morphological Analyzer for Amazighe

    This paper presents AmAMorph, a morphological analyzer for the Amazighe language built with the NooJ linguistic development environment. The paper begins with the development of large-coverage formalized Amazighe lexicons. The resulting electronic lexicons, named ‘NAmLex’, ‘VAmLex’ and ‘PAmLex’ (which stand for ‘Noun Amazighe Lexicon’, ‘Verb Amazighe Lexicon’ and ‘Particles Amazighe Lexicon’), link inflectional, morphological, and syntactic-semantic information to the list of lemmas. Automated inflectional and derivational routines are applied to each lemma, producing the inflected forms. To our knowledge, AmAMorph is the first morphological analyzer for Amazighe. It identifies the component morphemes of word forms using large-coverage morphological grammars. Along with a description of how the analyzer is implemented, the paper gives an evaluation of the analyzer.
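    The lexicon-plus-paradigm design described above can be sketched as follows. This is an illustrative toy, not the actual NAmLex/AmAMorph resources: the lemmas and suffix paradigms are invented for demonstration, and real Amazighe inflection involves far richer (including prefixal) morphology.

```python
# Hypothetical mini-lexicon: lemma -> (POS, paradigm of (suffix, features) pairs).
# The entries and paradigms below are invented for illustration only.
LEXICON = {
    "argaz": ("N", [("", "sg"), ("en", "pl")]),
    "afus":  ("N", [("", "sg"), ("en", "pl")]),
}

def generate(lexicon):
    """Expand every lemma into its inflected surface forms."""
    table = {}
    for lemma, (pos, paradigm) in lexicon.items():
        for suffix, feats in paradigm:
            table[lemma + suffix] = (lemma, pos, feats)
    return table

def analyze(form, table):
    """Return (lemma, POS, features) for a surface form, or None if unknown."""
    return table.get(form)

forms = generate(LEXICON)
print(analyze("argazen", forms))  # -> ('argaz', 'N', 'pl')
```

    Analysis here is simply the inverse of generation over a precomputed table, which is the essence of a finite-state lexicon: the same mapping can be run in either direction.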

    Detecting Latin-Based Medical Terminology in Croatian Texts

    No matter what the main language of texts in the medical domain is, there is always evidence of the use of Latin-derived words and formative elements in terminology development. Generally speaking, this usage exhibits language-specific morpho-semantic behavior in the formation of both technical-scientific and common-usage words. Nevertheless, the usage of Latin in Croatian medical texts does not seem consistent, since different word-formation mechanisms may be applied to the same term. In our pursuit to map all the different occurrences of the same concept onto a single one, we propose a model designed within NooJ and based on dictionaries and morphological grammars. Starting from the manual detection of nouns and their variants, we identify several word-formation mechanisms and develop grammars suitable for recognizing Latinisms and Croatinized Latin medical terminology.
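    The variant-to-canonical mapping idea can be sketched with a dictionary lookup plus a fallback rule. The variant pairs below are invented for illustration and do not come from the paper's NooJ dictionaries or grammars.

```python
# Hypothetical variant dictionary: several spellings of the same medical
# concept map to one canonical Croatian term (entries invented for illustration).
CANONICAL = {
    "appendicitis": "apendicitis",
    "apendicit": "apendicitis",
}

def normalize(term):
    """Map a term variant to its canonical form; fall back to the lowercased input."""
    term = term.lower()
    return CANONICAL.get(term, term)

print(normalize("Appendicitis"))  # -> 'apendicitis'
```

    A full system would replace the flat dictionary with morphological grammars that recognize the formative elements (e.g. Latin suffixes) productively rather than listing every variant.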

    Contractions: to align or not to align, that is the question

    This paper performs a detailed analysis of the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually on a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed, and a decision was made as to whether each contraction should be maintained or decomposed in the alignment. Decomposition was required in cases in which the two concatenated words, i.e., the preposition and the determiner or pronoun, belong to two separate translation alignment pairs (PT - [no seio de] [a União Europeia] EN - [within] [the European Union]). Most contractions required decomposition in contexts where they are positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in the frozen part of the multiword (PT - [no que diz respeito a] EN - [with regard to], or PT - [além disso] EN - [in addition]). A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.
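    The decompose-or-maintain decision described above can be sketched as a rule. This is an invented toy heuristic, not the CLUE4Translation guidelines: it decomposes a contraction only when it falls at the end of its (here, trivially delimited) multiword unit.

```python
# Toy contraction table: contraction -> (preposition, determiner/pronoun).
CONTRACTIONS = {"no": ("em", "o"), "da": ("de", "a"), "disso": ("de", "isso")}

def align_contraction(unit, i):
    """Return the token(s) to align for unit[i]: decompose the contraction only
    when it is the last token of its multiword unit, keep it otherwise."""
    tok = unit[i]
    if tok not in CONTRACTIONS:
        return [tok]
    at_unit_end = (i == len(unit) - 1)  # toy proxy for "end of multiword unit"
    return list(CONTRACTIONS[tok]) if at_unit_end else [tok]

# "no seio da (União Europeia)": unit-initial "no" is kept,
# unit-final "da" is split so "a" can join the next alignment pair.
print(align_contraction(["no", "seio", "da"], 0))  # -> ['no']
print(align_contraction(["no", "seio", "da"], 2))  # -> ['de', 'a']
```

    In a real pipeline the multiword-unit boundaries would come from the alignment itself rather than from list position.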

    Aprendo con NooJ: de la lingüística computacional a la enseñanza de la lengua

    This volume brings together several works in one. It is, on the one hand, an excellent NooJ user manual, in Spanish and in English, since the book is largely bilingual. On the other hand, it is also a work of computational analysis of the Spanish language, containing numerous annotated examples of the computational description of various orthographic, morphological and syntactic aspects of Spanish in its Rioplatense variety, which has contributed and continues to contribute so much to the common heritage of our language. It is likewise a language-teaching manual, as it presents important elements, both theoretical and practical, for the didactic application of NooJ to the teaching and learning of native and foreign languages, including accounts of students' experiences after applied-linguistics workshops based on this software environment. It is equally a discourse-analysis manual that, in addition, presents and analyzes authentic texts representative of a «grammar of the young», gathered within an original, purpose-built research project. Last but not least, Aprendo con NooJ is also a workbook which, building on explicitly stated objectives and a carefully planned didactic progression, will allow the reader to enter the world of computational linguistics and its applications with a sure step. It is, moreover, a living work, nourished by the ongoing teaching of the large team of professors and researchers who sign it, as well as by the experiences shared through the Facebook page «Aprendo con NooJ», the series of YouTube videos that make up the tutorial for using NooJ in Spanish, and the «Español de Argentina» module for NooJ, available on that program's website. Fil: Rodrigo, Andrea. Universidad Nacional de Rosario. Facultad de Humanidades y Artes. Fil: Bonino, Rodolfo. Universidad Nacional de Rosario. Facultad de Humanidades y Artes. Escuela de Letras

    Constructive model of the natural language

    The paper deals with a constructive model of natural language. Elements of the model (language constructions) are images with attributes such as sounds, letters, morphemes, words, and other lexical and syntactic components of the language. Based on an analysis of the processes of world perception and of visual and associative thinking, operations for the formation and transformation of images are identified. The model can be applied in semantic NLP.

    Normalizing English for Interlingua : Multi-channel Approach to Global Machine Translation

    The paper tries to demonstrate that when English is used as an interlingua in translating between two languages, it can be normalized to reduce unnecessary ambiguity. Current usage of English often omits such critical features as the relative pronoun and the conjunction that marks the beginning of a subordinate clause. In addition to causing ambiguity, this practice also makes it difficult to produce correct structures in the target language. If the source language makes such structures explicit, it is possible to carry this information through the whole translation chain into the target language. If we consider English as an interlingua in a multilingual translation environment, we should make the intermediate stage as unambiguous as possible. There are also other possibilities for reducing ambiguity, such as the selection of less ambiguous translation equivalents. Long noun compounds, which are often ambiguous, can also be presented in unambiguous form when linguistic knowledge of the source language is included. Non peer reviewed
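    One normalization of the kind described, restoring an omitted subordinating "that", can be sketched with a rule. The reporting-verb list and the rule itself are invented for illustration and are far cruder than what the paper's system would need (a real normalizer would have to parse, since not every word after these verbs opens a clause).

```python
import re

# Toy list of reporting verbs after which an omitted "that" may be restored.
REPORTING_VERBS = ("said", "believes", "thinks", "claims")

def normalize(sentence):
    """Insert 'that' after a reporting verb when it is not already present."""
    pattern = r"\b(" + "|".join(REPORTING_VERBS) + r")\s+(?!that\b)(\w+)"
    return re.sub(pattern, r"\1 that \2", sentence)

print(normalize("She said he would come"))
# -> 'She said that he would come'
```

    Making the clause boundary explicit in this way gives every downstream target language a marked subordinate clause to work from, which is the point of normalizing the interlingua stage.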

    Exploring formal models of linguistic data structuring. Enhanced solutions for knowledge management systems based on NLP applications

    2010 - 2011. The principal aim of this research is to describe to what extent formal models for linguistic data structuring are crucial in Natural Language Processing (NLP) applications. In this sense, we will pay particular attention to those Knowledge Management Systems (KMS) which are designed for the Internet, and to the enhanced solutions they may require. In order to deal appropriately with these topics, we will describe how to achieve computational linguistics applications that help humans establish and maintain an advantageous relationship with technologies, especially those technologies which are based on, or produce, man-machine interactions in natural language. We will explore the positive relationship that may exist between well-structured Linguistic Resources (LR) and KMS, in order to show that if the information architecture of a KMS is based on the formalization of linguistic data, then the system works better and is more consistent. As for the topics we want to deal with, first of all it is indispensable to state that, in order to build efficient and effective Information Retrieval (IR) tools, understanding and formalizing natural language combinatory mechanisms is the first operation to achieve, also because any piece of information produced by humans on the Internet is necessarily a linguistic act. Therefore, in this research work we will also discuss the NLP structuring of a Hybrid Model of linguistic formalization, which we hope will prove a useful tool to support, improve and refine KMSs. More specifically, in section 1 we will describe how to structure language resources implementable inside KMSs, to what extent they can improve the performance of these systems, and how the problem of linguistic data structuring is dealt with by natural language formalization methods.
In section 2 we will proceed with a brief review of computational linguistics, paying particular attention to specific software packages such as Intex, Unitex, NooJ, and Cataloga, which are developed according to the Lexicon-Grammar (LG) method, a linguistic theory established in the 1960s by Maurice Gross. In section 3 we will describe some specific works useful to monitor the state of the art in linguistic data structuring models, enhanced solutions for KMSs, and NLP applications for KMSs. In section 4 we will cope with problems related to natural language formalization methods, describing mainly Transformational-Generative Grammar (TGG) and LG, plus other methods based on statistical approaches and ontologies. In section 5 we will propose a Hybrid Model usable in NLP applications in order to create effective enhanced solutions for KMSs. Specific features and elements of our Hybrid Model will be shown through results of experimental research work. The case study we will present is a very complex NLP problem still little explored in recent years, i.e. the treatment of Multi-Word Units (MWUs). In section 6 we will close our research by evaluating its results and presenting possible future lines of work. [edited by author]
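    The MWU treatment mentioned as the case study can be sketched as dictionary-based recognition with greedy longest-match lookup. The dictionary entries are invented for illustration, not taken from the thesis's actual resources.

```python
# Toy MWU dictionary: each entry is a tuple of tokens (entries invented).
MWU_DICT = {("kick", "the", "bucket"), ("in", "spite", "of")}
MAX_LEN = max(len(m) for m in MWU_DICT)

def chunk(tokens):
    """Group known multiword units into single tokens, longest match first."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in MWU_DICT:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:  # no MWU starts here: emit the single token
            out.append(tokens[i])
            i += 1
    return out

print(chunk(["he", "did", "it", "in", "spite", "of", "me"]))
# -> ['he', 'did', 'it', 'in spite of', 'me']
```

    Treating an MWU as one retrieval token is exactly what makes such formalized resources useful to an IR or KMS pipeline: "in spite of" stops being three stopword-like tokens and becomes one meaningful unit.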

    Semantic Technologies for Business Decision Support

    2015 - 2016. In order to improve and remain competitive, enterprises should know how to seize the opportunities arising from data available on the Web. This strategic vision implies a high level of communication sharing and the integration of practices across every business level. This does not mean that enterprises need a disruptive change in their information systems, but rather their conversion, reusing existing business data and integrating new data. However, data is heterogeneous, so to maximise its value it is necessary to extract meaning from it, considering the context in which it evolves. The proliferation of new linguistic data linked to the growth of textual resources on the Web creates an inadequacy in the analysis and integration phases of data in the enterprise. Thus, the use of Semantic Technologies based on Natural Language Processing (NLP) applications is required. This study arises as a first approach to the development of a document-driven Decision Support System (DSS) based on NLP technology within the theoretical framework of the Lexicon-Grammar of Maurice Gross. Our research project has two main objectives: the first is to recognize and codify the innovative language with which companies express and describe their business, in order to standardize it and make it actionable by machines. The second is to use information resulting from text analysis to support strategic decisions, considering that through Text Mining analysis we can capture the hidden meaning in business documents. In the first chapter we examine the concept, characteristics and different types of DSS (with particular reference to document-driven analysis) and the changes these systems have undergone with the development of the Web and, consequently, of information systems within companies. In the second chapter, we proceed with a brief review of Computational Linguistics, paying particular attention to goals, resources and applications.
In the third chapter, we provide a state of the art of Semantic Technology Enterprises (STEs) and their process of integration into the innovation market, analysing the diffusion, the types of technologies, and the main sectors in which they operate. In the fourth chapter, we propose a model of linguistic support and analysis, according to Lexicon-Grammar, in order to create an enriched solution for document-driven decision systems: we provide specific features of business language, resulting from experimental research work in the startup ecosystem. Finally, we recognize that the formalization of all linguistic phenomena is extremely complex, but the results of our analysis encourage us to continue this line of research. Applying linguistic support to the business technological environment yields results that are more efficient and constantly updated, fostering innovation even under conditions of strong resistance to change. [edited by author]

    From Linguistic Resources to Ontology-Aware Terminologies: Minding the Representation Gap

    Terminological resources have proven crucial in many applications, ranging from Computer-Aided Translation tools to authoring software and multilingual and cross-lingual information retrieval systems. Nonetheless, with the exception of a few felicitous examples, such as the IATE (Interactive Terminology for Europe) termbank, many terminological resources are not available in standard formats such as Term Base eXchange (TBX), which prevents their sharing and reuse. Yet these terminologies could be improved by associating the corresponding ontology-based information. The research described in the present contribution demonstrates the process and the methodologies adopted in the automatic conversion of such resources into TBX, together with their semantic enrichment based on the formalization of ontological information within terminologies. We present a proof of concept using the Italian Linguistic Resource for the Archaeological domain (developed according to the Thesauri and Guidelines of the Italian Central Institute for the Catalogue and Documentation). Further, we introduce the conversion tool developed to support the process of creating ontology-aware terminologies, improving the interoperability and sharing of existing language technologies and data sets.
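    The conversion step can be sketched by emitting a minimal TBX-style term entry from a structured record. The element names follow the general TBX termEntry/langSet/tig pattern; the record fields and values below are invented for illustration and a compliant export would need the full TBX envelope and dialect declarations.

```python
import xml.etree.ElementTree as ET

def to_tbx_entry(entry_id, lang, term, subject):
    """Build a minimal TBX-style termEntry and return it as an XML string."""
    entry = ET.Element("termEntry", id=entry_id)
    # Subject field carries the domain (here standing in for ontology-aware info).
    ET.SubElement(entry, "descrip", type="subjectField").text = subject
    lang_set = ET.SubElement(entry, "langSet", {"xml:lang": lang})
    tig = ET.SubElement(lang_set, "tig")
    ET.SubElement(tig, "term").text = term
    return ET.tostring(entry, encoding="unicode")

# Hypothetical record from an archaeological terminology:
xml = to_tbx_entry("c1", "it", "anfora", "archaeology")
print(xml)
```

    Serializing each source record through one such function is what makes a legacy resource shareable: any TBX-aware tool can then read the result without knowing the original format.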