    16th International NooJ 2022 Conference: Book of Abstracts

    Book of abstracts presented at the "16th International NooJ 2022 Conference", held in hybrid format at the ECU (Espacio Cultural Universitario, UNR) in Rosario, Santa Fe, Argentina, on 14 and 15 June 2022. Affiliation: Reyes, Silvia Susana. Universidad Nacional de Rosario. Facultad de Humanidades y Artes; Argentin

    Project Management within Economic Intelligence: using NooJ as Diagnostic Tool for nanometrology cluster

    This paper presents the methodology of the monitoring/watch process (WP) and its complexity in an economic intelligence (EI) context. The theoretical foundation for knowledge representation through corpus studies is examined, tested, and then applied in an application framework on the subject of nanoscience and nanotechnology (in France). The implementation was carried out on the NooJ platform. The methodology comprises five phases: data selection, cleaning, linguistic resources development, NooJ processing, and result analysis. The main results connect the WP methodology to a specific case study in nanometrology, which allows the diagnosis of the internal structure and the creation of links between actors, themes, and projects oriented towards innovation. Processing with NooJ served, first, to mine actor activities linked to information needs in order to improve project management and, second, to determine the professional class of each actor in the nanometrology cluster.

    Information Design for "Weak Signal" detection and processing in Economic Intelligence: A case study on Health resources

    Paper received 20 February 2011; received in revised form 22 November 2011; accepted 25 December 2011. This research covers all phases of "Information Design" applied to detecting and profiting from weak signals in economic intelligence (EI) or business intelligence (BI). The field of information design (ID) concerns the process of translating complex, unorganized, or unstructured data into valuable and meaningful information. ID practice requires an interdisciplinary approach that combines skills in graphic design (writing, analysis processing and editing), human performance technology, and human factors. Applied in the context of an information system, it allows end-users to easily detect implicit topics known as "weak signals" (WS). In our approach to implementing ID, the processes cover the development of a knowledge management (KM) process in the context of EI. A case study on information monitoring of health resources is presented, using ID processes to outline weak signals. Both French and American bibliographic databases were used to connect multilingual concepts in the health watch process.

    Arabic-English Text Translation Leveraging Hybrid NER

    Exploring formal models of linguistic data structuring. Enhanced solutions for knowledge management systems based on NLP applications

    2010 - 2011. The principal aim of this research is to describe the extent to which formal models for linguistic data structuring are crucial in Natural Language Processing (NLP) applications. We pay particular attention to Knowledge Management Systems (KMS) designed for the Internet, and to the enhanced solutions they may require. To deal with these topics appropriately, we describe how to build computational linguistics applications that help humans establish and maintain an advantageous relationship with technologies, especially those that are based on or produce man-machine interactions in natural language. We explore the positive relationship that may exist between well-structured Linguistic Resources (LR) and KMS, in order to argue that if the information architecture of a KMS is based on the formalization of linguistic data, then the system works better and is more consistent. First of all, it is indispensable to state that, in order to structure efficient and effective Information Retrieval (IR) tools, understanding and formalizing the combinatory mechanisms of natural language seems to be the first operation to achieve, not least because any piece of information produced by humans on the Internet is necessarily a linguistic act. Therefore, in this research work we also discuss the NLP structuring of a Hybrid Model for linguistic formalization, which we hope will prove to be a useful tool to support, improve, and refine KMSs. More specifically, in section 1 we describe how to structure language resources implementable inside KMSs, to what extent they can improve the performance of these systems, and how the problem of linguistic data structuring is dealt with by natural language formalization methods. 
In section 2 we give a brief review of computational linguistics, paying particular attention to specific software packages such as Intex, Unitex, NooJ, and Cataloga, which are developed according to the Lexicon-Grammar (LG) method, a linguistic theory established in the 1960s by Maurice Gross. In section 3 we describe specific works useful for monitoring the state of the art in Linguistic Data Structuring Models, Enhanced Solutions for KMSs, and NLP Applications for KMSs. In section 4 we address problems related to natural language formalization methods, describing mainly Transformational-Generative Grammar (TGG) and LG, plus other methods based on statistical approaches and ontologies. In section 5 we propose a Hybrid Model usable in NLP applications in order to create effective enhanced solutions for KMSs. Specific features and elements of our hybrid model are shown through results from experimental research work. The case study we present is a very complex NLP problem that has been little explored in recent years, i.e. the treatment of Multi-Word Units (MWUs). In section 6 we close our research by evaluating its results and presenting possible perspectives for future work. [edited by author] X n.s

    Társas megküzdési minták és identitáskonstrukciós folyamatok azonosítása történelmi tárgyú elbeszélésekben = Exploring coping strategies and processes of identity construction in fictional and non-fictional historical narratives

    Experimental research provided evidence that ingroup favoritism and negative discrimination of outgroups in intergroup perception appear when a historical conflict between the groups is actualized in the situation. The bias shown in perception depends on the type and strength of national identification. In the same series of experiments we elaborated and verified the category of historical trajectory related emotions as a type of collective emotions. 
Content analytic tools developed for the purpose of this research were used to analyze the textual presentation of three historical events (the Hungarian Conquest, the Austro-Hungarian Compromise, and the Treaty of Trianon) in secondary school history books published from 1900 to the present. A similar study was carried out on 16 historical events in contemporary primary and secondary school history textbooks, and on lay history stories gathered from a stratified sample of 500 subjects. Results show that both history books and lay stories transmit a kind of national identity characterized by low agency. Cognitive elaboration of the traumatic events of the twentieth century has begun only in the past decade. Content analytic studies of the function of historical novels in mediating national identity provided evidence that narratives on intergroup conflicts evoke categorial empathy. The emotional repertoire of the Hungarian characters in the novels is dominated by depressive dynamics. There is a high frequency of historical trajectory related emotions.

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 October 2010.

    An automatic morphological analysis system for Indonesian

    This thesis reports the creation of SANTI-morf (Sistem Analisis Teks Indonesia – morfologi), a rule-based system that performs morphological annotation for Indonesian. The system has been built across three stages, namely preliminaries, annotation scheme creation (the linguistic aspect of the project), and system implementation (the computational aspect of the project). The preliminary matters covered include the necessary key concepts in morphology and Natural Language Processing (NLP), as well as a concise description of Indonesian morphology (largely based on the two primary reference grammars of Indonesian, Alwi et al. 1998 and Sneddon et al. 2010, together with work in the linguistic literature on Indonesian morphology, e.g. Kridalaksana 1989; Chaer 2008). As part of this preliminary stage, I created a testbed corpus for evaluation purposes. The design of the testbed is justified by considering the design of existing evaluation corpora, such as the testbed used by the English Constraint Grammar or EngCG system (Voutilainen 1992), the British National Corpus (BNC) 1994 evaluation data, and the training data used by MorphInd (Larasati et al. 2011), a morphological analyser (MA) for Indonesian. The dataset for this testbed was created by narrowing down an existing very large but unbalanced collection of texts (drawn from the Leipzig corpora; see Goldhahn et al. 2012). The initial collection was reduced to a corpus composed of nine domains (following the domain categorisation of the BNC). A set of texts from each domain, proportional in size, was extracted and combined to form a testbed that complies with the design informed by the prior literature. The second stage, scheme creation, involved the creation of a new Morphological Annotation Scheme (MAS) for Indonesian, for use in the SANTI-morf system. 
First, a review of MASs in different languages (Finnish, Turkish, Arabic, Indonesian) as well as the Universal Dependencies MAS identifies the best practices in the field. From these, 15 design principles for the novel MAS were devised. This MAS consists of a morphological tagset, together with comprehensive justification of the morphological analyses used in the system. It achieves full morpheme-level annotation, presenting each morpheme’s orthographic and citation forms in the defined output, accompanied by robust morphological analyses, both formal and functional; to my knowledge, this is the first MAS of its kind for Indonesian. The MAS’s design is based not only on reference grammars of Indonesian and other linguistic sources, but also on the anticipated needs of researchers and other users of texts and corpora annotated using this scheme of analysis. The new MAS aims at The third stage of the project, implementation, consisted of three parts: a benchmarking evaluation exercise and a survey of frameworks and tools, leading ultimately to the actual implementation and evaluation of SANTI-morf. MorphInd (Larasati et al. 2012) is the prior state-of-the-art MA for Indonesian. That being the case, I evaluated MorphInd’s performance against the aforementioned testbed, both as justification of the need for an improved system, and to serve as a benchmark for SANTI-morf. MorphInd scored 93% on lexical coverage and 89% on tagging accuracy. Next, I surveyed existing MA frameworks and tools. This survey justifies my choice of the rule-based approach (inspired by Koskenniemi’s 1983 Two Level Morphology) and NooJ (Silberztein 2003) as, respectively, the framework and the software tool for SANTI-morf. After selection of this approach and tool, the language resources that constitute the SANTI-morf system were created. These are, primarily, a number of lexicons and sets of analysis rules, as well as necessary NooJ system configuration files. 
SANTI-morf’s 3 lexicon files (in total 86,590 entries) and 15 rule files (in total 659 rules) are organised into four modules, namely the Annotator, the Guesser, the Improver and the Disambiguator. These modules are applied one after another in a pipeline. The Annotator provides initial morpheme-level annotation for Indonesian words by identifying the morphological processes through which they were built (affixation, reduplication, compounding, and cliticisation). The Guesser ensures that words the Annotator cannot analyse, because they are absent from its lexicons, receive best guesses as to the correct analysis from the application of a set of probable but not exceptionless rules. The Improver improves the existing annotation by adding probable analyses that the Annotator might have missed. Finally, the Disambiguator resolves ambiguities, that is, words for which the earlier elements of the pipeline have generated two or more possible analyses in terms of the morphemes identified or their annotation. NooJ annotations are saved in a binary file, but for evaluation purposes, plain-text output is required. I thus developed a system for data export using an in-NooJ mapping to and from a modified, exportable expression of the MAS, and wrote a small program to enable re-conversion of the output in plain-text format. For purposes of the evaluation, I created a 10,000-word manually annotated gold-standard SANTI-morf dataset. The outcome of the evaluation is that SANTI-morf has 100% coverage (because a best-guess analysis is always provided for unrecognised word forms), and 99% precision and recall for the morphological annotations, with a 1% rate of remaining ambiguity in the final output. SANTI-morf is thus shown to present a number of advancements over MorphInd, the state-of-the-art MA for Indonesian, exhibiting more robust annotation and better coverage. 
Other performance indicators, namely the high precision and recall, make SANTI-morf a concrete advance in the field of automated morphological annotation for Indonesian, and in consequence a substantive contribution to the field of Indonesian linguistics overall
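
The four-module pipeline described in this abstract (Annotator, Guesser, Improver, Disambiguator) can be illustrated with a minimal Python sketch. This is not SANTI-morf's implementation, which consists of NooJ lexicons and rule files; the toy lexicon, the tag strings such as `<V>`, `<CL>` and `<GUESS>`, the single clitic rule, and all function names are assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon (NOT the SANTI-morf lexicon or tagset):
# surface form -> candidate morpheme-level analyses.
LEXICON: Dict[str, List[str]] = {
    "membaca": ["meN-+baca<V>"],
    "buku": ["buku<N>"],
}

Analyses = Dict[str, List[str]]


def annotator(tokens: List[str], analyses: Analyses) -> Analyses:
    """Initial annotation: look each token up in the lexicon."""
    for tok in tokens:
        analyses[tok] = list(LEXICON.get(tok, []))
    return analyses


def guesser(tokens: List[str], analyses: Analyses) -> Analyses:
    """Give a best-guess analysis to tokens the Annotator left empty."""
    for tok in tokens:
        if not analyses[tok]:
            analyses[tok] = [f"{tok}<GUESS>"]
    return analyses


def improver(tokens: List[str], analyses: Analyses) -> Analyses:
    """Add probable analyses that were missed (toy rule: stem + clitic -nya)."""
    for tok in tokens:
        stem = tok[:-3]
        if tok.endswith("nya") and stem in LEXICON:
            for base in LEXICON[stem]:
                candidate = f"{base}+nya<CL>"
                if candidate not in analyses[tok]:
                    analyses[tok].append(candidate)
    return analyses


def disambiguator(tokens: List[str], analyses: Analyses) -> Analyses:
    """Keep one analysis per token, preferring non-guessed candidates."""
    for tok in tokens:
        candidates = analyses[tok]
        preferred = [a for a in candidates if not a.endswith("<GUESS>")]
        analyses[tok] = [(preferred or candidates)[0]]
    return analyses


def run_pipeline(tokens: List[str]) -> Analyses:
    """Apply the four stages one after another, as in a pipeline."""
    analyses: Analyses = {}
    for stage in (annotator, guesser, improver, disambiguator):
        analyses = stage(tokens, analyses)
    return analyses


if __name__ == "__main__":
    for word, result in run_pipeline(["membaca", "bukunya", "xyz"]).items():
        print(word, result)
```

Because the Guesser always supplies a fallback, every token leaves the pipeline with an analysis, which mirrors the abstract's explanation of why coverage is 100%. The real Disambiguator is rule-based; the "prefer non-guessed, take the first" heuristic here is only a stand-in.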

    Formal Linguistic Models and Knowledge Processing. A Structuralist Approach to Rule-Based Ontology Learning and Population

    2013 - 2014. The main aim of this research is to propose a structuralist approach to knowledge processing by means of ontology learning and population, starting from unstructured and structured texts. The suggested method combines distributional semantic approaches and NL formalization theories in order to develop a framework that relies upon deep linguistic analysis... [edited by author] XIII n.s
