
729 research outputs found

Recommended from our members

Minimally supervised induction of morphology through bitexts

Author: Moon Taesun, Ph. D.
Publication date: 01/12/2008

A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. Consequently, there have been many attempts to reduce the cost of building morphological systems by developing unsupervised or minimally supervised algorithms and learning methods for the acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems. Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner, but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce, from an unannotated text, clusters of words that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language (the source) to another language (the target). This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed that of the current roster of unsupervised or minimally supervised approaches to morphology acquisition, the study attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.

Texas ScholarWorks

Searching for associations between social media trending topics and organizations

Author: Henriques João Pedro Sousa
Publication date: 22/11/2020

This work focuses on how micro and small companies can take advantage of trending topics for marketing campaigns. Trending topics are the most discussed topics at a given moment on social media platforms, particularly Twitter and Facebook. While access to trending topics is free and available to everyone, marketing specialists and specialized software are expensive, and small companies do not have the budget to cover those costs. The main goal is to search for associations between trending topics and companies on social media platforms, and the HotRivers prototype is designed to accomplish this. It aims to be an inexpensive, fast, and automated solution. Detailed analyses were conducted to reduce processing time and make the most of the available resources at the lowest price. The end user receives a list of the trending topics related to the target company. For HotRivers, several components were tested: different text pre-processing techniques; a tweet selection method called Centroid Strategy; and three models, namely an embedding-vector approach with the Doc2Vec model, a probabilistic model with Latent Dirichlet Allocation, and a classification approach with a Convolutional Neural Network, which was used in the final architecture. The Centroid Strategy is applied to trending topics to filter out unwanted tweets. Notable results are that the trending topic "Nike" is associated with the company Nike and #WorldPatientSafetyDay is associated with Portsmouth Hospitals University. HotRivers cannot produce a full marketing campaign, but it can point the direction for the next campaign.

Repositório Institucional do ISCTE-IUL

D4.1. Technologies and tools for corpus creation, normalization and annotation

Author: Aleksic Vera
Bel Nuria
Bartolini Roberto
Caselli Tommaso
Frontini Francesca
Hamon Olivier
Papavassiliou Vassilis
Pecina Pavel
Poch Riera Marc
Poibeau Thierry
Prokopis Prokopidis
Rimell Laura
Thurmair Gregor

The objectives of the Corpus Acquisition and Annotation (CAA) subsystem are the acquisition and processing of monolingual and bilingual language resources (LRs) required in the PANACEA context. Therefore, the CAA subsystem includes: i) a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web, ii) a component for cleanup and normalization (CNC) of these data, and iii) a text processing component (TPC), which consists of NLP tools including modules for sentence splitting, POS tagging, lemmatization, parsing and named entity recognition.

PUblication MAnagement

Russian Language Neural Net Chatbot with Natural Language Processing

Author: Ismoilov Nurullo
Semenov Mikhail Evgenievich
Publication date: 01/01/2019

In this paper, we consider a chatbot that uses natural language processing and can reply to various user commands. Moreover, the most common employee work processes were automated. The solution can run on any corporate local or global network. The tools, software and libraries used are also explained. As a result, a chatbot prototype is presented.

Electronic archive of Tomsk Polytechnic University

Comprehending Security Events: Context-Based Identification and Explanation

Author: van Ede Thijs Sebastiaan
Publication venue: University of Twente
Publication date: 24/11/2023

University of Twente Research Information

A Data Analysis Pipeline for the Study and Categorization of User Content in Online Health Communities

Author: Sara Filipa Mendes da Silva
Publication date: 31/10/2022

Repositório Aberto da Universidade do Porto

Sharing Cultural Heritage: the Clavius on the Web Project

Author: Abrate Matteo
Del Grosso Angelo Mario
Giovannetti Emiliano
Lo Duca Angelica
Luzzi Damiana
Mancini Lorenzo
Marchetti Andrea
Pedretti Irene
Piccini Silvia
Publication date: 01/01/2014

In the last few years the number of manuscripts digitized and made available on the Web has been constantly increasing. However, there is still a considerable lack of results concerning both making their content explicit and the tools developed to make it available. The objective of the Clavius on the Web project is to develop a Web platform exposing a selection of Christophorus Clavius's letters along with three different levels of analysis: linguistic, lexical and semantic. The multilayered annotation of the corpus involves an XML-TEI encoding followed by a tokenization step in which each token is uniquely identified through a CTS URN notation and then associated with a part-of-speech and a lemma. The text is lexically and semantically annotated on the basis of a lexicon and a domain ontology, the former structuring the most relevant terms occurring in the text and the latter representing the domain entities of interest (e.g. people and places). Moreover, each entity is connected to linked and non-linked resources, including DBpedia and VIAF. Finally, the results of the three layers of analysis are gathered and shown through interactive visualization and storytelling techniques. A demo version of the integrated architecture was developed.

Archivio della ricerca- Università di Roma La Sapienza

The Family Name as Socio-Cultural Feature and Genetic Metaphor: From Concepts to Methods

Author: Alessio Boattini
Alford R.
Antonella Useli
Archer S.
Beck P.
Bertrand Desjardins
Black G. F.
Boattini A.
Bourin M.
Castles S.
Cavalli Sforza L. L.
Chen K.
Cheshire J. A.
Darlu P.
Davide Pettener
De Felice E.
de Woulfe P.
Emery R.
Franz Manni
Gerrit Bloothooft
Guy Brunet
Hanks P.
Hellfritzsch V.
Hey D. G.
James Cheshire
Karlin S.
Kathrin Dräger
Kedar B.
Kees Mandemakers
Kohonen T.
Kunze K.
Leendert Brouwer
MacLysaght E.
Manni F.
Mateos P.
Matthijs Brouwer
McKinley R. A.
Morgan T. J.
Nathan M.
Nei M.
Neumann I.
Nicholls K.
Nübling D.
Pablo Mateos
Pascal Chareille
Patrick Hanks
Paul Longley
Pierre Darlu
Postles D.
Redmonds G.
Richard Coates
Rodriguez Diaz R.
Rohlfs G.
Saitou N.
Schmuck M.
Tooth E.
Walther H.
Publication date: 01/01/2012

A recent workshop entitled "The Family Name as Socio-Cultural Feature and Genetic Metaphor: From Concepts to Methods" was held in Paris in December 2010, sponsored by the French National Centre for Scientific Research (CNRS) and by the journal Human Biology. This workshop was intended to foster a debate on questions related to family names and to compare different multidisciplinary approaches involving geneticists, historians, geographers, sociologists and social anthropologists. This collective paper presents a collection of selected communications.

HAL-ENS-LYON

Crossref

Hal - Université Grenoble Alpes

UCL Discovery

Digital Commons@Wayne State University

HAL

HAL-Lyon 3

HAL Université de Tours

KNAW Repository