Search CORE

3,210 research outputs found

Decision making and soft computing: proceedings of the 11th international FLINS conference

Author: Dos Santos Machado Liliane
Kerre Etienne
Lu Jie
Marcos de Moraes Ronei
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2014
Field of study

Improving sentiment analysis through ensemble learning of meta-level features

Author: Alnashwan Rana
Hoare Cathal
O'Riordan Adrian P.
Sorensen Humphrey
Publication venue: Sun SITE Central Europe (CEUR) / RWTH Aachen University
Publication date: 16/05/2017
Field of study

In this research, the well-known microblogging site Twitter was used for a sentiment analysis investigation. We propose an ensemble learning approach for automated polarity sentiment classification based on meta-level features derived from seven existing lexicon resources. The ensemble employs four base learners (a Two-Class Support Vector Machine, a Two-Class Bayes Point Machine, a Two-Class Logistic Regression and a Two-Class Decision Forest) for the classification task. Three different labelled Twitter datasets were used to evaluate the effectiveness of this approach. Our experiments show that, using a combination of existing lexicon resources, the ensemble learners minimize the error rate by avoiding a poor selection among stand-alone classifiers.
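As a rough illustration of the pipeline described above, the sketch below turns a tweet into meta-level features (positive and negative hit counts per lexicon) and combines several base learners. The abstract does not say how the four learners' outputs are merged, so simple majority voting is an assumption here; the toy lexicons and the rule-based learners (`lexicon_rule`, `total_rule`) are hypothetical stand-ins for the seven real lexicons and the trained SVM, Bayes point machine, logistic regression and decision forest.

```python
from collections import Counter
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical toy lexicon: (positive words, negative words).
Lexicon = Tuple[Set[str], Set[str]]
# A base learner maps a meta-level feature vector to +1 (positive) or -1 (negative).
Learner = Callable[[List[int]], int]


def meta_features(tweet: str, lexicons: Sequence[Lexicon]) -> List[int]:
    """For each lexicon, count the positive and negative words found in the tweet."""
    tokens = tweet.lower().split()
    features: List[int] = []
    for positive, negative in lexicons:
        features.append(sum(1 for t in tokens if t in positive))
        features.append(sum(1 for t in tokens if t in negative))
    return features


def majority_vote(features: List[int], learners: Sequence[Learner]) -> int:
    """Combine base-learner predictions by majority vote (assumed combination rule).

    Ties go to the earliest learner whose label is among the tied winners.
    """
    votes = [learner(features) for learner in learners]
    counts = Counter(votes)
    top = max(counts.values())
    winners = {label for label, c in counts.items() if c == top}
    return next(v for v in votes if v in winners)


def lexicon_rule(i: int) -> Learner:
    """Hypothetical stand-in learner: positive iff lexicon i has more positive than negative hits."""
    return lambda f: 1 if f[2 * i] > f[2 * i + 1] else -1


def total_rule(f: List[int]) -> int:
    """Hypothetical stand-in learner: compares summed positive vs. negative hits over all lexicons."""
    return 1 if sum(f[0::2]) > sum(f[1::2]) else -1


lexicons = [({"good", "great"}, {"bad", "awful"}), ({"love", "good"}, {"hate", "bad"})]
feats = meta_features("Good movie but awful ending", lexicons)
print(feats, majority_vote(feats, [lexicon_rule(0), lexicon_rule(1), total_rule]))  # → [1, 1, 1, 0] 1
```

Here the first learner alone would label the tweet negative, while the vote recovers the positive label; this is the kind of protection against a single poorly chosen classifier that the abstract describes.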

Cork Open Research Archive

A review of sentiment analysis research in Arabic language

Author: Cambria Erik
HajHmida Moez Ben
Oueslati Oumaima
Ounelli Habib
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

Sentiment analysis is a natural language processing task that has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is rising to become one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context, presenting the limitations and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language.

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

The 6th Conference of PhD Students in Computer Science

Author
Publication venue
Publication date: 01/01/2008
Field of study

University of Szeged

Information Retrieval with Finnish Case Law Embeddings

Author: Sarsa Sami
Publication venue: Helsingfors universitet
Publication date: 01/01/2019
Field of study

This work studies the capability of five text vectorisation models to embed Finnish case law texts into a vector space for computing inter-textual similarity. The embeddings and their computed similarities are used to create a Finnish case law retrieval system that allows effective querying with full documents. A working web application is presented as part of the work. The case law data is provided by the Finnish Ministry of Justice, and the studied models are TF-IDF, LDA, Word2Vec, Doc2Vec and Doc2vecC.
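To make "querying with full documents" concrete, here is a minimal standard-library sketch using TF-IDF, the simplest of the five listed models: every document and the query document are embedded as TF-IDF vectors and ranked by cosine similarity. The whitespace tokeniser, the smoothed IDF formula (the same form scikit-learn uses with `smooth_idf=True`) and the toy English snippets are assumptions for illustration; the thesis's actual preprocessing of Finnish text is not described here.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenisation; real Finnish case law would need lemmatisation.
    return text.lower().split()


def build_idf(docs: List[List[str]]) -> Dict[str, float]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: terms that occur in every document still get a positive weight.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    tf = Counter(tokens)
    # Query terms unseen in the corpus have no IDF and are dropped.
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query_doc: str, corpus: List[str], top_k: int = 3) -> List[Tuple[int, float]]:
    """Rank corpus documents by cosine similarity to a full query document."""
    docs = [tokenize(d) for d in corpus]
    idf = build_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    q = tfidf(tokenize(query_doc), idf)
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


corpus = ["tenant rent contract dispute", "criminal theft sentence appeal", "rent increase tenant appeal"]
print(search("tenant dispute over rent", corpus))
```

Swapping `tfidf` for an LDA, Word2Vec or Doc2Vec embedding would leave the ranking step unchanged, which is what makes it possible to compare the five models within one retrieval system.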

Helsingin yliopiston digitaalinen arkisto

Site-Specific Rules Extraction in Precision Agriculture

Author: Espejo García Borja Antonio
López Pellicer Francisco Javier
Zarazaga Soria Francisco Javier
Publication venue: Universidad de Zaragoza, Prensas de la Universidad
Publication date: 01/01/2019
Field of study

Sustainably increasing food production to meet the needs of a growing world population is a real challenge when we take into account the constant impact of pests and diseases on crops. Because of the significant economic losses they cause, the use of chemical treatments is too high, causing environmental pollution and resistance to different treatments. In this context, the agricultural community envisages the application of more site-specific treatments, as well as automatic validation of legal compliance. However, the specification of these treatments is found in regulations expressed in natural language. For this reason, translating regulations into a machine-processable representation is becoming increasingly important in precision agriculture. Currently, the requirements for translating regulations into formal rules are far from being met, and with the rapid development of agricultural science, manual verification of legal compliance is becoming unmanageable. In this thesis, the objective is to build and evaluate a rule extraction system that effectively distils the relevant information from regulations and transforms natural-language rules into a structured format that can be processed by machines. To this end, we have split rule extraction into two steps. The first is to build a domain ontology: a model describing the disorders that diseases produce in crops and their treatments. The second step is to extract information to populate the ontology. Since we use machine learning techniques, we implemented the MATTER methodology to carry out the annotation of regulations.
Once the corpus was created, we built a rule category classifier that distinguishes between obligations and prohibitions, and a system for extracting constraints in rules, which recognizes the relevant information needed to retain isomorphism with the original regulation. For these components we employed, among other deep learning techniques, convolutional neural networks and Long Short-Term Memory (LSTM) networks. In addition, we used more traditional algorithms such as support-vector machines and random forests as baselines. As a result, we present the PCT-O ontology, which has been aligned with other ontologies such as NCBI, PubChem, ChEBI and Wikipedia. The model can be used for identifying disorders, analysing conflicts between treatments, and comparing legislation across countries. Regarding the extraction systems, we empirically evaluated their behaviour with different metrics, but the F1 score is used to select the best systems. For the rule category classifier, the best system achieves a macro F1 of 92.77% and a binary F1 of 85.71%. This system uses a bidirectional long short-term memory network with word embeddings as input. For the rule constraint extractor, the best system achieves a micro F1 of 88.3%. This extractor takes as input a combination of character embeddings and word embeddings and uses a bidirectional long short-term memory neural network.
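The abstract reports three F1 variants (binary, macro and micro). The sketch below shows how each is computed from per-class true positives, false positives and false negatives, using made-up obligation/prohibition labels. Which class the thesis treats as positive for the binary score is not stated, so `"prohibition"` is an assumption, and the token-level micro F1 here is a simplification of the span-level scoring usually applied to extraction tasks.

```python
from typing import Sequence, Tuple


def per_class_counts(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_f1(y_true, y_pred, positive: str) -> float:
    """F1 of a single designated positive class."""
    return f1_from_counts(*per_class_counts(y_true, y_pred, positive))


def macro_f1(y_true, y_pred, labels) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = [f1_from_counts(*per_class_counts(y_true, y_pred, lab)) for lab in labels]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred, labels) -> float:
    """F1 over TP/FP/FN pooled across all classes."""
    tp = fp = fn = 0
    for lab in labels:
        a, b, c = per_class_counts(y_true, y_pred, lab)
        tp += a
        fp += b
        fn += c
    return f1_from_counts(tp, fp, fn)


labels = ["obligation", "prohibition"]
y_true = ["obligation", "obligation", "prohibition", "obligation", "prohibition"]
y_pred = ["obligation", "prohibition", "prohibition", "obligation", "obligation"]
print(binary_f1(y_true, y_pred, "prohibition"), macro_f1(y_true, y_pred, labels), micro_f1(y_true, y_pred, labels))
# binary 0.5, macro ≈ 0.583, micro 0.6
```

Because macro F1 averages the two classes equally while binary F1 scores only one of them, a macro score well above the binary score (as in 92.77% vs. 85.71%) suggests the other class is classified more accurately.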

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Universidad de Zaragoza

Keeping the data lake in form: DS-kNN datasets categorization using proximity mining

Author: Abelló Gamazo Alberto
Al-serafi Ayman Mounir Mohamed
Calders Toon
Romero Moral Óscar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

With the growing number of datasets stored in data repositories, there has been a trend towards using Data Lakes (DLs) to store such data. DLs store datasets in their raw formats, without any transformation or preprocessing, and make them accessible through schema-on-read. This makes it difficult for analysts to find datasets that can be crossed and that belong to the same topic. To support them in this DL governance challenge, we propose an algorithm for categorizing the datasets in a DL into predefined, topic-wise categories of interest. We use a k-NN approach that computes similarities between datasets with a proximity score based on their metadata. We test our algorithm on a real-life DL with a known ground-truth categorization. Our approach successfully detects the correct categories for datasets, as well as outliers, with a precision of more than 90% and recall exceeding 75% in specific settings.

Peer Reviewed. Postprint (author's final draft).
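The categorization step can be sketched as a proximity-weighted k-NN vote with an outlier threshold. The paper's actual proximity score comes from mined metadata models; here Jaccard similarity over attribute (column) names is swapped in as a simple stand-in, and `k`, `min_score` and the toy dataset profiles are hypothetical choices for illustration only.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

# Hypothetical metadata profile: the set of attribute (column) names of a dataset.
Profile = Set[str]


def proximity(a: Profile, b: Profile) -> float:
    """Jaccard similarity of attribute-name sets (a stand-in for the paper's proximity score)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def categorize(target: Profile, labelled: List[Tuple[Profile, str]],
               k: int = 3, min_score: float = 0.2) -> Optional[str]:
    """Assign the proximity-weighted majority category of the k nearest labelled datasets.

    Returns None (treated as an outlier) when no neighbour reaches min_score.
    """
    scored = sorted(((proximity(target, p), cat) for p, cat in labelled), reverse=True)
    neighbours = [(s, cat) for s, cat in scored[:k] if s >= min_score]
    if not neighbours:
        return None
    votes: Counter = Counter()
    for score, cat in neighbours:
        votes[cat] += score  # closer datasets contribute more to the vote
    return votes.most_common(1)[0][0]


labelled = [
    ({"patient_id", "diagnosis", "age"}, "health"),
    ({"patient_id", "treatment", "dose"}, "health"),
    ({"station", "temperature", "date"}, "weather"),
    ({"station", "rainfall", "date"}, "weather"),
]
print(categorize({"patient_id", "diagnosis", "dose"}, labelled))  # → health
print(categorize({"ticker", "price"}, labelled))  # → None
```

Raising `min_score` flags more datasets as outliers, trading recall for precision, which is one way the "specific settings" behind the reported precision and recall figures could differ.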

UPCommons. Portal del coneixement obert de la UPC