Search CORE

7,392 research outputs found

Neural Networks for the Web Services Classification

Author: Hernández Palma Hugo
Niebles Núñez William
Senior Naveda Alexa
Silva Jesús
Solórzano Movilla José
Publication venue: Institute of Physics Publishing
Publication date: 01/01/2020

This article introduces an n-gram-based approach to the automatic classification of Web services using a multilayer perceptron artificial neural network. Web services contain information that is useful for classifying them by functionality. The approach relies on word n-grams extracted from the Web service description to determine its membership in a category. The experiments show promising results, achieving an F-measure of 0.995 using word unigrams (1-grams, i.e., features composed of a single lexical unit) and TF-IDF weighting.
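As a rough illustration of the feature extraction this abstract describes, the sketch below builds word n-grams from a description and weights them with TF-IDF. It is not the authors' implementation: the function names, the sample descriptions, and the specific weighting formula (raw term frequency times log(N / df)) are assumptions made for the example, and the classifier itself is omitted.

```python
import math
from collections import Counter

def word_ngrams(text, n=1):
    """Return the word n-grams of text as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs, n=1):
    """Map each document to {ngram: tf * log(N / df)} (assumed weighting)."""
    grams_per_doc = [Counter(word_ngrams(d, n)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for grams in grams_per_doc for g in grams)
    total = len(docs)
    return [
        {g: tf * math.log(total / df[g]) for g, tf in grams.items()}
        for grams in grams_per_doc
    ]

# Hypothetical service descriptions, invented for illustration only.
descriptions = [
    "returns current weather forecast for a city",
    "converts currency using current exchange rates",
    "returns weather alerts for a region",
]
vectors = tfidf_vectors(descriptions, n=1)
```

Each resulting dictionary could serve as a sparse input vector for a classifier such as a multilayer perceptron; n-grams that occur in every description receive a weight of zero under this formula.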

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

ZENODO

Repositorio Digital CUC

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Repositorio Académico UPC

Using TF-IDF n-gram and word embedding cluster ensembles for author profiling: Notebook for PAN at CLEF 2017

Author: Poulston A.
Stevenson M.
Waseem Z.
Publication venue: CEUR
Publication date: 13/07/2017

This paper presents our approach and results for the 2017 PAN Author Profiling Shared Task. Language-specific corpora were provided for four languages: Spanish, English, Portuguese, and Arabic. Each corpus consisted of tweets authored by a number of Twitter users labeled with their gender and the specific variant of their language used in the documents (e.g. Brazilian or European Portuguese). The task was to develop a system to infer the same attributes for unseen Twitter users. Our system employs an ensemble of two probabilistic classifiers: a logistic regression classifier trained on TF-IDF-transformed n-grams and a Gaussian Process classifier trained on word embedding clusters derived from an additional, external corpus of tweets.
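The abstract does not say how the two classifiers' outputs are combined. One common option, assumed here purely for illustration, is to average the class probabilities and pick the most probable label. The function names and the probability values below are hypothetical.

```python
def average_ensemble(prob_a, prob_b):
    """Average two class-probability dicts (assumed combination rule)."""
    labels = prob_a.keys() | prob_b.keys()
    return {
        label: (prob_a.get(label, 0.0) + prob_b.get(label, 0.0)) / 2
        for label in labels
    }

def predict(prob_a, prob_b):
    """Return the label with the highest averaged probability."""
    combined = average_ensemble(prob_a, prob_b)
    return max(combined, key=combined.get)

# Hypothetical outputs of the two classifiers for one Twitter user.
p_ngram_model = {"female": 0.60, "male": 0.40}    # stands in for logistic regression
p_cluster_model = {"female": 0.30, "male": 0.70}  # stands in for the Gaussian Process

combined = average_ensemble(p_ngram_model, p_cluster_model)
label = predict(p_ngram_model, p_cluster_model)
```

Averaging only makes sense because both models emit probabilities over the same label set; other schemes, such as weighted averaging, would fit the same interface.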

White Rose Research Online

Part of Speech Based Term Weighting for Information Retrieval

Author: Brown André EX
Ch'ng Quee-Lim
Currie Michael
Grundy Laura J
Hokanson Jim
Javer Avelino
Kerr Rex
Lee Chee Wai
Li Chris
Li Kezhi
Schafer William R
Yemini Eviatar
Publication venue
Publication date: 05/04/2017

Automatic language processing tools typically assign so-called weights to terms, corresponding to the contribution of terms to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the POS contexts in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF and BM25 as baselines, show that integrating our POS-based term weights into retrieval always leads to gains (up to +33.7% over the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet prior smoothing) and our best-performing POS-based term weight show consistent retrieval gains across the whole smoothing range of the baseline.
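To make the idea of integrating an extra term weight into a matching model concrete, the sketch below scores documents with a standard BM25 formula and multiplies each query term's contribution by an optional per-term weight. The abstract does not specify how the POS-based weights enter the model, so the multiplicative integration, the `term_weight` parameter, the non-negative IDF variant, and the sample documents are all assumptions for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, term_weight=None, k1=1.2, b=0.75):
    """Score each document against the query with BM25.

    term_weight is a hypothetical {term: multiplier} map standing in for
    an external term weight; missing terms default to 1.0.
    Assumes at least one non-empty document.
    """
    term_weight = term_weight or {}
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency of each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += term_weight.get(term, 1.0) * idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical two-document collection.
docs = ["term weighting for retrieval", "neural networks for classification"]
plain = bm25_scores("weighting", docs)
boosted = bm25_scores("weighting", docs, term_weight={"weighting": 2.0})
```

With a multiplicative weight, doubling a term's weight doubles its contribution to the score, which keeps the baseline recoverable by setting every weight to 1.0.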

arXiv.org e-Print Archive

ZENODO

Knowledge Discovery in Documents by Extracting Frequent Word Sequences

Author: Ahonen Helena
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign
Publication date: 01/01/1999

Published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository

Matching Queries to Frequently Asked Questions: Search Functionality for the MRSA Web-Portal

Author: Akker Rieks op den
Tigelaar Almer S.
Verhoeven Fenne
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2009

As part of the long-term EUREGIO MRSA-net project, a system was developed that enables health care workers and the general public to quickly find answers to their questions regarding the MRSA pathogen. This paper focuses on how these questions can be answered using Information Retrieval (IR) and Natural Language Processing (NLP) techniques on a Frequently-Asked-Questions-style (FAQ) database.

University of Twente Research Information