Neural Networks for the Web Services Classification
This article introduces an n-gram-based approach to the automatic classification of Web services using a multilayer perceptron artificial neural network. Web service descriptions contain information that is useful for classifying services by functionality. The approach relies on word n-grams extracted from the Web service description to determine its membership in a category. The experiments show promising results, achieving an F-measure of 0.995 using word unigrams (1-grams, i.e. features composed of a single lexical unit) and TF-IDF weighting.
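As a rough illustration of the feature extraction this abstract describes, the following minimal sketch computes TF-IDF weights over word n-grams. It is not the authors' implementation: the function names, the sample service descriptions, and the choice of raw term frequency with idf = log(N / df) are all assumptions made for the example, and the multilayer perceptron that would consume these features is left out.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(documents, n=1):
    """Return one {ngram: weight} dict per document.

    Assumption: raw term frequency times idf = log(N / df), one common
    TF-IDF variant; the abstract does not state which variant was used.
    """
    grams_per_doc = [Counter(word_ngrams(doc, n)) for doc in documents]

    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for grams in grams_per_doc:
        df.update(grams.keys())

    total = len(documents)
    return [
        {gram: tf * math.log(total / df[gram]) for gram, tf in grams.items()}
        for grams in grams_per_doc
    ]


# Hypothetical Web service descriptions, invented for illustration only.
descriptions = [
    "returns the weather forecast for a city",
    "returns the currency exchange rate",
    "books a hotel room in a city",
]
weights = tfidf(descriptions, n=1)
```

With `n=1` each feature is a single word. Terms that occur in only one description receive the highest weight, and a term that occurred in every description would receive a weight of zero.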
Using TF-IDF n-gram and word embedding cluster ensembles for author profiling: Notebook for PAN at CLEF 2017
This paper presents our approach and results for the 2017 PAN Author Profiling Shared Task. Language-specific corpora were provided for four languages: Spanish, English, Portuguese, and Arabic. Each corpus consisted of tweets authored by a number of Twitter users labeled with their gender and the specific variant of their language used in the documents (e.g. Brazilian or European Portuguese). The task was to develop a system to infer the same attributes for unseen Twitter users. Our system employs an ensemble of two probabilistic classifiers: a logistic regression classifier trained on TF-IDF-transformed n-grams and a Gaussian process classifier trained on word embedding clusters derived from an additional, external corpus of tweets.
Part of Speech Based Term Weighting for Information Retrieval
Automatic language processing tools typically assign to terms so-called
weights corresponding to the contribution of terms to information content.
Traditionally, term weights are computed from lexical statistics, e.g., term
frequencies. We propose a new type of term weight that is computed from part of
speech (POS) n-gram statistics. The proposed POS-based term weight represents
how informative a term is in general, based on the POS contexts in which it
generally occurs in language. We suggest five different computations of
POS-based term weights by extending existing statistical approximations of term
information measures. We apply these POS-based term weights to information
retrieval, by integrating them into the model that matches documents to
queries. Experiments with two TREC collections and 300 queries, using TF-IDF &
BM25 as baselines, show that integrating our POS-based term weights into
retrieval always leads to gains (up to +33.7% over the baseline). Additional
experiments with a different retrieval model as baseline (Language Model with
Dirichlet prior smoothing) and our best-performing POS-based term weight show
consistent retrieval gains across the whole smoothing range of the
baseline.
Knowledge Discovery in Documents by Extracting Frequent Word Sequences
published or submitted for publication
Matching Queries to Frequently Asked Questions: Search Functionality for the MRSA Web-Portal
As part of the long-term EUREGIO MRSA-net project, a system was developed that enables health care workers and the general public to quickly find answers to their questions about the MRSA pathogen. This paper focuses on how these questions can be answered by applying Information Retrieval (IR) and Natural Language Processing (NLP) techniques to a Frequently Asked Questions (FAQ) style database.