3,172 research outputs found
Scalable approximate FRNN-OWA classification
Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. However, calculating membership in an approximation requires a nearest neighbour search. In practice, the query time complexity of exact nearest neighbour search algorithms in more than a handful of dimensions is near-linear, which limits the scalability of FRNN-OWA. Therefore, we propose approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbours returned by Hierarchical Navigable Small Worlds (HNSW), a recent approximate nearest neighbour search algorithm with logarithmic query time complexity at constant near-100% accuracy. We demonstrate that approximate FRNN-OWA is sufficiently robust to match the classification accuracy of exact FRNN-OWA while scaling much more efficiently. We test four parameter configurations of HNSW, and evaluate their performance by measuring classification accuracy and construction and query times for samples of various sizes from three large datasets. We find that with two of the parameter configurations, approximate FRNN-OWA achieves near-identical accuracy to exact FRNN-OWA for most sample sizes, with query times that are up to several orders of magnitude faster.
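The classification rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity relation (one minus the mean absolute difference, assuming features scaled to [0, 1]), the linearly decreasing weights, and all function names are assumptions made for illustration. The neighbour search here is an exact brute-force scan; the approximate variant in the abstract would obtain the k neighbours from an HNSW index instead.

```python
def linear_weights(k):
    """Linearly decreasing OWA weights that sum to 1 (an assumed weight scheme)."""
    total = k * (k + 1) / 2
    return [(k - i) / total for i in range(k)]


def owa(values, weights, largest_first=True):
    """Ordered weighted average: sort the values, then take a weighted sum."""
    ordered = sorted(values, reverse=largest_first)
    return sum(w * v for w, v in zip(weights, ordered))


def similarity(x, y):
    """Assumed similarity relation for features scaled to [0, 1]."""
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def frnn_owa_scores(X, labels, query, k=3):
    """Score each class by the mean of its upper and lower approximation memberships."""
    scores = {}
    for c in sorted(set(labels)):
        # k most similar training instances inside and outside class c (exact search).
        sims_in = sorted(
            (similarity(x, query) for x, l in zip(X, labels) if l == c), reverse=True
        )[:k]
        sims_out = sorted(
            (similarity(x, query) for x, l in zip(X, labels) if l != c), reverse=True
        )[:k]
        # Upper approximation: soft maximum of similarity to members of c.
        upper = owa(sims_in, linear_weights(len(sims_in)))
        # Lower approximation: soft minimum of dissimilarity to non-members of c.
        lower = owa(
            [1.0 - s for s in sims_out],
            linear_weights(len(sims_out)),
            largest_first=False,
        )
        scores[c] = (upper + lower) / 2
    return scores


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [1.0, 1.0]]
    labels = ["a", "a", "b", "b"]
    scores = frnn_owa_scores(X, labels, [0.05, 0.05], k=2)
    print(max(scores, key=scores.get), scores)
```

Because the OWA weights spread each membership over several neighbours rather than relying on a single one, a missed or swapped neighbour changes the score only slightly, which is one way to read the robustness claim above.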
Fuzzy-rough-learn 0.1: a Python library for machine learning with fuzzy rough sets
We present fuzzy-rough-learn, the first Python library of fuzzy rough set machine learning algorithms. It contains three algorithms previously implemented in R and Java, as well as two new algorithms from the recent literature. We briefly discuss the use cases of fuzzy-rough-learn and the design philosophy guiding its development, before providing an overview of the included algorithms and their parameters.
Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
The recent tremendous success of unsupervised word embeddings in a multitude
of applications raises the obvious question if similar methods could be derived
to improve embeddings (i.e. semantic representations) of word sequences as
well. We present a simple but efficient unsupervised objective to train
distributed representations of sentences. Our method outperforms the
state-of-the-art unsupervised models on most benchmark tasks, highlighting the
robustness of the produced general-purpose sentence embeddings. Comment: NAACL 201
An OWA-operator-based combination for multi-label genre classification of web pages
This paper presents a new method for genre identification that combines homogeneous classifiers using OWA (Ordered Weighted Averaging) operators. Our method uses character n-grams extracted from different information sources such as URL, title, headings and anchors. To deal with the complexity of web pages, we applied MLKNN as a multi-label classifier, in which a web page can be assigned more than one genre. Experiments conducted using a known multi-label corpus show that our method achieves good results.
BENCHMARKING CLASSIFIERS - HOW WELL DOES A GOWA-VARIANT OF THE SIMILARITY CLASSIFIER DO IN COMPARISON WITH SELECTED CLASSIFIERS?
Digital data is ubiquitous in nearly all modern businesses. Organizations have more data available, in various formats, than ever before. Machine learning algorithms and predictive analytics utilize the knowledge contained in that data to support business-related decision-making. This study explores predictive analytics by comparing different classification methods, the main interest being in the Generalized Ordered Weighted Average (GOWA)-variant of the similarity classifier.
The aim of this research is to establish what the GOWA-variant of the similarity classifier is and how well it performs compared to other selected classifiers. The study also investigates whether the GOWA-variant of the similarity classifier is a suitable method for business-related decision-making. Four classical classifiers were selected as reference classifiers on the basis of their common usage in machine learning research and their availability in the Statistics and Machine Learning Toolbox in MATLAB.
Three data sets from the UCI Machine Learning Repository were used for benchmarking the classifiers. The benchmarking process uses a fitness function instead of pure classification accuracy to determine the performance of the classifiers. The fitness function combines several measurement criteria into a single value. With one data set, the GOWA-variant of the similarity classifier performed the best. One of the data sets contains credit card client data. It was more complex than the other two data sets and contains clearly business-related data. The GOWA-variant also performed well with this data set. Therefore it can be claimed that the GOWA-variant of the similarity classifier is also a viable option for solving business-related problems.
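To make the GOWA operator mentioned in this abstract concrete, here is a minimal sketch under stated assumptions. The operator follows the standard definition (sort the arguments in descending order, take a weighted power mean with exponent lam). The surrounding classifier is hypothetical: comparing a sample to one ideal vector per class, the feature-wise similarity `1 - |a - b|`, and the names `gowa` and `classify` are illustrative choices, not details taken from the study. The study itself used MATLAB; Python is used here only for illustration.

```python
def gowa(values, weights, lam=1.0):
    """Generalized OWA: weighted power mean of the values sorted in descending order.

    Assumes non-negative values, weights summing to 1, and lam != 0.
    With lam = 1 this reduces to the ordinary OWA operator.
    """
    ordered = sorted(values, reverse=True)
    return sum(w * b ** lam for w, b in zip(weights, ordered)) ** (1.0 / lam)


def classify(sample, ideal_vectors, weights, lam=1.0):
    """Hypothetical similarity classifier: pick the class whose ideal vector
    has the highest GOWA-aggregated feature-wise similarity to the sample."""
    best_label, best_score = None, float("-inf")
    for label, ideal in ideal_vectors.items():
        # Assumed feature-wise similarity for features scaled to [0, 1].
        sims = [1.0 - abs(a - b) for a, b in zip(sample, ideal)]
        score = gowa(sims, weights, lam)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    ideals = {"low": [0.1, 0.2], "high": [0.8, 0.9]}
    print(classify([0.75, 0.85], ideals, weights=[0.5, 0.5], lam=2.0))
```

The weights and the exponent lam are the tunable parts: weights concentrated on the first position emphasise the best-matching features, while uniform weights with lam = 1 give a plain average.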
Learning ordered pooling weights in image classification
Spatial pooling is an important step in computer vision systems like
Convolutional Neural Networks or the Bag-of-Words method. The purpose of spatial
pooling is to combine neighbouring descriptors to obtain a single descriptor
for a given region (local or global). The resultant combined vector must be as
discriminant as possible, in other words, must contain relevant information,
while removing irrelevant and confusing details. Maximum and average are the
most common aggregation functions used in the pooling step. To improve the
aggregation of relevant information without degrading their discriminative
power for image classification, we introduce a simple but effective scheme
based on Ordered Weighted Average (OWA) aggregation operators. We present a
method to learn the weights of the OWA aggregation operator in a Bag-of-Words
framework and in Convolutional Neural Networks, and provide an extensive
evaluation showing that OWA-based pooling outperforms classical aggregation
operators.
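The relationship between OWA pooling and the classical aggregation functions named in this abstract can be sketched as follows. This is an illustrative example only: the function name `owa_pool` and the fixed weight vectors are assumptions, and the weights are set by hand here, whereas the paper learns them.

```python
def owa_pool(descriptors, weights):
    """Pool a region of descriptors into one vector, dimension by dimension.

    For each dimension, the activations are sorted in descending order and
    combined with the OWA weights (one weight per rank position).
    """
    dims = len(descriptors[0])
    pooled = []
    for d in range(dims):
        column = sorted((desc[d] for desc in descriptors), reverse=True)
        pooled.append(sum(w * v for w, v in zip(weights, column)))
    return pooled


if __name__ == "__main__":
    # Three 2-dimensional descriptors from one pooling region.
    region = [[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]]
    print(owa_pool(region, [1.0, 0.0, 0.0]))        # behaves like max pooling
    print(owa_pool(region, [1 / 3, 1 / 3, 1 / 3]))  # behaves like average pooling
    print(owa_pool(region, [0.5, 0.3, 0.2]))        # an intermediate weighting
```

Maximum and average pooling are the two special cases of the weight vector, so a learned weight vector can settle anywhere between them.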