Search CORE

54,981 research outputs found

k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

Author: Cunningham Padraig
Delany Sarah Jane
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/04/2020
Field of study

Perhaps the most straightforward classifier in the arsenal or machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance is not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on; mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN

arXiv.org e-Print Archive

Arrow@TUDublin

An Intelligent System For Arabic Text Categorization

Author: Fayed Z.T.
Habib M.B.
Syiam M.M.
Publication venue: Faculty of Computers and Information Sciences, Ain Shams University
Publication date: 01/01/2006
Field of study

Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. In this paper, an intelligent Arabic text categorization system is presented. Machine learning algorithms are used in this system. Many algorithms for stemming and feature selection are tried. Moreover, the document is represented using several term weighting schemes and finally the k-nearest neighbor and Rocchio classifiers are used for classification process. Experiments are performed over self collected data corpus and the results show that the suggested hybrid method of statistical and light stemmers is the most suitable stemming algorithm for Arabic language. The results also show that a hybrid approach of document frequency and information gain is the preferable feature selection criterion and normalized-tfidf is the best weighting scheme. Finally, Rocchio classifier has the advantage over k-nearest neighbor classifier in the classification process. The experimental results illustrate that the proposed model is an efficient method and gives generalization accuracy of about 98%

Maastricht University Research Portal

University of Twente Research Information

An agent-driven semantical identifier using radial basis neural networks and reinforcement learning

Author: Napoli Christian
Pappalardo Giuseppe
Tramontana Emiliano
Publication venue
Publication date: 01/01/2014
Field of study

Due to the huge availability of documents in digital form, and the deception possibility raise bound to the essence of digital documents and the way they are spread, the authorship attribution problem has constantly increased its relevance. Nowadays, authorship attribution,for both information retrieval and analysis, has gained great importance in the context of security, trust and copyright preservation. This work proposes an innovative multi-agent driven machine learning technique that has been developed for authorship attribution. By means of a preprocessing for word-grouping and time-period related analysis of the common lexicon, we determine a bias reference level for the recurrence frequency of the words within analysed texts, and then train a Radial Basis Neural Networks (RBPNN)-based classifier to identify the correct author. The main advantage of the proposed approach lies in the generality of the semantic analysis, which can be applied to different contexts and lexical domains, without requiring any modification. Moreover, the proposed system is able to incorporate an external input, meant to tune the classifier, and then self-adjust by means of continuous learning reinforcement.Comment: Published on: Proceedings of the XV Workshop "Dagli Oggetti agli Agenti" (WOA 2014), Catania, Italy, Sepember. 25-26, 201

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

A Novel Feature Selection and Extraction Technique for Classification

Author: Bakshi Ainesh
Goel Kratarth
Vohra Raunaq
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/12/2014
Field of study

This paper presents a versatile technique for the purpose of feature selection and extraction - Class Dependent Features (CDFs). We use CDFs to improve the accuracy of classification and at the same time control computational expense by tackling the curse of dimensionality. In order to demonstrate the generality of this technique, it is applied to handwritten digit recognition and text categorization.Comment: 2 pages, 2 tables, published at IEEE SMC 201

arXiv.org e-Print Archive

Crossref

Similarity Learning for High-Dimensional Sparse Data

Author: Bellet Aurélien
Liu Kuan
Sha Fei
Publication venue
Publication date: 09/09/2019
Field of study

A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well respect to the dimensionality of the data. In this paper, we propose a method that can learn efficiently similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys very appealing convergence guarantees and its time and memory complexity depends on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.Comment: 14 pages. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015). Matlab code: https://github.com/bellet/HDS

arXiv.org e-Print Archive

CiteSeerX

Cross validation of bi-modal health-related stress assessment

Author: A Marty
A Tawari
B Arnrich
B Kedem
B Schuller
B Schölkopf
D Morrison
D Ververidis
DA Craig
DF Tolin
DM Hilty
DR Ladd
DW Aha
EB Baum
Egon L. van den Broek
EL Broek van den
EL Broek van den
EN Khalil
F Pallavicini
Frans van der Sluis
IR Murray
J Blascovich
J Krumm
J Sánchez-Meca
J Wolpe
JA Healey
K Domschke
K Nieuwenhuijsen
KR Scherer
LK Hansen
LM Blainlow
M El Ayadi
M Hall
MD Zwaag van der
MG Newman
N Rüscha
N Rüscha
P Rani
PL Bartlett
R Banse
R Cowie
R Likert
RB Fillingim
RC Kessler
RG Lyons
RW Picard
S Wu
T Shimamura
TM Cover
Ton Dijkstra
TR Kosten
Publication venue: Springer Verlag
Publication date: 01/01/2011
Field of study

This study explores the feasibility of objective and ubiquitous stress assessment. 25 post-traumatic stress disorder patients participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were meant to represent an early and a late therapy session, and each consisted of a "happy" and a "stress triggering" part. Two instruments were chosen to assess the stress level of the patients at various point in time during therapy: (i) speech, used as an objective and ubiquitous stress indicator and (ii) the subjective unit of distress (SUD), a clinically validated Likert scale. In total, 13 statistical parameters were derived from each of five speech features: amplitude, zero-crossings, power, high-frequency power, and pitch. To model the emotional state of the patients, 28 parameters were selected from this set by means of a linear regression model and, subsequently, compressed into 11 principal components. The SUD and speech model were cross-validated, using 3 machine learning algorithms. Between 90% (2 SUD levels) and 39% (10 SUD levels) correct classification was achieved. The two sessions could be discriminated in 89% (for ST) and 77% (for RL) of the cases. This report fills a gap between laboratory and clinical studies, and its results emphasize the usefulness of Computer Aided Diagnostics (CAD) for mental health care

Crossref

Springer - Publisher Connector

Copenhagen University Research Information System

Radboud Repository

University of Twente Research Information

Recommended from our members

Mining learning preferences in web-based instruction: Holists vs. Serialists

Author: Chen SY
Clewley N
Liu X
Publication venue: International Forum of Educational Technology & Society
Publication date: 01/01/2011
Field of study

Web-based instruction programs are used by learners with diverse knowledge, skills and needs. These differences determine their preferences for the design of Web-based instruction programs and ultimately influence learners' success in using them. Cognitive style has been found to significantly affect learners' preferences of web-based instruction programs. However, the majority of previous studies focus on Field Dependence/Independence. Pask's Holist/Serialist dimension has conceptual links with Field Dependence/Independence but it is left mostly unstudied. Therefore, this study focuses on identifying how this dimension of cognitive style affects learner preferences of Web-based instruction programs. A data mining approach is used to illustrate the difference in preferences between Holists and Serialists. The findings show that there are clear differences in regard to content presentation and navigation support. A set of design features were then produced to help designers incorporate cognitive styles into the development of Web-based instruction programs to ensure that they can accommodate learners' different preferences.This work is partially funded by National Science Council, Taiwan, ROC (NSC 98-2511-S-008-012- MY3; NSC 99- 2511-S-008 -003 -MY2; NSC 99-2631-S-008-001)

Brunel University Research Archive

Neural network setups for a precise detection of the many-body localization transition: finite-size scaling and limitations

Author: Alet Fabien
Théveniaut Hugo
Publication venue: 'American Physical Society (APS)'
Publication date: 02/12/2019
Field of study

Determining phase diagrams and phase transitions semi-automatically using machine learning has received a lot of attention recently, with results in good agreement with more conventional approaches in most cases. When it comes to more quantitative predictions, such as the identification of universality class or precise determination of critical points, the task is more challenging. As an exacting test-bed, we study the Heisenberg spin-1/2 chain in a random external field that is known to display a transition from a many-body localized to a thermalizing regime, which nature is not entirely characterized. We introduce different neural network structures and dataset setups to achieve a finite-size scaling analysis with the least possible physical bias (no assumed knowledge on the phase transition and directly inputing wave-function coefficients), using state-of-the-art input data simulating chains of sizes up to L=24. In particular, we use domain adversarial techniques to ensure that the network learns scale-invariant features. We find a variability of the output results with respect to network and training parameters, resulting in relatively large uncertainties on final estimates of critical point and correlation length exponent which tend to be larger than the values obtained from conventional approaches. We put the emphasis on interpretability throughout the paper and discuss what the network appears to learn for the various used architectures. Our findings show that a it quantitative analysis of phase transitions of unknown nature remains a difficult task with neural networks when using the minimally engineered physical input.Comment: v2: published versio

arXiv.org e-Print Archive

HAL-INSA Toulouse