
1,643 research outputs found

Scaling KNN Computation over Large Graphs on a PC

Author: Chiluka Nitin
Kermarrec Anne-Marie
Olivares Javier
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

This paper proposes a novel approach to computing K-Nearest Neighbors (KNN) over a large set of users by leveraging disk and memory efficiently on a commodity PC. The system is designed to minimize random accesses to disk, as well as the amount of data loaded from and unloaded to disk, so as to better utilize the available computational power and thus improve algorithmic efficiency.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1
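
As a rough illustration of the idea in the abstract above (keep only part of the user set in memory and scan the rest in order), here is a minimal sketch of a block-wise KNN graph computation. It is not the authors' system: the Jaccard similarity, the `block_size` parameter, and the names `jaccard` and `knn_graph` are assumptions made for this example, and the blocks are ordinary in-memory lists standing in for on-disk partitions.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity between two item sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_graph(profiles, k, block_size=2):
    """Return {user: [k most similar users]} by comparing users block by block.

    In an out-of-core setting each block would be a partition read
    sequentially from disk; here the blocks are plain lists in memory.
    """
    users = sorted(profiles)
    blocks = [users[i:i + block_size] for i in range(0, len(users), block_size)]
    candidates = {u: [] for u in users}
    for left in blocks:          # partition held in memory
        for right in blocks:     # partition scanned in order
            for u in left:
                for v in right:
                    if u != v:
                        sim = jaccard(profiles[u], profiles[v])
                        candidates[u].append((sim, v))
    # Keep the k highest-similarity candidates for every user.
    return {u: [v for _, v in heapq.nlargest(k, candidates[u])] for u in users}


if __name__ == "__main__":
    demo = {"a": {1, 2, 3}, "b": {1, 2, 3, 4}, "c": {1, 9}, "d": {7, 8}}
    print(knn_graph(demo, k=2))
```

The nested loop over blocks means every pair of partitions is visited once in a fixed order, which is the property that makes sequential disk access possible in principle; a real implementation would also bound the size of each candidate list.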

One-class classifiers based on entropic spanning graphs

Author: Alippi Cesare
Livi Lorenzo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/08/2016

One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach also allows non-numeric data to be processed by means of an embedding procedure. The spanning graph is learned on the embedded input data and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α-Jensen difference. Once training is completed, in order to associate a confidence level with the classifier decision, a graph-based fuzzy model is constructed. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Open Research Exeter

Self-similarity, small-world, scale-free scaling, disassortativity, and robustness in hierarchical lattices

Author: Z.-Z. Zhang
S.-G. Zhou
T. Zou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/12/2006

In this paper, we first study analytically the topological features of a family of hierarchical lattices (HLs) from the viewpoint of complex networks. We derive some basic properties of HLs controlled by a parameter q. Our results show that scale-free networks are not always small-world, and support the conjecture that self-similar scale-free networks are not assortative. Secondly, we define a deterministic family of graphs called small-world hierarchical lattices (SWHLs). Our construction preserves the structure of hierarchical lattices, while the small-world phenomenon arises. Finally, the dynamical processes of intentional attacks and collective synchronization are studied, and comparisons between HLs and Barabási-Albert (BA) networks as well as SWHLs are shown. We show that the degree distribution of scale-free networks does not suffice to characterize their synchronizability, and that networks with smaller average path length are not always easier to synchronize. Comment: 26 pages, 8 figures

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Low-shot learning with large-scale diffusion

Author: Douze Matthijs
Hariharan Bharath
Jégou Hervé
Szlam Arthur
Publication date: 15/06/2018

This paper considers the problem of inferring image labels when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is made possible by recent advances in large-scale similarity graph construction. We show that, despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.

arXiv.org e-Print Archive

Crossref
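
To make the label propagation mentioned in the abstract above concrete, the following is a minimal sketch on a tiny weighted similarity graph. It is a generic clamped-propagation variant written for illustration, not the diffusion procedure evaluated in the paper; the function name `propagate_labels`, the adjacency format, and the fixed iteration count are all assumptions.

```python
def propagate_labels(neighbors, seeds, num_classes, iterations=20):
    """Spread class scores from a few labeled nodes over a similarity graph.

    neighbors: {node: [(neighbor, weight), ...]}, every neighbor is also a key.
    seeds:     {node: class index} for the few annotated examples.
    Returns    {node: predicted class index}.
    """
    scores = {n: [0.0] * num_classes for n in neighbors}
    for node, cls in seeds.items():
        scores[node][cls] = 1.0

    for _ in range(iterations):
        updated = {}
        for node, edges in neighbors.items():
            total = sum(w for _, w in edges)
            if node in seeds or total == 0:
                # Labeled nodes stay clamped; isolated nodes keep their scores.
                updated[node] = scores[node]
                continue
            # Weighted average of the neighbors' current class scores.
            updated[node] = [
                sum(w * scores[m][c] for m, w in edges) / total
                for c in range(num_classes)
            ]
        scores = updated

    return {n: max(range(num_classes), key=lambda c: scores[n][c])
            for n in neighbors}


if __name__ == "__main__":
    # Chain 0 - 1 - 2 - 3 with a weak link in the middle; ends are labeled.
    graph = {
        0: [(1, 1.0)],
        1: [(0, 1.0), (2, 0.1)],
        2: [(1, 0.1), (3, 1.0)],
        3: [(2, 1.0)],
    }
    print(propagate_labels(graph, seeds={0: 0, 3: 1}, num_classes=2))
```

At the scale described in the abstract, the graph would be a sparse k-nearest-neighbor graph and the update would be a sparse matrix product rather than Python loops.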

A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Publication venue: IEEE Computer Society
Publication date: 01/01/2011

Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Efficient classification using parallel and scalable compressed model and its application on intrusion detection

Author: Chen Tieming
Jin Shichao
Kim Okhee
Zhang Xu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

To achieve highly efficient classification in intrusion detection, this paper proposes a compressed model that combines horizontal compression with vertical compression. OneR is used as horizontal compression for attribute reduction, and affinity propagation is employed as vertical compression to select a small set of representative exemplars from large training data. To compress larger volumes of training data scalably, a MapReduce-based parallelization approach is then implemented and evaluated for each step of the model compression process, after which common but efficient classification methods can be used directly. An experimental study on two publicly available intrusion detection datasets, KDD99 and CMDC2012, demonstrates that classification using the proposed compressed model can speed up the detection procedure by up to 184 times, at the cost of an accuracy difference of less than 1% on average.

arXiv.org e-Print Archive
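
The abstract above names OneR as the attribute-reduction step. The sketch below shows the textbook OneR rule learner on categorical data: for each attribute it maps every value to its majority class and keeps the attribute whose rule makes the fewest training errors. How the paper turns this into attribute reduction and parallelizes it with MapReduce is not reproduced here; the function name `one_r` and the toy records are assumptions for illustration.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (best attribute index, {value: predicted class}, training errors)."""
    best = None
    for attr in range(len(rows[0])):
        # Count class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # One rule per attribute: each value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != label
                     for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


if __name__ == "__main__":
    # Hypothetical connection records: (protocol, service).
    rows = [("tcp", "http"), ("tcp", "ftp"), ("udp", "http"), ("udp", "ftp")]
    labels = ["normal", "attack", "normal", "attack"]
    print(one_r(rows, labels))
```

Ranking attributes by the error of their single-attribute rules is one plausible way to use OneR for attribute reduction: attributes with high error would be dropped first.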

The Out-of-core KNN Awakens: The light side of computation force on large datasets

Author: Chiluka Nitin
Kermarrec Anne-Marie
Olivares Javier
Publication venue: HAL CCSD
Publication date: 18/05/2016

K-Nearest Neighbors (KNN) is a crucial tool for many applications, e.g. recommender systems, image classification and web-related applications. However, KNN is a resource-greedy operation, particularly for large datasets. We focus on the challenge of KNN computation over large datasets on a single commodity PC with limited memory. We propose a novel approach to compute KNN on large datasets by leveraging both disk and main memory efficiently. The main rationale of our approach is to minimize random accesses to disk, maximize sequential accesses to data, and make efficient use of only the available memory. We evaluate our approach on large datasets in terms of performance and memory consumption. The evaluation shows that our approach requires only 7% of the time needed by an in-memory baseline to compute a KNN graph.

HAL-CentraleSupelec

HAL-Inserm

INRIA a CCSD electronic archive server

HAL-Rennes 1