Blessing of dimensionality at the edge
In this paper we present theory and algorithms enabling classes of Artificial
Intelligence (AI) systems to continuously and incrementally improve with
a priori quantifiable guarantees (more specifically, to remove classification
errors) over time. This is distinct from state-of-the-art machine learning,
AI, and software approaches. Another feature of this approach is that, in the
supervised setting, the computational complexity of training is linear in the
number of training samples. At the time of classification, the computational
complexity is bounded by a few inner product calculations. Moreover, the
implementation is shown to be very scalable. This makes it viable for
deployment in applications where computational power and memory are limited,
such as embedded environments. It enables the possibility of fast on-line
optimisation using improved training samples. The approach is based on
concentration of measure effects and stochastic separation theorems, and is
illustrated with an example on the identification of faulty processes in Computer
Numerical Control (CNC) milling and with a case study on the adaptive removal of
false positives in an industrial video surveillance and analytics system.
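A minimal sketch of the kind of one-shot error correction that stochastic separation theorems make cheap (all data and names below are illustrative, not the paper's implementation): in high dimension, a single reported error can, with high probability, be cut away from the existing data by one linear functional, so applying the corrector at classification time costs a single inner product.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 200, 1000                       # high dimension, many stored samples
X = rng.standard_normal((n, d))        # existing, correctly handled samples
x_err = rng.standard_normal(d)         # a newly reported classification error

# Centre the data so inner products are comparable.
mu = X.mean(axis=0)
Xc = X - mu
xc = x_err - mu

# Stochastic separation: in high dimension a generic point is, with high
# probability, separable from the rest by the hyperplane w.(z - mu) >= theta,
# with w = xc / ||xc||^2 and theta slightly below 1.
w = xc / np.dot(xc, xc)
theta = 0.9

corrector_fires_on_error = np.dot(w, xc) >= theta   # equals 1 >= theta: True
false_fires = np.mean(Xc @ w >= theta)              # fraction of old data caught

print(corrector_fires_on_error, false_fires)        # e.g. True, ~0.0
```

Training the corrector is a single pass over the data (linear in the number of samples), and deploying it adds one inner product and a threshold test per classification, which is what makes the scheme attractive for embedded environments.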
How many independent bets are there?
The benefits of portfolio diversification are a central tenet implicit in
modern financial theory and practice. Linked to diversification is the notion
of breadth. Breadth is correctly thought of as the number of independent bets
available to an investor. Conventionally, applications using breadth
frequently assume only the number of separate bets. There may be a large
discrepancy between these two interpretations. We utilize a simple
singular-value decomposition (SVD) and the Kaiser-Guttman stopping criterion to
select the integer-valued effective dimensionality of the correlation matrix of
returns. In an emerging market such as South Africa we document an estimated
breadth that is considerably lower than anticipated. This lack of
diversification may be because of market concentration, exposure to the global
commodity cycle and local currency volatility. We discuss some practical
extensions to a more statistically correct interpretation of market breadth,
and its theoretical implications for both global and domestic investors.
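A hedged sketch of the breadth estimate described above, assuming the Kaiser-Guttman rule "keep eigenvalues of the correlation matrix greater than 1"; the returns below are a synthetic placeholder with three common factors, not the South African sample:

```python
import numpy as np

def effective_breadth(returns: np.ndarray) -> int:
    """Integer effective dimensionality of a T x N matrix of asset returns:
    count eigenvalues of the correlation matrix exceeding 1 (Kaiser-Guttman).
    For a symmetric PSD correlation matrix the eigenvalues coincide with the
    singular values, so this matches the SVD-based description."""
    corr = np.corrcoef(returns, rowvar=False)   # N x N correlation matrix
    eigvals = np.linalg.eigvalsh(corr)
    return int(np.sum(eigvals > 1.0))

# Hypothetical example: 40 assets whose returns load on 3 common factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 3))
loadings = rng.standard_normal((3, 40))
returns = factors @ loadings + 0.5 * rng.standard_normal((500, 40))

print(effective_breadth(returns))   # close to 3, far below the 40 separate bets
```

The gap between the 40 nominal bets and the roughly 3 effective ones is exactly the discrepancy between "separate" and "independent" bets that the abstract highlights.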
The Blessing of Dimensionality: feature selection outperforms functional connectivity-based feature transformation to classify ADHD subjects from EEG patterns of phase synchronisation
Functional connectivity (FC) characterizes brain activity from a multivariate set of N brain signals by means of an N×N matrix A, whose elements estimate the dependence within each possible pair of signals. Such a matrix can be used as a feature vector for (un)supervised subject classification. Yet if N is large, A is high-dimensional. Little is known about the effect that different strategies to reduce its dimensionality may have on its classification ability. Here, we apply different machine learning algorithms to classify 33 children (aged 6-14 years) into two groups (healthy controls and Attention Deficit Hyperactivity Disorder patients) using EEG FC patterns obtained from two phase synchronisation indices. We found that the classification is highly successful (around 95%) if the whole matrix A is taken into account and the relevant features are selected using machine learning methods. However, if FC algorithms are applied instead to transform A into a lower-dimensionality matrix, the classification rate drops to less than 80%. We conclude that, for the purpose of pattern classification, the relevant features should be selected among the elements of A by using appropriate machine learning algorithms.
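The winning pipeline can be sketched as follows; the data are random placeholders (the study uses EEG phase synchronisation matrices for 33 children), and SelectKBest plus logistic regression stand in for whichever feature-selection and classification methods the authors actually used:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in data: one N x N phase-synchronisation matrix A per subject.
rng = np.random.default_rng(2)
n_subjects, N = 33, 20
A = rng.uniform(0, 1, size=(n_subjects, N, N))
A = (A + A.transpose(0, 2, 1)) / 2          # FC matrices are symmetric
y = rng.integers(0, 2, size=n_subjects)     # 0 = control, 1 = ADHD

# Use the upper triangle of each matrix as the feature vector (N(N-1)/2 entries).
iu = np.triu_indices(N, k=1)
X = A[:, iu[0], iu[1]]

# Select the most discriminative matrix entries, then classify.
clf = make_pipeline(SelectKBest(f_classif, k=30),
                    LogisticRegression(max_iter=1000))
print(cross_val_score(clf, X, y, cv=5).mean())   # ~0.5 on this random data
```

The contrast drawn in the abstract is between selecting entries of A directly, as above, and first transforming A into a lower-dimensional summary and classifying that.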
High-Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality
High-dimensional data and high-dimensional representations of reality are
inherent features of modern Artificial Intelligence systems and applications of
machine learning. The well-known phenomenon of the "curse of dimensionality"
states: many problems become exponentially difficult in high dimensions.
Recently, the other side of the coin, the "blessing of dimensionality", has
attracted much attention. It turns out that generic high-dimensional datasets
exhibit fairly simple geometric properties. Thus, there is a fundamental
tradeoff between complexity and simplicity in high dimensional spaces. Here we
present a brief explanatory review of recent ideas, results and hypotheses
about the blessing of dimensionality and related simplifying effects relevant
to machine learning and neuroscience.
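One of the simplifying geometric properties alluded to above is easy to see numerically: random points on a high-dimensional unit sphere are almost orthogonal to one another, with pairwise cosines concentrating near zero as the dimension grows. A small self-contained illustration (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

# Pairwise cosines between random unit vectors concentrate around 0;
# their typical magnitude shrinks roughly like 1/sqrt(d).
for d in (2, 10, 100, 10_000):
    X = rng.standard_normal((500, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # project onto unit sphere
    cosines = (X @ X.T)[np.triu_indices(500, k=1)]
    print(d, np.abs(cosines).mean())
```

This quasi-orthogonality is one reason generic high-dimensional datasets admit simple separating functionals, tying this review to the error-correction results listed above.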
The configuration multi-edge model: Assessing the effect of fixing node strengths on weighted network magnitudes
Complex networks grow subject to structural constraints which affect their
measurable properties. Assessing the effect that such constraints impose on
their observables is thus a crucial aspect to be taken into account in their
analysis. To this end, we examine the effect of fixing the strength sequence in
multi-edge networks on several network observables such as degrees, disparity,
average neighbor properties and weight distribution using an ensemble approach.
We provide a general method to calculate any desired weighted network metric
and we show that several features detected in real data could be explained
solely by structural constraints. We thus justify the need for analytical null
models to be used as a basis for assessing the relevance of features found in real
data represented in weighted network form.
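A rough sketch of one standard way to draw from such an ensemble, assuming the simplest stub-matching variant in which W indistinguishable edges are placed independently with probabilities proportional to s_i s_j; this illustrates the ensemble idea, not the paper's exact estimators:

```python
import numpy as np

def sample_multiedge(strengths, rng) -> np.ndarray:
    """One draw from a configuration multi-edge ensemble fixing the expected
    strength sequence: each of the W = sum(s)/2 edges independently joins the
    ordered pair (i, j) with probability s_i * s_j / (sum s)^2."""
    s = np.asarray(strengths, dtype=float)
    T = s.sum()
    W = int(round(T / 2))                 # total number of multi-edges
    p = np.outer(s, s) / T**2             # stub-matching pair probabilities
    p /= p.sum()                          # guard against float round-off
    counts = rng.multinomial(W, p.ravel()).reshape(p.shape)
    return counts + counts.T              # symmetrised occupation numbers

rng = np.random.default_rng(4)
weights = sample_multiedge([40, 20, 20, 10, 10], rng)
print(weights.sum(axis=1))   # realised strengths fluctuate around [40, 20, 20, 10, 10]
```

Averaging any observable (degrees, disparity, weight distribution) over many such draws gives the null-model baseline against which features of the real weighted network can be judged.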
The Bane of Low-Dimensionality Clustering
In this paper, we give a conditional lower bound of $2^{\Omega(n^{1-1/d})}$ on the
running time for the classic k-median and k-means clustering objectives (where
n is the size of the input), even in low-dimensional Euclidean space of
dimension four, assuming the Exponential Time Hypothesis (ETH). We also
consider k-median (and k-means) with penalties, where each point need not be
assigned to a center, in which case it must pay a penalty, and extend our lower
bound to at least three-dimensional Euclidean space.
This stands in stark contrast to many other geometric problems such as the
traveling salesman problem, or computing an independent set of unit spheres.
While these problems benefit from the so-called (limited) blessing of
dimensionality, as they can be solved in time $n^{O(n^{1-1/d})}$ or
$2^{O(n^{1-1/d})}$ in d dimensions, our work shows that widely-used clustering
objectives have a lower bound of $2^{\Omega(n^{1-1/d})}$, even in dimension four.
We complete the picture by considering the two-dimensional case: we show that
there is no algorithm that solves the penalized version in time less than
$2^{o(\sqrt{n})}$, and provide a matching upper bound of $2^{O(\sqrt{n}\log n)}$.
The main tool we use to establish these lower bounds is the placement of
points on the moment curve, which takes its inspiration from constructions of
point sets yielding Delaunay complexes of high complexity.
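For concreteness, points on the moment curve in four dimensions are simply $(t, t^2, t^3, t^4)$ for distinct parameters t; a toy generator is below (the hardness reduction itself involves much more than generating these points):

```python
import numpy as np

def moment_curve_points(n: int, d: int = 4) -> np.ndarray:
    """n points on the moment curve t -> (t, t^2, ..., t^d) in R^d, the
    construction known to yield Delaunay complexes of high complexity."""
    t = np.arange(1, n + 1, dtype=float)
    return np.vander(t, N=d + 1, increasing=True)[:, 1:]   # drop the t^0 column

print(moment_curve_points(5))
```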