Blessing of dimensionality at the edge
In this paper we present theory and algorithms enabling classes of Artificial
Intelligence (AI) systems to continuously and incrementally improve with
a priori quantifiable guarantees (more specifically, to remove classification
errors) over time. This is distinct from state-of-the-art machine learning,
AI, and software approaches. Another feature of this approach is that, in the
supervised setting, the computational complexity of training is linear in the
number of training samples. At the time of classification, the computational
complexity is bounded by a few inner product calculations. Moreover, the
implementation is shown to be very scalable. This makes it viable for
deployment in applications where computational power and memory are limited,
such as embedded environments. It enables the possibility of fast on-line
optimisation using improved training samples. The approach is based on
concentration of measure effects and stochastic separation theorems, and is
illustrated with an example on the identification of faulty processes in Computer
Numerical Control (CNC) milling and with a case study on the adaptive removal of
false positives in an industrial video surveillance and analytics system.
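A minimal sketch of the kind of one-shot error correction that stochastic separation theorems make cheap (all data and names below are illustrative, not the paper's implementation): in high dimension, a single reported error can, with high probability, be cut away from the existing data by one linear functional, so applying the corrector at classification time costs a single inner product.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 200, 1000                       # high dimension, many stored samples
X = rng.standard_normal((n, d))        # existing, correctly handled samples
x_err = rng.standard_normal(d)         # a newly reported classification error

# Centre the data so inner products are comparable.
mu = X.mean(axis=0)
Xc = X - mu
xc = x_err - mu

# Stochastic separation: in high dimension a generic point is, with high
# probability, separable from the rest by the hyperplane w.(z - mu) >= theta,
# with w = xc / ||xc||^2 and theta slightly below 1.
w = xc / np.dot(xc, xc)
theta = 0.9

corrector_fires_on_error = np.dot(w, xc) >= theta   # equals 1 >= theta: True
false_fires = np.mean(Xc @ w >= theta)              # fraction of old data caught

print(corrector_fires_on_error, false_fires)        # e.g. True, ~0.0
```

Training the corrector is a single pass over the data (linear in the number of samples), and deploying it adds one inner product and a threshold test per classification, which is what makes the scheme attractive for embedded environments.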
How many independent bets are there?
The benefits of portfolio diversification are a central tenet implicit in
modern financial theory and practice. Linked to diversification is the notion
of breadth. Breadth is correctly thought of as the number of independent bets
available to an investor. Conventionally, applications using breadth
frequently assume only the number of separate bets. There may be a large
discrepancy between these two interpretations. We utilize a simple
singular-value decomposition (SVD) and the Kaiser-Guttman stopping criterion to
select the integer-valued effective dimensionality of the correlation matrix of
returns. In an emerging market such as South Africa we document an estimated
breadth that is considerably lower than anticipated. This lack of
diversification may be because of market concentration, exposure to the global
commodity cycle and local currency volatility. We discuss some practical
extensions to a more statistically correct interpretation of market breadth,
and its theoretical implications for both global and domestic investors.
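A hedged sketch of the breadth estimate described above, assuming the Kaiser-Guttman rule "keep eigenvalues of the correlation matrix greater than 1"; the returns below are a synthetic placeholder with three common factors, not the South African sample:

```python
import numpy as np

def effective_breadth(returns: np.ndarray) -> int:
    """Integer effective dimensionality of a T x N matrix of asset returns:
    count eigenvalues of the correlation matrix exceeding 1 (Kaiser-Guttman).
    For a symmetric PSD correlation matrix the eigenvalues coincide with the
    singular values, so this matches the SVD-based description."""
    corr = np.corrcoef(returns, rowvar=False)   # N x N correlation matrix
    eigvals = np.linalg.eigvalsh(corr)
    return int(np.sum(eigvals > 1.0))

# Hypothetical example: 40 assets whose returns load on 3 common factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 3))
loadings = rng.standard_normal((3, 40))
returns = factors @ loadings + 0.5 * rng.standard_normal((500, 40))

print(effective_breadth(returns))   # close to 3, far below the 40 separate bets
```

The gap between the 40 nominal bets and the roughly 3 effective ones is exactly the discrepancy between "separate" and "independent" bets that the abstract highlights.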
The Blessing of Dimensionality: feature selection outperforms functional connectivity-based feature transformation to classify ADHD subjects from EEG patterns of phase synchronisation
Functional connectivity (FC) characterizes brain activity from a multivariate set of N brain signals by means of an N×N matrix A, whose elements estimate the dependence within each possible pair of signals. Such a matrix can be used as a feature vector for (un)supervised subject classification. Yet if N is large, A is high-dimensional. Little is known about the effect that different strategies to reduce its dimensionality may have on its classification ability. Here, we apply different machine learning algorithms to classify 33 children (aged 6-14 years) into two groups (healthy controls and Attention Deficit Hyperactivity Disorder patients) using EEG FC patterns obtained from two phase synchronisation indices. We found that the classification is highly successful (around 95%) if the whole matrix A is taken into account and the relevant features are selected using machine learning methods. However, if FC algorithms are applied instead to transform A into a lower-dimensionality matrix, the classification rate drops to less than 80%. We conclude that, for the purpose of pattern classification, the relevant features should be selected among the elements of A by using appropriate machine learning algorithms.
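The winning pipeline can be sketched as follows; the data are random placeholders (the study uses EEG phase synchronisation matrices for 33 children), and SelectKBest plus logistic regression stand in for whichever feature-selection and classification methods the authors actually used:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in data: one N x N phase-synchronisation matrix A per subject.
rng = np.random.default_rng(2)
n_subjects, N = 33, 20
A = rng.uniform(0, 1, size=(n_subjects, N, N))
A = (A + A.transpose(0, 2, 1)) / 2          # FC matrices are symmetric
y = rng.integers(0, 2, size=n_subjects)     # 0 = control, 1 = ADHD

# Use the upper triangle of each matrix as the feature vector (N(N-1)/2 entries).
iu = np.triu_indices(N, k=1)
X = A[:, iu[0], iu[1]]

# Select the most discriminative matrix entries, then classify.
clf = make_pipeline(SelectKBest(f_classif, k=30),
                    LogisticRegression(max_iter=1000))
print(cross_val_score(clf, X, y, cv=5).mean())   # ~0.5 on this random data
```

The contrast drawn in the abstract is between selecting entries of A directly, as above, and first transforming A into a lower-dimensional summary and classifying that.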
High-Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality
High-dimensional data and high-dimensional representations of reality are
inherent features of modern Artificial Intelligence systems and applications of
machine learning. The well-known phenomenon of the "curse of dimensionality"
states: many problems become exponentially difficult in high dimensions.
Recently, the other side of the coin, the "blessing of dimensionality", has
attracted much attention. It turns out that generic high-dimensional datasets
exhibit fairly simple geometric properties. Thus, there is a fundamental
tradeoff between complexity and simplicity in high dimensional spaces. Here we
present a brief explanatory review of recent ideas, results and hypotheses
about the blessing of dimensionality and related simplifying effects relevant
to machine learning and neuroscience.
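One of the simplifying geometric properties alluded to above is easy to see numerically: random points on a high-dimensional unit sphere are almost orthogonal to one another, with pairwise cosines concentrating near zero as the dimension grows. A small self-contained illustration (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

# Pairwise cosines between random unit vectors concentrate around 0;
# their typical magnitude shrinks roughly like 1/sqrt(d).
for d in (2, 10, 100, 10_000):
    X = rng.standard_normal((500, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # project onto unit sphere
    cosines = (X @ X.T)[np.triu_indices(500, k=1)]
    print(d, np.abs(cosines).mean())
```

This quasi-orthogonality is one reason generic high-dimensional datasets admit simple separating functionals, tying this review to the error-correction results listed above.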
The configuration multi-edge model: Assessing the effect of fixing node strengths on weighted network magnitudes
Complex networks grow subject to structural constraints which affect their
measurable properties. Assessing the effect that such constraints impose on
their observables is thus a crucial aspect to be taken into account in their
analysis. To this end, we examine the effect of fixing the strength sequence in
multi-edge networks on several network observables such as degrees, disparity,
average neighbor properties and weight distribution using an ensemble approach.
We provide a general method to calculate any desired weighted network metric
and we show that several features detected in real data could be explained
solely by structural constraints. We thus justify the need for analytical null
models to be used as a basis for assessing the relevance of features found in real
data represented in weighted network form.
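A rough sketch of one standard way to draw from such an ensemble, assuming the simplest stub-matching variant in which W indistinguishable edges are placed independently with probabilities proportional to s_i s_j; this illustrates the ensemble idea, not the paper's exact estimators:

```python
import numpy as np

def sample_multiedge(strengths, rng) -> np.ndarray:
    """One draw from a configuration multi-edge ensemble fixing the expected
    strength sequence: each of the W = sum(s)/2 edges independently joins the
    ordered pair (i, j) with probability s_i * s_j / (sum s)^2."""
    s = np.asarray(strengths, dtype=float)
    T = s.sum()
    W = int(round(T / 2))                 # total number of multi-edges
    p = np.outer(s, s) / T**2             # stub-matching pair probabilities
    p /= p.sum()                          # guard against float round-off
    counts = rng.multinomial(W, p.ravel()).reshape(p.shape)
    return counts + counts.T              # symmetrised occupation numbers

rng = np.random.default_rng(4)
weights = sample_multiedge([40, 20, 20, 10, 10], rng)
print(weights.sum(axis=1))   # realised strengths fluctuate around [40, 20, 20, 10, 10]
```

Averaging any observable (degrees, disparity, weight distribution) over many such draws gives the null-model baseline against which features of the real weighted network can be judged.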
The Bane of Low-Dimensionality Clustering
In this paper, we give a conditional lower bound of $2^{\Omega(n^{1-1/d})}$ on the
running time for the classic k-median and k-means clustering objectives (where
n is the size of the input), even in low-dimensional Euclidean space of
dimension four, assuming the Exponential Time Hypothesis (ETH). We also
consider k-median (and k-means) with penalties, where each point need not be
assigned to a center, in which case it must pay a penalty, and extend our lower
bound to at least three-dimensional Euclidean space.
This stands in stark contrast to many other geometric problems such as the
traveling salesman problem, or computing an independent set of unit spheres.
While these problems benefit from the so-called (limited) blessing of
dimensionality, as they can be solved in time $n^{O(n^{1-1/d})}$ or
$2^{O(n^{1-1/d})}$ in d dimensions, our work shows that widely-used clustering
objectives have a lower bound of $2^{\Omega(n^{1-1/d})}$, even in dimension four.
We complete the picture by considering the two-dimensional case: we show that
there is no algorithm that solves the penalized version in time less than
$2^{o(\sqrt{n})}$, and provide a matching upper bound of $2^{O(\sqrt{n}\log n)}$.
The main tool we use to establish these lower bounds is the placement of
points on the moment curve, which takes its inspiration from constructions of
point sets yielding Delaunay complexes of high complexity.
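For concreteness, points on the moment curve in four dimensions are simply $(t, t^2, t^3, t^4)$ for distinct parameters t; a toy generator is below (the hardness reduction itself involves much more than generating these points):

```python
import numpy as np

def moment_curve_points(n: int, d: int = 4) -> np.ndarray:
    """n points on the moment curve t -> (t, t^2, ..., t^d) in R^d, the
    construction known to yield Delaunay complexes of high complexity."""
    t = np.arange(1, n + 1, dtype=float)
    return np.vander(t, N=d + 1, increasing=True)[:, 1:]   # drop the t^0 column

print(moment_curve_points(5))
```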