384 research outputs found
Sparser Johnson-Lindenstrauss Transforms
We give two different and simple constructions for dimensionality reduction
in via linear mappings that are sparse: only an
-fraction of entries in each column of our embedding matrices
are non-zero to achieve distortion with high probability, while
still achieving the asymptotically optimal number of rows. These are the first
constructions to provide subconstant sparsity for all values of parameters,
improving upon previous works of Achlioptas (JCSS 2003) and Dasgupta, Kumar,
and Sarl\'{o}s (STOC 2010). Such distributions can be used to speed up
applications where dimensionality reduction is used.Comment: v6: journal version, minor changes, added Remark 23; v5: modified
abstract, fixed typos, added open problem section; v4: simplified section 4
by giving 1 analysis that covers both constructions; v3: proof of Theorem 25
in v2 was written incorrectly, now fixed; v2: Added another construction
achieving same upper bound, and added proof of near-tight lower bound for DKS
schem
Sequential projection pursuit for optimal transformation of autoregressive coefficients for damage detection in an experimental wind turbine blade
Peer reviewedPublisher PD
Isometric sketching of any set via the Restricted Isometry Property
In this paper we show that for the purposes of dimensionality reduction
certain class of structured random matrices behave similarly to random Gaussian
matrices. This class includes several matrices for which matrix-vector multiply
can be computed in log-linear time, providing efficient dimensionality
reduction of general sets. In particular, we show that using such matrices any
set from high dimensions can be embedded into lower dimensions with near
optimal distortion. We obtain our results by connecting dimensionality
reduction of any set to dimensionality reduction of sparse vectors via a
chaining argument.Comment: 17 page
Building Gene Expression Profile Classifiers with a Simple and Efficient Rejection Option in R
Background: The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality problem that negatively reflects on the reliability of both traditional rejection models and also more recent approaches such as one-class classifiers. Results: This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remaining of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since in the definition of a rejection model tuning of the involved parameters is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. Conclusions: This paper shows how the use of simple decision rules can be used to help the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be availabl
Deterministic parallel algorithms for bilinear objective functions
Many randomized algorithms can be derandomized efficiently using either the
method of conditional expectations or probability spaces with low independence.
A series of papers, beginning with work by Luby (1988), showed that in many
cases these techniques can be combined to give deterministic parallel (NC)
algorithms for a variety of combinatorial optimization problems, with low time-
and processor-complexity.
We extend and generalize a technique of Luby for efficiently handling
bilinear objective functions. One noteworthy application is an NC algorithm for
maximal independent set. On a graph with edges and vertices, this
takes time and processors, nearly
matching the best randomized parallel algorithms. Other applications include
reduced processor counts for algorithms of Berger (1997) for maximum acyclic
subgraph and Gale-Berlekamp switching games.
This bilinear factorization also gives better algorithms for problems
involving discrepancy. An important application of this is to automata-fooling
probability spaces, which are the basis of a notable derandomization technique
of Sivakumar (2002). Our method leads to large reduction in processor
complexity for a number of derandomization algorithms based on
automata-fooling, including set discrepancy and the Johnson-Lindenstrauss
Lemma
Increasing pattern recognition accuracy for chemical sensing by evolutionary based drift compensation
Artificial olfaction systems, which mimic human olfaction by using arrays of gas chemical sensors combined with pattern recognition methods, represent a potentially low-cost tool in many areas of industry such as perfumery, food and drink production, clinical diagnosis, health and safety, environmental monitoring and process control. However, successful applications of these systems are still largely limited to specialized laboratories. Sensor drift, i.e., the lack of a sensor's stability over time, still limits real in dustrial setups. This paper presents and discusses an evolutionary based adaptive drift-correction method designed to work with state-of-the-art classification systems. The proposed approach exploits a cutting-edge evolutionary strategy to iteratively tweak the coefficients of a linear transformation which can transparently correct raw sensors' measures thus mitigating the negative effects of the drift. The method learns the optimal correction strategy without the use of models or other hypotheses on the behavior of the physical chemical sensors
High-dimensional Black-box Optimization via Divide and Approximate Conquer
Divide and Conquer (DC) is conceptually well suited to high-dimensional
optimization by decomposing a problem into multiple small-scale sub-problems.
However, appealing performance can be seldom observed when the sub-problems are
interdependent. This paper suggests that the major difficulty of tackling
interdependent sub-problems lies in the precise evaluation of a partial
solution (to a sub-problem), which can be overwhelmingly costly and thus makes
sub-problems non-trivial to conquer. Thus, we propose an approximation
approach, named Divide and Approximate Conquer (DAC), which reduces the cost of
partial solution evaluation from exponential time to polynomial time.
Meanwhile, the convergence to the global optimum (of the original problem) is
still guaranteed. The effectiveness of DAC is demonstrated empirically on two
sets of non-separable high-dimensional problems.Comment: 7 pages, 2 figures, conferenc
- ā¦