333 research outputs found
Guaranteed clustering and biclustering via semidefinite programming
Identifying clusters of similar objects in data plays a significant role in a
wide range of applications. As a model problem for clustering, we consider the
densest k-disjoint-clique problem, whose goal is to identify the collection of
k disjoint cliques of a given weighted complete graph maximizing the sum of the
densities of the complete subgraphs induced by these cliques. In this paper, we
establish conditions ensuring exact recovery of the densest k cliques of a
given graph from the optimal solution of a particular semidefinite program. In
particular, the semidefinite relaxation is exact for input graphs corresponding
to data consisting of k large, distinct clusters and a smaller number of
outliers. This approach also yields a semidefinite relaxation for the
biclustering problem with similar recovery guarantees. Given a set of objects
and a set of features exhibited by these objects, biclustering seeks to
simultaneously group the objects and features according to their expression
levels. This problem may be posed as partitioning the nodes of a weighted
bipartite complete graph such that the sum of the densities of the resulting
bipartite complete subgraphs is maximized. As in our analysis of the densest
k-disjoint-clique problem, we show that the correct partition of the objects
and features can be recovered from the optimal solution of a semidefinite
program in the case that the given data consists of several disjoint sets of
objects exhibiting similar features. Empirical evidence from numerical
experiments supporting these theoretical guarantees is also provided
The Interface Region Imaging Spectrograph (IRIS)
The Interface Region Imaging Spectrograph (IRIS) small explorer spacecraft
provides simultaneous spectra and images of the photosphere, chromosphere,
transition region, and corona with 0.33-0.4 arcsec spatial resolution, 2 s
temporal resolution and 1 km/s velocity resolution over a field-of-view of up
to 175 arcsec x 175 arcsec. IRIS was launched into a Sun-synchronous orbit on
27 June 2013 using a Pegasus-XL rocket and consists of a 19-cm UV telescope
that feeds a slit-based dual-bandpass imaging spectrograph. IRIS obtains
spectra in passbands from 1332-1358, 1389-1407 and 2783-2834 Angstrom including
bright spectral lines formed in the chromosphere (Mg II h 2803 Angstrom and Mg
II k 2796 Angstrom) and transition region (C II 1334/1335 Angstrom and Si IV
1394/1403 Angstrom). Slit-jaw images in four different passbands (C II 1330, Si
IV 1400, Mg II k 2796 and Mg II wing 2830 Angstrom) can be taken simultaneously
with spectral rasters that sample regions up to 130 arcsec x 175 arcsec at a
variety of spatial samplings (from 0.33 arcsec and up). IRIS is sensitive to
emission from plasma at temperatures between 5000 K and 10 MK and will advance
our understanding of the flow of mass and energy through an interface region,
formed by the chromosphere and transition region, between the photosphere and
corona. This highly structured and dynamic region not only acts as the conduit
of all mass and energy feeding into the corona and solar wind, it also requires
an order of magnitude more energy to heat than the corona and solar wind
combined. The IRIS investigation includes a strong numerical modeling component
based on advanced radiative-MHD codes to facilitate interpretation of
observations of this complex region. Approximately eight Gbytes of data (after
compression) are acquired by IRIS each day and made available for unrestricted
use within a few days of the observation.Comment: 53 pages, 15 figure
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains to be its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
A survey on feature weighting based K-Means algorithms
This is a pre-copyedited, author-produced PDF of an article accepted for publication in Journal of Classification [de Amorim, R. C., 'A survey on feature weighting based K-Means algorithms', Journal of Classification, Vol. 33(2): 210-242, August 25, 2016]. Subject to embargo. Embargo end date: 25 August 2017. The final publication is available at Springer via http://dx.doi.org/10.1007/s00357-016-9208-4 © Classification Society of North America 2016In a real-world data set there is always the possibility, rather high in our opinion, that different features may have different degrees of relevance. Most machine learning algorithms deal with this fact by either selecting or deselecting features in the data preprocessing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm there is. The first K-Means based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analysing some of the most popular, or innovative, feature weighting mechanisms based in K-Means.Peer reviewedFinal Accepted Versio
Least squares optimization: From theory to practice
Nowadays, Nonlinear Least-Squares embodies the foundation of many Robotics and Computer Vision systems. The research community deeply investigated this topic in the last few years, and this resulted in the development of several open-source solvers to approach constantly increasing classes of problems. In this work, we propose a unified methodology to design and develop efficient Least-Squares Optimization algorithms, focusing on the structures and patterns of each specific domain. Furthermore, we present a novel open-source optimization system that addresses problems transparently with a different structure and designed to be easy to extend. The system is written in modern C++ and runs efficiently on embedded systemsWe validated our approach by conducting comparative experiments on several problems using standard datasets. The results show that our system achieves state-of-the-art performances in all tested scenarios
Population genetic structure of the bank vole Myodes glareolus within its glacial refugium in peninsular Italy
BNCI Horizon 2020 - Towards a Roadmap for Brain/Neural Computer Interaction
In this paper, we present BNCI Horizon 2020, an EU Coordination and Support Action (CSA) that will provide a roadmap for brain-computer interaction research for the next years, starting in 2013, and aiming at research efforts until 2020 and beyond. The project is a successor of the earlier EU-funded Future BNCI CSA that started in 2010 and produced a roadmap for a shorter time period. We present how we, a consortium of the main European BCI research groups as well as companies and end user representatives, expect to tackle the problem of designing a roadmap for BCI research. In this paper, we define the field with its recent developments, in particular by considering publications and EU-funded research projects, and we discuss how we plan to involve research groups, companies, and user groups in our effort to pave the way for useful and fruitful EU-funded BCI research for the next ten years
Workload measurement in a communication application operated through a P300-based brain-computer interface
n/
The Estimation of Cortical Activity for Brain-Computer Interface: Applications in a Domotic Context
In order to analyze whether the use of the cortical activity, estimated from noninvasive EEG recordings, could be useful to detect mental states related to the imagination of limb movements, we estimate cortical activity from high-resolution EEG recordings in a group of healthy subjects by using realistic head models. Such cortical activity was estimated in region of interest associated with the subject's Brodmann areas by using a depth-weighted minimum norm technique. Results showed that the use of the cortical-estimated activity instead of the unprocessed EEG improves the recognition of the mental states associated to the limb movement imagination in the group of normal subjects. The BCI methodology presented here has been used in a group of disabled patients in order to give
them a suitable control of several electronic devices disposed in a three-room environment devoted to the neurorehabilitation. Four of six patients were able to control several electronic devices in this domotic context with the BCI system
- …
