333 research outputs found

    Guaranteed clustering and biclustering via semidefinite programming

    Identifying clusters of similar objects in data plays a significant role in a wide range of applications. As a model problem for clustering, we consider the densest k-disjoint-clique problem, whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the densities of the complete subgraphs induced by these cliques. In this paper, we establish conditions ensuring exact recovery of the densest k cliques of a given graph from the optimal solution of a particular semidefinite program. In particular, the semidefinite relaxation is exact for input graphs corresponding to data consisting of k large, distinct clusters and a smaller number of outliers. This approach also yields a semidefinite relaxation for the biclustering problem with similar recovery guarantees. Given a set of objects and a set of features exhibited by these objects, biclustering seeks to simultaneously group the objects and features according to their expression levels. This problem may be posed as partitioning the nodes of a weighted bipartite complete graph such that the sum of the densities of the resulting bipartite complete subgraphs is maximized. As in our analysis of the densest k-disjoint-clique problem, we show that the correct partition of the objects and features can be recovered from the optimal solution of a semidefinite program in the case that the given data consists of several disjoint sets of objects exhibiting similar features. Empirical evidence from numerical experiments supporting these theoretical guarantees is also provided.

    The Interface Region Imaging Spectrograph (IRIS)

    The Interface Region Imaging Spectrograph (IRIS) small explorer spacecraft provides simultaneous spectra and images of the photosphere, chromosphere, transition region, and corona with 0.33-0.4 arcsec spatial resolution, 2 s temporal resolution and 1 km/s velocity resolution over a field-of-view of up to 175 arcsec x 175 arcsec. IRIS was launched into a Sun-synchronous orbit on 27 June 2013 using a Pegasus-XL rocket and consists of a 19-cm UV telescope that feeds a slit-based dual-bandpass imaging spectrograph. IRIS obtains spectra in passbands from 1332-1358, 1389-1407 and 2783-2834 Angstrom including bright spectral lines formed in the chromosphere (Mg II h 2803 Angstrom and Mg II k 2796 Angstrom) and transition region (C II 1334/1335 Angstrom and Si IV 1394/1403 Angstrom). Slit-jaw images in four different passbands (C II 1330, Si IV 1400, Mg II k 2796 and Mg II wing 2830 Angstrom) can be taken simultaneously with spectral rasters that sample regions up to 130 arcsec x 175 arcsec at a variety of spatial samplings (from 0.33 arcsec and up). IRIS is sensitive to emission from plasma at temperatures between 5000 K and 10 MK and will advance our understanding of the flow of mass and energy through an interface region, formed by the chromosphere and transition region, between the photosphere and corona. This highly structured and dynamic region not only acts as the conduit of all mass and energy feeding into the corona and solar wind, it also requires an order of magnitude more energy to heat than the corona and solar wind combined. The IRIS investigation includes a strong numerical modeling component based on advanced radiative-MHD codes to facilitate interpretation of observations of this complex region. Approximately eight Gbytes of data (after compression) are acquired by IRIS each day and made available for unrestricted use within a few days of the observation. Comment: 53 pages, 15 figures

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
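    To make the three properties named in this abstract concrete, here is a minimal sketch of an initialization that is linear in the number of points, uses no randomness, and does not depend on the order of the data (ties aside). It is a maximin-style rule chosen purely for illustration; the function name `maximin_init` and the toy data are assumptions, and the abstract does not say this is one of the six methods evaluated.

```python
import numpy as np

def maximin_init(points, k):
    """Pick k initial centers deterministically in O(n * k) time."""
    points = np.asarray(points, dtype=float)
    # First center: the point with the largest Euclidean norm (no randomness).
    first = int(np.argmax(np.linalg.norm(points, axis=1)))
    centers = [points[first]]
    # Distance from every point to its nearest center chosen so far.
    nearest = np.linalg.norm(points - centers[0], axis=1)
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        idx = int(np.argmax(nearest))
        centers.append(points[idx])
        nearest = np.minimum(nearest, np.linalg.norm(points - points[idx], axis=1))
    return np.array(centers)

# Toy data (hypothetical): three well-separated pairs of points.
data = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0],
                 [5.2, 4.9], [9.0, 0.0], [9.1, 0.3]])
centers = maximin_init(data, 3)
# Reversing the data order should select the same centers.
centers_shuffled = maximin_init(data[::-1], 3)
```

    Because every choice is an argmax over quantities that depend only on the point set, reordering the input leaves the selected centers unchanged unless two points tie exactly.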

    A survey on feature weighting based K-Means algorithms

    This is a pre-copyedited, author-produced PDF of an article accepted for publication in Journal of Classification [de Amorim, R. C., 'A survey on feature weighting based K-Means algorithms', Journal of Classification, Vol. 33(2): 210-242, August 25, 2016]. Subject to embargo. Embargo end date: 25 August 2017. The final publication is available at Springer via http://dx.doi.org/10.1007/s00357-016-9208-4 © Classification Society of North America 2016. In a real-world data set there is always the possibility, rather high in our opinion, that different features may have different degrees of relevance. Most machine learning algorithms deal with this fact by either selecting or deselecting features in the data preprocessing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm there is. The first K-Means based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analysing some of the most popular, or innovative, feature weighting mechanisms based on K-Means. Peer reviewed. Final Accepted Version

    Least squares optimization: From theory to practice

    Nonlinear Least-Squares now forms the foundation of many Robotics and Computer Vision systems. The research community has investigated this topic deeply in the last few years, and this has resulted in the development of several open-source solvers that address a constantly growing range of problems. In this work, we propose a unified methodology to design and develop efficient Least-Squares Optimization algorithms, focusing on the structures and patterns of each specific domain. Furthermore, we present a novel open-source optimization system that transparently addresses problems with different structures and is designed to be easy to extend. The system is written in modern C++ and runs efficiently on embedded systems. We validated our approach by conducting comparative experiments on several problems using standard datasets. The results show that our system achieves state-of-the-art performance in all tested scenarios.
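    For readers unfamiliar with the problem class this abstract refers to, a minimal Gauss-Newton sketch for a small nonlinear least-squares fit follows. It is not the system described above (which is written in C++); the function `gauss_newton`, the exponential model, and the noise-free data are all assumptions chosen for illustration.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iterations=50, tol=1e-12):
    """Minimize 0.5 * ||residual(x)||^2 with plain Gauss-Newton steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        r = residual(x)
        J = jacobian(x)
        # Solve the normal equations (J^T J) dx = -J^T r for the update.
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Hypothetical problem: fit y = a * exp(b * t) to noise-free samples.
t = np.linspace(0.0, 1.0, 10)
y = 2.0 * np.exp(-1.5 * t)

def residual(p):
    return p[0] * np.exp(p[1] * t) - y

def jacobian(p):
    # Columns are the partial derivatives with respect to a and b.
    return np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])

estimate = gauss_newton(residual, jacobian, x0=[1.0, -1.0])
```

    Plain Gauss-Newton has no damping or line search, so it can fail from a poor starting point; production solvers typically add safeguards such as Levenberg-Marquardt damping and exploit sparsity in the Jacobian.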

    BNCI Horizon 2020 - Towards a Roadmap for Brain/Neural Computer Interaction

    In this paper, we present BNCI Horizon 2020, an EU Coordination and Support Action (CSA) that will provide a roadmap for brain-computer interaction research for the coming years, starting in 2013, and aiming at research efforts until 2020 and beyond. The project is a successor of the earlier EU-funded Future BNCI CSA that started in 2010 and produced a roadmap for a shorter time period. We present how we, a consortium of the main European BCI research groups as well as companies and end user representatives, expect to tackle the problem of designing a roadmap for BCI research. In this paper, we define the field with its recent developments, in particular by considering publications and EU-funded research projects, and we discuss how we plan to involve research groups, companies, and user groups in our effort to pave the way for useful and fruitful EU-funded BCI research for the next ten years.

    The Estimation of Cortical Activity for Brain-Computer Interface: Applications in a Domotic Context

    In order to analyze whether cortical activity, estimated from noninvasive EEG recordings, could be useful for detecting mental states related to the imagination of limb movements, we estimate cortical activity from high-resolution EEG recordings in a group of healthy subjects by using realistic head models. Such cortical activity was estimated in regions of interest associated with the subject's Brodmann areas by using a depth-weighted minimum norm technique. Results showed that the use of the cortical-estimated activity instead of the unprocessed EEG improves the recognition of the mental states associated with limb movement imagination in the group of normal subjects. The BCI methodology presented here has been used in a group of disabled patients in order to give them suitable control of several electronic devices placed in a three-room environment devoted to neurorehabilitation. Four of six patients were able to control several electronic devices in this domotic context with the BCI system.