438 research outputs found
A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm
K-means is undoubtedly the most widely used partitional clustering algorithm.
Unfortunately, due to its gradient descent nature, this algorithm is highly
sensitive to the initial placement of the cluster centers. Numerous
initialization methods have been proposed to address this problem. In this
paper, we first present an overview of these methods with an emphasis on their
computational efficiency. We then compare eight commonly used linear time
complexity initialization methods on a large and diverse collection of data
sets using various performance criteria. Finally, we analyze the experimental
results using non-parametric statistical tests and provide recommendations for
practitioners. We demonstrate that popular initialization methods often perform
poorly and that there are in fact strong alternatives to these methods. Comment: 17 pages, 1 figure, 7 tables
Study of the effects of salt crystallisation on degradation of limestone rocks
Salt crystallization is widely recognized as a cause of deterioration of porous building materials. In particular, the crystallization pressure of salt crystals growing in confined pores is found to be the main cause of damage. The aim of this study is to better understand the degradation of porous rocks induced by salt crystallisation and to correlate such processes with the intrinsic characteristics of the materials. With this intent, an experimental salt weathering simulation has been carried out on two limestones widely used in the Baroque architecture of eastern Sicily. A systematic approach including petrographic, porosimetric and colorimetric analyses was used to evaluate the correlation among salt crystallisation and the microstructural and chromatic variations of limestone. Results showed that the two limestones differ markedly in their resistance to salt damage, and this was found to be strongly dependent on their pore structure and textural characteristics.
The Interface Region Imaging Spectrograph (IRIS)
The Interface Region Imaging Spectrograph (IRIS) small explorer spacecraft
provides simultaneous spectra and images of the photosphere, chromosphere,
transition region, and corona with 0.33-0.4 arcsec spatial resolution, 2 s
temporal resolution and 1 km/s velocity resolution over a field-of-view of up
to 175 arcsec x 175 arcsec. IRIS was launched into a Sun-synchronous orbit on
27 June 2013 using a Pegasus-XL rocket and consists of a 19-cm UV telescope
that feeds a slit-based dual-bandpass imaging spectrograph. IRIS obtains
spectra in passbands from 1332-1358, 1389-1407 and 2783-2834 Angstrom including
bright spectral lines formed in the chromosphere (Mg II h 2803 Angstrom and Mg
II k 2796 Angstrom) and transition region (C II 1334/1335 Angstrom and Si IV
1394/1403 Angstrom). Slit-jaw images in four different passbands (C II 1330, Si
IV 1400, Mg II k 2796 and Mg II wing 2830 Angstrom) can be taken simultaneously
with spectral rasters that sample regions up to 130 arcsec x 175 arcsec at a
variety of spatial samplings (from 0.33 arcsec and up). IRIS is sensitive to
emission from plasma at temperatures between 5000 K and 10 MK and will advance
our understanding of the flow of mass and energy through an interface region,
formed by the chromosphere and transition region, between the photosphere and
corona. This highly structured and dynamic region not only acts as the conduit
of all mass and energy feeding into the corona and solar wind, it also requires
an order of magnitude more energy to heat than the corona and solar wind
combined. The IRIS investigation includes a strong numerical modeling component
based on advanced radiative-MHD codes to facilitate interpretation of
observations of this complex region. Approximately eight Gbytes of data (after
compression) are acquired by IRIS each day and made available for unrestricted
use within a few days of the observation. Comment: 53 pages, 15 figures
Guaranteed clustering and biclustering via semidefinite programming
Identifying clusters of similar objects in data plays a significant role in a
wide range of applications. As a model problem for clustering, we consider the
densest k-disjoint-clique problem, whose goal is to identify the collection of
k disjoint cliques of a given weighted complete graph maximizing the sum of the
densities of the complete subgraphs induced by these cliques. In this paper, we
establish conditions ensuring exact recovery of the densest k cliques of a
given graph from the optimal solution of a particular semidefinite program. In
particular, the semidefinite relaxation is exact for input graphs corresponding
to data consisting of k large, distinct clusters and a smaller number of
outliers. This approach also yields a semidefinite relaxation for the
biclustering problem with similar recovery guarantees. Given a set of objects
and a set of features exhibited by these objects, biclustering seeks to
simultaneously group the objects and features according to their expression
levels. This problem may be posed as partitioning the nodes of a weighted
bipartite complete graph such that the sum of the densities of the resulting
bipartite complete subgraphs is maximized. As in our analysis of the densest
k-disjoint-clique problem, we show that the correct partition of the objects
and features can be recovered from the optimal solution of a semidefinite
program in the case that the given data consists of several disjoint sets of
objects exhibiting similar features. Empirical evidence from numerical
experiments supporting these theoretical guarantees is also provided
Knocking Down Low Molecular Weight Protein Tyrosine Phosphatase (LMW-PTP) Reverts Chemoresistance through Inactivation of Src and Bcr-Abl Proteins
The development of multidrug resistance (MDR) limits the efficacy of continuous chemotherapeutic treatment in chronic myelogenous leukemia (CML). Low molecular weight protein tyrosine phosphatase (LMW-PTP) is up-regulated in several cancers and has been associated with poor prognosis. This prompted us to investigate the involvement of LMW-PTP in MDR. In this study, we investigated the role of LMW-PTP in a chemoresistant CML cell line, Lucena-1. Our results showed that LMW-PTP is highly expressed and 7-fold more active in Lucena-1 cells compared to K562 cells, the non-resistant cell line. Knocking down LMW-PTP in Lucena-1 cells reverted chemoresistance to vincristine and imatinib mesylate, followed by a decrease of Src and Bcr-Abl phosphorylation at the activating sites, inactivating both kinases. On the other hand, overexpression of LMW-PTP in K562 cells led to chemoresistance to vincristine. Our findings describe, for the first time, that LMW-PTP cooperates with the MDR phenotype, at least in part, by maintaining Src and Bcr-Abl kinases in more active states. These findings suggest that inhibition of LMW-PTP may be a useful strategy for the development of therapies for multidrug resistant CML.
Evolutionary history and species delimitations: a case study of the hazel dormouse, Muscardinus avellanarius
Robust identification of species and evolutionarily significant units (ESUs) is essential to implement appropriate conservation strategies for endangered species. However, definitions of species or ESUs are numerous and
sometimes controversial, which might lead to biased conclusions, with serious consequences for the management of
endangered species. The hazel dormouse, an arboreal rodent of conservation concern throughout Europe, is an
ideal model species to investigate the relevance of species identification for conservation purposes. This species is a
member of the Gliridae family, which is protected in Europe and seriously threatened in the northern part of its
range. We assessed the extent of genetic subdivision in the hazel dormouse by sequencing one mitochondrial gene
(cytb) and two nuclear genes (BFIBR, APOB) and genotyping 10 autosomal microsatellites. These data were analysed using a combination of phylogenetic analyses and species delimitation methods. Multilocus analyses revealed
the presence of two genetically distinct lineages (approximately 11% cytb genetic divergence, no nuclear alleles
shared) for the hazel dormouse in Europe, which presumably diverged during the Late Miocene. The phylogenetic
patterns suggest that Muscardinus avellanarius populations could be split into two cryptic species respectively
distributed in western and central-eastern Europe and Anatolia. However, the comparison of several species
definitions and methods estimated the number of species between 1 and 10. Our results revealed the difficulty in
choosing and applying an appropriate criterion and markers to identify species and highlight the fact that consensus
guidelines are essential for species delimitation in the future. In addition, this study contributes to a better
knowledge about the evolutionary history of the species
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
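The multiple-run practice mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not one of the initialization methods evaluated in the chapter: it assumes plain random seeding (centers drawn from the data points) and uses the sum of squared errors (SSE) as the quality criterion. The function names `kmeans` and `best_of_restarts` and all parameters are hypothetical, chosen only for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """One k-means run with random seeding (an assumption of this sketch).

    Returns (centers, labels, sse).
    """
    # Random initialization: pick k distinct data points as starting centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


def best_of_restarts(points, k, n_restarts=10, seed=0):
    """Run k-means n_restarts times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels, sse = best_of_restarts(data, k=2)
    print(centers, labels, sse)
```

The cost of this approach grows linearly with `n_restarts`, which is the overhead the abstract refers to; a deterministic initialization method would need only a single run.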
On the Use of Electrooculogram for Efficient Human Computer Interfaces
The aim of this study is to present electrooculogram (EOG) signals that can be used efficiently for human-computer interfaces. Establishing an efficient alternative channel for communication without overt speech and hand movements is important to increase the quality of life for patients suffering from Amyotrophic Lateral Sclerosis or other illnesses that prevent correct limb and facial muscular responses. We conducted several experiments to compare the P300-based BCI speller with the new EOG-based system. A five-letter word can be written on average in 25 seconds with the new system and in 105 seconds with the EEG-based device. Giving a message such as “clean-up” could be performed in 3 seconds with the new system. The new system is more efficient than the P300-based BCI system in terms of accuracy, speed, applicability, and cost efficiency. Using EOG signals, it is possible to improve the communication abilities of those patients who can move their eyes.
BNCI Horizon 2020 - Towards a Roadmap for Brain/Neural Computer Interaction
In this paper, we present BNCI Horizon 2020, an EU Coordination and Support Action (CSA) that will provide a roadmap for brain-computer interaction research for the coming years, starting in 2013 and aiming at research efforts until 2020 and beyond. The project is a successor of the earlier EU-funded Future BNCI CSA that started in 2010 and produced a roadmap for a shorter time period. We present how we, a consortium of the main European BCI research groups as well as companies and end user representatives, expect to tackle the problem of designing a roadmap for BCI research. In this paper, we define the field with its recent developments, in particular by considering publications and EU-funded research projects, and we discuss how we plan to involve research groups, companies, and user groups in our effort to pave the way for useful and fruitful EU-funded BCI research for the next ten years.
