
    Gravitational Clustering: A Simple, Robust and Adaptive Approach for Distributed Networks

    Distributed signal processing for wireless sensor networks enables different devices to cooperate on a variety of signal processing tasks. A crucial first step is to answer the question: who observes what? Several distributed algorithms have recently been proposed that frame the signal/object labelling problem as cluster analysis of extracted source-specific features; however, they assume that the number of clusters is known. We propose a new method, Gravitational Clustering (GC), that adaptively estimates the time-varying number of clusters from a set of feature vectors. The key idea is to exploit the physical principle of gravitational force between mass units: streaming-in feature vectors are treated as mass units at fixed positions in the feature space, around which mobile mass units are injected at each time instant. Cluster enumeration exploits the fact that the strongest attraction on the mobile mass units is exerted by regions with a high density of feature vectors, i.e., gravitational clusters. By sharing estimates among neighboring nodes via a diffusion-adaptation scheme, cooperative and distributed cluster enumeration is achieved. Numerical experiments on robustness against outliers, convergence, and computational complexity are conducted. An application to a distributed cooperative multi-view camera network illustrates the applicability to real-world problems. Comment: 12 pages, 9 figures
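    The gravitational idea can be illustrated with a minimal single-node sketch, assuming NumPy. It treats every feature vector as a fixed unit mass, moves each injected mobile unit a fixed step along the softened inverse-square net force, and counts clusters by greedily merging the converged mobile units. The function name, its parameters (`spread`, `step`, `softening`, `merge_radius`) and the merging rule are hypothetical choices, not the authors' method; the diffusion-adaptation exchange between nodes and the streaming updates are not modelled.

```python
import numpy as np


def gravitational_cluster_count(features, n_mobile=50, n_steps=100, spread=0.5,
                                step=0.05, softening=0.1, merge_radius=0.5, seed=0):
    """Toy single-node sketch: count dense regions of `features` (shape N x d)
    by letting mobile mass units drift towards the fixed feature vectors.
    All parameter names and defaults are hypothetical."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Inject mobile mass units around randomly chosen feature vectors.
    idx = rng.integers(len(features), size=n_mobile)
    mobile = features[idx] + rng.normal(scale=spread, size=(n_mobile, features.shape[1]))
    for _ in range(n_steps):
        diff = features[None, :, :] - mobile[:, None, :]          # (M, N, d)
        dist2 = np.sum(diff ** 2, axis=-1) + softening ** 2       # softened |r|^2
        # Inverse-square pull of every unit-mass feature vector: r / |r|^3.
        force = np.sum(diff / dist2[..., None] ** 1.5, axis=1)    # (M, d)
        # Move each mobile unit a fixed distance along its net force.
        norm = np.linalg.norm(force, axis=1, keepdims=True) + 1e-12
        mobile += step * force / norm
    # Greedily merge mobile units that ended up close together.
    centres = []
    for p in mobile:
        if all(np.linalg.norm(p - c) > merge_radius for c in centres):
            centres.append(p)
    return len(centres), np.array(centres)
```

Because every mobile unit moves a fixed distance per iteration, it ends up jittering around a local density maximum, so `merge_radius` should be chosen well above `step`.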

    Bayesian Cluster Enumeration Criterion for Unsupervised Learning

    We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Provided that some mild assumptions are satisfied, we give a general BIC expression for a broad class of data distributions, which serves as a starting point for deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC yields an expression whose penalty term differs from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the number of clusters is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed two-step algorithm is tested on synthetic and real data sets. Comment: 14 pages, 7 figures
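    The two-step procedure can be sketched as follows, assuming NumPy. Step one partitions the data for each candidate number of clusters; here k-means with farthest-point initialization stands in for the model-based partitioning algorithm. Step two keeps the candidate with the largest score. Note that the score below is the classical Schwarz BIC for a Gaussian mixture with one shared covariance matrix, not the criterion derived in the paper (whose penalty term differs); all function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization (stand-in partitioner)."""
    rng = np.random.default_rng(seed)
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centres)[None]) ** 2).sum(-1).min(axis=1)
        centres.append(X[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centres[None]) ** 2).sum(-1).argmin(axis=1)
        centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return labels


def bic_shared_cov(X, labels):
    """Classical BIC, larger is better: log L - (p / 2) log n, for a hard
    partition modelled as a Gaussian mixture with one shared covariance."""
    n, d = X.shape
    ks = np.unique(labels)
    resid = X.copy()
    for j in ks:
        resid[labels == j] -= X[labels == j].mean(axis=0)
    cov = resid.T @ resid / n                          # pooled ML covariance
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", resid, np.linalg.inv(cov), resid)
    counts = np.array([np.sum(labels == j) for j in ks])
    loglik = (np.sum(counts * np.log(counts / n))      # mixing-weight term
              - 0.5 * (n * d * np.log(2 * np.pi) + n * logdet + maha.sum()))
    n_params = len(ks) * d + d * (d + 1) / 2 + (len(ks) - 1)
    return loglik - 0.5 * n_params * np.log(n)


def estimate_num_clusters(X, k_max=6):
    """Step 1: partition for every candidate K. Step 2: keep the K with maximal BIC."""
    X = np.asarray(X, dtype=float)
    scores = {k: bic_shared_cov(X, kmeans(X, k)) for k in range(1, k_max + 1)}
    return max(scores, key=scores.get), scores
```

A shared covariance keeps tiny clusters from producing degenerate, near-singular covariance estimates, which would otherwise inflate the likelihood of over-segmented candidates.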

    Improving the Retrieval of Offshore-Onshore Correlation Functions With Machine Learning

    The retrieval of reliable offshore-onshore correlation functions is critical to improving our ability to predict long-period ground motions from megathrust earthquakes. However, localized ambient seismic field sources between offshore and onshore stations can bias correlation functions and generate nonphysical arrivals. We present a two-step method based on unsupervised learning to improve the quality of correlation functions calculated with the deconvolution technique (i.e., deconvolution functions, DFs). For a DF data set calculated between two stations over a long time period, we first reduce the dimensionality of the data set using principal component analysis and then cluster the resulting low-dimensional features with a Gaussian mixture model. We then stack the DFs belonging to each cluster and select the best stacked DF. We apply our technique to DFs calculated every 30 min between an offshore station located on top of the Nankai Trough, Japan, and 78 onshore receivers. Our method removes spurious arrivals and improves the signal-to-noise ratio of DFs. Most 30-min DFs selected by our clustering method are generated during extreme meteorological events such as typhoons. To demonstrate that the DFs obtained with our method contain reliable phases and amplitudes, we use them to simulate the long-period ground motions from an Mw 5.8 earthquake that occurred near the offshore station. The results show that the long-period ground motions of the earthquake are accurately simulated. Our method can easily be added as a processing step when calculating offshore-onshore DFs and offers a new way to improve the prediction of long-period ground motions from potential megathrust earthquakes.
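    The processing chain (dimensionality reduction, clustering, per-cluster stacking, selection) can be sketched as follows, assuming NumPy and a matrix `dfs` with one DF per row. Two substitutions are made to keep the sketch short and are not taken from the paper: k-means replaces the Gaussian mixture model, and the "best" stack is chosen by a hypothetical SNR measure (peak amplitude in a signal window divided by RMS amplitude in a noise window). The windows and all function names are assumptions.

```python
import numpy as np


def pca_scores(dfs, n_components=3):
    """Project each DF (one per row) onto the leading principal components."""
    centred = dfs - dfs.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def kmeans_labels(Z, k, n_iter=50, seed=0):
    """k-means, substituted here for the paper's Gaussian mixture model."""
    rng = np.random.default_rng(seed)
    centres = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):                              # farthest-point initialization
        d2 = ((Z[:, None] - np.array(centres)[None]) ** 2).sum(-1).min(axis=1)
        centres.append(Z[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):                            # Lloyd iterations
        labels = ((Z[:, None] - centres[None]) ** 2).sum(-1).argmin(axis=1)
        centres = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return labels


def snr(trace, signal_window, noise_window):
    """Hypothetical quality measure: peak in the signal window over noise RMS."""
    noise_rms = np.sqrt(np.mean(trace[noise_window] ** 2)) + 1e-12
    return np.max(np.abs(trace[signal_window])) / noise_rms


def best_stack(dfs, k, signal_window, noise_window, n_components=3):
    """Reduce with PCA, cluster, stack each cluster, keep the highest-SNR stack."""
    dfs = np.asarray(dfs, dtype=float)
    labels = kmeans_labels(pca_scores(dfs, n_components), k)
    stacks = [dfs[labels == j].mean(axis=0) for j in np.unique(labels)]
    scores = [snr(s, signal_window, noise_window) for s in stacks]
    return stacks[int(np.argmax(scores))], scores
```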

    Optimal number of pressure sensors for real-time monitoring of distribution networks by using the hypervolume indicator

    This article proposes a novel methodology to determine the optimal number of pressure sensors for the real-time monitoring of water distribution networks based on a quality hypervolume indicator. The methodology solves the optimization problem for different numbers of pressure sensors, assesses the gain of installing each set of sensors by means of the hypervolume indicator, and determines the optimal number of sensors from the variation of the hypervolume indicator. The methodology was applied to a real case study, and several robustness analyses were carried out. The results demonstrate that the methodology is barely influenced by its parameters and that a reasonable estimate of the optimal number of sensors can easily be obtained.
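    A minimal sketch of the hypervolume computation for two minimized objectives, plus a hypothetical stopping rule for the sensor count. The objectives, the reference point and the relative-gain threshold `min_rel_gain` are assumptions for illustration; the abstract states only that the optimal number is determined from the variation of the hypervolume indicator.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (both objectives minimized) and bounded by `ref`."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, last_f2 = 0.0, ref[1]
    for f1, f2 in pts:                       # ascending in the first objective
        if f2 < last_f2:                     # skip points dominated by earlier ones
            hv += (ref[0] - f1) * (last_f2 - f2)
            last_f2 = f2
    return hv


def optimal_sensor_count(hv_by_count, min_rel_gain=0.05):
    """Hypothetical rule: return the first evaluated sensor count for which moving
    to the next evaluated count raises the hypervolume by less than `min_rel_gain`."""
    counts = sorted(hv_by_count)
    for n, n_next in zip(counts, counts[1:]):
        gain = (hv_by_count[n_next] - hv_by_count[n]) / max(hv_by_count[n], 1e-12)
        if gain < min_rel_gain:
            return n
    return counts[-1]
```

For example, a front of `(1, 3), (2, 2), (3, 1)` with reference point `(4, 4)` dominates an area of 6.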

    Comparative mapping of the macrochromosomes of eight avian species provides further insight into their phylogenetic relationships and avian karyotype evolution

    Avian genomes typically consist of ~10 pairs of macrochromosomes and ~30 pairs of microchromosomes. While interchromosomal change appears to have been minimal throughout evolution (with notable exceptions), intrachromosomal changes remain relatively poorly studied. To address this, we use a pan-avian, universally hybridizing set of 74 chicken bacterial artificial chromosome (BAC) probes on the macrochromosomes of eight bird species: common blackbird, Atlantic canary, Eurasian woodcock, helmeted guinea fowl, houbara bustard, mallard duck, and rock dove. A combination of molecular cytogenetic, bioinformatic, and mathematical analyses allowed us to build comparative cytogenetic maps, reconstruct a putative Neognathae ancestor, and assess chromosome rearrangement patterns and phylogenetic relationships in the studied neognath lineages. As in our previous studies, the chicken karyotype appears most similar to the ancestral one; however, the previously reported increased rate of intrachromosomal change in Passeriformes (songbirds) is not observed in our dataset. This universally hybridizing probe set is applicable not only to retracing avian karyotype evolution but, potentially, to reconstructing genome assemblies.

    Knee Point Search Using Cascading Top-k

    Anomaly detection systems and many other applications frequently face the problem of finding the largest knee point in the sorted curve of a set of unsorted points. This paper proposes an efficient knee point search algorithm that minimizes time complexity using cascading top-k sorting when the a priori probability distribution of the knee point is known. First, a top-k sort algorithm based on a variation of quicksort is proposed. The knee point search problem is then divided into multiple steps, and in each step an optimization problem over the selection number k is solved, with the expected time cost as the objective function. Because the expected time cost of one step depends on that of the subsequent steps, we simplify the optimization problem by minimizing the maximum expected time cost. The posterior probability distribution of the largest knee point and the other parameters are updated before the optimization problem is solved in each step. An example of source detection for DNS DoS flooding attacks illustrates the application of the proposed algorithm.
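    The quicksort-based top-k step can be sketched as a partial quicksort: after each three-way partition, only the parts that still overlap the first k output positions are processed further, so the tail of the data is never fully sorted. This is a generic illustration under that assumption, not the paper's algorithm; the cascading choice of k, the expected-cost optimization and the posterior update are not sketched.

```python
import random


def top_k(values, k, rng=None):
    """Return the k largest values in descending order via partial quicksort."""
    rng = rng or random.Random(0)
    a = list(values)
    k = min(k, len(a))

    def sort_prefix(lo, hi):
        # Sort a[lo:hi] only as far as needed for output positions < k.
        while hi - lo > 1 and lo < k:
            pivot = a[rng.randrange(lo, hi)]
            # Three-way partition into  > pivot | == pivot | < pivot  (descending).
            gt = [x for x in a[lo:hi] if x > pivot]
            eq = [x for x in a[lo:hi] if x == pivot]
            lt = [x for x in a[lo:hi] if x < pivot]
            a[lo:hi] = gt + eq + lt
            sort_prefix(lo, lo + len(gt))    # larger elements come first
            lo += len(gt) + len(eq)          # continue with smaller ones only if < k

    sort_prefix(0, len(a))
    return a[:k]
```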

    Cluster validity in clustering methods


    Disentangling clustering configuration intricacies for divergently selected chicken breeds

    Divergently selected chicken breeds are of great interest not only from an economic point of view but also for sustaining the diversity of the global poultry gene pool. In this regard, it is essential to evaluate the classification (clustering) of varied chicken breeds using methods and models based on phenotypic and genotypic breed differences, and to implement new mathematical indicators and approaches. Accordingly, we set out to test and improve clustering algorithms and models for discriminating between chicken breeds. A representative portion of the global chicken gene pool, comprising 39 breeds, was examined in terms of an integral performance index, i.e., specific egg mass yield relative to female body weight. The resulting dataset was evaluated within the traditional, phenotypic, and genotypic classification/clustering models using the k-means method, inflection points clustering, and admixture analysis. The admixture analysis used SNP genotype datasets, including one focused on the performance-associated NCAPG-LCORL locus. The k-means and inflection points analyses revealed some discrepancies between the tested models/submodels and flaws in the resulting cluster configurations. On the other hand, 11 core breeds were identified that were shared between the examined models and showed more adequate clustering and admixture patterns. These findings will lay the foundation for future research to improve clustering methods as well as genome- and phenome-wide association/mediation analyses.
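    For illustration, k-means on a single performance index (such as egg mass yield relative to female body weight) reduces to one-dimensional Lloyd iterations. The sketch below assumes NumPy, quantile-based initialization and hypothetical index values; it is not the study's pipeline, and the inflection-point and admixture analyses are not reproduced.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on one trait; centres start at evenly spaced quantiles.
    Returns labels ordered so that cluster 0 has the lowest centre."""
    x = np.asarray(values, dtype=float)
    centres = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
    order = np.argsort(centres)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)               # rank[j] = sorted position of centre j
    return rank[labels], centres[order]
```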

    Evolutionary subdivision of domestic chickens: implications for local breeds as assessed by phenotype and genotype in comparison to commercial and fancy breeds

    To adjust breeding programs for local, commercial, and fancy breeds and to implement molecular (marker-assisted) breeding, a proper understanding of phenotypic and genotypic variation is a sine qua non for breeding progress in animal production. Here, we investigated the evolutionary subdivision of domestic chickens based on their phenotypic and genotypic variability, using a wide sample of 49 breeds/populations. These represent a significant proportion of the global chicken gene pool and all major purposes of breed use (according to their traditional classification model), and many of them are characterized by a synthetic genetic structure and notable admixture. We assessed their phenotypic variability in terms of body weight, body measurements, and egg production. From this, we proposed a phenotypic clustering model (PCM) comprising six evolutionary lineages of breed formation: egg-type, meat-type, dual purpose (egg-meat and meat-egg), game, fancy, and Bantam. Genotypic variability was estimated by analyzing five SNPs at the NCAPG-LCORL locus. Based on these data, two broadly similar genotypic clustering models (GCM1 and GCM2) were inferred, which also overlapped with PCM in several respects. Further research on SNPs associated with economically important traits can be instrumental in marker-assisted breeding programs.