257 research outputs found

    Scaling Analysis of Affinity Propagation

    We analyze and exploit some scaling properties of the Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007). First we observe that a divide and conquer strategy, used on a large data set, hierarchically reduces the complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(N^{(h+2)/(h+1)})$, for a data set of size $N$ and a depth $h$ of the hierarchical strategy. For a data set embedded in a $d$-dimensional space, we show that this is obtained without notably damaging the precision except in dimension $d=2$. In fact, for $d$ larger than 2 the relative loss in precision scales like $N^{(2-d)/(h+1)d}$. Finally, under some conditions we observe that there is a value $s^*$ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for $s<s^*$) from a coalescent one (for $s>s^*$) of the underlying hidden cluster structure. At this precise point a self-similarity property holds, which can be exploited by the hierarchical strategy to actually locate its position. From this observation, a strategy based on AP can be defined to find out how many clusters are present in a given dataset. Comment: 28 pages, 14 figures, Inria research report
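
    As a small illustration of the complexity claim in this abstract, the sketch below evaluates the exponent (h+2)/(h+1) for a few hierarchy depths. The function name `complexity_exponent` is hypothetical and chosen only for this example; the code does not implement Affinity Propagation or the authors' hierarchical strategy.

```python
from fractions import Fraction


def complexity_exponent(h: int) -> Fraction:
    """Return e such that the stated cost scales as N**e for hierarchy depth h.

    Hypothetical helper: it only evaluates the formula (h+2)/(h+1)
    quoted in the abstract.
    """
    if h < 0:
        raise ValueError("depth h must be non-negative")
    return Fraction(h + 2, h + 1)


# Tabulate the exponent for the first few depths.
for depth in range(4):
    print(depth, complexity_exponent(depth))
```

    With h = 0 the exponent is 2, which matches the quadratic cost quoted for AP without hierarchy, and the exponent decreases toward 1 as h grows.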

    The influence of selected factors on the distribution of epilithic diatoms in a torrential river the Kamniška Bistrica (Slovenia)

    Physical and chemical characteristics of habitats and species diversity in streams and rivers are strongly influenced by the catchment area. We analysed the influence of selected environmental and spatial variables on the diversity and species composition of epilithic diatom communities in periphyton. Samples were collected along the course of a torrential river, the Kamniška Bistrica. Sampling sites were selected in reaches distributed from the source to the outlet of the river; they were subject to different influences from the catchment area and had different physical and chemical characteristics. The most common and dominant diatom species in the periphyton community were Achnanthes biasolettiana and A. minutissima. Achnanthes species often inhabit rivers and springs with moderate organic pollution. Another common diatom taxon was Gomphonema pumilum, a key species indicating oligosaprobic conditions. The results of the canonical correspondence analyses revealed that variance of the periphytic diatom community was explained by water temperature and conductivity as well as altitude. Diatom species richness was positively correlated with saprobic index values and abundance of filamentous algae in the river bed, indicating a relatively low organic matter and nutrient input into the river system.

    Ataxia-telangiectasia: Linkage analysis in highly inbred Arab and Druze families and differentiation from an ataxia-microcephaly-cataract syndrome

    Ataxia-telangiectasia (A-T) is a progressive autosomal recessive disease featuring neurodegeneration, immunodeficiency, chromosomal instability, radiation sensitivity and a highly increased proneness to cancer. A-T is ethnically widespread and genetically heterogeneous, as indicated by the existence of four complementation groups in this disease. Several "A-T-like" genetic diseases share various clinical and cellular characteristics with A-T. By using linkage analysis to study North American and Turkish A-T families, the ATA (A-T, complementation group A) gene has been mapped to chromosome 11q23. A number of Israeli Arab A-T patients coming from large, highly inbred families were assigned to group A. In one of these families, an additional autosomal recessive disease was identified, characterized by ataxia, hypotonia, microcephaly and bilateral congenital cataracts. In two patients with this syndrome, normal levels of serum immunoglobulins and alpha-fetoprotein, chromosomal stability in peripheral blood lymphocytes and skin fibroblasts, and normal cellular response to treatments with X-rays and the radiomimetic drug neocarzinostatin indicated that this disease does not share, with A-T, any additional features other than ataxia. These tests also showed that another patient in this family, who is also mentally retarded, is affected with both disorders. This conclusion was further supported by linkage analysis with 11q23 markers. Lod scores between A-T and these markers, cumulated over three large Arab families, were significant and confirmed the localization of the ATA gene to 11q23. However, another Druze family, unassigned to a specific complementation group, showed several recombinants between A-T and the same markers, leaving the localization of the A-T gene in this family open.

    Uncertainty quantification in graph-based classification of high dimensional data

    Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning. We provide a unified framework which brings together a variety of methods which have been introduced in different communities within the mathematical sciences. We study probit classification in the graph-based setting, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level set approaches are natural relaxations of the harmonic function approach introduced in [Zhu et al 2003]. We introduce efficient numerical methods, suited to large data-sets, for both MCMC-based sampling as well as gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms. Comment: 33 pages, 14 figures
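
    For readers unfamiliar with the harmonic function approach mentioned in this abstract, the sketch below shows the basic idea as it is commonly described: labeled nodes keep their labels, and every unlabeled node takes the weighted average of its neighbors. This is only an illustration under stated assumptions, not the Bayesian methods of the paper. The function name `harmonic_labels`, the dictionary-based graph format, and the fixed number of sweeps are all choices made for this example, and the graph is assumed to be connected with positive weights.

```python
def harmonic_labels(adjacency, labels, sweeps=2000):
    """Approximate the harmonic extension of known labels on a weighted graph.

    adjacency: dict mapping node -> dict of neighbor -> positive weight
    labels:    dict mapping labeled node -> +1.0 or -1.0
    Assumes every unlabeled node has at least one neighbor.
    """
    # Start unlabeled nodes at 0 and keep labeled nodes fixed.
    f = {v: labels.get(v, 0.0) for v in adjacency}
    for _ in range(sweeps):
        for v, nbrs in adjacency.items():
            if v in labels:
                continue
            total = sum(nbrs.values())
            # Replace the value by the weighted average of the neighbors.
            f[v] = sum(w * f[u] for u, w in nbrs.items()) / total
    return f


# Path graph 0-1-2-3-4 with unit weights; the two endpoints are labeled.
path = {i: {} for i in range(5)}
for i in range(4):
    path[i][i + 1] = 1.0
    path[i + 1][i] = 1.0

f = harmonic_labels(path, {0: 1.0, 4: -1.0})
print([round(f[i], 3) for i in range(5)])
```

    On this path graph the values interpolate linearly between the two labeled endpoints, so the sign of each value gives a class and its magnitude gives a rough indication of how close the node is to the decision boundary.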

    Continuation for thin film hydrodynamics and related scalar problems

    This chapter illustrates how to apply continuation techniques in the analysis of a particular class of nonlinear kinetic equations that describe the time evolution through transport equations for a single scalar field like densities or interface profiles of various types. We first systematically introduce these equations as gradient dynamics combining mass-conserving and nonmass-conserving fluxes, followed by a discussion of nonvariational amendments and a brief introduction to their analysis by numerical continuation. The approach is first applied to a number of common examples of variational equations, namely, Allen-Cahn- and Cahn-Hilliard-type equations including certain thin-film equations for partially wetting liquids on homogeneous and heterogeneous substrates as well as Swift-Hohenberg and Phase-Field-Crystal equations. Second, we consider nonvariational examples such as the Kuramoto-Sivashinsky equation, convective Allen-Cahn and Cahn-Hilliard equations and thin-film equations describing stationary sliding drops and a transversal front instability in a dip-coating. Through the different examples we illustrate how to employ the numerical tools provided by the packages auto07p and pde2path to determine steady, stationary and time-periodic solutions in one and two dimensions and the resulting bifurcation diagrams. The incorporation of boundary conditions and integral side conditions is also discussed as well as problem-specific implementation issues.

    Action Recognition with a Bio-Inspired Feedforward Motion Processing Model: The Richness of Center-Surround Interactions

    Here we show that reproducing the functional properties of MT cells with various center-surround interactions enriches motion representation and improves the action recognition performance. To do so, we propose a simplified bio-inspired model of the motion pathway in primates: it is a feedforward model restricted to V1-MT cortical layers, cortical cells cover the visual space with a foveated structure, and more importantly, we reproduce some of the richness of center-surround interactions of MT cells. Interestingly, as observed in neurophysiology, our MT cells not only behave like simple velocity detectors, but also respond to several kinds of motion contrasts. Results show that this diversity of motion representation at the MT level is a major advantage for an action recognition task. Defining motion maps as our feature vectors, we used a standard classification method on the Weizmann database: we obtained an average recognition rate of 98.9%, which is superior to the recent results by Jhuang et al. (2007). These promising results encourage us to further develop bio-inspired models incorporating other brain mechanisms and cortical layers in order to deal with more complex videos.

    General statistical scaling laws for stability in ecological systems

    Ecological stability refers to a family of concepts used to describe how systems of interacting species vary through time and respond to disturbances. Because observed ecological stability depends on sampling scales and environmental context, it is notoriously difficult to compare measurements across sites and systems. Here, we apply stochastic dynamical systems theory to derive general statistical scaling relationships across time, space, and ecological level of organisation for three fundamental stability aspects: resilience, resistance, and invariance. These relationships can be calibrated using random or representative samples measured at individual scales, and projected to predict average stability at other scales across a wide range of contexts. Moreover, deviations between observed and extrapolated scaling relationships can reveal information about unobserved heterogeneity across time, space, or species. We anticipate that these methods will be useful for cross-study synthesis of stability data, extrapolating measurements to unobserved scales, and identifying underlying causes and consequences of heterogeneity.