Scaling Analysis of Affinity Propagation
We analyze and exploit some scaling properties of the Affinity Propagation
(AP) clustering algorithm proposed by Frey and Dueck (2007). First we observe
that a divide and conquer strategy, used on a large data set hierarchically
reduces the complexity to , for a
data-set of size and a depth of the hierarchical strategy. For a
data-set embedded in a -dimensional space, we show that this is obtained
without notably damaging the precision except in dimension . In fact, for
larger than 2 the relative loss in precision scales like
. Finally, under some conditions we observe that there is a
value of the penalty coefficient, a free parameter used to fix the number
of clusters, which separates a fragmentation phase (for ) from a
coalescent one (for ) of the underlying hidden cluster structure. At
this precise point holds a self-similarity property which can be exploited by
the hierarchical strategy to actually locate its position. From this
observation, a strategy based on AP can be defined to find out how many
clusters are present in a given dataset.
Comment: 28 pages, 14 figures, Inria research report
The influence of selected factors on the distribution of epilithic diatoms in a torrential river the Kamniška Bistrica (Slovenia)
Physical and chemical characteristics of habitats and species diversity in streams and rivers are strongly influenced by the catchment area. We analysed the influence of selected environmental and spatial variables on the diversity and species composition of epilithic diatom communities in periphyton. Samples were collected along the course of a torrential river, the Kamniška Bistrica. Sampling sites were selected in reaches distributed from the source to the outlet of the river; they were subject to different influences from the catchment area and had different physical and chemical characteristics. The most common and dominant diatom species in the periphyton community were Achnanthes biasolettiana and A. minutissima. Achnanthes species often inhabit rivers and springs with moderate organic pollution. Another common diatom taxon was Gomphonema pumilum, a key species indicating oligosaprobic conditions. The results of the canonical correspondence analyses revealed that variance of the periphytic diatom community was explained by water temperature and conductivity as well as altitude. Diatom species richness was positively correlated with saprobic index values and abundance of filamentous algae in the river bed, indicating a relatively low organic matter and nutrient input into the river system.
Learning Behavioural Context
The original publication is available at www.springerlink.com
Ataxia-telangiectasia: Linkage analysis in highly inbred Arab and Druze families and differentiation from an ataxia-microcephaly-cataract syndrome
Ataxia-telangiectasia (A-T) is a progressive autosomal recessive disease featuring neurodegeneration, immunodeficiency, chromosomal instability, radiation sensitivity and a highly increased proneness to cancer. A-T is ethnically widespread and genetically heterogeneous, as indicated by the existence of four complementation groups in this disease. Several "A-T-like" genetic diseases share various clinical and cellular characteristics with A-T. By using linkage analysis to study North American and Turkish A-T families, the ATA (A-T, complementation group A) gene has been mapped to chromosome 11q23. A number of Israeli Arab A-T patients coming from large, highly inbred families were assigned to group A. In one of these families, an additional autosomal recessive disease was identified, characterized by ataxia, hypotonia, microcephaly and bilateral congenital cataracts. In two patients with this syndrome, normal levels of serum immunoglobulins and alpha-fetoprotein, chromosomal stability in peripheral blood lymphocytes and skin fibroblasts, and normal cellular response to treatments with X-rays and the radiomimetic drug neocarzinostatin indicated that this disease does not share, with A-T, any additional features other than ataxia. These tests also showed that another patient in this family, who is also mentally retarded, is affected with both disorders. This conclusion was further supported by linkage analysis with 11q23 markers. Lod scores between A-T and these markers, cumulated over three large Arab families, were significant and confirmed the localization of the ATA gene to 11q23. However, another Druze family, unassigned to a specific complementation group, showed several recombinants between A-T and the same markers, leaving the localization of the A-T gene in this family open.
Uncertainty quantification in graph-based classification of high dimensional data
Classification of high dimensional data finds wide-ranging applications. In
many of these applications equipping the resulting classification with a
measure of uncertainty may be as important as the classification itself. In
this paper we introduce, develop algorithms for, and investigate the properties
of, a variety of Bayesian models for the task of binary classification; via the
posterior distribution on the classification labels, these methods
automatically give measures of uncertainty. The methods are all based around
the graph formulation of semi-supervised learning.
We provide a unified framework which brings together a variety of methods
which have been introduced in different communities within the mathematical
sciences. We study probit classification in the graph-based setting, generalize
the level-set method for Bayesian inverse problems to the classification
setting, and generalize the Ginzburg-Landau optimization-based classifier to a
Bayesian setting; we also show that the probit and level set approaches are
natural relaxations of the harmonic function approach introduced in [Zhu et al.
2003].
We introduce efficient numerical methods, suited to large data-sets, for both
MCMC-based sampling as well as gradient-based MAP estimation. Through numerical
experiments we study classification accuracy and uncertainty quantification for
our models; these experiments showcase a suite of datasets commonly used to
evaluate graph-based semi-supervised learning algorithms.
Comment: 33 pages, 14 figures
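The harmonic function approach that the abstract above takes as its starting point can be illustrated on a toy graph. The following is a minimal sketch, not the paper's method: the function name `harmonic_labels`, the five-node chain graph, and the iteration count are all assumptions made for illustration. It uses the defining property of a harmonic function on a graph: each unlabeled node takes the weighted average of its neighbours' values while labeled nodes stay fixed.

```python
def harmonic_labels(weights, labeled, iterations=500):
    """Relax each unlabeled node to the weighted mean of its neighbours.

    weights: symmetric adjacency matrix (list of lists, zero diagonal).
    labeled: dict mapping node index -> fixed label value (0.0 or 1.0).
    Hypothetical helper for illustration only.
    """
    n = len(weights)
    # Start unlabeled nodes at 0.5; labeled nodes at their given value.
    f = [labeled.get(i, 0.5) for i in range(n)]
    for _ in range(iterations):
        for i in range(n):
            if i in labeled:
                continue  # labeled nodes stay clamped
            total = sum(weights[i])
            if total > 0:
                f[i] = sum(w * f[j] for j, w in enumerate(weights[i])) / total
    return f


# Toy chain graph 0 - 1 - 2 - 3 - 4 with unit edge weights (assumed example).
W = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
# Node 0 carries label 0, node 4 carries label 1; the rest are unlabeled.
scores = harmonic_labels(W, {0: 0.0, 4: 1.0})
```

On this chain the scores interpolate between the two labeled end points, so nodes closer to node 4 receive values closer to 1. Such scores are point estimates only; the Bayesian models described in the abstract aim to attach a posterior measure of uncertainty to the labels instead.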
Continuation for thin film hydrodynamics and related scalar problems
This chapter illustrates how to apply continuation techniques in the analysis
of a particular class of nonlinear kinetic equations that describe the time
evolution through transport equations for a single scalar field like
densities or interface profiles of various types. We first systematically
introduce these equations as gradient dynamics combining mass-conserving and
nonmass-conserving fluxes followed by a discussion of nonvariational amendments
and a brief introduction to their analysis by numerical continuation. The
approach is first applied to a number of common examples of variational
equations, namely, Allen-Cahn- and Cahn-Hilliard-type equations including
certain thin-film equations for partially wetting liquids on homogeneous and
heterogeneous substrates as well as Swift-Hohenberg and Phase-Field-Crystal
equations. Second, we consider nonvariational examples such as the
Kuramoto-Sivashinsky equation, convective Allen-Cahn and Cahn-Hilliard
equations and thin-film equations describing stationary sliding drops and a
transversal front instability in a dip-coating. Through the different examples
we illustrate how to employ the numerical tools provided by the packages
auto07p and pde2path to determine steady, stationary and time-periodic
solutions in one and two dimensions and the resulting bifurcation diagrams. The
incorporation of boundary conditions and integral side conditions is also
discussed as well as problem-specific implementation issues.
Action Recognition with a Bio-Inspired Feedforward Motion Processing Model: The Richness of Center-Surround Interactions
Here we show that reproducing the functional properties of MT cells with various center-surround interactions enriches motion representation and improves the action recognition performance. To do so, we propose a simplified bio-inspired model of the motion pathway in primates: It is a feedforward model restricted to V1-MT cortical layers, cortical cells cover the visual space with a foveated structure, and more importantly, we reproduce some of the richness of center-surround interactions of MT cells. Interestingly, as observed in neurophysiology, our MT cells not only behave like simple velocity detectors, but also respond to several kinds of motion contrasts. Results show that this diversity of motion representation at the MT level is a major advantage for an action recognition task. Defining motion maps as our feature vectors, we used a standard classification method on the Weizmann database: We obtained an average recognition rate of 98.9%, which is superior to the recent results by Jhuang et al. (2007). These promising results encourage us to further develop bio-inspired models incorporating other brain mechanisms and cortical layers in order to deal with more complex videos.
General statistical scaling laws for stability in ecological systems
Ecological stability refers to a family of concepts used to describe how systems of interacting species vary through time and respond to disturbances. Because observed ecological stability depends on sampling scales and environmental context, it is notoriously difficult to compare measurements across sites and systems. Here, we apply stochastic dynamical systems theory to derive general statistical scaling relationships across time, space, and ecological level of organisation for three fundamental stability aspects: resilience, resistance, and invariance. These relationships can be calibrated using random or representative samples measured at individual scales, and projected to predict average stability at other scales across a wide range of contexts. Moreover, deviations between observed and extrapolated scaling relationships can reveal information about unobserved heterogeneity across time, space, or species. We anticipate that these methods will be useful for cross-study synthesis of stability data, extrapolating measurements to unobserved scales, and identifying underlying causes and consequences of heterogeneity.