611 research outputs found
Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics
In a wide range of statistical learning problems such as ranking, clustering
or metric learning among others, the risk is accurately estimated by
$U$-statistics of degree $d \geq 1$, i.e. functionals of the training data with
low variance that take the form of averages over $d$-tuples. From a
computational perspective, the calculation of such statistics is highly
expensive even for a moderate sample size $n$, as it requires averaging
$O(n^d)$ terms. This makes learning procedures relying on the optimization of
such data functionals hardly feasible in practice. It is the major goal of this
paper to show that, strikingly, such empirical risks can be replaced by
drastically computationally simpler Monte-Carlo estimates based on $O(n)$ terms
only, usually referred to as incomplete $U$-statistics, without damaging the
$O_{\mathbb{P}}(1/\sqrt{n})$ learning rate of Empirical Risk Minimization (ERM)
procedures. For this purpose, we establish uniform deviation results describing
the error made when approximating a $U$-process by its incomplete version under
appropriate complexity assumptions. Extensions to model selection, fast rate
situations and various sampling techniques are also considered, as well as an
application to stochastic gradient descent for ERM. Finally, numerical examples
are displayed in order to provide strong empirical evidence that the approach
we promote largely surpasses more naive subsampling techniques.
Comment: To appear in Journal of Machine Learning Research. 34 pages. v2:
minor correction to Theorem 4 and its proof, added 1 reference. v3: typo
corrected in Proposition 3. v4: improved presentation, added experiments on
model selection for clustering, fixed minor typo.
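The trade-off described in this abstract can be illustrated with a minimal sketch for a degree-2 statistic. It compares the complete average over all pairs with a Monte-Carlo average over a number of sampled pairs equal to the sample size. The function names, the kernel (half the squared difference, whose complete average is the unbiased sample variance), the Gaussian toy data and the sampling-with-replacement scheme are all assumptions chosen for illustration, not the paper's experimental setup.

```python
import itertools
import random


def complete_u_statistic(sample, kernel):
    """Average the kernel over all n*(n-1)/2 unordered pairs (degree 2)."""
    pairs = list(itertools.combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def incomplete_u_statistic(sample, kernel, num_pairs, rng):
    """Average the kernel over num_pairs pairs drawn at random.

    Each draw picks two distinct indices; draws are independent, so the
    same pair may appear more than once (sampling with replacement).
    """
    n = len(sample)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)
        total += kernel(sample[i], sample[j])
    return total / num_pairs


def half_squared_difference(x, y):
    """Symmetric kernel whose complete U-statistic is the sample variance."""
    return 0.5 * (x - y) ** 2


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(400)]

# Complete version: 79,800 kernel evaluations for n = 400.
full = complete_u_statistic(data, half_squared_difference)
# Incomplete version: only n = 400 kernel evaluations.
approx = incomplete_u_statistic(data, half_squared_difference, len(data), rng)

print(f"complete: {full:.3f}  incomplete: {approx:.3f}")
```

Both values estimate the same quantity (here the variance of the toy data); the incomplete version is noisier but costs a number of kernel evaluations that is linear rather than quadratic in the sample size.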
Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions
In decentralized networks (of sensors, connected objects, etc.), there is an
important need for efficient algorithms to optimize a global cost function, for
instance to learn a global model from the local data collected by each
computing unit. In this paper, we address the problem of decentralized
minimization of pairwise functions of the data points, where these points are
distributed over the nodes of a graph defining the communication topology of
the network. This general problem finds applications in ranking, distance
metric learning and graph inference, among others. We propose new gossip
algorithms based on dual averaging which aim at solving such problems both in
synchronous and asynchronous settings. The proposed framework is flexible
enough to deal with constrained and regularized variants of the optimization
problem. Our theoretical analysis reveals that the proposed algorithms preserve
the convergence rate of centralized dual averaging up to an additive bias term.
We present numerical simulations on Area Under the ROC Curve (AUC) maximization
and metric learning problems which illustrate the practical interest of our
approach.
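As background for the gossip setting, the following sketch shows the classic randomized pairwise-averaging primitive on a small ring network: at each step one edge is activated and its two endpoints replace their values with their average. This is not the dual averaging algorithm proposed in the paper; the function name, the ring topology and the toy values are assumptions used only to show how purely local exchanges can drive every node to a global quantity.

```python
import random


def randomized_gossip(values, edges, num_iterations, rng):
    """Run pairwise-averaging gossip and return the final local estimates."""
    estimates = list(values)
    for _ in range(num_iterations):
        # Activate one communication link uniformly at random.
        i, j = rng.choice(edges)
        # Both endpoints move to their common average; the network-wide
        # sum of estimates is unchanged by this update.
        average = 0.5 * (estimates[i] + estimates[j])
        estimates[i] = average
        estimates[j] = average
    return estimates


# Hypothetical 6-node ring: node k talks to node (k + 1) mod 6.
ring_edges = [(k, (k + 1) % 6) for k in range(6)]
local_values = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]

result = randomized_gossip(local_values, ring_edges, 2000, random.Random(0))
target = sum(local_values) / len(local_values)

print("network average:", target)
print("local estimates:", result)
```

On a connected graph every local estimate approaches the network average without any node ever seeing all the data.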
Extending Gossip Algorithms to Distributed Estimation of U-Statistics
Efficient and robust algorithms for decentralized estimation in networks are
essential to many distributed systems. Whereas distributed estimation of sample
mean statistics has been the subject of a good deal of attention, computation
of $U$-statistics, relying on more expensive averaging over pairs of
observations, is a less investigated area. Yet, such data functionals are
essential to describe global properties of a statistical population, with
important examples including Area Under the Curve, empirical variance, Gini
mean difference and within-cluster point scatter. This paper proposes new
synchronous and asynchronous randomized gossip algorithms which simultaneously
propagate data across the network and maintain local estimates of the
$U$-statistic of interest. We establish convergence rate bounds of $O(1/t)$ and
$O(\log t / t)$ for the synchronous and asynchronous cases respectively, where
$t$ is the number of iterations, with explicit data and network dependent
terms. Beyond favorable comparisons in terms of rate analysis, numerical
experiments provide empirical evidence that the proposed algorithms surpass the
previously introduced approach.
Comment: to be presented at NIPS 201
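Two of the pairwise statistics named in this abstract can be written out directly in a centralized setting, which makes the "averaging over pairs" cost concrete. The sketch below is not the gossip algorithm of the paper; the function names, the tie-handling convention for the AUC (ties count one half) and the toy inputs are assumptions made for illustration.

```python
import itertools


def gini_mean_difference(sample):
    """Average absolute difference over all unordered pairs of observations."""
    pairs = list(itertools.combinations(sample, 2))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def empirical_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked in the correct order."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # assumed convention: a tie counts as half a win
    return wins / (len(positive_scores) * len(negative_scores))


observations = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
gmd = gini_mean_difference(observations)

auc = empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3])

print("Gini mean difference:", gmd)
print("AUC:", auc)
```

Both functions visit every pair, so their cost grows quadratically with the number of observations, which is what makes decentralized estimation harder than for a simple mean.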
Rain Rate Estimation with SAR using NEXRAD measurements with Convolutional Neural Networks
Remote sensing of rainfall events is critical for both operational and
scientific needs, including for example weather forecasting, extreme flood
mitigation, water cycle monitoring, etc. Ground-based weather radars, such as
NOAA's Next-Generation Radar (NEXRAD), provide reflectivity and precipitation
measurements of rainfall events. However, the observation range of such radars
is limited to a few hundred kilometers, prompting the exploration of other
remote sensing methods, particularly over the open ocean, which represents large
areas not covered by land-based radars. For a number of decades, C-band SAR
imagery such as Sentinel-1 imagery has been known to exhibit rainfall
signatures over the sea surface. However, the development of SAR-derived
rainfall products remains a challenge. Here we propose a deep learning approach
to extract rainfall information from SAR imagery. We demonstrate that a
convolutional neural network, such as U-Net, trained on a colocated and
preprocessed Sentinel-1/NEXRAD dataset clearly outperforms state-of-the-art
filtering schemes. Our results indicate high performance in segmenting
precipitation regimes, delineated by thresholds at 1, 3, and 10 mm/h. Compared
to current methods that rely on Koch filters to draw binary rainfall maps,
these multi-threshold learning-based models can provide rainfall estimation for
higher wind speeds and thus may be of great interest for data assimilation
weather forecasting or for improving the qualification of SAR-derived wind
field data.
Comment: 25 pages, 10 figures
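The multi-threshold segmentation targets mentioned in this abstract can be sketched as a simple binning of rain rates at 1, 3 and 10 mm/h. The sketch covers only how a rain-rate grid might be turned into class labels, not the neural network itself. The function names, the toy grid and the choice to treat a rate exactly equal to a threshold as belonging to the higher class are assumptions; the abstract does not state how boundaries are handled.

```python
from bisect import bisect_right

# Thresholds quoted in the abstract, in mm/h.
THRESHOLDS_MM_PER_H = (1.0, 3.0, 10.0)


def rain_regime(rate_mm_per_h):
    """Return a class index 0..3: how many thresholds the rate has reached.

    Assumed convention: a rate equal to a threshold counts as reaching it.
    """
    return bisect_right(THRESHOLDS_MM_PER_H, rate_mm_per_h)


def segment(rate_grid):
    """Map a 2D grid of rain rates to a 2D grid of regime indices."""
    return [[rain_regime(rate) for rate in row] for row in rate_grid]


# Hypothetical 3x3 patch of radar-derived rain rates in mm/h.
radar_patch = [
    [0.0, 0.5, 1.2],
    [2.9, 3.0, 8.0],
    [10.0, 25.0, 0.9],
]

regime_map = segment(radar_patch)
for row in regime_map:
    print(row)
```

Each binary map at a given threshold can be recovered from the class index by testing whether the index is at least 1, 2 or 3.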
The rehabilitation of a large housing estate in Vaulx-en-Velin as seen by its residents [La réhabilitation d'un grand ensemble de Vaulx-en-Velin vue par ses habitants]
Rehabilitation operations are an important moment in the life of a housing complex, because their effects are not limited to the buildings: they extend to the residents' entire daily life, their use of the premises, and their social relations in the neighborhood. Following the rehabilitation of the complex it manages in Vaulx-en-Velin, OPAC commissioned Focales, a junior laboratory at ENS de Lyon, to assess how tenants perceive the work carried out. Focales is made up of young researchers in urban sociology, who conducted a questionnaire survey of a representative sample of the site's tenants. The survey covered their assessment of the rehabilitation and, through this lens, their relationship to their housing and to the landlord's management of local tenant relations.
Parallelized computational 3D video microscopy of freely moving organisms at multiple gigapixels per second
To study the behavior of freely moving model organisms such as zebrafish
(Danio rerio) and fruit flies (Drosophila) across multiple spatial scales, it
would be ideal to use a light microscope that can resolve 3D information over a
wide field of view (FOV) at high speed and high spatial resolution. However, it
is challenging to design an optical instrument to achieve all of these
properties simultaneously. Existing techniques for large-FOV microscopic
imaging and for 3D image measurement typically require many sequential image
snapshots, thus compromising speed and throughput. Here, we present 3D-RAPID, a
computational microscope based on a synchronized array of 54 cameras that can
capture high-speed 3D topographic videos over a 135-cm^2 area, achieving up to
230 frames per second at throughputs exceeding 5 gigapixels (GPs) per second.
3D-RAPID features a 3D reconstruction algorithm that, for each synchronized
temporal snapshot, simultaneously fuses all 54 images seamlessly into a
globally-consistent composite that includes a coregistered 3D height map. The
self-supervised 3D reconstruction algorithm itself trains a
spatiotemporally-compressed convolutional neural network (CNN) that maps raw
photometric images to 3D topography, using stereo overlap redundancy and
ray-propagation physics as the only supervision mechanism. As a result, our
end-to-end 3D reconstruction algorithm is robust to generalization errors and
scales to arbitrarily long videos from arbitrarily sized camera arrays. The
scalable hardware and software design of 3D-RAPID addresses a longstanding
problem in the field of behavioral imaging, enabling parallelized 3D
observation of large collections of freely moving organisms at high
spatiotemporal throughputs, which we demonstrate in ants (Pogonomyrmex
barbatus), fruit flies, and zebrafish larvae.
Photography-based taxonomy is inadequate, unnecessary, and potentially harmful for biological sciences
The question whether taxonomic descriptions naming new animal species without type specimen(s) deposited in collections should be accepted for publication by scientific journals and allowed by the Code has already been discussed in Zootaxa (Dubois & Nemésio 2007; Donegan 2008, 2009; Nemésio 2009a–b; Dubois 2009; Gentile & Snell 2009; Minelli 2009; Cianferoni & Bartolozzi 2016; Amorim et al. 2016). This question was again raised in a letter supported
by 35 signatories published in the journal Nature (Pape et al. 2016) on 15 September 2016. On 25 September 2016, the following rebuttal (strictly limited to 300 words as per the editorial rules of Nature) was submitted to Nature, which on
18 October 2016 refused to publish it. As we think this problem is a very important one for zoological taxonomy, this text is published here exactly as submitted to Nature, followed by the list of the 493 taxonomists and collection-based
researchers who signed it in the short time span from 20 September to 6 October 2016.
A Consensus Molecular Classification of Muscle-invasive Bladder Cancer
Background: Muscle-invasive bladder cancer (MIBC) is a molecularly diverse disease with heterogeneous clinical outcomes. Several molecular classifications have been proposed, but the diversity of their subtype sets impedes their clinical application. Objective: To achieve an international consensus on MIBC molecular subtypes that reconciles the published classification schemes. Design, setting, and participants: We used 1750 MIBC transcriptomic profiles from 16 published datasets and two additional cohorts. Outcome measurements and statistical analysis: We performed a network-based analysis of six independent MIBC classification systems to identify a consensus set of molecular classes. Association with survival was assessed using multivariable Cox models. Results and limitations: We report the results of an international effort to reach a consensus on MIBC molecular subtypes. We identified a consensus set of six molecular classes: luminal papillary (24%), luminal nonspecified (8%), luminal unstable (15%), stroma-rich (15%), basal/squamous (35%), and neuroendocrine-like (3%). These consensus classes differ regarding underlying oncogenic mechanisms, infiltration by immune and stromal cells, and histological and clinical characteristics, including outcomes. We provide a single-sample classifier that assigns a consensus class label to a tumor sample's transcriptome. Limitations of the work are retrospective clinical data collection and a lack of complete information regarding patient treatment. Conclusions: This consensus system offers a robust framework that will enable testing and validation of predictive biomarkers in future prospective clinical trials. Patient summary: Bladder cancers are heterogeneous at the molecular level, and scientists have proposed several classifications into sets of molecular classes. 
While these classifications may be useful to stratify patients for prognosis or response to treatment, a consensus classification would facilitate the clinical use of molecular classes. Conducted by multidisciplinary expert teams in the field, this study proposes such a consensus and provides a tool for applying the consensus classification in the clinical setting. An international consortium of bladder cancer expert teams establishes a consensus reconciling the diverse molecular classifications of muscle-invasive bladder cancer. This work offers a robust framework that will enable testing and validating predictive biomarkers in future prospective clinical trials.
Measurement of the top quark forward-backward production asymmetry and the anomalous chromoelectric and chromomagnetic moments in pp collisions at $\sqrt{s}$ = 13 TeV
The parton-level top quark ($t$) forward-backward asymmetry and the anomalous chromoelectric ($\hat{d}_t$) and chromomagnetic ($\hat{\mu}_t$) moments have been measured using LHC pp collisions at a center-of-mass energy of 13 TeV, collected in the CMS detector in a data sample corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The linearized variable $A_{FB}^{(1)}$ is used to approximate the asymmetry. Candidate $t\bar{t}$ events decaying to a muon or electron and jets in final states with low and high Lorentz boosts are selected and reconstructed using a fit of the kinematic distributions of the decay products to those expected for $t\bar{t}$ final states. The values found for the parameters are $A_{FB}^{(1)} = 0.048^{+0.095}_{-0.087}\,\text{(stat)}\,^{+0.020}_{-0.029}\,\text{(syst)}$ and $\hat{\mu}_t = -0.024^{+0.013}_{-0.009}\,\text{(stat)}\,^{+0.016}_{-0.011}\,\text{(syst)}$, and a limit is placed on the magnitude of $|\hat{d}_t| < 0.03$ at 95% confidence level.
MUSiC: a model-unspecific search for new physics in proton-proton collisions at $\sqrt{s}$ = 13 TeV
Results of the Model Unspecific Search in CMS (MUSiC), using proton-proton collision data recorded at the LHC at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$, are presented. The MUSiC analysis searches for anomalies that could be signatures of physics beyond the standard model. The analysis is based on the comparison of observed data with the standard model prediction, as determined from simulation, in several hundred final states and multiple kinematic distributions. Events containing at least one electron or muon are classified based on their final state topology, and an automated search algorithm surveys the observed data for deviations from the prediction. The sensitivity of the search is validated using multiple methods. No significant deviations from the predictions have been observed. For a wide range of final state topologies, agreement is found between the data and the standard model simulation. This analysis complements dedicated search analyses by significantly expanding the range of final states covered using a model independent approach with the largest data set to date to probe phase space regions beyond the reach of previous general searches.
Peer reviewed