
    Apprentissage Ă  grande Ă©chelle et applications

    This thesis presents my main research activities in statistical machine learning after my PhD, starting from my post-doc at UC Berkeley to my present research position at Inria Grenoble. The first chapter introduces the context and a summary of my scientific contributions and emphasizes the importance of pluri-disciplinary research. For instance, mathematical optimization has become central in machine learning, and the interplay between signal processing, statistics, bioinformatics, and computer vision is stronger than ever. With many scientific and industrial fields producing massive amounts of data, the impact of machine learning is potentially huge and diverse. However, dealing with massive data also raises many challenges. In this context, the manuscript presents different contributions, which are organized in three main topics.
    Chapter 2 is devoted to large-scale optimization in machine learning with a focus on algorithmic methods. We start with majorization-minimization algorithms for structured problems, including block-coordinate, incremental, and stochastic variants. These algorithms are analyzed in terms of convergence rates for convex problems and in terms of convergence to stationary points for non-convex ones. We also introduce fast schemes for minimizing large sums of convex functions and principles to accelerate gradient-based approaches, based on Nesterov's acceleration and on quasi-Newton approaches.
    Chapter 3 presents the paradigm of deep kernel machines, an alliance between kernel methods and multilayer neural networks. In the context of visual recognition, we introduce a new invariant image model called convolutional kernel networks, a new type of convolutional neural network with a reproducing kernel interpretation. The network comes with simple and effective principles for unsupervised learning and is compatible with supervised learning via backpropagation rules.
    Chapter 4 is devoted to sparse estimation, that is, the automatic selection of model variables for explaining observed data; in particular, this chapter presents the results of pluri-disciplinary collaborations in bioinformatics and neuroscience, where the sparsity principle is key to building interpretable predictive models.
    Finally, the last chapter concludes the manuscript and suggests future perspectives.
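    As a generic illustration of the majorization-minimization principle discussed in Chapter 2 (a textbook sketch, not the thesis's specific algorithms): to minimize an objective f, one repeatedly minimizes a surrogate g that majorizes f and is tight at the current iterate,

    \[
      \theta_{k+1} \in \operatorname*{arg\,min}_{\theta} \; g(\theta; \theta_k),
      \qquad
      g(\theta; \theta_k) \ge f(\theta) \;\; \forall \theta,
      \qquad
      g(\theta_k; \theta_k) = f(\theta_k),
    \]

    which immediately yields monotone descent, since f(\theta_{k+1}) \le g(\theta_{k+1}; \theta_k) \le g(\theta_k; \theta_k) = f(\theta_k). Roughly speaking, the block-coordinate, incremental, and stochastic variants mentioned above update only part of the surrogate at each iteration, for instance one block of variables or one term of a finite sum.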

    Laboratory directed research and development. FY 1995 progress report


    Journal of the Arkansas Academy of Science - Volume 57 2003


    Discriminative Learning for Probabilistic Sequence Analysis


    Development of Methods for Structure and Function Determination in Living and Fixated Cells on the Single-Molecule Level Based on Coincidence Analysis and Spectrally-Resolved Fluorescence Lifetime Imaging Microscopy [SFLIM]

    The ongoing evolution in molecular biology and biochemistry has led to groundbreaking results in recent years, such as the mapping of the human genome. A consequence of this growing knowledge of biological structures and mechanisms is that ever smaller and rarer units, no longer resolvable by common methods, become the subject of investigation. In principle, there are two questions in the structural exploration of biological systems: where are the single components localized, or what distances do they have with respect to each other, and from which or how many units are they composed? Single-molecule spectroscopy is an excellent tool to answer these questions. The localization of dye-labeled biomolecules is easy as long as the distance between the single fluorophores exceeds the optical diffraction limit of about 200 nm. For distances between 1 and 10 nm, the FRET effect can be exploited. In the intermediate range of 10 to 200 nm, the so-called resolution gap, only few methods for distance determination are available, and they are usually technically demanding and limited to two dimensions. Since many biologically relevant molecules, for example biomolecular machines, fall exactly in this order of magnitude, it is of major importance to have a simple three-dimensional method at hand that closes the gap. For this purpose, an algorithm based on confocal imaging microscopy has been developed that facilitates the separation of colocalized dyes by their fluorescence lifetimes and spectral characteristics. The accuracy and applicability of the method were demonstrated in this work using biological calibration compounds: DNA molecules of different lengths, whose double-stranded backbone is known to be very rigid, were terminally labeled with the dyes Bodipy 630 and Cy 5.5 and immobilized in a three-dimensional matrix, a cell-like but homogeneous inclusion reagent. Comparison with worm-like chain model calculations showed that the measured lengths were in good agreement with the model. Furthermore, measurements in cells were accomplished, which confirmed the suitability of the method in a biological environment.
    Beside the localization of biomolecules, quantitative investigations of complex cellular units increasingly come to the fore. Often the task is no longer exclusively to discriminate various subunits, which can be distinguished by different dyes, but rather to detect identical molecules that assemble or are generated within a cell compartment. For example, the read-out and transduction of the genetic information by polymerases, the transcription, takes place in so-called transcription factories. A typical HeLa cell contains about 8,000 of these 40 to 80 nm sized centers, each containing on average 8 polymerase II enzymes. The reason for this accumulation, as well as the exact number of polymerases, could not be determined so far due to a lack of suitable techniques. However, for the comprehension of cell function it is of great importance to study these basic units. The first step in this direction, the counting of polymerase II molecules in transcription sites, was to be conducted in the second part of this work. To quantify colocalized molecules, the analysis of interphoton times deduced from antibunching experiments can be used: dyes are located in a microscopic image, then positioned one at a time in the laser focus, and the fluorescence is collected until photodestruction.
    The carbopyronine derivatives Atto 620 and Atto 647 in particular turned out to be best suited for these experiments because of their high photostability and emission rate. To investigate the applicability of the method in a cellular environment, dye-labeled oligomers consisting of 40 thymines were incorporated into cells. It was shown that these units hybridize selectively, and in part multiply, to the up to 200 base pair long adenosine ends of mRNA. Using coincidence analysis, it was possible to analyze up to four molecules in a single image spot. To reduce the density of the transcription centers for imaging and to enable molecule counting for the 3,000 transcription factories per nucleus, so-called cryosections, cell slices with a thickness of 100 nm, were introduced. The simplest method to label polymerase II molecules uses specific dye-labeled antibodies, which bind singly to the polymerases. A fundamental requirement for the success of the experiment is stoichiometric labeling of the antibodies with the dyes, i.e. no multiply labeled or unlabeled compounds may be present. Therefore, a new method was developed that allows preparing one-to-one labeled proteins and quantum dots by introducing an affinity group at the dye. It could be shown that the antibodies bind selectively to their targets, and first experiments with these probes could be initiated.
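    As background for the antibunching-based counting described above (a standard photon-statistics relation, not a result specific to this work): for N independent, identical single-photon emitters in the laser focus, the normalized second-order correlation at zero delay is

    \[
      g^{(2)}(0) = 1 - \frac{1}{N},
    \]

    so the relative area of the central peak in the interphoton-time (coincidence) histogram reveals the number of emitters: one emitter gives 0, two give 1/2, and four give 3/4, which is also why discriminating more than a handful of colocalized molecules becomes increasingly difficult.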

    Development of Human Genome Editing Tools for the Study of Genetic Variations and Gene Therapies

    The human genome encodes information that instructs human development, physiology, medicine, and evolution. Massive amounts of genomic data have generated an ever-growing pool of hypotheses. Genome editing, broadly defined as targeted changes to the genome, promises to bring the genomic revolution to bear on basic science and personalized medicine. This thesis aims to contribute to this scientific endeavor, with a particular focus on the development of effective human genome engineering tools.