32 research outputs found

    Coopération entre Optimisation Combinatoire et Statistiques pour la Sélection animale

    National audience. The aim of this study is to build predictive models that, from genomic data, identify the best-performing individuals according to certain quantitative criteria. The proposed approach combines the strengths of statistical methods and combinatorial optimization methods.

    Feature selection for high dimensional regression using local search and statistical criteria

    International audience. Genomic selection is a genetic evaluation of animals from their DNA, based on a huge number of markers covering the whole genome. It requires advanced approaches, in particular feature selection methods. Feature selection is a combinatorial problem that can be addressed by combinatorial optimization methods. We propose to combine an iterated local search (ILS) with a statistical evaluation of a multivariate regression, and we compare three criteria in order to analyse their impact on the performance of the local search.
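The combination described above (a local search over marker subsets, scored by a statistical criterion on a regression fit) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the three criteria, so BIC on an ordinary least-squares fit is assumed here, and the function names, the single-flip neighbourhood, and the parameters `n_iter` and `n_flips` are all hypothetical.

```python
import numpy as np


def bic(X, y, subset):
    """BIC of a least-squares fit on the chosen columns (assumed criterion)."""
    n = len(y)
    # Design matrix: intercept plus the selected marker columns.
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in sorted(subset)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(max(rss, 1e-12) / n) + A.shape[1] * np.log(n)


def local_search(X, y, subset):
    """First-improvement descent: flip one feature in or out at a time."""
    best = bic(X, y, subset)
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            candidate = subset ^ {j}  # add j if absent, remove it if present
            score = bic(X, y, candidate)
            if score < best - 1e-9:
                subset, best, improved = candidate, score, True
    return subset, best


def iterated_local_search(X, y, n_iter=20, n_flips=2, seed=0):
    """Alternate random perturbations with local search, keeping the best subset."""
    rng = np.random.default_rng(seed)
    current, current_score = local_search(X, y, frozenset())
    best, best_score = current, current_score
    for _ in range(n_iter):
        # Perturbation: flip a few randomly chosen features to escape the optimum.
        flips = rng.choice(X.shape[1], size=n_flips, replace=False)
        perturbed = current ^ frozenset(int(j) for j in flips)
        candidate, score = local_search(X, y, perturbed)
        if score <= current_score:  # accept if not worse than the current solution
            current, current_score = candidate, score
        if score < best_score:
            best, best_score = candidate, score
    return sorted(best), best_score
```

Swapping `bic` for another scoring function is the only change needed to compare criteria in this sketch, which is the kind of comparison the abstract mentions. Note that the least-squares fit assumes far fewer selected markers than individuals.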

    Feature selection in high dimensional regression problems for genomic

    International audience. In the context of genomic selection in animal breeding, an important objective is to find explanatory markers for a phenotype under study. To deal with a high number of markers, we propose to use combinatorial optimization to perform variable selection. Results show that our approach outperforms some classical and widely used methods on simulated and "close to real" datasets.

    Combining combinatorial optimization and statistics to mine high-throughput genotyping data

    National audience. Over the past few years, genomics has advanced considerably with the development of new technologies such as high-throughput sequencing and genotyping. In the animal domain, we can now read genomic information at nearly 800,000 markers on increasingly large sets of individuals (from 3,000 to 10,000). These data can support association studies between markers (GWAS: Genome-Wide Association Studies). Beyond the biological constraints (sample storage, long and costly manipulations...), the data analysis part (study and knowledge extraction) must also be adapted in terms of methodology and of hardware and software architecture. The objective is to build predictive models that, from genomic data, identify the best-performing individuals according to certain quantitative criteria of animal selection. To that end, the long-term theoretical objective is to define new methods enabling cooperation between statistics and combinatorial optimization, specifically dedicated to data from high-throughput genotyping, with a view to an implementation

    Cell-to-Cell Stochastic Variation in Gene Expression Is a Complex Genetic Trait

    The genetic control of common traits is rarely deterministic, with many genes contributing only to the chance of developing a given phenotype. This incomplete penetrance is poorly understood and is usually attributed to interactions between genes or interactions between genes and environmental conditions. Because many traits such as cancer can emerge from rare events happening in one or very few cells, we speculate an alternative and complementary possibility where some genotypes could facilitate these events by increasing stochastic cell-to-cell variations (or ‘noise’). As a very first step towards investigating this possibility, we studied how natural genetic variation influences the level of noise in the expression of a single gene using the yeast S. cerevisiae as a model system. Reproducible differences in noise were observed between divergent genetic backgrounds. We found that noise was highly heritable and placed under a complex genetic control. Scanning the genome, we mapped three Quantitative Trait Loci (QTL) of noise, one locus being explained by an increase in noise when transcriptional elongation was impaired. Our results suggest that the level of stochasticity in particular molecular regulations may differ between multicellular individuals depending on their genotypic background. The complex genetic architecture of noise buffering couples genetic to non-genetic robustness and provides a molecular basis to the probabilistic nature of complex traits.

    Raw_global BIOM file

    The raw_global BIOM file was obtained in the third analytical step of the home-made bioinformatics pipeline. All the annotated OTU tables were merged into a global OTU table based on each OTU's taxonomic annotation, in which each column of this table represents a sample, and each line represents a taxon (identified by its OTU identifier in the first column, and the taxonomic annotation in the last column). Finally, this annotated OTU table was converted into a global BIOM file to obtain the raw_global BIOM file

    Normalized_global BIOM file

    The normalized_global BIOM file was obtained at the end of the third analytical step of the home-made bioinformatics pipeline, after DESeq2 normalization and conversion into a fully annotated and normalized global BIOM file.

    Home-made scripts used in the bioinformatics pipeline

    Compressed file containing the four home-made scripts used in the bioinformatics pipeline: fasta_dealigner.py, a Python script used in the first step that generates an unaligned FASTA file; rarefaction.R, an R script (v2.14.1) used in the second step that generates intra-sample rarefaction curves; OTU_tables_format.pl, a Perl script that generates OTU count tables at the end of the second step; and OTU_tables_merge.py, a Python script that merges the OTU count tables from all samples produced at the end of the second step into a global OTU table tsv file.

    OTU_count_tables tsv file

    This compressed file contains the OTU_count_tables tsv file, which is the output of the second analytical step (clustering analysis and OTU classification) of the home-made bioinformatics pipeline. It contains, for each sample, four columns: the first column is the consensus read name associated with the OTU, the second column is the OTU raw count, the third column is the consensus read name (same as the first column) and the fourth column is the associated taxon.
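As a rough illustration of how per-sample tables in this four-column layout could be merged by taxon into one global table (one row per taxon, one column per sample), here is a minimal sketch. It is not the OTU_tables_merge.py script from the archive: the function name, the in-memory input format, and the output layout (taxon in the first column, no OTU identifier column) are assumptions made for the example.

```python
from collections import defaultdict


def merge_otu_tables(tables):
    """Merge per-sample OTU rows into a global count table keyed by taxon.

    `tables` maps a sample name to rows of
    (consensus_read_name, raw_count, consensus_read_name, taxon),
    mirroring the four columns described above (hypothetical input format).
    """
    samples = sorted(tables)
    # Every taxon starts with a zero count in every sample.
    counts = defaultdict(lambda: dict.fromkeys(samples, 0))
    for sample, rows in tables.items():
        for _read, raw_count, _read_again, taxon in rows:
            # OTUs sharing the same taxonomic annotation are summed.
            counts[taxon][sample] += int(raw_count)
    header = ["taxon"] + samples
    body = [[taxon] + [counts[taxon][s] for s in samples] for taxon in sorted(counts)]
    return [header] + body
```

A taxon absent from a sample gets a count of 0 in that sample's column, so every row of the merged table has the same width.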

    Modèles mixtes en génétique animale : sélection de variables par optimisation combinatoire

    National audience. In animal genomic selection, one challenge is to identify a subset of explanatory genomic markers for a quantitative trait of interest. The specific nature of animal studies requires the use of mixed models, because of the kinship relationships between individuals. In this setting, we propose to select the markers of interest using combinatorial optimization methods.