268 research outputs found

    A triple-random ensemble classification method for mining multi-label data

    This paper presents a triple-random ensemble learning method for multi-label classification problems. The proposed method integrates and extends the random subspace, bagging, and random k-label sets ensemble learning methods to classify multi-label data. It applies the random subspace method to the feature space, the label space, and the instance space. The subset selection procedure is executed iteratively, and each multi-label classifier is trained on the randomly selected subsets. At the end of the iteration, optimal parameters are selected and the ensemble of multi-label classifiers is constructed. The proposed method is implemented and its performance is compared against that of popular multi-label classification methods. The experimental results show that the proposed method outperforms the examined counterparts in most cases when tested on six multi-label datasets, ranging from small to large, from different domains. This suggests that the method is generally applicable to a variety of multi-label classification problems.
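
The triple-random sampling described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `feature_fraction` and `k_labels` parameters, and the choice of bootstrap sampling for instances are all assumptions made for the example, and the abstract's parameter-optimization step is not modeled.

```python
import random


def draw_triple_random_subsets(n_instances, n_features, n_labels,
                               feature_fraction=0.5, k_labels=3, seed=None):
    """Draw one random subset each from instance, feature, and label space.

    Hypothetical helper: all parameter names and defaults are illustrative.
    """
    rng = random.Random(seed)
    # Instance space: bootstrap sample (with replacement), as in bagging.
    instance_idx = [rng.randrange(n_instances) for _ in range(n_instances)]
    # Feature space: random subspace, sampled without replacement.
    n_sub = max(1, int(feature_fraction * n_features))
    feature_idx = sorted(rng.sample(range(n_features), n_sub))
    # Label space: one random k-label set, sampled without replacement.
    k = min(k_labels, n_labels)
    label_idx = sorted(rng.sample(range(n_labels), k))
    return instance_idx, feature_idx, label_idx


def build_ensemble_subsets(n_members, n_instances, n_features, n_labels, seed=0):
    """Draw the index subsets for each ensemble member (one base classifier each)."""
    rng = random.Random(seed)
    return [
        draw_triple_random_subsets(n_instances, n_features, n_labels,
                                   seed=rng.randrange(2**32))
        for _ in range(n_members)
    ]
```

Each tuple returned by `build_ensemble_subsets` would then select the rows, feature columns, and label columns used to train one base multi-label classifier; how the members' predictions are combined is left open here.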

    PREDICTING COMPLEX PHENOTYPE-GENOTYPE RELATIONSHIPS IN GRASSES: A SYSTEMS GENETICS APPROACH

    It is becoming increasingly urgent to identify and understand the mechanisms underlying complex traits. Expected increases in the human population, coupled with climate change, make this especially urgent for grasses in the Poaceae family because these serve as major staples of human and livestock diets worldwide. In particular, Oryza sativa (rice), Triticum spp. (wheat), Zea mays (maize), and Saccharum spp. (sugarcane) are among the top agricultural commodities. Molecular marker tools such as linkage-based Quantitative Trait Loci (QTL) mapping, Genome-Wide Association Studies (GWAS), Multiple Marker Assisted Selection (MMAS), and Genome Selection (GS) techniques offer promise for understanding the mechanisms behind complex traits and for improving breeding programs. These methods have shown some success. Often, however, they can identify neither the causal genes underlying traits nor the biological context in which those genes function. To improve our understanding of complex traits, as well as to improve breeding techniques, additional tools are needed to augment existing methods. This work proposes a knowledge-independent systems-genetic paradigm that integrates results from genetic studies, such as QTL mapping, GWAS, and mutational insertion lines such as Tos17, with gene co-expression networks for grasses, in particular for rice. The techniques described herein attempt to overcome the bias of limited human knowledge by relying solely on the underlying signals within the data to capture a holistic representation of gene interactions for a species. By integrating gene co-expression networks with genetic signal, modules of genes with a potential effect on a given trait can be identified, and the biological function of those interacting genes can be determined.

    Network diffusion methods for omics big bio data analytics and interpretation with application to cancer datasets

    In current biomedical research, a fundamental step toward understanding the mechanisms at the root of a disease is the identification of disease modules, that is, those subnetworks of the interactome (the network of interactions between proteins) that contain a high number of gene alterations. However, the incompleteness of the network and the high variability of the altered genes make this problem non-trivial to solve. Physics-based methods that exploit the properties of diffusion processes on networks, which are the subject of this thesis, are the ones that achieve the best performance. In the first part of my work, I investigated the theory of diffusion and random walks on networks, finding interesting relationships with clustering techniques and with other physical models whose dynamics are described by the Laplacian matrix. I then implemented a technique based on network diffusion and applied it to gene expression and somatic mutation data from three different types of cancer. The method is organized in two parts. After selecting a subset of the nodes of the interactome, we associate with each of them an initial information value that reflects the "degree" of alteration of the gene. The diffusion algorithm propagates the initial information through the network, reaching the steady state after a transient. At this point, the amount of fluid in each node is used to build a ranking of the genes. In the second part, the disease modules are identified through a network resampling procedure. The analysis allowed us to identify a substantial number of genes already known in the literature on the cancer types studied, as well as a set of other genes related to them that could be interesting candidates for further investigation. Finally, through a Gene Set Enrichment procedure, we tested the correlation of the identified modules with known biological pathways.
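
The diffusion-and-ranking step described in the abstract above can be illustrated with a small sketch. The abstract does not specify the exact diffusion model, so this example assumes a random walk with restart on an undirected graph, a common choice for this kind of analysis and not necessarily the one used in the thesis. The function names, the `restart` parameter, and the toy graph in the usage note are all hypothetical.

```python
def diffuse(adjacency, initial, restart=0.3, tol=1e-10, max_iter=10_000):
    """Propagate initial node scores until an (approximate) steady state.

    adjacency: dict mapping each node to a list of its neighbours
               (undirected: if b is in adjacency[a], a is in adjacency[b]).
    initial:   dict mapping nodes to their initial "alteration" score.
    Assumed update rule (random walk with restart):
        x <- (1 - restart) * W x + restart * x0
    where W spreads each node's score equally among its neighbours.
    """
    nodes = list(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    x0 = {n: float(initial.get(n, 0.0)) for n in nodes}
    x = dict(x0)
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            # Each neighbour m passes on an equal share of its current score.
            inflow = sum(x[m] / degree[m] for m in adjacency[n])
            new[n] = (1 - restart) * inflow + restart * x0[n]
        delta = max(abs(new[n] - x[n]) for n in nodes)
        x = new
        if delta < tol:  # transient is over: scores have stopped changing
            break
    return x


def rank_nodes(scores):
    """Order nodes by decreasing steady-state score."""
    return sorted(scores, key=lambda n: scores[n], reverse=True)
```

For a toy path graph `{"a": ["b"], "b": ["a", "c"], "c": ["b"]}` with all initial score placed on `a`, `rank_nodes(diffuse(graph, {"a": 1.0}))` orders the nodes by their distance from the source. The restart term is what keeps the steady state informative: without a source of this kind, plain diffusion on a connected graph converges to a distribution that no longer depends on the initial scores.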

    Annotation concept synthesis and enrichment analysis: a logic-based approach to the interpretation of high-throughput experiments

    Motivation: Annotation Enrichment Analysis (AEA) is a widely used analytical approach for processing data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. The analysis uncovers and summarizes discriminating background information (e.g. GO annotations) for sets of genes identified by experiments (e.g. a set of differentially expressed genes, a cluster). Human experts then use the discovered information to find biological interpretations of the experiments.
