
    APPLE: Approximate Path for Penalized Likelihood Estimators

    In high-dimensional data analysis, penalized likelihood estimators have been shown to provide superior results in both variable selection and parameter estimation. A new algorithm, APPLE, is proposed for calculating the Approximate Path for Penalized Likelihood Estimators. Both the convex penalty case (such as LASSO) and the nonconvex penalty case (such as SCAD and MCP) are considered. APPLE efficiently computes the solution path for the penalized likelihood estimator using a hybrid of the modified predictor-corrector method and the coordinate-descent algorithm. APPLE is compared with several well-known packages via simulation and analysis of two gene expression data sets. Comment: 24 pages, 9 figures
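
The coordinate-descent component mentioned in this abstract can be illustrated for the LASSO case in linear regression. The following is a minimal sketch, not the APPLE algorithm itself: it omits the modified predictor-corrector step and simply warm-starts each fit along a decreasing grid of penalty values. The function names (`soft_threshold`, `lasso_cd`, `lasso_path`) and the squared-error loss are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, beta=None, n_sweeps=200, tol=1e-10):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows, y a list of responses (hypothetical plain-list API).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    # Residual r = y - X beta, kept up to date after every coordinate update.
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Solve along a decreasing penalty grid, warm-starting each fit."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_cd(X, y, lam, beta)
        path.append((lam, beta))
    return path
```

Warm starts are the reason a path is cheap to compute: the solution at one penalty value is usually close to the solution at the next, so each fit needs only a few sweeps.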

    A Robust Hybrid Approach Based on Estimation of Distribution Algorithm and Support Vector Machine for Hunting Candidate Disease Genes

    Microarray data are high-dimensional, with a high noise ratio and a relatively small sample size, which makes it a challenge to use them to identify candidate disease genes. Here, we present a hybrid method that combines an estimation of distribution algorithm with a support vector machine for the selection of key feature genes. We benchmarked the method using microarray data for both diffuse B cell lymphoma and colon cancer to demonstrate its performance in identifying key features from high-dimensional gene expression profiles. The method was compared with a probabilistic model based on a genetic algorithm and with another hybrid method based on both a genetic algorithm and a support vector machine. The results showed that the proposed method provides a new computational strategy for hunting candidate disease genes from disease gene expression profiles. The selected candidate disease genes may help to improve the diagnosis and treatment of diseases.

    An evolutionary approach for balancing effectiveness and representation level in gene selection

    As data mining develops and expands to new application areas, feature selection also reveals various aspects to be considered. This paper underlines two aspects that seem to categorize the large body of available feature selection algorithms: effectiveness and representation level. Effectiveness deals with selecting the minimum set of variables that maximizes the accuracy of a classifier, and the representation level concerns discovering how relevant the variables are for the domain of interest. To balance these aspects, the paper proposes an evolutionary framework for feature selection that expresses a hybrid method, organized in layers, each of which exploits a specific model of search strategy. Extensive experiments on gene selection from DNA-microarray datasets are presented and discussed. Results indicate that the framework compares well with different hybrid methods proposed in the literature, as it is capable of finding well-suited subsets of informative features while improving classification accuracy.

    Filter-Wrapper Methods For Gene Selection In Cancer Classification

    In microarray gene expression studies, finding the smallest subset of informative genes from microarray datasets for clinical diagnosis and accurate cancer classification is one of the most difficult challenges in machine learning. Many researchers have devoted their efforts to addressing this problem by using a filter method, a wrapper method or a combination of both approaches. A hybrid method combines filter and wrapper methods. It benefits from the speed of the filter approach and the accuracy of the wrapper approach. Several hybrid filter-wrapper methods have been proposed to select informative genes. However, hybrid methods encounter a number of limitations, which are associated with the filter and wrapper approaches. The gene subset that is produced by filter approaches lacks predictiveness and robustness. The wrapper approach encounters problems of complex interactions among genes and stagnation in local optima. To address these drawbacks, this study investigates filter and wrapper methods to develop effective hybrid methods for gene selection. This study proposes new hybrid filter-wrapper methods based on Maximum Relevancy Minimum Redundancy (MRMR) as a filter approach and an adapted bat-inspired algorithm (BA) as a wrapper approach. First, MRMR hybridisation and BA adaptation are investigated to resolve the gene selection problem. The proposed method is called MRMR-BA.
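
The filter stage described in this abstract can be sketched as a greedy relevance-minus-redundancy selection. This is a simplified illustration only: it substitutes absolute Pearson correlation for the mutual-information scores that MRMR normally uses, and it does not include the bat-inspired wrapper stage. The names `pearson` and `mrmr` are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def mrmr(features, target, k):
    """Greedily pick k feature indices by relevance minus mean redundancy.

    features[j] holds the values of feature j across samples.
    Relevance and redundancy use |Pearson r| as a stand-in for mutual information.
    """
    relevance = [abs(pearson(f, target)) for f in features]
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(features)):

        def score(j):
            # Mean absolute correlation with the features already chosen.
            redundancy = sum(
                abs(pearson(features[j], features[s])) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        rest = [j for j in range(len(features)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected
```

In a hybrid scheme of the kind the abstract describes, the indices returned by such a filter would form the reduced candidate pool that a wrapper search then explores with a classifier in the loop.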

    Very Important Pool (VIP) genes – an application for microarray-based molecular signatures

    Background: Advances in DNA microarray technology portend that molecular signatures derived from microarrays will eventually be used in clinical environments and personalized medicine. Derivation of biomarkers is a large step beyond hypothesis generation and imposes considerably more stringency for accuracy in identifying informative gene subsets to differentiate phenotypes. The inherent nature of microarray data, with fewer samples and replicates compared to the large number of genes, requires identifying informative genes prior to classifier construction. However, improving the ability to identify differentiating genes remains a challenge in bioinformatics. Results: A new hybrid gene selection approach was investigated and tested with nine publicly available microarray datasets. The new method identifies a Very Important Pool (VIP) of genes from the broad patterns of gene expression data. The method uses a bagging sampling principle, where the re-sampled arrays are used to identify the most informative genes. Frequency of selection is used in a repetitive process to identify the VIP genes. The putative informative genes are selected using two methods, the t-statistic and discriminatory analysis. In the t-statistic method, the informative genes are identified based on p-values. In the discriminatory analysis, disjoint Principal Component Analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by the p-value ranking approach are also related to the disease investigated. More importantly, these genes are part of the pathways derived from the common genes shared by both the VIP and p-value ranking methods. Moreover, the binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value ranked genes in distinguishing different types of samples. Conclusion: The VIP gene selection approach could identify additional subsets of informative genes that would not always be selected by the p-value ranking method. These genes are likely to be additional true positives since they are part of pathways identified by the p-value ranking method and are expected to be related to the relevant biology. Therefore, these additional genes derived from the VIP method potentially provide valuable biological insights.
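
The bagging and selection-frequency idea in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: it ranks genes by the magnitude of a Welch t-statistic rather than by p-values, resamples arrays within each class, and leaves out the PCA-based discriminatory analysis. The names `welch_t` and `selection_frequency`, and the parameters `top_k` and `n_rounds`, are hypothetical.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for one gene."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    # A degenerate resample with zero spread is treated as uninformative.
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def selection_frequency(expr, labels, top_k=1, n_rounds=50, seed=0):
    """Fraction of bootstrap rounds in which each gene ranks in the top_k by |t|.

    expr[g][s] is the expression of gene g in sample s; labels[s] is 0 or 1.
    Each class needs at least two samples.
    """
    rng = random.Random(seed)
    idx0 = [s for s, c in enumerate(labels) if c == 0]
    idx1 = [s for s, c in enumerate(labels) if c == 1]
    counts = [0] * len(expr)
    for _ in range(n_rounds):
        # Resample arrays with replacement within each class.
        b0 = [rng.choice(idx0) for _ in idx0]
        b1 = [rng.choice(idx1) for _ in idx1]
        scores = []
        for g, row in enumerate(expr):
            t = welch_t([row[s] for s in b0], [row[s] for s in b1])
            scores.append((abs(t), g))
        # Credit the top_k genes of this round.
        for _, g in sorted(scores, reverse=True)[:top_k]:
            counts[g] += 1
    return [c / n_rounds for c in counts]
```

Genes whose frequency exceeds a chosen cutoff would then form the candidate pool; the cutoff itself is a tuning choice that the abstract does not specify.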

    Development and evaluation of machine learning algorithms for biomedical applications

    Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps a physician gain a full understanding of which treatments are effective for patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches. This dissertation develops machine learning and data mining algorithms, and applies these algorithms to solve the two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; (ii) a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For the drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; (iii) a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over the existing approaches.

    Differential introgression and the maintenance of species boundaries in an advanced generation avian hybrid zone

    Background: Evolutionary processes, including selection and differential fitness, shape the introgression of genetic material across a hybrid zone, resulting in the exchange of some genes but not others. Differential introgression of molecular or phenotypic markers can thus provide insight into factors contributing to reproductive isolation. We characterized patterns of genetic variation across a hybrid zone between two tidal marsh birds, Saltmarsh (Ammodramus caudacutus) and Nelson’s (A. nelsoni) sparrows (n = 286), and compared patterns of introgression among multiple genetic markers and phenotypic traits. Results: Geographic and genomic cline analyses revealed variable patterns of introgression among marker types. Most markers exhibited gradual clines and indicated that introgression exceeds the spatial extent of the previously documented hybrid zone. We found steeper clines, indicating strong selection, for loci associated with traits related to tidal marsh adaptations. These included a marker linked to a gene region associated with metabolic functions, including an osmotic regulatory pathway, and a marker related to melanin-based pigmentation, supporting an adaptive role of darker plumage (salt marsh melanism) in tidal marshes. Narrow clines at mitochondrial and sex-linked markers also offer support for Haldane’s rule. We detected patterns of asymmetrical introgression toward A. caudacutus, which may be driven by differences in mating strategy or differences in population density between the two species. Conclusions: Our findings offer insight into the dynamics of a hybrid zone traversing a unique environmental gradient and provide evidence for a role of ecological divergence in the maintenance of pure species boundaries despite ongoing gene flow.