APPLE: Approximate Path for Penalized Likelihood Estimators
In high-dimensional data analysis, penalized likelihood estimators are shown
to provide superior results in both variable selection and parameter
estimation. A new algorithm, APPLE, is proposed for calculating the Approximate
Path for Penalized Likelihood Estimators. Both the convex penalty (such as
LASSO) and the nonconvex penalty (such as SCAD and MCP) cases are considered.
APPLE efficiently computes the solution path for the penalized likelihood
estimator using a hybrid of the modified predictor-corrector method and the
coordinate-descent algorithm. APPLE is compared with several well-known
packages via simulation and analysis of two gene expression data sets. Comment: 24 pages, 9 figures
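To make the idea of a solution path concrete, here is a minimal sketch of the coordinate-descent half of such a scheme for the LASSO penalty with a squared-error loss. It is not the APPLE algorithm: the predictor-corrector step is omitted, and the function names (`soft_threshold`, `lasso_cd`, `lasso_path`) and the warm-start strategy are assumptions made for illustration only.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: closed-form LASSO update for one coordinate."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, beta, n_sweeps=200, tol=1e-10):
    """Minimise (1/2n)*||y - X beta||^2 + lam*||beta||_1 by cyclic coordinate descent.

    X is a list of rows, y a list of responses, beta the starting point (warm start).
    """
    n, p = len(X), len(X[0])
    beta = list(beta)
    # Residuals for the starting coefficients.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # Mean squared value of each column, (1/n) * x_j^T x_j.
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # an all-zero column carries no information
            # (1/n) * x_j^T (partial residual that excludes coordinate j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Approximate path: solve from the largest to the smallest lambda with warm starts."""
    beta = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_cd(X, y, lam, beta)
        path.append((lam, list(beta)))
    return path
```

Starting each solve from the previous solution is what makes a path cheap to compute. A nonconvex penalty such as SCAD or MCP would replace `soft_threshold` with its own thresholding rule.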
A Robust Hybrid Approach Based on Estimation of Distribution Algorithm and Support Vector Machine for Hunting Candidate Disease Genes
Microarray data are high-dimensional, with a high noise ratio and a relatively small sample size, which makes it challenging to use them to identify candidate disease genes. Here, we present a hybrid method that combines an estimation of distribution algorithm with a support vector machine for selecting key feature genes. We benchmarked the method on microarray data for both diffuse B cell lymphoma and colon cancer to demonstrate its performance in identifying key features from high-dimensional gene expression profiles. The method was compared with a probabilistic model based on a genetic algorithm and with another hybrid method based on both a genetic algorithm and a support vector machine. The results showed that the proposed method provides a new computational strategy for hunting candidate disease genes from disease gene expression profiles. The selected candidate disease genes may help to improve the diagnosis and treatment of diseases.
An evolutionary approach for balancing effectiveness and representation level in gene selection
As data mining develops and expands to new application areas, feature selection also reveals various aspects to be considered. This paper underlines two aspects that seem to categorize the large body of available feature selection algorithms: the effectiveness and the representation level. The effectiveness deals with selecting the minimum set of variables that maximizes the accuracy of a classifier, and the representation level concerns discovering how relevant the variables are for the domain of interest. To balance these aspects, the paper proposes an evolutionary framework for feature selection that expresses a hybrid method organized in layers, each of which exploits a specific model of search strategy. Extensive experiments on gene selection from DNA-microarray datasets are presented and discussed. Results indicate that the framework compares well with different hybrid methods proposed in the literature, as it is capable of finding well-suited subsets of informative features while improving classification accuracy.
Filter-Wrapper Methods For Gene Selection In Cancer Classification
In microarray gene expression studies, finding the smallest subset of informative
genes from microarray datasets for clinical diagnosis and accurate cancer classification
is one of the most difficult challenges in machine learning. Many researchers have
devoted their efforts to address this problem by using a filter method, a wrapper method
or a combination of both approaches. A hybrid method combines the filter and wrapper
approaches, benefiting from the speed of the filter approach
and the accuracy of the wrapper approach. Several hybrid filter-wrapper methods have
been proposed to select informative genes. However, hybrid methods encounter a number
of limitations, which are associated with filter and wrapper approaches. The gene
subset that is produced by filter approaches lacks predictiveness and robustness. The
wrapper approach encounters problems of complex interactions among genes and stagnation
in local optima. To address these drawbacks, this study investigates filter and
wrapper methods to develop effective hybrid methods for gene selection. This study
proposes new hybrid filter-wrapper methods based on Maximum Relevancy Minimum
Redundancy (MRMR) as a filter approach and adapted bat-inspired algorithm (BA) as
a wrapper approach. First, MRMR hybridisation and BA adaptation are investigated
to resolve the gene selection problem. The proposed method is called MRMR-BA.
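The filter stage named above can be illustrated with a minimal sketch of greedy Maximum Relevancy Minimum Redundancy selection on discretised features. It covers only the filter step, not the bat-inspired wrapper, and it is not the authors' implementation: the function names, the difference-form score (relevance minus mean redundancy), and the plug-in mutual information estimate are assumptions chosen for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR: features maps a (hypothetical) gene name to its discretised values.

    At each step, pick the candidate that maximises
    relevance(gene; target) - mean redundancy(gene; already selected genes).
    """
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    candidates = list(features)  # insertion order makes tie-breaking deterministic
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In a hybrid pipeline, the genes returned by `mrmr` would form the reduced search space handed to the wrapper stage.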
Very Important Pool (VIP) genes – an application for microarray-based molecular signatures
Background: Advances in DNA microarray technology portend that molecular signatures derived from microarrays will eventually be used in clinical environments and personalized medicine. Derivation of biomarkers is a large step beyond hypothesis generation and imposes considerably more stringency for accuracy in identifying informative gene subsets to differentiate phenotypes. The inherent nature of microarray data, with fewer samples and replicates compared to the large number of genes, requires identifying informative genes prior to classifier construction. However, improving the ability to identify differentiating genes remains a challenge in bioinformatics.
Results: A new hybrid gene selection approach was investigated and tested with nine publicly available microarray datasets. The new method identifies a Very Important Pool (VIP) of genes from the broad patterns of gene expression data. The method uses a bagging sampling principle, where the re-sampled arrays are used to identify the most informative genes. Frequency of selection is used in a repetitive process to identify the VIP genes. The putative informative genes are selected using two methods, t-statistic and discriminatory analysis. In the t-statistic method, the informative genes are identified based on p-values. In the discriminatory analysis, disjoint Principal Component Analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by the p-value ranking approach are also related to the disease investigated. More importantly, these genes are part of the pathways derived from the common genes shared by both the VIP and p-value ranking methods. Moreover, the binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value ranked genes in distinguishing different types of samples.
Conclusion: The VIP gene selection approach could identify additional subsets of informative genes that would not always be selected by the p-value ranking method. These genes are likely to be additional true positives since they are part of pathways identified by the p-value ranking method and are expected to be related to the relevant biology. Therefore, these additional genes derived from the VIP method potentially provide valuable biological insights.
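The resampling idea in the Results can be sketched as follows: arrays are resampled with replacement, genes are ranked on each resample, and the fraction of resamples in which a gene is selected becomes its selection frequency. This is a simplified illustration, not the published VIP procedure. It ranks by the absolute Welch t-statistic rather than by p-values, omits the PCA-based discriminatory analysis, and resamples within each class; the function names and parameters (`top_k`, `n_boot`, `seed`) are hypothetical.

```python
import math
import random


def welch_t(a, b):
    """Welch t-statistic for two samples (each needs at least two values)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0.0:
        # Degenerate resample: no spread in either class.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se


def selection_frequency(expr, labels, top_k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each gene ranks in the top_k by |t|.

    expr[s][g] is the expression of gene g in sample s; labels[s] is 0 or 1.
    """
    rng = random.Random(seed)
    idx0 = [s for s, lab in enumerate(labels) if lab == 0]
    idx1 = [s for s, lab in enumerate(labels) if lab == 1]
    n_genes = len(expr[0])
    counts = [0] * n_genes
    for _ in range(n_boot):
        # Resample arrays with replacement, separately within each class.
        b0 = [rng.choice(idx0) for _ in idx0]
        b1 = [rng.choice(idx1) for _ in idx1]
        scores = []
        for g in range(n_genes):
            t = welch_t([expr[s][g] for s in b0], [expr[s][g] for s in b1])
            scores.append((abs(t), g))
        scores.sort(reverse=True)
        for _, g in scores[:top_k]:
            counts[g] += 1
    return [c / n_boot for c in counts]
```

Genes whose frequency exceeds a chosen cut-off would form the pool. The cut-off itself is a tuning decision that the abstract does not specify.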
Development and evaluation of machine learning algorithms for biomedical applications
Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps physicians gain a full understanding of effective treatments for patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches.
This dissertation develops machine learning and data mining algorithms, and applies these algorithms to solve the two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; (ii) a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For the drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; (iii) a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over the existing approaches.
Harnessing Saccharomyces cerevisiae Genetics for Cell Engineering
Cell engineering holds the promise of creating designer microorganisms that can address some of society's most pressing needs, ranging from the production of biofuels and drugs to the detection of disease states or environmental contaminants. Realizing these goals will require the extensive reengineering of cells, which will be a formidable task due both to our incomplete understanding of the cell at the systems level and to the technical difficulty of manipulating the genome on a large scale. In Chapter 1, we begin by discussing the potential of directed evolution approaches to overcome the challenges of cell engineering. We then cover the methodologies that are emerging to adapt the mutagenesis and selection steps of directed evolution for in vivo, multi-component systems. Yeast hybrid assays provide versatile systems for coupling a function of interest to a high-throughput growth selection for directed evolution. In Chapter 2, we develop an experimental framework to characterize and optimize the performance of yeast two- and three-hybrid growth selections. Using the LEU2 reporter gene as a model selectable marker, we show that quantitative characterization of these assay systems allows us to identify key junctures for optimization. In Chapter 3, we apply the same systematic characterization to the yeast three-hybrid counter selection, beginning with our previously reported URA3 reporter. We further develop a screening approach to identify effective new yeast three-hybrid counter selection reporters. Installing customized multi-gene pathways in the cell is arguably the first step of any cell engineering endeavor. Chapter 4 describes the design, construction, and initial validation of Reiterative Recombination, a robust in vivo DNA assembly method relying on homing endonuclease-stimulated homologous recombination. 
Reiterative Recombination elongates constructs of interest in a stepwise manner by employing pairs of alternating, orthogonal endonucleases and selectable markers. We anticipate that Reiterative Recombination will be a valuable tool for a variety of cell engineering endeavors because it is both highly efficient and technically straightforward. As an initial application, we illustrate Reiterative Recombination's utility in the area of metabolic engineering in Chapter 5. Specifically, we demonstrate that we can build functional biosynthetic pathways and generate large libraries of pathways in vivo. The facility of pathway construction by Reiterative Recombination should expedite strain optimization for metabolic engineering.
Differential introgression and the maintenance of species boundaries in an advanced generation avian hybrid zone
Background: Evolutionary processes, including selection and differential fitness, shape the introgression of genetic material across a hybrid zone, resulting in the exchange of some genes but not others. Differential introgression of molecular or phenotypic markers can thus provide insight into factors contributing to reproductive isolation. We characterized patterns of genetic variation across a hybrid zone between two tidal marsh birds, Saltmarsh (Ammodramus caudacutus) and Nelson’s (A. nelsoni) sparrows (n = 286), and compared patterns of introgression among multiple genetic markers and phenotypic traits.
Results: Geographic and genomic cline analyses revealed variable patterns of introgression among marker types. Most markers exhibited gradual clines and indicated that introgression exceeds the spatial extent of the previously documented hybrid zone. We found steeper clines, indicating strong selection for loci associated with traits related to tidal marsh adaptations, including for a marker linked to a gene region associated with metabolic functions, including an osmotic regulatory pathway, as well as for a marker related to melanin-based pigmentation, supporting an adaptive role of darker plumage (salt marsh melanism) in tidal marshes. Narrow clines at mitochondrial and sex-linked markers also offer support for Haldane’s rule. We detected patterns of asymmetrical introgression toward A. caudacutus, which may be driven by differences in mating strategy or differences in population density between the two species.
Conclusions: Our findings offer insight into the dynamics of a hybrid zone traversing a unique environmental gradient and provide evidence for a role of ecological divergence in the maintenance of pure species boundaries despite ongoing gene flow.