Search CORE

327 research outputs found

A multi-filter enhanced genetic ensemble system for gene selection and sample classification of microarray data

Author: A Blum
A Tsymbal
Albert Y Zomaya
B Liu
Bing B Zhou
C Ding
C Ooi
D Ruta
G Bontempi
I Inza
IH Witten
J Hua
J Liu
JR Quinlan
JR Quinlan
L Lam
L Li
M Hassan
M Kudo
M Robnik-Šikonja
P Jafari
Pengyi Yang
R Kohavi
RL Somorjai
S Armstrong
S Dudoit
T Golub
T Jirapech-Umpai
T Mitchell
TG Dietterich
U Alon
W Li
X Chen
Y Saeys
Y Saeys
Y Su
Y Wang
YH Yang
Z Zhang
Z Zhang
Zili Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses. Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization property of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used for the initialization and mutation operations of the genetic ensemble system. Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and the user preferences.
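
The idea of fusing per-gene "goodness" scores from several filters and using them to seed a genetic search can be sketched as follows. This is only an illustrative sketch, not the MF-GE mapping strategy itself: the min-max normalisation, the averaging rule, the function names `fuse_filter_scores` and `init_chromosome`, and the score values are all assumptions made for the example.

```python
import random

def fuse_filter_scores(score_lists):
    """Min-max normalise each filter's scores, then average them per gene."""
    n_genes = len(score_lists[0])
    fused = [0.0] * n_genes
    for scores in score_lists:
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for g, s in enumerate(scores):
            fused[g] += (s - lo) / span
    return [f / len(score_lists) for f in fused]

def init_chromosome(fused, subset_size, rng):
    """Sample a gene subset without replacement, favouring high fused scores."""
    genes = list(range(len(fused)))
    weights = [f + 1e-9 for f in fused]  # small floor keeps every gene selectable
    chosen = []
    for _ in range(subset_size):
        pick = rng.choices(range(len(genes)), weights=weights, k=1)[0]
        chosen.append(genes.pop(pick))
        weights.pop(pick)
    return sorted(chosen)

# Hypothetical scores from two filters for five genes (illustrative values only).
filter_a = [0.9, 0.1, 0.5, 0.3, 0.7]
filter_b = [10.0, 2.0, 8.0, 4.0, 6.0]
fused = fuse_filter_scores([filter_a, filter_b])
subset = init_chromosome(fused, subset_size=3, rng=random.Random(0))
```

The same fused weights could plausibly bias a mutation step, so that genes rated highly by several filters are more likely to be swapped into a candidate subset.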

Deakin Research Online

Crossref

Springer - Publisher Connector

PubMed Central

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Intelligent techniques using molecular data analysis in leukaemia: an opportunity for personalized medicine support system

Author: Adelson D.
Banjar H.
Brown A.
Chaudhri N.
Publication venue: Hindawi Limited
Publication date: 01/01/2017

The use of intelligent techniques in medicine has brought a ray of hope in terms of treating leukaemia patients. Personalized treatment uses a patient’s genetic profile to select a mode of treatment. This process makes use of molecular technology and machine learning to determine the most suitable approach to treating a leukaemia patient. Until now, no reviews have been published from a computational perspective concerning the development of personalized medicine intelligent techniques for leukaemia patients using molecular data analysis. This review studies the published empirical research on personalized medicine in leukaemia and synthesizes findings across studies related to intelligent techniques in leukaemia, with specific attention to particular categories of these studies to help identify opportunities for further research into personalized medicine support systems in chronic myeloid leukaemia. A systematic search was carried out to identify studies using intelligent techniques in leukaemia and to categorize these studies based on leukaemia type and also the task, data source, and purpose of the studies. Most studies used molecular data analysis for personalized medicine, but future advancement for leukaemia patients requires molecular models that use advanced machine-learning methods to automate decision-making in treatment management and to deliver supportive medical information to the patient in clinical practice. Haneen Banjar, David Adelson, Fred Brown, and Naeem Chaudhri

Crossref

Adelaide Research & Scholarship

Directory of Open Access Journals


Spectral imaging in preclinical research and clinical pathology.

Author: Beechem Joseph
Levenson Richard
McNamara George
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Spectral imaging methods are attracting increased interest from researchers and practitioners in basic science, pre-clinical and clinical arenas. A combination of better labeling reagents and better optics creates opportunities to detect and measure multiple parameters at the molecular and cellular level. These tools can provide valuable insights into the basic mechanisms of life, and yield diagnostic and prognostic information for clinical applications. There are many multispectral technologies available, each with its own advantages and limitations. This chapter will present an overview of the rationale for spectral imaging, and discuss the hardware, software and sample labeling strategies that can optimize its usefulness in clinical settings.

eScholarship - University of California

Gene Expression Analysis Methods on Microarray Data: A Review

Author: Prof G V Padma Raju
Publication venue: Global Journals Inc. (US)
Publication date: 14/05/2014

In recent years a new type of experiment has been changing the way that biologists and other specialists analyze many problems. These are called high-throughput experiments, and they differ from those performed some years ago mainly in the quantity of data obtained from them. Thanks to the technology known generically as microarrays, it is now possible to study in a single experiment the behavior of all the genes of an organism under different conditions. The data generated by these experiments may comprise thousands to millions of variables, and they pose many challenges to the scientists who have to analyze them. Many of these challenges are statistical in nature and are the focus of this review. Many types of microarrays have been developed to answer different biological questions, and some of them will be explained later. For the sake of simplicity we start with the best-known ones: expression microarrays.

Global Journal of Computer Science and Technology (GJCST)

One-Class Classification: Taxonomy of Study and Review of Techniques

Author: Khan Shehroz S.
Madden Michael G.
Publication venue: Cambridge University Press (CUP)
Publication date: 29/11/2013

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures

arXiv.org e-Print Archive

Crossref

Access to Research at National University of Ireland, Galway

Formal Concept Analysis Applications in Bioinformatics

Author: Roscoe Sarah
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 10/11/2020

Bioinformatics is an important field that seeks to solve biological problems with the help of computation. One specific field in bioinformatics is that of genomics, the study of genes and their functions. Genomics can provide valuable analysis of how genes interact with their environment. One way to measure this interaction is through gene expression data, which indicate whether (and how much) a certain gene activates in a given situation. Analyzing these data can be critical for predicting diseases or other biological reactions. One method used for analysis is Formal Concept Analysis (FCA), a computing technique based on partial orders that allows the user to examine the structural properties of binary data based on which subsets of the data set depend on each other. This thesis surveys, in breadth and depth, the current literature related to the use of FCA for bioinformatics, with particular focus on gene expression data. This includes descriptions of current data management techniques specific to FCA, such as lattice reduction, discretization, and variations of FCA to account for different data types. Advantages and shortcomings of using FCA for genomic investigations, as well as the feasibility of using FCA for this application, are addressed. Finally, several areas for future doctoral research are proposed. Adviser: Jitender S. Deogu
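
To make the FCA idea concrete, the sketch below enumerates the formal concepts (extent, intent pairs) of a tiny binary context. It is a minimal illustration under stated assumptions: the context `context`, the sample and gene labels, and the helper names are hypothetical, and the brute-force enumeration over all object subsets is exponential, so it is only suitable for toy inputs rather than real expression data.

```python
from itertools import combinations

# Hypothetical binarised expression context: sample -> set of "expressed" genes.
context = {
    "s1": {"g1", "g2"},
    "s2": {"g1", "g2", "g3"},
    "s3": {"g3"},
}
all_genes = set().union(*context.values())

def common_genes(samples):
    """Attributes shared by every object in `samples` (all genes if empty)."""
    result = set(all_genes)
    for s in samples:
        result &= context[s]
    return result

def samples_with(genes):
    """Objects that have every attribute in `genes`."""
    return {s for s, expressed in context.items() if genes <= expressed}

def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objects = sorted(context)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_genes(subset)
            extent = samples_with(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts()
```

Each resulting pair is closed in both directions: the intent is exactly the set of genes shared by the extent, and the extent is exactly the set of samples having all genes in the intent. Ordering these pairs by extent inclusion gives the concept lattice.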

Functional Analysis of Human Long Non-coding RNAs and Their Associations with Diseases

Author: Cogill Steven
Publication venue: Clemson University Libraries
Publication date: 01/12/2016

Within this study, we sought to leverage knowledge from well-characterized protein-coding genes to characterize the lesser-known long non-coding RNA (lncRNA) genes, using computational methods to find functional annotations and disease associations. Functional genome annotation is an essential step to a systems-level view of the human genome. With this knowledge, we can gain a deeper understanding of how humans develop and function, and a better understanding of human disease. LncRNAs are transcripts greater than 200 nucleotides, which do not code for proteins. LncRNAs have been found to regulate development, tissue and cell differentiation, and organ formation. Their dysregulation has been linked to several diseases including autism spectrum disorder (ASD) and cancer. While a great deal of research has been dedicated to protein-coding genes, the relatively recently discovered lncRNA genes have yet to be characterized. LncRNA function is tied closely to when and where they are expressed. Co-expression network analysis offers a means of functional annotation of uncharacterized genes through a guilt-by-association approach. We have constructed two co-expression networks using known disease-associated protein-coding genes and lncRNA genes. Through clustering of the networks, gene set enrichment analysis, and centrality measures, we found enrichment for disease association and functions, and identified high-confidence lncRNA disease gene targets. We present a novel approach to the identification of disease state associations by demonstrating that genes associated with the same disease states share patterns that can be discerned from transcriptomes of healthy tissues. Using a machine learning algorithm, we built a model to classify ASD versus non-ASD genes using their expression profiles from healthy developing human brain tissues. Feature selection during the model-building process also identified critical temporospatial points for the determination of ASD genes. We constructed a webserver tool for the prioritization of genes for ASD association. The webserver tool has a database containing prioritization and co-expression information for nearly every gene in the human genome.

Clemson University: TigerPrints

Significant Gene Array Analysis and Cluster-Based Machine Learning for Disease Class Prediction

Author: Barreiro-Arevalo Myrine A.
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/08/2021

Gene expression analysis has been of major interest to biostatisticians for many decades. Such studies are necessary for the understanding of disease risk assessment and prediction, so that medical professionals and scientists alike may learn how to create better treatment plans to lessen symptoms and perhaps even find cures. In this study, we investigate various gene expression analyses and machine learning techniques for disease class prediction, assess the predictive validity of these models, and uncover differentially expressed (DE) genes for their relevant pathology datasets. Multiple gene expression datasets, obtained using the Affymetrix U133A platform (GPL96), are used to test model accuracies. Significance Analysis of Microarrays (SAM) is used to identify potential disease biomarkers, followed by these predictive models: (a) random forest, (b) random forest with Gene eXpression Network Analysis (GXNA), (c) RF++, (d) LASSO, and (e) Bayesian Neural Networks. One of the intended goals for this study is to find clusters of co-expressed genes and identify the effect of clustering on classification based on knowledge in gene expression data/microarray data. The other goal is to determine the usefulness of Automatic Relevance Determination in Bayesian neural networks.

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Recent Trends in Cytogenetic Studies

Author
Publication venue: IntechOpen
Publication date: 20/04/2021

Recent Trends in Cytogenetic Studies - Methodologies and Applications deals with recent trends in cytogenetics with minute details of methodologies that can be adopted in clinical laboratories. The chapters deal with basic methods of primary cultures, cell lines and their applications; microtechnologies and automations; array CGH for the diagnosis of fetal conditions; approaches to acute lymphoblastic and myeloblastic leukemias in patients and survivors of atomic bomb exposure; use of digital image technology; and using chromosomes as tools to discover biodiversity. While concentrating on the advanced methodologies in cytogenetic studies and their applications, the authors have pointed out the need to develop cytogenetic labs with modern tools to facilitate precise and effective diagnosis to benefit the patient population.

Directory of Open Access Books (DOAB)