8 research outputs found

WaveCNV: allele-specific copy number alterations in primary tumors and xenograft models from next-generation sequencing.

Author: Ali Johar
Arshadi Niloofar
Beck Tim
Holt Carson
Jang Gun Ho
Losic Bojan
McPherson John
Muthuswamy Lakshmi B
Pai Deepa
Syam Sujata
Trinh Quang
Zhao Zhen
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Motivation: Copy number variations (CNVs) are a major source of genomic variability and are especially significant in cancer. Until recently microarray technologies have been used to characterize CNVs in genomes. However, advances in next-generation sequencing technology offer significant opportunities to deduce copy number directly from genome sequencing data. Unfortunately cancer genomes differ from normal genomes in several aspects that make them far less amenable to copy number detection. For example, cancer genomes are often aneuploid and an admixture of diploid/non-tumor cell fractions. Also patient-derived xenograft models can be laden with mouse contamination that strongly affects accurate assignment of copy number. Hence, there is a need to develop analytical tools that can take into account cancer-specific parameters for detecting CNVs directly from genome sequencing data. Results: We have developed WaveCNV, a software package to identify copy number alterations by detecting breakpoints of CNVs using translation-invariant discrete wavelet transforms and assign digitized copy numbers to each event using next-generation sequencing data. We also assign alleles specifying the chromosomal ratio following duplication/loss. We verified copy number calls using both microarray (correlation coefficient 0.97) and quantitative polymerase chain reaction (correlation coefficient 0.94) and found them to be highly concordant. We demonstrate its utility in pancreatic primary and xenograft sequencing data. Availability and implementation: Source code and executables are available at https://github.com/WaveCNV. The segmentation algorithm is implemented in MATLAB, and copy number assignment is implemented in Perl. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
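The breakpoint-detection idea in this abstract can be illustrated with a minimal sketch. It is not WaveCNV's code (which the abstract says is written in MATLAB and Perl); it only assumes that a copy number breakpoint shows up as a large undecimated (translation-invariant) Haar detail coefficient, computed here as the difference between the mean read depth just after and just before each position. The names `haar_detail` and `find_breakpoints`, the `half_width` and `threshold` parameters, and the toy signal are all hypothetical.

```python
def haar_detail(signal, half_width):
    """Undecimated Haar detail coefficient at every interior position.

    The value at position i is the mean of the half_width samples starting
    at i minus the mean of the half_width samples ending just before i.
    Positions too close to either end are left at 0.0.
    """
    n = len(signal)
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)
    detail = [0.0] * n
    for i in range(half_width, n - half_width + 1):
        left = (prefix[i] - prefix[i - half_width]) / half_width
        right = (prefix[i + half_width] - prefix[i]) / half_width
        detail[i] = right - left
    return detail


def find_breakpoints(signal, half_width=4, threshold=0.5):
    """Positions where |detail| exceeds the threshold and is a local peak."""
    detail = haar_detail(signal, half_width)
    breakpoints = []
    for i in range(1, len(detail) - 1):
        magnitude = abs(detail[i])
        if (magnitude > threshold
                and magnitude >= abs(detail[i - 1])
                and magnitude > abs(detail[i + 1])):
            breakpoints.append(i)
    return breakpoints


# Toy depth ratio: one copy, then a gained segment, then one copy again.
depth = [1.0] * 20 + [2.0] * 20 + [1.0] * 20
segments = find_breakpoints(depth)
```

Because the coefficient is evaluated at every position rather than on a decimated grid, shifting the input shifts the detected breakpoints by the same amount, which is the property "translation-invariant" refers to. Real read-depth data would additionally need noise handling and several scales.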

CiteSeerX

PubMed Central

eScholarship - University of California

Feature Selection for Improving Case-Based Classifiers on High-Dimensional Data Sets

Author: Niloofar Arshadi

Case-based reasoning (CBR) is a suitable paradigm for class discovery in molecular biology, where the rules that define the domain knowledge are difficult to obtain, and there is not sufficient knowledge for formal knowledge representation. To extend the capabilities of this paradigm, we propose logistic regression for CBR (LR4CBR), a method that uses logistic regression as a feature selection (FS) method for CBR systems. Our method not only improves the prediction accuracy of CBR classifiers in biomedical domains, but also selects a subset of features that have meaningful relationships with their class labels. In this paper, we introduce two methods to rank features for logistic regression. We show that using logistic regression as a filter FS method outperforms other FS techniques, such as Fisher and t-test, which have been widely used in analyzing biological data sets. The FS methods are combined with a computational framework for a CBR system called TA3. We also evaluate the method on two mass spectrometry data sets, and show that the prediction accuracy of TA3 improves from 90% to 98% and from 79.2% to 95.4%. Finally, we compare our list of discovered biomarkers with the lists of selected biomarkers from other studies for the mass spectrometry data sets, and show the overlapping biomarkers.
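As a rough illustration of using logistic regression as a filter for feature selection, the sketch below fits a separate one-feature logistic model to each column and ranks columns by the mean log-likelihood reached. The abstract does not say which two ranking criteria LR4CBR uses, so this criterion is an assumption, as are the names `fit_univariate_logistic` and `rank_features`, the step count, the learning rate, and the toy data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_univariate_logistic(xs, ys, steps=500, lr=0.1):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient ascent.

    Returns the mean log-likelihood of the fitted model (higher is better).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(w * x + b)
            grad_w += residual * x
            grad_b += residual
        w += lr * grad_w / n
        b += lr * grad_b / n
    total = 0.0
    for x, y in zip(xs, ys):
        # Clamp to avoid log(0) on perfectly separated data.
        p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / n


def rank_features(rows, labels):
    """Return column indices ordered from most to least informative."""
    columns = list(zip(*rows))
    scores = [fit_univariate_logistic(col, labels) for col in columns]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): column 0 tracks the label, column 1 does not.
rows = [
    (0.1, 0.5), (0.2, 0.1), (0.3, 0.9), (0.4, 0.3),
    (0.6, 0.5), (0.7, 0.1), (0.8, 0.9), (0.9, 0.3),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
ranking = rank_features(rows, labels)
```

A filter method of this kind scores each feature independently of the downstream classifier, so the top-ranked columns could then be handed to any case-based system.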

CiteSeerX

Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset

Author: Arshadi Niloofar
Chang Billy
Kustra Rafal
Publication date: 27/03/2018

In this paper, we apply the gradient-boosting machine predictive model to the rheumatoid arthritis data for predicting the case-control status. The QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy to deal with population stratification in the context of multivariate predictive modeling. We address the problem by clustering the subjects on the axes of genetic variations, and building a predictive model separately in each cluster. This allows us to control ethnicity without explicitly including it in the model, which could marginalize the genetic signal we are trying to discover. Clustering not only leads to more similar ethnicity groups but also, as our results show, increases the accuracy of our model when compared to the non-clustered approach. The highest accuracy is achieved with the model adjusted for population stratification, when the genetic axes of variation are included among the set of predictors, although this may be misleading given the confounding effects.
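The cluster-then-model strategy described in this abstract can be sketched as follows. This is only an outline under strong assumptions: a plain k-means stands in for whatever clustering the authors used, and a one-feature decision stump stands in for the gradient-boosting machine. All function names, the naive centroid initialisation, and the toy data are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        for idx, p in enumerate(points):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def fit_stump(xs, ys):
    """Best rule of the form 'predict 1 when x > t' or its mirror image."""
    best = None
    for t in sorted(set(xs)):
        for positive_above in (True, False):
            errors = sum(int((x > t) == positive_above) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, positive_above)
    return best[1], best[2]


def predict_stump(stump, x):
    t, positive_above = stump
    return int((x > t) == positive_above)


def fit_stratified(axes, xs, ys, k):
    """Cluster subjects on the axes of variation, then fit one stump per cluster."""
    assignment = kmeans(axes, k)
    models = {}
    for c in set(assignment):
        cx = [x for x, a in zip(xs, assignment) if a == c]
        cy = [y for y, a in zip(ys, assignment) if a == c]
        models[c] = fit_stump(cx, cy)
    return assignment, models


# Toy data: two ancestry groups whose baseline feature level differs.
axes = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.8),
        (0.1, 0.3), (4.9, 5.2), (0.3, 0.2), (5.2, 5.1)]
xs = [0, 4, 1, 5, 2, 6, 3, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

assignment, models = fit_stratified(axes, xs, ys, k=2)
stratified_preds = [predict_stump(models[a], x) for a, x in zip(assignment, xs)]
global_stump = fit_stump(xs, ys)
global_preds = [predict_stump(global_stump, x) for x in xs]
```

In this toy setting the per-cluster stumps classify every training subject correctly, while a single stump fitted to all subjects cannot, because the group-level shift in the feature masks the within-group signal. Held-out evaluation, which a real comparison would need, is omitted.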

University of Toronto Research Repository

WaveCNV: allele-specific copy number alterations in primary tumors and xenograft models from next-generation sequencing.

Author: Ali Johar
Arshadi Niloofar
Beck Tim
Holt Carson
Jang Gun Ho
Losic Bojan
McPherson John
Muthuswamy Lakshmi B
Pai Deepa
Syam Sujata
Trinh Quang
Zhao Zhen
Publication venue: eScholarship, University of California
Publication date: 01/03/2014

eScholarship - University of California

Artificial Intelligence for COVID-19 Drug Discovery and Vaccine Development.

Author: Calad-Thomson Stacie
Collins Jennifer
Cruz Emmanuel
Diez-Cecilia Elena
Ghadirian Niloofar
Goodarzi Hani
Kelly Brendan
Keshavarzi Arshadi Arash
Salem Milad
Webb Julia
Yuan Jiann Shiun
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

SARS-CoV-2 has roused the scientific community with a call to action to combat the growing pandemic. At the time of this writing, there are as yet no novel antiviral agents or approved vaccines available for deployment as a frontline defense. Understanding the pathobiology of COVID-19 could aid scientists in their discovery of potent antivirals by elucidating unexplored viral pathways. One method for accomplishing this is the leveraging of computational methods to discover new candidate drugs and vaccines in silico. In the last decade, machine learning-based models, trained on specific biomolecules, have offered inexpensive and rapid implementation methods for the discovery of effective viral therapies. Given a target biomolecule, these models are capable of predicting inhibitor candidates in a structure-based manner. If enough data are presented to a model, it can aid the search for a drug or vaccine candidate by identifying patterns within the data. In this review, we focus on the recent advances of COVID-19 drug and vaccine development using artificial intelligence and the potential of intelligent training for the discovery of COVID-19 therapeutics. To facilitate applications of deep learning for SARS-CoV-2, we highlight multiple molecular targets of COVID-19, inhibition of which may increase patient survival. Moreover, we present CoronaDB-AI, a dataset of compounds, peptides, and epitopes discovered either in silico or in vitro that can be potentially used for training models in order to identify COVID-19 treatments. The information and datasets provided in this review can be used to train deep learning-based models and accelerate the discovery of effective viral therapies.

eScholarship - University of California

WaveCNV: allele-specific copy number alterations in primary tumors and xenograft models from next-generation sequencing

Author: Bojan Losic
Carson Holt
Deepa Pai
Gun Ho Jang
Johar Ali
John McPherson
Lakshmi B. Muthuswamy
Niloofar Arshadi
Quang Trinh
Sujata Syam
Tim Beck
Zhen Zhao
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2013

Motivation: Copy number variations (CNVs) are a major source of genomic variability and are especially significant in cancer. Until recently microarray technologies have been used to characterize CNVs in genomes. However, advances in next-generation sequencing technology offer significant opportunities to deduce copy number directly from genome sequencing data. Unfortunately cancer genomes differ from normal genomes in several aspects that make them far less amenable to copy number detection. For example, cancer genomes are often aneuploid and an admixture of diploid/non-tumor cell fractions. Also patient-derived xenograft models can be laden with mouse contamination that strongly affects accurate assignment of copy number. Hence, there is a need to develop analytical tools that can take into account cancer-specific parameters for detecting CNVs directly from genome sequencing data. Results: We have developed WaveCNV, a software package to identify copy number alterations by detecting breakpoints of CNVs using translation-invariant discrete wavelet transforms and assign digitized copy numbers to each event using next-generation sequencing data. We also assign alleles specifying the chromosomal ratio following duplication/loss. We verified copy number calls using both microarray (correlation coefficient 0.97) and quantitative polymerase chain reaction (correlation coefficient 0.94) and found them to be highly concordant. We demonstrate its utility in pancreatic primary and xenograft sequencing data. Availability and implementation: Source code and executables are available at https://github.com/WaveCNV. The segmentation algorithm is implemented in MATLAB, and copy number assignment is implemented in Perl. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California