25 research outputs found

    Automated Quality Control for Genome Wide Association Studies

    This paper details the steps needed to assess and control data quality in genome-wide association studies (GWAS), which use genotype information on a large number of genetic markers for a large number of individuals. Because study designs and genotyping platforms vary across sites and projects, and because genotyping errors can occur, it is important to ensure high-quality data. Scripts and directions are provided to help others through this process.

    How Low Can You Go? Feature Selection for Drug Discovery

    The cost of bringing a drug to market depends on how quickly a candidate drug can be “discovered” and evaluated to ensure safety and effectiveness. In this work we develop a method for predicting whether a given drug and protein compound will “bind.” Our aim is to select a set of features to predict drug-protein interactions. This study focuses on kinases. Kinase inhibitors are the largest class of new cancer therapies. Selective inhibition is difficult due to high sequence similarity among kinases, which leads to off-target interactions and side effects. Pictured here: human c-SRC

    Development of tools for the automated analysis of spectra generated by tandem mass spectrometry

    Background: While multiple tools exist for the analysis and identification of spectra generated in shotgun proteomics experiments, few easily implemented tools allow automated analysis of spectral quality. Knowing the quality of a spectrum from an experiment can help a researcher determine possible reasons for misidentification or lack of identification of spectra in a sample. Materials and methods: We are developing an automated high-throughput method that analyzes spectra from 2D-LC-MS/MS datasets to determine their quality and, from these, the overall quality of the run. We will then compare our program to existing programs that perform a similar function. Our program calculates a quality score based on the following metrics: signal-to-noise ratio, absolute signal intensity, peak number, predicted mass distances between peaks, and percent of incoming mass accounted for by peaks. These scores are then graphed against the outputs of common database search algorithms in order to display the following four categories: High-quality/Identified, High-quality/Unidentified, Low-quality/Identified, and Low-quality/Unidentified. We are currently testing the algorithm against 2D-LC-MS/MS runs of a mixed protein standard and blanks with no peptide spectra. The application samples are a time series of metaproteomes collected from environmental ground waters after biostimulation.
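
The four-way categorization described in this abstract can be sketched as follows. This is only an illustration, not the authors' program: the `Spectrum` class, the single composite `quality_score` scaled to [0, 1], and the 0.5 cutoff are all assumptions, since the abstract does not say how the metrics are combined or thresholded.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Spectrum:
    scan_id: str
    quality_score: float  # hypothetical composite score, assumed to lie in [0, 1]
    identified: bool      # True if a database search assigned a peptide


def categorize(spectrum, quality_threshold=0.5):
    """Return one of the four category labels named in the abstract."""
    quality = "High-quality" if spectrum.quality_score >= quality_threshold else "Low-quality"
    status = "Identified" if spectrum.identified else "Unidentified"
    return f"{quality}/{status}"


def summarize_run(spectra, quality_threshold=0.5):
    """Count how many spectra in a run fall into each category."""
    return Counter(categorize(s, quality_threshold) for s in spectra)
```

A run with many High-quality/Unidentified spectra would point toward a search problem rather than an acquisition problem, which is the kind of diagnosis the abstract has in mind.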

    Towards a Better Understanding of On and Off Target Effects of the Lymphocyte-Specific Kinase LCK for the Development of Novel and Safer Pharmaceuticals

    In this work we have developed a multi-tiered computational platform to study protein-drug interactions. At the beginning of the workflow, faster but less accurate methods are used so that large libraries of proteins in many conformations and massive chemical libraries can be screened. At each subsequent step, a subset of the input data is investigated with more accurate and more computationally expensive methods. We demonstrate the workflow by investigating the lymphocyte-specific kinase LCK, which is implicated as a drug target in many cancers and is also known to have toxic effects when unintentionally targeted. Several LCK states and conformations are investigated using molecular docking and molecular mechanics with generalized Born and surface area continuum solvation (MM/GBSA). Different variations in the drug screening process provide unique results that may elucidate the biological mechanisms underlying the drug interactions.
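
The tiered idea (cheap methods on everything, expensive methods on a shrinking subset) can be sketched as a generic funnel. The function name `tiered_screen`, the `keep_fraction` parameter, and the convention that lower scores are better are assumptions made for illustration; the abstract does not give the platform's actual selection rule.

```python
def tiered_screen(candidates, tiers):
    """Pass candidates through successive scoring tiers.

    `tiers` is a list of (score_fn, keep_fraction) pairs ordered from the
    cheapest method to the most expensive. Lower scores are assumed better.
    """
    survivors = list(candidates)
    for score_fn, keep_fraction in tiers:
        # Score only the current survivors, so later (costly) tiers see fewer inputs.
        ranked = sorted(survivors, key=score_fn)
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
    return survivors
```

In a setting like the one described, the first `score_fn` might wrap a fast docking score and the last an MM/GBSA rescoring step, each supplied by the caller.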

    Ensemble-based docking: From hit discovery to metabolism and toxicity predictions

    This paper describes and illustrates the use of ensemble-based docking, i.e., using a collection of protein structures in docking calculations, for hit discovery, the exploration of biochemical pathways, and toxicity prediction of drug candidates. We describe the computational engineering work necessary to enable large ensemble docking campaigns on supercomputers. We show examples where ensemble-based docking has significantly increased the number and the diversity of validated drug candidates. Finally, we illustrate how ensemble-based docking can be extended beyond hit discovery and toward providing a structural basis for the prediction of metabolism and off-target binding relevant to pre-clinical and clinical trials.
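
One simple way to combine docking results across an ensemble is to keep each ligand's best score over all protein structures. The sketch below assumes that aggregation rule and that lower scores are better; the paper may use a different scheme, and the function names and input layout are hypothetical.

```python
def ensemble_best_scores(scores):
    """Map each ligand to its (best_structure, best_score) across the ensemble.

    `scores` is a dict: ligand -> {structure_id: docking_score}.
    Lower docking scores are assumed to indicate better predicted binding.
    """
    best = {}
    for ligand, by_structure in scores.items():
        structure, score = min(by_structure.items(), key=lambda item: item[1])
        best[ligand] = (structure, score)
    return best


def rank_ligands(scores):
    """Order ligands by their best score over the ensemble, best first."""
    best = ensemble_best_scores(scores)
    return sorted(best, key=lambda ligand: best[ligand][1])
```

Under this rule, a ligand that fits only one conformation can still rank highly, which is one way an ensemble can surface hits that a single structure would miss.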

    KEAP1 Is Required for Artesunate Anticancer Activity in Non-Small-Cell Lung Cancer

    Artesunate is the most common treatment for malaria throughout the world. Artesunate has anticancer activity likely through the induction of reactive oxygen species, the same mechanism of action utilized in Plasmodium falciparum infections. Components of the kelch-like ECH-associated protein 1 (KEAP1)/nuclear factor erythroid 2-related factor 2 (NRF2) pathway, which regulates cellular response to oxidative stress, are mutated in approximately 30% of non-small-cell lung cancers (NSCLC); therefore, we tested the hypothesis that KEAP1 is required for artesunate sensitivity in NSCLC. Dose response assays identified A549 cells, which have a G333C-inactivating mutation in KEAP1, as resistant to artesunate, with an IC50 of 23.6 µM, while H1299 and H1563 cells were sensitive to artesunate, with a 10-fold lower IC50. Knockdown of KEAP1 through siRNA caused increased resistance to artesunate in H1299 cells. Alternatively, the pharmacological inhibition of NRF2, which is activated downstream of KEAP1 loss, by ML385 partially restored sensitivity of A549 cells to artesunate, and the combination of artesunate and ML385 was synergistic in both A549 and H1299 cells. These findings demonstrate that KEAP1 is required for the anticancer activity of artesunate and support the further development of NRF2 inhibitors to target patients with mutations in the KEAP1/NRF2 pathway

    Serendipitous discoveries in microarray analysis

    Background: Scientists are capable of performing very large scale gene expression experiments with current microarray technologies. To find significance in the expression data, it is common to use clustering algorithms to group genes with similar expression patterns. Clusters will often contain related genes, such as co-regulated genes or genes in the same biological pathway. It is too expensive and time-consuming to test all of the relationships found in large-scale microarray experiments. There are many bioinformatics tools that can be used to infer the significance of microarray experiments and cluster analysis. Materials and methods: In this project we reviewed several existing tools and used a combination of them to narrow down the number of significant clusters from a microarray experiment. Microarray data was obtained through the Cerebellar Gene Regulation in Time and Space (Cb GRiTS) database [2]. The data was clustered using paraclique, a graph-based clustering algorithm. Each cluster was evaluated using Gene-Set Cohesion Analysis Tool (GCAT) [3], ONTO-Pathway Analysis [4], and Allen Brain Atlas data [1]. The clusters with the lowest p-values in each of the three analysis methods were researched to determine good candidate clusters for further experimental confirmation of gene relationships. Results and conclusion: While looking for genes important to cerebellar development, we serendipitously came across interesting clusters related to neural diseases. For example, we found two clusters that contain genes known to be associated with Parkinson’s disease, Huntington’s disease, and Alzheimer’s disease pathways. Both clusters scored low in all three analyses and have very similar expression patterns but at different expression levels. Such unexpected discoveries help unlock the real power of high-throughput data analysis.
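
The narrowing step, picking clusters that score low across all three analyses, might look like the sketch below. Taking the `k` lowest p-values per analysis and intersecting the results is an assumed selection rule, and the function names and dictionary layout are hypothetical; the abstract only says the lowest-scoring clusters in each method were examined.

```python
def lowest_k(p_values, k):
    """Return the IDs of the k clusters with the smallest p-values."""
    return set(sorted(p_values, key=p_values.get)[:k])


def candidate_clusters(analyses, k):
    """Clusters that rank among the k lowest p-values in every analysis.

    `analyses` is a dict: analysis_name -> {cluster_id: p_value}.
    """
    top_sets = [lowest_k(p_values, k) for p_values in analyses.values()]
    return set.intersection(*top_sets) if top_sets else set()
```

Requiring agreement across all analyses trades recall for confidence: fewer clusters survive, but each one is supported by independent lines of evidence.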

    Reassessment of Risk Genotypes (GRN, TMEM106B, and ABCC9 Variants) Associated with Hippocampal Sclerosis of Aging Pathology

    Hippocampal sclerosis of aging (HS-Aging) is a common high-morbidity neurodegenerative condition in elderly persons. To understand the risk factors for HS-Aging, we analyzed data from the Alzheimer’s Disease Genetics Consortium and correlated the data with clinical and pathologic information from the National Alzheimer’s Coordinating Center database. Overall, 268 research volunteers with HS-Aging and 2,957 controls were included; detailed neuropathologic data were available for all. The study focused on single-nucleotide polymorphisms previously associated with HS-Aging risk: rs5848 (GRN), rs1990622 (TMEM106B), and rs704180 (ABCC9). Analyses of a subsample that was not previously evaluated (51 HS-Aging cases and 561 controls) replicated the associations of previously identified HS-Aging risk alleles. To test for evidence of gene-gene interactions and genotype-phenotype relationships, pooled data were analyzed. The risk for HS-Aging diagnosis associated with these genetic polymorphisms was not secondary to an association with either Alzheimer disease or dementia with Lewy body neuropathologic changes. The presence of multiple risk genotypes was associated with a trend for additive risk for HS-Aging pathology. We conclude that multiple genes play important roles in HS-Aging, which is a distinctive neurodegenerative disease of aging.

    CTCF variants in 39 individuals with a variable neurodevelopmental disorder broaden the mutational and clinical spectrum

    Purpose: Pathogenic variants in the chromatin organizer CTCF were previously reported in seven individuals with a neurodevelopmental disorder (NDD). Methods: Through international collaboration we collected data from 39 subjects with variants in CTCF. We performed transcriptome analysis on RNA from blood samples and utilized Drosophila melanogaster to investigate the impact of Ctcf dosage alteration on nervous system development and function. Results: The individuals in our cohort carried 2 deletions, 8 likely gene-disruptive, 2 splice-site, and 20 different missense variants, most of them de novo. Two cases were familial. The associated phenotype was of variable severity extending from mild developmental delay or normal IQ to severe intellectual disability. Feeding difficulties and behavioral abnormalities were common, and variable other findings including growth restriction and cardiac defects were observed. RNA-sequencing in five individuals identified 3828 deregulated genes enriched for known NDD genes and biological processes such as transcriptional regulation. Ctcf dosage alteration in Drosophila resulted in impaired gross neurological functioning and learning and memory deficits. Conclusion: We significantly broaden the mutational and clinical spectrum of CTCF-associated NDDs. Our data shed light onto the functional role of CTCF by identifying deregulated genes and show that Ctcf alterations result in nervous system defects in Drosophila. Peer reviewed.