24 research outputs found

    Predicting compound activity from phenotypic profiles and chemical structures

    Predicting assay results for compounds virtually using chemical structures and phenotypic profiles has the potential to reduce the time and resources of screens for drug discovery. Here, we evaluate the relative strength of three high-throughput data sources—chemical structures, imaging (Cell Painting), and gene-expression profiles (L1000)—to predict compound bioactivity using a historical collection of 16,170 compounds tested in 270 assays for a total of 585,439 readouts. All three data modalities can predict compound activity for 6–10% of assays, and in combination they predict 21% of assays with high accuracy, which is a 2 to 3 times higher success rate than using a single modality alone. In practice, the accuracy of predictors could be lower and still be useful, increasing the assays that can be predicted from 37% with chemical structures alone up to 64% when combined with phenotypic data. Our study shows that unbiased phenotypic profiling can be leveraged to enhance compound bioactivity prediction to accelerate the early stages of the drug-discovery process

    Human Genetics in Rheumatoid Arthritis Guides a High-Throughput Drug Screen of the CD40 Signaling Pathway

    Although genetic and non-genetic studies in mouse and human implicate the CD40 pathway in rheumatoid arthritis (RA), there are no approved drugs that inhibit CD40 signaling for clinical care in RA or any other disease. Here, we sought to understand the biological consequences of a CD40 risk variant in RA discovered by a previous genome-wide association study (GWAS) and to perform a high-throughput drug screen for modulators of CD40 signaling based on human genetic findings. First, we fine-map the CD40 risk locus in 7,222 seropositive RA patients and 15,870 controls, together with deep sequencing of CD40 coding exons in 500 RA cases and 650 controls, to identify a single SNP that explains the entire signal of association (rs4810485, $P = 1.4 \times 10^{-9}$). Second, we demonstrate that subjects homozygous for the RA risk allele have ∼33% more CD40 on the surface of primary human CD19+ B lymphocytes than subjects homozygous for the non-risk allele ($P = 10^{-9}$), a finding corroborated by expression quantitative trait loci (eQTL) analysis in peripheral blood mononuclear cells from 1,469 healthy control individuals. Third, we use retroviral shRNA infection to perturb the amount of CD40 on the surface of a human B lymphocyte cell line (BL2) and observe a direct correlation between amount of CD40 protein and phosphorylation of RelA (p65), a subunit of the NF-κB transcription factor. Finally, we develop a high-throughput NF-κB luciferase reporter assay in BL2 cells activated with trimerized CD40 ligand (tCD40L) and conduct an HTS of 1,982 chemical compounds and FDA-approved drugs. After a series of counter-screens and testing in primary human CD19+ B cells, we identify 2 novel chemical inhibitors not previously implicated in inflammation or CD40-mediated NF-κB signaling. Our study demonstrates proof-of-concept that human genetics can be used to guide the development of phenotype-based, high-throughput small-molecule screens to identify potential novel therapies in complex traits such as RA

    Common Subsequences and Supersequences and Their Expected Length

    Let $f(n, k, l)$ be the expected length of a longest common subsequence of $l$ sequences of length $n$ over an alphabet of size $k$. It is known that there are constants $\gamma_k^{(l)}$ such that $f(n, k, l) \to \gamma_k^{(l)} n$; we show that $\gamma_k^{(l)} = \Theta(k^{1/l - 1})$. Bounds for the corresponding constants for the expected length of a shortest common supersequence are also presented.
    1 Introduction and preliminaries
    To find the expected length of a longest common subsequence of two sequences is a standard problem studied in the literature [7, 8]. In this paper we shall concentrate on the expected length of a longest common subsequence of several sequences. We show that this expected length for $l$ sequences of length $n$ over an alphabet of size $k$ is $\Theta(n / k^{1 - 1/l})$ for $n \gg k \gg l$. We also consider a dual case, the expected length of a shortest common supersequence. Let $\Sigma = \{0, 1, \ldots, k - 1\}$ be a fixed alphabet of size $k$. Let $\Sigma^*$ be the set of all strings over ..

    Upper Bounds for the Expected Length of Longest Common Subsequences

    Let $f(n)$ be the expected length of a longest common subsequence of two random sequences over a fixed alphabet of size $k$. It is known that $f(n) \to c_k n$ for some constant $c_k$. We define a collation as a pair of sequences with marked matches. A dominated collation is a collation that is not matched optimally. Upper bounds for $c_k$ can be derived from upper bounds for the number of nondominated collations. Using local properties of matches we can eliminate many nondominated collations and improve upper bounds for $c_k$.
    1 Introduction
    The problem of finding longest common subsequences arises in various situations. As typical we can mention approximate string matching and text comparisons (e.g. the diff function in UNIX) [1, 11]. Another important area where the longest common subsequence problem appears is molecular biology. The longest common subsequence problem is a special case of the more general sequence alignment problem. A survey on the longest common subsequence problem can be found in..
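The quantity bounded in the abstract above can be explored numerically. The following is a minimal sketch, not the authors' method: it computes the length of a longest common subsequence with the textbook dynamic program and averages it over uniformly random sequences to estimate the ratio of expected length to n for a finite n. The function names `lcs_length` and `estimate_ratio`, the uniform sampling, and the fixed seed are assumptions made for illustration.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence of a and b.

    Textbook dynamic program using O(len(a) * len(b)) time and
    O(len(b)) memory (only the previous row is kept).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_ratio(n, k, trials, seed=0):
    """Monte Carlo estimate of f(n) / n for two uniform random
    sequences of length n over an alphabet of k symbols.

    Hypothetical helper: the sampling scheme is an assumption,
    not something taken from the paper.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    # Empirical ratio for a binary alphabet at a modest n.
    print(estimate_ratio(n=200, k=2, trials=20))
```

Such an estimate only samples the finite-n behaviour; it does not by itself give an upper bound of the kind the paper derives from counting collations.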

    Simple Maximum Likelihood Methods for the Optical Mapping Problem

    Recently a new method for obtaining restriction maps was developed by David Schwartz at NYU. Using this method, restriction maps are created from fluorescent images of individual molecules obtained using a microscope. For every individual observed molecule, image processing methods are used to generate a list of the approximate locations of the sites where the molecule is cut by the restriction enzyme. Our task is to find the location of all restriction sites given the observed cutting sites. This is also complicated by the fact that the orientation of the molecules is unknown, i.e. for a cut-site $x$ we do not know whether $x$ or $1 - x$ corresponds to a restriction site in a unit length molecule. First we consider the case that the orientation of all molecules and the number $c$ of restriction sites are known. We suppose that for each restriction site location $y_j$ the corresponding measured cut-sites follow the normal distribution with the density function g(x; ` j ; $\sigma_j$) for some $\sigma_j$ ..

    Efficiency of Database Search for Identification of Mutated and Modified Proteins via Mass Spectrometry

    Although protein identification by matching tandem mass spectra (MS/MS) against protein databases is a widespread tool in mass spectrometry, the question about reliability of such searches remains open. Absence of rigorous significance scores in MS/MS database search makes it difficult to discard random database hits and may lead to erroneous protein identification, particularly in the case of mutated or post-translationally modified peptides. This problem is especially important for high-throughput MS/MS projects when the possibility of expert analysis is limited. Thus, algorithms that sort out reliable database hits from unreliable ones and identify mutated and modified peptides are sought. Most MS/MS database search algorithms rely on variations of the Shared Peaks Count approach that scores pairs of spectra by the peaks (masses) they have in common. Although this approach proved to be useful, it has a high error rate in identification of mutated and modified peptides. We describe new MS/MS database search tools, MS-CONVOLUTION and MS-ALIGNMENT, which implement the spectral convolution and spectral alignment approaches to peptide identification. We further analyze these approaches to identification of modified peptides and demonstrate their advantages over the Shared Peaks Count. We also use the spectral alignment approach as a filter in a new database search algorithm that reliably identifies peptides differing by up to two mutations/modifications from a peptide in a database
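The Shared Peaks Count mentioned in the abstract above scores two spectra by the masses they have in common. A minimal sketch of that idea follows, under stated assumptions: spectra are plain lists of peak masses, peaks are matched one-to-one within an absolute `tolerance`, and the function name `shared_peaks_count` is hypothetical. It is not the MS-CONVOLUTION or MS-ALIGNMENT implementation.

```python
def shared_peaks_count(spectrum_a, spectrum_b, tolerance=0.5):
    """Count peaks of spectrum_a that can be paired one-to-one with
    a peak of spectrum_b whose mass differs by at most `tolerance`.

    Both inputs are iterables of peak masses. The matching is a
    greedy two-pointer sweep over the sorted masses.
    """
    a = sorted(spectrum_a)
    b = sorted(spectrum_b)
    i = j = count = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tolerance:
            # Peaks agree within tolerance: count and consume both.
            count += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is too small to match any remaining b peak
        else:
            j += 1  # b[j] is too small to match any remaining a peak
    return count


if __name__ == "__main__":
    unmodified = [100.0, 200.0, 300.0, 400.0]
    # Illustrative values only: the last two peaks are shifted by a
    # made-up mass offset, standing in for a modification.
    modified = [100.0, 200.0, 316.0, 416.0]
    print(shared_peaks_count(unmodified, unmodified))
    print(shared_peaks_count(unmodified, modified))
```

In this toy example the shifted peaks no longer match, so the count drops even though the two spectra are closely related, which is the weakness for mutated and modified peptides that the abstract describes.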

    Local Rules for Protein Folding on a Triangular Lattice and Generalized Hydrophobicity in the HP Model

    We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill [3], which models a protein as a chain of amino acid residues that are either hydrophobic or polar, with hydrophobic interactions as the dominant initial driving force for protein folding. Hart and Istrail [5] gave approximation algorithms for folding proteins on the cubic lattice under the HP model. In this paper, we examine the choice of a lattice by considering its algorithmic and geometric implications and argue that a triangular lattice is a more reasonable choice. We present a set of folding rules for a triangular lattice and analyze the approximation ratio which they achieve. In addition, we introduce a generalization of the HP model to account for residues having different levels of hydrophobicity. After describing the biological foundation for this generalization, we show that in the new model we ar..
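To make the HP model concrete, the sketch below scores a given fold by counting hydrophobic contacts. It rests on several assumptions not stated in the abstract: a two-dimensional triangular lattice in axial coordinates (six neighbours per point), the usual HP scoring in which each pair of H residues on adjacent lattice points that are not consecutive in the chain counts as one contact, and the hypothetical names `NEIGHBORS`, `is_valid_fold` and `hh_contacts`. It evaluates a fold only and does not implement the paper's folding rules.

```python
# Six neighbour offsets of a point on a 2D triangular lattice,
# written in axial coordinates (an assumed representation).
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def is_valid_fold(coords):
    """True if the fold is self-avoiding and every pair of
    consecutive residues sits on adjacent lattice points."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        (x2 - x1, y2 - y1) in NEIGHBORS
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def hh_contacts(sequence, coords):
    """Number of H-H contacts of a fold.

    sequence: string over {'H', 'P'}.
    coords:   list of (x, y) lattice points, one per residue.
    """
    if len(sequence) != len(coords) or not is_valid_fold(coords):
        raise ValueError("coords must be a valid fold of the sequence")
    position = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in NEIGHBORS:
            j = position.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts


if __name__ == "__main__":
    # A three-residue chain bent into a triangle.
    print(hh_contacts("HHH", [(0, 0), (1, 0), (0, 1)]))
```

In the triangle example the first and third residues are lattice neighbours. On a square or cubic lattice two residues can be adjacent only if their chain positions differ in parity, so a contact between residues two apart cannot occur there; the sketch illustrates that geometric difference.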

    Predicting compound activity from phenotypic profiles and chemical structures

    Experimental assays are used to determine if compounds cause a desired activity in cells. Here the authors demonstrate that computational methods can predict compound bioactivity given their chemical structure, imaging and gene expression data from historic screening libraries