57 research outputs found

    A quantitative literature-curated gold standard for kinase-substrate pairs

    We describe the Yeast Kinase Interaction Database (KID, http://www.moseslab.csb.utoronto.ca/KID/), which contains high- and low-throughput data relevant to phosphorylation events. KID includes 6,225 low-throughput and 21,990 high-throughput interactions from more than 35,000 experiments. By quantitatively integrating these data, we identified 517 high-confidence kinase-substrate pairs that we consider a gold standard. We show that this gold standard can be used to assess published high-throughput datasets, suggesting that it will enable similarly rigorous assessments in the future.

    Genetic and lifestyle risk factors for MRI-defined brain infarcts in a population-based setting.

    OBJECTIVE: To explore genetic and lifestyle risk factors of MRI-defined brain infarcts (BI) in large population-based cohorts. METHODS: We performed meta-analyses of genome-wide association studies (GWAS) and examined associations of vascular risk factors and their genetic risk scores (GRS) with MRI-defined BI and a subset of BI, namely, small subcortical BI (SSBI), in 18 population-based cohorts (n = 20,949) from 5 ethnicities (3,726 with BI, 2,021 with SSBI). Top loci were followed up in 7 population-based cohorts (n = 6,862; 1,483 with BI, 630 with SSBI), and we tested associations with related phenotypes including ischemic stroke and pathologically defined BI. RESULTS: The mean prevalence was 17.7% for BI and 10.5% for SSBI, steeply rising after age 65. Two loci showed genome-wide significant association with BI: FBN2, p = 1.77 × 10⁻⁸; and LINC00539/ZDHHC20, p = 5.82 × 10⁻⁹. Both have been associated with blood pressure (BP)-related phenotypes, but did not replicate in the smaller follow-up sample or show associations with related phenotypes. Age- and sex-adjusted associations with BI and SSBI were observed for BP traits (p value for BI, p [BI] = 9.38 × 10⁻²⁵; p [SSBI] = 5.23 × 10⁻¹⁴ for hypertension), smoking (p [BI] = 4.4 × 10⁻¹⁰; p [SSBI] = 1.2 × 10⁻⁴), diabetes (p [BI] = 1.7 × 10⁻⁸; p [SSBI] = 2.8 × 10⁻³), previous cardiovascular disease (p [BI] = 1.0 × 10⁻¹⁸; p [SSBI] = 2.3 × 10⁻⁷), stroke (p [BI] = 3.9 × 10⁻⁶⁹; p [SSBI] = 3.2 × 10⁻²⁴), and MRI-defined white matter hyperintensity burden (p [BI] = 1.43 × 10⁻¹⁵⁷; p [SSBI] = 3.16 × 10⁻¹⁰⁶), but not with body mass index or cholesterol. GRS of BP traits were associated with BI and SSBI (p ≤ 0.0022), without indication of directional pleiotropy. CONCLUSION: In this multiethnic GWAS meta-analysis, including over 20,000 population-based participants, we identified genetic risk loci for BI requiring validation once additional large datasets become available. High BP, including genetically determined, was the most significant modifiable, causal risk factor for BI.

    Automated analysis of high‐content microscopy data with deep learning

    Existing computational pipelines for quantitative analysis of high‐content microscopy data rely on traditional machine learning approaches that fail to accurately classify more than a single dataset without substantial tuning and training, requiring extensive analysis. Here, we demonstrate that the application of deep learning to biological image data can overcome the pitfalls associated with conventional machine learning classifiers. Using a deep convolutional neural network (DeepLoc) to analyze yeast cell images, we show improved performance over traditional approaches in the automated classification of protein subcellular localization. We also demonstrate the ability of DeepLoc to classify highly divergent image sets, including images of pheromone‐arrested cells with abnormal cellular morphology, as well as images generated in different genetic backgrounds and in different laboratories. We offer an open‐source implementation that enables updating DeepLoc on new microscopy datasets. This study highlights deep learning as an important tool for the expedited analysis of high‐content microscopy data.

    Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins

    Protein subcellular localization has been systematically characterized in budding yeast using fluorescently tagged proteins. Based on the fluorescence microscopy images, subcellular localization of many proteins can be classified automatically using supervised machine learning approaches that have been trained to recognize predefined image classes based on statistical features. Here, we present an unsupervised analysis of protein expression patterns in a set of high-resolution, high-throughput microscope images. Our analysis is based on 7 biologically interpretable features which are evaluated on automatically identified cells, and whose cell-stage dependency is captured by a continuous model for cell growth. We show that it is possible to identify most previously identified localization patterns in a cluster analysis based on these features and that similarities between the inferred expression patterns contain more information about protein function than can be explained by a previous manual categorization of subcellular localization. Furthermore, the inferred cell stage associated with each fluorescence measurement allows us to visualize large groups of proteins entering the bud at specific stages of bud growth. These correspond to proteins localized to organelles, revealing that the organelles must be entering the bud in a stereotypical order. We also identify and organize a smaller group of proteins that show subtle differences in the way they move around the bud during growth. Our results suggest that biologically interpretable features based on explicit models of cell morphology will yield unprecedented power for pattern discovery in high-resolution, high-throughput microscopy images.

    Data from: Ellipsoid segmentation model for analyzing light-attenuated 3D confocal image stacks of fluorescent multi-cellular spheroids

    In oncology, two-dimensional in-vitro culture models are the standard test beds for the discovery and development of cancer treatments, but in recent decades evidence has emerged that such models have low predictive value for clinical efficacy. They are therefore increasingly complemented by more physiologically relevant 3D models, such as spheroid micro-tumor cultures. If suitable fluorescent labels are applied, confocal 3D image stacks can characterize the structure of such volumetric cultures and, for example, cell proliferation. However, several issues hamper accurate analysis. In particular, signal attenuation within the tissue of the spheroids prevents the acquisition of a complete image for spheroids over 100 micrometers in diameter. In addition, quantitative analysis of large 3D image data sets is challenging, creating a need for methods that can be applied to large-scale experiments and account for impeding factors. We present a robust, computationally inexpensive 2.5D method for the segmentation of spheroid cultures and for counting proliferating cells within them. The spheroids are assumed to be approximately ellipsoid in shape. They are identified from information present in the Maximum Intensity Projection (MIP) and the corresponding height view, also known as the Z-buffer. The method includes a compensation for signal attenuation and alerts the user when potential bias-introducing factors cannot be compensated for.

    S4_File: 3D image stack, LNCaP cancer spheroids in co-culture with CAFs, control sample (DMSO), well A05, field 3, stack 335

    3D image stack of a fluorescent multi-cellular 3D spheroid culture consisting of LNCaP human prostate cancer cells (ATCC, Rockville, USA) and CAF-PF179T human cancer-associated fibroblasts (a cell line obtained from the Weizmann Institute, via the PREDECT consortium).

    Yeast cell identification.

    a) The mother-bud assignment heuristic. Pairs of circular objects that reciprocally have the largest and smallest sizes among neighboring areas are said to be ‘mother’ cells (indicated by M) and ‘bud’ cells (indicated by B, mother-bud pairs indicated by bidirectional arrows), unless the potential ‘bud’ cell has a smaller neighbor than itself (indicated by a unidirectional arrow). Any other cells are labelled as ‘lone’ cells (L). b) Example of low and high confidence objects. The cyan lines in each image represent the cell contours produced, and the white dots indicate the predicted bud neck position. The dashed objects represent obvious artifacts that were filtered using thresholds (see text for details). Objects on the edge of images were not automatically filtered out, but are expected to have low confidence.
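
The heuristic in panel a) can be read as a reciprocal nearest-size rule over neighboring objects. The following is a minimal sketch of one possible reading, not the authors' implementation: the function name `assign_mother_bud`, the dictionary inputs, the assumption that neighbor relations are symmetric, and the use of object area as "size" are all hypothetical choices made for illustration.

```python
def assign_mother_bud(sizes, neighbors):
    """Label objects as mother ('M'), bud ('B') or lone ('L').

    sizes:     dict mapping object id -> area (assumed size measure)
    neighbors: dict mapping object id -> set of adjacent object ids
               (assumed symmetric: if a lists b, then b lists a)
    """
    labels = {cell: "L" for cell in sizes}  # everything starts as 'lone'
    pairs = []
    for m in sizes:
        if not neighbors.get(m):
            continue  # no neighbors, stays 'lone'
        # Candidate bud: the smallest neighbor of the candidate mother.
        b = min(neighbors[m], key=lambda c: sizes[c])
        if sizes[b] >= sizes[m]:
            continue  # a bud must be smaller than its mother
        # Reciprocity: the mother must be the largest neighbor of the bud.
        if max(neighbors[b], key=lambda c: sizes[c]) != m:
            continue
        # Exclusion: the candidate bud has a neighbor smaller than itself.
        if any(sizes[c] < sizes[b] for c in neighbors[b]):
            continue
        labels[m], labels[b] = "M", "B"
        pairs.append((m, b))
    return labels, pairs


# Made-up example: a-b touch; c is isolated; e-f-g form a chain.
example_sizes = {"a": 100, "b": 30, "c": 90, "e": 100, "f": 40, "g": 10}
example_neighbors = {
    "a": {"b"}, "b": {"a"}, "c": set(),
    "e": {"f"}, "f": {"e", "g"}, "g": {"f"},
}
example_labels, example_pairs = assign_mother_bud(example_sizes, example_neighbors)
print(example_labels)
```

In the chain e-f-g, the pair (e, f) is rejected because f has a smaller neighbor g, which corresponds to the unidirectional-arrow case in the caption under this reading.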

    Time profile clustering result.

    A heatmap with 4004 GFP-tagged strains ordered using maximum likelihood agglomerative clustering based on the time profiles of protein abundance and 5 morphological measures. Within manually selected clusters (colored bars), the fraction of proteins in the cluster that have the same subcellular localization or GO Annotation (the latter indicated with stars) is listed under Fraction. Log p-values were computed using the hypergeometric distribution to test against the null hypothesis that the cluster was drawn randomly from the protein annotations. Fold enrichment indicates the ratio of the Fraction of proteins in the cluster with each annotation compared to that in the protein collection. Nuclear proteins appear in the bud at a specific time (dashed line).
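
The Fraction, fold enrichment and hypergeometric p-value described in this caption can be illustrated with a short sketch. It assumes an upper-tail test (probability of seeing at least the observed number of annotated proteins in a randomly drawn cluster); the caption does not state the tail or the log base, so both are assumptions, and the function name and the example numbers are hypothetical.

```python
from math import comb, log10


def hypergeom_enrichment(population, annotated, cluster, hits):
    """Return (fraction, fold_enrichment, p_value) for one cluster/annotation.

    population: number of proteins in the whole collection
    annotated:  number of proteins in the collection with the annotation
    cluster:    number of proteins in the cluster
    hits:       number of proteins in the cluster with the annotation
    """
    if not (0 < annotated <= population and 0 < cluster <= population):
        raise ValueError("annotated and cluster must be in 1..population")
    upper = min(annotated, cluster)
    if not (0 <= hits <= upper):
        raise ValueError("hits must be in 0..min(annotated, cluster)")
    # P(X >= hits) when `cluster` proteins are drawn without replacement.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, upper + 1)
    )
    p_value = tail / comb(population, cluster)
    fraction = hits / cluster              # 'Fraction' within the cluster
    background = annotated / population    # same fraction in the collection
    return fraction, fraction / background, p_value


# Made-up numbers for illustration only (not taken from the figure).
fraction, fold, p = hypergeom_enrichment(population=10, annotated=4, cluster=3, hits=2)
print(fraction, fold, log10(p))
```

Exact integer binomial coefficients avoid a dependency on a statistics library; for very large collections a library routine for the hypergeometric survival function would likely be faster.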

    Morphological distances.

    a) Heatmap of the mean morphological distance features for each of the 3 cell classes automatically labelled: ‘bud’, ‘mother’ and ‘lone’ (columns indicated by ‘B’, ‘M’ and ‘L’ respectively). The proteins at the two extremes are enriched in cell periphery and nucleolus proteins. b) Three examples of the morphological distances extracted from the heatmap. Although the heatmap only shows the mean, we also compute the standard deviation (error bars). c) Examples of cells from the strains indicated in b). The spread of GFP fluorescence is greater than the RFP for the first three proteins, and less than RFP for the last three.