49 research outputs found

    Fuzzy clustering of CPP family in plants with evolution and interaction analyses

    Get PDF
    Background: Transcription factors have been studied intensively because they play an important role in gene expression regulation. However, the transcription factors in the CPP family (cystein-rich polycomb-like protein), compared with other transcription factor families, have not received sufficient attention, despite their wide prevalence in a broad spectrum of species, from plants to animals. The total number of known CPP transcription factors in plants is 111 from 16 plants, but only 2 of them have been studied so far, namely TSO1 and CPP1 in Arabidopsis thaliana and soybean, respectively. Methods: In this work, to study their functions, we applied the fuzzy clustering method to all plant CPP transcription factors. The feature vector of each protein sequence for the fuzzy clustering method is encoded by the short length peptides and the combination of functional domain models. Results and conclusions: With the fuzzy clustering method, all plant CPP transcription factors are grouped into two subfamilies. A systems approach, including Expressed Sequence Tag analysis, evolutionary analysis, proteinprotein interaction network analysis and co-expression analysis, is employed to validate the clustering results, the results of which also indicates that the transcription factors from different subfamilies show uncorrelated responses

    New methods to measure residues coevolution in proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The covariation of two sites in a protein is often used as the degree of their coevolution. To quantify the covariation many methods have been developed and most of them are based on residues position-specific frequencies by using the mutual information (MI) model.</p> <p>Results</p> <p>In the paper, we proposed several new measures to incorporate new biological constraints in quantifying the covariation. The first measure is the mutual information with the amino acid background distribution (MIB), which incorporates the amino acid background distribution into the marginal distribution of the MI model. The modification is made to remove the effect of amino acid evolutionary pressure in measuring covariation. The second measure is the mutual information of residues physicochemical properties (MIP), which is used to measure the covariation of physicochemical properties of two sites. The third measure called MIBP is proposed by applying residues physicochemical properties into the MIB model. Moreover, scores of our new measures are applied to a robust indicator <it>conn(k) </it>in finding the covariation signal of each site.</p> <p>Conclusions</p> <p>We find that incorporating amino acid background distribution is effective in removing the effect of evolutionary pressure of amino acids. Thus the MIB measure describes more biological background information for the coevolution of residues. Besides, our analysis also reveals that the covariation of physicochemical properties is a new aspect of coevolution information.</p

    Hard-Aware Point-to-Set Deep Metric for Person Re-identification

    Full text link
    Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion. Deep metric learning provides a satisfactory solution to person re-ID by training a deep network under supervision of metric loss, e.g., triplet loss. However, the performance of deep metric learning is greatly limited by traditional sampling methods. To solve this problem, we propose a Hard-Aware Point-to-Set (HAP2S) loss with a soft hard-mining scheme. Based on the point-to-set triplet loss framework, the HAP2S loss adaptively assigns greater weights to harder samples. Several advantageous properties are observed when compared with other state-of-the-art loss functions: 1) Accuracy: HAP2S loss consistently achieves higher re-ID accuracies than other alternatives on three large-scale benchmark datasets; 2) Robustness: HAP2S loss is more robust to outliers than other losses; 3) Flexibility: HAP2S loss does not rely on a specific weight function, i.e., different instantiations of HAP2S loss are equally effective. 4) Generality: In addition to person re-ID, we apply the proposed method to generic deep metric learning benchmarks including CUB-200-2011 and Cars196, and also achieve state-of-the-art results.Comment: Accepted to ECCV 201

    Proteogenomic characterization of endometrial carcinoma

    Get PDF
    We undertook a comprehensive proteogenomic characterization of 95 prospectively collected endometrial carcinomas, comprising 83 endometrioid and 12 serous tumors. This analysis revealed possible new consequences of perturbations to the p53 and Wnt/β-catenin pathways, identified a potential role for circRNAs in the epithelial-mesenchymal transition, and provided new information about proteomic markers of clinical and genomic tumor subgroups, including relationships to known druggable pathways. An extensive genome-wide acetylation survey yielded insights into regulatory mechanisms linking Wnt signaling and histone acetylation. We also characterized aspects of the tumor immune landscape, including immunogenic alterations, neoantigens, common cancer/testis antigens, and the immune microenvironment, all of which can inform immunotherapy decisions. Collectively, our multi-omic analyses provide a valuable resource for researchers and clinicians, identify new molecular associations of potential mechanistic significance in the development of endometrial cancers, and suggest novel approaches for identifying potential therapeutic targets

    L1pred: A Sequence-Based Prediction Tool for Catalytic Residues in Enzymes with the L1-logreg Classifier

    Get PDF
    To understand enzyme functions, identifying the catalytic residues is a usual first step. Moreover, knowledge about catalytic residues is also useful for protein engineering and drug-design. However, to experimentally identify catalytic residues remains challenging for reasons of time and cost. Therefore, computational methods have been explored to predict catalytic residues. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, by using the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on carefully designed datasets Data604 and Data63. With ten-fold cross-validation, L1pred showed the area under precision-recall curve (AUPR) and the area under ROC curve (AUC) of 0.2198 and 0.9494 on the training dataset, Data604, respectively. In addition, on the independent test dataset, Data63, it showed the AUPR and AUC values of 0.2636 and 0.9375, respectively. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm, and found that all the scores contributed more or less equally to the L1pred performance

    Frozen tissue coring and layered histological analysis improves cell type-specific proteogenomic characterization of pancreatic adenocarcinoma

    Get PDF
    Abstract Background Omics characterization of pancreatic adenocarcinoma tissue is complicated by the highly heterogeneous and mixed populations of cells. We evaluate the feasibility and potential benefit of using a coring method to enrich specific regions from bulk tissue and then perform proteogenomic analyses. Methods We used the Biopsy Trifecta Extraction (BioTExt) technique to isolate cores of epithelial-enriched and stroma-enriched tissue from pancreatic tumor and adjacent tissue blocks. Histology was assessed at multiple depths throughout each core. DNA sequencing, RNA sequencing, and proteomics were performed on the cored and bulk tissue samples. Supervised and unsupervised analyses were performed based on integrated molecular and histology data. Results Tissue cores had mixed cell composition at varying depths throughout. Average cell type percentages assessed by histology throughout the core were better associated with KRAS variant allele frequencies than standard histology assessment of the cut surface. Clustering based on serial histology data separated the cores into three groups with enrichment of neoplastic epithelium, stroma, and acinar cells, respectively. Using this classification, tumor overexpressed proteins identified in bulk tissue analysis were assigned into epithelial- or stroma-specific categories, which revealed novel epithelial-specific tumor overexpressed proteins. Conclusions Our study demonstrates the feasibility of multi-omics data generation from tissue cores, the necessity of interval H&E stains in serial histology sections, and the utility of coring to improve analysis over bulk tissue data

    Identification of RNA silencing components in soybean and sorghum

    Get PDF
    Background: RNA silencing is a process triggered by 21–24 small RNAs to repress gene expression. Many organisms including plants use RNA silencing to regulate development and physiology, and to maintain genome stability. Plants possess two classes of small RNAs: microRNAs (miRNAs) and small interfering RNAs (siRNAs). The frameworks of miRNA and siRNA pathways have been established in the model plant, Arabidopsis thaliana (Arabidopsis). Results: Here we report the identification of putative genes that are required for the generation and function of miRNAs and siRNAs in soybean and sorghum, based on knowledge obtained from Arabidopsis. The gene families, including DCL, HEN1, SE, HYL1, HST, RDR, NRPD1, NRPD2/NRPE2, NRPE1, and AGO, were analyzed for gene structures, phylogenetic relationships, and protein motifs. The gene expression was validated using RNA-seq, expressed sequence tags (EST), and reverse transcription PCR (RT-PCR). Conclusions: The identification of these components could provide not only insight into RNA silencing mechanism in soybean and sorghum but also basis for further investigation. All data are available at http://sysbio.unl.edu/

    Regulation of miRNA abundance by RNA binding protein TOUGH in \u3ci\u3eArabidopsis\u3c/i\u3e

    Get PDF
    MicroRNAs (miRNAs) are regulators of gene expression in plants and animals. The biogenesis of miRNAs is precisely controlled to secure normal development of organisms. Here we report that TOUGH (TGH) is a component of the DCL1–HYL1–SERRATE complex that processes primary transcripts of miRNAs [i.e., primary miRNAs (pri-miRNAs)] into miRNAs in Arabidopsis. Lack of TGH impairs multiple DCL activities in vitro and reduces the accumulation of miRNAs and siRNAs in vivo. TGH is an RNA-binding protein, binds pri-miRNAs and precursor miRNAs in vivo, and contributes to pri-miRNA–HYL1 interaction. These results indicate that TGH might regulate abundance of miRNAs through promoting DCL1 cleavage efficiency and/or recruitment of pri-miRNAs. Includes Supporting Information
    corecore