244 research outputs found

    A Multi-Label Predictor for Identifying the Subcellular Locations of Singleplex and Multiplex Eukaryotic Proteins

    Subcellular locations of proteins are important functional attributes. An effective and efficient subcellular localization predictor is necessary for rapidly and reliably annotating subcellular locations of proteins. Most existing subcellular localization methods deal only with single-location proteins. In fact, proteins may simultaneously exist at, or move between, two or more different subcellular locations. To better reflect the characteristics of multiplex proteins, it is highly desirable to develop new methods for dealing with them. In this paper, a new predictor called Euk-ECC-mPLoc has been developed that can be used to deal with systems containing both singleplex and multiplex eukaryotic proteins. It introduces a powerful multi-label learning approach that exploits correlations between subcellular locations and hybridizes gene ontology with dipeptide composition information. It can be utilized to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Experimental results on a stringent benchmark dataset of eukaryotic proteins by the jackknife cross-validation test show that the average success rate and overall success rate obtained by Euk-ECC-mPLoc were 69.70% and 81.54%, respectively, indicating that our approach is quite promising. In particular, the success rates achieved by Euk-ECC-mPLoc for small subsets were remarkably improved, indicating that it holds a high potential for stimulating the development of the area.
As a user-friendly web-server, Euk-ECC-mPLoc is freely accessible to the public at the website http://levis.tongji.edu.cn:8080/bioinfo/Euk-ECC-mPLoc/. We believe that Euk-ECC-mPLoc may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying subcellular locations of eukaryotic proteins.
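
The abstract above mentions dipeptide composition as one of the feature sources. As a rough illustration only, dipeptide composition is commonly computed as the normalized frequency of each of the 400 ordered pairs of adjacent standard amino acids. The sketch below assumes that common definition; the function name and the handling of non-standard residues are illustrative choices, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-D list of dipeptide frequencies for a protein sequence.

    Assumption: pairs containing a non-standard residue (e.g. 'X') are
    skipped, and frequencies are normalized by the number of counted pairs.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or contains no valid pairs.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

For a toy sequence such as "ACAC", the three adjacent pairs are AC, CA, and AC, so the AC entry holds two thirds of the mass and the CA entry one third.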

    Prediction of Protein Domain with mRMR Feature Selection and Analysis

    The domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting protein domains from sequence information alone, so as to facilitate the structure prediction of proteins and speed up their functional annotation. However, although many efforts have been made in this regard, prediction of protein domains from sequence information remains a challenging and elusive problem. Here, a new method was developed by combining the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), as well as by incorporating the features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28–40% higher than those achieved by the existing method on the same benchmark dataset. Furthermore, an in-depth analysis revealed that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, quite consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool for annotating protein domains, or may, at the very least, play a complementary role to the existing domain prediction methods, and that the findings about the key features with high impact on domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be utilized to study protein signal peptides, B-cell epitopes, and HIV protease cleavage sites, among many other important topics in protein science and biomedicine.

    STM-induced light emission from thin films of perylene derivatives on the HOPG and Au substrates

    We have investigated the emission properties of N,N'-diheptyl-3,4,9,10-perylenetetracarboxylic diimide thin films by the tunneling-electron-induced light emission technique. A fluorescence peak with vibronic progressions and large Stokes shifts was observed on both highly ordered pyrolytic graphite (HOPG) and Au substrates, indicating that the emission was derived from the isolated-molecule-like film condition with sufficient π-π interaction of the perylene rings of perylenetetracarboxylic diimide molecules. The upconversion emission mechanism of the tunneling-electron-induced emission was discussed in terms of inelastic tunneling including multiexcitation processes. The wavelength-selective enhanced emission due to a localized tip-induced surface plasmon on the Au substrate was also obtained.

    A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency

    Background: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. Results: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, with more than 25,000 variants having less than 20% allele frequency and 1653 variants in COSMIC-related genes. This is 5-100x more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. Conclusion: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
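
The admixture idea in the abstract above comes down to simple dilution arithmetic. The sketch below is a minimal illustration under two stated assumptions that are not taken from the paper: the normal sample carries only reference alleles at the variant position, and each sample contributes genome copies in proportion to its mixing fraction. The function names are hypothetical.

```python
def diluted_vaf(vaf_in_a, fraction_a):
    """Expected allele frequency after mixing Sample A with a normal sample.

    vaf_in_a:   allele frequency of the variant in Sample A (0..1).
    fraction_a: fraction of genome copies in the mix that come from Sample A.
    Assumes the normal sample has no copies of the variant allele.
    """
    if not 0.0 <= vaf_in_a <= 1.0 or not 0.0 <= fraction_a <= 1.0:
        raise ValueError("frequencies and fractions must lie in [0, 1]")
    return vaf_in_a * fraction_a


def fraction_a_for_target(vaf_in_a, target_vaf):
    """Fraction of Sample A needed to bring a variant down to target_vaf."""
    if not 0.0 < vaf_in_a <= 1.0 or not 0.0 <= target_vaf <= vaf_in_a:
        raise ValueError("need 0 <= target_vaf <= vaf_in_a <= 1 and vaf_in_a > 0")
    return target_vaf / vaf_in_a
```

Under these assumptions, a variant present at 1% in Sample A would reach 0.02% when Sample A makes up 2% of the mix. The mixing ratios actually used by the authors are not stated in this abstract.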

    Heterogeneity of mammary lesions represent molecular differences

    BACKGROUND: Human breast cancer is a heterogeneous disease, histopathologically, molecularly and phenotypically. The molecular basis of this heterogeneity is not well understood. We have used a mouse model of DCIS that consists of unique lines of mammary intraepithelial neoplasia (MIN) outgrowths, the premalignant lesion in the mouse that progresses to invasive carcinoma, to understand the molecular changes that are characteristic of certain phenotypes. Each MIN-O line has distinguishable morphologies, metastatic potentials and estrogen dependencies. METHODS: We utilized oligonucleotide expression arrays and high resolution array comparative genomic hybridization (aCGH) to investigate whole genome expression patterns and whole genome aberrations in both the MIN-O and tumor from four different MIN-O lines that each have different phenotypes. From the whole genome analysis at 35 kb resolution, we found that chromosomes 1, 2, 10, and 11 were frequently associated with whole chromosome gains in the MIN-Os. In particular, two MIN-O lines had the majority of the chromosome gains. Although we did not find any whole chromosome loss, we identified 3 recurring chromosome losses (2F1-2, 3E4, 17E2) and two chromosome copy number gains on chromosome 11. These interstitial deletions and duplications were verified with a custom made array designed to interrogate the specific regions at approximately 550 bp resolution. RESULTS: We demonstrated that expression and genomic changes are present in the early premalignant lesions and that these molecular profiles can be correlated to phenotype (metastasis and estrogen responsiveness). We also identified expression changes associated with genomic instability. Progression to invasive carcinoma was associated with few additional changes in gene expression and genomic organization.
Therefore, in the MIN-O mice, early premalignant lesions have the major molecular and genetic changes required and these changes have important phenotypic significance. In contrast, the changes that occur in the transition to invasive carcinoma are subtle, with few consistent changes and no association with phenotype. CONCLUSION: We propose that the early lesions carry the important genetic changes that reflect the major phenotypic information, while additional genetic changes that accumulate in the invasive carcinoma are less associated with the overall phenotype.

    Classification and Analysis of Regulatory Pathways Using Graph Property, Biochemical and Physicochemical Property, and Functional Property

    Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first effort in attempting to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area.
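
The jackknife test cited in the abstract above is leave-one-out cross-validation: each sample is held out in turn, a model is built from the remaining samples, and the held-out sample is predicted. The sketch below illustrates only that evaluation loop. The 1-nearest-neighbor classifier is a stand-in chosen for brevity, not the predictor used in the paper, and all names are hypothetical.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training sample closest to `query`.

    `train` is a list of (feature_tuple, label) pairs. Euclidean distance
    is an assumption made for this illustration.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_success_rate(samples):
    """Overall success rate under the jackknife (leave-one-out) test."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    correct = 0
    for i, (features, label) in enumerate(samples):
        # Hold out sample i and train on everything else.
        train = samples[:i] + samples[i + 1:]
        if nearest_neighbor_predict(train, features) == label:
            correct += 1
    return correct / len(samples)
```

Because every sample is tested exactly once against a model that never saw it, the result does not depend on a random split, which is one reason this test is often reported on small benchmark datasets.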

    Cross-oncopanel study reveals high sensitivity and accuracy with overall analytical performance depending on genomic regions

    Background: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase 2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. Results: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. Conclusion: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
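
The two filters discussed in the abstract above, a VAF reporting threshold and restriction to high-confidence regions, can be pictured as a simple post-filter over variant calls. The sketch below is an illustration only: the `Call` record, the half-open region convention, and any threshold value passed in are hypothetical and not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A simplified variant call (hypothetical record for this sketch)."""
    chrom: str
    pos: int
    vaf: float  # variant allele frequency, 0..1


def in_regions(call, regions):
    """True if the call falls in any (chrom, start, end) half-open interval."""
    return any(
        chrom == call.chrom and start <= call.pos < end
        for chrom, start, end in regions
    )


def reportable_calls(calls, regions, min_vaf):
    """Keep calls at or above min_vaf that lie inside the given regions."""
    return [c for c in calls if c.vaf >= min_vaf and in_regions(c, regions)]
```

As the abstract notes, such a threshold trades sensitivity for fewer false positives: a true variant whose measured VAF fluctuates below the cutoff is dropped along with the noise.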