554 research outputs found

    A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine

    Apoptosis proteins play a central role in the development and homeostasis of an organism and are important for understanding the mechanism of programmed cell death. Based on the idea of coarse-grained description and grouping in physics, a new feature extraction method with grouped weight for protein sequences is presented and, combined with a support vector machine, applied to apoptosis protein subcellular localization prediction. For the same training dataset and the same predictive algorithm, the overall prediction accuracy of our method in the jackknife test is 13.2% and 15.3% higher than the accuracy based on the amino acid composition and the instability index, respectively. For the "else" class of apoptosis proteins in particular, the prediction accuracy increases by 41.7 and 33.3 percentage points, respectively. The experimental results show that the new feature extraction method efficiently extracts the structural information implicit in a protein sequence and performs satisfactorily despite its simplicity. The overall prediction accuracy of the EBGW_SVM model on dataset ZD98 reaches 92.9% in the jackknife test, which is 8.2–20.4 percentage points higher than other existing models. For a new dataset, ZW225, the overall prediction accuracy of EBGW_SVM reaches 83.1%. These results imply that EBGW_SVM is a simple but efficient model for apoptosis protein subcellular location prediction.
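
    The baseline feature and the evaluation protocol named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it computes the standard 20-dimensional amino acid composition and runs a jackknife (leave-one-out) test. The grouped-weight encoding is not specified in the abstract, so it is not reproduced here, and a 1-nearest-neighbour rule stands in for the support vector machine as an assumption. The sequences and labels are made up.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid composition (residue frequencies)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_label(query, vectors, labels):
    """1-nearest-neighbour stand-in for the SVM (squared Euclidean distance)."""
    best = min(
        range(len(vectors)),
        key=lambda i: sum((q - v) ** 2 for q, v in zip(query, vectors[i])),
    )
    return labels[best]


def jackknife_accuracy(seqs, labels):
    """Leave each protein out once, predict it from the rest, return accuracy."""
    feats = [composition(s) for s in seqs]
    hits = 0
    for i in range(len(feats)):
        rest_x = feats[:i] + feats[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        hits += nearest_label(feats[i], rest_x, rest_y) == labels[i]
    return hits / len(feats)


# Toy, made-up sequences and class names; not real apoptosis proteins.
toy_seqs = ["AAAAKK", "AAAKKK", "LLLLWW", "LLLWWW"]
toy_labels = ["cyto", "cyto", "memb", "memb"]
```

    Calling `jackknife_accuracy(toy_seqs, toy_labels)` gives the fraction of held-out sequences whose class is recovered from the remaining ones; any other feature encoding can be compared by replacing `composition`.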

    Methods for prediction of secondary structure in proteins

    The examination of protein structure is crucial for determining how proteins function in an organism. This work deals with the 1D, 2D and 3D structures into which proteins are organized in space. Emphasis is placed on secondary structure, which can be predicted directly from the amino acid sequence and then used to estimate the spatial structure. Computational methods address this task with algorithms that convert the sequence of amino acids into a sequence of secondary structures. Direct determination of the structure by building structural models is covered in the part on experimental methods (NMR spectroscopy, X-ray crystallography). The main aim of the work is a software implementation of a method for predicting protein secondary structure. The resulting program is supplemented by a graphical user interface. In the final part, the results of the program, which is based on the Chou-Fasman method, are compared with the outputs of freely available software from the Internet.

    Early molecular insights into thanatin analogues binding to A. baumannii LptA

    The cationic antimicrobial β-hairpin thanatin was recently developed into drug-like analogues active against carbapenem-resistant Enterobacteriaceae (CRE). The analogues represent new antibiotics with a novel mode of action: they target LptA in the periplasm and disrupt LPS transport. The compounds lose antimicrobial efficacy when the sequence identity to E. coli LptA falls below 70%. We wanted to test the thanatin analogues against LptA of a phylogenetically distant organism and investigate the molecular determinants of inactivity. Acinetobacter baumannii (A. baumannii) is a critical Gram-negative pathogen that has gained increasing attention for its multi-drug resistance and hospital burden. A. baumannii LptA shares 28% sequence identity with E. coli LptA, and A. baumannii displays an intrinsic resistance to thanatin and thanatin analogues (MIC values > 32 μg/mL) through a mechanism not yet described. We investigated the inactivity further and discovered that these CRE-optimized derivatives can bind to LptA of A. baumannii in vitro, despite the high MIC values. Herein, we present a high-resolution structure of A. baumannii LptAm in complex with thanatin derivative 7, together with binding affinities of selected thanatin derivatives. Together, these data offer structural insights into why thanatin derivatives are inactive against A. baumannii despite binding to its LptA in vitro.

    Characterisation of disulfide-rich peptides exploring potential wound healing properties

    Rozita Takjoo explored the potential of peptides as wound healing agents. She identified a bioactive region involved in cell proliferation and provided insight into the evolution of a disulfide-rich motif. These outcomes are likely to facilitate future drug development studies aimed at developing novel wound healing agents for diabetic ulcers

    Median topographic maps for biomedical data sets

    Median clustering extends popular neural data analysis methods, such as the self-organizing map or neural gas, to general data structures given only by a dissimilarity matrix. This offers flexible and robust global data inspection methods that are particularly suited to the variety of data occurring in biomedical domains. In this chapter, we give an overview of median clustering and its properties and extensions, with a particular focus on efficient implementations adapted to large-scale data analysis.
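
    The core idea, prototypes that are restricted to actual data points so that only a dissimilarity matrix is needed, can be illustrated with a short sketch. The code below is a simplified batch "median k-means" variant chosen for brevity. It is not the median self-organizing map or median neural gas covered in the chapter, and the function names and toy data are assumptions.

```python
def assign_to(D, prototypes):
    """Index of the closest prototype for every item, using only D."""
    return [
        min(range(len(prototypes)), key=lambda k: D[i][prototypes[k]])
        for i in range(len(D))
    ]


def median_clustering(D, prototypes, max_iter=100):
    """Cluster items given only a dissimilarity matrix D (list of lists).

    `prototypes` holds the initial prototype indices. Prototypes always stay
    restricted to data points, so no vector-space averaging is required.
    """
    n = len(D)
    prototypes = list(prototypes)
    for _ in range(max_iter):
        assign = assign_to(D, prototypes)
        new = []
        for k in range(len(prototypes)):
            members = [i for i in range(n) if assign[i] == k]
            if not members:  # keep an empty cluster's prototype unchanged
                new.append(prototypes[k])
                continue
            # Generalized median: the data point with the smallest summed
            # dissimilarity to the cluster members.
            new.append(min(range(n), key=lambda c: sum(D[c][m] for m in members)))
        if new == prototypes:  # converged
            break
        prototypes = new
    return prototypes, assign_to(D, prototypes)


# Toy example: six points on a line, described only by pairwise distances.
xs = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in xs] for a in xs]
protos, labels = median_clustering(D, prototypes=[0, 1])
```

    Because each update scans all candidate points, one iteration costs on the order of n squared dissimilarity lookups, which is why efficient implementations matter for large data sets.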

    Donor selection in a pediatric stem cell transplantation cohort using PIRCHE and HLA‐DPB1 typing

    Background: New strategies to optimize donor selection for hematopoietic stem cell transplantation (HSCT) have mainly been evaluated in adults, but the disease spectrum requiring HSCT differs significantly in children, with consequences for the risk of complications such as graft-versus-host disease (GvHD). Procedures: Here we evaluated whether HLA-DPB1 and Predicted Indirectly ReCognizable HLA-Epitope (PIRCHE) matching can improve donor selection and minimize the risks specific to a pediatric cohort undergoing HSCT in Berlin between 2014 and 2016. Results: The percentage of HLA-DPB1–mismatched HSCT in the pediatric cohort was in line with the general distribution among matched unrelated donor HSCT. Nonpermissive HLA-DPB1 mismatches were not associated with a higher incidence of GvHD, but the incidence of relapse was higher in patients who received HLA-DPB1–matched transplants. High PIRCHE-I scores were associated with a significantly higher risk of developing GvHD in patients undergoing HSCT from nine-of-ten matched unrelated donors. This finding persisted after HLA-DPB1 was included in the PIRCHE analysis. Conclusions: Implementing PIRCHE typing in the donor selection process for HSCT in children could particularly benefit children with nonmalignant diseases, and our results support further validation of PIRCHE-based donor selection in a larger number of children treated at different sites.

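    Spatial statistics for the analysis of chemical datasets for the validation of virtual screening techniques (Räumliche Statistik zur Analyse Chemischer Datensätze zur Validierung von Techniken des Virtuellen Screenings)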

    A common finding of many reports evaluating virtual screening methods is that validation results vary considerably with changing benchmark datasets. It is widely assumed that these effects are caused by the redundancy and cluster structure inherent in those datasets. These phenomena manifest themselves in the way a dataset is arranged in descriptor space, which is termed the dataset topology. A methodology for characterizing dataset topology based on spatial statistics is introduced. With this methodology, differences in virtual screening performance on different datasets can be associated with differences in dataset topology. Moreover, the better virtual screening performance of certain descriptors can be explained by their ability to represent the benchmark datasets with a more favorable topology. It is shown that the composition of some benchmark datasets causes topologies that lead to over-optimistic validation results even in very "simple" descriptor spaces. Spatial statistics analysis as proposed here facilitates the detection of such biased datasets and provides a tool for the design of unbiased benchmark datasets. General principles for the design of benchmark datasets that are not affected by topological bias were developed. Refined Nearest Neighbor Analysis was used to design benchmark datasets based on PubChem bioactivity data. A workflow is devised that purges unselective hits from datasets of compounds active against pharmaceutically relevant targets. Topological optimization using experimental design strategies was applied to generate corresponding datasets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These datasets provide a tool for a Maximum Unbiased Validation (MUV) of virtual screening methods.
    The datasets and a MATLAB toolbox for spatial statistics are freely available on the enclosed CD-ROM or via the internet at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.

    Uncovering Intratumoral And Intertumoral Heterogeneity Among Single-Cell Cancer Specimens

    While several tools have been developed to map axes of variation among individual cells, no analogous approaches exist for identifying axes of variation among multicellular biospecimens profiled at single-cell resolution. Developing such an approach is of great translational relevance and interest, as single-cell expression data are now often collected across numerous experimental conditions (e.g., representing different drug perturbation conditions, CRISPR knockdowns, or patients undergoing clinical trials) that need to be compared. In this work, “Phenotypic Earth Mover's Distance” (PhEMD) is presented as a solution to this problem. PhEMD is a general method for embedding a “manifold of manifolds,” in which each datapoint in the higher-level manifold (of biospecimens) represents a collection of points that span a lower-level manifold (of cells). PhEMD is applied to a newly generated, 300-biospecimen mass cytometry drug screen experiment to map small-molecule inhibitors based on their differing effects on breast cancer cells undergoing epithelial–mesenchymal transition (EMT). These experiments highlight EGFR and MEK1/2 inhibitors as strongly halting EMT at an early stage and PI3K/mTOR/Akt inhibitors as enriching for a drug-resistant mesenchymal cell subtype characterized by high expression of phospho-S6. More generally, these experiments reveal that the final mapping of perturbation conditions has low intrinsic dimension and that the network of drugs demonstrates manifold structure, providing insight into how these single-cell experiments should be computationally modeled and visualized. In the presented drug-screen experiment, the full spectrum of perturbation effects could be learned by profiling just a small fraction (11%) of drugs. Moreover, PhEMD could be integrated with complementary datasets to infer the phenotypes of biospecimens not directly profiled at single-cell resolution.
    Together, these findings have major implications for conducting future drug-screen experiments: they suggest that large-scale drug screens can be conducted by measuring only a small fraction of the drugs using the most expensive high-throughput single-cell technologies, while the effects of the other drugs may be inferred by mapping and extending the perturbation space. PhEMD is also applied to patient tumor biopsies to assess intertumoral heterogeneity. Applied to a melanoma dataset and a clear-cell renal cell carcinoma (ccRCC) dataset, PhEMD maps tumors in the same way it maps perturbation conditions, in order to learn the key axes along which tumors vary with respect to their tumor-infiltrating immune cells. In both datasets, PhEMD highlights a subset of tumors with a marked enrichment of exhausted CD8+ T cells. The wide variability in tumor-infiltrating immune cell abundance, and the particularly prominent exhausted CD8+ T-cell subpopulation, highlight the importance of careful patient stratification when assessing clinical response to T cell-directed immunotherapies. Altogether, this work highlights PhEMD's potential to facilitate drug discovery and patient stratification efforts by uncovering the network geometry of a large collection of single-cell biospecimens. Our varied experiments demonstrate that PhEMD is highly scalable, compatible with leading batch effect correction techniques, and generalizable to multiple experimental designs, with clear applicability to modern precision oncology efforts.
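
    As the method's name indicates, the comparison of biospecimens rests on the Earth Mover's Distance. The sketch below shows that distance in its simplest setting and is not the PhEMD implementation: it assumes, hypothetically, that each biospecimen is summarized as proportions over the same ordered cell-state bins with unit spacing between neighbouring bins. The function names and the toy proportions are made up for illustration.

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two histograms over the same ordered bins.

    Assumes unit spacing between adjacent bins and that p and q each sum to 1.
    In this 1-D case the distance equals the summed absolute difference of
    the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    carried = 0.0  # mass that still has to be moved past the current bin
    total = 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi
        total += abs(carried)
    return total


def pairwise_emd(samples):
    """Specimen-by-specimen distance matrix, usable as input to an embedding."""
    return [[emd_1d(a, b) for b in samples] for a in samples]


# Made-up proportions over three ordered cell states (early, mid, late).
untreated = [0.0, 0.0, 1.0]
drug_a = [1.0, 0.0, 0.0]   # all cells held at the early state
drug_b = [0.0, 0.5, 0.5]
dist = pairwise_emd([untreated, drug_a, drug_b])
```

    The resulting matrix treats each biospecimen as a single point, so any embedding or clustering method that accepts pairwise distances can be applied to it.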

    Self-organization of polymers in bulk and at interfaces

    Fully atomistic analysis of polymeric systems is computationally very demanding because the time and length scales involved span several orders of magnitude. At the same time, many properties of polymers are universal in the sense that they do not depend on the chemical nature of the constituent monomers. This makes coarse-grained methods, such as self-consistent field (SCF) modeling, an ideal tool for studying them. In this thesis we employ SCF modeling to study the intra- and intermolecular self-organization of polymers and the ordering of polymers near interfaces. Where possible, the results are compared to experiments and to predictions of analytical theories.