18 research outputs found

    The High Throughput Sequence Annotation Service (HT-SAS) – the shortcut from sequence to true Medline words

    Background: Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step that allows facts to become knowledge. Currently there are limited ways to annotate such data automatically, especially using information deposited in the published literature. Results: To aid researchers in describing results from high-throughput experiments, we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein, a pool of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using a binomial statistics approximation. We tested the automatic approach on a protein test set from SGD to determine its accuracy and usefulness. We also applied the automatic annotation service to improve annotations of proteins from Plasmodium berghei expressed exclusively during the blood stage. Conclusion: Using HT-SAS, we created new annotations, or enriched already established ones, for over 20% of the proteins from Plasmodium berghei expressed in the blood stage and deposited in PlasmoDB. Our tests show that this approach to information extraction provides highly specific keywords, often even when the number of abstracts is limited. Our service should be useful for manual curators, as a complement to manually curated information sources, and for researchers working with protein datasets, especially from poorly characterized organisms.
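    The abstract above says that overrepresented words are detected with a binomial statistics approximation but gives no further detail. The following is a minimal sketch of one way such a test could look, not the HT-SAS implementation: it computes an exact upper-tail binomial probability for each word. The function names, the significance threshold `alpha`, and the toy counts are all assumptions made for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresented_words(pool_counts, n_pool, background_freq, alpha=0.01):
    """Pick words that occur in the pool more often than the background predicts.

    pool_counts:     word -> number of pooled abstracts containing the word
    n_pool:          total number of abstracts in the pool
    background_freq: word -> fraction of all abstracts containing the word
    alpha:           hypothetical significance threshold (an assumption)
    """
    hits = {}
    for word, k in pool_counts.items():
        p = background_freq.get(word)
        if p is None:
            continue  # no background estimate, so the word cannot be tested
        p_value = binomial_tail(k, n_pool, p)
        if p_value < alpha:
            hits[word] = p_value
    return hits


# Toy data, invented for illustration only.
pool_counts = {"kinase": 8, "protein": 9}
background_freq = {"kinase": 0.05, "protein": 0.8}
hits = overrepresented_words(pool_counts, n_pool=10, background_freq=background_freq)
print(sorted(hits))
```

    In this toy pool a rare word such as "kinase" is flagged, while a word that is common across all abstracts, such as "protein", is not, even though it appears in more of the pooled abstracts.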

    Mammalian DNA methyltransferases

    DNA methylation is an epigenetic process affecting gene expression and chromatin organization. It can heritably silence or activate transcription of genes without any change in their nucleotide sequences, and for a long time it was not recognized as an important regulatory mechanism. In recent years, however, it has been shown that improper methylation, especially hypermethylation of promoter regions, is observed in nearly all steps of tumorigenesis. Aberrant methylation is also the cause of several major pathologies, including developmental disorders involving chromosome instabilities and mental retardation. Great progress has been made in our understanding of the enzymatic machinery involved in establishing and maintaining methylation patterns. This has allowed the development of new diagnostic tools and epigenetic therapies. The new approaches hold great potential; several inhibitors of DNA methyltransferases have already shown very promising therapeutic effects.

    Development of a Protein-Ligand Extended Connectivity (PLEC) Fingerprint and Its Application for Binding Affinity Predictions.

    Fingerprints (FPs) are the most common small molecule representation in cheminformatics. There are a wide variety of fingerprints, and the Extended Connectivity Fingerprint (ECFP) is one of the best-suited for general applications. Despite the overall FP abundance, only a few FPs represent the 3D structure of the molecule, and hardly any encode protein-ligand interactions. Here, we present a Protein-Ligand Extended Connectivity (PLEC) fingerprint that implicitly encodes protein-ligand interactions by pairing the ECFP environments from the ligand and the protein. PLEC fingerprints were used to construct different machine learning (ML) models tailored for predicting protein-ligand affinities (pKi/d). Even the simplest linear model built on the PLEC fingerprint achieved Rp = 0.83 on the PDBbind v2016 "core set", demonstrating its descriptive power. The PLEC fingerprint has been implemented in the Open Drug Discovery Toolkit (https://github.com/oddt/oddt).
