
    The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications

    Background: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009.
    Results: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real-world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; and iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs.
    Conclusions: Beyond deriving prototype solutions for each use case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although these problems made it difficult to solve the real-world problems posed to the developers by the biological researchers in attendance, we note the promise of addressing these issues within a semantic framework.
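    As a concrete illustration of the SOAP/WSDL client programming discussed above, the following is a minimal Python sketch using the zeep library; the WSDL URL and the annotate operation are hypothetical placeholders, not services from the hackathon.

    from zeep import Client

    # Hypothetical WSDL endpoint; real services publish their own contracts,
    # and divergent contracts are part of the interoperability problem above.
    WSDL_URL = "https://example.org/services/sequence_annotation?wsdl"

    def annotate_sequence(fasta_record: str) -> str:
        """Send one FASTA record to a (hypothetical) annotation operation."""
        client = Client(WSDL_URL)
        # The operation and parameter names depend on the service's WSDL.
        return client.service.annotate(sequence=fasta_record)

    if __name__ == "__main__":
        print(annotate_sequence(">seq1\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))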

    Building a molecular glyco-phenotype ontology to decipher undiagnosed diseases

    Abstract: Hundreds of rare diseases are due to mutations in genes related to glycan synthesis, degradation, or recognition. These glycan-related defects are well described in the literature but largely absent from ontologies and databases of chemical entities and phenotypes, limiting the application of computational methods and ontology-driven tools for the characterization and discovery of glycan-related diseases. We are curating articles and textbooks in glycobiology related to genetic diseases to inform the content and structure of an ontology of Molecular GlycoPhenotypes (MGPO). MGPO will be applied toward use cases including disease diagnosis and disease gene candidate prioritization, using semantic similarity and pattern matching at the glycan level with glycomics data from patients of the Undiagnosed Diseases Network.
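    To illustrate the kind of ontology-based semantic similarity intended for such use cases, the following is a minimal Python sketch of ancestor-based (Jaccard) similarity over a toy is-a hierarchy; the terms shown are hypothetical placeholders, not actual MGPO content.

    # Toy is-a hierarchy (child -> parents); term names are hypothetical.
    TOY_IS_A = {
        "truncated N-glycan": ["abnormal N-glycan"],
        "missing sialic acid cap": ["abnormal N-glycan"],
        "abnormal N-glycan": ["abnormal glycan"],
        "abnormal glycan": ["molecular phenotype"],
    }

    def ancestors(term, graph=TOY_IS_A):
        """Return the term together with all of its ancestors."""
        seen, stack = {term}, [term]
        while stack:
            for parent in graph.get(stack.pop(), []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def jaccard_similarity(term_a, term_b):
        """Score two terms by the overlap of their ancestor sets."""
        a, b = ancestors(term_a), ancestors(term_b)
        return len(a & b) / len(a | b)

    # Two sibling phenotypes share most of their ancestry, so they score high.
    print(jaccard_similarity("truncated N-glycan", "missing sialic acid cap"))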

    Drug Repositioning for Congenital Disorders of Glycosylation (CDG)

    R.F. acknowledges funding from the Fundação para a Ciência e Tecnologia (FCT), Portugal. S.B. was supported by CDG & Allies—PAIN funding. M.A. acknowledges the PhD program at the DISTABIF, Università degli Studi della Campania “Luigi Vanvitelli”, PhD fellowship POR Campania FSE 2014/2020 “Dottorati di Ricerca Con Caratterizzazione Industriale”. Advances in research have boosted therapy development for congenital disorders of glycosylation (CDG), a group of rare genetic disorders affecting protein and lipid glycosylation and glycosylphosphatidylinositol anchor biosynthesis. The (re)use of known drugs for novel medical purposes, known as drug repositioning, is growing for both common and rare disorders. The latest innovation concerns the rational search for repositioned molecules, which also benefits from artificial intelligence (AI). Compared to traditional methods, drug repositioning accelerates the overall drug discovery process while saving costs. This is particularly valuable for rare diseases. AI tools have proven their worth in diagnosis, in disease classification and characterization, and ultimately in therapy discovery in rare diseases. The availability of biomarkers and reliable disease models is critical for the research and development of new drugs, especially for rare and heterogeneous diseases such as CDG. This work reviews the literature related to repositioned drugs for CDG, discovered by serendipity or through a systematic approach. Recent advances in biomarkers and disease models are also outlined, as well as stakeholders' views on AI for therapy discovery in CDG.

    The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*

    Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not require transferring entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project, and researchers in emerging areas where a standard exchange data format is not well established to an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security, are discussed. Consequently, we improved the interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for effective advances in bioinformatics web service technologies.

    DEEP LEARNING METHODS FOR PREDICTION OF AND ESCAPE FROM PROTEIN RECOGNITION

    Protein interactions drive diverse processes essential to living organisms, and thus numerous biomedical applications center on understanding, predicting, and designing how proteins recognize their partners. While the number of interactions of interest unfortunately still vastly exceeds the capabilities of experimental determination methods, computational methods promise to fill the gap. My thesis pursues the development and application of computational methods for several protein interaction prediction and design tasks. First, to improve protein-glycan interaction specificity prediction, I developed GlyBERT, which learns biologically relevant glycan representations encapsulating the components most important for glycan recognition within their structures. GlyBERT encodes glycans with a branched biochemical language and employs an attention-based deep language model to embed the correlation between local and global structural contexts. This approach enables the development of predictive models from limited data, supporting applications such as lectin binding prediction. Second, to improve protein-protein interaction prediction, I developed a unified geometric deep neural network, ‘PInet’ (Protein Interface Network), which leverages the best properties of both data- and physics-driven methods, learning and utilizing models capturing both geometrical and physicochemical molecular surface complementarity. In addition to obtaining state-of-the-art performance in predicting protein-protein interactions, PInet can serve as the backbone for other protein-protein interaction modeling tasks such as binding affinity prediction. Finally, I turned from prediction to design, addressing two important tasks in the context of antibody-antigen recognition. The first problem is to redesign a given antigen to evade antibody recognition, e.g., to help biotherapeutics avoid pre-existing immunity or to focus vaccine responses on key portions of an antigen. The second problem is to design a panel of variants of a given antigen to use as “bait” in experimental identification of antibodies that recognize different parts of the antigen, e.g., to support classification of immune responses or to help select among different antibody candidates. I developed a geometry-based algorithm to generate variants to address these design problems, seeking to maximize utility subject to experimental constraints. During the design process, the algorithm accounts for and balances the effects of candidate mutations on antibody recognition and on antigen stability. In retrospective case studies, the algorithm demonstrated promising precision, recall, and robustness in finding good designs. This work represents the first algorithm to systematically design antigen variants for characterization and evasion of polyclonal antibody responses.
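    As a rough sketch of the attention-based modeling idea behind GlyBERT (not the actual architecture, vocabulary, or tokenization), the following PyTorch snippet encodes a branched glycan token sequence with a small transformer encoder and pools it into a single binding score.

    import torch
    import torch.nn as nn

    class GlycanEncoder(nn.Module):
        """Small attention-based encoder over glycan tokens (illustrative only)."""

        def __init__(self, vocab_size, d_model=64, nhead=4, num_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
            self.classifier = nn.Linear(d_model, 1)  # e.g., lectin binding score

        def forward(self, token_ids):
            x = self.encoder(self.embed(token_ids))  # (batch, seq, d_model)
            return self.classifier(x.mean(dim=1))    # pool over tokens, then score

    # Toy usage: a glycan written in a branched linear notation, split into tokens.
    vocab = {"Gal": 0, "b1-4": 1, "GlcNAc": 2, "(": 3, "Man": 4, ")": 5}
    tokens = torch.tensor([[vocab[t] for t in ["Gal", "b1-4", "GlcNAc", "(", "Man", ")"]]])
    model = GlycanEncoder(vocab_size=len(vocab))
    print(model(tokens).shape)  # torch.Size([1, 1])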

    Integrating Protein Data Resources through Semantic Web Services

    Understanding the function of every protein is one major objective of bioinformatics. Currently, a large amount of information (e.g., sequence, structure and dynamics) associated with protein function is being produced by experiments and predictions. Integrating these diverse data about protein sequence, structure, dynamics and other protein features allows further exploration and establishment of the relationships between protein sequence, structure, dynamics and function, and thereby controlling the function of target proteins. However, information integration in protein data resources faces challenges at the technology level, for interfacing heterogeneous data formats and standards, and at the application level, for semantic interpretation of dissimilar data and queries. In this research, a semantic web services infrastructure for flexible and user-oriented integration of protein data resources, called Web Services for Protein data resources (WSP), is proposed. This infrastructure includes a method for modeling protein web services, a service publication algorithm, an efficient service discovery (matching) algorithm, and an optimal service chaining algorithm. Rather than relying on syntactic matching, the matching algorithm discovers services based on their similarity to the requested service. Therefore, users can locate services that semantically match their data requirements even if they are syntactically distinct. Furthermore, WSP supports a workflow-based approach for service integration. The chaining algorithm is used to select and chain services based on the criteria of service accuracy and data interoperability. The algorithm generates a web services workflow which automatically integrates the results from individual services. A number of experiments were conducted to evaluate the performance of the matching algorithm. The results reveal that the algorithm can discover services with reasonable performance. In addition, a composite service that integrates protein dynamics and conservation was demonstrated using the WSP infrastructure.
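    The following Python sketch illustrates the general idea of semantic service matching and chaining described above, using a toy registry and a crude word-overlap similarity; the service names, concepts, and scores are hypothetical and do not reflect the WSP implementation.

    # Toy service registry; names, concepts, and accuracies are hypothetical.
    SERVICES = [
        {"name": "FetchSequence",     "input": "protein accession", "output": "protein sequence",   "accuracy": 0.99},
        {"name": "PredictDynamics",   "input": "protein sequence",  "output": "dynamics profile",   "accuracy": 0.85},
        {"name": "ScoreConservation", "input": "protein sequence",  "output": "conservation score", "accuracy": 0.90},
    ]

    def concept_similarity(a, b):
        """Crude stand-in for semantic similarity: word overlap of concept labels."""
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb)

    def match(requested_input, requested_output, threshold=0.5):
        """Rank services whose input/output concepts match the request."""
        ranked = []
        for s in SERVICES:
            score = min(concept_similarity(requested_input, s["input"]),
                        concept_similarity(requested_output, s["output"]))
            if score >= threshold:
                ranked.append((score * s["accuracy"], s["name"]))
        return sorted(ranked, reverse=True)

    def chain(start, goal, max_steps=5):
        """Greedily chain services so that each output feeds the next input."""
        current, workflow = start, []
        for _ in range(max_steps):
            if concept_similarity(current, goal) == 1.0:
                return workflow
            usable = [s for s in SERVICES if concept_similarity(current, s["input"]) >= 0.5]
            if not usable:
                return None  # no interoperable chain found
            best = max(usable, key=lambda s: concept_similarity(s["output"], goal))
            workflow.append(best["name"])
            current = best["output"]
        return None

    print(match("protein sequence", "dynamics profile"))   # [(0.85, 'PredictDynamics')]
    print(chain("protein accession", "dynamics profile"))  # ['FetchSequence', 'PredictDynamics']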

    Integrative methods for analysing big data in precision medicine

    We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With advances in technologies capturing molecular and medical data, we have entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data-integration-based methods to uncover personalized information from the big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face.
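    As a minimal sketch of one integration idea surveyed here, the following Python snippet builds a patient-similarity matrix per omics layer, fuses the layers by averaging, and clusters the fused matrix into candidate subtypes; the data are random placeholders, not results from any study.

    import numpy as np
    from sklearn.cluster import SpectralClustering
    from sklearn.metrics.pairwise import cosine_similarity

    rng = np.random.default_rng(0)
    expression = rng.normal(size=(20, 100))   # 20 patients x 100 genes (toy data)
    methylation = rng.normal(size=(20, 300))  # 20 patients x 300 CpG sites (toy data)

    # One patient-similarity matrix per omics layer, then a naive average fusion.
    layers = [cosine_similarity(expression), cosine_similarity(methylation)]
    fused = sum(layers) / len(layers)
    fused = (fused - fused.min()) / (fused.max() - fused.min())  # non-negative affinities

    # Cluster the fused similarity matrix into candidate disease subtypes.
    subtypes = SpectralClustering(n_clusters=2, affinity="precomputed",
                                  random_state=0).fit_predict(fused)
    print(subtypes)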

    Prediction of bladder cancer treatment side effects using an ontology-based reasoning for enhanced patient health safety

    Predicting potential cancer treatment side effects at the time of prescription could decrease health risks and achieve better patient satisfaction. This paper presents a new approach, founded on evidence-based medical knowledge, that uses as much information and evidence as possible to help a computer program predict bladder cancer treatment side effects and support the oncologist’s decision. This will help in deciding treatment options for patients with bladder malignancies. Bladder cancer knowledge is complex and requires simplification before any attempt to represent it in a formal or computerized manner. In this work we rely on the capabilities of OWL ontologies to seamlessly capture and conceptualize the required knowledge about this type of cancer and the underlying patient treatment process. Our ontology allows case-based reasoning to effectively predict treatment side effects for a given set of contextual information related to a specific medical case. The ontology is enriched with proofs and evidence collected from online biomedical research databases using “web crawlers”. We designed the crawler algorithm specifically to search for the required knowledge based on a set of specified keywords. When applied to the collected test samples, the approach predicted 80.3% of the reported bladder cancer treatment side effects, and its predictions were close to the adverse events actually recorded. Evidence-based medicine combined with semantic knowledge-based models is promising for generating predictions related to possible health concerns. Integrating a diversity of knowledge and evidence into one knowledge base could dramatically enhance the process of predicting treatment risks and side effects in bladder cancer oncotherapy.
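    The following Python sketch illustrates keyword-driven evidence collection in the spirit of the crawler described above; the URLs, keywords, and relevance threshold are hypothetical placeholders rather than the article's actual algorithm.

    import requests

    # Hypothetical seed URLs and keywords; the real system targets online
    # biomedical research databases.
    KEYWORDS = {"bladder cancer", "cystectomy", "side effect", "adverse event"}
    SEED_URLS = ["https://example.org/abstracts/1", "https://example.org/abstracts/2"]

    def collect_evidence(urls, keywords, min_hits=2):
        """Fetch each page and keep it only if enough keywords occur in its text."""
        evidence = []
        for url in urls:
            try:
                text = requests.get(url, timeout=10).text.lower()
            except requests.RequestException:
                continue  # skip unreachable sources
            hits = {kw for kw in keywords if kw in text}
            if len(hits) >= min_hits:  # crude relevance threshold
                evidence.append({"url": url, "matched": sorted(hits)})
        return evidence

    if __name__ == "__main__":
        for item in collect_evidence(SEED_URLS, KEYWORDS):
            print(item["url"], item["matched"])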