16 research outputs found

    CASSANDRA: drug gene association prediction via text mining and ontologies

    The amount of biomedical literature has been increasing rapidly during the last decade. Text mining techniques can harness this large-scale data, shed light on complex drug mechanisms, and extract relation information that can support computational polypharmacology. In this work, we introduce CASSANDRA, a fully corpus-based and unsupervised algorithm which uses MEDLINE-indexed titles and abstracts to infer drug gene associations and assist drug repositioning. CASSANDRA measures the Pointwise Mutual Information (PMI) between biomedical terms derived from Gene Ontology (GO) and Medical Subject Headings (MeSH). Based on the PMI scores, drug and gene profiles are generated, and candidate drug gene associations are inferred by computing the relatedness of their profiles. Results show that an Area Under the Curve (AUC) of up to 0.88 can be achieved. The algorithm can successfully identify direct drug gene associations with high precision and prioritize them over indirect drug gene associations. Validation shows that the profiles derived statistically from the literature perform as well as (and at times better than) the manually curated profiles. In addition, we examine CASSANDRA’s potential for drug repositioning. For all FDA-approved drugs repositioned over the last 5 years, we generate profiles from publications before 2009 and show that the new indications rank high in these profiles. In summary, co-occurrence-based profiles derived from the biomedical literature can accurately predict drug gene associations and provide insights into potential repositioning cases.
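
    As a rough illustration of the scoring idea in the abstract above, the sketch below estimates PMI from document-level co-occurrence counts, collects the scores into per-entity profiles, and compares two profiles. The function names, the toy corpus, and the use of cosine similarity as the relatedness measure are assumptions made for this example; the abstract does not say how CASSANDRA compares profiles.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Return {frozenset({x, y}): PMI} for every co-occurring term pair.

    docs is a list of term sets (for example, the terms found in one abstract).
    PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), with probabilities estimated
    as the fraction of documents containing the term or the pair.
    """
    n = len(docs)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in docs:
        term_counts.update(terms)
        pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))
    table = {}
    for pair, c_xy in pair_counts.items():
        x, y = tuple(pair)
        p_xy = c_xy / n
        table[pair] = math.log2(p_xy / ((term_counts[x] / n) * (term_counts[y] / n)))
    return table


def profile(entity, table):
    """Profile of an entity: each co-occurring term mapped to its PMI score."""
    prof = {}
    for pair, score in table.items():
        if entity in pair:
            (other,) = pair - {entity}
            prof[other] = score
    return prof


def cosine(p, q):
    """Cosine similarity of two sparse profiles (assumed relatedness measure)."""
    dot = sum(p[t] * q[t] for t in set(p) & set(q))
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: each set stands for the terms indexed in one abstract.
docs = [
    {"aspirin", "inflammation", "PTGS2"},
    {"aspirin", "inflammation"},
    {"PTGS2", "inflammation"},
    {"metformin", "glucose"},
]
table = pmi_table(docs)
drug_gene_score = cosine(profile("aspirin", table), profile("PTGS2", table))
```

    In this toy corpus the drug and the gene share a profile term, so their score is higher than that of an unrelated pair such as metformin and PTGS2, which share none.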

    SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis

    Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, a specific full-length protein in proteome A is the result of a fusion event between two separate proteins in proteome B. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data in both prokaryotic and eukaryotic organisms, and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality.
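
    The detection idea described above can be sketched as follows. This is a minimal illustration, not the SAFE implementation: the hit tuple layout, the function name, and the overlap threshold are assumptions, and the hits are presumed to have already passed a statistical significance filter.

```python
from collections import defaultdict
from itertools import combinations


def find_fusion_candidates(hits, max_overlap=10):
    """Find proteome-A proteins matched by two distinct proteome-B proteins.

    hits: iterable of (composite_id, component_id, start, end), where start
    and end are 1-based alignment coordinates on the proteome-A protein.
    Two components count as a fusion candidate when they align to regions of
    the composite that overlap by at most max_overlap residues.
    """
    by_composite = defaultdict(list)
    for composite, component, start, end in hits:
        by_composite[composite].append((component, start, end))

    candidates = []
    for composite, regions in by_composite.items():
        for (c1, s1, e1), (c2, s2, e2) in combinations(regions, 2):
            if c1 == c2:
                continue  # two hits from the same component are not a fusion
            overlap = min(e1, e2) - max(s1, s2) + 1  # negative means a gap
            if overlap <= max_overlap:
                candidates.append((composite, tuple(sorted((c1, c2)))))
    return candidates


# Hypothetical hits: A1 is covered by B1 and B2 side by side, while B3 and B4
# both align to the same region of A2 and are therefore not reported.
hits = [
    ("A1", "B1", 1, 200),
    ("A1", "B2", 210, 450),
    ("A2", "B3", 1, 300),
    ("A2", "B4", 50, 280),
]
candidates = find_fusion_candidates(hits)
```

    A fuller analysis would typically also check that the two component proteins are not similar to each other, since a shared domain can mimic a fusion signal.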

    Maternal and neonatal data collection systems in low- and middle-income countries: Scoping review protocol

    Background: Pregnant women and neonates represent one of the most vulnerable groups, especially in low- and middle-income countries (LMICs). A recent analysis reported that most vaccine pharmacovigilance systems in LMICs consist of spontaneous (passive) adverse event reporting. Thus, LMICs need effective active surveillance approaches, such as pregnancy registries. We intend to identify currently active maternal and neonatal data collection systems in LMICs, with the potential to inform active safety electronic surveillance for novel vaccines using standardized definitions. Methods: A scoping review will be conducted based on established methodology. Multiple databases of indexed and grey literature will be searched with a specific focus on existing electronic and paper-electronic systems in LMICs that collect continuous, prospective, and individual-level data from antenatal care, delivery, neonatal care (up to 28 days), and postpartum (up to 42 days) at the facility and community level, at the national and district level, and at large hospitals. Also, experts will be contacted to identify unpublished information on relevant data collection systems. General and specific descriptions of Health Information Systems (HIS) extracted from the different sources will be combined and duplicated HIS will be removed, producing a list of unique statements. We will present a final list of Maternal, Newborn, and Child Health systems considered flexible enough to be updated with necessary improvements to detect, assess and respond to safety concerns during the introduction of vaccines and other maternal health interventions. Selected experts will participate in an in-person consultation meeting to select up to three systems to be further explored in situ. Results and knowledge gaps will be synthesized after expert consultation.
    Affiliation: Berrueta, Mabel. Instituto de Efectividad Clínica y Sanitaria; Argentina
    Affiliation: Bardach, Ariel Esteban. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Centro de Investigaciones en Epidemiología y Salud Pública. Instituto de Efectividad Clínica y Sanitaria. Centro de Investigaciones en Epidemiología y Salud Pública; Argentina
    Affiliation: Ciapponi, Agustín. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Centro de Investigaciones en Epidemiología y Salud Pública. Instituto de Efectividad Clínica y Sanitaria. Centro de Investigaciones en Epidemiología y Salud Pública; Argentina
    Affiliation: Xiong, Xu. University of Tulane; United States
    Affiliation: Stergachis, Andy. University of Washington; United States
    Affiliation: Zaraa, Sabra. University of Washington; United States
    Affiliation: Buekens, Pierre. University of Tulane; United States
    Affiliation: Absalon, Judith. Not specified
    Affiliation: Anderson, Steve. Not specified
    Affiliation: Althabe, Fernando. Instituto de Efectividad Clínica y Sanitaria; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Madhi, Shabir A. Not specified
    Affiliation: McClure, Elizabeth. Not specified
    Affiliation: Munoz, Flor M. Not specified
    Affiliation: Mwamwitwa, Kissa W. Not specified
    Affiliation: Nakimuli, Annettee. Not specified
    Affiliation: Clark Nelson, Jennifer. Not specified
    Affiliation: Noguchi, Lisa. Not specified
    Affiliation: Panagiotakopoulos, Lakshmi. Not specified
    Affiliation: Sevene, Esperanca. Not specified
    Affiliation: Zuber, Patrick. Not specified
    Affiliation: Belizan, Maria. Not specified
    Affiliation: Bergel, Eduardo. Not specified
    Affiliation: Rodriguez Cairoli, Federico. Not specified
    Affiliation: Castellanos, Fabricio. Not specified
    Affiliation: Ciganda, Alvaro. Not specified
    Affiliation: Comande, Daniel. Not specified
    Affiliation: Pingray, Veronica. Not specified

    Erythroid/myeloid progenitors and hematopoietic stem cells originate from distinct populations of endothelial cells

    Summary: Hematopoietic stem cells (HSCs) and an earlier wave of definitive erythroid/myeloid progenitors (EMPs) differentiate from hemogenic endothelial cells in the conceptus. EMPs can be generated in vitro from embryonic or induced pluripotent stem cells, but efforts to produce HSCs have largely failed. The formation of both EMPs and HSCs requires the transcription factor Runx1 and its non-DNA binding partner core binding factor β (CBFβ). Here we show that the requirements for CBFβ in EMP and HSC formation in the conceptus are temporally and spatially distinct. Panendothelial expression of CBFβ in Tek-expressing cells was sufficient for EMP formation, but was not adequate for HSC formation. Expression of CBFβ in Ly6a-expressing cells, on the other hand, was sufficient for HSC, but not EMP, formation. The data indicate that EMPs and HSCs differentiate from distinct populations of hemogenic endothelial cells, with Ly6a expression specifically marking the HSC-generating hemogenic endothelium.

    Business Process Management Analysis with Cost Information in Public Organizations: A Case Study at an Academic Library

    Public organizations must provide high-quality services at a lower cost. In order to accomplish this goal, they need to apply well-accepted cost methods and evaluate the efficiency of their processes using Business Process Management (BPM). However, only a few studies have evaluated the addition of cost information to a process model in a public organization. The aim of the research is to evaluate the combination of cost data with process modeling in an academic library. Our research suggests a new and easy-to-implement process analysis in three phases. We have combined qualitative research methods (i.e., interviews with the library staff) and quantitative ones (i.e., estimation of time and cost for each activity and process) to model two important processes of the academic library of the University of Macedonia (UoM). We have modeled the lending and return processes using Business Process Model and Notation (BPMN) in an easy-to-understand format. We have evaluated the costs of each process and subprocess using the Time-Driven Activity-Based Costing (TDABC) method. The library’s managers found our methodology and results very helpful. Our analysis confirmed that the combination of workflow and cost analysis may significantly improve the decision-making procedure and the efficiency of an organization’s processes. However, further research is needed to evaluate the appropriateness of combining various cost and BPM methods in other public organizations.
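
    For readers unfamiliar with TDABC, the sketch below shows the basic arithmetic: a capacity cost rate (cost of resources supplied divided by practical capacity in minutes) is multiplied by the time each activity takes. All figures and activity names are hypothetical and are not taken from the study.

```python
def capacity_cost_rate(resource_cost, practical_capacity_minutes):
    """Cost per minute of the resources supplied to a department."""
    return resource_cost / practical_capacity_minutes


def activity_costs(activity_minutes, rate):
    """Cost of each activity: minutes consumed times the capacity cost rate."""
    return {name: minutes * rate for name, minutes in activity_minutes.items()}


# Hypothetical figures: 6000 currency units per month for the circulation
# desk, with 12000 minutes of practical (not theoretical) staff capacity.
rate = capacity_cost_rate(6000, 12000)

# Hypothetical time estimates (minutes) for the steps of one lending transaction.
lending = {
    "identify patron": 1.0,
    "check out item": 2.0,
    "print receipt": 0.5,
}
costs = activity_costs(lending, rate)
lending_cost = sum(costs.values())  # cost of one lending transaction
```

    Attaching these per-activity costs to the corresponding BPMN tasks is one way to read the combined workflow and cost view that the abstract describes.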

    Formalizing biomedical concepts from textual definition.

    International audienc

    Formalizing biomedical concepts from textual definitions: Research Article

    Background: Ontologies play a major role in life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined so that it ensures global consistency and support of complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions. Results: We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of output concept definitions. In addition, we provide an analysis of the following aspects: (1) How do definitions mined from the Web and literature differ from the ones mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations’ domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare to each other for the task of formal definition generation? (4) What is the influence of the learning data size on the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size does not affect performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations’ domain and range pairs do not overlap, this information is most valuable in formalizing textual definitions.
    Conclusions: The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL
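
    The role of semantic types described in the abstract above can be illustrated with a small sketch: a classifier's relation scores are restricted to the relations permitted for the semantic types of the two concepts. The type names, relation names, scores, and function name below are simplified assumptions for illustration, not the SNOMED CT concept model or the code in the linked repository.

```python
# Assumed, simplified mapping from (domain type, range type) to permitted
# relations. When each pair permits exactly one relation, the pairs do not
# overlap and the types alone determine the relation.
ALLOWED_RELATIONS = {
    ("Clinical finding", "Body structure"): {"finding_site"},
    ("Clinical finding", "Organism"): {"causative_agent"},
}


def constrain_relation(scores, domain_type, range_type, allowed=ALLOWED_RELATIONS):
    """Return the best-scoring relation permitted for the given type pair.

    scores: mapping from relation name to a classifier score (hypothetical).
    Returns None when no scored relation is permitted for the type pair.
    """
    permitted = allowed.get((domain_type, range_type), set())
    viable = {rel: s for rel, s in scores.items() if rel in permitted}
    return max(viable, key=viable.get) if viable else None


# The classifier alone would pick causative_agent, but the type pair only
# permits finding_site.
scores = {"finding_site": 0.4, "causative_agent": 0.6}
relation = constrain_relation(scores, "Clinical finding", "Body structure")
```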