
    DFP: a Bioconductor package for fuzzy profile identification and gene reduction of microarray data

    Background: Expression profiling assays done by using DNA microarray technology generate enormous data sets that are not amenable to simple analysis. The greatest challenge in maximizing the use of this huge amount of data is to develop algorithms to interpret and interconnect results from different genes under different conditions. In this context, fuzzy logic can provide a systematic and unbiased way to both (i) find biologically significant insights relating to meaningful genes, thereby removing the need for expert knowledge in preliminary steps of microarray data analyses, and (ii) reduce the cost and complexity of later applied machine learning techniques, making it possible to achieve interpretable models. Results: DFP is a new Bioconductor R package that implements a method for discretizing and selecting differentially expressed genes based on the application of fuzzy logic. DFP takes advantage of fuzzy membership functions to assign linguistic labels to gene expression levels. The technique builds a reduced set of relevant genes (FP, Fuzzy Pattern) able to summarize and represent each underlying class (pathology). A last step constructs a biased set of genes (DFP, Discriminant Fuzzy Pattern) by intersecting existing fuzzy patterns in order to detect discriminative elements. In addition, the software provides new functions and visualisation tools that summarize achieved results and aid in the interpretation of differentially expressed genes from multiple microarray experiments. Conclusion: DFP integrates with other packages of the Bioconductor project, uses common data structures and is accompanied by ample documentation. It has the advantage that its parameters are highly configurable, facilitating the discovery of biologically relevant connections between sets of genes belonging to different pathologies. This information makes it possible to automatically filter irrelevant genes, thereby reducing the large volume of data supplied by microarray experiments. Based on these contributions, geneCBR, a successful tool for cancer diagnosis using microarray datasets, has recently been released.
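
    The label-and-intersect idea described in this abstract can be illustrated with a minimal sketch. This is not the DFP/Bioconductor API (the package is written in R); it is a hypothetical Python illustration in which the thresholds stand in for DFP's fuzzy membership functions, and the function names and the agreement cutoff are assumptions made for the example.

```python
import numpy as np

def fuzzy_label(value, low_thr, high_thr):
    """Assign a linguistic label to an expression value.
    The crisp thresholds are hypothetical stand-ins for fuzzy membership functions."""
    if value <= low_thr:
        return "LOW"
    if value >= high_thr:
        return "HIGH"
    return "MEDIUM"

def fuzzy_pattern(expr, low_thr, high_thr, min_agreement=0.75):
    """Build a fuzzy pattern: genes whose label is shared by most samples of a class.
    expr: samples x genes matrix for one class (pathology)."""
    pattern = {}
    for g in range(expr.shape[1]):
        labels = [fuzzy_label(v, low_thr[g], high_thr[g]) for v in expr[:, g]]
        top = max(set(labels), key=labels.count)
        if labels.count(top) / len(labels) >= min_agreement:
            pattern[g] = top
    return pattern

def discriminant_fuzzy_pattern(patterns):
    """Keep genes present in every class pattern but labelled differently across classes."""
    common = set.intersection(*(set(p) for p in patterns))
    return {g for g in common if len({p[g] for p in patterns}) > 1}

# Toy example with two classes and three genes
rng = np.random.default_rng(0)
class_a = rng.normal([1.0, 5.0, 3.0], 0.2, size=(10, 3))
class_b = rng.normal([1.0, 1.0, 3.0], 0.2, size=(10, 3))
low, high = np.full(3, 2.0), np.full(3, 4.0)
dfp = discriminant_fuzzy_pattern([
    fuzzy_pattern(class_a, low, high),
    fuzzy_pattern(class_b, low, high),
])
print(dfp)  # expected: {1} -- only gene 1 separates the two classes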

    Marky: a tool supporting annotation consistency in multi-user and iterative document annotation projects

    Background and Objectives: Document annotation is a key task in the development of Text Mining methods and applications. High-quality annotated corpora are invaluable, but their preparation requires a considerable amount of resources and time. Although the existing annotation tools offer good user interaction interfaces to domain experts, project management and quality control abilities are still limited. Therefore, the current work introduces Marky, a new Web-based document annotation tool equipped to manage multi-user and iterative projects, and to evaluate annotation quality throughout the project life cycle. Methods: At the core, Marky is a Web application based on the open source CakePHP framework. The user interface relies on HTML5 and CSS3 technologies. The Rangy library assists in the browser-independent implementation of common DOM range and selection tasks, and Ajax and jQuery technologies are used to enhance user-system interaction. Results: Marky grants solid management of inter- and intra-annotator work. Most notably, its annotation tracking system supports systematic and on-demand agreement analysis and annotation amendment. Each annotator may work over documents as usual, but all the annotations made are saved by the tracking system and may be further compared. So, the project administrator is able to evaluate annotation consistency among annotators and across rounds of annotation, while annotators are able to reject or amend subsets of annotations made in previous rounds. As a side effect, the tracking system minimises resource and time consumption. Conclusions: Marky is a novel environment for managing multi-user and iterative document annotation projects. Compared to other tools, Marky offers a similar, visually intuitive annotation experience while providing unique means to minimise annotation effort and enforce annotation quality, and therefore corpus consistency. Marky is freely available for non-commercial use at http://sing.ei.uvigo.es/marky. The authors thank the project PTDC/SAU-ESA/646091/2006/FCOMP-01-0124-FEDER-007480FCT, the Strategic Project PEst-OE/EQB/LA0023/2013, the Project "Bio-Health - Biotechnology and Bioengineering approaches to improve health quality", Ref. NORTE-07-0124-FEDER-000027, co-funded by the Programa Operacional Regional do Norte (ON.2 - O Novo Norte), QREN, FEDER, the project "RECI/BBB-EBI/0179/2012 - Consolidating Research Expertise and Resources on Cellular and Molecular Biotechnology at CEB/IBB", Ref. FCOMP-01-0124-FEDER-027462, FEDER, and the Agrupamento INBIOMED from DXPCTSUG-FEDER unha maneira de facer Europa (2012/273). The research leading to these results has received funding from the European Union's Seventh Framework Programme FP7/REGPOT-2012-2013.1 under grant agreement no. 316265 (BIOCAPS) and the [14VI05] Contract-Programme from the University of Vigo. This document reflects only the authors' views and the European Union is not liable for any use that may be made of the information contained herein.
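
    To give a rough idea of the kind of agreement analysis such an annotation tracking system enables (this is not Marky's code or data model; the Span type, the metric and the toy data are assumptions for illustration), the sketch below computes exact-match pairwise agreement between two sets of span annotations over the same document, e.g. from two annotators or two rounds.

```python
from typing import NamedTuple, Set

class Span(NamedTuple):
    start: int   # character offset where the annotation begins
    end: int     # character offset where it ends
    label: str   # annotation category, e.g. "Gene"

def pairwise_agreement(a: Set[Span], b: Set[Span]) -> float:
    """Exact-match agreement (F1-style) between two annotation sets."""
    if not a and not b:
        return 1.0
    matches = len(a & b)
    return 2 * matches / (len(a) + len(b))

# Hypothetical comparison of two annotation rounds of the same document
round1 = {Span(0, 4, "Gene"), Span(10, 18, "Protein")}
round2 = {Span(0, 4, "Gene"), Span(10, 18, "Protein"), Span(25, 30, "Gene")}
print(f"agreement: {pairwise_agreement(round1, round2):.2f}")  # 0.80
```

    In a real project a stricter measure such as Cohen's kappa, or a relaxed overlap criterion, could replace the exact-match comparison; the point here is only that tracked annotations from different users or rounds can be compared programmatically.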

    geneCBR: a translational tool for multiple-microarray analysis and integrative information retrieval for aiding diagnosis in cancer research

    8 pages, 5 figures, 3 additional files. Software. [Background] Bioinformatics and medical informatics are two research fields that serve the needs of different but related communities. Both domains share the common goal of providing new algorithms, methods and technological solutions to biomedical research, and contributing to the treatment and cure of diseases. Although different microarray techniques have been successfully used to investigate useful information for cancer diagnosis at the gene expression level, the true integration of existing methods into day-to-day clinical practice is still a long way off. Within this context, case-based reasoning emerges as a suitable paradigm specially intended for the development of biomedical informatics applications and decision support systems, given the support and collaboration involved in such a translational development. With the goals of removing barriers against multi-disciplinary collaboration and facilitating the dissemination and transfer of knowledge to real practice, case-based reasoning systems have the potential to be applied to translational research mainly because their computational reasoning paradigm is similar to the way clinicians gather, analyze and process information in their own practice of clinical medicine. [Results] In addressing the issue of bridging the existing gap between biomedical researchers and clinicians who work in the domain of cancer diagnosis, prognosis and treatment, we have developed and made accessible a common interactive framework. Our geneCBR system implements a freely available software tool that allows the use of combined techniques that can be applied to gene selection, clustering, knowledge extraction and prediction for aiding diagnosis in cancer research. For biomedical researchers, geneCBR expert mode offers a core workbench for designing and testing new techniques and experiments. For pathologists or oncologists, geneCBR diagnostic mode implements an effective and reliable system that can diagnose cancer subtypes based on the analysis of microarray data using a CBR architecture. For programmers, geneCBR programming mode includes an advanced edition module for run-time modification of previously coded techniques. [Conclusion] geneCBR is a new translational tool that can effectively support the integrative work of programmers, biomedical researchers and clinicians working together in a common framework. The code is freely available under the GPL license and can be obtained at http://www.genecbr.org. This work is supported in part by the projects Research on Translational Bioinformatics (ref. 08VIB6) from the University of Vigo and Development of computational tools for the classification and clustering of gene expression data in order to discover meaningful biological information in cancer diagnosis (ref. VA100A08) from JCyL (Spain). The work of D. Glez-Peña is supported by a "María Barbeito" contract from Xunta de Galicia. Peer reviewed.
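
    The case-based reasoning cycle that underlies this kind of diagnostic mode can be sketched in general terms. The following is a conceptual illustration only, not geneCBR's implementation: it retrieves the stored cases most similar to a new expression profile and reuses their majority diagnosis; the case base, distance measure and names are hypothetical.

```python
import numpy as np
from collections import Counter

def retrieve(case_base, new_profile, k=3):
    """Return the k stored cases closest to the new expression profile."""
    distances = [(np.linalg.norm(profile - new_profile), diagnosis)
                 for profile, diagnosis in case_base]
    distances.sort(key=lambda item: item[0])
    return distances[:k]

def reuse(neighbours):
    """Propose a diagnosis by majority vote over the retrieved cases."""
    return Counter(d for _, d in neighbours).most_common(1)[0][0]

# Hypothetical case base: (expression profile, known cancer subtype)
case_base = [
    (np.array([2.1, 0.3, 5.0]), "subtype-A"),
    (np.array([2.0, 0.4, 4.8]), "subtype-A"),
    (np.array([0.1, 3.9, 1.2]), "subtype-B"),
    (np.array([0.2, 4.1, 1.0]), "subtype-B"),
]
new_case = np.array([1.9, 0.5, 4.9])
print(reuse(retrieve(case_base, new_case)))  # subtype-A
```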

    Agent-based spatiotemporal simulation of biomolecular systems within the open source MASON framework

    Agent-based modelling is being used to represent biological systems with increasing frequency and success. This paper presents the implementation of a new tool for biomolecular reaction modelling in the open source Multi-Agent Simulator Of Neighborhoods (MASON) framework. The rationale behind this new tool is the necessity to describe interactions at the molecular level in order to grasp emergent and meaningful biological behaviour. We are particularly interested in characterising and quantifying the various effects that facilitate biocatalysis. Enzymes may display high specificity for their substrates, and this information is crucial to the engineering and optimisation of bioprocesses. Simulation results demonstrate that molecule distributions, reaction rate parameters and structural parameters can be adjusted separately in the simulation, allowing a comprehensive study of individual effects in the context of realistic cell environments. While a higher percentage of collisions resulting in reaction increases the affinity of the enzyme for the substrate, a faster reaction (i.e., a higher turnover number) leads to a smaller number of time steps. Slower diffusion rates and molecular crowding (physical hurdles) decrease the collision rate of reactants, hence reducing the reaction rate, as expected. Also, the random distribution of molecules affects the results significantly. The authors thank the Agrupamento INBIOMED from DXPCTSUG-FEDER unha maneira de facer Europa (2012/273). The research leading to these results has received funding from the European Union's Seventh Framework Programme FP7/REGPOT-2012-2013.1 under Grant Agreement no. 316265 (BIOCAPS) and the [14VI05] Contract-Programme from the University of Vigo. This document reflects only the authors' views and the European Union is not liable for any use that may be made of the information contained herein.
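
    The collision-and-react logic of this kind of agent-based model can be sketched independently of MASON (the actual tool is built in Java on top of the MASON framework). The Python sketch below is a hypothetical, heavily simplified illustration: molecules diffuse by random walk, enzyme-substrate collisions react with a fixed probability, and all parameter values are assumptions chosen only to show how diffusion speed, collision radius and reaction probability can be varied separately.

```python
import random

class Molecule:
    def __init__(self, kind, x, y):
        self.kind, self.x, self.y = kind, x, y

    def diffuse(self, step):
        # Random-walk diffusion; a smaller step models slower diffusion or crowding
        self.x += random.uniform(-step, step)
        self.y += random.uniform(-step, step)

def simulate(enzymes, substrates, steps, step_size=1.0,
             collision_radius=0.5, p_react=0.3):
    """Count reactions occurring when an enzyme and a substrate collide.
    p_react stands in for the fraction of collisions that yield a reaction."""
    reactions = 0
    for _ in range(steps):
        for m in enzymes + substrates:
            m.diffuse(step_size)
        for e in enzymes:
            for s in substrates:
                close = (abs(e.x - s.x) < collision_radius and
                         abs(e.y - s.y) < collision_radius)
                if close and random.random() < p_react:
                    reactions += 1
        # (a fuller model would remove consumed substrate and add product agents)
    return reactions

random.seed(1)
enzymes = [Molecule("E", random.uniform(0, 10), random.uniform(0, 10)) for _ in range(5)]
substrates = [Molecule("S", random.uniform(0, 10), random.uniform(0, 10)) for _ in range(50)]
print(simulate(enzymes, substrates, steps=200))
```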

    A framework for the development of biomedical text mining software tools

    Over the last few years, a growing number of techniques have been successfully proposed to tackle diverse challenges in the Biomedical Text Mining (BioTM) arena. However, the set of software tools available to researchers has not grown in a similar way. This work makes a contribution to closing this gap, proposing a framework to ease the development of user-friendly and interoperable applications in this field, based on a set of available modular components. These modules can be connected in diverse ways to create applications that fit distinct user roles. Also, developers of new algorithms have a framework that allows them to easily integrate their implementations with state-of-the-art BioTM software for related tasks. This work was supported in part by the research projects recSysBio (ref. POCI/BIO/60139/2004) and MOBioPro (ref. POSC/EW59899/2004) of the University of Minho, financed by the Portuguese Fundação para a Ciência e Tecnologia. The work of SC is supported by a PhD grant from the same institution (ref. SFRH/BD/22863/2005).
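
    A minimal sketch of the modular-composition idea follows. The interface and module names are hypothetical, not the framework's actual API: components share a common interface so they can be chained in different orders to build applications for different user roles.

```python
from abc import ABC, abstractmethod
from typing import List

class Module(ABC):
    """Common interface shared by all text-mining components."""
    @abstractmethod
    def process(self, documents: List[str]) -> List[str]:
        ...

class Tokenizer(Module):
    def process(self, documents):
        return [" ".join(doc.lower().split()) for doc in documents]

class GeneTagger(Module):
    GENES = {"brca1", "tp53"}  # toy dictionary standing in for a real tagger
    def process(self, documents):
        return [" ".join(f"<gene>{t}</gene>" if t in self.GENES else t
                         for t in doc.split()) for doc in documents]

class Pipeline(Module):
    """Modules connected in sequence to build a concrete application."""
    def __init__(self, *modules: Module):
        self.modules = modules
    def process(self, documents):
        for m in self.modules:
            documents = m.process(documents)
        return documents

app = Pipeline(Tokenizer(), GeneTagger())
print(app.process(["BRCA1 mutations are discussed here"]))
```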

    Web scraping technologies in an API world

    Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in a position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, to address a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well known in clinical microbiology and similar domains, do not yet offer programmatic interfaces. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as a means to cope with gene set enrichment analysis. This work was partially funded by (i) the [TIN2009-14057-C03-02] project from the Spanish Ministry of Science and Innovation, the Plan E from the Spanish Government and the European Union through the European Regional Development Fund (ERDF), (ii) the Portugal-Spain cooperation action sponsored by the Foundation of Portuguese Universities [E 48/11] and the Spanish Ministry of Science and Innovation [AIB2010PT-00353] and (iii) the Agrupamento INBIOMED [2012/273] from the DXPCTSUG (Dirección Xeral de Promoción Científica e Tecnolóxica do Sistema Universitario de Galicia) from the Galician Government and the European Union from the ERDF unha maneira de facer Europa. H. L. F. was supported by a pre-doctoral fellowship from the University of Vigo.
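
    Setting up such a scraping step is indeed straightforward with off-the-shelf libraries. The sketch below uses requests and BeautifulSoup to pull a results table out of a page; the URL is a placeholder and the table structure is an assumption, so it illustrates the general pattern rather than any of the specific data sources discussed in the article.

```python
import requests
from bs4 import BeautifulSoup

def scrape_table(url: str) -> list[dict]:
    """Fetch a page and extract the rows of its first table as dictionaries."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    table = soup.find("table")                      # first table on the page
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr")[1:]:             # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(dict(zip(headers, cells)))
    return rows

# Hypothetical usage; replace with a real, scraping-friendly URL
# print(scrape_table("https://example.org/resistance-profiles"))
```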

    Compi: a framework for portable and reproducible pipelines

    Compi is an application framework to develop end-user, pipeline-based applications with a primary emphasis on: (i) user interface generation, by automatically generating a command-line interface based on the pipeline-specific parameter definitions; (ii) application packaging, with compi-dk, which is a version-control-friendly tool to package the pipeline application and its dependencies into a Docker image; and (iii) application distribution provided through a public repository of Compi pipelines, named Compi Hub, which allows users to discover, browse and reuse them easily. By addressing these three aspects, Compi goes beyond traditional workflow engines, having been specially designed for researchers who want to take advantage of common workflow engine features (such as automatic job scheduling or logging, among others) while keeping the simplicity and readability of shell scripts without the need to learn a new programming language. Here we discuss the design of various pipelines developed with Compi to describe its main functionalities, as well as to highlight the similarities and differences with similar tools that are available. An open-source distribution under the Apache 2.0 License is available from GitHub (available at https://github.com/sing-group/compi). Documentation and installers are available from https://www.sing-group.org/compi. A specific repository for Compi pipelines is available from Compi Hub (available at https://www.sing-group.org/compihub).
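
    The idea of deriving a command-line interface from declared pipeline parameters can be illustrated in general terms. The sketch below uses Python's argparse with made-up parameter definitions; it is not Compi's pipeline format or the interface Compi generates, only a conceptual analogue of point (i) above.

```python
import argparse

# Hypothetical pipeline parameter definitions: (name, help text, default)
PARAMETERS = [
    ("input", "path to the input FASTA file", None),
    ("output", "directory for pipeline results", "./results"),
    ("threads", "number of worker threads", "4"),
]

def build_cli(params):
    """Generate a command-line interface from the parameter definitions."""
    parser = argparse.ArgumentParser(description="toy pipeline runner")
    for name, help_text, default in params:
        parser.add_argument(f"--{name}", help=help_text,
                            default=default, required=default is None)
    return parser

if __name__ == "__main__":
    args = build_cli(PARAMETERS).parse_args()
    print(vars(args))  # e.g. {'input': 'reads.fa', 'output': './results', 'threads': '4'}
```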

    Biomedical text mining applied to document retrieval and semantic indexing

    In biomedical research, the ability to retrieve the adequate information from the ever-growing literature is an extremely important asset. This work provides an enhanced, general-purpose approach to the process of document retrieval that enables the filtering of PubMed query results. The system is based on semantic indexing, providing, for each set of retrieved documents, a network that links documents and relevant terms obtained by the annotation of biological entities (e.g. genes or proteins). This network provides distinct user perspectives, allows navigation over documents with similar terms, and is also used to assess document relevance. A network learning procedure, based on previous work from e-mail spam filtering, is proposed, receiving as input a training set of manually classified documents.
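
    The document-term network described here can be sketched as a simple bipartite structure. This is a conceptual illustration with hypothetical data and scoring, not the system's actual implementation: documents link to their annotated entities, shared entities make related documents reachable, and a toy relevance score weights a document by how many of its entities were marked relevant (e.g. learned from a manually classified training set).

```python
from collections import defaultdict

def build_network(annotations):
    """annotations: {doc_id: set of annotated biological entities}."""
    term_to_docs = defaultdict(set)
    for doc, terms in annotations.items():
        for term in terms:
            term_to_docs[term].add(doc)
    return term_to_docs

def related_documents(doc, annotations, network):
    """Documents reachable from `doc` through at least one shared entity."""
    related = set()
    for term in annotations[doc]:
        related |= network[term]
    related.discard(doc)
    return related

def relevance(doc, annotations, relevant_terms):
    """Fraction of a document's entities that are marked as relevant."""
    terms = annotations[doc]
    return len(terms & relevant_terms) / len(terms) if terms else 0.0

annotations = {
    "doc1": {"TP53", "apoptosis"},
    "doc2": {"TP53", "BRCA1"},
    "doc3": {"ribosome"},
}
network = build_network(annotations)
print(related_documents("doc1", annotations, network))        # {'doc2'}
print(relevance("doc2", annotations, {"TP53", "apoptosis"}))  # 0.5
```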