72 research outputs found
BioSStore: A Client Interface for a Repository of Semantically Annotated Bioinformatics Web Services
Bioinformatics has shown itself to be a domain in which Web services are being used extensively. In this domain, simple but real services are being developed. Thus, there are huge repositories of real services available (for example, the BioMOBY main repository includes more than 1500 services). Besides, bioinformatics repositories usually have active communities using them and working on improvements. However, these kinds of repositories do not exploit the full potential of Web services (and SOA, Service Oriented Applications, in general). On the other hand, sophisticated technologies have been proposed to improve SOA, including the annotation of Web services to describe them explicitly. However, these approaches are lacking in repositories with real services. In the work presented here, we address the drawbacks present in bioinformatics services and try to improve the current semantic model by introducing the use of the W3C standard Semantic Annotations for WSDL and XML Schema (SAWSDL) and related proposals (WSMO Lite). This paper focuses on a user interface that takes advantage of a repository of semantically annotated bioinformatics Web services. In this way, we exploit semantics for the discovery of Web services, showing how the use of semantics improves user searches. The BioSStore is available at http://biosstore.khaos.uma.es. This portal will also host future developments of this proposal.
Biopax and Semantics
The Biopax community is producing sets of data in RDF files, but most of them are not available through query interfaces. Publishing SPARQL endpoints is feasible with the current data sets, but the use of reasoning in these interfaces is unfeasible in many cases. Large-scale reasoners are needed to take advantage of these data sets.
A Service for Flexible Management and Analysis of Heterogeneous Clinical Data
This paper describes FIMED 2.0, a service for the flexible management and analysis of heterogeneous clinical data. This software tool allows the flexible management of clinical data from multiple trials, which can help improve the quality of clinical data and facilitate clinical trials. The proposed service has been developed on top of a NoSQL database (MongoDB), which allows clinical data to be collected and integrated in dynamic, incremental schemas according to the needs and requirements of clinical research. Based on our experience with Flexible Management of Biomedical Data (FIMED), we have developed this new version of the tool with the aim not only of replicating the previous one, but also of including further gene regulatory network analysis and data visualization aimed at annotating gene functionality and identifying hub genes. This version allows the practitioner to use four different network construction methods: data assimilation, linear interpolation, tree-based ensemble, or Boosting Machine regression. A free version of this tool is available at https://khaos.uma.es/fimedV2. A demonstration user account, "iwbbio", has been created, with the password "demo". A real use case for a clinical trial in melanoma, which has been anonymized, is also included in this demonstration. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Bioqueries: a collaborative environment to create, explore and share SPARQL queries in Life Sciences
Bioqueries provides a collaborative environment to create, explore, execute, clone and share SPARQL queries (including Federated Queries). Federated SPARQL queries can retrieve information from more than one data source. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Melanoma expression analysis with Big Data technologies
Melanoma is a highly immunogenic tumor. Therefore, in recent years physicians have incorporated drugs that alter the immune system into their therapeutic arsenal against this disease, revolutionizing the treatment of patients in an advanced stage of the disease. This has led us to explore and deepen our knowledge of the immunology surrounding melanoma, in order to optimize its approach. At present, immunotherapy for metastatic melanoma is based on stimulating an individual’s own immune system through the use of specific monoclonal antibodies. The use of immunotherapy has meant that many patients with melanoma have survived, and it therefore constitutes a present and future treatment in this field. At the same time, drugs have been developed targeting specific mutations, specifically BRAF, resulting in large responses in tumor regression (set up in this clinical study to 18 months), as well as a higher percentage of long-term survivors. The analysis of gene expression changes and their correlation with clinical changes can be carried out using the tools supplied by the companies that currently provide gene expression platforms. The gene expression platform used in this clinical study is NanoString, which provides nCounter. However, nCounter has some limitations: the type of analysis is restricted to a predefined set, and introducing clinical features is a complex task. This paper presents an approach to collect the clinical information using a structured database and a Web user interface to enter this information, including the results of the gene expression measurements, to go a step further than the nCounter tool. As part of this work, we present an initial analysis of changes in the gene expression of a set of patients before and after targeted therapy.
This analysis has been carried out using Big Data technologies (Apache Spark) with the final goal of scaling up to large numbers of patients, even though this initial study has a limited number of enrolled patients (12 in the first analysis). This is not a Big Data problem, but the underlying study aims to target 20 patients per year in Málaga alone, and the approach could be extended to analyze the 3,600 patients diagnosed with melanoma per year. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was funded in part by Grants TIN2014-58304-R (Ministerio de Ciencia e Innovación) and P11-TIC-7529 and P12-TIC-1519 (Plan Andaluz de Investigación, Desarrollo e Innovación). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript
NORA: Scalable OWL reasoner based on NoSQL databases and Apache Spark
Reasoning is the process of inferring new knowledge and identifying inconsistencies within ontologies. Traditional techniques often prove inadequate when reasoning over large Knowledge Bases containing millions or billions of facts. This article introduces NORA, a persistent and scalable OWL reasoner built on top of Apache Spark, designed to address the challenges of reasoning over extensive and complex ontologies. NORA exploits the scalability of NoSQL databases to effectively apply inference rules to Big Data ontologies with large ABoxes. To facilitate scalable reasoning, OWL data, including class and property hierarchies and instances, are materialized in the Apache Cassandra database. Spark programs are then evaluated iteratively, uncovering new implicit knowledge from the dataset and leading to enhanced performance and more efficient reasoning over large-scale ontologies. NORA has undergone a thorough evaluation with different benchmarking ontologies of varying sizes to assess the scalability of the developed solution. Funding for open access charge: Universidad de Málaga / CBUA
This work has been partially funded by grant (funded by MCIN/AEI/10.13039/501100011033/) PID2020-112540RB-C41, AETHER-UMA (A smart data holistic approach for context-aware data analytics: semantics and context exploitation). Antonio Benítez-Hidalgo is supported by Grant PRE2018-084280 (Spanish Ministry of Science, Innovation and Universities)
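The iterative evaluation described in the NORA abstract can be illustrated with a small in-memory sketch. This is not NORA's implementation: it uses neither Apache Spark nor Apache Cassandra, the two rules shown (transitivity of subclass relations and type inheritance) are only an assumed sample of OWL inference rules, and the function name `materialize`, the predicate names, and the example triples are hypothetical.

```python
def materialize(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        # Current subclass pairs (child, parent), recomputed on every pass.
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new = set()
        # Rule 1 (assumed sample rule): subClassOf is transitive.
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        # Rule 2 (assumed sample rule): an instance of a class is also
        # an instance of each of that class's superclasses.
        for s, p, o in facts:
            if p == "type":
                for c, d in sub:
                    if o == c:
                        new.add((s, "type", d))
        # Fixpoint reached: this pass inferred nothing that was not known.
        if new <= facts:
            return facts
        facts |= new


# Hypothetical toy ontology: two class axioms and one instance assertion.
toy_ontology = [
    ("Melanoma", "subClassOf", "SkinCancer"),
    ("SkinCancer", "subClassOf", "Cancer"),
    ("case1", "type", "Melanoma"),
]
inferred = materialize(toy_ontology)
print(sorted(inferred - set(toy_ontology)))
```

The loop stops once a pass yields no new triples, which is the sense in which rule programs are "evaluated iteratively"; a distributed reasoner would express each pass as joins over stored tables rather than nested Python loops.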
KNIT: Ontology reusability through knowledge graph exploration
Ontologies have become a standard for knowledge representation across several domains. In Life Sciences, numerous ontologies have been introduced to represent human knowledge, often providing overlapping or conflicting perspectives. These ontologies are usually published as OWL or OBO, and are often registered in open repositories, e.g., BioPortal. However, the task of finding the concepts (classes and their properties) defined in the existing ontologies and the relationships between these concepts across different ontologies – for example, for developing a new ontology aligned with the existing ones – requires a great deal of manual
effort in searching through the public repositories for candidate ontologies and their entities. In this work, we develop a new tool, KNIT, to automatically explore open repositories to help users fetch the previously designed concepts using keywords. User-specified keywords are then used to retrieve matching names of classes or properties. KNIT then creates a draft knowledge graph populated with the concepts and relationships retrieved from the existing ontologies. Furthermore, following the process of ontology learning, our tool refines this first draft of an ontology. We present three BioPortal-specific use cases for our tool. These use cases outline the development of new knowledge graphs and ontologies in the sub-domains of biology: genes and diseases,
virome and drugs. This work has been funded by grant PID2020-112540RB-C4121, AETHER-UMA (A smart data holistic approach for context-aware data analytics: semantics and context exploitation).
Funding for open access charge: Universidad de Málaga / CBUA
Pattern recognition frequency-based feature selection with multi-objective discrete evolution strategy for high-dimensional medical datasets
Feature selection has a prominent role in high-dimensional datasets to increase classification accuracy, decrease the learning algorithm computational time, and present the most informative features to decision-makers. This paper proposes a two-stage hybrid feature selection for high-dimensional medical datasets: Maximum Pattern Recognition - Multi-objective Discrete Evolution Strategy (MPR-MDES). MPR is a rapid filter ranker that significantly outperforms existing frequency-based rankers in recognizing non-linear patterns, effectively eliminating a majority of non-informative features. Then, the wrapper Multi-objective Discrete Evolution Strategy (MDES) uses the remaining features and obtains sets of solutions which are automatically presented to decision-makers. The experiments conducted on large medical datasets demonstrate that MPR-MDES achieves considerable improvements compared to state-of-the-art methods, in terms of both classification accuracy and dimensionality reduction. In this sense, the proposal successfully performs when presenting informative feature sets to decision-makers. The implementation is available on https://github.com/KhaosResearch/MPR-MDES. Funding for open access charge: Universidad de Málaga/CBUA.
This work has been partially funded by grants (funded by MCIN/AEI/10.13039/501100011033/) PID2020-112540RB-C41, AETHER-UMA (A smart data holistic approach for context-aware data analytics: semantics and context exploitation), and QUAL21 010UMA (Junta de Andalucía)
A Fine Grain Sentiment Analysis with Semantics in Tweets
Social networking is nowadays a major source of new information in the world. Microblogging sites like Twitter have millions of active users (320 million active users on Twitter on the 30th September 2015) who share their opinions in real time, generating huge amounts of data. These data are, in most cases, available to any network user. The opinions of Twitter users have become something that companies and other organisations study to see whether or not their users like the products or services they offer. One way to assess opinions on Twitter is to classify the sentiment of the tweets as positive or negative. This process is usually done at a coarse grain level, with each tweet as a whole classified as positive or negative. However, tweets can be partially positive and negative at the same time, referring to different entities. As a result, general approaches usually classify these tweets as “neutral”. In this paper, we propose a semantic analysis of tweets, using Natural Language Processing to classify the sentiment with regard to the entities mentioned in each tweet. We offer a combination of Big Data tools (under the Apache Hadoop framework) and sentiment analysis using RDF graphs supporting the study of the tweet’s lexicon. This work has been empirically validated using a sporting event, the 2014 Phillips 66 Big 12 Men’s Basketball Championship. The experimental results show a clear correlation between the predicted sentiments and specific events during the championship.
Ensemble-based genetic algorithm explainer with automized image segmentation: A case study on melanoma detection dataset
Explainable Artificial Intelligence (XAI) makes AI understandable to the human user, particularly when the
model is complex and opaque. Local Interpretable Model-agnostic Explanations (LIME) has an image explainer
package that is used to explain deep learning models. The image explainer of LIME needs some parameters to
be manually tuned by the expert in advance, including the number of top features to be seen and the number
of superpixels in the segmented input image. This parameter tuning is a time-consuming task. Hence, with the
aim of developing an image explainer that automates image segmentation, this paper proposes the Ensemble-based
Genetic Algorithm Explainer (EGAE) for melanoma cancer detection, which automatically detects and
presents the informative sections of the image to the user. EGAE has three phases. First, the sparsity of
chromosomes in GAs is determined heuristically. Then, multiple GAs are executed consecutively. These GAs
differ in the number of superpixels in the input image, which results in
different chromosome lengths. Finally, the results of the GAs are ensembled using consensus and majority voting.
This paper also introduces how Euclidean distance can be used to calculate the distance between the actual
explanation (delineated by experts) and the calculated explanation (computed by the explainer) for accuracy
measurement. Experimental results on a melanoma dataset show that EGAE automatically detects informative
lesions, and that it efficiently improves the accuracy of explanation in comparison with LIME. The Python
code for EGAE, the ground truths delineated by clinicians, and the melanoma detection dataset are available
at https://github.com/KhaosResearch/EGAE. This work has been partially funded by grant PID2020-112540RB-C41 (funded by MCIN/AEI/10.13039/501100011033/, Spain), AETHER-UMA, Spain (A smart data holistic approach for context-aware data analytics: semantics and context exploitation). Funding for open access charge: Universidad de Málaga/CBUA. Additionally, we thank Dr. Miguel Ángel Berciano Guerrero from Unidad de Oncología Intercentros, Hospitales Universitarios Regional Virgen de la Victoria de Málaga, and Instituto de Investigaciones Biomédicas (IBIMA), Málaga, Spain, for his support in images selection and general medical orientation in the particular case of Melanoma
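The ensembling and accuracy steps named in the EGAE abstract can be sketched as follows. This is an illustrative assumption, not the authors' code: it supposes that each GA run has already been mapped to a binary mask over a common set of image regions (the abstract notes that runs differ in superpixel count, so such a mapping would be needed), and the function names and toy masks are hypothetical.

```python
import math


def majority_vote(masks):
    """Keep a region when more than half of the binary masks select it."""
    n = len(masks)
    return [1 if sum(column) * 2 > n else 0 for column in zip(*masks)]


def consensus_vote(masks):
    """Keep a region only when every binary mask selects it."""
    return [1 if all(column) else 0 for column in zip(*masks)]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length masks."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical masks from three GA runs over four image regions,
# plus a hypothetical expert-delineated ground truth.
runs = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
expert = [1, 1, 0, 0]

majority = majority_vote(runs)
consensus = consensus_vote(runs)
print(majority, euclidean_distance(majority, expert))
print(consensus, euclidean_distance(consensus, expert))
```

A smaller distance to the expert mask indicates an explanation closer to the clinician's delineation; how the paper normalizes or aggregates this distance is not stated in the abstract.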