27 research outputs found

    SILE: A Method for the Efficient Management of Smart Genomic Information

    In the last two decades, the data generated by next-generation sequencing technologies have revolutionized our understanding of human biology. Furthermore, they have allowed us to develop and improve our knowledge of how changes (variants) in the DNA can be related to the risk of developing certain diseases. Currently, a large amount of genomic data is publicly available and frequently used by the research community to extract meaningful and reliable associations between risk genes and the mechanisms of disease. However, managing this exponentially growing volume of data has become a challenge. Researchers are forced to delve into a lake of complex data spread across more than a thousand heterogeneous repositories, represented in multiple formats and with different levels of quality. Moreover, when these data are used to solve a concrete problem, only a small part of them is really significant. This is what we call "smart" data. The main goal of this thesis is to provide a systematic approach to efficiently manage smart genomic data by using conceptual modeling techniques and the principles of data quality assessment. The aim of this approach is to populate an Information System with data that are accessible, informative and actionable enough to extract valuable knowledge.
    This thesis was supported by the Research and Development Aid Program (PAID-01-16) under the FPI grant 2137.
    León Palacio, A. (2019). SILE: A Method for the Efficient Management of Smart Genomic Information [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/131698
    Listed under: Premios Extraordinarios de tesis doctorales (Extraordinary Doctoral Thesis Awards).

    Conceptual Model of Proteins

    This conceptual model represents the knowledge associated with the protein domain, including protein-protein interactions, pathways, functionality, post-translational modifications, and associations with disease.
    León Palacio, A.; Pastor López, O. (2020). Conceptual Model of Proteins. http://hdl.handle.net/10251/14788
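    The description lists the main parts of the protein model but not their attributes. The sketch below is only a rough illustration: it encodes those parts as Python dataclasses. Every class, field, and the `interaction_partners` helper is a hypothetical name chosen for this example. None of them comes from the published model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostTranslationalModification:
    position: int  # residue position in the sequence (1-based)
    kind: str      # e.g. "phosphorylation"


@dataclass
class Protein:
    # Hypothetical attributes; the published model defines its own.
    accession: str  # stable identifier, e.g. a UniProtKB-style accession
    name: str
    functions: List[str] = field(default_factory=list)
    pathways: List[str] = field(default_factory=list)
    modifications: List[PostTranslationalModification] = field(default_factory=list)
    associated_diseases: List[str] = field(default_factory=list)


@dataclass
class Interaction:
    """A protein-protein interaction, stored once and shared by both partners."""
    first: Protein
    second: Protein


def interaction_partners(protein: Protein, interactions: List[Interaction]) -> List[str]:
    """Return the accessions of all proteins that interact with `protein`."""
    partners = []
    for interaction in interactions:
        if interaction.first.accession == protein.accession:
            partners.append(interaction.second.accession)
        elif interaction.second.accession == protein.accession:
            partners.append(interaction.first.accession)
    return partners


if __name__ == "__main__":
    a = Protein("HYP001", "Hypothetical protein A")
    b = Protein("HYP002", "Hypothetical protein B")
    c = Protein("HYP003", "Hypothetical protein C")
    links = [Interaction(a, b), Interaction(c, a)]
    print(interaction_partners(a, links))  # prints ['HYP002', 'HYP003']
```

    In this sketch, interactions are a separate class rather than a list inside `Protein`. Each interaction is then recorded once as a fact shared by both partners, which is one common way to represent a many-to-many relationship.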

    Towards a Shared, Conceptual Model-Based Understanding of Proteins and Their Interactions

    Understanding the human genome is a major research challenge. The huge complexity and volume of genome data require extremely effective and efficient data management policies. A first crucial point is to obtain a shared understanding of the domain, which is very hard given the number of different genome data sources. To complicate matters further, those data sources deal with different parts of genome-based information: we not only need to understand them well, but also to integrate and intercommunicate all the relevant information. The protein perspective is a good example. Rich, well-known repositories such as UniProt provide a lot of valuable information that is not easy to interpret and manage when we want to generate useful results. Proteomes and basic information, protein-protein interactions, protein structure, protein processing events, protein function, and so on provide a great deal of information that needs to be conceptually characterized and delimited. To facilitate the essential common understanding of the domain, this paper uses the case of proteins to analyze the data provided by UniProt and to carry out a sound conceptualization that identifies the relevant domain concepts. The result of this conceptualization process, explained in detail in this work, is a conceptual model of proteins. This holistic model is the result of achieving a precise ontological commitment. It establishes the significant concepts and their relationships, providing a solid basis for efficiently managing relevant genome data related to proteins.
    This work was supported in part by the Spanish State Research Agency under Grant TIN2016-80811-P, and in part by the Generalitat Valenciana under Grant PROMETEO/2018/176, co-financed with ERDF.
    León-Palacio, A.; Pastor López, O. (2021). Towards a Shared, Conceptual Model-Based Understanding of Proteins and Their Interactions. IEEE Access. 9:73608-73623. https://doi.org/10.1109/ACCESS.2021.3080040

    Conceptual Modeling of Proteins Based on UniProt

    Clinical disease states reflect the interaction of a myriad of genetic and environmental contributions. In this context, a major challenge is to develop information systems and algorithms that can describe this complexity. Such systems would facilitate an understanding of disease mechanisms and guide the development and application of therapies. This work focuses on describing how a shared understanding of the domain can be achieved by analyzing the conceptual precision of the main concepts that should constitute the ontological commitment strictly required when studying an important area of research: the role that proteins play in the different functions carried out within the cells of any living system. The contribution of this paper is to show the conceptual complexity of the UniProtKB database and to help users face and manage that complexity. It does so by providing a sound, well-grounded conceptual background for achieving a shared understanding of the domain, a crucial aspect for designing any fruitful data analytics-based strategy. A conceptual model of proteins is carefully developed using the UniProtKB database as the data source, and the problems faced are explained in detail together with their corresponding solutions.
    León Palacio, A.; Pastor López, O. (2020). Conceptual Modeling of Proteins Based on UniProt. http://hdl.handle.net/10251/14561

    Enhancing Precision Medicine: A Big Data-Driven Approach for the Management of Genomic Data

    Managing the exponential growth of data produced by Next Generation Sequencing techniques has become a challenge for researchers. They are forced to delve into an ocean of complex data to extract new insights that help unravel the secrets of human diseases. Initially, this can be approached as a Big Data problem, but genomic data pose particular challenges that set them apart from other Big Data domains. Genomic data are much more heterogeneous: they are spread across hundreds of repositories, represented in multiple formats, and have different levels of quality. In addition, drawing meaningful conclusions from genomic data requires considering all the relevant surrounding knowledge, which is under continuous evolution. In this scenario, precisely identifying what makes Genome Data Management so different is essential to provide effective Big Data-based solutions. Genomic projects must deal with technological problems associated with data management, nomenclature standards, and data quality, which only robust Information Systems that use Big Data techniques can address. The main contribution of this paper is a Big Data-driven approach for managing genomic data that is adapted to the particularities of the domain. The paper also shows the approach's applicability to improving genetic diagnoses, which are at the core of developing accurate Precision Medicine.
    This work was supported by the Spanish State Research Agency (grant number TIN2016-80811-P) and the Generalitat Valenciana (grant number PROMETEO/2018/176), and co-financed with ERDF.
    León-Palacio, A.; Pastor López, O. (2021). Enhancing Precision Medicine: A Big Data-Driven Approach for the Management of Genomic Data. Big Data Research. 26:1-11. https://doi.org/10.1016/j.bdr.2021.100253

    Evaluación del grado de agilismo basado en los objetivos y necesidades de los equipos de trabajo [Assessing the degree of agility based on the goals and needs of work teams]

    León Palacio, A. (2015). Evaluación del grado de agilismo basado en los objetivos y necesidades de los equipos de trabajo. http://hdl.handle.net/10251/65696

    Genomic Information Systems applied to Precision Medicine: Genomic Data Management for Alzheimer's Disease Treatment

    Alzheimer's disease is one of the most prevalent neurological disorders in today's society. Studying the genetic characteristics of each patient makes it possible to identify significant DNA variations that can ease an early diagnosis, which is essential to stop the progression of the disorder. The problem is that the vast amount of available information requires a method designed to store and manage these data adequately and optimally for their exploitation. In this context, Information Systems Engineering in general, and conceptual modelling techniques in particular, provide a suitable solution for determining which data are relevant and how to manage the corresponding information. With these fundamentals in mind, this paper introduces a particular example to illustrate the methodological treatment (search, filtering, and loading) of genomic variations related to Alzheimer's disease for their later exploitation for clinical purposes.
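    The search, filter, and load steps can be pictured as a three-stage pipeline. The sketch below is a minimal illustration built entirely on assumptions, none of which come from the paper:

    - variant records are hypothetical dictionaries with `id`, `gene`, `significance`, and `evidence` keys;
    - the accepted clinical-significance labels and the evidence threshold are made up for the example;
    - the target is an in-memory SQLite table.

```python
import sqlite3
from typing import Dict, Iterable, List, Set

# Hypothetical record shape: {"id": str, "gene": str, "significance": str, "evidence": int}
Variant = Dict[str, object]

# Assumed set of clinical-significance labels treated as relevant (illustrative only).
RELEVANT_SIGNIFICANCE = {"pathogenic", "likely pathogenic", "risk factor"}


def search(records: Iterable[Variant], genes: Set[str]) -> List[Variant]:
    """Search: keep only variants located in the genes of interest."""
    return [r for r in records if r["gene"] in genes]


def filter_relevant(records: Iterable[Variant], min_evidence: int) -> List[Variant]:
    """Filter: keep variants with a relevant classification and enough supporting evidence."""
    return [
        r for r in records
        if r["significance"] in RELEVANT_SIGNIFICANCE and r["evidence"] >= min_evidence
    ]


def load(conn: sqlite3.Connection, records: Iterable[Variant]) -> int:
    """Load: store the selected variants in a local table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variant ("
        "id TEXT PRIMARY KEY, gene TEXT, significance TEXT, evidence INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO variant (id, gene, significance, evidence) "
        "VALUES (:id, :gene, :significance, :evidence)",
        list(records),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM variant").fetchone()[0]


if __name__ == "__main__":
    # Hypothetical example data; gene symbols are used only as labels.
    raw = [
        {"id": "v1", "gene": "APOE", "significance": "risk factor", "evidence": 3},
        {"id": "v2", "gene": "PSEN1", "significance": "benign", "evidence": 5},
        {"id": "v3", "gene": "GENE_X", "significance": "pathogenic", "evidence": 4},
        {"id": "v4", "gene": "PSEN1", "significance": "pathogenic", "evidence": 1},
    ]
    found = search(raw, {"APOE", "PSEN1"})
    selected = filter_relevant(found, min_evidence=2)
    with sqlite3.connect(":memory:") as conn:
        print(load(conn, selected))  # prints 1
```

    Keeping the three stages as separate functions makes each selection criterion easy to inspect and change on its own, which matters when the criteria themselves are still under expert review.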

    An Advanced Search System to Manage SARS-CoV-2 and COVID-19 Data Using a Model-Driven Development Approach

    The COVID-19 pandemic has led to the proliferation of an unprecedented amount of data that must be organized and connected in a way that allows its efficient management. However, the speed at which this knowledge is being generated has highlighted the shortcomings of the research community in creating well-organized, standardized, and structured databases. The community has made efforts to develop advanced integrative platforms such as CovidGraph. Even so, we have identified some limitations when using these solutions, which we think derive from the lack of a sound ontological schema to guide the collection, standardization, and integration of data. This work explores the advantages and disadvantages, for the final user, of building advanced information systems with a Model-Driven Development approach that integrates heterogeneous and complex data on an ontological basis. As a proof of concept, we built a database (CovProt) that uses this approach to integrate data about different aspects of SARS-CoV-2. We then analyzed the advantages and disadvantages of this approach compared to CovidGraph by running a set of queries in both CovProt and CovidGraph. Finally, we compared the structure and redundancy of the retrieved data.
    This work was supported in part by the Spanish State Research Agency and the Generalitat Valenciana under the Project PROMETEO/2018/176 and Project INNEST/2021/57, in part by the Spanish Ministry of Universities and the Universitat Politècnica de València under the Margarita Salas Next Generation EU Grant, and in part by the European Regional Development Fund (ERDF) and the European Union NextGenerationEU/Plan de Recuperación, Transformación y Resiliencia (PRTR).
    León-Palacio, A.; García-Simón, A.; Pastor López, O. (2022). An Advanced Search System to Manage SARS-CoV-2 and COVID-19 Data Using a Model-Driven Development Approach. IEEE Access. 10:43528-43534. https://doi.org/10.1109/ACCESS.2022.3169268
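    The abstract does not say how redundancy was measured. One simple, hypothetical measure for comparing query results from two systems is the fraction of returned rows that repeat an earlier row. The `redundancy` function below is an illustrative assumption, not the metric used in the paper.

```python
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def redundancy(rows: List[Row]) -> float:
    """Fraction of rows that duplicate an earlier row (0.0 means every row is distinct).

    Rows are compared by content, so the order of keys in each dict does not matter.
    """
    if not rows:
        return 0.0
    distinct = {tuple(sorted(row.items())) for row in rows}
    return (len(rows) - len(distinct)) / len(rows)


if __name__ == "__main__":
    # Hypothetical results of the same query run against two systems.
    system_a = [{"protein": "S"}, {"protein": "N"}, {"protein": "M"}]
    system_b = [{"protein": "S"}, {"protein": "S"}, {"protein": "N"}, {"protein": "M"}]
    print(redundancy(system_a), redundancy(system_b))  # prints 0.0 0.25
```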

    Smart Data for Genomic Information Systems: the SILE Method

    During the last two decades, data generated by Next Generation Sequencing Technologies have revolutionized our understanding of human biology. They have also improved the study of how changes (variations) in the DNA are involved in the risk of suffering a certain disease. A huge amount of genomic data is publicly available and frequently used by the research community to extract meaningful and reliable gene-disease relationships. However, managing this exponential growth of data has become a challenge for biologists. From a Big Data perspective, they are forced to delve into a lake of complex data spread across more than a thousand heterogeneous repositories, represented in multiple formats and with different levels of quality. Yet when data are used to solve a concrete problem, only a small part of that "data lake" is really significant. This is what we call the "smart" data perspective. By using conceptual models and the principles of data quality management, adapted to the genomic domain, we propose a systematic approach called the SILE method to move from a Big Data to a Smart Data perspective. The aim of this approach is to populate an Information System with genomic data that are accessible, informative and actionable enough to extract valuable knowledge.
    The authors would like to thank the members of the PROS Research Centre Genome group for the fruitful discussions regarding the application of CM in the medical field. This work has been developed with the financial support of the Spanish State Research Agency and the Generalitat Valenciana under the projects TIN2016-80811-P and PROMETEO/2018/176, co-financed with ERDF. It was also supported by the Research and Development Aid Program (PAID-01-16) of the Universitat Politècnica de València under the FPI grant 2137.
    León-Palacio, A.; Pastor López, O. (2018). Smart Data for Genomic Information Systems: the SILE Method. Complex Systems Informatics and Modeling Quarterly. (17):1-23. https://doi.org/10.7250/csimq.2018-17.01
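    The move from a data lake to "smart" data can be read as a quality gate applied before loading. The following is a minimal sketch under stated assumptions. Each record carries hypothetical quality scores in [0, 1], one per dimension, and a record is kept only if it meets every threshold. The dimension names, thresholds, and helper functions are illustrative and are not taken from the SILE method itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    identifier: str
    source: str  # repository the record was retrieved from
    # Hypothetical quality scores in [0, 1], one per assessed dimension.
    quality: Dict[str, float] = field(default_factory=dict)


def is_smart(record: Record, thresholds: Dict[str, float]) -> bool:
    """Keep a record only if it meets every threshold; unassessed dimensions count as 0."""
    return all(record.quality.get(dim, 0.0) >= minimum for dim, minimum in thresholds.items())


def select_smart(records: List[Record], thresholds: Dict[str, float]) -> List[Record]:
    """Reduce the 'data lake' to the subset of records that pass the quality gate."""
    return [r for r in records if is_smart(r, thresholds)]


if __name__ == "__main__":
    lake = [
        Record("var-1", "repo-A", {"accuracy": 0.9, "completeness": 0.8}),
        Record("var-2", "repo-B", {"accuracy": 0.95}),
        Record("var-3", "repo-C", {"accuracy": 0.5, "completeness": 0.9}),
    ]
    gate = {"accuracy": 0.8, "completeness": 0.7}  # hypothetical thresholds
    print([r.identifier for r in select_smart(lake, gate)])  # prints ['var-1']
```

    Treating a missing score as zero is a deliberately conservative choice. A record that has not been assessed on a dimension cannot pass the gate for that dimension.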

    Integration of clinical and genomic data to enhance precision medicine: a case of study applied to the retina-macula

    Age-related macular degeneration is a complex, multifactorial, neurodegenerative disease and the third leading cause of blindness after cataracts and glaucoma. To date, no effective remedies are available to treat the disease. Therefore, the main goal of the scientific community is to uncover the role that both genetic and environmental factors play in the development of the disease. However, the complexity of the domain, the heterogeneity of the information, and the massive amounts of existing data hinder clinical experts' daily work of providing an accurate diagnosis and treatment. In this work, we present how clinicians can benefit from ontologically well-grounded information systems that support the management of both clinical and genomic data. First, we summarize the results of a previous work that covers the clinical perspective using an information system called G-MAC, which was specially developed for managing clinical data. Then, we present the results of an exhaustive study of the genetic factors of age-related macular degeneration, carried out with an information system developed to enhance the management of complex genomic data. Finally, we show how connecting both perspectives through conceptual models can benefit clinicians and patients through more accurate precision medicine.
    The authors would like to thank the members of the PROS Research Center Genome group for the fruitful discussions regarding the application of CM in the medical field. This work was supported by the Valencian Innovation Agency through the OGMIOS project (INNEST/2021/57), the Preparatory Action-UPVFISABIO (A36-G-MAC, 2019), the Generalitat Valenciana through the CoMoDiD project (CIPROM/2021/023), and the Spanish State Research Agency through the DELFOS (PDC2021-121243-I00) and SREC (PID2021-123824OB-I00) projects, MICIN/AEI/10.13039/501100011033, and co-financed with ERDF and the European Union Next Generation EU/PRTR.
    Reyes Román, JF.; León-Palacio, A.; García-Simón, A.; Cabrera Beyrouti, R.; Pastor López, O. (2023). Integration of clinical and genomic data to enhance precision medicine: a case of study applied to the retina-macula. Software & Systems Modeling. 22(1):159-174. https://doi.org/10.1007/s10270-022-01039-4