
    A Stochastic Mobility Prediction Algorithm for finding Delay and Energy Efficient Routing Paths considering Movement Patterns in Mobile IoT Networks

    In Mobile IoT Networks, nodes move constantly across a field, interrupting communication paths and thus generating long delays when building a communication path from a source IoT node to the gateway (destination node). These interruptions degrade delay performance in delay-sensitive applications such as health and military scenarios. In addition, IoT nodes are battery-powered, so energy consumption requirements must also be met. In short, the gateway should not receive packets from IoT nodes with undesired delays, so new algorithms and techniques are needed to minimize the delay and energy consumption experienced in the IoT network. Because IoT nodes are attached to humans, animals or objects, they exhibit specific movement patterns that can be analyzed to improve path building and reduce end-to-end delay. We therefore propose a mobility prediction technique based on a stochastic model to predict node positions and obtain minimum-cost paths in terms of energy consumption and delay in mobile IoT networks. Our stochastic model is tuned and evaluated under the Gauss-Markov mobility model, considering different levels of movement randomness, in order to test how the prediction capability of our proposal impacts delay and energy consumption in mobile IoT networks compared with other routing algorithms.
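As a minimal illustration of the kind of prediction the abstract describes, the sketch below rolls a Gauss-Markov mobility model forward to estimate a node's future position. All parameter names and values are illustrative assumptions, not the paper's actual algorithm.

```python
import math
import random

def gauss_markov_step(value, alpha, mean, sigma, rng):
    """One Gauss-Markov update (used for both speed and direction):
    alpha controls memory, mean is the long-run average, sigma the noise."""
    return (alpha * value
            + (1.0 - alpha) * mean
            + sigma * math.sqrt(1.0 - alpha ** 2) * rng.gauss(0.0, 1.0))

def predict_position(x, y, speed, direction, steps,
                     alpha, mean_speed, mean_dir, sigma, rng):
    """Predict a node's position after `steps` time slots by simulating
    the mobility model forward from its current speed and heading."""
    for _ in range(steps):
        speed = gauss_markov_step(speed, alpha, mean_speed, sigma, rng)
        direction = gauss_markov_step(direction, alpha, mean_dir, sigma, rng)
        x += speed * math.cos(direction)
        y += speed * math.sin(direction)
    return x, y
```

With alpha close to 1 the node keeps its current speed and heading (low randomness); with alpha near 0 it reverts quickly to the mean values, which matches the "different levels of movement randomness" the abstract evaluates.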

    Machine Learning Analysis of TCGA Cancer Data

    [Abstract] In recent years, machine learning (ML) researchers have shifted their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have made omic data available for training these algorithms. To survey the state of the art, this review covers the main works that have applied ML to TCGA data. First, the principal discoveries made by the TCGA consortium are presented. With these bases established, we turn to the main objective of this study: the identification and discussion of works that have used TCGA data to train different ML approaches. After reviewing more than 100 papers, we classified them according to three pillars: the type of tumour, the type of algorithm and the predicted biological problem. One conclusion of this work is the high density of studies based on two major algorithms: Random Forest and Support Vector Machines. We also observe the rise of deep artificial neural networks and, notably, an increase in integrative models for multi-omic data analysis. The different biological conditions are a consequence of molecular homeostasis, driven by protein-coding regions, regulatory elements and the surrounding environment. A large number of works use gene expression data, which researchers prefer when training their models. The biological problems addressed fall into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour: a greater number of works focused on the BRCA cohort, while works on survival, for example, centred on the GBM cohort because of its large number of events. This review allows the reader to go in depth into the works and methodologies used to study TCGA cancer data, and it is intended to serve as a basis for future research in this field.
    This work was supported by the "Collaborative Project in Genomic Data Integration (CICLOGEN)" PI17/01826 funded by the Carlos III Health Institute from the Spanish National Plan for Scientific and Technical Research and Innovation 2013–2016 and the European Regional Development Funds (FEDER) "A way to build Europe", the General Directorate of Culture, Education and University Management of Xunta de Galicia (Ref. ED431D 2017/16), the "Galician Network for Colorectal Cancer Research" (Ref. ED431D 2017/23) and Competitive Reference Groups (Ref. ED431C 2018/49). CITIC, as a Research Center accredited by the Galician University System, is funded by the "Consellería de Cultura, Educación e Universidades" of Xunta de Galicia, supported 80% through ERDF Funds (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by the "Secretaría Xeral de Universidades" (Grant ED431G 2019/01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Population Subset Selection for the Use of a Validation Dataset for Overfitting Control in Genetic Programming

    [Abstract] Genetic Programming (GP) is a technique able to solve different problems through the evolution of mathematical expressions. However, its tendency to overfit the data is one of the main obstacles to its application. Using a validation dataset is a common way to prevent overfitting in many Machine Learning (ML) techniques, including GP. However, one key point differentiates GP from other ML techniques: instead of training a single model, GP evolves a population of models. The validation dataset can therefore be used in several ways, because any of the evolved models could be evaluated on it. This work explores using the validation dataset not only on the training-best individual but also on a subset containing the training-best individuals of the population. The study was conducted on 5 well-known databases covering regression and classification tasks. In most cases, the results point to an improvement when the validation dataset is used on a subset of the population instead of only on the training-best individual, which also induces a reduction in the number of nodes and, consequently, lower complexity in the expressions.
    Xunta de Galicia; ED431G/01. Xunta de Galicia; ED431D 2017/16. Xunta de Galicia; ED431C 2018/49. Xunta de Galicia; ED431D 2017/23. Instituto de Salud Carlos III; PI17/0182
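The selection scheme the abstract describes can be sketched in a few lines: instead of validating only the single training-best individual, evaluate the k best-by-training individuals on the validation set and keep the one that generalizes best. This is a minimal illustration under assumed data structures (individuals as dicts with precomputed errors), not the paper's implementation.

```python
def select_final_model(population, k=5):
    """Pick the final GP model: take the k individuals with the lowest
    training error, then keep the one among them with the lowest
    validation error (overfitting control via a population subset)."""
    top_k = sorted(population, key=lambda ind: ind["train_err"])[:k]
    return min(top_k, key=lambda ind: ind["val_err"])
```

In the toy population below, the training-best individual "a" overfits (high validation error), so the subset-based selection returns "b" instead, which is the behaviour the study reports as beneficial.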

    A generalized linear model for cardiovascular complications prediction in PD patients

    [Abstract] This study used machine learning models to identify non-invasive patient information for predicting cardiovascular complications in peritoneal dialysis patients. It is well known that cardiovascular diseases are the key cause of mortality in patients undergoing peritoneal dialysis, as the risk of cardiovascular disease increases with the progression of renal failure. The primary aim is to establish the variables most associated with cardiovascular complications. To achieve this goal, four different machine learning techniques were used. We found that the best classification algorithm was a Generalized Linear Model, which achieved AUC values above 96% using a small subset of the original variables selected through a feature selection approach. Our approach increases the interpretability of the combinations of traditional factors, advanced chronic kidney disease factors and peritoneal dialysis factors, all related to the cardiovascular risk profile. The final model is based primarily on the traditional factors.
    Instituto de Salud Carlos III; PI17/01826. Xunta de Galicia; ED431G/01. Xunta de Galicia; ED431D 2017/1. Xunta de Galicia; ED431D 2017/2. Ministerio de Economía y Competitividad; UNLC08-1E-002. Ministerio de Economía y Competitividad; UNLC13-13-350
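The AUC metric the abstract reports (above 96%) can be computed directly from model scores via the rank-based (Mann-Whitney) formulation. The sketch below is a generic illustration of that computation, not code from the study.

```python
def auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs the model ranks correctly, with ties
    counting half a point."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the model's reported value above 0.96 indicates near-perfect discrimination of patients with and without cardiovascular complications.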

    Energy supply optimization for unregulated consumers

    This article proposes an optimization model for the electricity supply portfolio of unregulated end consumers in the Colombian electricity market. The purpose of the model is to determine the optimal amount of energy to be supplied by each of the three supply alternatives available to the consumer: spot market purchases, bilateral contracts and self-generation, minimizing the expected cost of energy supply and the associated risk. A stochastic optimization model is used for this purpose, and the risk indicator employed is the Conditional Value at Risk (CVaR). Finally, the model results are analyzed through simulated price scenarios based on prices reported in the NEON information system managed by XM S.A., the Colombian electricity market operator, and the best instance of risk aversion is selected.
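The risk measure used in the model, CVaR, has a simple scenario-based reading: the average of the worst losses beyond the Value-at-Risk level. The sketch below uses an equal-weight approximation over discrete scenarios (the full Rockafellar-Uryasev formulation also handles fractional tail weight); it illustrates the indicator, not the paper's optimization model.

```python
def cvar(losses, beta):
    """Equal-weight scenario CVaR at confidence level beta: the average
    of the worst (1 - beta) fraction of the loss scenarios."""
    worst_first = sorted(losses, reverse=True)
    k = max(1, int(round(len(losses) * (1.0 - beta))))
    return sum(worst_first[:k]) / k
```

For example, with ten equally likely cost scenarios and beta = 0.8, CVaR averages the two worst outcomes, which is the tail the risk-averse supply portfolio is trying to shrink.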

    Molecular Docking and Machine Learning Analysis of Abemaciclib in Colon Cancer

    [Abstract] Background - The main challenge in cancer research is the identification of omic variables with prognostic value that enable a personalised diagnosis for each tumour. A personalised diagnosis opens the door to the design and discovery of new treatments specific to each patient. In this context, this work offers new ways to reuse existing databases and create added value in research. Three published signatures with significant prognostic value in Colon Adenocarcinoma (COAD) were identified. These signatures were combined into a new meta-signature and validated with the main Machine Learning (ML) and conventional statistical techniques. In addition, a drug repurposing experiment was carried out through a Molecular Docking (MD) methodology in order to identify new potential treatments for COAD. Results - The prognostic potential of the signature was validated by means of ML algorithms and differential gene expression analysis. The results supported the possibility that this meta-signature harbors genes of interest for the prognosis and treatment of COAD. We studied drug repurposing through a molecular docking analysis in which the Protein Data Bank (PDB) structures of the meta-signature genes (155 in total) were confronted with 81 anti-cancer drugs approved by the FDA. We observed four interactions of interest: GLTP - Nilotinib, PTPRN - Venetoclax, VEGFA - Venetoclax and FABP6 - Abemaciclib. The FABP6 gene and its role within different metabolic pathways were studied in tumour and normal tissue, and we observed the potential of FABP6 as a therapeutic target. Our in silico results showed a significant binding specificity of Abemaciclib to the protein products of the FABP6 gene, as well as its known action as an inhibitor of the CDK4/6 proteins and, therefore, of the cell cycle.
    Conclusions - Our ML and differential expression experiments first identified the FABP6 gene as a possible new cancer biomarker because of its specificity in colonic tumour tissue and its lack of expression in healthy adjacent tissue. The MD analysis then showed that the drug Abemaciclib has a characteristic affinity for the different protein structures of the FABP6 gene. These in silico experiments reveal a new opportunity that should be validated experimentally, helping to reduce the cost and increase the speed of drug screening. For these reasons, we propose the validation of the drug Abemaciclib for the treatment of colon cancer.
    This work was supported by the "Collaborative Project in Genomic Data Integration (CICLOGEN)" PI17/01826 funded by the Carlos III Health Institute from the Spanish National Plan for Scientific and Technical Research and Innovation 2013–2016 and the European Regional Development Funds (FEDER) "A way to build Europe", the General Directorate of Culture, Education and University Management of Xunta de Galicia (Ref. ED431G/01, ED431D 2017/16), the "Galician Network for Colorectal Cancer Research" (Ref. ED431D 2017/23) and Competitive Reference Groups (Ref. ED431C 2018/49). The calculations were performed on resources provided by the Spanish Ministry of Economy and Competitiveness via funding of the unique installation BIOCAI (UNLC08-1E-002, UNLC13-13-3503) and the European Regional Development Funds (FEDER). The funding body did not have a role in the experimental design; data collection, analysis and interpretation; or writing of this manuscript.
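The screening step, confronting 155 protein structures with 81 drugs and keeping the strongest interactions, amounts to filtering and ranking docking scores. The sketch below illustrates that selection step only; the scores and the cutoff value are made-up examples, not results from the study.

```python
def top_interactions(docking_scores, cutoff=-8.0):
    """Keep protein-drug pairs whose docking score (binding energy in
    kcal/mol; more negative means stronger predicted binding) passes
    the cutoff, ordered best (most negative) first."""
    hits = [(pair, s) for pair, s in docking_scores.items() if s <= cutoff]
    return sorted(hits, key=lambda t: t[1])
```

Applied to the full 155 x 81 score matrix, this kind of filter would yield a short list of candidate interactions, such as the four pairs reported in the abstract, for downstream analysis.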

    Integrative Multi-Omics Data-Driven Approach for Metastasis Prediction in Cancer

    [Abstract] Biomedical research is generating huge amounts of omic data, covering all levels of genetic information from nucleotide sequencing to protein metabolism. Initially, these data were analyzed independently, losing a great deal of essential information in the models; even so, complex metabolic routes and genetic diseases could be determined. In the last decade, an ever-increasing number of research projects have followed a systemic biological approach, integrating multiple omic datasets to obtain more complex, powerful and informative models that provide deeper knowledge about genotype-phenotype interactions. These models have greatly contributed to the study of complex multi-factorial diseases such as cancer. The onset and development of any type of cancer can be influenced by multiple variables, so integrating as many omic datasets as possible is the best approach to extract all the underlying knowledge. A significant factor in the mortality of this disease is the metastatic process; identifying the factors involved in this cell behavior may help in diagnosis and, hopefully, in disease prevention. The development of novel integrative multi-omics approaches is an opportunity to fill the gap between our ability to generate data and the difficulty of understanding the biology behind it. In this work we propose a methodological pipeline for analyzing multi-omics data using machine learning.
    Instituto de Salud Carlos III; PI17/01826. Xunta de Galicia; ED431G/01. Xunta de Galicia; ED431D 2017/1. Xunta de Galicia; ED431D 2017/2. Ministerio de Economía y Competitividad; UNLC08-1E-002. Ministerio de Economía y Competitividad; UNLC13-13-350
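A common first step in such pipelines is "early integration": concatenating each sample's feature vectors from every omic layer into a single matrix before training a classifier. The sketch below shows that step under assumed data structures (one dict of sample-to-features per layer); it is a generic illustration, not the pipeline proposed in the paper.

```python
def integrate_omics(layers, samples):
    """Early integration: concatenate each sample's feature vectors from
    every omic layer (e.g. expression, methylation, copy number) into one
    row, producing a single matrix for a downstream ML classifier."""
    return [[v for layer in layers for v in layer[s]] for s in samples]
```

The resulting matrix has one row per sample and one column per feature across all layers, so any standard ML model (e.g. for metastasis prediction) can be trained on it directly. Alternatives such as late integration train one model per layer and combine their predictions instead.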

    Addressing Cooperation between Mobile Operators in Telecommunication Networks via Optimization: A Lexicographic Approach and Case Study in Colombia

    Cooperation between telecommunications (Telco) operators has been limited in previous years by both regulation and competition. However, cooperation could not only improve overall quality of service (QoS) but also benefit companies with under-exploited nodes in their network infrastructure. In this way, both companies with fully deployed infrastructure and smaller companies with growing service demand but limited infrastructure deployment could benefit from cooperation agreements. This article proposes a lexicographic mixed-integer linear optimization model for Telco cooperation composed of two phases: Phase 1 maximizes the number of services connected to the current infrastructure, assuming cooperation between operators, while Phase 2 minimizes the cost of connecting those services. We built a simple base scenario that allowed us to validate the intuition behind our model. Furthermore, to demonstrate the applicability of our lexicographic optimization model for cooperation between mobile operators, we present a real-world case study in a rural area of Colombia, from which we obtained the marginal costs of additional national roaming connections as well as the marginal profits under the cooperation scheme. Our results can help mobile operators benefit from cooperation and, since the model adapts to the local needs of each company, cooperation can be restricted to any desired level.
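The two-phase lexicographic scheme can be illustrated with a tiny brute-force stand-in: keep only the solutions that are optimal for the first objective, then optimize the second objective among them. In the paper this is done with two MILPs (fixing the Phase-1 optimum as a constraint before solving Phase 2); the sketch below enumerates candidate plans instead and is purely illustrative.

```python
def lexicographic_solve(plans):
    """Two-phase lexicographic selection over enumerated candidate plans,
    each a (services_connected, cost) tuple.
    Phase 1: keep only plans that maximize connected services.
    Phase 2: among those, return the cheapest."""
    best_coverage = max(s for s, _ in plans)                  # Phase 1
    candidates = [p for p in plans if p[0] == best_coverage]
    return min(candidates, key=lambda p: p[1])                # Phase 2
```

Note the strict priority: a plan connecting fewer services is never chosen, no matter how cheap, which is exactly what distinguishes the lexicographic approach from a weighted-sum objective.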