32 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Application of dynamic programming for the analysis of complex water resources systems : a case study on the Mahaweli River basin development in Sri Lanka

    The technique of Stochastic Dynamic Programming (SDP) is ideally suited for operation policy analyses of water resources systems. However, SDP has a major drawback, appropriately termed its "curse of dimensionality". Aggregation/disaggregation techniques based on SDP and simulation are presented to analyze a complex water resources system. The system under consideration serves two major purposes: hydropower generation and irrigation. The identification of subsystems by their functional and physical characteristics was an important first step in the analysis. Subsequently, each subsystem is represented by a hypothetical composite reservoir to arrive at an operation policy for the interface point of the subsystems. A more detailed analysis, which considers the real configurations of the subsystems, is performed by following this operation policy of the interface point. Two approaches are presented: sequential optimization and iterative optimization. In these approaches, each subsystem is individually analyzed using two-reservoir SDP models. The applicability of an Implicit Stochastic Approach, in which the operation of the system is optimized for a number of deterministic hydrologic data series, is also investigated. To complement the aggregation technique of the Composite Reservoir, subsequent disaggregation techniques are proposed. Three different techniques are tested: (1) a statistical disaggregation, (2) an optimization/simulation-based technique, and (3) the disaggregation of the composite policy in the actual operation by incorporating a single-time-step optimization. The accuracy of the sequential and iterative optimization approaches is evaluated by applying them to a subsystem of three reservoirs in a cascade, for which the deterministic optimum pattern is also determined by an Incremental Dynamic Programming (IDP) model. 
In the case of the Implicit Stochastic Approach, the results are compared with the results of the explicit SDP approach and the deterministic optimum operation pattern, in addition to the historical operation pattern of the system. The results of the Composite Policy Disaggregation techniques are compared to the results obtained by real multireservoir optimizations carried out using explicit SDP models.
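
The "curse of dimensionality" named in this abstract can be made concrete with a small counting sketch. It assumes, purely for illustration, that every reservoir's storage is discretised into the same number of levels and its inflow into the same number of classes; the function name and the grid sizes are hypothetical and are not taken from the study.

```python
def sdp_state_count(n_reservoirs: int, storage_levels: int, inflow_classes: int) -> int:
    """Number of joint states an explicit SDP model enumerates per time step.

    Illustrative assumption: all reservoirs share one discretisation, so each
    contributes storage_levels * inflow_classes states and the joint state
    space is the product over reservoirs.
    """
    if n_reservoirs < 1:
        raise ValueError("n_reservoirs must be at least 1")
    return (storage_levels * inflow_classes) ** n_reservoirs


if __name__ == "__main__":
    # Hypothetical grid: 10 storage levels and 5 inflow classes per reservoir.
    for n in (1, 2, 3, 6):
        print(n, "reservoir(s):", sdp_state_count(n, 10, 5), "joint states")
```

Under these assumptions the state count grows exponentially with the number of reservoirs, which is the motivation for aggregating subsystems into a composite reservoir or restricting each analysis to two-reservoir models.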

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, but their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. As a statement of research-in-progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.

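    Estudio comparativo completo de varios métodos basados en datos para la gestión de los recursos hídricos en ambientes mediterráneos a través de diferentes escalas temporales (Comprehensive comparative study of several data-driven methods for water resources management in Mediterranean environments across different time scales)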

    Knowledge and technology for water and hydraulic systems have been advancing since the beginning of time, with the aim of managing these systems efficiently and correctly. In this project, as a starting hypothesis, we apply computational techniques and Artificial Intelligence concepts. Given that the primary asset of these studies is data, we prefer the term "Data-Driven" (DD), since the term Artificial Intelligence can confuse non-experts. This is an expanding field in all aspects of science and life, in which data generation grows along with computing and processing power. Examples include 5G technology and the Internet of Things, where the exponential growth in the volume of data used pushes us to set up frameworks for treating and analysing the information. Data-Driven techniques offer enormous potential to transform our ability to understand, monitor and predict the states of hydro-meteorological variables. Their application provides benefits; however, performing these exercises requires practice and specific knowledge. Therefore, a deeper understanding of the capabilities and limitations of these novel computational techniques within our field of knowledge is needed. Hence, it is essential to carry out "hydro-informatics" experiences under this assumption. For the development of these models, we identify which points are the most relevant and need to be taken into account under regional conditions or frameworks. Consequently, we work with the time series collected in the different monitoring networks, selecting the hydrological points of interest, in order to further develop hydrological frameworks that are useful for water management and optimisation. 
Here, we are interested in the practical applicability to hydro-meteorology under Mediterranean conditions, where data are sometimes scarce, and select two hydrographic basins in south-east Andalusia: the Guadalhorce river (Málaga) and the Guadalfeo river (Granada). Chapter 1 introduces the doctoral thesis and establishes its general and specific objectives and its motivation. We then describe the three fundamental exercises carried out in the research work: Regression, Classification and Optimisation. Finally, we briefly review previous works under Mediterranean climatic conditions and similar assumptions. Chapter 2 presents the study areas, analysing the spatial and temporal characteristics of two Andalusian Mediterranean basins in south-east Spain: Guadalhorce (GH) and Guadalfeo (GF). These are hydrographic basins with highly variable and heterogeneous spatio-temporal patterns. The first hydrological system, GH, contains an area of great socio-economic importance, the city of Málaga. The second, GF, has the Sierra Nevada National Park to the north, crowned by the Mulhacén peak, and flows within a few kilometres to the coast at Motril. This water system therefore shows large gradients in the geophysical agents. Both systems have regulation structures of great interest for the development and study of their optimisation. We also review the monitoring networks available in these basins, and which environmental agents and/or processes should be taken into account to meet the objectives of this work. We carry out a bibliographic review of the most relevant historical floods, listing the factors associated with these extreme events. In the data analysis stage of this chapter, we focus on the spatio-temporal evolution of flood risk at the mouths of the Guadalhorce and Guadalfeo rivers on the Alborán Sea. 
We quantify how this risk has increased in recent years, noting that poor planning practices have raised it through the intrusion of high-cost land uses into flood-prone areas. This chapter also analyses the data collected by the monitoring networks to understand how floods in the river GH relate to upstream reservoir releases. We found that this basin has limitations in regulation and cannot mitigate costs downstream. Part of these results appeared in the work presented in Egüen et al. (2015). These analyses allow us to identify which parts of flood management in this hydrological system need more precise optimisation. Finally, we summarise another important hydrological risk, droughts, and how these water deficits can be represented by standardised indices for both rainfall and flow rates. Chapter 3 discusses the various approaches and methodologies for hydro-meteorological time series modelling. The contrasting concepts are set against each other to highlight the design choices that we need to make: black box vs. grey box vs. white box, parametric vs. non-parametric, static vs. dynamic, linear vs. non-linear, frequentist vs. Bayesian, single vs. multiple, among others, detailing the advantages and disadvantages of each approach. We presented some ideas that emerged in this part of the research in Herrero et al. (2014). The data partition, management and transformation steps required to apply these experimental methods correctly are also discussed. This is of great importance, since part of the hard work in applying these methods lies in transforming the data so that the algorithms and transfer functions work correctly. 
Finally, we focus on how to test and validate deterministic and probabilistic behaviour through evaluation coefficients, avoiding coefficients that mask the results and focusing instead on the behaviours of interest, in our case precision and predictability. We have also taken parsimony into account in models based on neural networks, since they can easily fall into over-parameterisation. Chapter 4 presents the experimental work, in which seven short-term rainfall-runoff regressions are performed, six daily and one hourly. The case studies correspond to various points of interest within the study areas with important implications for hydrological management. On an hourly scale, we analyse the efficiency and predictive capacity of Multiple Linear Regression (MLR) and Bayesian Neural Networks (BNN) at ten time horizons for the level of the Guadalhorce River at Cártama. We found that, for closer predictive horizons, a simpler linear approach (MLR) can outperform one with a priori higher capabilities, such as the non-linear BNN. This finding could greatly simplify the development and application of such techniques. At a daily scale, we establish a comparative framework between the two previous models and a fully Bayesian method, Gaussian Processes (GP). This DD computational technique allows us to apply different transfer functions under a single model. This is an advantage over the other two DD models, since the results show that they work well in one domain but not in the other. During the construction of the models, we select the input variables progressively, through a trial-and-error method in which significant improvements with respect to the last predictor structure are taken into account while preserving the principle of parsimony. We have used different types of data: real data collected in the monitoring networks, and data generated in parallel from physically based hydrological modelling (WiMMed). 
The results are robust; the major limitation is the high computational cost of the recurrent and iterative method used. Some results of this chapter were presented in Gulliver et al. (2014). In chapter 5, three medium-term prediction experiments are performed. The first modelling experiment works on a quarterly scale, where a hydrological time scheme determines the cumulative flow for specific time horizons. We start the scheme on the dates relevant to hydrological planning. The forecasts prove more successful once the first six months of the hydrological year have elapsed, rather than after the three months at which we carry out the evaluations. The observed input variables quantified in the water system are cumulative stream flow, cumulative rainfall, cumulative snowfall and atmospheric oscillations (AO). In terms of DD modelling, this experience has shown the importance of combining mixed regression-classification models instead of using only regression models within static frameworks. In this manner, we narrow the space of possible solutions and therefore improve the predictive behaviour of the DD model. During this exercise, we also compared three DD classifiers: Probabilistic Neural Network (PNN), K-Nearest Neighbour (KNN) and Support Vector Machine (SVM). The SVM performs better than the others on our data. However, more research is still needed on classifiers in hydro-meteorological frameworks like ours, because of their variability. We presented this part of the doctoral thesis in Gulliver et al. (2016). In the second section of this chapter (Sec. 5.3), we carry out a rainfall forecasting exercise on a monthly scale. To do so, we use BNN following the same construction method as the SVI model presented in the previous chapter (Chapter 4), thus validating it at another time scale. 
However, the predictive results are poor for this hydro-meteorological variable. This confirms the difficulty of predicting it from historical data alone, without incorporating dynamic tools, and hence the need for complex hydrodynamic modelling to predict this important variable. On the other hand, this case serves to infer empirically the causal role of the most relevant atmospheric oscillations at the study points. Multiple simulations with the model-based approach made it possible to establish which indices have the greater influence. In the last section of this chapter (Section 5.4), an exercise was carried out to predict the deviation or anomaly of rainfall and runoff indices for four time series representative of different locations within the Guadalfeo BR. In this case, we verified the suitability of seven statistical distributions for characterising the anomalies/deviations under Mediterranean conditions. Under this hypothesis, the indices that passed the Shapiro-Wilk normality test were modelled to analyse the capability of BNN to predict these indices at various time horizons. Here, predictions of negative phases (droughts or deficit periods) were poor, while the models behaved better for positive phases (wet periods). Regarding the causal inference of IC and its possible influence on the study area, we found that NAO and WEMO help forecasts at shorter time horizons, while MOI helps at longer cumulative time horizons. We analysed the relevance of these atmospheric variables in each case, since their introduction was sometimes useful and sometimes not, following the construction rules and detailing them in each case study. 
Throughout the work, the usefulness of mixed modelling approaches has been verified, combining models based on observed data from the different monitoring networks with physical modelling to reproduce essential hydrological processes. With the proposed methodology, a positive influence of atmospheric oscillations has been observed for medium-term prediction within the study regions, with no evidence of such influence for short-term (daily-scale) predictions. The final conclusions and the most important points for future work are presented in chapter 6. Applications of this type of method are currently necessary. They help us to establish relationships based on measured hydro-meteorological data, and thus "based on real data", without introducing prior assumptions. These data-based experiences are very useful for limiting future uncertainty and optimising water resources. Establishing temporal relationships between different environmental agents allows us, through supervised methods, to establish causal relationships. From there, a physical inference exercise is necessary to add coherence and make the scientific exercise robust. The results obtained in this work reaffirm the practicality of implementing these Data-Driven frameworks in both the public and private spheres, and are a good starting point for technology transfer. Most of the routines and models provided in this thesis could be applied directly in hydro-meteorological services or decision support systems for water officials. Potential users are as varied as public administrations and basin organisations, reservoir managers, energy companies that manage hydroelectric generation, irrigation communities and water bottling plants. 
Iterative and automatic frameworks for data processing and modelling need to be implemented to make the most of the data collected in the water systems.
Since the beginning of time, knowledge and technology for water and hydraulic systems have been advancing in order to manage them efficiently and correctly. In this project, as a starting hypothesis, various computational techniques and Artificial Intelligence concepts are applied. Given that the main asset of these applications is data, we opt for the term "Data-Driven" (DD), since the term Artificial Intelligence can confuse non-experts. This is an expanding field in all aspects of science and life, where data generation increases along with computing and processing capabilities. Examples are 5G technology and the Internet of Things, where the exponential increase in the volume of data used forces us to develop frameworks for their treatment and analysis. DD methods have enormous potential to transform our ability to carry out supervised monitoring and to predict the states of hydro-meteorological variables. Their application clearly provides benefits; however, carrying out these exercises requires practice and specific knowledge. A deeper understanding of the capabilities and limitations of these computational techniques is therefore needed within our field of knowledge and specific cases. For these reasons, it is essential to carry out "hydro-informatics" experiences under this assumption, thereby identifying which points are the most relevant and must be taken into account in the development and validation of these models under more regional conditions or frameworks. 
To this end, we work with the time series collected in the different monitoring networks, with series resulting from hydro-meteorological modelling, and with series of the most relevant atmospheric oscillations in the study area. The main objective of this work is the development and validation of data-based methodological frameworks. Points of interest are selected in order to develop hydro-meteorological frameworks that are useful for the management and optimisation of water resources. Under this assumption, we are interested in the practical applicability of these machine learning tools under Mediterranean and local conditions, where data are sometimes scarce or of low quality. The first chapter (Chapter 1) introduces the doctoral thesis, establishing both the general and the specific objectives and the motivation of the thesis. It then gives an introductory description of the three fundamental exercises carried out in the research work: Regression, Classification and Optimisation. Finally, it reviews the state of the art of previous works under Mediterranean and similar climatic conditions. Chapter 2 presents the study areas, analysing the spatio-temporal characteristics of two Andalusian Mediterranean basins located in south-east Spain: the Guadalhorce river (GH) and the Guadalfeo river (GF). These are hydrographic basins with highly variable/heterogeneous spatio-temporal patterns. The first hydrological system, GH, contains an area of great socio-economic importance, the city of Málaga. The second, GF, has the Sierra Nevada National Park to its north, crowned by the Mulhacén peak, and reaches the sea a few kilometres away on the coast of Motril. This makes it a system with large geomorphological and hydro-meteorological gradients. 
In both basins there are regulation structures of great interest for the development and study of their optimisation. The monitoring networks available in these basins are also reviewed, along with the agents that must be taken into account to achieve the objectives of this work. In the data analysis stage of this chapter, we focus on the spatio-temporal evolution of flood risk at the mouths of both hydrological systems on the Alborán Sea. The increase in flood risk caused by the intrusion of high-cost land uses into flood-prone areas in recent years is quantified, confirming poor territorial planning practice within the study area. This chapter also analyses the recorded data in order to understand the occurrence of floods in the river GH and their relationship with upstream reservoir releases. This analysis showed that, for some extreme rainfall events (> 100 mm/24 h), this basin has limitations in regulation and thus cannot mitigate costs downstream. Part of the results obtained formed part of the work presented in Egüen et al. (2015). These analyses allow us to identify the need for more precise temporal optimisation of flood management in this hydrological system. Finally, we analyse another important hydrological risk, droughts, and how this water deficit can be represented by standardised indices for both rainfall and runoff. Chapter 3 analyses the various approaches and methodologies for modelling hydro-meteorological time series. The approaches are set against each other across the different modelling options available: black box vs. grey box vs. white box, parametric vs. non-parametric, static vs. dynamic, linear vs. non-linear, frequentist vs. 
Bayesian, single vs. multiple, among others, listing the advantages and disadvantages of each approach. Some ideas that emerged in this part of the research were presented in Herrero et al. (2014). The data partition, management and transformation steps required for the correct application of this type of experimental method are also discussed. This is of great importance, since part of the hard work in applying this type of methodology lies in transforming the data so that the algorithms and transfer functions work correctly. In the final part of this chapter, we focus on how to evaluate and validate deterministic and probabilistic behaviour by means of evaluation coefficients. Here, we take special care to avoid coefficients that mask the results or are too general, and therefore focus on those that assess the predictive and precision capabilities of the models. Parsimony has also been taken into account for models based on neural networks, since they can easily fall into over-parameterisation. Chapter 4 presents purely experimental work, in which seven short-term rainfall-runoff regressions are performed, six daily and one hourly. The case studies correspond to various points of interest within the study areas, with important implications for hydrological management. At the hourly scale, the efficiency and predictive capabilities of Multiple Linear Regression (MLR) and Bayesian Neural Networks (BNN) are analysed at ten time horizons for the level of the Guadalhorce river at the Cártama bridge. It was found that, for closer predictive horizons, a simpler approach such as the linear one (MLR) can outperform one with a priori greater predictive capabilities, such as a non-linear one (BNN). 
This simplifies the development and implementation of this type of computational technique under this type of hydrological framework. At the daily scale, a comparative framework is established between the two previous models, MLR and BNN, and a fully Bayesian method: Gaussian Processes (GP). This computational technique allows us to apply transfer functions of different natures under a single model. This is an advantage over the other two computational models, since the results indicate that they sometimes work well in one domain but not in the opposite one. During the construction of the models, the input variables are selected progressively, through a trial-and-error method in which significant improvements with respect to the last predictor structure are taken into account while preserving the principle of parsimony. Data of different natures have been used: real data collected in the monitoring networks and data generated in parallel from physically based hydrological modelling (WiMMed). The results are robust; the main limitation is the high computational cost of the recurrent and iterative method. Results from this chapter were presented in Gulliver et al. (2014). In chapter 5, three

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional Magnetic Resonance Imaging studies are critical. Anatomical and functional structure varies across subjects, so image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, such as the matrix Fisher-von Mises distribution, on the orthogonal transformation parameter, anatomical information is embedded in the estimation of the parameters, i.e., by penalizing the combination of spatially distant voxels. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods.
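
As a rough illustration of how a prior on an orthogonal transformation can enter an alignment estimate, the sketch below solves the classical orthogonal Procrustes problem with an optional prior term. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the unit noise variance, and the prior form exp(k tr(FᵀR)) with location matrix `F` and concentration `k` are all illustrative choices.

```python
import numpy as np


def procrustes_rotation(X, M, F=None, k=0.0):
    """Orthogonal R maximising tr(R^T (X^T M + k F)).

    With k = 0 this is the classical orthogonal Procrustes solution that
    minimises ||X R - M||_F. With k > 0, the added term k F corresponds to a
    prior density proportional to exp(k tr(F^T R)) (unit noise variance
    assumed here), which pulls R towards F.
    """
    C = X.T @ M
    if F is not None:
        C = C + k * F
    # The maximiser over orthogonal matrices is U V^T from the SVD of C.
    U, _, Vt = np.linalg.svd(C)
    return U @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5))                   # time points x voxels
    Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a "true" rotation
    R = procrustes_rotation(X, X @ Q)
    print("recovered rotation:", np.allclose(R, Q))
```

In a setting like the one described in the abstract, `F` would encode anatomical information, for example larger entries for spatially close voxel pairs, so that combinations of distant voxels are penalised; how `F` is actually built is not specified here.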

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
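
The point about interpreting ρ can be illustrated on the CAR side with a small numerical sketch. It assumes the common "proper CAR" parametrisation with precision matrix τ(D − ρW), which the abstract does not spell out, and a toy path graph of four sites; the function names are hypothetical, and the DAGAR construction itself is not shown.

```python
import numpy as np


def car_precision(W, rho, tau=1.0):
    """Precision matrix tau * (D - rho * W) of a proper CAR model.

    W is a symmetric 0/1 adjacency matrix and D the diagonal matrix of
    neighbour counts. Assuming every site has at least one neighbour,
    |rho| < 1 makes the matrix strictly diagonally dominant, hence
    positive definite.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)


def neighbour_correlations(W, rho):
    """Marginal correlations between neighbouring sites implied by the CAR model."""
    W = np.asarray(W, dtype=float)
    cov = np.linalg.inv(car_precision(W, rho))
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    i, j = np.nonzero(np.triu(W))  # each neighbouring pair once
    return corr[i, j]


if __name__ == "__main__":
    # Toy example: four sites on a line, each adjacent to the next.
    W = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
    print(neighbour_correlations(W, rho=0.9))
```

In this toy example the implied neighbour correlations differ from the chosen ρ and vary from pair to pair, which is the interpretability issue that motivates the comparison with DAGAR.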

    “How can Entrepreneurs lead Themselves?” Empirically-Based Development and Testing of Interventions for Healthy and Effective Self-Regulation in the Context of Entrepreneurship to Trigger Positive Individual and Collective Effects

    Various studies identify self-regulation as being particularly challenging for entrepreneurs, who often have to lead themselves independently. If they use dysfunctional self-regulatory processes, they are exposed to, and largely unprotected against, the high working demands of new venture creation. This has negative consequences not only on the individual level but also on the collective level, as entrepreneurs are recognized as engines for economic growth and ecologically sustainable development. Despite their need for guidance on healthy and effective self-regulation, relevant research is sparse and fragmented. This dissertation is intended to address the need for guidance on healthy and effective self-regulation for entrepreneurs. In the first two studies, a causal model of healthy and effective self-regulation that can be applied in the context of entrepreneurship has been empirically developed and tested. The work is based on a meta-theory of human motivation, called self-determination theory (SDT), which focuses on self-regulation. Structural equation modeling has been applied based on cross-sectional quantitative data (N=1,024). The results indicate that mindfulness, clarity about personal values, intrinsic values orientation, and autonomy of goals are potential psychological constructs to foster when healthy and effective self-regulation of individuals is intended. In the second study, a causal model as a knowledge base has been applied to empirically develop and test two interventions that foster the four psychological constructs in aspiring and practicing entrepreneurs. Both interventions are conducted as non-controlled field experiments with post-measurement in the form of two iterations (N1 = 55; N2 = 13) of the design science research approach. The first intervention is a self-assessment and action plan, called the Values Finder. The second intervention is a four-hour workshop block on personality development called Core Values Workshop. 
It is empirically validated that both interventions can be described as functional, efficient, and usable in the scope of the ISO evaluation standard 9126. Thus, they can be used as cutting-edge interventions to leverage entrepreneurs' self-regulation, triggering positive individual and collective effects.