1,457 research outputs found

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

    Unsupervised tracking of time-evolving data streams and an application to short-term urban traffic flow forecasting

    I am indebted to many people for the help and support I received during my Ph.D. study and research at DIBRIS, University of Genoa. First and foremost, I would like to express my sincere thanks to my supervisors, Prof. Dr. Masulli and Prof. Dr. Rovetta, for their invaluable guidance, frequent meetings and discussions, and their encouragement and support throughout my research. I thank all the members of DIBRIS for their support and kindness during my four years of Ph.D. study. I would also like to acknowledge the contribution of the projects Piattaforma per la mobilità Urbana con Gestione delle INformazioni da sorgenti eterogenee (PLUG-IN) and COST Action IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet). Last and most importantly, I wish to thank my family: my wife Shaimaa, who stays with me through the joys and pains; my daughter and son, who give me happiness every day; and my parents for their constant love and encouragement.

    Data-driven Soft Sensors in the Process Industry

    In the last two decades, Soft Sensors have established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks related to process control. This paper discusses characteristics of process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though not yet fully realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.

    Tracking time evolving data streams for short-term traffic forecasting

    Data streams have arisen as a relevant topic during the last few years as an efficient method for extracting knowledge from big data. In the robust layered ensemble model (RLEM) proposed in this paper for short-term traffic flow forecasting, incoming traffic flow data of all connected road links are organized in chunks corresponding to an optimal time lag. The RLEM model is composed of two layers. In the first layer, we cluster the chunks by using the Graded Possibilistic c-Means method. The second layer is made up of an ensemble of forecasters, each of them trained for short-term traffic flow forecasting on the chunks belonging to a specific cluster. In the operational phase, when a new chunk of traffic flow data is presented as input to the RLEM, its memberships to all clusters are evaluated, and if it is not recognized as an outlier, the outputs of all forecasters are combined in an ensemble, obtaining in this way a forecast of traffic flow for a short-term time horizon. The proposed RLEM model is evaluated on a synthetic data set, on a traffic flow data simulator and on two real-world traffic flow data sets. The model gives an accurate forecast of the traffic flow rates with outlier detection and shows good adaptation to non-stationary traffic regimes. Given its characteristics of outlier detection, accuracy, and robustness, RLEM can be fruitfully integrated in traffic flow management systems.
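
    The two-layer scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain k-means centroids with an exponential membership function stand in for the Graded Possibilistic c-Means method, the per-cluster forecasters are simple least-squares linear models, and every function name, the `beta` scale and the `outlier_threshold` value are assumptions made for illustration only.

```python
import numpy as np


def nearest(X, centroids):
    """Index of the closest centroid for every chunk (row) in X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fit_centroids(X, k, iters=20, seed=0):
    """Layer 1 stand-in: plain k-means centroids (NOT Graded Possibilistic c-Means)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = nearest(X, centroids)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids


def memberships(x, centroids, beta=1.0):
    """Possibilistic-style memberships in [0, 1]; they need not sum to one."""
    d2 = ((centroids - x) ** 2).sum(axis=1)
    return np.exp(-d2 / beta)


def fit_forecasters(X, y, centroids):
    """Layer 2: one least-squares linear forecaster (with bias) per cluster."""
    labels = nearest(X, centroids)
    design = np.hstack([X, np.ones((len(X), 1))])
    models = []
    for j in range(len(centroids)):
        mask = labels == j
        if not mask.any():  # empty cluster: fall back to all chunks
            mask = np.ones(len(X), dtype=bool)
        w, *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
        models.append(w)
    return models


def forecast(x, centroids, models, beta=1.0, outlier_threshold=0.1):
    """Membership-weighted ensemble forecast, or None if x is flagged as an outlier."""
    u = memberships(x, centroids, beta)
    if u.max() < outlier_threshold:  # low membership in every cluster
        return None
    preds = np.array([np.append(x, 1.0) @ w for w in models])
    return float(u @ preds / u.sum())
```

    Here each row of `X` is one chunk of lagged flow values and `y` holds the value to be forecast. Because the memberships are not forced to sum to one, a chunk far from every centroid gets a low membership everywhere, which is what makes the outlier test possible in this sketch.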

    Variational inference for robust sequential learning of multilayered perceptron neural network

    We derive a new sequential learning algorithm for a Multilayered Perceptron (MLP) neural network that is robust to outliers. The presence of outliers in data results in failure of the model, especially if data processing is performed on-line or in real time. The Extended Kalman filter robust to outliers (EKF-OR) is a probabilistic generative model in which the measurement noise covariance is modeled as a stochastic process over the set of symmetric positive-definite matrices, with the prior given as an inverse Wishart distribution. The derivation of all expressions comes straight from first principles, within the Bayesian framework. The analytical intractability of the Bayes update step is solved using Variational Inference (VI), which seeks the solution within a family of distributions of a suitable functional form. Experimental results obtained using real-world time series show that an MLP network trained with the proposed algorithm achieves a lower test-set error than the conventional EKF learning algorithm, with an average improvement of 7%.
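
    One way to write down the kind of model this abstract describes is sketched below. The notation is an assumption, not taken from the paper: $\theta_k$ denotes the MLP weights, $h$ the network output for input $x_k$, and the random-walk prior on the weights is the usual convention for EKF training rather than something the abstract states. The last line is the mean-field factorisation commonly used when a variational method replaces an intractable Bayes update.

```latex
\begin{aligned}
\theta_k &= \theta_{k-1} + w_k, & w_k &\sim \mathcal{N}(0, Q_k) \\
y_k &= h(\theta_k, x_k) + v_k, & v_k &\sim \mathcal{N}(0, R_k) \\
R_k &\sim \mathcal{IW}(\nu_k, \Psi_k) \\
p(\theta_k, R_k \mid y_{1:k}) &\approx q(\theta_k)\, q(R_k)
\end{aligned}
```

    Under this reading, an outlying measurement is absorbed by a temporarily inflated $R_k$ instead of being forced into the weight update.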

    Big data-driven fuzzy cognitive map for prioritising IT service procurement in the public sector

    The prevalence of big data is starting to spread across the public and private sectors; however, an impediment to its widespread adoption is the lack of appropriate big data analytics (BDA) and of the skills needed to exploit the full potential of big data availability. In this paper, we propose a novel BDA to help fill this void, using a fuzzy cognitive map (FCM) approach that enhances decision-making and thus the prioritising of IT service procurement in the public sector. This is achieved through the development of decision models that capture the strengths of both data analytics and the established intuitive qualitative approach. By taking advantage of both data analytics and FCM, the proposed approach captures the strengths of data-driven decision-making and intuitive model-driven decision modelling. The approach is then validated through a decision-making case regarding IT service procurement in the public sector, which is the fundamental step of IT infrastructure supply for the public in a regional government in the Russian Federation. The analysis result for the given decision-making problem is then evaluated by decision makers and e-government experts to confirm the applicability of the proposed BDA. This demonstrates the value of the approach in contributing towards robust public decision-making regarding IT service procurement. Funding: EU FP7 project Policy Compass (Project No. 612133).
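
    For readers unfamiliar with fuzzy cognitive maps, the sketch below shows the standard inference loop: concept activations are repeatedly pushed through a signed weight matrix and a sigmoid until they settle. It illustrates the general FCM technique only, not the paper's data-driven model; the function names, the `lam` steepness parameter and the `self_memory` switch are assumptions chosen for this example.

```python
import numpy as np


def sigmoid(x, lam=1.0):
    """Squash activations into (0, 1); lam controls the steepness."""
    return 1.0 / (1.0 + np.exp(-lam * x))


def run_fcm(weights, state, steps=50, tol=1e-6, self_memory=True, lam=1.0):
    """Iterate a fuzzy cognitive map until the state stops changing.

    weights[j, i] is the causal influence of concept j on concept i,
    usually taken from [-1, 1]. With self_memory=True each concept also
    keeps its own previous activation in the update.
    """
    weights = np.asarray(weights, dtype=float)
    state = np.asarray(state, dtype=float)
    for _ in range(steps):
        influence = weights.T @ state
        if self_memory:
            influence = influence + state
        new_state = sigmoid(influence, lam)
        if np.max(np.abs(new_state - state)) < tol:
            return new_state
        state = new_state
    return state


def rank_concepts(final_state):
    """Concept indices ordered from highest to lowest final activation."""
    return np.argsort(-np.asarray(final_state))
```

    In a procurement setting, the concepts would be candidate services and the factors that drive them, and the ranking of final activations would suggest a priority order. The weights in such a map would come from experts or data, which this sketch does not attempt to model.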