
    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at a very high resolution. However, the deployment of a large number of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues of spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-offs among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.
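    A minimal sketch of the rough-set idea the abstract invokes: indiscernibility classes over discretized sensor attributes yield lower and upper approximations of a target pattern, and their ratio quantifies classification uncertainty. The toy readings, attribute names, and labels below are hypothetical, not the dissertation's own data.

        # Rough-set approximations: a toy illustration of quantifying
        # classification uncertainty from discretized sensor readings.
        # Attribute names and values are hypothetical.
        from collections import defaultdict

        # Each record: (condition attributes, decision label)
        readings = [
            (("high_temp", "rising"), "fire"),
            (("high_temp", "rising"), "fire"),
            (("high_temp", "stable"), "fire"),
            (("high_temp", "stable"), "no_fire"),  # conflict -> boundary region
            (("low_temp",  "stable"), "no_fire"),
        ]

        # Indiscernibility classes: records sharing the same condition attributes.
        classes = defaultdict(list)
        for cond, label in readings:
            classes[cond].append(label)

        target = "fire"
        lower = [c for c, ls in classes.items() if all(l == target for l in ls)]
        upper = [c for c, ls in classes.items() if any(l == target for l in ls)]

        # Accuracy of approximation: records certainly in the target vs possibly in it.
        n_lower = sum(len(classes[c]) for c in lower)
        n_upper = sum(len(classes[c]) for c in upper)
        print("lower approximation:", lower)
        print("boundary region:", [c for c in upper if c not in lower])
        print("rough accuracy: %.2f" % (n_lower / n_upper))

    A rough accuracy well below 1 signals exactly the kind of uncertainty a multi-objective sampling or aggregation policy would need to weigh.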

    A survey on online active learning

    Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed over the last decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research. Our review aims to provide a comprehensive and up-to-date overview of the field and to highlight directions for future work.
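    As a concrete illustration of the stream-based setting the survey covers, here is a minimal uncertainty-sampling sketch: an incrementally trained classifier requests a label only when its confidence on an arriving instance falls below a threshold. The simulated stream, oracle, and threshold value are illustrative assumptions, not a method from the survey.

        # Stream-based active learning sketch: query the oracle only on
        # instances where the current model is uncertain.
        import numpy as np
        from sklearn.linear_model import SGDClassifier

        rng = np.random.default_rng(0)
        model = SGDClassifier(loss="log_loss")
        threshold, labels_used = 0.65, 0

        # Seed the model with a few labeled points so predict_proba works.
        X0 = rng.normal(size=(10, 2))
        y0 = (X0[:, 0] + X0[:, 1] > 0).astype(int)
        model.partial_fit(X0, y0, classes=[0, 1])

        for _ in range(1000):                    # simulated unlabeled stream
            x = rng.normal(size=(1, 2))
            confidence = model.predict_proba(x).max()
            if confidence < threshold:           # informative: ask for the label
                y = int(x[0, 0] + x[0, 1] > 0)   # simulated oracle answer
                model.partial_fit(x, [y])
                labels_used += 1

        print(f"labels requested: {labels_used} / 1000")

    The labeling budget consumed is typically a small fraction of the stream, which is the cost saving that motivates the strategies reviewed here.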

    Incremental algorithm for Decision Rule generation in data stream contexts

    Nowadays, data science is earning a lot of attention in many different sectors, and in industry in particular many applications might be considered. Using data science techniques in the decision-making process is one such application that can bring value to industry. Alongside this, the growth of data availability and the appearance of continuous data flows in the form of data streams raise new challenges when dealing with changing data. This work presents a novel algorithm, the Incremental Decision Rules Algorithm (IDRA), which incrementally generates and modifies decision rules for data stream contexts in order to incorporate the changes that may appear over time. This method proposes new rule structures that improve the decision-making process by providing a descriptive and transparent base of knowledge that can be integrated into a decision tool. This work describes the logic underneath IDRA, in all its versions, and proposes a variety of experiments to compare them with a classical method (CREA) and an adaptive method (VFDR). Real datasets, together with simulated scenarios with different error types and rates, are used to compare these algorithms. The study shows that IDRA, specifically the reactive version of IDRA (RIDRA), improves on the accuracy of VFDR and CREA in all the studied scenarios, both real and simulated, in exchange for more computation time.
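    To make the general idea of incremental rule maintenance concrete, here is a generic sketch, not the IDRA algorithm itself: a rule accumulates hit/miss statistics as labeled stream instances arrive and flags itself for regeneration when its accuracy degrades. The rule, thresholds, and stream below are all illustrative assumptions.

        # Generic sketch of incrementally tracking a decision rule's quality
        # on a stream (not IDRA's actual rule structure or update logic).
        class StreamRule:
            def __init__(self, condition, prediction, min_accuracy=0.7):
                self.condition = condition        # callable: instance -> bool
                self.prediction = prediction
                self.hits = self.misses = 0
                self.min_accuracy = min_accuracy

            def update(self, instance, label):
                """Update rule statistics with one labeled stream instance."""
                if not self.condition(instance):
                    return
                if self.prediction == label:
                    self.hits += 1
                else:
                    self.misses += 1

            def is_valid(self):
                total = self.hits + self.misses
                # Keep young rules; retire rules whose accuracy has drifted down.
                return total < 20 or self.hits / total >= self.min_accuracy

        # Usage: hypothetical rule "temp > 30 -> overheat" over a labeled stream.
        rule = StreamRule(lambda inst: inst["temp"] > 30, "overheat")
        stream = [({"temp": 35}, "overheat"), ({"temp": 32}, "normal"),
                  ({"temp": 40}, "overheat")]
        for inst, label in stream:
            rule.update(inst, label)
            if not rule.is_valid():
                print("rule degraded: regenerate from a recent window")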

    Flood Frequency Analysis of Partial Duration Series Using Soft Computing Techniques for Mahanadi River Basin in India

    In flood frequency analysis, modeling based on the Annual Maximum Flood (AMF) series remains the most popular approach. An alternative approach based on the partial duration series (PDS), or peaks over threshold (POT), has been considered in recent years; it captures more information about extreme events by fixing appropriate threshold values. The PDS approach has several advantages: (i) it includes more peak events by selecting an appropriate threshold, thereby capturing more information regarding the flood phenomenon; (ii) it analyzes both the time of arrival and the magnitude of peaks; and (iii) it provides extra flexibility in the representation of floods and a more complete description of the flood-generating process. However, the PDS approach remains underused and unpopular due to the absence of a general framework covering the different approaches. The first objective of the present research work is to develop such a framework for the selection of an appropriate threshold value using different concepts, and to verify the independence and stationarity criteria of the extreme events for the modeling of the PDS in the Mahanadi river system, India. For the analysis, daily discharge data from 22 stations with record lengths varying between 10 and 41 years have been used, with the assumption that the whole basin is homogeneous in nature. The results confirmed that the Generalized Pareto (GP) distribution best described the PDS in the study area, and also show that the best PDS/GP performance is found for almost all tested values of λ (2, 2.5, and 3). In the second phase, a regional flood frequency analysis is carried out in the Mahanadi basin and the developed model is applied to the respective homogeneous region. Regionalization is the best viable way of improving flood quantile estimation. In regional flood frequency analysis, the selection of basin characteristics, morphology, land use, and hydrology has a significant role in finding the homogeneous regions. In this work the Mahanadi basin is initially divided into homogeneous regions using fifteen effective variables. However, it has been observed that the whole basin is not hydro-meteorologically homogeneous. Therefore, factor analysis has been introduced to find a suitable number of variables, and nine variables are found suitable for the analysis. Hierarchical clustering (HC) and K-means clustering (KM) techniques are used to find the possible number of clusters. Here, again, the Generalized Pareto (GP) distribution best described the PDS in the study area. To test the homogeneity and to identify the best-fit frequency distribution, the regional L-moment algorithm is used. A unique regional flood frequency curve is developed which can estimate the flood quantiles in ungauged catchments, and an index flood is also specified with respect to the catchment characteristics by using the multiple linear regression approach.
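    A minimal sketch of the PDS/GP workflow described here: pick a threshold, fit a Generalized Pareto distribution to the exceedances, and compute a T-year return level. The synthetic discharge series, the 98th-percentile threshold choice, and the omission of declustering (the independence check the study performs) are all simplifying assumptions.

        # Peaks-over-threshold sketch: fit a Generalized Pareto (GP) distribution
        # to threshold exceedances and estimate a T-year flood quantile.
        import numpy as np
        from scipy.stats import genpareto

        rng = np.random.default_rng(42)
        years = 30
        daily_q = rng.gamma(shape=2.0, scale=150.0, size=years * 365)  # m^3/s

        u = np.quantile(daily_q, 0.98)        # threshold choice is the key step
        excess = daily_q[daily_q > u] - u     # no declustering here (assumption)
        lam = len(excess) / years             # mean exceedances per year (λ)

        # Fit GP to the excesses; location fixed at 0 by construction.
        xi, _, sigma = genpareto.fit(excess, floc=0)

        # T-year return level (valid for xi != 0):
        #   x_T = u + (σ/ξ) * ((λ·T)^ξ − 1)
        T = 100
        x_T = u + (sigma / xi) * ((lam * T) ** xi - 1)
        print(f"λ = {lam:.2f}/yr, ξ = {xi:.3f}, 100-yr quantile ≈ {x_T:.0f} m^3/s")

    Varying λ (equivalently, the threshold) and comparing the resulting fits is precisely the sensitivity check the study runs for λ = 2, 2.5, and 3.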

    Predicting recurring concepts on data streams by means of a meta-model and a fuzzy similarity function

    Meta-models can be used to enhance the drift detection mechanisms of data stream algorithms, by representing and predicting when a change will occur. There are real-world situations where a concept reappears, as in the case of intrusion detection systems (IDS), where the same incidents, or adaptations of them, usually reappear over time. In these environments, early prediction of drift through better knowledge of past models can help to anticipate the change, thus improving the efficiency of the model with respect to the training instances needed. In this paper we present MM-PRec, a meta-model for predicting recurring concepts on data streams whose main goal is to predict when drift is going to occur, together with the best model to be used in case of a recurring concept. To fulfill this goal, MM-PRec trains a Hidden Markov Model (HMM) on the instances that appear during the concept drift. The learning process of the base classification learner feeds the meta-model with all the information needed to predict recurrent or similar situations. Thus, the predicted models are stored together with their associated contextual information. In our approach we also propose a fuzzy similarity function to decide which model best represents a particular context when drift is detected. The experiments performed show that MM-PRec outperforms other context-aware algorithms in terms of training instances needed, especially in environments characterized by the presence of gradual drifts.
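    To illustrate the model-reuse step, here is a sketch of one plausible fuzzy similarity function over stored context vectors; the Gaussian membership function, the spread parameter, and the repository contents are illustrative assumptions, not MM-PRec's actual definitions.

        # Sketch of a fuzzy similarity over context vectors, in the spirit of
        # choosing a stored model for a recurring concept (not MM-PRec's own).
        import numpy as np

        def fuzzy_similarity(ctx_a, ctx_b, spread=1.0):
            """Per-feature Gaussian membership of the difference, averaged."""
            diff = np.asarray(ctx_a, float) - np.asarray(ctx_b, float)
            memberships = np.exp(-(diff ** 2) / (2 * spread ** 2))
            return float(memberships.mean())  # in (0, 1]; 1 = identical context

        # Stored (context, model_id) pairs from past concepts (hypothetical).
        repository = [((0.2, 0.8, 10.0), "model_A"),
                      ((0.9, 0.1, 3.0), "model_B")]
        current_ctx = (0.25, 0.75, 9.0)

        # On drift: reuse the stored model whose context best matches the present.
        best = max(repository, key=lambda cm: fuzzy_similarity(current_ctx, cm[0]))
        print("reuse:", best[1])

    Reusing a matched model instead of retraining from scratch is what reduces the number of training instances needed after a recurring drift.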

    Frequency Analysis of Droughts Using Stochastic and Soft Computing Techniques

    In the Canadian Prairies, recurring droughts are one of the realities that can have significant economic, environmental, and social impacts. For example, the droughts of 1997 and 2001 cost over $100 million across different sectors. Drought frequency analysis is a technique for analyzing how frequently a drought event of a given magnitude may be expected to occur. In this study the state of the science related to frequency analysis of droughts is reviewed and studied. The main contributions of this thesis include the development of a model in Matlab which uses the qualities of Fuzzy C-Means (FCM) clustering and corrects the formed regions to meet the criteria of effective hydrological regions. In FCM, each site has a degree of membership in each of the clusters. The algorithm developed is flexible enough to take the number of regions and the return period as inputs and to show the final corrected clusters as output for most scenarios. Since drought is considered a bivariate phenomenon, with the two statistical variables of duration and severity to be analyzed simultaneously, an important step in this study is extending the initial Matlab model to correct regions based on L-comoment statistics (as opposed to L-moments). Implementing a reasonably straightforward approach for bivariate drought frequency analysis using bivariate L-comoments and copulas is another contribution of this study. Quantile estimation at ungauged sites for return periods of interest is studied by introducing two classes of neural network and machine learning methods: Radial Basis Function (RBF) networks and Support Vector Machine Regression (SVM-R). These two techniques are selected based on their strong reviews in the literature on function estimation and nonparametric regression. The functionality of RBF and SVM-R is compared with the traditional nonlinear regression (NLR) method. As well, a nonlinear regression with regionalization method, in which catchments are first regionalized using FCM, is applied and its results are compared with those of the other three models. Drought data from 36 natural catchments in the Canadian Prairies are used in this study. This study provides a methodology for bivariate drought frequency analysis that can be applied in any part of the world.
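    A minimal sketch of the regression-based quantile estimation step the thesis evaluates: support vector regression with an RBF kernel mapping catchment attributes to a drought quantile, then predicting at an ungauged site. The attribute set, the synthetic target relation, and the SVR hyperparameters are illustrative assumptions, not the thesis's fitted model.

        # Quantile estimation at ungauged sites with SVM regression (RBF kernel).
        # All data here are synthetic stand-ins for catchment attributes.
        import numpy as np
        from sklearn.svm import SVR
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(1)
        n = 36                                   # gauged catchments, as in the study
        # Columns: drainage area, mean annual precipitation, slope (synthetic).
        X = rng.uniform([100, 300, 0.1], [5000, 600, 5.0], size=(n, 3))
        # Synthetic drought-severity quantile with a nonlinear dependence.
        y = 0.02 * X[:, 0] ** 0.8 + 0.5 * X[:, 1] - 20 * X[:, 2] + rng.normal(0, 5, n)

        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))
        model.fit(X, y)

        ungauged = np.array([[1200.0, 450.0, 2.0]])  # attributes of an ungauged site
        print("estimated quantile:", model.predict(ungauged))

    Comparing such a fit against nonlinear regression, with and without prior FCM regionalization, mirrors the four-way model comparison the study performs.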