Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks
Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at very high resolution. However, the deployment of large numbers of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints introduce uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues for spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-offs among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.
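The rough-set part of this approach can be illustrated with the standard lower/upper approximation computation, where the boundary region between the two approximations measures prediction uncertainty. This is a minimal sketch of the general technique, not the dissertation's formalism; the sensor partition and the anomaly set below are hypothetical.

```python
# Sketch of rough-set lower/upper approximations, as used to quantify
# uncertainty in predicting a target concept from indiscernible readings.
# The equivalence classes and anomaly set here are illustrative only.

def rough_approximations(universe, equiv_classes, target):
    """Return (lower, upper) approximations of `target` under the
    partition `equiv_classes` of `universe`."""
    lower, upper = set(), set()
    for cls in equiv_classes:
        if cls <= target:          # class wholly inside the concept
            lower |= cls
        if cls & target:           # class overlaps the concept
            upper |= cls
    return lower, upper

# Hypothetical example: sensor nodes partitioned by discretized readings;
# target = the nodes that reported an anomaly.
universe = {1, 2, 3, 4, 5, 6}
equiv = [{1, 2}, {3, 4}, {5, 6}]   # indiscernibility classes
anomalous = {1, 2, 3}

lower, upper = rough_approximations(universe, equiv, anomalous)
# lower = certainly anomalous; upper = possibly anomalous.
# The boundary region (upper - lower) quantifies the uncertainty.
```

The size of the boundary region relative to the upper approximation is one natural scalar uncertainty measure that could feed a multi-objective optimization over aggregation and sampling decisions.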
A survey on online active learning
Online active learning is a paradigm in machine learning that aims to select
the most informative data points to label from a data stream. The problem of
minimizing the cost associated with collecting labeled observations has gained
a lot of attention in recent years, particularly in real-world applications
where data is only available in an unlabeled form. Annotating each observation
can be time-consuming and costly, making it difficult to obtain large amounts
of labeled data. To overcome this issue, many active learning strategies have
been proposed in the last decades, aiming to select the most informative
observations for labeling in order to improve the performance of machine
learning models. These approaches can be broadly divided into two categories:
static pool-based and stream-based active learning. Pool-based active learning
involves selecting a subset of observations from a closed pool of unlabeled
data, and it has been the focus of many surveys and literature reviews.
However, the growing availability of data streams has led to an increase in the
number of approaches that focus on online active learning, which involves
continuously selecting and labeling observations as they arrive in a stream.
This work aims to provide an overview of the most recently proposed approaches
for selecting the most informative observations from data streams in the
context of online active learning. We review the various techniques that have
been proposed and discuss their strengths and limitations, as well as the
challenges and opportunities that exist in this area of research. Our review
aims to provide a comprehensive and up-to-date overview of the field and to
highlight directions for future work.
Incremental algorithm for Decision Rule generation in data stream contexts
Nowadays, data science is gaining a lot of attention in many different sectors.
In industry specifically, many applications can be considered. Using data
science techniques in the decision-making process is one such application
that can bring value to industry. Along with this, the growing availability
of data and the appearance of continuous flows in the form of data streams
raise new challenges when dealing with changing data. This work presents a
novel algorithm, Incremental Decision Rules Algorithm (IDRA), that
incrementally generates and modifies decision rules for data stream
contexts, incorporating the changes that may appear over time. The method
proposes new rule structures that improve the decision-making process by
providing a descriptive and transparent knowledge base that can be
integrated into a decision tool. This work describes the logic underlying
IDRA, in all its versions, and proposes a variety of experiments to compare
them with a classical method (CREA) and an adaptive method (VFDR). Real
datasets, together with simulated scenarios with different error types and
rates, are used to compare these algorithms. The study shows that IDRA,
specifically the reactive version of IDRA (RIDRA), improves on the accuracy
of VFDR and CREA in all the studied scenarios, both real and simulated, in
exchange for more training time.
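The incremental rule-maintenance idea can be sketched generically: each rule keeps running counts, is reinforced when it classifies an arriving instance correctly, and is pruned when its accuracy decays, so the rule base tracks changes over time. This is a hypothetical sketch in the spirit of IDRA, not its actual rule structure; the attribute names, pruning threshold, and rule-growth heuristic are all illustrative.

```python
# Hypothetical sketch of incrementally maintained decision rules for a
# data stream. Rule representation, the 0.6 accuracy threshold, and the
# "grow a maximally specific rule" heuristic are illustrative only.

class Rule:
    def __init__(self, conditions, label):
        self.conditions = conditions      # dict: attribute -> required value
        self.label = label
        self.hits = 0
        self.misses = 0

    def matches(self, x):
        return all(x.get(a) == v for a, v in self.conditions.items())

    def accuracy(self):
        n = self.hits + self.misses
        return self.hits / n if n else 1.0

def update(rules, x, y, min_acc=0.6):
    """Update rule statistics with instance (x, y); prune weak rules."""
    covered = False
    for r in rules:
        if r.matches(x):
            covered = True
            if r.label == y:
                r.hits += 1
            else:
                r.misses += 1
    if not covered:                       # grow a new, maximally specific rule
        rules.append(Rule(dict(x), y))
    rules[:] = [r for r in rules if r.accuracy() >= min_acc]

rules = []
stream = [({"temp": "high"}, "alarm"), ({"temp": "low"}, "ok"),
          ({"temp": "high"}, "alarm"), ({"temp": "high"}, "ok")]
for x, y in stream:
    update(rules, x, y)
# After the concept shifts (the last instance contradicts the "high ->
# alarm" rule), the decayed rule is pruned from the knowledge base.
```

The resulting rule set stays small, descriptive, and transparent, which is the property the abstract highlights for integration into a decision tool.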
Flood Frequency Analysis of Partial Duration Series Using Soft Computing Techniques for Mahanadi River Basin in India
In flood frequency analysis, modeling based on the Annual Maximum Flood (AMF) series remains the most popular approach. An alternative approach based on the partial duration series (PDS), or peaks over threshold (POT), has been considered in recent years; it captures more information about extreme events by fixing appropriate threshold values. The PDS approach has several advantages: (i) it includes more peak events, selected by an appropriate threshold, and hence captures more information about the flood phenomenon; (ii) it analyses both the time of arrival and the magnitude of peaks; (iii) it provides extra flexibility in the representation of floods and a more complete description of the flood-generating process. However, the PDS approach remains underused, largely due to the lack of a general framework covering the different approaches. The first objective of the present research is therefore to develop a framework for selecting an appropriate threshold value using different concepts, and to verify the independence and stationarity criteria of the extreme events for modeling the PDS in the Mahanadi river system, India. For the analysis, daily discharge data from 22 stations with record lengths varying between 10 and 41 years have been used, under the assumption that the whole basin is homogeneous. The results confirm that the Generalized Pareto (GP) distribution best describes the PDS in the study area, and show that the best PDS/GP performance is found for almost all tested values of λ (2, 2.5 and 3). In the second phase, a regional flood frequency analysis is carried out in the Mahanadi basin and the developed model is applied to the respective homogeneous regions. Regionalization is the most viable way of improving flood quantile estimation. In regional flood frequency analysis, the selection of basin characteristics, morphology, land use and hydrology plays a significant role in identifying homogeneous regions.
In this work, the Mahanadi basin is initially divided into homogeneous regions using fifteen effective variables. However, it has been observed that the whole basin is not hydro-meteorologically homogeneous. Therefore, factor analysis has been introduced to find a suitable number of variables, and nine variables are found suitable for the analysis. Hierarchical Clustering (HC) and K-Means Clustering (KM) techniques are used to determine the possible number of clusters. Here again, the Generalized Pareto (GP) distribution best describes the PDS in the study area. To test homogeneity and to identify the best-fit frequency distribution, the regional L-moment algorithm is used. A regional flood frequency curve is developed which can estimate flood quantiles in ungauged catchments, and an index flood is also specified with respect to the catchment characteristics using the multiple linear regression approach.
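The core PDS/GP step described above can be sketched in a few lines: extract exceedances over a threshold and fit a Generalized Pareto distribution to them. The synthetic discharge series and the 95th-percentile threshold below are illustrative; they do not reproduce the Mahanadi data or the λ values used in the study.

```python
# Minimal peaks-over-threshold sketch: select exceedances above a
# threshold and fit a Generalized Pareto (GP) distribution. Assumes
# scipy; the synthetic gamma-distributed "discharge" is hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
daily_flow = rng.gamma(shape=2.0, scale=500.0, size=10 * 365)  # m^3/s

threshold = np.quantile(daily_flow, 0.95)       # illustrative threshold
exceedances = daily_flow[daily_flow > threshold] - threshold

# Fit GP to the exceedances (location fixed at 0 for POT data).
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0)

# A flood quantile for a given non-exceedance probability of the
# exceedance distribution:
q99 = threshold + stats.genpareto.ppf(0.99, shape, loc=0, scale=scale)
```

In practice the threshold choice controls λ (the mean number of peaks per year), and independence of successive peaks must be checked before fitting, which is exactly the framework question the abstract addresses.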
Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function
Meta-models can be used to enhance the drift detection mechanisms of data stream algorithms by representing and predicting when change will occur. There are real-world situations where a concept reappears, as in intrusion detection systems (IDS), where the same incidents, or adaptations of them, usually reappear over time. In these environments, early prediction of drift through better knowledge of past models can help to anticipate the change, thus improving the efficiency of the model in terms of the training instances needed. In this paper we present MM-PRec, a meta-model for predicting recurring concepts on data streams whose main goal is to predict when drift is going to occur, together with the best model to reuse in the case of a recurring concept. To fulfill this goal, MM-PRec trains a Hidden Markov Model (HMM) on the instances that appear during concept drift. The learning process of the base classification learner feeds the meta-model with all the information needed to predict recurrent or similar situations; the predicted models are stored together with their associated contextual information. In our approach we also propose a fuzzy similarity function to decide which stored model best represents a particular context when drift is detected. The experiments performed show that MM-PRec outperforms other context-aware algorithms in terms of the training instances needed, especially in environments characterized by gradual drifts.
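The model-reuse step can be sketched generically: when drift is detected, compare the current context against the contexts stored with past models via a fuzzy similarity function, and reuse the most similar model if it clears a threshold. This is a hypothetical illustration; the membership functions, context features, and 0.7 threshold are not the ones defined for MM-PRec.

```python
# Hypothetical sketch of fuzzy-similarity-based model reuse on drift.
# Context features are assumed normalized to [0, 1]; the similarity
# function and the min_sim threshold are illustrative choices.

def fuzzy_similarity(ctx_a, ctx_b):
    """Mean per-feature similarity in [0, 1] over shared features."""
    keys = ctx_a.keys() & ctx_b.keys()
    if not keys:
        return 0.0
    return sum(1.0 - abs(ctx_a[k] - ctx_b[k]) for k in keys) / len(keys)

def best_stored_model(current_ctx, repository, min_sim=0.7):
    """Return the stored model whose context is most similar, or None
    if nothing clears the similarity threshold."""
    best, best_sim = None, min_sim
    for ctx, model in repository:
        sim = fuzzy_similarity(current_ctx, ctx)
        if sim >= best_sim:
            best, best_sim = model, sim
    return best

# Repository of (context, model) pairs stored at past drifts.
repository = [({"error_rate": 0.1, "mean_x": 0.4}, "model_A"),
              ({"error_rate": 0.8, "mean_x": 0.9}, "model_B")]
reused = best_stored_model({"error_rate": 0.15, "mean_x": 0.45}, repository)
# The closest past context clears the threshold, so its model is reused
# instead of training a new one from scratch.
```

Reusing a stored model in this way is what reduces the number of training instances needed after a recurring drift, the metric on which the abstract reports improvements.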
Frequency Analysis of Droughts Using Stochastic and Soft Computing Techniques
In the Canadian Prairies, recurring droughts are a reality that can have
significant economic, environmental, and social impacts. For example, the
droughts of 1997 and 2001 cost over $100 million across different sectors.
Drought frequency analysis is a technique for analyzing how frequently a
drought event of a given magnitude may be expected to occur. In this study,
the state of the science related to frequency analysis of droughts is
reviewed. The main contributions of this thesis include the development of
a model in Matlab that uses the qualities of Fuzzy C-Means (FCM) clustering
and corrects the formed regions to meet the criteria of effective
hydrological regions. In FCM, each site has a degree of membership in each
of the clusters. The developed algorithm takes the number of regions and
the return period as inputs and shows the final corrected clusters as
output for most scenarios. Since drought is a bivariate phenomenon, with
the two statistical variables of duration and severity to be analyzed
simultaneously, an important step in this study is extending the initial
Matlab model to correct regions based on L-comoment statistics (as opposed
to L-moments). Implementing a reasonably straightforward approach to
bivariate drought frequency analysis using bivariate L-comoments and
copulas is another contribution of this study. Quantile estimation at
ungauged sites for return periods of interest is studied by introducing two
machine learning techniques: Radial Basis Function (RBF) networks and
Support Vector Machine Regression (SVM-R). These two techniques are
selected based on their good standing in the literature on function
estimation and nonparametric regression. The functionalities of RBF and
SVM-R are compared with the traditional nonlinear regression (NLR) method.
In addition, a nonlinear regression with regionalization, in which
catchments are first regionalized using FCM, is applied and its results are
compared with the other three models. Drought data from 36 natural
catchments in the Canadian Prairies are used. This study provides a
methodology for bivariate drought frequency analysis that can be applied in
any part of the world.
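The ungauged-site step can be sketched with one of the two techniques the study compares: regressing a drought quantile on catchment descriptors with Support Vector Machine Regression. Everything numeric below is synthetic and illustrative; the descriptors, kernel, and C value are assumptions, not the thesis's calibrated setup.

```python
# Sketch of quantile estimation at an ungauged site via SVM-R. Assumes
# scikit-learn; the descriptors (area, precipitation), the synthetic
# quantile relation, and the SVR hyperparameters are hypothetical.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
# 36 gauged catchments: drainage area (km^2), mean annual precip (mm).
X = np.column_stack([rng.uniform(50, 5000, 36),
                     rng.uniform(300, 600, 36)])
# Synthetic drought-severity quantile with a nonlinear dependence.
y = 0.02 * X[:, 0] ** 0.8 + 0.05 * X[:, 1] + rng.normal(0, 2, 36)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))
model.fit(X, y)

# Estimate the quantile at an "ungauged" catchment from its descriptors.
estimate = model.predict([[1200.0, 450.0]])[0]
```

The RBF-kernel SVR plays the same role here that NLR plays in the comparison: a mapping from physiographic descriptors to the quantile, but one that needs no prescribed functional form.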