45 research outputs found

    Naive possibilistic classifiers for imprecise or uncertain numerical data

    In real-world problems, input data may be pervaded with uncertainty. In this paper, we investigate the behavior of naive possibilistic classifiers, as a counterpart to naive Bayesian ones, for dealing with classification tasks in the presence of uncertainty. For this purpose, we extend possibilistic classifiers, which have recently been adapted to numerical data, in order to cope with uncertainty in data representation. Here the possibility distributions used are assumed to encode the family of Gaussian probability distributions that are compatible with the considered dataset. We consider two types of uncertainty: (i) the uncertainty associated with the class in the training set, which is modeled by a possibility distribution over class labels, and (ii) the imprecision pervading attribute values in the testing set, represented in the form of intervals for continuous data. Moreover, the approach takes into account the uncertainty about the estimation of the Gaussian distribution parameters due to the limited amount of data available. We first adapt the possibilistic classification model, previously proposed for the certain case, in order to accommodate the uncertainty about class labels. Then, we propose an algorithm based on the extension principle to deal with imprecise attribute values. The experiments reported demonstrate the value of possibilistic classifiers for handling uncertainty in data. In particular, the probability-to-possibility transform-based classifier shows robust behavior when dealing with imperfect data.
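    As a rough illustration of the extension-principle step for interval-valued attributes, the sketch below takes the possibility of an interval to be the supremum of a Gaussian-derived possibility distribution over that interval. It is a simplified reading and not the paper's algorithm: it uses a single Gaussian rather than a family of compatible Gaussians, and the function names `gaussian_possibility` and `interval_possibility` are hypothetical.

```python
import math


def gaussian_possibility(x, mu, sigma):
    """Possibility degree of x under the probability-to-possibility
    transform of N(mu, sigma^2): pi(x) = P(|X - mu| >= |x - mu|)."""
    return math.erfc(abs(x - mu) / (sigma * math.sqrt(2.0)))


def interval_possibility(low, high, mu, sigma):
    """Supremum of the possibility distribution over [low, high].

    Assumption: the distribution is unimodal with its peak at mu, so the
    supremum is 1 when mu lies inside the interval and is otherwise
    reached at the endpoint closest to mu.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if low <= mu <= high:
        return 1.0
    nearest = low if low > mu else high
    return gaussian_possibility(nearest, mu, sigma)
```

    A degenerate interval `[x, x]` reduces to the point-valued case, so precise and imprecise attribute values can be handled by the same function.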

    Possibilistic classifiers for numerical data

    Naive Bayesian Classifiers, which rely on independence hypotheses together with a normality assumption to estimate densities for numerical data, are known for their simplicity and effectiveness. However, estimating densities, even under the normality assumption, may be problematic when data are scarce. In such a situation, possibility distributions may provide a more faithful representation of the data. Naive Possibilistic Classifiers (NPC), based on possibility theory, have recently been proposed as a counterpart of Bayesian classifiers for classification tasks. Only a few works treat possibilistic classification, and most existing NPCs deal only with categorical attributes. This work focuses on the estimation of possibility distributions for continuous data. In this paper we investigate two kinds of possibilistic classifiers. The first is derived from classical or flexible Bayesian classifiers by applying a probability–possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes. The second is based on a direct interpretation of data in possibilistic formats that exploit an idea of proximity between data values in different ways, which provides a less constrained representation of them. We show that possibilistic classifiers are better able than Bayesian classifiers to detect new instances for which the classification is ambiguous, since Bayesian probabilities may be poorly estimated and illusorily precise. Moreover, for this case we propose a hybrid possibilistic classification approach based on a nearest-neighbour heuristic to improve the accuracy of the proposed possibilistic classifiers when the available information is insufficient to choose between classes. Possibilistic classifiers are compared with classical or flexible Bayesian classifiers on a collection of benchmark databases. The experiments reported demonstrate the value of possibilistic classifiers. In particular, flexible possibilistic classifiers perform well for data agreeing with the normality assumption, while proximity-based possibilistic classifiers outperform the others in the remaining cases. The hybrid possibilistic classification shows a good ability to improve accuracy.
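    The first kind of classifier described above can be sketched as follows, under several assumptions that are not taken from the paper: each attribute of each class is fitted with a single Gaussian, attribute possibilities are combined by a product, and an instance is flagged as ambiguous when the two best class scores differ by less than a hypothetical `margin`. The names `fit` and `classify` are likewise illustrative.

```python
import math
from statistics import mean, stdev


def possibility(x, mu, sigma):
    """Probability-possibility transform of N(mu, sigma^2) evaluated at x."""
    return math.erfc(abs(x - mu) / (sigma * math.sqrt(2.0)))


def fit(samples, labels):
    """Estimate one (mean, std) pair per class and per attribute.

    Assumes every class has at least two samples and that no attribute
    is constant within a class (otherwise std would be zero).
    """
    by_class = {}
    for row, label in zip(samples, labels):
        by_class.setdefault(label, []).append(row)
    return {
        label: [(mean(col), stdev(col)) for col in zip(*rows)]
        for label, rows in by_class.items()
    }


def classify(model, x, margin=0.05):
    """Return (best label, ambiguity flag, per-class scores)."""
    scores = {
        label: math.prod(possibility(v, mu, sd) for v, (mu, sd) in zip(x, params))
        for label, params in model.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Illustrative ambiguity rule: the two best classes are nearly tied.
    ambiguous = len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] < margin
    return ranked[0], ambiguous, scores
```

    When the ambiguity flag is raised, a fallback such as the nearest-neighbour heuristic mentioned in the abstract could be used to choose between the tied classes; that step is not shown here.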

    Water filtration by using apple and banana peels as activated carbon

    A water filter is an important device for reducing the contaminants in raw water. Activated carbon from charcoal is used to absorb the contaminants, and fruit peels are a suitable alternative carbon source to substitute for the charcoal. The main goal of this study is to determine the role of apple and banana peel powder as activated carbon in a water filter. The peels were dried and blended into powder so that they could absorb the contaminants, and the raw water was compared before and after filtering. After filtering, the pH reading was 6.8, which is within the normal range, and the recorded turbidity was 658 NTU. The filtered water was also clearer in colour than the raw water. This study found that fruit peels such as banana and apple are an effective substitute for charcoal as a natural absorbent.

    Comparative study of state-of-the-art machine learning models for analytics-driven embedded systems

    Analytics-driven embedded systems are gaining a foothold faster than ever in the current digital era. The Internet of Things (IoT) has generated an entire ecosystem of devices that communicate and exchange data automatically in an interconnected global network. The ability to efficiently process and use the enormous amount of data generated by an ensemble of embedded devices such as RFID tags and sensors enables engineers to build smart real-world systems. An analytics-driven embedded system explores and processes the data in situ or remotely to identify patterns in the behavior of the system, which in turn can be used to automate actions and give a device decision-making capability. Designing an intelligent data processing model is paramount for reaping the benefits of data analytics, because a poorly designed analytics infrastructure would degrade the system's performance and effectiveness. Many aspects of this data make it a complex and challenging analytics task and hence a suitable candidate for big data. Big data is mainly characterized by its high volume, widely varied data types and high speed of data receipt; all these properties demand the right choice of data mining techniques when designing the analytics model. Image datasets, such as face recognition data and satellite images, tend to perform better with deep learning algorithms, while time-series datasets, such as sensor data from wearable devices, give better results with clustering and supervised learning models. A regression model suits multivariate datasets such as appliance energy prediction data and forest fire data. Each machine learning task has a range of algorithms that can be used in combination to create an intelligent data analysis model. In this study, a comprehensive comparative analysis was conducted using different datasets freely available from an online machine learning repository, to analyze the performance of state-of-the-art machine learning algorithms. The WEKA data mining toolkit was used to evaluate C4.5, Naïve Bayes, Random Forest, kNN, SVM and Multilayer Perceptron as classification models. Linear regression, Gradient Boosting Machine (GBM), Multilayer Perceptron, kNN, Random Forest and Support Vector Machines (SVM) were applied to datasets suited to regression. Datasets were trained and analyzed in different experimental setups, and a qualitative comparative analysis was performed with k-fold cross-validation (CV) and a paired t-test in the WEKA Experimenter.
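    The evaluation protocol named above, k-fold cross-validation followed by a paired t-test, can be sketched in plain Python as below. This does not attempt to reproduce WEKA's own implementation: the helper names `k_fold_indices` and `paired_t_statistic` are hypothetical, folds are contiguous and unshuffled, and the test is the textbook paired t statistic on per-fold scores.

```python
import math
from statistics import mean, stdev


def k_fold_indices(n, k):
    """Split range(n) into k contiguous (train, test) index pairs.

    The first n % k folds receive one extra item so that every index
    appears in exactly one test fold.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two models scored on the same folds.

    Returns mean(d) / (stdev(d) / sqrt(k)), where d holds the per-fold
    score differences; it has k - 1 degrees of freedom. Assumes the
    differences are not all identical (stdev would be zero).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

    In use, each model would be trained on the `train` indices and scored on the `test` indices of every fold, and the two resulting score lists passed to `paired_t_statistic`.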

    Continuous Learning of the Structure of Bayesian Networks: A Mapping Study

    Bayesian networks can be built based on knowledge, data, or both. Regardless of the source of information used to build the model, inaccuracies might occur or the application domain might change. Therefore, there is a need to continuously improve the model during its usage. As new data are collected, algorithms that continuously incorporate the updated knowledge can play an essential role in this process. For the continuous learning of a Bayesian network's structure, the current solutions are based on structural refinement or adaptation. Recent research aims to reduce complexity and memory usage, making it possible to solve complex and large-scale practical problems. This study aims to identify and evaluate solutions for the continuous learning of Bayesian network structures, as well as to outline related future research directions. Our attention remains on the structure because accurate parameters are useless if the structure is not representative.

    Development of an integrated decision support system for supporting offshore oil spill response in harsh environments

    Offshore oil spills can have significant negative socio-economic impacts and constitute a direct hazard to the marine environment and human health. The response to an oil spill usually consists of a series of dynamic, time-sensitive, multi-faceted and complex processes subject to various constraints and challenges. In the past decades, many models have been developed, mainly focusing on individual processes including oil weathering simulation, impact assessment, and clean-up optimization. However, to date, research on the integration of offshore oil spill vulnerability analysis, process simulation and operation optimization is still lacking. This deficiency could be more influential in harsh environments. It has become critical and urgent to develop new methodologies and improve technical capacities for offshore oil spill response. Therefore, this research aims at developing an integrated decision support system for supporting offshore oil spill response, especially in harsh environments (DSS-OSRH). The DSS consists of offshore oil spill vulnerability analysis, response technology screening, and simulation-optimization coupling. Uncertainties and/or dynamics are quantitatively reflected throughout the modeling processes. First, a Monte Carlo simulation based two-stage adaptive resonance theory mapping (MC-TSAM) approach has been developed. To demonstrate this approach, it was applied in a real-world case study of offshore oil spill vulnerability index (OSVI) classification on the south coast of Newfoundland. Furthermore, a Monte Carlo simulation based integrated rule-based fuzzy adaptive resonance theory mapping (MC-IRFAM) approach has been developed for screening and ranking spill response and clean-up technologies. The feasibility of the MC-IRFAM was tested with a case of screening and ranking response technologies in an offshore oil spill event. A novel Monte Carlo simulation based dynamic mixed integer nonlinear programming (MC-DMINP) approach has also been developed for the simulation-optimization coupling in offshore oil spill response. To demonstrate this approach, a case study was conducted on device allocation and oil recovery in an offshore oil spill event. Finally, the DSS-OSRH has been developed based on the integration of MC-TSAM, MC-IRFAM, and MC-DMINP. To demonstrate its feasibility, a case study was conducted on decision support during offshore oil spill response on the south coast of Newfoundland. The developed approaches and DSS are the first of their kind to date targeting offshore oil spill response. The novelty is reflected in the following aspects: 1) an innovative MC-TSAM approach for offshore OSVI classification under complexity and uncertainty; 2) a new MC-IRFAM approach for classifying and ranking oil spill response technologies with uncertain information; 3) a novel MC-DMINP simulation-optimization coupling approach for offshore oil spill response operation and resource allocation under uncertainty; and 4) an innovative DSS-OSRH, which consists of the MC-TSAM, MC-IRFAM and MC-DMINP and supports decision making throughout the offshore oil spill response processes. These methods are particularly suitable for offshore oil spill response in harsh environments such as the offshore areas of Newfoundland and Labrador (NL). The research will also promote the understanding of oil transport and fate processes and of the impacts on the affected offshore and shoreline areas. The methodologies will be capable of providing modeling tools for other related areas that require timely and effective decisions under complexity and uncertainty.

    Inferring Complex Activities for Context-aware Systems within Smart Environments

    The rising ageing population worldwide and the prevalence of age-related conditions such as physical fragility, mental impairments and chronic diseases have significantly impacted quality of life and caused a shortage of health and care services. Over-stretched healthcare providers are leading to a paradigm shift in public healthcare provisioning. Thus, Ambient Assisted Living (AAL) using Smart Home (SH) technologies has been rigorously investigated to help address these problems. Human Activity Recognition (HAR) is a critical component of AAL systems which enables applications such as just-in-time assistance, behaviour analysis, anomaly detection and emergency notifications. This thesis investigates the challenges faced in accurately recognising Activities of Daily Living (ADLs) performed by single or multiple inhabitants within smart environments. Specifically, this thesis explores five complementary research challenges in HAR. The first study contributes to knowledge by developing a semantic-enabled data segmentation approach with user preferences. The second study takes the segmented set of sensor data to investigate and recognise human ADLs at two levels of granularity: coarse-grained and fine-grained actions. At the coarse-grained action level, semantic relationships between the sensors, objects and ADLs are deduced, whereas at the fine-grained action level, object usage at a satisfactory threshold, with evidence fused from multimodal sensor data, is leveraged to verify the intended actions. Moreover, because of imprecise or vague interpretations of multimodal sensors and data fusion challenges, fuzzy set theory and the fuzzy web ontology language (fuzzy-OWL) are leveraged. The third study focuses on incorporating uncertainties that arise in HAR from factors such as technological failure, object malfunction, and human error. Hence, existing uncertainty theories and approaches are analysed and, based on the findings, a probabilistic ontology (PR-OWL) based HAR approach is proposed. The fourth study extends the first three studies to distinguish activities conducted by more than one inhabitant in a shared smart environment, using discriminative sensor-based techniques and time-series pattern analysis. The final study investigates a suitable system architecture for a real-time smart environment tailored to AAL systems and proposes a microservices architecture with off-the-shelf and bespoke sensor-based sensing methods. The initial semantic-enabled data segmentation study achieved 100% and 97.8% accuracy in segmenting sensor events under single- and mixed-activity scenarios, respectively. However, the average time taken to segment each sensor event was high, at 3971 ms and 62183 ms for single- and mixed-activity scenarios, respectively. The second study, on detecting fine-grained user actions, was evaluated with 30 and 153 fuzzy rules to detect two fine-grained movements using a dataset pre-collected from the real-time smart environment. Its results indicate good average accuracy of 83.33% and 100%, but with high average durations of 24648 ms and 105318 ms, which poses further challenges for the scalability of fusion rule creation. The third study was evaluated by incorporating the PR-OWL ontology with ADL ontologies and the Semantic Sensor Network (SSN) ontology to define four types of uncertainty present in a kitchen-based activity. The fourth study presented a case study extending single-user AR to multi-user AR by combining RFID tags and fingerprint sensors as discriminative sensors to identify and associate user actions with the aid of time-series analysis. The last study responds to the computational and performance requirements of the first four studies by analysing and proposing a microservices-based system architecture for AAL systems. Future research towards moving from cloud computing to fog/edge computing paradigms is discussed, with the aims of higher availability, reduced network traffic, energy use and cost, and a decentralised system. As a result of the five studies, this thesis develops a knowledge-driven framework to estimate and recognise multi-user activities at the level of fine-grained user actions. This framework integrates three complementary ontologies to conceptualise factual, fuzzy and uncertain aspects of the environment and ADLs, together with time-series analysis and a discriminative sensing environment. Moreover, a distributed software architecture, multimodal sensor-based hardware prototypes, and supporting utility tools such as a simulator and a synthetic ADL data generator were developed to support the evaluation of the proposed approaches. The distributed system is platform-independent and currently supported by an Android mobile application and web-browser based client interfaces for retrieving information such as live sensor events and HAR results.