28 research outputs found

    An Interval Type-2 Fuzzy Association Rule Mining Approach to Pattern Discovery in Breast Cancer Dataset

    Several methods explored in the literature to analyse breast cancer datasets have failed to sufficiently handle the sharp-boundary problem of quantitative attributes and to resolve inter- and intra-uncertainties in breast cancer dataset analysis. In this study, an Interval Type-2 fuzzy association rule mining approach is proposed for pattern discovery in a breast cancer dataset. In the first part of the analysis, the Interval Type-2 fuzzification of the breast cancer dataset is carried out using the Hao and Mendel approach. In the second part, the FP-growth algorithm is adopted for associative pattern discovery from the fuzzified dataset produced in the first part. To define the intuitive words for breast cancer determinant factors and the expert data intervals, thirty (30) medical experts from specialized hospitals were consulted through a questionnaire polling method. To establish the adequacy of the linguistic words defined by the experts, the Jaccard similarity measure is used. The analysis discovers associative rules with a minimum number of symptoms at confidence values as high as 91%. It also identifies High Bare Nuclei and High Uniformity of Cell Shape as strong determinant factors for diagnosing breast cancer. The proposed approach performs better in terms of rules generated than traditional quantitative association rule mining: it eliminates redundant rules, reducing the number of generated rules by 39.5% and memory usage by 22.6%. The discovered rules are viable for building a comprehensive and compact expert-driven knowledge base for breast cancer decision support or expert systems.
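The two quantitative building blocks named above, the Jaccard similarity used to validate the experts' linguistic words and the rule confidence threshold, can be sketched as follows. This is a minimal illustration with hypothetical item names and transactions, not the authors' implementation:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    fraction of transactions containing the antecedent that also
    contain the consequent."""
    ante = [t for t in transactions if antecedent <= t]
    if not ante:
        return 0.0
    return sum(1 for t in ante if consequent <= t) / len(ante)

# Hypothetical fuzzified transactions (linguistic items per patient record)
tx = [
    {"HighBareNuclei", "HighUniformityCellShape", "Malignant"},
    {"HighBareNuclei", "HighUniformityCellShape", "Malignant"},
    {"LowBareNuclei", "Benign"},
]
print(jaccard({"High", "Medium"}, {"High", "Low"}))       # 1/3
print(confidence(tx, {"HighBareNuclei"}, {"Malignant"}))  # 1.0
```

A rule would be retained only if its confidence exceeds the chosen threshold (the abstract reports confidence values as high as 91%).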

    A SYSTEMATIC MAPPING STUDY OF DATA MINING FOR BIG DATA SCENARIOS

    The volume of data produced has grown at a large scale in recent years. These data come from different sources and in diverse formats, characterizing the main dimensions of Big Data: large volume, high growth velocity, and wide variety of data. The greatest challenge is how to generate quality information to infer meaningful insights from such varied and large data. Data Mining is the process of identifying valid, novel, and potentially useful patterns in data. However, traditional information technology infrastructure is not able to meet the demands of this new scenario. The term currently known as Big Data Mining refers to the extraction of information from large databases. A question to be answered is: how is the scientific community approaching the Big Data Mining process? It would be valuable to identify which tasks, methods, and algorithms have been applied to extract knowledge in this context. This article aims to identify in the literature the research already carried out in the context of Big Data Mining. It seeks to identify the most addressed areas, the types of problems treated, the tasks applied in knowledge extraction, the methods applied to carry out those tasks, the algorithms implementing the methods, and the types, sources, and structure of the data being mined. A systematic mapping study was conducted, examining 78 primary studies. The results provide a panoramic understanding of the investigated area, revealing the main tasks, methods, and algorithms applied in Big Data Mining.

    Inferring Complex Activities for Context-aware Systems within Smart Environments

    The rising ageing population worldwide and the prevalence of age-related conditions such as physical fragility, mental impairments, and chronic diseases have significantly impacted quality of life and caused a shortage of health and care services. Over-stretched healthcare providers are driving a paradigm shift in public healthcare provisioning. Thus, Ambient Assisted Living (AAL) using Smart Home (SH) technologies has been rigorously investigated to help address the aforementioned problems. Human Activity Recognition (HAR) is a critical component of AAL systems which enables applications such as just-in-time assistance, behaviour analysis, anomaly detection, and emergency notifications. This thesis investigates challenges faced in accurately recognising Activities of Daily Living (ADLs) performed by single or multiple inhabitants within smart environments. Specifically, it explores five complementary research challenges in HAR. The first study contributes to knowledge by developing a semantic-enabled data segmentation approach with user preferences. The second study takes the segmented set of sensor data to investigate and recognise human ADLs at a multi-granular action level: coarse- and fine-grained. At the coarse-grained action level, semantic relationships between sensors, objects, and ADLs are deduced, whereas at the fine-grained action level, object usage at a satisfactory threshold, with evidence fused from multimodal sensor data, is leveraged to verify the intended actions. Moreover, due to the imprecise/vague interpretations of multimodal sensors and the challenges of data fusion, fuzzy set theory and fuzzy Web Ontology Language (fuzzy-OWL) are leveraged. The third study focuses on incorporating uncertainties in HAR caused by factors such as technological failure, object malfunction, and human error.
Hence, uncertainty theories and approaches in existing studies are analysed and, based on the findings, a probabilistic ontology (PR-OWL) based HAR approach is proposed. The fourth study extends the first three to distinguish activities conducted by more than one inhabitant in a shared smart environment using discriminative sensor-based techniques and time-series pattern analysis. The final study investigates a suitable system architecture for a real-time smart environment tailored to AAL systems and proposes a microservices architecture with off-the-shelf and bespoke sensing methods. The initial semantic-enabled data segmentation study achieved 100% and 97.8% accuracy in segmenting sensor events under single and mixed activities scenarios, respectively. However, the average classification time taken to segment each sensor event was 3971 ms and 62183 ms for the single and mixed activities scenarios, respectively. The second study, detecting fine-grained user actions, was evaluated with 30 and 153 fuzzy rules to detect two fine-grained movements using a pre-collected dataset from the real-time smart environment. Its results indicate good average accuracies of 83.33% and 100%, but with high average durations of 24648 ms and 105318 ms, posing further challenges for the scalability of fusion rule creation. The third study was evaluated by incorporating the PR-OWL ontology with ADL ontologies and the Semantic Sensor Network (SSN) ontology to define four types of uncertainty present in a kitchen-based activity. The fourth study illustrated a case study that extended single-user activity recognition to multi-user recognition by combining RFID tags and fingerprint sensors as discriminative sensors to identify and associate user actions with the aid of time-series analysis. The final study responds to the computational and performance requirements of the four studies by analysing and proposing a microservices-based system architecture for AAL systems.
Future research towards adopting fog/edge computing paradigms from cloud computing is discussed, targeting higher availability, reduced network traffic, energy and cost, and a decentralised system. As a result of the five studies, this thesis develops a knowledge-driven framework to estimate and recognise multi-user activities at the fine-grained level of user actions. This framework integrates three complementary ontologies to conceptualise factual, fuzzy, and uncertain knowledge about the environment/ADLs, time-series analysis, and the discriminative sensing environment. Moreover, a distributed software architecture, multimodal sensor-based hardware prototypes, and other supportive utility tools, such as a simulator and a synthetic ADL data generator, were developed to support the evaluation of the proposed approaches. The distributed system is platform-independent and is currently supported by an Android mobile application and web-browser-based client interfaces for retrieving information such as live sensor events and HAR results.
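The fine-grained verification step described above, where object usage must exceed a satisfaction threshold after fusing evidence from multimodal sensors, can be sketched with fuzzy memberships. The membership functions, sensor names, and threshold below are all hypothetical illustrations, not the thesis implementation:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership function: 0 outside [a, d],
    1 on [b, c], linear on the shoulders."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def verify_action(pressure, duration, threshold=0.5):
    """Fuse two sensor memberships with the min t-norm and compare the
    fused evidence against a satisfaction threshold."""
    mu_pressure = trapezoid(pressure, 0.1, 0.3, 0.8, 1.0)    # object grasped
    mu_duration = trapezoid(duration, 1.0, 2.0, 10.0, 15.0)  # plausible use time (s)
    return min(mu_pressure, mu_duration) >= threshold

print(verify_action(pressure=0.5, duration=4.0))   # True: both memberships are 1.0
print(verify_action(pressure=0.05, duration=4.0))  # False: no grasp detected
```

In the thesis, such rules are expressed in fuzzy-OWL over the sensor ontology; this sketch only shows the numeric fusion idea behind a single rule.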

    Development of an innovative method for measuring thermal comfort through the monitoring of physiological and environmental parameters in indoor environments

    Measuring human thermal comfort in indoor environments is a topic of interest in the scientific community, since thermal comfort deeply affects the well-being of occupants and, furthermore, buildings must face high energy costs to guarantee optimal comfort conditions. Even though there are standards in the field of the ergonomics of the thermal environment that provide guidelines for thermal comfort assessment, in real-world settings it can be very difficult to obtain an accurate measurement. Therefore, to improve the measurement of occupants' thermal comfort in buildings, research is focusing on the assessment of personal and physiological parameters related to thermal comfort, to create environments carefully tailored to the occupants who live in them. This thesis presents several contributions to this topic. A set of studies was implemented to develop and test measurement procedures capable of quantitatively assessing human thermal comfort by means of environmental and physiological parameters, to capture the peculiarities that exist among different occupants. Firstly, a study was conducted in a controlled climatic chamber with an invasive set of sensors used for measuring physiological parameters. The outcome of this research was a first thermal comfort measurement accuracy of 82%, obtained by training machine learning (ML) algorithms that provide the thermal sensation vote (TSV) by means of environmental quantities and heart rate variability (HRV), a parameter that the literature has often reported to be related to both users' thermal comfort and environmental quantities. This research gave rise to a subsequent study in which thermal comfort assessment was performed using a minimally invasive smartwatch for collecting HRV.
This second study consisted in varying the environmental conditions of a semi-controlled test room, while participants could carry out light office activities in a limited way, i.e. avoiding as much as possible movements of the hand on which the smartwatch was worn. With this experimental setup, it was possible to establish that the use of artificial intelligence (AI) algorithms (such as random forests or convolutional neural networks) on the heterogeneous dataset created by aggregating environmental and physiological parameters can provide a measure of TSV with a mean absolute error (MAE) of 1.2 and a mean absolute percentage error (MAPE) of 20%. In addition, by using the Monte Carlo Method (MCM), it was possible to compute the impact of the uncertainty of the input quantities on the computation of the TSV. The highest uncertainty was due to the uncertainty of the air temperature (U = 14%) and relative humidity (U = 10.5%) measurements. The last relevant contribution of this research work concerns the measurement of thermal comfort in a real-life, semi-controlled environment in which the participant was not forced to limit their movements. Skin temperature was included in the experimental set-up to improve the measurement of TSV. The results showed that including skin temperature in personalized models, built using data coming from the single participant, brings satisfactory results (MAE = 0.001±0.0003 and MAPE = 0.02%±0.09%). The more generalized approach, which consists in training the algorithms on the whole group of participants except one and using the one left out for testing, provides slightly lower performance (MAE = 1±0.2 and MAPE = 25%±6%). This result highlights that, in semi-controlled conditions, the prediction of TSV using skin temperature and HRV can be performed with acceptable accuracy.
INGEGNERIA INDUSTRIALE; embargoed_20220321; Morresi, Nicol
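The MAE and MAPE figures reported above, and the Monte Carlo propagation of input uncertainty into the TSV estimate, can be illustrated with a minimal sketch. The numbers and the linear placeholder model below are synthetic assumptions, not the thesis data or model:

```python
import random

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic TSV targets and predictions
y_true = [1.0, 2.0, -1.0, 0.5]
y_pred = [1.2, 1.8, -0.9, 0.6]
print(mae(y_true, y_pred))   # 0.15
print(mape(y_true, y_pred))  # 15.0

# Monte Carlo propagation: perturb an input (e.g. air temperature) with its
# measurement uncertainty and observe the spread of a hypothetical TSV model.
def tsv_model(t_air):
    """Placeholder linear model, standing in for the trained ML model."""
    return 0.3 * (t_air - 24.0)

random.seed(0)
samples = [tsv_model(random.gauss(26.0, 0.5)) for _ in range(10000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(std)  # close to 0.3 * 0.5 = 0.15, the propagated standard uncertainty
```

For this linear toy model the propagated uncertainty can be checked analytically (sensitivity times input standard uncertainty); for a trained ML model, as in the thesis, the Monte Carlo spread is the practical way to obtain it.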

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to achieve a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and for mobile communications.

    An intelligent decision support system for acute lymphoblastic leukaemia detection

    The morphological analysis of blood smear slides by haematologists or haematopathologists is one of the diagnostic procedures available to evaluate the presence of acute leukaemia. This operation is a complex and costly process, and often lacks standardized accuracy owing to a variety of factors, including insufficient expertise and operator fatigue. This research proposes an intelligent decision support system for the automatic detection of acute lymphoblastic leukaemia (ALL) using microscopic blood smear images to overcome these barriers. The work has four key stages. (1) Firstly, a modified marker-controlled watershed algorithm integrated with morphological operations is proposed for the segmentation of lymphocyte and lymphoblast cell membranes. The aim of this stage is to isolate a lymphocyte/lymphoblast cell membrane from touching and overlapping red blood cells, platelets, and artefacts in the microscopic peripheral blood smear sub-images. (2) Secondly, a novel clustering algorithm with a stimulating discriminant measure (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of the nucleus and cytoplasm of lymphocytic cell membranes. The SDM measures are used in conjunction with a Genetic Algorithm for the clustering of nucleus, cytoplasm, and background regions. (3) Thirdly, a total of eighty features covering shape, texture, and colour information are extracted from the nucleus and cytoplasm of the identified lymphocyte/lymphoblast images. (4) Finally, the proposed feature optimisation algorithm, a variant of Bare-Bones Particle Swarm Optimisation (BBPSO), is presented to identify the most significant discriminative characteristics of the nucleus and cytoplasm segmented by the SDM-based clustering algorithm.
The proposed BBPSO variant incorporates Cuckoo Search, the Dragonfly Algorithm, BBPSO, local and global random walk operations in uniform combination, and Lévy flights to diversify the search and mitigate the premature convergence problem of conventional BBPSO. It also employs subswarm concepts, self-adaptive parameters, and convergence-degree monitoring mechanisms to enable fast convergence. The optimal feature subsets identified by the proposed algorithm are subsequently used for ALL detection and classification. The proposed system achieves a highest classification accuracy of 96.04% and significantly outperforms related meta-heuristic search methods and related research on ALL detection.
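The bare-bones PSO idea underlying the proposed variant, where each particle samples its next position from a Gaussian centred between its personal best and the global best instead of using velocities, can be sketched on a toy objective. This is plain BBPSO under stated assumptions, not the authors' hybrid variant, and the continuous sphere function merely stands in for their feature-subset objective:

```python
import random

def bbpso(objective, dim, n_particles=20, iters=100, seed=42):
    """Minimise `objective` with bare-bones PSO: per dimension, sample from
    N(mean=(pbest+gbest)/2, std=|pbest-gbest|); no velocity term."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]                 # personal bests
    gbest = min(pbest, key=objective)[:]          # global best
    for _ in range(iters):
        for i, p in enumerate(swarm):
            for d in range(dim):
                mu = (pbest[i][d] + gbest[d]) / 2
                sigma = abs(pbest[i][d] - gbest[d])
                p[d] = rng.gauss(mu, sigma)       # sampled move, no velocity
            if objective(p) < objective(pbest[i]):
                pbest[i] = p[:]
                if objective(p) < objective(gbest):
                    gbest = p[:]
    return gbest

# Toy objective: sphere function, optimum at the origin
sphere = lambda x: sum(v * v for v in x)
best = bbpso(sphere, dim=3)
print(sphere(best))  # close to 0
```

Because sigma shrinks as personal and global bests converge, plain BBPSO can stall prematurely; the thesis counters this with Lévy flights, random walks, and subswarms layered on top of this sampling scheme.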