11,357 research outputs found

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

    Data mining based cyber-attack detection

    Get PDF

    Data-driven design of intelligent wireless networks: an overview and tutorial

    Get PDF
    Data science or "data-driven research" is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware specific influences. These interactions can lead to a difference between real-world functioning and design time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader the necessary datasets and scripts to go through the tutorial steps themselves

    Situational Awareness Support to Enhance Teamwork in Collaborative Environments

    Get PDF
    Modern collaborative environments often provide an overwhelming amount of visual information on multiple displays. The multitude of personal and shared interaction devices leads to lack of awareness of team members on ongoing activities, and awareness of who is in control of shared artefacts. This research addresses the situational awareness (SA) support of multidisciplinary teams in co-located collaborative environments. This work aims at getting insights into design and evaluation of large displays systems that afford SA and effective teamwork

    Autonomic care platform for optimizing query performance

    Get PDF
    Background: As the amount of information in electronic health care systems increases, data operations get more complicated and time-consuming. Intensive Care platforms require a timely processing of data retrievals to guarantee the continuous display of recent data of patients. Physicians and nurses rely on this data for their decision making. Manual optimization of query executions has become difficult to handle due to the increased amount of queries across multiple sources. Hence, a more automated management is necessary to increase the performance of database queries. The autonomic computing paradigm promises an approach in which the system adapts itself and acts as self-managing entity, thereby limiting human interventions and taking actions. Despite the usage of autonomic control loops in network and software systems, this approach has not been applied so far for health information systems. Methods: We extend the COSARA architecture, an infection surveillance and antibiotic management service platform for the Intensive Care Unit (ICU), with self-managed components to increase the performance of data retrievals. We used real-life ICU COSARA queries to analyse slow performance and measure the impact of optimizations. Each day more than 2 million COSARA queries are executed. Three control loops, which monitor the executions and take action, have been proposed: reactive, deliberative and reflective control loops. We focus on improvements of the execution time of microbiology queries directly related to the visual displays of patients' data on the bedside screens. Results: The results show that autonomic control loops are beneficial for the optimizations in the data executions in the ICU. The application of reactive control loop results in a reduction of 8.61% of the average execution time of microbiology results. The combined application of the reactive and deliberative control loop results in an average query time reduction of 10.92% and the combination of reactive, deliberative and reflective control loops provides a reduction of 13.04%. Conclusions: We found that by controlled reduction of queries' executions the performance for the end-user can be improved. The implementation of autonomic control loops in an existing health platform, COSARA, has a positive effect on the timely data visualization for the physician and nurse

    Web-based Geographical Visualization of Container Itineraries

    Get PDF
    Around 90% of the world cargo is transported in maritime containers, but only around 2% are physically inspected. This opens the possibility for illicit activities. A viable solution is to control containerized cargo through information-based risk analysis. Container route-based analysis has been considered a key factor in identifying potentially suspicious consignments. Essential part of itinerary analysis is the geographical visualization of the itinerary. In the present paper, we present initial work of a web-based system’s realization for interactive geographical visualization of container itinerary.JRC.G.4-Maritime affair

    Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

    Get PDF
    Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges.Comment: Accepted for publication in IEEE Communications Surveys and Tutorial

    Investigating the attainment of optimum data quality for EHR Big Data: proposing a new methodological approach

    Get PDF
    The value derivable from the use of data is continuously increasing since some years. Both commercial and non-commercial organisations have realised the immense benefits that might be derived if all data at their disposal could be analysed and form the basis of decision taking. The technological tools required to produce, capture, store, transmit and analyse huge amounts of data form the background to the development of the phenomenon of Big Data. With Big Data, the aim is to be able to generate value from huge amounts of data, often in non-structured format and produced extremely frequently. However, the potential value derivable depends on general level of governance of data, more precisely on the quality of the data. The field of data quality is well researched for traditional data uses but is still in its infancy for the Big Data context. This dissertation focused on investigating effective methods to enhance data quality for Big Data. The principal deliverable of this research is in the form of a methodological approach which can be used to optimize the level of data quality in the Big Data context. Since data quality is contextual, (that is a non-generalizable field), this research study focuses on applying the methodological approach in one use case, in terms of the Electronic Health Records (EHR). The first main contribution to knowledge of this study systematically investigates which data quality dimensions (DQDs) are most important for EHR Big Data. The two most important dimensions ascertained by the research methods applied in this study are accuracy and completeness. These are two well-known dimensions, and this study confirms that they are also very important for EHR Big Data. The second important contribution to knowledge is an investigation into whether Artificial Intelligence with a special focus upon machine learning could be used in improving the detection of dirty data, focusing on the two data quality dimensions of accuracy and completeness. Regression and clustering algorithms proved to be more adequate for accuracy and completeness related issues respectively, based on the experiments carried out. However, the limits of implementing and using machine learning algorithms for detecting data quality issues for Big Data were also revealed and discussed in this research study. It can safely be deduced from the knowledge derived from this part of the research study that use of machine learning for enhancing data quality issues detection is a promising area but not yet a panacea which automates this entire process. The third important contribution is a proposed guideline to undertake data repairs most efficiently for Big Data; this involved surveying and comparing existing data cleansing algorithms against a prototype developed for data reparation. Weaknesses of existing algorithms are highlighted and are considered as areas of practice which efficient data reparation algorithms must focus upon. Those three important contributions form the nucleus for a new data quality methodological approach which could be used to optimize Big Data quality, as applied in the context of EHR. Some of the activities and techniques discussed through the proposed methodological approach can be transposed to other industries and use cases to a large extent. The proposed data quality methodological approach can be used by practitioners of Big Data Quality who follow a data-driven strategy. As opposed to existing Big Data quality frameworks, the proposed data quality methodological approach has the advantage of being more precise and specific. It gives clear and proven methods to undertake the main identified stages of a Big Data quality lifecycle and therefore can be applied by practitioners in the area. This research study provides some promising results and deliverables. It also paves the way for further research in the area. Technical and technological changes in Big Data is rapidly evolving and future research should be focusing on new representations of Big Data, the real-time streaming aspect, and replicating same research methods used in this current research study but on new technologies to validate current results
    corecore