528 research outputs found

    Effective Use Methods for Continuous Sensor Data Streams in Manufacturing Quality Control

    This work outlines an approach for managing sensor data streams of continuous numerical data in product manufacturing settings, emphasizing statistical process control, low computational and memory overhead, and retention of the information needed to reduce the impact of nonconformance to quality specifications. While there is extensive literature, knowledge, and documentation about standard data sources and databases, the high volume and velocity of sensor data streams often make traditional analysis infeasible. To that end, an overview of data stream fundamentals is essential. An analysis of commonly used stream preprocessing and load shedding methods follows, succeeded by a discussion of aggregation procedures. Stream storage and querying systems are the next topics. Further, existing machine learning techniques for data streams are presented, with a focus on regression. Finally, the work describes a novel methodology for managing sensor data streams in which data stream management systems record aggregate data from small time intervals, along with individual measurements from the stream that are nonconforming. The aggregates shall be continually entered into control charts and regressed on. To conserve memory, old data shall be periodically reaggregated at higher levels.
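
    The storage idea in this abstract (per-interval aggregates, individually saved nonconforming readings, periodic reaggregation of old data) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the specification limits `lsl`/`usl`, the pairwise merging policy, and the assumption that timestamps arrive in order are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    """Summary of one time interval: count, mean, sum of squared deviations."""
    start: float
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's single-pass update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Aggregate") -> "Aggregate":
        # Combine two summaries without revisiting the raw measurements.
        n = self.n + other.n
        start = min(self.start, other.start)
        if n == 0:
            return Aggregate(start)
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Aggregate(start, n, mean, m2)


class StreamSummarizer:
    """Hypothetical helper: keeps interval aggregates plus nonconforming readings."""

    def __init__(self, interval: float, lsl: float, usl: float):
        self.interval = interval          # interval length (assumed parameter)
        self.lsl, self.usl = lsl, usl     # lower/upper specification limits
        self.aggregates = []              # Aggregate objects, oldest first
        self.nonconforming = []           # (timestamp, value) pairs kept verbatim

    def observe(self, t: float, x: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        start = (t // self.interval) * self.interval
        if not self.aggregates or self.aggregates[-1].start != start:
            self.aggregates.append(Aggregate(start))
        self.aggregates[-1].add(x)
        if not (self.lsl <= x <= self.usl):
            self.nonconforming.append((t, x))

    def reaggregate(self, keep_recent: int) -> None:
        # Merge adjacent pairs among all but the `keep_recent` newest intervals.
        cut = max(len(self.aggregates) - keep_recent, 0)
        old, recent = self.aggregates[:cut], self.aggregates[cut:]
        merged = [old[i].merge(old[i + 1]) if i + 1 < len(old) else old[i]
                  for i in range(0, len(old), 2)]
        self.aggregates = merged + recent
```

    Each aggregate's mean and `m2 / n` (variance) could then feed a control chart or a regression; those steps are left out of the sketch.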

    Streamed Data Analysis Using Adaptable Bloom Filter

    With the rise of a plethora of web applications and technologies such as sensors, IoT, and cloud computing, the number of data-generating sources has increased exponentially. Stream processing requires real-time analytics of data in motion, and in a single pass. This paper proposes a framework for hourly analysis of streamed data using a Bloom filter, a probabilistic data structure, in which hashing combines double hashing and partition hashing, leading to fewer collisions between hash functions and lower computational overhead. When the size of the incoming data is not known, a static Bloom filter leads to a high collision rate if the data flow is heavy, and to wasted storage space if little data arrives. In such cases it is difficult to determine the optimal Bloom filter parameters (m, k) in advance, so a target threshold for false positives (f_p) cannot be guaranteed. To accommodate the growing data size, a major requirement is that the filter size m grow dynamically. A Kalman filter is used to predict the array size of the Bloom filter. Experiments show that the proposed Adaptable Bloom Filter (ATBF) efficiently supports peak hour analysis and server utilization analysis, and reduces the time and space required for querying dynamic datasets
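
    To make the hashing scheme in this abstract concrete, here is a minimal Python sketch of a static Bloom filter that combines double hashing with partition hashing. It is only one plausible reading of the abstract, not the paper's ATBF: the class name, the use of SHA-256 to derive the two base hashes, and the fixed size are assumptions, and the Kalman-filter-based resizing is not reproduced.

```python
import hashlib


class PartitionedBloomFilter:
    """Static Bloom filter: m bits split into k partitions, one probe per partition."""

    def __init__(self, m: int, k: int):
        if m <= 0 or k <= 0 or m % k != 0:
            raise ValueError("m must be a positive multiple of k")
        self.k = k
        self.part = m // k           # number of bits in each partition
        self.bits = bytearray(m)     # one byte per bit, for readability

    def _positions(self, item: str):
        # Two base hashes taken from a single SHA-256 digest (an assumption;
        # any pair of independent hash functions would serve).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # forced odd, never zero
        for i in range(self.k):
            # Double hashing gives the i-th hash; partitioning confines it
            # to the i-th slice of the bit array.
            yield i * self.part + (h1 + i * h2) % self.part

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))
```

    Because each of the k probes lands in its own partition, two hash functions can never collide on the same bit for a single item, which is the effect the abstract attributes to partition hashing.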

    A survey on online active learning

    Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in an unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in the last decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research. Our review aims to provide a comprehensive and up-to-date overview of the field and to highlight directions for future work

    Statistical Models for Querying and Managing Time-Series Data

    Recent years have seen a dramatic increase in the amount of available time-series data. Primary sources of time-series data are sensor networks, medical monitoring, financial applications, news feeds and social networking applications. The availability of large amounts of time-series data calls for scalable data management techniques that enable efficient querying and analysis of such data in real-time and archival settings. Time-series data generated from sensors (environmental, RFID, GPS, etc.) are often imprecise and uncertain in nature. Thus, it is necessary to characterize this uncertainty in order to produce clean answers. In this thesis we propose methods that address these important issues pertaining to time-series data. In particular, this thesis is centered around the following three topics: Computing Statistical Measures on Large Time-Series Datasets. Computing statistical measures for large databases of time series is a fundamental primitive for querying and mining time-series data [31, 81, 97, 111, 132, 137]. This primitive is gaining importance with the increasing number and rapid growth of time-series databases. In Chapter 3, we introduce the Affinity framework for efficient computation of statistical measures by exploiting the concept of affine relationships [113, 114]. Affine relationships can be used to infer a large number of statistical measures for time series from other related time series, instead of computing them directly, thus reducing the overall computational cost significantly. Moreover, the Affinity framework proposes a unified approach for computing several statistical measures at once. Creating Probabilistic Databases from Imprecise Data. A large amount of time-series data produced in the real world has an inherent element of uncertainty, arising from the various sources of imprecision affecting its sources (such as sensor data, GPS trajectories, environmental monitoring data, etc.).
The primary sources of imprecision in such data are imprecise sensors, limited communication bandwidth, sensor failures, etc. Recently there has been an exponential rise in the number of such imprecise sensors, which has led to an explosion of imprecise data. Standard database techniques cannot be used to provide clean and consistent answers in such scenarios. Therefore, probabilistic databases that factor in the inherent uncertainty and produce clean answers are required. An important assumption when using probabilistic databases is that each data point has a probability distribution associated with it. This is not true in practice: the distributions are absent. As a solution to this fundamental limitation, in Chapter 4 we propose methods for inferring such probability distributions and using them for efficiently creating probabilistic databases [116]. Managing Participatory Sensing Data. Community-driven participatory sensing is a rapidly evolving paradigm in mobile geo-sensor networks. Here, sensors of various sorts (e.g., multi-sensor units monitoring air quality, cell phones, thermal watches, thermometers in vehicles, etc.) are carried by the community (public vehicles, private vehicles, or individuals) during their daily activities, collecting various types of data about their surroundings. The data generated by these devices are large in quantity and geographically and temporally skewed. Therefore, it is important that systems designed for managing such data be aware of these unique data characteristics. In Chapter 5, we propose the ConDense (Community-driven Sensing of the Environment) framework for managing and querying community-sensed data [5, 19, 115]. ConDense exploits the spatial smoothness of environmental parameters (such as ambient pollution [5] or radiation [2]) to construct statistical models of the data.
Since the number of constructed models is significantly smaller than the original data, we show that our approach leads to a dramatic increase in query processing efficiency [19, 115] and significantly reduces memory usage
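
    The idea of inferring statistical measures through affine relationships, mentioned in this abstract, rests on a basic identity: if one series is an affine transform of another, y_t = a * x_t + b, then its mean and variance follow from those of x without touching y. The Python sketch below shows only that identity; it is not the Affinity framework itself, whose details the abstract does not give, and the function name `infer_stats` is hypothetical.

```python
from statistics import fmean, pvariance


def infer_stats(mean_x: float, var_x: float, a: float, b: float):
    """Mean and variance of y = a*x + b, derived from the statistics of x."""
    return a * mean_x + b, a * a * var_x


# Illustrative data (made up for the example).
x = [1.0, 2.0, 4.0, 7.0]
a, b = 2.5, -1.0
y = [a * v + b for v in x]

# Inferred from x alone versus computed directly from y.
inferred_mean, inferred_var = infer_stats(fmean(x), pvariance(x), a, b)
direct_mean, direct_var = fmean(y), pvariance(y)
```

    The inferred and directly computed values agree up to floating-point rounding, so the statistics of x can be computed once and reused for every series related to it in this way.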

    Consensus-based Online Co-Calibration for Networks of Homogeneous Sensors in IIoT Environments under Consideration of Semantic Knowledge

    Large-scale sensor networks form an important part of the Industrial Internet of Things. To maintain the operation of such networks over time, the quality of the sensor readings needs to be ensured. This leads to the development of a metrologically traceable in-situ calibration method based on a Bayesian framework which leverages local sensor redundancy. Furthermore, automation of such in-situ calibration tasks is a key feature. To this end, an extension of existing sensor-related ontologies is proposed to cover relevant metrological terms. Sensor self-descriptions based on these knowledge representations support in-situ calibration by finding suitable reference sensors and initializing the mathematical method presented here. The mathematical method is evaluated in simulation studies against a state-of-the-art in-situ calibration method. The evaluation results show good estimation performance in cases of time-dependent input signals or sensors of comparable uncertainty levels, but also reveal higher computational costs. The developed ontologies are evaluated by a corpus comparison, ontology metrics, and logical checks of the taxonomic backbone, and indicate good agreement with existing ontology quality standards

    Wireless sensor data processing for on-site emergency response

    This thesis is concerned with the problem of processing data from Wireless Sensor Networks (WSNs) to meet the requirements of emergency responders (e.g. Fire and Rescue Services). A WSN typically consists of spatially distributed sensor nodes that cooperatively monitor physical or environmental conditions. Sensor data about these conditions can then be used as part of the input to predict, detect, and monitor emergencies. Although WSNs have demonstrated great potential in facilitating Emergency Response, sensor data cannot be interpreted directly due to its large volume, noise, and redundancy. In addition, emergency responders are not interested in raw data; they are interested in the meaning it conveys. This thesis presents research on processing and combining data from multiple types of sensors, and combining sensor data with other relevant data, for the purpose of obtaining data of greater quality and information of greater relevance to emergency responders. The current theory and practice in Emergency Response and the existing technology aids were reviewed to identify the requirements from both application and technology perspectives (Chapter 2). The detailed process of information extraction from sensor data and sensor data fusion techniques were reviewed to identify what constitutes suitable sensor data fusion techniques and the challenges presented in sensor data processing (Chapter 3). A study of Incident Commanders’ requirements utilised a goal-driven task analysis method to identify gaps in current means of obtaining relevant information during response to fire emergencies and a list of opportunities for WSN technology to fill those gaps (Chapter 4). A high-level Emergency Information Management System Architecture was proposed, including the main components that are needed, the interaction between components, and system function specification at different incident stages (Chapter 5).
A set of state-awareness rules was proposed and integrated with a Kalman Filter to improve filtering performance. The proposed data pre-processing approach achieved both improved outlier removal and quick detection of real events (Chapter 6). A data storage mechanism was proposed to support timely response to queries regardless of the increase in the volume of data (Chapter 7). What can be considered as “meaning” (e.g. events) for emergency responders was identified, and a generic emergency event detection model was proposed to identify patterns present in sensor data and associate patterns with events (Chapter 8). In conclusion, the added benefits that the technical work can provide to current Emergency Response are discussed, and specific contributions and future work are highlighted (Chapter 9)