528 research outputs found
Effective Use Methods for Continuous Sensor Data Streams in Manufacturing Quality Control
This work outlines an approach for managing sensor data streams of continuous numerical data in product manufacturing settings, emphasizing statistical process control, low computational and memory overhead, and saving information necessary to reduce the impact of nonconformance to quality specifications. While there is extensive literature, knowledge, and documentation about standard data sources and databases, the high volume and velocity of sensor data streams often makes traditional analysis unfeasible. To that end, an overview of data stream fundamentals is essential. An analysis of commonly used stream preprocessing and load shedding methods follows, succeeded by a discussion of aggregation procedures. Stream storage and querying systems are the next topics. Further, existing machine learning techniques for data streams are presented, with a focus on regression. Finally, the work describes a novel methodology for managing sensor data streams in which data stream management systems save and record aggregate data from small time intervals, and individual measurements from the stream that are nonconforming. The aggregates shall be continually entered into control charts and regressed on. To conserve memory, old data shall be periodically reaggregated at higher levels to reduce memory consumption
Streamed Data Analysis Using Adaptable Bloom Filter
With the coming up of plethora of web applications and technologies like sensors, IoT, cloud computing, etc., the data generation resources have increased exponentially. Stream processing requires real time analytics of data in motion and that too in a single pass. This paper proposes a framework for hourly analysis of streamed data using Bloom filter, a probabilistic data structure where hashing is done by using a combination of double hashing and partition hashing; leading to less inter-hash function collision and decreased computational overhead. When size of incoming data is not known, use of Static Bloom filter leads to high collision rate if data flow is too much, and wastage of storage space if data is less. In such cases it is difficult to determine the optimal Bloom filter parameters (m, k) in advance, thus a target threshold for false positives (f_p) cannot be guaranteed. To accommodate the growing data size, one of the major requirements in Bloom filter is that filter size m should grow dynamically. For predicting the array size of Bloom filter Kalman filter has been used. It has been experimentally proved that proposed Adaptable Bloom Filter (ATBF) efficiently performs peak hour analysis, server utilization and reduces the time and space required for querying dynamic datasets
A survey on online active learning
Online active learning is a paradigm in machine learning that aims to select
the most informative data points to label from a data stream. The problem of
minimizing the cost associated with collecting labeled observations has gained
a lot of attention in recent years, particularly in real-world applications
where data is only available in an unlabeled form. Annotating each observation
can be time-consuming and costly, making it difficult to obtain large amounts
of labeled data. To overcome this issue, many active learning strategies have
been proposed in the last decades, aiming to select the most informative
observations for labeling in order to improve the performance of machine
learning models. These approaches can be broadly divided into two categories:
static pool-based and stream-based active learning. Pool-based active learning
involves selecting a subset of observations from a closed pool of unlabeled
data, and it has been the focus of many surveys and literature reviews.
However, the growing availability of data streams has led to an increase in the
number of approaches that focus on online active learning, which involves
continuously selecting and labeling observations as they arrive in a stream.
This work aims to provide an overview of the most recently proposed approaches
for selecting the most informative observations from data streams in the
context of online active learning. We review the various techniques that have
been proposed and discuss their strengths and limitations, as well as the
challenges and opportunities that exist in this area of research. Our review
aims to provide a comprehensive and up-to-date overview of the field and to
highlight directions for future work
Statistical Models for Querying and Managing Time-Series Data
In recent years we are experiencing a dramatic increase in the amount of available time-series data. Primary sources of time-series data are sensor networks, medical monitoring, financial applications, news feeds and social networking applications. Availability of large amount of time-series data calls for scalable data management techniques that enable efficient querying and analysis of such data in real-time and archival settings. Often the time-series data generated from sensors (environmental, RFID, GPS, etc.), are imprecise and uncertain in nature. Thus, it is necessary to characterize this uncertainty for producing clean answers. In this thesis we propose methods that address these important issues pertaining to time-series data. Particularly, this thesis is centered around the following three topics: Computing Statistical Measures on Large Time-Series Datasets. Computing statistical measures for large databases of time series is a fundamental primitive for querying and mining time-series data [31, 81, 97, 111, 132, 137]. This primitive is gaining importance with the increasing number and rapid growth of time-series databases. In Chapter 3, we introduce the Affinity framework for efficient computation of statistical measures by exploiting the concept of affine relationships [113, 114]. Affine relationships can be used to infer a large number of statistical measures for time series, from other related time series, instead of computing them directly; thus, reducing the overall computational cost significantly. Moreover, the Affinity framework proposes an unified approach for computing several statistical measures at once. Creating Probabilistic Databases from Imprecise Data. A large amount of time-series data produced in the real-world has an inherent element of uncertainty, arising due to the various sources of imprecision affecting its sources (like, sensor data, GPS trajectories, environmental monitoring data, etc.). The primary sources of imprecision in such data are: imprecise sensors, limited communication bandwidth, sensor failures, etc. Recently there has been an exponential rise in the number of such imprecise sensors, which has led to an explosion of imprecise data. Standard database techniques cannot be used to provide clean and consistent answers in such scenarios. Therefore, probabilistic databases that factor-in the inherent uncertainty and produce clean answers are required. An important assumption i while using probabilistic databases is that each data point has a probability distribution associated with it. This is not true in practice â the distributions are absent. As a solution to this fundamental limitation, in Chapter 4 we propose methods for inferring such probability distributions and using them for efficiently creating probabilistic databases [116]. Managing Participatory Sensing Data. Community-driven participatory sensing is a rapidly evolving paradigm in mobile geo-sensor networks. Here, sensors of various sorts (e.g., multi-sensor units monitoring air quality, cell phones, thermal watches, thermometers in vehicles, etc.) are carried by the community (public vehicles, private vehicles, or individuals) during their daily activities, collecting various types of data about their surrounding. Data generated by these devices is in large quantity, and geographically and temporally skewed. Therefore, it is important that systems designed for managing such data should be aware of these unique data characteristics. In Chapter 5, we propose the ConDense (Community-driven Sensing of the Environment) framework for managing and querying community-sensed data [5, 19, 115]. ConDense exploits spatial smoothness of environmental parameters (like, ambient pollution [5] or radiation [2]) to construct statistical models of the data. Since the number of constructed models is significantly smaller than the original data, we show that using our approach leads to dramatic increase in query processing efficiency [19, 115] and significantly reduces memory usage
Consensus-based Online Co-Calibration for Networks of Homogeneous Sensors in IIoT Environments under Consideration of Semantic Knowledge
Large scale sensor networks form an important part of the Industrial Internet of Things.
To maintain the operation of such networks over time, quality of the sensor readings needs to be ensured.
This leads to the development of a metrological traceable in-situ calibration method based on a Bayesian framework which leverages local sensor redundancy.
Furthermore, automation of such in-situ calibration tasks is a key feature.
To this end, an extension of existing sensor-related ontologies is proposed to cover relevant metrological terms.
Sensor self-descriptions based on these knowledge representations allow for support of in-situ calibration by finding suitable reference sensors and initialization the mathematical method presented here.
The mathematical method is evaluated in simulation studies against a state of the art in-situ calibration.
The evaluation results show good estimation performance in cases of time-depending input signals or sensors of comparable uncertainty levels, but also reveal higher computational costs.
The developed ontologies are evaluated by a corpus comparison, ontology metrics as well as logical checks of the taxonomic backbone and indicate a good agreement with existing ontology quality standards
Wireless sensor data processing for on-site emergency response
This thesis is concerned with the problem of processing data from Wireless Sensor Networks (WSNs) to meet the requirements of emergency responders (e.g. Fire and Rescue Services). A WSN typically consists of spatially distributed sensor nodes to cooperatively monitor the physical or environmental conditions. Sensor data about the physical or environmental conditions can then be used as part of the input to predict, detect, and monitor emergencies. Although WSNs have demonstrated their great potential in facilitating Emergency Response, sensor data cannot be interpreted directly due to its large volume, noise, and redundancy. In addition, emergency responders are not interested in raw data, they are interested in the meaning it conveys. This thesis presents research on processing and combining data from multiple types of sensors, and combining sensor data with other relevant data, for the purpose of obtaining data of greater quality and information of greater relevance to emergency responders.
The current theory and practice in Emergency Response and the existing technology aids were reviewed to identify the requirements from both application and technology perspectives (Chapter 2). The detailed process of information extraction from sensor data and sensor data fusion techniques were reviewed to identify what constitutes suitable sensor data fusion techniques and challenges presented in sensor data processing (Chapter 3). A study of Incident Commandersâ requirements utilised a goal-driven task analysis method to identify gaps in current means of obtaining relevant information during response to fire emergencies and a list of opportunities for WSN technology to fill those gaps (Chapter 4). A high-level Emergency Information Management System Architecture was proposed, including the main components that are needed, the interaction between components, and system function specification at different incident stages (Chapter 5). A set of state-awareness rules was proposed, and integrated with Kalman Filter to improve the performance of filtering. The proposed data pre-processing approach achieved both improved outlier removal and quick detection of real events (Chapter 6). A data storage mechanism was proposed to support timely response to queries regardless of the increase in volume of data (Chapter 7). What can be considered as âmeaningâ (e.g. events) for emergency responders were identified and a generic emergency event detection model was proposed to identify patterns presenting in sensor data and associate patterns with events (Chapter 8). In conclusion, the added benefits that the technical work can provide to the current Emergency Response is discussed and specific contributions and future work are highlighted (Chapter 9)
- âŠ