
    Data Analytics and Performance Enhancement in Edge-Cloud Collaborative Internet of Things Systems

    Building on evolving communications, computing, and embedded systems technologies, Internet of Things (IoT) systems can interconnect not only physical users and devices but also virtual services and objects, and have already been applied to many application scenarios such as smart homes, smart healthcare, and intelligent transportation. With this rapid development, the number of involved devices has increased tremendously, and the huge number of devices and the data they generate pose critical challenges to IoT systems. To enhance overall performance, this thesis addresses technical issues in IoT data processing and in physical topology discovery of the subnets self-organized by IoT devices.

    First, outlier detection and data aggregation are addressed through a recursive principal component analysis (R-PCA) based data analysis framework. The framework uses a cluster-based structure to fully exploit the spatial correlation of IoT data. Specifically, the sensing devices are grouped into clusters based on spatial data correlation, and edge devices are assigned to the clusters for R-PCA based outlier detection and data aggregation. The outlier-free, aggregated data are forwarded to the remote cloud server for data reconstruction and storage. Moreover, a data reduction scheme is proposed to relieve the data-uploading burden on the trunk link by exploiting temporal data correlation. Kalman filters (KFs) with identical parameters are maintained at the edge and in the cloud for data prediction, and the amount of uploaded data is reduced by using the predictions of the cloud-side KF instead of uploading all measured data.

    Furthermore, an unmanned aerial vehicle (UAV) assisted IoT system is designed for large-scale monitoring. Wireless sensor nodes are flexibly deployed for environmental sensing and self-organized into wireless sensor networks (WSNs). A physical topology discovery scheme is proposed to construct the physical topology of the WSNs in the cloud server to facilitate performance optimization, where the physical topology captures both the logical connectivity of the WSNs and the physical locations of the WSN nodes. The scheme is implemented through newly developed parallel Metropolis-Hastings random walk based information sampling and network-wide 3D localization algorithms, where UAVs serve as mobile edge devices and anchor nodes. Based on the physical topology constructed in the cloud, a UAV-enabled spatial data sampling scheme is further proposed to efficiently sample data from the monitoring area using a denoising autoencoder (DAE). By deploying the encoder of the DAE on the UAV and the decoder in the cloud, data can be partially sampled from the sensing field and accurately reconstructed in the cloud.

    In the final part of the thesis, a novel autoencoder (AE) neural network based data outlier detection algorithm is proposed, where both the encoder and the decoder of the AE are deployed at the edge devices. Data outliers can be accurately detected from the large fluctuations in the squared reconstruction error produced when the data pass through the encoder and decoder of the AE.
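    The edge–cloud data reduction scheme can be illustrated with a small sketch (not the thesis implementation; parameter values and thresholds are illustrative assumptions): the edge and the cloud each run a scalar Kalman filter with identical parameters, and a measurement is uploaded only when it deviates from the shared prediction by more than a threshold; otherwise the cloud substitutes its own prediction.

        import numpy as np

        class ScalarKF:
            """Scalar random-walk Kalman filter with fixed noise variances."""
            def __init__(self, q=1e-2, r=1e-2, x0=0.0, p0=1.0):
                self.q, self.r = q, r    # process / measurement noise variances (assumed values)
                self.x, self.p = x0, p0  # state estimate and its variance

            def predict(self):
                self.p += self.q
                return self.x

            def update(self, z):
                k = self.p / (self.p + self.r)  # Kalman gain
                self.x += k * (z - self.x)
                self.p *= (1.0 - k)

        def edge_cloud_upload(measurements, threshold=0.05):
            """Return the series reconstructed in the cloud and the number of uploads."""
            edge_kf, cloud_kf = ScalarKF(), ScalarKF()   # identical parameters on both sides
            cloud_series, uploads = [], 0
            for z in measurements:
                pred = edge_kf.predict()
                cloud_pred = cloud_kf.predict()          # equals pred, so the filters stay in sync
                if abs(z - pred) > threshold:            # prediction too far off: upload the sample
                    uploads += 1
                    edge_kf.update(z)
                    cloud_kf.update(z)
                    cloud_series.append(z)
                else:                                    # skip upload; cloud keeps its prediction
                    edge_kf.update(pred)
                    cloud_kf.update(cloud_pred)
                    cloud_series.append(cloud_pred)
            return np.array(cloud_series), uploads

        rng = np.random.default_rng(0)
        data = np.sin(np.linspace(0, 6, 200)) + 0.02 * rng.standard_normal(200)
        recon, n_up = edge_cloud_upload(data)
        print(f"uploaded {n_up}/200 samples, mean absolute error {np.abs(recon - data).mean():.4f}")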

    Statistical Methods for Semiconductor Manufacturing

    In this thesis, techniques for non-parametric modeling, machine learning, filtering and prediction, and run-to-run control for semiconductor manufacturing are described. In particular, algorithms have been developed for two major application areas:
    - Virtual Metrology (VM) systems;
    - Predictive Maintenance (PdM) systems.
    Both technologies have proliferated in recent years in semiconductor fabrication plants (fabs) in order to increase productivity and decrease costs. VM systems aim at predicting quantities on the wafer, the main and basic product of the semiconductor industry, that may or may not be physically measurable. These quantities are usually 'costly' to measure in economic or temporal terms: the prediction is instead based on process variables and/or logistic information on the production, which are always available and can be used for modeling without further costs. PdM systems, on the other hand, aim at predicting when a maintenance action has to be performed. This approach to maintenance management, based like VM on statistical methods and on the availability of process/logistic data, contrasts with other classical approaches:
    - Run-to-Failure (R2F), where no interventions are performed on the machine/process until a breakdown or specification violation occurs in production;
    - Preventive Maintenance (PvM), where maintenance is scheduled in advance based on time intervals or production iterations.
    Neither of these approaches is optimal, because they do not guarantee that breakdowns and wafer scrap will be avoided and, in the case of PvM, they may lead to unnecessary maintenance without fully exploiting the lifetime of the machine or process. The main goal of this thesis is to show, through several applications and feasibility studies, that statistical modeling algorithms and control systems can improve the efficiency, yield, and profits of a manufacturing environment like the semiconductor one, where large amounts of data are recorded and can be employed to build mathematical models. We present several original contributions, both in the form of applications and of methods. The introduction of this thesis gives an overview of the semiconductor fabrication process: the most common practices in Advanced Process Control (APC) systems and the major issues for engineers and statisticians working in this area are presented. Furthermore, we illustrate the methods and mathematical models used in the applications. We then discuss in detail the following applications:
    - A VM system for estimating the thickness deposited on the wafer by the Chemical Vapor Deposition (CVD) process, which exploits Fault Detection and Classification (FDC) data. In this tool, a new clustering algorithm based on Information Theory (IT) elements has been proposed; in addition, the Least Angle Regression (LARS) algorithm has been applied for the first time to VM problems.
    - A new VM module for a multi-step (CVD, Etching, and Lithography) line, where Multi-Task Learning techniques have been employed.
    - A new machine learning algorithm based on Kernel Methods for the estimation of scalar outputs from time-series inputs.
    - Run-to-Run control algorithms that exploit both physical measurements and statistical ones (coming from a VM system); this tool is based on IT elements.
    - A PdM module based on filtering and prediction techniques (Kalman filter, Monte Carlo methods), developed for the prediction of maintenance interventions in the Epitaxy process.
    - A PdM system based on Elastic Nets for maintenance prediction in an Ion Implantation tool.
    Several of the aforementioned works have been developed in collaboration with major European semiconductor companies within the European project UE FP7 IMPROVE (Implementing Manufacturing science solutions to increase equiPment pROductiVity and fab pErformance); these collaborations are specified throughout the thesis, underlining the practical aspects of implementing the proposed technologies in a real industrial environment.
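    As a rough sketch of the Virtual Metrology idea described above (synthetic data only; variable names, sizes, and hyper-parameters are assumptions, not the thesis setup), a sparse regression such as Least Angle Regression can be fit from FDC-style process variables to a wafer quantity such as deposited thickness:

        import numpy as np
        from sklearn.linear_model import Lars
        from sklearn.metrics import mean_absolute_error
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(42)
        n_wafers, n_fdc_vars = 500, 40
        X = rng.standard_normal((n_wafers, n_fdc_vars))           # FDC/process summary statistics
        true_coef = np.zeros(n_fdc_vars)
        true_coef[:5] = [2.0, -1.5, 1.0, 0.5, -0.25]              # only a few variables matter
        y = X @ true_coef + 0.1 * rng.standard_normal(n_wafers)   # synthetic "measured" thickness

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        vm_model = Lars(n_nonzero_coefs=10).fit(X_tr, y_tr)       # sparse VM-style model
        print("selected FDC variables:", np.flatnonzero(vm_model.coef_))
        print("MAE on held-out wafers:", mean_absolute_error(y_te, vm_model.predict(X_te)))

    The same skeleton applies to the PdM modules, with the regression target replaced by a remaining-useful-life or time-to-maintenance quantity and the estimator swapped for, e.g., an Elastic Net.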

    Enabling Internet-Scale Publish/Subscribe In Overlay Networks

    As the amount of data in today's Internet grows larger, users are exposed to too much information, which becomes increasingly difficult to comprehend. Publish/subscribe systems alleviate this problem by providing loosely coupled communication between producers and consumers of data in a network. Data consumers, i.e., subscribers, are provided with a subscription mechanism to express their interest in a subset of the data, so that they are notified only when data matching their subscription is generated by the producers, i.e., publishers. Most publish/subscribe systems today are based on the client/server architectural model. However, to provide the publish/subscribe service at large scale, companies either have to invest huge amounts of money in over-provisioning resources or are prone to frequent service failures.

    Peer-to-peer overlay networks are attractive alternative solutions for building Internet-scale publish/subscribe systems. However, scalability comes at a cost: a published message often needs to traverse a large number of uninterested (unsubscribed) nodes before reaching all its subscribers. We refer to this undesirable traffic as relay overhead. Without careful consideration, the relay overhead can sharply increase resource consumption at the relay nodes (in terms of bandwidth, CPU, etc.) and can ultimately lead to rapid deterioration of the system's performance once relay nodes start dropping messages or choose to abandon the system permanently. To mitigate this problem, some solutions use an unbounded number of connections per node, while others limit the expressiveness of the subscription scheme.

    In this thesis work, we introduce two systems called Vitis and Vinifera, for the topic-based and content-based publish/subscribe models, respectively. Both systems are gossip-based and significantly decrease the relay overhead. We utilize novel techniques to cluster together nodes that exhibit similar subscriptions. In the topic-based model, distinct clusters are constructed for each topic, while clusters in the content-based model are fuzzy and do not have explicit boundaries. We augment these clustered overlays with links that facilitate routing in the network, constructing a hybrid system by injecting structure into an otherwise unstructured network. The resulting structures resemble navigable small-world networks, which span clusters of nodes with similar subscriptions. The properties of such overlays make them an ideal platform for efficient data dissemination in large-scale systems. The systems require only a bounded node degree and, as we show through simulations, they scale well with the number of nodes and subscriptions and remain efficient under highly complex subscription patterns, high publication rates, and even in the presence of network failures. We also compare both systems against state-of-the-art publish/subscribe systems. Our measurements show that both Vitis and Vinifera significantly outperform their counterparts in various subscription and churn scenarios, under both synthetic workloads and real-world traces.
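    The subscription-clustering idea can be sketched in a few lines (this is only an illustration of similarity-based neighbor selection with a bounded node degree, not the Vitis or Vinifera protocol; all names and parameters are assumptions): in gossip-like rounds, each node keeps the fixed number of peers whose topic subscriptions overlap most with its own, so nodes with similar interests end up clustered together.

        import random

        def jaccard(a, b):
            """Similarity between two sets of subscribed topics."""
            union = a | b
            return len(a & b) / len(union) if union else 0.0

        def build_overlay(subs, degree=4, rounds=30, sample=8, seed=1):
            """subs: dict node_id -> set of topics. Returns node_id -> bounded neighbor set."""
            rng = random.Random(seed)
            nodes = list(subs)
            # start from random neighbors, as an unstructured overlay would
            neighbors = {n: set(rng.sample([m for m in nodes if m != n], degree)) for n in nodes}
            for _ in range(rounds):
                for n in nodes:
                    # gossip step: consider a random sample plus current neighbors,
                    # then keep only the `degree` most similar peers (bounded degree)
                    candidates = set(rng.sample(nodes, sample)) | neighbors[n]
                    candidates.discard(n)
                    ranked = sorted(candidates, key=lambda m: jaccard(subs[n], subs[m]), reverse=True)
                    neighbors[n] = set(ranked[:degree])
            return neighbors

        # toy workload: 30 nodes split over 3 interest groups
        topics = [{"sports", "news"}, {"music", "movies"}, {"iot", "cloud"}]
        subs = {i: set(topics[i % 3]) for i in range(30)}
        overlay = build_overlay(subs)
        intra = sum(1 for n, ns in overlay.items() for m in ns if n % 3 == m % 3)
        total = sum(len(ns) for ns in overlay.values())
        print(f"{intra} of {total} overlay links connect nodes with identical subscriptions")

    On this toy workload the overlay quickly converges to links that stay almost entirely within each interest group, which is the property that keeps relay overhead low when messages are disseminated along the overlay.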