130 research outputs found

    Empirically Measuring Transfer Distance for System Design and Operation

    Classical machine learning approaches are sensitive to non-stationarity. Transfer learning can address non-stationarity by sharing knowledge from one system to another; however, in areas like machine prognostics and defense, data is fundamentally limited, so transfer learning algorithms have few, if any, examples from which to learn. Herein, we suggest that these constraints on algorithmic learning can be addressed by systems engineering. We formally define transfer distance in general terms and demonstrate its use in empirically quantifying the transferability of models. We consider the use of transfer distance in the design of machine rebuild procedures to allow for transferable prognostic models. We also consider the use of transfer distance in predicting operational performance in computer vision. Practitioners can use the presented methodology to design and operate systems with consideration for the learning-theoretic challenges faced by component learning systems.
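    The abstract does not fix a particular metric, so as a purely illustrative sketch, one way to empirically estimate a transfer distance between two systems is the average per-feature Wasserstein distance between source and target observations; the function name and metric choice below are assumptions, not the paper's definition.

```python
# Hypothetical sketch: estimate a "transfer distance" between a source and a
# target system as the mean 1-D Wasserstein distance over matched feature
# marginals. This is an illustrative proxy, not the paper's formal definition.
import numpy as np
from scipy.stats import wasserstein_distance

def transfer_distance(source_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Average per-feature Wasserstein distance between two systems' data.

    source_feats, target_feats: arrays of shape (n_samples, n_features).
    """
    dists = [
        wasserstein_distance(source_feats[:, j], target_feats[:, j])
        for j in range(source_feats.shape[1])
    ]
    return float(np.mean(dists))

# Example: a rebuilt machine whose sensor distributions drifted slightly.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(500, 4))
tgt = rng.normal(0.3, 1.2, size=(500, 4))   # shifted/rescaled after rebuild
print(f"estimated transfer distance: {transfer_distance(src, tgt):.3f}")
```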

    IIMA 2018 Proceedings


    Copula-based Multimodal Data Fusion for Inference with Dependent Observations

    Fusing heterogeneous data from multiple modalities for inference problems has been an attractive and important topic in recent years. Multimodal fusion poses several challenges, such as data heterogeneity and data correlation. In this dissertation, we investigate inference problems with heterogeneous modalities by taking into account nonlinear cross-modal dependence, applying copula-based methodology to characterize this dependence. In distributed detection, the goal often is to minimize the probability of detection error at the fusion center (FC) based on a fixed number of observations collected by the sensors. We design optimal detection algorithms at the FC using a regular vine copula based fusion rule. The regular vine copula is an extremely flexible and powerful graphical model for characterizing complex dependence among multiple modalities. The proposed approaches are theoretically justified and are computationally efficient for sensor networks with a large number of sensors. With heterogeneous streaming data, the fusion methods applied for processing data streams should be fast enough to keep up with the high arrival rates of incoming data while providing highly accurate solutions to inference problems (detection, classification, or estimation). We propose a novel parallel platform, C-Storm (Copula-based Storm), that marries copula-based dependence modeling, for highly accurate inference, with Storm, a highly regarded parallel computing platform, for fast stream data processing; the efficacy of C-Storm is demonstrated. We consider not only decision-level fusion but also fusion of heterogeneous high-level features: we investigate a supervised classification problem by fusing dependent high-level features extracted from multiple deep neural network (DNN) classifiers, employing a regular vine copula to fuse these features, and we demonstrate the efficacy of combining model-based methods with deep learning. Besides fixed-sample-size (FSS) inference problems, we study a distributed sequential detection problem with random sample size. The aim of distributed sequential detection in a non-Bayesian framework is to minimize the average detection time while satisfying pre-specified constraints on the probabilities of false alarm and missed detection. We design local memoryless truncated sequential tests and propose a copula-based sequential test at the FC. We show that, by suitably designing the local thresholds and the truncation window, the local probabilities of false alarm and missed detection of the proposed local decision rules satisfy the pre-specified error probabilities; we also show the asymptotic optimality and time efficiency of the proposed distributed sequential scheme. Finally, in large-scale sensor networks, we consider a collaborative distributed estimation problem with statistically dependent sensor observations and no FC. To achieve greater sensor transmission and estimation efficiencies, we propose a two-step cluster-based collaborative distributed estimation scheme: in the first step, sensors form dependence-driven clusters such that sensors in the same cluster are dependent while sensors from different clusters are independent, and they perform copula-based maximum a posteriori probability (MAP) estimation via intra-cluster collaboration; in the second step, the estimates generated in the first step are shared via inter-cluster collaboration to reach an average consensus. The efficacy of the proposed scheme is justified.
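    For readers unfamiliar with copula-based fusion, the toy sketch below shows the basic mechanism on a two-sensor problem: a bivariate Gaussian copula (far simpler than the regular vine copulas used in the dissertation) couples the sensors' marginal likelihoods, and the FC forms a copula-corrected likelihood ratio. The marginals, correlation values, and function names are illustrative assumptions.

```python
# Minimal sketch (not the dissertation's regular-vine implementation): a
# Gaussian copula captures dependence between two sensors, and the fusion
# center computes a copula-corrected log-likelihood ratio for detection.
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula_density(u: np.ndarray, rho: float) -> float:
    """Density of a bivariate Gaussian copula at u = (u1, u2), u_i in (0, 1)."""
    z = norm.ppf(u)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    joint = multivariate_normal(mean=[0.0, 0.0], cov=cov).pdf(z)
    return joint / (norm.pdf(z[0]) * norm.pdf(z[1]))

def fused_log_likelihood_ratio(x, rho1=0.6, rho0=0.1):
    """Copula-corrected log-LR at the FC for a pair of observations x.

    Toy assumption: H1: x_i ~ N(1, 1) with copula correlation rho1;
                    H0: x_i ~ N(0, 1) with copula correlation rho0.
    """
    l1 = norm.logpdf(x, loc=1.0).sum() + np.log(
        gaussian_copula_density(norm.cdf(x, loc=1.0), rho1))
    l0 = norm.logpdf(x, loc=0.0).sum() + np.log(
        gaussian_copula_density(norm.cdf(x, loc=0.0), rho0))
    return l1 - l0

print(fused_log_likelihood_ratio(np.array([0.8, 1.1])))  # > 0 favours H1
```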

    Admission Control Optimisation for QoS and QoE Enhancement in Future Networks

    Recent exponential growth in demand for heterogeneous traffic support and in the number of associated devices has considerably increased demand for network resources and induced numerous challenges for networks, such as bottleneck congestion and inefficient admission control and resource allocation. Such challenges degrade network Quality of Service (QoS) and user-perceived Quality of Experience (QoE). This work studies admission control from various perspectives. Two novel single-objective optimisation-based admission control models, Dynamic Slice Allocation and Admission Control (DSAAC) and Signalling and Admission Control (SAC), are presented to enhance the Grade of Service (GoS) of future limited-capacity networks and to optimise control signalling, respectively. DSAAC is an integrated model whereby a cost-estimation function based on user demand and network capacity quantifies resource allocation among users. Moreover, to maximise resource utility, adjustable minimum and maximum slice resource bounds are derived. For cases where users are blocked from the primary slice due to congestion or resource scarcity, a set of optimisation algorithms for inter-slice admission control and resource allocation, exploiting the adaptability of slice elasticity, is proposed. The novel SAC model uses an unsupervised learning technique (ranking-based clustering) to cluster users by their homogeneous demand characteristics and thereby minimise signalling redundancy in the access network; reducing redundant signalling relieves the network of unnecessary resource utilisation and computational time. Moreover, dynamically reconfigurable QoE-based slice performance bounds are derived in the SAC model from multiple demand characteristics for clustered user admission to the optimal network. A set of optimisation algorithms is also proposed to attain efficient slice allocation and enhance users' QoE by assessing the capability of slice QoE elasticity. An enhancement of the SAC model is proposed through a novel multi-objective optimisation model named Edge Redundancy Minimisation and Admission Control (E-RMAC), which for the first time considers the issue of redundant signalling between the edge and core networks: it minimises redundant signalling using two classical unsupervised learning algorithms, K-means and ranking-based clustering, and maximises the efficiency of the link (bandwidth resources) between the edge and core networks. For multi-operator environments such as Open-RAN, a novel Forecasting and Admission Control (FAC) model for tenant-aware network selection and configuration is proposed. The model features a dynamic demand-estimation scheme embedded with fuzzy-logic-based optimisation for optimal network selection and admission control. FAC for the first time considers the coexistence of various heterogeneous cellular technologies (2G, 3G, 4G, and 5G) and their integration to enhance overall network throughput through efficient resource allocation and utilisation within a multi-operator environment. A QoS/QoE-based service monitoring feature is also presented to update the demand estimates with the support of a forecasting modifier; this feature helps allocate resources to tenants approximately in line with their actual demand, improving tenant-acquired QoE and overall network performance.
Finally, a novel dynamic admission control model named Slice Congestion and Admission Control (SCAC) is presented in this thesis. SCAC employs machine learning (unsupervised, reinforcement, and transfer learning) and multi-objective optimisation techniques (the Non-dominated Sorting Genetic Algorithm II) to minimise bottleneck and intra-slice congestion. Knowledge transfer among requests, in the form of coefficients, is employed for the first time for optimal queuing of slice requests. A unified cost-estimation function is also derived in this model for slice selection, to ensure fairness among slice request admissions. In view of instantaneous network circumstances and load, a reinforcement learning-based admission control policy is established for taking appropriate action on guaranteed, soft, and best-effort slice request admissions. Intra-slice and inter-slice resource allocation, along with the adaptability of slice elasticity, are also proposed to maximise the slice acceptance ratio and resource utilisation. Extensive simulation results are obtained and compared with similar models in the literature. The proposed E-RMAC model is 35% better at reducing redundant signalling between the edge and core networks than recent work, and it reduces complexity from O(U) to O(R) for service signalling and O(N) for resource signalling, a significant saving in uplink control-plane signalling and link capacity compared to results in the existing literature. Similarly, the SCAC model reduces bottleneck congestion by approximately 56% over the entire load compared to ground truth and increases the slice acceptance ratio. Inter-slice admission and resource allocation offer admission gains of 25% and 51% over cooperative slice-based and intra-slice-based admission control and resource allocation, respectively. Detailed analysis of the results suggests that the proposed models can efficiently manage future heterogeneous traffic flows in terms of enhanced throughput, maximal network resource utilisation, better admission gain, and congestion control.
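    To make the signalling-reduction idea concrete, here is a minimal sketch in the spirit of the clustering step used by SAC and E-RMAC: users with homogeneous demand characteristics are grouped (here with K-means, one of the two algorithms E-RMAC employs) so that one aggregate admission message per cluster replaces per-user signalling. The demand features, counts, and thresholds are invented for illustration.

```python
# Sketch: cluster users by demand characteristics so that one aggregate
# signalling message per cluster replaces one message per user, mirroring
# the O(U) -> O(R) reduction described above (R = number of clusters).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical per-user demand vectors: [bandwidth (Mbps), latency budget (ms)]
demands = np.vstack([
    rng.normal([50, 10], [5, 2], size=(40, 2)),     # e.g. video users
    rng.normal([1, 100], [0.3, 15], size=(60, 2)),  # e.g. IoT telemetry users
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(demands)
for c in range(kmeans.n_clusters):
    members = demands[kmeans.labels_ == c]
    # One signalling message per cluster, sized for the cluster's peak demand.
    print(f"cluster {c}: {len(members)} users -> 1 aggregate request, "
          f"peak bandwidth {members[:, 0].max():.1f} Mbps")

# Signalling messages drop from len(demands) = 100 to kmeans.n_clusters = 2.
```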

    Towards Automated Machine Learning on Imperfect Data for Situational Awareness in Power System

    The increasing penetration of renewable energy sources (such as solar and wind) and the incoming widespread adoption of electric vehicle charging introduce new challenges in the power system. Due to the variability and uncertainty of these sources, reliable and cost-effective operation of the power system relies on a high level of situational awareness. Thanks to the wide deployment of sensors (e.g., phasor measurement units (PMUs) and smart meters) and emerging smart Internet of Things (IoT) sensing devices in the electric grid, large amounts of data are being collected, providing golden opportunities to achieve a high level of situational awareness for reliable and cost-effective grid operations. To better utilize these data, this dissertation develops Machine Learning (ML) methods and provides fundamental understanding and systematic exploitation of ML for situational awareness using large amounts of imperfect data collected in power systems, in order to improve the reliability and resilience of power systems. However, building excellent ML models needs clean, accurate, and sufficient training data, and the data collected from real-world power systems are of low quality. For example, data collected from wind farms contain a mixture of ramp and non-ramp events as well as mingled heterogeneous dynamics, and data in the transmission grid contain noise, missing values, insufficient samples, and inaccurate timestamps. Employing ML without considering these distinct features of real-world applications cannot yield good models. This dissertation addresses these challenges in two applications, wind generation forecasting and power system event classification, by developing ML models in an automated way with less effort from domain experts, as the cost of having experts process such large amounts of imperfect data can be prohibitive in practice. First, we take heterogeneous dynamics into consideration, especially for ramp events. A Drifting Streaming Peaks-over-Threshold (DSPOT)-enhanced, self-evolving neural network-based short-term wind farm generation forecast is proposed, utilizing dynamic ramp thresholds to separate ramp from non-ramp events, based on which different neural networks are trained to learn the different dynamics of wind farm generation. As the efficacy of the neural networks relies on the quality of the training datasets (i.e., the classification accuracy of ramp and non-ramp events), a Bayesian optimization-based approach is developed to optimize the parameters of DSPOT, enhancing the quality of the training datasets and the corresponding performance of the neural networks. Experimental results show that, compared with other forecast approaches, the proposed approach substantially improves forecast accuracy, especially for ramp events. Next, we address the challenges of event classification due to low-quality PMU measurements and event logs. A novel machine learning framework is proposed for robust event classification, consisting of three main steps: data preprocessing, fine-grained event data extraction, and feature engineering.
Specifically, the data preprocessing step addresses the data quality issues of PMU measurements (e.g., bad data and missing data); in the fine-grained event data extraction step, a model-free event detection method is developed to accurately localize events despite the inaccurate event timestamps in the event logs; and the feature engineering step constructs event features based on the patterns of different event types, in order to improve the performance and interpretability of the event classifiers. Moreover, with a small number of good features, much less training data is needed to train a good event classifier, which addresses the challenge of insufficient and imbalanced training data, and the training time is negligible compared to neural network-based approaches. Based on the proposed framework, we developed a workflow for event classification using real-world PMU data streaming into the system in real time. Under the proposed framework, robust event classifiers can be efficiently trained from many off-the-shelf lightweight machine learning models. Numerical experiments using a real-world dataset from the Western Interconnection of the U.S. power transmission grid show that event classifiers trained under the proposed framework achieve high classification accuracy while being robust against low-quality data. Subsequently, we address the challenge of insufficient training labels. Real-world PMU data are often incomplete and noisy, which can significantly reduce the efficacy of existing machine learning techniques that require high-quality labeled training data. Obtaining high-quality event logs for large amounts of PMU measurements requires significant effort from domain experts to maintain the logs and even hand-label events, which can be prohibitively costly in practice. We therefore develop a weakly supervised machine learning approach that can learn a good event classifier using only a few labeled PMU data points. The key idea is to learn labels for unlabeled data using a probabilistic generative model, in order to improve the training of the event classifiers. Experimental results show that even with 95% of the data unlabeled, the proposed method still achieves an average accuracy of 78.4%. This provides a promising way for domain experts to maintain event logs in a less expensive and automated manner. Finally, we conclude the dissertation and discuss future directions.
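    As a rough illustration of the ramp/non-ramp split, the sketch below uses a drifting rolling-quantile threshold on generation changes in place of the actual DSPOT procedure (which fits an extreme-value tail to streaming peaks); the data, window sizes, and quantile are synthetic assumptions.

```python
# Simplified stand-in for the DSPOT thresholding step: ramp events are
# separated from non-ramp samples by comparing the change in wind-farm
# generation against a drifting threshold. Real DSPOT fits a generalized
# Pareto tail; a rolling high quantile is used here purely for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
power = pd.Series(np.clip(np.cumsum(rng.normal(0, 1.5, 1000)), 0, None))
delta = power.diff().abs().fillna(0.0)

# Drifting threshold: 95th percentile of |delta| over a sliding window.
threshold = delta.rolling(window=200, min_periods=50).quantile(0.95)
is_ramp = delta > threshold

ramp_train = power[is_ramp]      # would train the ramp-dynamics network
normal_train = power[~is_ramp]   # would train the non-ramp network
print(f"{is_ramp.sum()} ramp samples, {(~is_ramp).sum()} non-ramp samples")
```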

    Anomaly detection on data streams from vehicular networks

    Vehicular networks are characterized by high-mobility nodes that are only active when the vehicle is moving, making the network unpredictable and in constant change. In such a dynamic scenario, detecting anomalies in the network is a challenging but crucial task. Veniam operates a vehicular network that ensures reliable connectivity through heterogeneous networks such as LTE, Wi-Fi, and DSRC, connecting vehicles to the Internet and to other devices spread throughout the city. Over time, nodes send data to the Cloud either by real-time technologies or by delay-tolerant ones, increasing the network's dynamics. The aim of this dissertation is to propose and implement a method for detecting anomalies in a real-world vehicular network through an online analysis of the data streams that flow from the vehicles to the Cloud. The network's streams were explored in order to characterize the available data and select target use cases. The chosen datasets were submitted to different anomaly detection techniques, such as time series forecasting and density-based outlier detection, followed by an analysis of the trade-offs to select the algorithms that best modeled the data characteristics. The proposed solution comprises two stages: a lightweight screening step, followed by a Nearest Neighbor classification. The developed system was implemented on Veniam's distributed cluster running Apache Spark, yielding a fast and scalable solution that classifies the data as soon as it reaches the Cloud. The performance of the method was evaluated by its precision, i.e., the percentage of true anomalies among the detected outliers, when it was submitted to datasets containing artificial anomalies from different data sources, received either by real-time or delay-tolerant technologies.
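    A minimal sketch of the two-stage idea, assuming a cheap z-score screen followed by a k-nearest-neighbour check on a sliding history; the window sizes, thresholds, and synthetic stream are illustrative, and the production system runs distributed on Apache Spark rather than in a single process.

```python
# Two-stage stream anomaly detector: stage 1 screens candidates cheaply via
# z-score; only candidates pay for the stage-2 nearest-neighbour distance check.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def detect(stream: np.ndarray, window=200, z_cut=3.0, k=5, knn_cut=2.5):
    history = list(stream[:window])
    anomalies = []
    for i, x in enumerate(stream[window:], start=window):
        mu, sigma = np.mean(history), np.std(history) + 1e-9
        if abs(x - mu) / sigma > z_cut:                    # stage 1: screening
            nn = NearestNeighbors(n_neighbors=k).fit(
                np.asarray(history).reshape(-1, 1))
            dist, _ = nn.kneighbors([[x]])                 # stage 2: k-NN check
            if dist.mean() > knn_cut * sigma:
                anomalies.append(i)
                continue                                   # keep history clean
        history.pop(0)
        history.append(x)
    return anomalies

rng = np.random.default_rng(3)
data = rng.normal(0, 1, 1000)
data[500] = 12.0                       # injected artificial anomaly
print(detect(data))                    # flags the injected point at index 500
```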

    Towards Name Disambiguation: Relational, Streaming, and Privacy-Preserving Text Data

    In the real world, our DNA is unique, but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple people who are namesakes of one another. Such mistakes deteriorate the performance of document retrieval and web search and, more seriously, cause improper attribution of credit or blame in digital forensics. To resolve this issue, the name disambiguation task is designed to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique real-life person. Existing algorithms for this task mainly suffer from the following drawbacks. First, the majority of existing solutions rely substantially on feature engineering, such as biographical feature extraction or construction of auxiliary features from Wikipedia; for many scenarios, however, such features may be costly to obtain or unavailable in privacy-sensitive domains. Instead, we solve the name disambiguation task in a restricted setting by leveraging only relational data in the form of anonymized graphs. Second, most existing works for this task operate in a batch mode, where all records to be disambiguated are initially available to the algorithm; more realistic settings require that name disambiguation be performed in an online streaming fashion in order to identify records of new ambiguous entities having no preexisting records. Finally, we investigate the potential disclosure risk of textual features used in name disambiguation and propose several algorithms to tackle the task in a privacy-aware scenario. In summary, this dissertation presents a number of novel approaches that address name disambiguation from the above three aspects independently, namely relational, streaming, and privacy-preserving textual data.
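    To give a feel for the restricted relational setting, the sketch below partitions the documents of one ambiguous name using only anonymized neighbour IDs (e.g., co-author node IDs), clustering documents by the Jaccard distance between their neighbour sets. The clustering algorithm, threshold, and data are illustrative assumptions, not the dissertation's method.

```python
# Toy relational name disambiguation: documents sharing an ambiguous name are
# clustered by the overlap of the anonymized graph neighbours they link to.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical documents for one ambiguous name; values are anonymized IDs.
docs = [{101, 102}, {101, 103}, {102, 103},   # likely person A
        {205, 209}, {209, 214}]               # likely person B

def jaccard_distance(a: set, b: set) -> float:
    return 1.0 - len(a & b) / len(a | b)

n = len(docs)
dist = np.array([[jaccard_distance(docs[i], docs[j]) for j in range(n)]
                 for i in range(n)])

labels = AgglomerativeClustering(
    n_clusters=None, metric="precomputed", linkage="average",
    distance_threshold=0.9).fit_predict(dist)
print(labels)  # documents sharing a label are attributed to one person
```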

    Intelligent computing in electrical utility industry 4.0 : concept, key technologies, applications and future directions

    Industry 4.0 (I-4.0) refers to the ‘fourth industrial revolution’, the incorporation of artificial intelligence into, and the digitalization of, industrial systems. It is closely associated with the development and advancement of evolving technologies such as the Internet of Things, Cyber-Physical Systems, Information and Communications Technology, Enterprise Architecture, and Enterprise Integration. Today's power systems face several challenges that need to be addressed, and the application of these technologies can make modern power systems more effective, reliable, secure, and cost-effective. Therefore, a comprehensive analysis of I-4.0 is performed in this paper, and a summary of its outcomes, future scope, and real-world application to the electrical utility industry (EUI) is reported by reviewing the existing literature. This report will be helpful to investigators interested in the area of I-4.0 and its application in the EUI.