
    Autonomous and reliable operation of multilayer optical networks

    This Ph.D. thesis focuses on the reliable autonomous operation of multilayer optical networks. The first objective addresses the reliability of the optical network and proposes methods for health analysis related to Quality of Transmission (QoT) degradation. Such degradation is produced by soft-failures in optical devices and fibers in the core and metro segments of operators’ transport networks. Here, we compare the QoT estimated by a QoT tool based on GNPy with the QoT measured at the optical transponder. We show that changes in the values of the input parameters of the QoT model representing optical devices can explain the deviations and performance degradation of those devices. We use reverse engineering to estimate the values of the parameters that explain the observed QoT. We show by simulation that soft-failure degradation can be detected, localized, and identified well before it affects the network. Finally, to validate our approach, we experimentally verify the high accuracy of the estimated modeling parameters. The second objective focuses on multilayer optical networks, where lightpaths connect packet nodes, thus creating virtual links (vLinks). Specifically, we study how lightpaths can be managed to provide enough capacity to the packet layer without detrimental effects on its Quality of Service (QoS), such as added delay or packet loss, while at the same time minimizing energy consumption. Such management must be as autonomous as possible to minimize human intervention. We study the autonomous operation of optical connections based on digital subcarrier multiplexing (DSCM) and propose several solutions for it. In particular, two modules, running in the optical node and in the optical transponder, jointly activate and deactivate subcarriers to adapt the capacity of the optical connection to the upper-layer packet traffic.
The module running in the optical node is part of our Intent-Based Networking (IBN) solution and implements prediction to anticipate traffic changes. Our comprehensive study demonstrates the feasibility of autonomous DSCM operation and shows large cost savings in terms of energy consumption. In addition, our study provides guidelines to help vendors and operators adopt the proposed solutions. The final objective targets the automation of packet-layer connections (PkC). Automating the capacity required by PkCs can bring further cost reductions to network operators, as it can limit the resources used at the optical layer. However, such automation requires careful design to avoid any QoS degradation, which would impact Service Level Agreements (SLA) when the packet flow is related to a customer connection. We study autonomous packet-flow capacity management. We apply Reinforcement Learning (RL) techniques and propose a management lifecycle consisting of three phases: 1) a self-tuned threshold-based approach that operates the connection until enough data has been collected to understand the traffic characteristics; 2) RL operation based on models pre-trained with generic traffic profiles; and 3) RL operation based on models trained with the observed traffic. We show that RL algorithms provide poor performance until they learn optimal policies, as well as when the traffic characteristics change over time. The proposed lifecycle provides remarkable performance from the start of the connection and shows robustness in the face of traffic changes. The contribution is twofold: 1) we propose an RL-based solution that shows superior performance with respect to the prediction-based solution; and 2) because vLinks support packet connections, we propose coordination between the intents of both layers, where the actions taken by the individual PkCs are used by the vLink intent.
The results show noticeable performance gains compared to independent vLink operation.
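    The first phase of the proposed lifecycle can be illustrated with a short sketch. This is a hypothetical, simplified stand-in for the self-tuned threshold-based approach described above; the function name, margin, and capacity step are illustrative assumptions, not the thesis implementation.

```python
# Hypothetical sketch of lifecycle phase 1: a self-tuned threshold
# controller that adapts the capacity of a packet connection while
# traffic data is still being collected. Margin and step are illustrative.

def tune_capacity(traffic_history, current_capacity, margin=0.2, step=10):
    """Return the capacity (in Gb/s) to allocate for the next interval.

    Capacity is raised when the recent traffic peak approaches the current
    allocation (risk of QoS degradation) and lowered when a smaller
    allocation would still keep a safety margin above the peak.
    """
    peak = max(traffic_history[-12:])          # recent observed peak
    upper = current_capacity * (1 - margin)    # congestion threshold
    if peak > upper:
        return current_capacity + step         # allocate more capacity
    if peak < (current_capacity - step) * (1 - margin):
        return current_capacity - step         # release unused capacity
    return current_capacity

# Rising traffic triggers a capacity increase; idle traffic releases it.
print(tune_capacity([40, 45, 52, 70, 78, 85], 100))  # 85 > 80 -> 110
print(tune_capacity([40, 38, 35], 100))              # 40 < 72 -> 90
```

    Once enough samples have been gathered by such a controller, the lifecycle would hand operation over to the pre-trained and then the traffic-specific RL models.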

    Monitoring and Data Analytics for Optical Networking: Benefits, Architectures, and Use Cases

    Operators' network management continuously measures network health by collecting data from the deployed network devices; the data are used mainly for performance reporting and for diagnosing network problems after failures, as well as by human capacity planners to predict future traffic growth. These network management tools are typically reactive and require significant human effort and skill to operate effectively. As optical networks evolve to fulfil highly flexible connectivity and dynamicity requirements and to support ultra-low-latency services, they must also provide reliable connectivity and increased network resource efficiency. Reactive, human-based network measurement and management will therefore be a limiting factor in the size and scale of these new networks. Future optical networks must support fully automated management that provides dynamic resource re-optimization to rapidly adapt network resources based on predicted conditions and events; identifies service degradation that will eventually impact connectivity and highlights critical devices and links for further inspection; and augments rapid protection schemes if a failure is predicted or detected, facilitating resource optimization after restoration events. Applying automation techniques to network management requires not only the collection of data from a variety of sources at various frequencies, but also the capability to extract knowledge and derive insight for performance monitoring, troubleshooting, and maintaining network service continuity. Innovative analytics algorithms must be developed to derive meaningful input for the entities that orchestrate and control network resources; these control elements must also be capable of proactively programming the underlying optical infrastructure.
In this article, we review the emerging requirements for optical network management automation, the capabilities of current optical systems, and the development and standardization status of data models and protocols that facilitate automated network monitoring. Finally, we propose an architecture providing Monitoring and Data Analytics (MDA) capabilities, present illustrative control loops for advanced network monitoring use cases, and report findings that validate the usefulness of MDA for automated optical network management.
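    The collect-analyze-act pattern behind such control loops can be sketched compactly. The interfaces and thresholds below are assumptions for illustration, not the architecture proposed in the article: monitoring samples enter an analytics stage, and a decision is handed to the controlling entity.

```python
# Illustrative MDA-style control loop: monitored data is analyzed and a
# recommended action is produced for the network controller. All names
# and the BER threshold are hypothetical.

def analyze(ber_samples, degr_threshold=1e-4):
    """Analytics stage: flag a lightpath whose mean pre-FEC BER degrades."""
    mean_ber = sum(ber_samples) / len(ber_samples)
    return "degraded" if mean_ber > degr_threshold else "healthy"

def control_loop(monitored):
    """monitored: dict mapping lightpath id -> list of pre-FEC BER samples."""
    decisions = {}
    for lp, samples in monitored.items():
        state = analyze(samples)
        # A real MDA system would notify the SDN controller here; the
        # sketch only records the recommended action per lightpath.
        decisions[lp] = "reroute" if state == "degraded" else "keep"
    return decisions

print(control_loop({"lp1": [1e-5, 2e-5], "lp2": [2e-4, 3e-4]}))
# {'lp1': 'keep', 'lp2': 'reroute'}
```

    In a deployed loop the "reroute" decision would instead be a proactive re-optimization or protection action programmed into the optical infrastructure.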

    Soft-failure localization and time-dependent degradation detection for network diagnosis

    © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. In optical networks, degradation of the Quality of Transmission (QoT) can be the outcome of soft-failures in optical devices, like optical transponders, Wavelength Selective Switches (WSS), and Optical Amplifiers (OA). In this paper, we assume time-dependent degradations in ROADMs and OAs. Specifically, several degradations are considered: i) the noise figure can increase linearly over time due to the aging of components; ii) the maximum optical output power of the amplifiers can decrease because of degradation in the pump lasers of the EDFAs; iii) aging effects can appear, e.g., due to fiber splices; and iv) the OSNR can vary because of the frequency drift of WSSs caused by temperature variations. Our proposal for degradation detection and soft-failure localization includes algorithms that detect and localize the degradation in its early stages and facilitate network diagnosis. In addition, we propose an architecture where the control plane consists of a network controller, a Monitoring and Data Analytics system, and a QoT tool based on GNPy, all interconnected with each other. The research leading to these results has received funding from the Spanish MINECO TWINS project (TEC2017-90097-R) and from the Catalan Institution for Research and Advanced Studies (ICREA).
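    The linear noise-figure drift in case i) lends itself to a simple trend-based detector. The sketch below is an assumption-laden illustration, not the paper's algorithm: it fits an ordinary least-squares slope to a monitored parameter and raises an early alarm when the slope exceeds a tolerance.

```python
# Hedged sketch of time-dependent degradation detection: fit a linear
# trend to a monitored parameter (e.g., an EDFA noise figure in dB) and
# flag the device when the slope indicates aging. Threshold is illustrative.

def slope(xs, ys):
    """Ordinary least-squares slope of ys over xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def degradation_alarm(days, noise_figure_db, max_slope=0.01):
    """Alarm early if the noise figure grows faster than max_slope dB/day."""
    return slope(days, noise_figure_db) > max_slope

aging = [5.0 + 0.02 * d for d in range(10)]       # 0.02 dB/day drift
print(degradation_alarm(list(range(10)), aging))  # True
```

    Detecting the slope this way anticipates the failure long before the absolute noise figure crosses a level that would affect established lightpaths, which is the spirit of the early-stage detection claimed above.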

    Performance studies of evolutionary transfer learning for end-to-end QoT estimation in multi-domain optical networks [Invited]

    This paper proposes an evolutionary transfer learning approach (Evol-TL) for scalable quality-of-transmission (QoT) estimation in multi-domain elastic optical networks (MD-EONs). Evol-TL exploits a broker-based MD-EON architecture that enables cooperative learning between the broker-plane (end-to-end) and domain-level (local) machine learning functions while securing the autonomy of each domain. We designed a genetic algorithm to optimize the neural network architectures and the sets of weights to be transferred between the source and destination tasks. We evaluated the performance of Evol-TL with three case studies considering the QoT estimation task for lightpaths with (i) different path lengths (in terms of the number of fiber links traversed), (ii) different modulation formats, and (iii) different device conditions (emulated by introducing different levels of wavelength-specific attenuation to the amplifiers). The results show that the proposed approach can reduce the average amount of required training data by up to 13× while achieving an estimation accuracy above 95%.
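    The genetic search over what to transfer can be illustrated with a toy sketch. This is not the Evol-TL implementation: the fitness function below is synthetic (in Evol-TL it would be the QoT estimation accuracy on the destination task), and the population sizes and rates are arbitrary assumptions.

```python
import random

# Toy sketch of the evolutionary transfer-learning idea: a genetic
# algorithm searches over binary masks deciding which source-model layers
# to transfer to the destination task. Fitness here is synthetic.

def fitness(mask):
    # Synthetic stand-in: transferring early layers helps, late layers hurt.
    weights = [0.3, 0.25, 0.2, -0.1, -0.2]
    return sum(w for w, m in zip(weights, mask) if m)

def evolve(n_layers=5, pop_size=8, generations=20, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_layers)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_layers)
            child = a[:cut] + b[cut:]             # one-point crossover
            i = rng.randrange(n_layers)
            child[i] ^= rng.random() < 0.1        # occasional bit flip
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

    With the synthetic fitness above, the search tends toward masks that transfer the early layers only, mirroring the intuition that low-level features transfer across QoT estimation tasks better than task-specific output layers.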

    Machine Learning for Multi-Layer Open and Disaggregated Optical Networks

    The abstract is provided in the attached document.

    Study and application of spectral monitoring techniques for optical network optimization

    One possible way to address the constantly increasing amount of heterogeneous and variable Internet traffic is the evolution of current optical networks towards a more flexible, open, and disaggregated paradigm. In such scenarios, the role played by Optical Performance Monitoring (OPM) is fundamental. In fact, OPM makes it possible to balance the performance and specification mismatches resulting from the adoption of disaggregation, and it provides the control plane with the feedback necessary to grant optical networks an adequate level of automation. Therefore, new flexible and cost-effective OPM solutions are needed, as well as novel techniques to extract the desired information from the monitored data and to process and apply it. In this dissertation, we focus on three aspects related to OPM. We first study a monitoring data-plane scheme to acquire high-resolution signal optical spectra in a non-intrusive way. In particular, we propose a coherent-detection-based Optical Spectrum Analyzer (OSA) enhanced with specific Digital Signal Processing (DSP) to detect spectral slices of the considered optical signals. Then, we identify two main placement strategies for such monitoring solutions, enhancing them with two spectral processing techniques to estimate signal- and optical-filter-related parameters. Specifically, we propose a way to estimate the Amplified Spontaneous Emission (ASE) noise, or its related Optical Signal-to-Noise Ratio (OSNR), using optical spectra acquired at the egress ports of the network nodes, and to estimate the filter central frequency and 3/6 dB bandwidth using spectra captured at the ingress ports of the network nodes. To do so, we leverage Machine Learning (ML) algorithms and the function-fitting principle, according to the considered scenario. We validate both monitoring strategies and their related processing techniques through simulations and experiments. The obtained results confirm the validity of the two proposed estimation approaches.
In particular, we are able to estimate the in-band OSNR/ASE noise in an egress monitor placement scenario with a Maximum Absolute Error (MAE) lower than 0.4 dB. Moreover, we are able to estimate the filter central frequency and 3/6 dB bandwidth in an ingress optical monitor placement scenario with a MAE lower than 0.5 GHz and 0.98 GHz, respectively. Based on these evaluations, we also compare the two placement scenarios and provide guidelines for their implementation. According to the analysis of specific figures of merit, such as the estimation of the Signal-to-Noise Ratio (SNR) penalty introduced by an optical filter, we identify the ingress monitoring strategy as the most promising one. In fact, compared to scenarios where no monitoring strategy is adopted, the ingress one reduces the SNR penalty estimation by 92%. Finally, we identify a potential application for the monitored information. Specifically, we propose a solution for optimizing the subchannel spectral spacing in a superchannel. Leveraging convex optimization methods, we implement a closed control loop for the dynamic reconfiguration of the subchannel central frequencies in order to optimize specific Quality of Transmission (QoT)-related metrics. The solution is based on the information monitored at the superchannel receiver side. In particular, to keep all the subchannels feasible, we consider the maximization of the total superchannel capacity and the maximization of the minimum subchannel SNR value. We validate the proposed approach through simulations, assuming scenarios with different numbers of subchannels, signal characteristics, and starting frequency values. The obtained results confirm the effectiveness of our solution.
Specifically, compared with the equally spaced subchannel scenario, we are able to improve the total and the minimum subchannel SNR values of a four-subchannel superchannel by 1.45 dB and 1.19 dB, respectively.
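    The ingress-side estimation of filter parameters from an acquired spectrum can be sketched in a few lines. This pure-Python stand-in is an assumption for illustration (the dissertation uses ML and function fitting): it locates the passband as the set of samples within 3 dB of the peak and derives the central frequency and bandwidth from its edges.

```python
# Hypothetical sketch of the ingress-monitor processing step: estimate a
# filter's central frequency and 3 dB bandwidth from optical spectrum
# samples. A simplified stand-in for the ML / function-fitting approach.

def filter_params(freqs_ghz, power_db):
    """Return (central frequency GHz, 3 dB bandwidth GHz) from samples."""
    peak = max(power_db)
    # Samples within 3 dB of the peak delimit the filter passband.
    inside = [f for f, p in zip(freqs_ghz, power_db) if p >= peak - 3.0]
    f_lo, f_hi = min(inside), max(inside)
    return (f_lo + f_hi) / 2.0, f_hi - f_lo

# Synthetic parabolic (in dB) passband centred at 193100 GHz.
freqs = [193100 + 0.5 * k for k in range(-20, 21)]
power = [-0.05 * (f - 193100) ** 2 for f in freqs]
fc, bw = filter_params(freqs, power)
print(fc, bw)  # 193100.0 15.0
```

    The same edge information feeds naturally into the SNR-penalty figure of merit used to compare the two monitor placements.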

    Distributed collaborative knowledge management for optical networks

    Network automation has long been envisioned. In fact, the Telecommunications Management Network (TMN), defined by the International Telecommunication Union (ITU), is a hierarchy of management layers (network element, network, service, and business management) where high-level operational goals propagate from the upper to the lower layers. The network management architecture has evolved with the development of the Software-Defined Networking (SDN) concept, which brings programmability to simplify configuration (breaking down high-level service abstractions into lower-level device abstractions), orchestrates operation, and reacts automatically to changes or events. Besides, the development and deployment of solutions based on Artificial Intelligence (AI) and Machine Learning (ML) for making decisions (control loop) based on collected monitoring data enable network automation, which targets reducing operational costs. AI/ML approaches usually require large datasets for training purposes, which are difficult to obtain. The lack of data can be compensated with a collective self-learning approach. In this thesis, we go beyond the aforementioned traditional control loop to achieve an efficient knowledge management (KM) process that enhances network intelligence while bringing down complexity. In this PhD thesis, we propose a general architecture to support the KM process based on four main pillars, which enable creating, sharing, assimilating, and using knowledge. Next, two alternative strategies are proposed, based on model inaccuracies and on model combination. To highlight the capacity of KM to adapt to different applications, two use cases are considered that implement KM in purely centralized and in distributed optical network architectures. Along with them, various policies are considered for evaluating KM in data- and model-based strategies. The results aim at minimizing the amount of data that needs to be shared and at reducing the convergence error.
We apply KM to multilayer networks and propose the PILOT methodology for modeling connectivity services in a sandbox domain. PILOT uses active probes deployed in Central Offices (COs) to obtain real measurements that are used to tune a simulation scenario, reproducing the real deployment with high accuracy. A simulator is eventually used to generate large amounts of realistic synthetic data for ML training and validation. We also apply the KM process to a more complex network system that consists of several domains, where intra-domain controllers assist a broker plane in estimating accurate inter-domain delays. In addition, the broker identifies and corrects intra-domain model inaccuracies and computes an accurate compound model. Such models can be used for quality of service (QoS) assurance and accurate end-to-end delay estimation. Finally, we investigate the application of KM in the context of Intent-Based Networking (IBN). Knowledge in terms of traffic models and/or traffic perturbations is transferred among agents in a hierarchical architecture. This architecture can support autonomous network operation, such as capacity management.
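    The broker-plane model combination can be illustrated with a deliberately simple sketch. The model shape (a fixed delay plus a per-hop delay per domain) is an assumption for illustration, not the thesis model: each domain shares only its small learned model, and the broker composes them into an end-to-end estimate.

```python
# Illustrative sketch of broker-plane model combination: each domain
# exposes a simple delay model, and the broker composes a compound
# end-to-end estimate. The (base, per-hop) model shape is hypothetical.

def domain_delay(model, hops):
    """model: (base_ms, per_hop_ms) pair learned inside the domain."""
    base, per_hop = model
    return base + per_hop * hops

def end_to_end_delay(domain_models, path):
    """path: list of (domain_id, hops) pairs; compound model = sum."""
    return sum(domain_delay(domain_models[d], h) for d, h in path)

models = {"A": (1.0, 0.5), "B": (2.0, 0.25)}
print(end_to_end_delay(models, [("A", 3), ("B", 2)]))  # 2.5 + 2.5 = 5.0
```

    Sharing compact models rather than raw measurements is what keeps the amount of exchanged data small, which is the point of the KM strategies evaluated above.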

    A Tutorial on Machine Learning for Failure Management in Optical Networks

    Failure management plays a critically important role in optical networks, where it serves to avoid service disruptions and to satisfy customers' service level agreements. Machine learning (ML) promises to revolutionize the mostly manual and human-driven approaches with which failure management in optical networks has traditionally been handled, by introducing automated methods for failure prediction, detection, localization, and identification. This tutorial provides a gentle introduction to some ML techniques that have recently been applied in the field of optical-network failure management. It then introduces a taxonomy to classify failure-management tasks and discusses possible applications of ML for these tasks. Finally, for readers interested in implementation details, we provide a step-by-step description of how to solve a representative example of a practical failure-management task.
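    Two tasks from the taxonomy, detection and identification, can be combined in a compact example. Everything below is a toy assumption (the signature centroids, features, and thresholds are invented for illustration): a failure is detected by a BER threshold and then identified by nearest-centroid matching on a two-dimensional feature vector.

```python
# Toy example of the detection + identification tasks: detect from a
# pre-FEC BER threshold, then identify the likely cause by nearest-centroid
# matching on (BER, received power). Signatures are invented.

SIGNATURES = {
    "filter_drift":   (5e-4, -18.0),   # toy (BER, power dBm) centroids
    "amplifier_gain": (3e-4, -24.0),
}

def detect(ber, threshold=1e-4):
    return ber > threshold

def identify(ber, power_dbm):
    def dist(sig):
        b, p = SIGNATURES[sig]
        # BER is rescaled so both features weigh comparably.
        return ((ber - b) * 1e4) ** 2 + (power_dbm - p) ** 2
    return min(SIGNATURES, key=dist)

sample = (4e-4, -23.0)
if detect(sample[0]):
    print(identify(*sample))  # amplifier_gain
```

    A real system would replace the hand-made centroids with patterns learned from labeled monitoring data, which is exactly the step the ML techniques surveyed in the tutorial automate.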

    Learning life cycle to speed up autonomic optical transmission and networking adoption

    Autonomic optical transmission and networking requires machine learning (ML) models to be trained with large datasets. However, the availability of enough real data to produce accurate ML models is rarely ensured, since new optical equipment and techniques are continuously being deployed in the network. One option is to generate data from simulations and lab experiments, but such data might not cover the whole feature space, which would translate into inaccuracies in the ML models. In this paper, we propose an ML-based algorithm life cycle to facilitate ML deployment in real operator networks. The dataset for ML training can be initially populated with results from simulations and lab experiments. Once ML models are generated, they can be retrained after inaccuracies are detected, to improve their precision. Illustrative numerical results show the benefits of the proposed learning cycle for general use cases. In addition, two specific use cases that implement different learning strategies are proposed and demonstrated: (i) a two-phase strategy performing out-of-field training, using data from simulations and lab experiments with generic equipment, followed by an in-field adaptation to support heterogeneous equipment (the accuracy of this strategy is shown for a use case of failure detection and identification); and (ii) in-field retraining, where ML models are retrained after model inaccuracies are detected. Different approaches are analyzed and evaluated for a use case of autonomic transmission, where the results show the significant benefits of collective learning.
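    The out-of-field-training-then-in-field-retraining cycle can be sketched end to end. The "model" below is deliberately trivial (it predicts the mean of its training samples) and all interfaces are hypothetical; the point is the life-cycle logic: monitor the model's error in the field and retrain on recent observations when it exceeds a tolerance.

```python
# Sketch of the proposed life cycle with hypothetical interfaces: a model
# trained out-of-field is monitored in operation and retrained whenever
# its error on recent observations exceeds a tolerance.

def train(samples):
    """Toy 'model': predict the mean of the training samples."""
    mean = sum(samples) / len(samples)
    return lambda: mean

def lifecycle(initial_data, field_data, tol=1.0, window=4):
    model = train(initial_data)          # out-of-field training
    retrained = 0
    recent = []
    for obs in field_data:               # in-field operation
        recent.append(obs)
        recent = recent[-window:]
        err = abs(model() - sum(recent) / len(recent))
        if err > tol:                    # inaccuracy detected
            model = train(recent)        # in-field retraining
            retrained += 1
    return model(), retrained

pred, n = lifecycle([10, 10, 10], [10, 11, 20, 21, 20, 21])
print(pred, n)  # 20.5 4
```

    Replacing the toy mean predictor with an actual QoT or failure-identification model leaves the surrounding cycle unchanged, which is what makes the life cycle reusable across use cases.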

    On Cooperative Fault Management in Multi-Domain Optical Networks Using Hybrid Learning

    This paper presents a hybrid learning approach for cooperative fault management in multi-domain optical networks (MD-ONs). The proposed approach relies on a broker-based MD-ON architecture for the coordination of inter-domain service provisioning. We first propose a self-supervised learning design for soft-failure detection, which uses a clustering algorithm to extract normal and abnormal patterns from optical performance monitoring data, together with a supervised learning-based classifier trained with the learned patterns for online detection. To achieve high soft-failure detection accuracy in the absence of sufficient abnormal data for training, the proposed design estimates model uncertainties during predictions and also identifies instances associated with high uncertainties as soft failures. Then, we extend the self-supervised learning design and present a federated learning framework that allows the broker plane and the domain managers (DMs) to learn cooperatively while complying with the privacy constraints of each domain. Finally, a data-driven soft-failure localization scheme that operates by analyzing the patterns of the data is proposed as a complement to existing approaches. Performance evaluations indicate that the self-supervised learning design can achieve a soft-failure detection accuracy of up to ∼97% with a 0.01%-0.04% false alarm rate, while federated learning enables DMs to realize >90% soft-failure detection rates in cases of highly unbalanced data distribution (two of the three domains possess zero abnormal data for training).
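    The privacy-preserving cooperation can be illustrated with a minimal federated-averaging sketch. The weight vectors and sample counts are illustrative assumptions: each domain trains locally and shares only its model weights, which the broker aggregates FedAvg-style, so domains with no abnormal data still benefit from the global model.

```python
# Minimal federated-averaging sketch for the cooperative setting above:
# domain managers share only model weights, and the broker computes a
# sample-weighted average. Vectors and counts are illustrative.

def fed_avg(domain_weights, domain_sizes):
    """Weighted average of per-domain weight vectors (FedAvg-style)."""
    total = sum(domain_sizes)
    dim = len(domain_weights[0])
    return [
        sum(w[i] * n for w, n in zip(domain_weights, domain_sizes)) / total
        for i in range(dim)
    ]

# Three domains; the third holds no abnormal data, so its local detector
# weights differ, yet it receives the same aggregated global model.
weights = [[1.0, 0.0], [1.0, 2.0], [0.0, 0.0]]
sizes = [100, 100, 200]
print(fed_avg(weights, sizes))  # [0.5, 0.5]
```

    Because only weights cross domain boundaries, raw monitoring data never leaves a domain, which is how the framework complies with per-domain privacy constraints.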