
    Automatic Detection of Mass Outages in Radio Access Networks

    Fault management in mobile networks is required for detecting, analysing, and fixing problems that appear in the network. When a large problem appears, the network elements generate multiple alarms. Traditionally, the Network Operations Center (NOC) processes the reported failures, creates trouble tickets for problems, and performs a root cause analysis. However, alarms do not reveal the root cause of the failure, and the correlation of alarms is often complicated to determine. If the network operator can correlate alarms and manage clustered groups of alarms instead of separate ones, it saves costs, preserves the availability of the mobile network, and improves the quality of service. Operators may have several electricity providers, and the network topology is not correlated with the electricity topology. Additionally, network sites and other network elements are not evenly distributed across the network. Hence, we investigate the suitability of density-based clustering methods for detecting mass outages and performing alarm correlation to reduce the number of created trouble tickets. This thesis focuses on assisting the root cause analysis and detecting correlated power and transmission failures in the mobile network. We implement a Mass Outage Detection Service and form a custom density-based algorithm. Our service performs alarm correlation and creates clusters of possible power and transmission mass outage alarms. We have filed a patent application based on the work done in this thesis. Our results show that we are able to detect mass outages in real time from the data streams. The results also show that the detected clusters reduce the number of created trouble tickets and help reduce the costs of running the network. The number of trouble tickets decreases by 4.7-9.3% for the alarms we process in the service in the tested networks. When we consider only alarms included in the mass outage groups, the reduction is over 75%. Therefore, continuing to use, test, and develop the implemented Mass Outage Detection Service is beneficial for operators and automated NOCs.
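
    The abstract does not describe the custom algorithm, so the following is only a minimal sketch of generic density-based clustering (plain DBSCAN) applied to alarms, not the thesis's method. It assumes each alarm is a hypothetical `(timestamp_s, x_km, y_km)` tuple and that two alarms are neighbours when they are close in both time and space; the names `eps_km`, `eps_s`, and `min_pts` and their default values are illustrative assumptions.

```python
from math import hypot

def region_query(alarms, i, eps_km, eps_s):
    """Indices of alarms close to alarm i in both time and space (includes i)."""
    t_i, x_i, y_i = alarms[i]
    return [
        j for j, (t, x, y) in enumerate(alarms)
        if abs(t - t_i) <= eps_s and hypot(x - x_i, y - y_i) <= eps_km
    ]

def dbscan(alarms, eps_km=5.0, eps_s=300.0, min_pts=3):
    """Label each alarm with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(alarms)
    cluster = -1
    for i in range(len(alarms)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(alarms, i, eps_km, eps_s)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = region_query(alarms, j, eps_km, eps_s)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels

# Four alarms close in time and space, plus one isolated alarm (made-up data).
alarms = [(0, 0.0, 0.0), (60, 1.0, 0.0), (120, 0.0, 1.0), (130, 1.0, 1.0),
          (10_000, 50.0, 50.0)]
labels = dbscan(alarms)

# One ticket per cluster plus one per unclustered alarm.
tickets = len({l for l in labels if l != -1}) + labels.count(-1)
```

    With this made-up input, the four nearby alarms fall into one cluster and the isolated alarm stays as noise, so two tickets would be opened instead of five. A density-based method fits the setting described above because it does not need the number of clusters in advance and tolerates unevenly distributed sites.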

    An intelligent alarm management system for large-scale telecommunication companies

    This paper introduces an intelligent system that performs alarm correlation and root cause analysis. The system is designed to operate in large-scale heterogeneous networks from telecommunications operators. The proposed architecture includes a rules management module that is based on data mining (to generate the rules) and reinforcement learning (to improve rule selection) algorithms. In this work, we focus on the design and development of the rule generation part and test it using a large real-world dataset containing alarms from a Portuguese telecommunications company. The correlation engine achieved promising results, measured by a compression rate of 70% and assessed in real time by experienced network administrator staff.
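
    The abstract does not define its compression rate. One common reading, assumed here, is the fraction of raw alarms that no longer need separate handling once they are correlated into groups. The function name and the example counts below are hypothetical.

```python
def compression_rate(raw_alarms: int, correlated_groups: int) -> float:
    """Fraction of raw alarms removed by correlation (assumed definition)."""
    if raw_alarms <= 0:
        raise ValueError("raw_alarms must be positive")
    if not 0 <= correlated_groups <= raw_alarms:
        raise ValueError("correlated_groups must be between 0 and raw_alarms")
    return (raw_alarms - correlated_groups) / raw_alarms

# Made-up counts: 1000 raw alarms reduced to 300 correlated groups.
rate = compression_rate(1000, 300)
```

    Under this assumed definition, reducing 1000 alarms to 300 groups corresponds to a rate of 70%.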

    Tiresias: Online Anomaly Detection for Hierarchical Operational Network Data

    Operational network data, that is, management data such as customer care call logs and equipment system logs, is a very important source of information for network operators to detect problems in their networks. Unfortunately, there is a lack of efficient tools to automatically track and detect anomalous events in operational data, causing ISP operators to rely on manual inspection of this data. While anomaly detection has been widely studied in the context of network data, operational data presents several new challenges, including the volatility and sparseness of the data and the need to perform fast detection (complicating the application of schemes that require offline processing or large/stable data sets to converge). To address these challenges, we propose Tiresias, an automated approach to locating anomalous events in hierarchical operational data. Tiresias leverages the hierarchical structure of operational data to identify high-impact aggregates (e.g., locations in the network, failure modes) likely to be associated with anomalous events. To accommodate different kinds of operational network data, Tiresias consists of an online detection algorithm with low time and space complexity that preserves high detection accuracy. We present results from two case studies using operational data collected at a large commercial IP network operated by a Tier-1 ISP: customer care call logs and set-top box crash logs. By comparing with a reference set verified by the ISP's operational group, we validate that Tiresias can achieve >94% accuracy in locating anomalies. Tiresias also discovered several previously unknown anomalies in the ISP's customer care cases, demonstrating its effectiveness.
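
    The idea of identifying high-impact aggregates in a hierarchy can be illustrated with a simple roll-up. This is not the Tiresias algorithm, which the abstract does not detail; it is a hedged sketch that assumes each event is keyed by a hypothetical path such as `(region, site)` and flags every aggregate whose total reaches a fixed, made-up threshold.

```python
from collections import Counter

def rollup(leaf_counts):
    """Sum leaf counts into every ancestor prefix of each path."""
    totals = Counter()
    for path, n in leaf_counts.items():
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += n
    return totals

def heavy_aggregates(leaf_counts, threshold):
    """Aggregates (at any level) whose rolled-up count reaches the threshold."""
    return sorted(p for p, n in rollup(leaf_counts).items() if n >= threshold)

# Made-up call counts per (region, site).
leaf_counts = {("north", "a"): 4, ("north", "b"): 3, ("south", "c"): 1}
flagged = heavy_aggregates(leaf_counts, threshold=5)
```

    In this example no single site reaches the threshold, but the `north` region does once its sites are summed, which is the kind of signal that sparse leaf-level data can hide.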

    v. 20, no. 15, May 29, 1959


    Supporting Telecommunication Alarm Management System with Trouble Ticket Prediction

    Fault alarm data emanating from heterogeneous telecommunication network services and infrastructures are exploding with network expansions. Managing and tracking the alarms with Trouble Tickets using manual or expert rule-based methods has become challenging due to the increasing complexity of Alarm Management Systems and the demand for highly trained experts. As the size and complexity of networks grow immensely, identifying semantically identical alarms generated by heterogeneous network elements from diverse vendors with data-driven methodologies has become imperative to enhance efficiency. In this paper, data-driven Trouble Ticket prediction models are proposed to support Alarm Management Systems. To improve performance, feature extraction from related historical alarm streams, using a sliding time window and feature engineering, is also introduced. The models were trained and validated with a dataset provided by the largest telecommunication provider in Italy. The experimental results showed the promising efficacy of the proposed approach in suppressing false positive alarms with Trouble Ticket prediction.
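
    The abstract does not specify which features are extracted, so the following is only a sketch of one plausible sliding time-window feature: for each alarm, the count of alarms per type seen in the preceding window. The event format `(timestamp_s, alarm_type)`, the function name, and the window length are assumptions made for illustration.

```python
from collections import Counter, deque

def sliding_window_counts(events, window_s):
    """For each event, count alarms per type within the last window_s seconds.

    events must be sorted by timestamp; the current event is included.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    window = deque()
    counts = Counter()
    features = []
    for ts, kind in events:
        window.append((ts, kind))
        counts[kind] += 1
        # Evict events that fell out of the half-open window (ts - window_s, ts].
        while window[0][0] <= ts - window_s:
            _, old_kind = window.popleft()
            counts[old_kind] -= 1
        features.append({k: v for k, v in counts.items() if v > 0})
    return features

# Made-up alarm stream with a 60-second window.
events = [(0, "power"), (10, "link"), (20, "power"), (100, "link")]
features = sliding_window_counts(events, window_s=60)
```

    Each resulting dictionary could serve as one feature row for a ticket-prediction model. The deque keeps the update cost proportional to the number of events entering and leaving the window, so the stream never has to be rescanned.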