
    Unattended network operations technology assessment study. Technical support for defining advanced satellite systems concepts

    The results of an unattended network operations technology assessment study for the Space Exploration Initiative (SEI) are summarized. The scope of the work included: (1) identifying possible enhancements to the proposed Mars communications network; (2) identifying network operations on Mars; (3) performing a technology assessment of possible supporting technologies based on current and future approaches to network operations; and (4) developing a plan for the testing and development of these technologies. The most important results are as follows: (1) the addition of a third Mars Relay Satellite (MRS) and MRS crosslink capability will enhance the network's fault tolerance through improved connectivity; (2) network functions can be divided into the six basic ISO network functional groups; (3) distributed artificial intelligence technologies will augment more traditional network management technologies to form the technological infrastructure of a virtually unattended network; and (4) considerable effort is required to bring current network technology for manned space communications up to the level needed for an automated, fault-tolerant Mars communications network.
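    As a rough illustration of result (1), the connectivity argument can be sketched as a graph reachability check. The relay, base, and link names below are assumptions made for the example, not the study's actual network model.

```python
# Illustrative sketch only (not from the study): modeling the improved-connectivity
# argument as a graph. Node and link names are assumptions for the example.
def reaches_all_bases(links, failed_link):
    """BFS from Earth over the surviving links; True if every Base-* node is still reachable."""
    adj = {}
    for a, b in links:
        if {a, b} == set(failed_link):
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = {"Earth"}, ["Earth"]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    bases = {n for edge in links for n in edge if n.startswith("Base")}
    return bases <= seen

two_relays = [("Earth", "MRS-1"), ("Earth", "MRS-2"),
              ("MRS-1", "Base-A"), ("MRS-2", "Base-B")]
three_relays_crosslinked = two_relays + [("Earth", "MRS-3"), ("MRS-3", "Base-A"),
                                         ("MRS-1", "MRS-2"), ("MRS-2", "MRS-3"), ("MRS-1", "MRS-3")]

for name, net in [("two relays, no crosslinks", two_relays),
                  ("three relays with crosslinks", three_relays_crosslinked)]:
    ok = all(reaches_all_bases(net, link) for link in net)
    print(f"{name}: tolerates any single link failure -> {ok}")
```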

    The Ultralight project: the network as an integrated and managed resource for data-intensive science

    Looks at the UltraLight project, which treats the network interconnecting globally distributed data sets as a dynamic, configurable, and closely monitored resource in order to construct a next-generation system that can meet the high-energy physics community's data-processing, distribution, access, and analysis needs.

    Cooperative Monitoring to Diagnose Multiagent Plans

    Diagnosing the execution of a Multiagent Plan (MAP) means identifying and explaining action failures (i.e., actions that did not reach their expected effects). Current approaches to MAP diagnosis are substantially centralized and assume that action failures are independent of each other. In this paper, the diagnosis of MAPs, executed in a dynamic and partially observable environment, is addressed in a fully distributed and asynchronous way; in addition, action failures are no longer assumed to be independent of each other. The paper presents a novel methodology, named Cooperative Weak-Committed Monitoring (CWCM), enabling agents to cooperate while monitoring their own actions. Cooperation helps the agents to cope with very scarcely observable environments: what an agent cannot observe directly can be acquired from other agents. CWCM exploits nondeterministic action models to carry out two main tasks: detecting action failures and building trajectory-sets (i.e., structures representing the knowledge an agent has about the environment in the recent past). Relying on trajectory-sets, each agent is able to explain its own action failures in terms of exogenous events that have occurred during the execution of the actions themselves. To cope with dependent failures, CWCM is coupled with a diagnostic engine that distinguishes between primary and secondary action failures. An experimental analysis demonstrates that the CWCM methodology, together with the proposed diagnostic inferences, is effective in identifying and explaining action failures even in scenarios where system observability is significantly reduced.
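    The primary/secondary distinction can be illustrated with a toy classifier: a failed action whose preconditions were supplied by another failed action is treated as secondary, otherwise as primary and attributed to an exogenous event. This is a minimal sketch under assumed data structures, not the CWCM diagnostic engine.

```python
# Hypothetical sketch of the primary/secondary failure distinction; the plan
# representation and names are assumptions, not the CWCM implementation.
def classify_failures(causal_links, failed):
    """causal_links: {consumer_action: set of provider_actions}; failed: set of failed actions."""
    diagnosis = {}
    for action in failed:
        failed_providers = causal_links.get(action, set()) & failed
        if failed_providers:
            # Its inputs came from actions that themselves failed.
            diagnosis[action] = ("secondary", failed_providers)
        else:
            # No failed provider: must be explained by an exogenous event.
            diagnosis[action] = ("primary", "exogenous event during execution")
    return diagnosis

# Toy plan: a2 consumes an effect of a1, a3 is independent.
causal_links = {"a2": {"a1"}, "a3": set()}
print(classify_failures(causal_links, failed={"a1", "a2", "a3"}))
# a1 and a3 -> primary; a2 -> secondary (caused by a1)
```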

    Real-Time QoS Monitoring and Anomaly Detection on Microservice-based Applications in Cloud-Edge Infrastructure

    PhD thesis. Microservices have emerged as a new approach for developing and deploying cloud applications that require higher levels of agility, scale, and reliability. A microservice-based cloud application architecture advocates decomposition of monolithic application components into independent software components called "microservices". As the independent microservices can be developed, deployed, and updated independently of each other, this leads to complex run-time performance monitoring and management challenges. The deployment environment for microservices in multi-cloud settings is very complex, as there are numerous components running in heterogeneous environments (VMs/containers) and communicating frequently with each other using REST-based/REST-less APIs. In some cases, multiple components can also be executed inside a single VM/container, making failure or anomaly detection very complicated. It is necessary to monitor the performance variation of all the service components to detect any reason for failure. Microservice and container architectures allow loosely coupled services to be designed and run in a lightweight runtime environment for more efficient scaling. Thus, container-based microservice deployment is now the standard model for hosting cloud applications across industries. Although the strong scalability of this model opens the door to further optimizations in both application structure and performance, it adds an additional level of complexity to monitoring application performance. A performance monitoring system can lead to severe application outages if it is not able to successfully and quickly detect failures and localize their causes. Machine learning-based techniques have been applied to detect anomalies in microservice-based cloud applications. Existing research works use different tracking algorithms to search for the root cause of observed anomalous behaviour. However, linking the observed failures of an application with their root causes by the use of these techniques is still an open research problem. Osmotic computing is a new IoT application programming paradigm driven by the significant increase in resource capacity/capability at the network edge, along with support for data transfer protocols that enable such resources to interact more seamlessly with cloud-based services. Much of the difficulty in Quality of Service (QoS) and performance monitoring of IoT applications in an osmotic computing environment is due to the massive scale and heterogeneity (IoT + edge + cloud) of the computing environments. To handle monitoring and anomaly detection of microservices in cloud and edge datacenters, this thesis presents multilateral research towards monitoring and anomaly detection of microservice-based application performance in cloud-edge infrastructure. The key contributions of this thesis are as follows:
    • It introduces a novel system, Multi-microservices Multi-virtualization Multi-cloud monitoring (M3), that provides a holistic approach to monitoring the performance of microservice-based application stacks deployed across multiple cloud data centers.
    • A framework for a Monitoring, Anomaly Detection and Localization System (MADLS), which utilizes a simplified approach that depends on commonly available metrics, offering a simplified deployment environment for the developer (see the sketch below).
    • Developing a unified monitoring model for cloud-edge that provides an IoT application administrator with detailed QoS information related to microservices deployed across cloud and edge datacenters.
    Funding: Royal Embassy of Saudi Arabia Cultural Bureau in London, Government of Saudi Arabia.
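    As a rough illustration of anomaly detection on commonly available metrics (the idea behind MADLS), the sketch below flags values that deviate strongly from a rolling window. The metric, window size, and threshold are assumptions for illustration, not the thesis's algorithm.

```python
# Hypothetical sketch, not the MADLS algorithm: flag anomalies in a commonly
# available per-microservice metric (e.g. CPU%) with a rolling mean/stddev threshold.
from collections import deque
from statistics import mean, stdev

class MetricAnomalyDetector:
    def __init__(self, window=30, k=3.0):
        self.window = deque(maxlen=window)  # recent samples of one metric for one service
        self.k = k                          # how many standard deviations count as anomalous

    def observe(self, value):
        """Return True if `value` deviates more than k standard deviations from the recent window."""
        anomalous = False
        if len(self.window) >= 10:          # wait for enough history before judging
            mu, sigma = mean(self.window), stdev(self.window)
            anomalous = sigma > 0 and abs(value - mu) > self.k * sigma
        self.window.append(value)
        return anomalous

detector = MetricAnomalyDetector()
samples = [12, 11, 13, 12, 12, 11, 13, 12, 11, 12, 12, 95]  # CPU% of one container
for t, cpu in enumerate(samples):
    if detector.observe(cpu):
        print(f"t={t}: possible anomaly, cpu={cpu}%")
```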

    Real-time performance diagnosis and evaluation of big data systems in cloud datacenters

    PhD thesis. Modern big data processing systems are becoming very complex in terms of large scale, high concurrency, and multi-tenancy. Thus, many failures and performance reductions only happen at run-time and are very difficult to capture. Moreover, some issues may only be triggered when certain components are executed. To analyze the root cause of these types of issues, we have to capture the dependencies of each component in real-time. Big data processing systems, such as Hadoop and Spark, usually work in large-scale, highly concurrent, and multi-tenant environments that can easily cause hardware and software malfunctions or failures, thereby leading to performance degradation. Several systems and methods exist to detect big data processing systems' performance degradation, perform root-cause analysis, and even overcome the issues causing such degradation. However, these solutions focus on specific problems such as stragglers and inefficient resource utilization. There is a lack of a generic and extensible framework to support the real-time diagnosis of big data systems. Performance diagnosis and prediction of big data systems are highly complex, as these frameworks are typically deployed in cloud data centers that are large-scale, highly concurrent, and follow a multi-tenant model. Several factors, including hardware heterogeneity, stochastic networks, and application workloads, may impact the performance of big data systems. The current state-of-the-art does not sufficiently address the challenge of determining the complex, usually stochastic and hidden relationships between these factors. To handle performance diagnosis and evaluation of big data systems in cloud environments, this thesis proposes multilateral research towards monitoring, performance diagnosis, and prediction in cloud-based large-scale distributed systems, involving a novel combination of an effective and efficient deployment pipeline. The key contributions of this dissertation are listed below:
    • Designing a real-time big data monitoring system called SmartMonit that efficiently collects runtime system information, including computing resource utilization and job execution information, and integrates the collected information with the Execution Graph modeled as directed acyclic graphs (DAGs), as illustrated in the sketch below.
    • Developing AutoDiagn, an automated real-time diagnosis framework for big data systems that automatically detects performance degradation and inefficient resource utilization problems, while providing online detection and semi-online root-cause analysis for a big data system.
    • Designing a novel root-cause analysis technique/system called BigPerf for big data systems that analyzes and characterizes the performance of big data applications by incorporating Bayesian networks to determine uncertain and complex relationships between performance-related factors.
    Funding: State of the Republic of Turkey and the Turkish Ministry of National Education.
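    The sketch below illustrates, under assumed task names, metrics, and thresholds, how job execution can be represented as a DAG with runtime metrics attached and walked upstream from a slow task to collect root-cause candidates; it is not SmartMonit or AutoDiagn code.

```python
# Illustrative sketch only: an execution DAG with per-task runtime metrics, and a
# simple upstream walk that keeps tasks which are both slow and resource-saturated.
execution_dag = {            # task -> upstream tasks it depends on
    "reduce_1": ["map_1", "map_2"],
    "map_1": [],
    "map_2": [],
}
metrics = {                  # task -> observed runtime metrics (assumed values)
    "reduce_1": {"duration_s": 310, "cpu_util": 0.35},
    "map_1":    {"duration_s": 42,  "cpu_util": 0.60},
    "map_2":    {"duration_s": 280, "cpu_util": 0.97},   # resource-saturated straggler
}

def root_cause_candidates(task, dag, metrics, slow_s=120, busy=0.9):
    """Depth-first walk from `task` over its upstream dependencies, keeping tasks
    that exceed both the duration and the utilization thresholds."""
    candidates, stack, seen = [], [task], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        m = metrics[current]
        if m["duration_s"] > slow_s and m["cpu_util"] > busy:
            candidates.append(current)
        stack.extend(dag.get(current, []))
    return candidates

print(root_cause_candidates("reduce_1", execution_dag, metrics))   # ['map_2']
```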

    Self-Modeling based Diagnosis of Services over Programmable Networks

    In this paper, we propose a multi-layer self-diagnosis framework for networking services within SDN and NFV environments. The framework encompasses three main contributions: 1) the definition of multi-layered templates that identify what to supervise while taking into account the physical, logical, virtual, and service layers; these templates are fine-grained, extensible, and machine-readable; 2) a self-modeling module that takes these templates as input, instantiates them, and generates on the fly a diagnosis model that includes the physical, logical, and virtual dependencies of networking services; 3) a service-aware root-cause analysis module that takes into account the networking services' views and the observations of their underlying network resources within the aforementioned layers. We also present extensive simulations demonstrating the full automation, finer granularity, and reduced uncertainty of the root-cause analysis of networking service failures and their underlying network resources.
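    A minimal sketch of the template-and-dependency idea follows: a machine-readable, multi-layer template is flattened into a dependency closure, and failed dependencies of a symptomatic service are reported as root-cause candidates. The schema, resource names, and the selection rule are assumptions for illustration, not the framework's actual model.

```python
# Hypothetical sketch of a machine-readable multi-layer template instantiated into a
# dependency model for root-cause analysis. All names and the schema are assumptions.
template = {
    "service":  {"vpn_service":  ["vnf_firewall", "vnf_router"]},      # service -> virtual layer
    "virtual":  {"vnf_firewall": ["vm_3"], "vnf_router": ["vm_7"]},    # virtual -> logical layer
    "logical":  {"vm_3": ["server_A"], "vm_7": ["server_B"]},          # logical -> physical layer
    "physical": {"server_A": [], "server_B": []},
}

def dependencies(entity, template):
    """Flatten the layered template into the full dependency closure of one entity."""
    closure, stack = [], [entity]
    lookup = {k: v for layer in template.values() for k, v in layer.items()}
    while stack:
        for dep in lookup.get(stack.pop(), []):
            closure.append(dep)
            stack.append(dep)
    return closure

# Observations collected per resource (assumed states).
observations = {"server_B": "down", "vm_3": "ok", "server_A": "ok", "vm_7": "unreachable"}

symptomatic_service = "vpn_service"
failed = [d for d in dependencies(symptomatic_service, template)
          if observations.get(d) not in (None, "ok")]
print("root-cause candidates for", symptomatic_service, "->", failed)   # ['vm_7', 'server_B']
```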

    Monitoring and Failure Recovery of Cloud-Managed Digital Signage

    Digital signage is widely used in various fields such as transport systems, trading outlets, entertainment, and others, to display information in the form of images, videos, and text. The reliability of these resources, the availability of required services, and security measures play a key role in the adoption of such systems. Efficient management of a digital signage system is a challenging task for service providers. Many causes can lead to the malfunctioning of such a system, such as faulty displays and network, hardware, or software failures, and these failures are quite repetitive. The traditional process of recovering from such failures often involves tedious and cumbersome diagnosis. In many cases, technicians need to physically visit the site, thereby increasing the maintenance costs and the recovery time. In this thesis, we propose a solution that monitors, diagnoses, and recovers from known failures by connecting the displays to a cloud. A cloud-based remote and autonomous server configures the content of remote displays and updates them dynamically. Each display tracks the running process and periodically sends trace and system logs to the server. These logs, stored at the server optimally using a customized log management module, are analysed for failures. In addition, the displays incorporate self-recovery procedures to deal with failures when they are unable to establish a connection to the cloud. The proposed solution is implemented on a Linux system and evaluated by deploying the server on the Amazon Web Services (AWS) cloud. The main result of the thesis is a collection of techniques for resolving display system failures remotely.
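    The reporting and self-recovery loop described above might look roughly like the sketch below. The endpoint URL, payload format, and restart command are placeholders and assumptions, not the thesis implementation.

```python
# Hypothetical sketch of a display's reporting/self-recovery loop: periodically send
# recent logs to the cloud server; after repeated connection failures, fall back to a
# local recovery action. URL, payload, and restart command are assumptions.
import json, subprocess, time, urllib.request

SERVER_URL = "https://example.invalid/api/logs"   # placeholder endpoint
REPORT_INTERVAL_S = 60
MAX_CONSECUTIVE_FAILURES = 5

def send_logs(lines):
    payload = json.dumps({"display_id": "display-01", "logs": lines}).encode()
    req = urllib.request.Request(SERVER_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)        # raises on network/HTTP errors

def local_recovery():
    # Example self-recovery step: restart the player process (command is an assumption).
    subprocess.run(["systemctl", "restart", "signage-player"], check=False)

def monitoring_loop(read_recent_logs):
    """read_recent_logs: callable returning the display's recent log lines."""
    failures = 0
    while True:
        try:
            send_logs(read_recent_logs())
            failures = 0
        except OSError:                            # cloud unreachable or request failed
            failures += 1
            if failures >= MAX_CONSECUTIVE_FAILURES:
                local_recovery()
                failures = 0
        time.sleep(REPORT_INTERVAL_S)

# usage (illustrative): monitoring_loop(lambda: ["<recent log lines>"])
```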

    Space station advanced automation

    In the development of a safe, productive, and maintainable space station, Automation and Robotics (A and R) has been identified as an enabling technology which will allow efficient operation at a reasonable cost. The Space Station Freedom's (SSF) systems are very complex and interdependent. The use of Advanced Automation (AA) will help restructure and integrate system status so that station and ground personnel can operate more efficiently. Using AA technology to augment system management functions requires a development model consisting of well-defined phases: evaluation, development, integration, and maintenance. The evaluation phase will consider system management functions against traditional solutions, implementation techniques, and requirements; the end result of this phase should be a well-developed concept along with a feasibility analysis. In the development phase, the AA system will be developed in accordance with a traditional Life Cycle Model (LCM) modified for Knowledge Based System (KBS) applications. A way by which both knowledge bases and reasoning techniques can be reused to control costs is explained. During the integration phase, the KBS software must be integrated with conventional software, and verified and validated. The Verification and Validation (V and V) techniques applicable to these KBS are based on the ideas of consistency, minimal competency, and graph theory. The maintenance phase will be aided by having well-designed and documented KBS software.
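    As a hedged illustration of the graph-theory side of KBS verification and validation, the sketch below treats a rule base as a directed graph from premise facts to concluded facts and flags circular inference chains. The rule format and facts are assumptions for the example, not the SSF tooling.

```python
# Hypothetical consistency check: build a directed graph (premise fact -> concluded
# fact) from the rules and flag facts where a circular inference chain is detected.
def inference_cycles(rules):
    """rules: list of (premises, conclusion). Returns facts at which a cycle is detected."""
    graph = {}
    for premises, conclusion in rules:
        for p in premises:
            graph.setdefault(p, set()).add(conclusion)

    visiting, visited, cyclic = set(), set(), set()

    def visit(fact):
        if fact in visiting:           # back edge -> circular inference
            cyclic.add(fact)
            return
        if fact in visited:
            return
        visiting.add(fact)
        for nxt in graph.get(fact, ()):
            visit(nxt)
        visiting.discard(fact)
        visited.add(fact)

    for fact in graph:
        visit(fact)
    return cyclic

rules = [({"pump_pressure_low"}, "valve_fault"),
         ({"valve_fault"}, "pump_pressure_low"),      # circular with the rule above
         ({"sensor_ok"}, "reading_trusted")]
print(inference_cycles(rules))
# {'pump_pressure_low'}: the chain pump_pressure_low -> valve_fault -> pump_pressure_low is circular
```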
