

The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems

Author: Anand V.
Mace J.
Vigfusson Y.
Xie Z.
Zhang L.
Publication date: 01/01/2022

Today's distributed tracing frameworks are ill-equipped to troubleshoot rare edge-case requests. The crux of the problem is a trade-off between specificity and overhead. On the one hand, frameworks can indiscriminately select requests to trace when they enter the system (head sampling), but this is unlikely to capture a relevant edge-case trace because the framework cannot know which requests will be problematic until after the fact. On the other hand, frameworks can trace everything and later keep only the interesting edge-case traces (tail sampling), but this has high overheads on the traced application and enormous data ingestion costs.

In this paper we circumvent this trade-off for any edge-case with symptoms that can be programmatically detected, such as high tail latency, errors, and bottlenecked queues. We propose a lightweight and always-on distributed tracing system, Hindsight, which implements a retroactive sampling abstraction: instead of eagerly ingesting and processing traces, Hindsight lazily retrieves trace data only after symptoms of a problem are detected. Hindsight is analogous to a car dash-cam that, upon detecting a sudden jolt in momentum, persists the last hour of footage. Developers using Hindsight receive the exact edge-case traces they desire without undue overhead or dependence on luck. Our evaluation shows that Hindsight scales to millions of requests per second, adds nanosecond-level overhead to generate trace data, handles GB/s of data per node, transparently integrates with existing distributed tracing systems, and successfully persists full, detailed traces in real-world use cases when edge-case problems are detected.
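To make retroactive sampling concrete, here is a toy, single-process Python sketch. It is not Hindsight's implementation or API: `Span`, `NodeBuffer`, `persist_if_triggered`, the fixed-capacity ring buffer, and the latency-threshold trigger are all illustrative assumptions. Each node appends spans to a bounded local buffer on the hot path; only when a symptom is detected (here, end-to-end latency above a threshold) are that trace's spans gathered. For simplicity the sketch asks every node, rather than locating only the nodes the request touched.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str
    node: str
    name: str
    duration_ms: float


class NodeBuffer:
    """Hypothetical always-on, fixed-size span buffer kept on each node."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest span once full, like a
        # dash-cam overwriting its oldest footage.
        self._spans: deque[Span] = deque(maxlen=capacity)

    def record(self, span: Span) -> None:
        # Hot path: a local append; nothing is sent to a tracing backend.
        self._spans.append(span)

    def retrieve(self, trace_id: str) -> list[Span]:
        # Cold path: only called once a trigger has flagged this trace.
        return [s for s in self._spans if s.trace_id == trace_id]


def persist_if_triggered(
    trace_id: str,
    e2e_latency_ms: float,
    threshold_ms: float,
    nodes: list[NodeBuffer],
) -> list[Span]:
    """Retroactive sampling: gather a trace only if a symptom is detected."""
    if e2e_latency_ms <= threshold_ms:
        return []  # normal request: its spans will eventually be overwritten
    collected: list[Span] = []
    for node in nodes:
        collected.extend(node.retrieve(trace_id))
    return collected


if __name__ == "__main__":
    frontend, db = NodeBuffer(capacity=1000), NodeBuffer(capacity=1000)
    frontend.record(Span("t1", "frontend", "GET /cart", 15.0))
    db.record(Span("t1", "db", "SELECT cart", 480.0))
    frontend.record(Span("t2", "frontend", "GET /home", 3.0))
    print(persist_if_triggered("t1", 495.0, 200.0, [frontend, db]))
```

The design mirrors the dash-cam analogy: recording is always on and cheap, memory is bounded by `capacity`, and the cost of collection is paid only for flagged traces. A trigger that fires too late relative to `capacity` will find the spans already evicted.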

MPG.PuRe

TAXONOMY OF SECURITY AND PRIVACY ISSUES IN SERVERLESS COMPUTING

Author: Pusuluri Vasanta Swarna Ratnam
Publication venue: The Repository at St. Cloud State
Publication date: 01/08/2022

The advent of cloud computing has led to a new era of computer usage. Networking and physical security are among the IT infrastructure concerns that IT administrators around the world had to manage for their individual environments. Cloud computing took away that burden and redefined the role of IT administrators. Serverless computing, as it relates to secure software development, is creating the same kind of change: developers can spin up a secure development environment in a matter of minutes without having to worry about any of the underlying infrastructure setup. In this paper, we look at the merits and demerits of serverless computing, what is driving the demand for serverless computing among developers, and the security and privacy issues of serverless technology, and we detail the parameters to consider when setting up and using a secure development environment based on serverless computing.

St. Cloud State University

Technical Communication in China: Studies on the User Experience of Technical Documentation

Author: Gao Zhijun
Publication venue: University of Twente
Publication date: 01/02/2024

Technical communication is the process of conveying complex information to a varied audience, including both technical and non-technical individuals. It aims to make information usable and accessible. The dissertation provides an in-depth examination of technical communication's evolution in China, with a focus on enhancing the user experience of technical documentation. The dissertation is organized into seven chapters, beginning with the current state of technical communication in China, followed by three parts covering five research studies on various aspects of technical documentation, including the roles of technical communicators who create technical documentation, the design and evaluation of developer documentation, and the application of emotional design in user manuals. It concludes by summarizing key findings, discussing theoretical and practical implications, and suggesting future research directions.

This dissertation aims to answer five research questions. The first focuses on the state of the art of technical communication (TC) in China. The other four explore specific angles on TC in a Chinese context. The five research questions are:
• RQ1. What is the development of technical communication as a professional discipline in China?
• RQ2. What are the learning habits, information journey, and expectations of Chinese developers regarding developer documentation?
• RQ3. What are key factors influencing the effectiveness of searching and finding technical documentation?
• RQ4. What are effective strategies for evaluating performance and user experience of developer documentation?
• RQ5. What is the impact of emotional design on user experience and effectiveness in technical documentation?

University of Twente Research Information

Performance Evaluation of Serverless Applications and Infrastructures

Author: Scheuner Joel
Publication date: 01/01/2022

Context. Cloud computing has become the de facto standard for deploying modern web-based software systems, which makes its performance crucial to the efficient functioning of many applications. However, the unabated growth of established cloud services, such as Infrastructure-as-a-Service (IaaS), and the emergence of new serverless services, such as Function-as-a-Service (FaaS), has led to an unprecedented diversity of cloud services with different performance characteristics. Measuring these characteristics is difficult in dynamic cloud environments due to performance variability in large-scale distributed systems with limited observability.

Objective. This thesis aims to enable reproducible performance evaluation of serverless applications and their underlying cloud infrastructure.

Method. A combination of literature review and empirical research established a consolidated view of serverless applications and their performance. New solutions were developed through engineering research and used to conduct performance benchmarking field experiments in cloud environments.

Findings. The review of 112 FaaS performance studies from academic and industrial sources found a strong focus on a single cloud platform using artificial micro-benchmarks and discovered that most studies do not follow reproducibility principles on cloud experimentation. Characterizing 89 serverless applications revealed that they are most commonly used for short-running tasks with low data volume and bursty workloads. A novel trace-based serverless application benchmark shows that external service calls often dominate the median end-to-end latency and cause long tail latency. The latency breakdown analysis further identifies performance challenges of serverless applications, such as long delays through asynchronous function triggers, substantial runtime initialization for cold starts, increased performance variability under bursty workloads, and heavily provider-dependent performance characteristics. The evaluation of different cloud benchmarking methodologies has shown that only selected micro-benchmarks are suitable for estimating application performance, that performance variability depends on the resource type, and that batch testing on the same instance with repetitions should be used for reliable performance testing.

Conclusions. The insights of this thesis can guide practitioners in building performance-optimized serverless applications and researchers in reproducibly evaluating cloud performance using suitable execution methodologies and different benchmark types.
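The finding that external service calls dominate median latency and drive the tail rests on a per-request latency breakdown. A minimal Python sketch of such a summary follows; the component names and the per-request dictionary layout are hypothetical, not the thesis's benchmark format, and end-to-end latency is assumed to be the plain sum of components (no overlapping or parallel calls).

```python
from __future__ import annotations

import statistics


def percentile(samples: list[float], p: int) -> float:
    """p-th percentile (1 <= p <= 99) via the 'inclusive' quantile method."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[p - 1]


def latency_breakdown(requests: list[dict[str, float]]) -> dict:
    """Summarize end-to-end latency and each component's median.

    Each request maps a component name (hypothetical, e.g. "function" or
    "external_call") to its latency in milliseconds; end-to-end latency is
    assumed to be the sum of the components.
    """
    e2e = [sum(r.values()) for r in requests]
    components = requests[0].keys()
    return {
        "e2e_median": statistics.median(e2e),
        "e2e_p99": percentile(e2e, 99),
        "component_medians": {
            c: statistics.median(r[c] for r in requests) for c in components
        },
    }
```

Comparing `e2e_median` with `component_medians` shows which component dominates typical latency, while `e2e_p99` exposes the tail. In line with the thesis's methodology findings, the samples would come from repeated runs on the same instance rather than a single measurement.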

Chalmers Research

Enterprise Architecture: Enabling Digital Transformation for Operational Business Process during COVID-19

Author: Hardi Kori Viony
Legowo Nilo
Publication venue: Ital Publication
Publication date: 21/01/2023

The SARS-CoV-2 pandemic and the global response to contain its spread and deaths have been unprecedented, according to UNICEF research on COVID-19 released in 2021. Many steps had been taken by countries worldwide, particularly those in South Asia. As of May 17th, 2020, Indonesia had reported a total of 17,514 positive cases. The majority of cases across the archipelago have been confirmed to occur on Java, particularly in the Greater Jakarta, Greater Bandung, Semarang, Solo, and Greater Surabaya areas. The research object of this paper is a system integrator company located in Central Jakarta whose business has been badly impacted by the pandemic. Although the company provides nearly all ICT solutions, improving its own internal systems had never been raised as an issue. When physical distancing regulations led workers to work from home, the company began using email as its only tool for running the whole system in order to keep the business going; this proved ineffective and caused a crisis for the company. The purpose of this paper is to propose a digital transformation plan, using TOGAF ADM, as a solution that supports business continuity. DOI: 10.28991/HIJ-2023-04-01-01

HighTech and Innovation Journal

Security Enhancement Deploying SIEM in a Small ISP Environment

Author: Bělousov Petr
Publication venue: Vysoké učení technické v Brně. Fakulta podnikatelská
Publication date: 01/01/2019

This master's thesis focuses on improving security in the environment of a small internet service provider (ISP) by deploying a SIEM system. The available systems are compared and evaluated against the requirements of the commissioning company. The deployment of the selected SIEM system is designed, implemented, and evaluated in accordance with the company's unique environment.

Digital library of Brno University of Technology

National Repository of Grey Literature

Techno-economic evaluation of integrated process flowsheets for vinasse management with value addition for decision making

Author: Azegele Rony Mung'asia
Publication venue: Department of Chemical Engineering
Publication date: 03/05/2022

Bioethanol production through fermentation of sugarcane juice and its derivatives, such as molasses, is gaining popularity worldwide as focus shifts towards renewable energy production. However, ethanol fermentation produces large volumes of a dark brown, low-pH liquid waste termed vinasse. At a vinasse production rate of 12-15 liters per liter of ethanol, the sustainability of this bioprocess is impacted because effluent handling costs are high. If vinasse is disposed of on land, breakdown of the organic matter within it may release greenhouse gases into the atmosphere. Additionally, disposal into water bodies results in eutrophication due to the overload of plant nutrients (N, P and K). Further, owing to its high potassium content, dewatered vinasse used as an animal feed supplement has been shown to cause digestive tract problems in ruminants, depending on the supplementation rate (>10%). To increase the sustainability of bioethanol fermentation processes through combined treatment and resource recovery from vinasse, biological and physico-chemical processes have been developed and implemented in industry. Conventionally, raw vinasse is dewatered through multiple-effect evaporation (MEE) as a means of volume reduction. Membrane processes such as reverse osmosis (RO) have recently become popular as water recovery options for vinasse due to their process simplicity and lower equipment costs. The resulting concentrates from RO and MEE can be used as fertilizer. Due to its high organic content, vinasse is a suitable candidate for anaerobic digestion (AD), in which the organic matter is broken down into biogas and an effluent that can be safely used as fertilizer. Additionally, the biogas from AD may be harnessed for electricity generation through combined heat and power (CHP) processes or upgraded to biomethane as a substitute for natural gas.
For high-moisture substrates such as vinasse, upflow anaerobic sludge blanket (UASB) reactors are best suited because sludge residence time is prolonged, which increases contact time with the substrate and leads to higher methane yields. AD is often sensitive to changes in temperature, substrate composition, loading rate and pH. The presence of inhibitory components such as potassium salt ions (>11.6 g/L) in the vinasse feed reduces methanogenic activity, which manifests as reduced biogas and methane yields. Salt recovery processes, including electrodialysis and ion exchange, have been investigated in the literature on a pilot scale for the removal of K+ ions from raw vinasse. To improve resource productivity, integration of vinasse treatment processes has been implemented in industry. Integration combines biological and physico-chemical processes, which optimizes performance and energy efficiency and thereby improves the economic feasibility of projects. During the project conceptualization phase, process modelling is a vital tool for predicting outcomes such as substrate utilization rates, product yields and optimal operating conditions of integrated processes in a timely and cost-effective manner. In addition, techno-economic analyses can be used to identify cost-sensitive areas and determine the overall feasibility of the integrated processes. Having reviewed current industrial practices, this project sought to develop integrated flowsheets consisting of biological and physical processes for combined vinasse treatment and value creation. Value creation was demonstrated through the recovery of valuable products, including energy, salts and water, from the raw vinasse. Due to its simplicity and cost-effectiveness, AD was selected as the primary technology for vinasse treatment and biogas production. This was coupled with a combined heat and power system for electricity generation to form the base case flowsheet.
It was hypothesized that incorporating pre- and post-treatment as well as alternative biogas utilization processes into the base case flowsheet, for the recovery of salts and water, would generate additional revenue and cost savings. Profitability of the base case process was expected to increase with the additional pre- and post-treatments. To fulfil the stated objective and prove the hypothesis, a three-step research approach was taken. The first step involved simulation and benchmarking of the base case flowsheet (AD and CHP). Using techno-economic analyses, the effect on profitability of individually adding pre- and post-treatment options to the base case flowsheet was investigated. A framework was then developed to investigate the incorporation of combined pre- and post-treatment options into the base case flowsheet. Thereafter, a decision support tool for comparing various combinations of vinasse treatment routes in terms of process performance and profitability was developed to aid in the synthesis of vinasse treatment processes in industry. As bioprocess modelling is complex, it was important to select an appropriate simulation platform. Given its dedicated bioprocess compound database, sensitivity and optimization features, and flexible customization options, Aspen Plus was preferred as the primary simulation platform over SuperPro Designer and high-performance programming languages (C++, Java). In developing the base case AD flowsheet, several frameworks in the literature were considered. These included ADM1 (Batstone et al., 2002), ADM-3P (Ikumi et al., 2011) and a comprehensive model by Angelidaki et al. (1993). The presence of a well-defined stoichiometric framework motivated the decision to adopt the comprehensive model by Angelidaki et al. (1993). Using a combination of built-in unit operations and customized user models (calculator blocks), the AD model by Angelidaki et al. (1993) was implemented in Aspen Plus.
As ADM1 is considered an extension of the comprehensive model (Angelidaki et al., 1993), with several similarities, kinetic constants describing substrate uptake and microbial growth were adapted from ADM1. To ascertain the predictive quality of the built AD model, four case studies from the literature concerning the AD of manure (cow and swine) and municipal solid waste were simulated, and the predicted results were compared with the experimental results. The developed AD model accurately predicted the methane yields of the four case studies, as evidenced by an average difference of 10% between simulation and experimental results. A regression analysis between experimental and predicted data yielded an R² value of 0.74. Given the assumptions made in simplifying the developed model, this R² value was deemed acceptable and further affirmed the agreement between the model and experimental results. To investigate the robustness of the developed AD model, sensitivity analyses on the feed composition and organic loading were conducted. Increasing inhibitory compound concentrations above certain thresholds was shown to negatively impact methanogenic activity, as evidenced by decreasing methane yields. Although ammonia is inhibitory at concentrations above 0.22 g/L, it is an important nitrogen source for biomass growth. Similarly, while acetic acid is inhibitory to acetogenic microbes, it is a crucial substrate for the growth of methanogenic archaea and methane production. Inorganic salt inhibition, on the other hand, may be reduced through extraction of K2SO4 in pre-treatment processes. The compositional sensitivity analyses and the benchmarking study showed that the built AD model had a solid core framework that accurately predicted experimental data for a range of substrates.
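The two validation statistics above (an average difference of 10% and a regression R² of 0.74) can be computed as in the short Python sketch below. The abstract does not define them precisely, so two common choices are assumed: the mean absolute relative difference, and the R² of an ordinary least-squares line fitted between the two series (equal to their squared Pearson correlation). The function names are hypothetical, and the sketch carries none of the thesis's data.

```python
from __future__ import annotations

import statistics


def mean_relative_difference(simulated: list[float], experimental: list[float]) -> float:
    """Average of |simulated - experimental| / experimental, as a fraction."""
    return statistics.fmean(
        abs(s - e) / e for s, e in zip(simulated, experimental)
    )


def r_squared_linear_fit(x: list[float], y: list[float]) -> float:
    """R^2 of an ordinary least-squares line y = a + b*x.

    For a simple linear regression this equals the squared Pearson
    correlation between x and y.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)
```

Note that R² of a fitted line measures how linearly the predictions track the experiments, not whether they agree in absolute terms; that is why a relative-difference measure is reported alongside it.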
Combined with a simplified CHP model of a Jenbacher spark ignition engine (General Electric, 2008) to form the base case flowsheet, the built AD model was used for all further simulations in this work. To determine the financial standing of the base case, simulation and subsequent techno-economic analyses were conducted. At an industrial reactor capacity of 2000 m3 and a loading rate of 25 kgCOD/(m3·day), simulation of the base case process resulted in a methane yield of 45 L-CH4/kgVSadded and an electrical production capacity of 410 kW. Discounted cash flow analyses (USD, 2016) showed that the base case was not profitable within a 20-year project lifetime, as evidenced by the low return on investment (ROI) and internal rate of return (IRR). However, a further sensitivity analysis on the profitability of the base case showed that decreasing potassium ion concentrations in the feed would result in higher methane yields, and thus higher profitability, because of decreased K+ inhibition. Despite the positive effect of K+ removal on AD performance, further analyses were required to validate the feasibility of K2SO4 recovery processes, as well as water recovery processes aimed at further value creation from vinasse. To investigate the effect of pre-treatment on the base case flowsheet economics, an ion exchange process adapted from Zhang et al. (2012) was incorporated, because ion exchange exhibits a comparatively higher selectivity for K+ ions than ozonation and electrodialysis. As expected, CH4 yields (+14%) and electrical production improved, and profitability indicators consequently increased (>100%). However, the pre-treated base case (IEX-AD-CHP) remained unprofitable, which indicated that the marginal revenue from increased electrical production and K2SO4 sales did not cover the additional capital costs. To increase the profitability of the base case, biogas upgrading using a high-pressure water scrubbing (HPWS) system was used in place of the CHP.
Due to the comparatively low cost of HPWS equipment, coupled with increased revenue from biomethane sales, the AD-HPWS process exhibited higher profitability (ROI: 19.6%) than the base case (ROI: 0%). As evidenced by an IRR (16.3%) greater than the cost of capital (15%), the AD-HPWS option was profitable over a 20-year lifetime. Resource recovery from the AD effluent was sought by incorporating RO and MEE to form the AD-CHP-RO and AD-CHP-MEE routes. Most notably, cost savings increased significantly (by 170%) when RO and MEE concentrates were used as fertilizer instead of the raw AD effluent of the base case. Additional cost savings of up to $27 700 were achieved through upstream reintegration of RO permeate or MEE condensate water. These savings were based on a municipal water tariff of R5/kL. The combined cost savings increased the profitability of the base case, as evidenced by the increase in ROI from 0% to 3%. Potential knock-on effects of pre-treatments on the efficiency of post-treatment or biogas utilization processes were noted. These were investigated through the simultaneous addition of pre- and post-treatment combinations to the base case AD process to form a decision-making framework. Through techno-economic comparisons among the 12 distinct vinasse treatment routes resulting from combinations of pre- and post-treatment options in the decision-making framework, three major decision criteria were established. Despite the improved performance and methane yields observed with pre-treatment, the profitability of the AD-HPWS-RO/MEE processes declined owing to increased capital costs that remained unrecovered by the marginal revenue from biomethane sales. The opposite was observed for the AD-CHP-RO/MEE processes, as evidenced by a 20 to 30% increase in profitability indicators upon the addition of pre-treatment. This is attributed to the marginal revenue from increased electrical output as well as the cost savings from water reuse and RO/MEE concentrates. Because of the contrasting effect of pre-treatment on the profitability of CHP- and HPWS-based processes, the presence of inhibitory potassium ions was considered a decision criterion. Owing to the low cost of HPWS equipment, upgrading biogas to biomethane instead of using CHP exhibited higher performance (energy output) and profitability in all process combinations. This was evidenced by the higher ROI and IRR of the AD-HPWS, AD-HPWS-RO/MEE and IEX-AD-HPWS-RO/MEE process options compared with their CHP counterparts.
As a result, the choice of biogas utilization was considered an important decision criterion affecting profitability. Because of increased cost savings from upstream reintegration of water and the use of concentrates as fertilizer, implementing RO and MEE was observed to increase the profitability of all process options, including AD-CHP/HPWS and IEX-AD-CHP/HPWS. This was mainly due to cost savings from the use of RO and MEE concentrates as fertilizer ($250 000/yr) and from upstream reintegration of water. This led to the conclusion that the recovery of concentrates from vinasse is an important decision criterion when seeking to increase profitability and process sustainability. Overall, based on the techno-economic analyses, the most profitable vinasse treatment process consisted of an anaerobic digester coupled with a high-pressure water scrubbing system for biomethane production and a reverse osmosis process for water recovery (ROI: 22.9%, NPV: $540 000). This configuration facilitated both increased energy output from biomethane and cost savings from water reuse. Further research is recommended on the AD modelling aspect to extend its functionality to ionic speciation and pH prediction. It is also recommended that equipment quotes be sourced from suppliers within South Africa, rather than relying on costing heuristics from the literature, to increase the accuracy of capital and operating expenditure.
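The profitability comparisons above rely on discounted cash flow indicators. The Python sketch below shows how NPV and IRR are computed and why an IRR above the cost of capital (16.3% vs 15% for AD-HPWS) signals a profitable project: for a conventional cash flow (one up-front outlay, then positive returns), NPV at the cost of capital is then positive. The cash flows in the usage note are hypothetical, not the thesis's figures, and ROI is omitted because the abstract does not give its definition.

```python
from __future__ import annotations


def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] occurs at t = 0 (e.g. -capital)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(
    cash_flows: list[float],
    lo: float = -0.99,
    hi: float = 1.0,
    tol: float = 1e-9,
    max_iter: int = 200,
) -> float:
    """Internal rate of return (the rate where NPV = 0) by bisection.

    Assumes a conventional project with a single sign change, so that
    NPV decreases monotonically with the rate inside [lo, hi].
    """
    f_lo, f_hi = npv(lo, cash_flows), npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi, f_hi = mid, f_mid  # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return (lo + hi) / 2
```

For example, a hypothetical project costing 1000 up front and returning 170 per year for 20 years has a positive `npv(0.15, flows)`, and its `irr(flows)` lies above 15%: the two tests agree, as they do for any conventional cash flow.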

Cape Town University OpenUCT

Real-time performance diagnosis and evaluation of big data systems in cloud datacenters

Author: Demirbaga Umit
Publication venue: Newcastle University
Publication date: 01/01/2022

PhD Thesis

Modern big data processing systems are becoming very complex in terms of large scale, high concurrency and multiple tenants. Thus, many failures and performance reductions happen only at run-time and are very difficult to capture. Moreover, some issues may be triggered only when certain components are executed. To analyze the root cause of these types of issues, the dependencies of each component must be captured in real time. Big data processing systems, such as Hadoop and Spark, usually work in large-scale, highly concurrent, multi-tenant environments that can easily cause hardware and software malfunctions or failures, thereby leading to performance degradation. Several systems and methods exist to detect big data processing systems' performance degradation, perform root-cause analysis, and even overcome the issues causing such degradation. However, these solutions focus on specific problems such as stragglers and inefficient resource utilization. There is a lack of a generic and extensible framework to support the real-time diagnosis of big data systems. Performance diagnosis and prediction of big data systems are highly complex because these frameworks are typically deployed in cloud data centers that are large-scale, highly concurrent, and follow a multi-tenant model. Several factors, including hardware heterogeneity, stochastic networks and application workloads, may impact the performance of big data systems. The current state of the art does not sufficiently address the challenge of determining the complex, usually stochastic and hidden relationships between these factors.
To handle performance diagnosis and evaluation of big data systems in cloud environments, this thesis pursues multilateral research on monitoring, performance diagnosis and performance prediction in cloud-based large-scale distributed systems, combined with an effective and efficient deployment pipeline. The key contributions of this dissertation are:
• Designing a real-time big data monitoring system called SmartMonit that efficiently collects runtime system information, including computing resource utilization and job execution information, and then links the collected information to the execution graph, modeled as directed acyclic graphs (DAGs).
• Developing AutoDiagn, an automated real-time diagnosis framework for big data systems that automatically detects performance degradation and inefficient resource utilization problems, providing online detection and semi-online root-cause analysis.
• Designing a novel root-cause analysis technique/system called BigPerf for big data systems that analyzes and characterizes the performance of big data applications, incorporating Bayesian networks to determine uncertain and complex relationships between performance-related factors.

State of the Republic of Turkey and the Turkish Ministry of National Education
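The abstract names stragglers as one class of problem that such diagnosis targets but does not describe AutoDiagn's detection algorithm. As a hedged illustration of what straggler detection means, the Python sketch below applies a simple, commonly used heuristic: flag a task whose duration exceeds a multiple of its stage's median. The `Task` record, `find_stragglers`, and the default `factor=1.5` are hypothetical choices, not part of the thesis.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    stage: str
    duration_s: float


def find_stragglers(tasks: list[Task], factor: float = 1.5) -> list[Task]:
    """Flag tasks slower than `factor` x the median duration of their stage."""
    # Group tasks by stage: only tasks doing comparable work are compared.
    by_stage: dict[str, list[Task]] = {}
    for t in tasks:
        by_stage.setdefault(t.stage, []).append(t)

    stragglers: list[Task] = []
    for stage_tasks in by_stage.values():
        # The median is robust to the very outliers we are looking for.
        median = statistics.median(t.duration_s for t in stage_tasks)
        stragglers.extend(t for t in stage_tasks if t.duration_s > factor * median)
    return stragglers
```

A real-time monitor would run this over a sliding window of completed and in-progress tasks; a root-cause step would then ask why the flagged tasks were slow (node, data skew, contention), which is the harder problem the thesis addresses.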

Newcastle University eTheses

Performance Regression Detection in DevOps

Author: Bodík Peter
Chen Jinfu
Foo King Chun
Malik Haroon
Tan Jiaqi
Publication date: 02/10/2020

Performance is an important aspect of software quality. Performance goals are typically defined by setting upper and lower bounds on a system's response time and throughput, as well as on physical-level measurements such as CPU, memory, and I/O. To meet such performance goals, several performance-related activities are needed in development (Dev) and operations (Ops). Large software system failures are often due to performance issues rather than functional bugs. One of the most important performance issues is performance regression. Although not all performance regressions are bugs, they often have a direct impact on users' experience of the system. Detecting performance regressions in development and operations faces two challenges. First, performance regressions are detected after the fact, i.e., after the system is built and deployed in the field or in dedicated performance testing environments. Large amounts of resources are required to detect, locate, understand, and fix performance regressions at such a late stage in the development cycle. Second, even if a performance regression is detected, it is extremely hard to fix because other changes have been applied to the system since the regression was introduced. These challenges call for further in-depth analyses of performance regressions. In this thesis, to prevent performance regressions from slipping into operation, we first perform an exploratory study on the source code changes that introduce performance regressions in order to understand their root causes at the source code level. Second, we propose an approach that automatically predicts whether a test would manifest performance regressions in a code commit. Most performance issues are related to configuration. Therefore, third, we propose an approach that predicts whether a configuration option manifests a performance variation issue.
To assist practitioners in analyzing system performance with operational data, we propose an approach for recovering field-representative workloads that can be used to detect performance regressions.
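The abstract does not say how a performance regression is judged once measurements from two versions exist. One option is a non-parametric effect size such as Cliff's delta over response times, which makes no normality assumption about skewed latency data. The Python sketch below is an illustration under that assumption: `cliffs_delta`, `is_regression`, and the default `threshold=0.33` (an often-cited boundary between "small" and "medium" effects) are hypothetical choices, not the thesis's method.

```python
from __future__ import annotations


def cliffs_delta(old: list[float], new: list[float]) -> float:
    """Cliff's delta: P(new > old) - P(new < old) over all pairs.

    Ranges from -1 to 1; positive values mean the new version's response
    times tend to be larger (slower) than the old version's.
    """
    greater = sum(1 for n in new for o in old if n > o)
    less = sum(1 for n in new for o in old if n < o)
    return (greater - less) / (len(new) * len(old))


def is_regression(old: list[float], new: list[float], threshold: float = 0.33) -> bool:
    """Flag a regression when the slowdown effect size reaches `threshold`."""
    return cliffs_delta(old, new) >= threshold
```

The pairwise comparison is O(n·m), which is fine for test-sized samples; an effect size is used instead of a raw mean difference so that a few noisy outliers do not by themselves trigger an alert.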

Crossref

Concordia University Research Repository

Logging Statements Analysis and Automation in Software Systems with Data Mining and Machine Learning Techniques

Author: Gholamian Sina
Publication venue: University of Waterloo
Publication date: 13/01/2022

Log files are widely used to record runtime information of software systems, such as the timestamp of an event, the name or ID of the component that generated the log, and parts of the state of a task execution. The rich information in logs enables system developers (and operators) to monitor the runtime behavior of their systems and to track down system problems in development and production settings. With the ever-increasing scale and complexity of modern computing systems, the volume of logs is growing rapidly. For example, eBay reported that the rate of log generation on its servers was on the order of several petabytes per day in 2018 [17]. Therefore, the traditional way of log analysis, which relies largely on manual inspection (e.g., searching for error/warning keywords with grep), has become inefficient, labor-intensive, error-prone, and outdated. The growth of logs has spurred the emergence of automated tools and approaches for log mining and analysis. In parallel, embedding logging statements in the source code is a manual and error-prone task, and developers may forget to add a logging statement to the software's source code. To address the logging challenge, many efforts have aimed to automate logging statements in the source code, and many tools have been proposed to perform large-scale log file analysis using machine learning and data mining techniques. However, the current logging process is still mostly manual, and thus the proper placement and content of logging statements remain challenges. To overcome these challenges, methods that aim to automate log placement and content prediction, i.e., 'where and what to log', are of high interest. In addition, approaches that can automatically mine and extract insight from large-scale logs are also sought after.
Thus, in this research, we focus on predicting log statements, and for this purpose, we perform an experimental study on open-source Java projects. We introduce a log-aware code-clone detection method to predict the location and description of logging statements. Additionally, we incorporate natural language processing (NLP) and deep learning methods to further improve the prediction of log statements' descriptions. We also introduce deep learning-based approaches for automated analysis of software logs. In particular, we analyze execution logs and extract natural language characteristics of logs to enable the application of natural language models to automated log file analysis. Then, we propose automated tools for analyzing log files and measuring the information gain from logs for different log analysis tasks, such as anomaly detection. We then continue our NLP-enabled approach by leveraging state-of-the-art language models, i.e., Transformers, to perform automated log parsing.
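Log parsing turns raw log lines into templates (the constant message text) plus variable values. The thesis uses Transformer-based models for this; the Python sketch below is only a simple regex-masking baseline that shows what the task's output looks like. The masking rules and placeholder tokens (`<IP>`, `<HEX>`, `<NUM>`) are hypothetical choices, and the approach will miss variables that do not match a pattern (for example, user names).

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical masking rules; order matters (IP addresses before plain numbers).
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable-looking tokens with placeholders."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def count_templates(lines: list[str]) -> Counter[str]:
    """Group raw log lines by template; rare templates hint at anomalies."""
    return Counter(to_template(line) for line in lines)
```

For instance, `"Connection from 10.0.0.5:8080 closed after 35 ms"` maps to the template `"Connection from <IP> closed after <NUM> ms"`. Learned parsers aim to recover such templates without hand-written rules, which is what makes them attractive at eBay-scale log volumes.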

University of Waterloo's Institutional Repository