157 research outputs found
Detecting Abnormal Machine Characteristics in Cloud Infrastructures
In the cloud computing environment resources are accessed as services rather than as a product. Monitoring this system for performance is crucial because of typical pay-peruse packages bought by the users for their jobs. With the huge number of machines currently in the cloud system, it is often extremely difficult for system administrators to keep track of all machines using distributed monitoring programs such as Ganglia1 which lacks system health assessment and summarization capabilities. To overcome this problem, we propose a technique for automated anomaly detection using machine performance data in the cloud. Our algorithm is entirely distributed and runs locally on each computing machine on the cloud in order to rank the machines in order of their anomalous behavior for given jobs. There is no need to centralize any of the performance data for the analysis and at the end of the analysis, our algorithm generates error reports, thereby allowing the system administrators to take corrective actions. Experiments performed on real data sets collected for different jobs validate the fact that our algorithm has a low overhead for tracking anomalous machines in a cloud infrastructure
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
Historically, high energy physics computing has been performed on large
purpose-built computing systems. These began as single-site compute facilities,
but have evolved into the distributed computing grids used today. Recently,
there has been an exponential increase in the capacity and capability of
commercial clouds. Cloud resources are highly virtualized and intended to be
able to be flexibly deployed for a variety of computing tasks. There is a
growing nterest among the cloud providers to demonstrate the capability to
perform large-scale scientific computing. In this paper, we discuss results
from the CMS experiment using the Fermilab HEPCloud facility, which utilized
both local Fermilab resources and virtual machines in the Amazon Web Services
Elastic Compute Cloud. We discuss the planning, technical challenges, and
lessons learned involved in performing physics workflows on a large-scale set
of virtualized resources. In addition, we will discuss the economics and
operational efficiencies when executing workflows both in the cloud and on
dedicated resources.Comment: 15 pages, 9 figure
Recommended from our members
Remedying Security Concerns at an Internet Scale
The state of security across the Internet is poor, and it has been so since the advent of the modern Internet. While the research community has made tremendous progress over the years in learning how to design and build secure computer systems, network protocols, and algorithms, we are far from a world where we can truly trust the security of deployed Internet systems. In reality, we may never reach such a world. Security concerns continue to be identified at scale through-out the software ecosystem, with thousands of vulnerabilities discovered each year. Meanwhile, attacks have become ever more frequent and consequential.As Internet systems will continue to be inevitably affected by newly found security concerns, the research community must develop more effective ways to remedy these issues. To that end, in this dissertation, we conduct extensive empirical measurements to understand how remediation occurs in practice for Internet systems, and explore methods for spurring improved remediation behavior. This dissertation provides a treatment of the complete remediation life cycle, investigating the creation, dissemination, and deployment of remedies. We start by focusing on security patches that address vulnerabilities, and analyze at scale their creation process, characteristics of the resulting fixes, and how these impact vulnerability remediation. We then investigate and systematize how administrators of Internet systems deploy software updates which patch vulnerabilities across the many machines they manage on behalf of organizations. Finally, we conduct the first systematic exploration of Internet-scale outreach efforts to disseminate information about security concerns and their remedies to system administrators, with an aim of driving their remediation decisions. Our results show that such outreach campaigns can effectively galvanize positive reactions.Improving remediation, particularly at scale, is challenging, as the problem space exhibits many dimensions beyond traditional computer technical considerations, including human, social, organizational, economic, and policy facets. To make meaningful progress, this work uses a diversity of empirical methods, from software data mining to user studies to Internet-wide network measurements, to systematically collect and evaluate large-scale datasets. Ultimately, this dissertation establishes broad empirical grounding on security remediation in practice today, as well as new approaches for improved remediation at an Internet scale
Thermal Neural Networks: Lumped-Parameter Thermal Modeling With State-Space Machine Learning
With electric power systems becoming more compact and increasingly powerful,
the relevance of thermal stress especially during overload operation is
expected to increase ceaselessly. Whenever critical temperatures cannot be
measured economically on a sensor base, a thermal model lends itself to
estimate those unknown quantities. Thermal models for electric power systems
are usually required to be both, real-time capable and of high estimation
accuracy. Moreover, ease of implementation and time to production play an
increasingly important role. In this work, the thermal neural network (TNN) is
introduced, which unifies both, consolidated knowledge in the form of
heat-transfer-based lumped-parameter models, and data-driven nonlinear function
approximation with supervised machine learning. A quasi-linear
parameter-varying system is identified solely from empirical data, where
relationships between scheduling variables and system matrices are inferred
statistically and automatically. At the same time, a TNN has physically
interpretable states through its state-space representation, is end-to-end
trainable -- similar to deep learning models -- with automatic differentiation,
and requires no material, geometry, nor expert knowledge for its design.
Experiments on an electric motor data set show that a TNN achieves higher
temperature estimation accuracies than previous white-/grey- or black-box
models with a mean squared error of and a worst-case error of
at 64 model parameters.Comment: Preprint; Fix typos, streamline math. notation; 10 page
Towards Secure Cloud Orchestration for Multi-Cloud Deployments
Cloud orchestration frameworks are commonly used to deploy and operate cloud infrastructure. Their role spans both vertically (deployment on infrastructure, platform, application and microservice levels) and horizontally (deployments from many distinct cloud resource providers). However, despite the central role of orchestration, the popular orchestration frameworks lack mechanisms to provide security guarantees for cloud operators. In this work, we analyze the security landscape of cloud orchestration frameworks for multicloud infrastructure. We identify a set of attack scenarios, define security enforcement enablers and propose an architecture for a security-enabled cloud orchestration framework for multi-cloud application deployments
BEHAVIORAL CHARACTERIZATION OF ATTACKS ON THE REMOTE DESKTOP PROTOCOL
The Remote Desktop Protocol (RDP) is popular for enabling remote access and administration of Windows systems; however, attackers can take advantage of RDP to cause harm to critical systems using it. Detection and classification of RDP attacks is a challenge because most RDP traffic is encrypted, and it is not always clear which connections to a system are malicious after manual decryption of RDP traffic. In this research, we used open-source tools to generate and analyze RDP attack data using a power-grid honeypot under our control. We developed methods for detecting and characterizing RDP attacks through malicious signatures, Windows event log entries, and network traffic metadata. Testing and evaluation of our characterization methods on actual attack data collected by four instances of our honeypot showed that we could effectively delineate benign and malicious RDP traffic and classify the severity of RDP attacks on unprotected or misconfigured Windows systems. The classification of attack patterns and severity levels can inform defenders of adversarial behavior in RDP attacks. Our results can also help protect national critical infrastructure, including Department of Defense systems.DOE, Washington DC 20805Civilian, SFSApproved for public release. Distribution is unlimited
- …