Disk failure prediction based on multi-layer domain adaptive learning
Large-scale data storage is susceptible to failure. As disks are damaged and
replaced, traditional machine learning models, which rely on historical data to
make predictions, struggle to accurately predict disk failures. This paper
presents a novel method for predicting disk failures based on multi-layer
domain-adaptive learning. First, disk data with numerous faults is selected as
the source domain, and disk data with fewer faults as the target domain. A
feature extraction network is then trained on both domains; contrasting the
two domains facilitates the transfer of diagnostic knowledge from the source
domain to the target domain. Experimental results demonstrate that the
proposed technique produces a reliable prediction model and improves the
ability to predict failures on disk data with few failure samples
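The abstract does not specify the adaptation objective used to align the two domains. A common choice in domain-adaptive training is maximum mean discrepancy (MMD), which penalizes the distance between source and target feature distributions; the following NumPy sketch is illustrative only, not the paper's implementation:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel between rows of a and b.
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def mmd2(source, target, gamma=1.0):
    """Squared maximum mean discrepancy between two feature batches.
    Smaller values mean the source and target feature distributions
    are better aligned, which is the goal of domain-adaptive training."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2 * k_st

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 8))  # features from many-fault disks
tgt = rng.normal(0.5, 1.0, size=(64, 8))  # features from few-fault disks
print(mmd2(src, tgt) > mmd2(src, src))    # shifted domains score higher
```

In a full pipeline such a term would be added to the classification loss, so the feature extractor learns representations that both predict failures and look similar across domains.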
An improved CTGAN for data processing method of imbalanced disk failure
Disks generate little failure data, and the number of normal samples far
exceeds the number of failure samples. Existing Conditional Tabular Generative
Adversarial Network (CTGAN) deep learning methods have been proven effective
at addressing imbalanced disk failure data, but CTGAN cannot learn the
internal structure of the failure data well. This paper proposes a fault
diagnosis method based on an improved CTGAN, named Residual Conditional
Tabular Generative Adversarial Networks (RCTGAN), which adds a classifier for
category-specific discrimination and a discriminator built on a residual
network; the residual network is used to enhance the stability of the system.
First, RCTGAN uses a small amount of real failure data to synthesize fake
failure data; then, the synthesized data is mixed with the real data to
balance the numbers of normal and failure samples; finally, four classifier
models (multilayer perceptron, support vector machine, decision tree, random
forest) are trained on the balanced data set, and their performance is
evaluated using the G-mean. The experimental results show that the data
synthesized by RCTGAN further improves the fault diagnosis accuracy of the
classifiers
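The G-mean used for evaluation is the geometric mean of per-class recall (sensitivity and specificity in the binary case), which stays low whenever either class is poorly recognized. A minimal sketch of how it might be computed for binary disk labels:

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on failures) and
    specificity (recall on healthy disks), for binary labels
    where 1 = failure and 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# On an imbalanced set, predicting "healthy" for everything gets 95%
# accuracy but a G-mean of 0, exposing the useless classifier.
y_true = [0] * 95 + [1] * 5
print(g_mean(y_true, [0] * 100))  # 0.0
print(g_mean(y_true, y_true))     # 1.0
```

This is why G-mean, rather than plain accuracy, is the natural metric for imbalanced disk failure diagnosis.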
A Survey of Methods for Handling Disk Data Imbalance
Class imbalance exists in many classification problems, and because
classifiers are typically optimized for overall accuracy, imbalanced classes
cause the minority classes, which carry higher misclassification costs, to be
poorly classified. The Backblaze dataset, a widely used hard-disk dataset,
contains a small amount of failure data and a large amount of healthy data,
exhibiting serious class imbalance. This paper provides a comprehensive
overview of research on imbalanced data classification. The discussion is
organized into three main aspects: data-level methods, algorithm-level
methods, and hybrid methods. For each type of method, we summarize and analyze
the existing problems, algorithmic ideas, strengths, and weaknesses.
Additionally, the challenges of imbalanced data classification are discussed,
along with strategies to address them, making it convenient for researchers to
choose the appropriate method according to their needs
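As an illustration of the simplest data-level method such surveys cover, random oversampling duplicates minority-class samples until all classes are the same size (a sketch only; surveys of this area also cover more refined variants such as SMOTE and undersampling):

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (data-level balancing)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        picks = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(picks)
        out_y.extend([y] * target)
    return out_x, out_y

xs = [[i] for i in range(10)]
ys = [0] * 8 + [1] * 2           # 8 healthy vs 2 failed disks
bx, by = random_oversample(xs, ys)
print(by.count(0), by.count(1))  # 8 8
```

The trade-off noted in the literature is that exact duplication adds no new information and can encourage overfitting, which is what generative approaches like CTGAN aim to avoid.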
The 1993/1994 NASA Graduate Student Researchers Program
The NASA Graduate Student Researchers Program (GSRP) attempts to reach a culturally diverse group of promising U.S. graduate students whose research interests are compatible with NASA's programs in space science and aerospace technology. Each year we select approximately 100 new awardees based on competitive evaluation of their academic qualifications, their proposed research plan and/or plan of study, and their planned utilization of NASA research facilities. Fellowships of up to $22,000 are awarded for one year and are renewable, based on satisfactory progress, for a total of three years. Approximately 300 graduate students are, thus, supported by this program at any one time. Students may apply any time during their graduate career or prior to receiving their baccalaureate degree. An applicant must be sponsored by his/her graduate department chair or faculty advisor; this book discusses the GSRP in great detail
Internet Predictions
More than a dozen leading experts give their opinions on where the Internet is headed and where it will be in the next decade in terms of technology, policy, and applications. They cover topics ranging from the Internet of Things to climate change to the digital storage of the future. A summary of the articles is available in the Web extras section
The 1991/92 graduate student researchers program, including the underrepresented minority focus component
The Graduate Student Researchers Program (GSRP) was expanded in 1987 to include the Underrepresented Minority Focus Component (UMFC). This program was designed to increase minority participation in graduate study and research, and ultimately, in space science and aerospace technology careers. This booklet presents the areas of research activities at NASA facilities for the GSRP and summarizes and presents the objectives of the UMFC
The 1995 NASA guide to graduate support
The future of the United States is in the classrooms of America and tomorrow's scientific and technological capabilities are derived from today's investments in research. In 1980, NASA initiated the Graduate Student Researchers Program (GSRP) to cultivate additional research ties to the academic community and to support promising students pursuing advanced degrees in science and engineering. Since then, approximately 1300 students have completed the program's requirements. In 1987, the program was expanded to include the Underrepresented Minority and Disabled Focus (UMDF) Component. This program was designed to increase participation of underrepresented groups in graduate study and research and, ultimately, in space science and aerospace technology careers. Approximately 270 minority students have completed the program's requirements while making significant contributions to the nation's aerospace efforts. Continuing to expand fellowship opportunities, NASA announced the Graduate Student Fellowships in Global Change Research in 1990. Designed to support the rapid growth in the study of earth as a system, more than 250 fellowships have been awarded. And, in 1992, NASA announced opportunities in the multiagency High Performance Computing and Communications (HPCC) Program designed to accelerate the development and application of massively parallel processing. Approximately five new fellowships will be awarded yearly. This booklet will guide you in your efforts to participate in programs for graduate student support
Social network support for data delivery infrastructures
Network infrastructures often need to stage content so that it is accessible to consumers. The standard solution, deploying the content on a centralised server, can be inadequate in several situations.
Our thesis is that information encoded in social networks can be used to tailor content staging decisions to the user base and thereby build better data delivery infrastructures. This claim is supported by two case studies, which apply social information in challenging situations where traditional content staging is infeasible. Our approach works by examining empirical traces to identify relevant social properties, and then exploits them.
The first study looks at cost-effectively serving the "Long Tail" of rich-media user-generated content, which needs to be staged close to viewers to control latency and jitter. Our traces show that a preference for the unpopular tail items often spreads virally and is localised to some part of the social network. Exploiting this, we propose Buzztraq, which decreases replication costs by selectively copying items to locations favoured by viral spread. We also design SpinThrift, which separates popular and unpopular content based on the relative proportion of viral accesses, and opportunistically spins down disks containing unpopular content, thereby saving energy.
The second study examines whether human face-to-face contacts can efficiently create paths over time between arbitrary users. Here, content is staged by spreading it through intermediate users until the destination is reached. Flooding every node minimises delivery times but is not scalable. We show that the human contact network is resilient to individual path failures, and for unicast paths, can efficiently approximate flooding in delivery time distribution simply by randomly sampling a handful of paths found by it. Multicast by contained flooding within a community is also efficient. However, connectivity relies on rare contacts and frequent contacts are often not useful for data delivery.
Also, periods of similar duration could achieve different levels of connectivity; we devise a test to identify good periods. We finish by discussing how these properties influence routing algorithms. This work was supported by a St. John's College Benefactor's Scholarship and a Research Studentship from the Cambridge Philosophical Society
Data-Driven Intelligent Scheduling For Long Running Workloads In Large-Scale Datacenters
Cloud computing is becoming a fundamental facility of society today. Large-scale public and private cloud datacenters spanning millions of servers, acting as warehouse-scale computers, support most of the business of Fortune 500 companies and serve billions of users around the world. Unfortunately, modern industry-wide average datacenter utilization is as low as 6% to 12%. Low utilization not only hurts both the operational and capital components of cost efficiency, but also becomes a scaling bottleneck due to the limits on electricity delivered by nearby utilities. It is critical, and challenging, to improve multi-resource efficiency in global datacenters.
Additionally, with the great commercial success of diverse big data analytics services, enterprise datacenters are evolving to host heterogeneous computation workloads including online web services, batch processing, machine learning, streaming computing, interactive query and graph computation on shared clusters. Most of them are long-running workloads that leverage long-lived containers to execute tasks.
We survey datacenter resource scheduling work from the last 15 years. Most previous work is designed to maximize cluster efficiency for short-lived tasks in batch processing systems like Hadoop, and is not suitable for modern long-running workloads in systems such as microservices, Spark, Flink, Pregel, Storm, or TensorFlow. New, effective scheduling and resource allocation approaches are urgently needed to improve efficiency in large-scale enterprise datacenters.
This dissertation is the first work to define and identify the problems, challenges, and scenarios of scheduling and resource management for diverse long-running workloads in modern datacenters. These workloads rely on predictive scheduling techniques to perform reservation, auto-scaling, migration, and rescheduling, which pushes us to pursue more intelligent scheduling built on adequate predictive knowledge. We specify what intelligent scheduling is, what abilities are necessary to achieve it, and how it can be leveraged to transform NP-hard online scheduling problems into tractable offline ones.
We designed and implemented an intelligent cloud datacenter scheduler that automatically performs resource-to-performance modeling, predictive optimal reservation estimation, and QoS (interference)-aware predictive scheduling to maximize resource efficiency across multiple dimensions (CPU, memory, network, disk I/O) while strictly guaranteeing service level agreements (SLAs) for long-running workloads.
Finally, we introduce large-scale co-location techniques for executing long-running and other workloads on the shared global datacenter infrastructure of Alibaba Group, which improve cluster utilization from 10% to an average of 50%. This goes far beyond scheduling, involving the evolution of IDC design, networking, physical datacenter topology, storage, server hardware, operating systems, and containerization. We demonstrate its effectiveness through analysis of the newest Alibaba public cluster trace, from 2017, and are the first to reveal a global view of the scenarios, challenges, and status of Alibaba's large-scale global datacenters, including big promotion events like Double 11.
Data-driven intelligent scheduling methodologies and effective infrastructure co-location techniques are critical and necessary to maximize multi-resource efficiency in modern large-scale datacenters, especially for long-running workloads
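The abstract does not detail the reservation estimator. One simple predictive-reservation idea consistent with the description is to reserve a high percentile of observed historical usage plus safety headroom, trading utilization against SLA risk (an illustrative sketch, not the dissertation's actual model):

```python
def predictive_reservation(usage_history, percentile=99, headroom=0.1):
    """Reserve capacity for a long-running workload: take a high
    percentile of historical usage and add headroom, so the
    reservation is tight (high utilization) but rarely violated (SLA)."""
    s = sorted(usage_history)
    # Nearest-rank percentile, clamped to valid indices.
    idx = max(0, min(len(s) - 1, round(percentile / 100 * len(s)) - 1))
    return s[idx] * (1 + headroom)

# Hourly CPU-core usage samples for a hypothetical long-running service.
history = [4.0, 4.2, 3.9, 4.1, 5.0, 4.3, 4.0, 6.5, 4.2, 4.1]
reserved = predictive_reservation(history)
print(reserved >= max(history))  # covers the observed peak with headroom
```

Lowering the percentile or headroom frees capacity for co-located batch work at the cost of a higher chance of SLA violation, which is exactly the trade-off a QoS-aware scheduler must manage.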