
    Disk failure prediction based on multi-layer domain adaptive learning

    Large-scale data storage is susceptible to failure. As disks are damaged and replaced, traditional machine learning models, which rely on historical data, struggle to predict disk failures accurately. This paper presents a method for predicting disk failures using multi-layer domain adaptive learning. First, disk data with many failure samples is selected as the source domain and disk data with few failure samples as the target domain. A feature extraction network is then trained on the two domains, and contrasting them transfers diagnostic knowledge from the source domain to the target domain. Experimental results demonstrate that the proposed technique yields a reliable prediction model and improves failure prediction on disk data with few failure samples.
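    The paper itself provides no code; the following is a minimal PyTorch sketch of the general idea it describes, training a shared multi-layer feature extractor on a fault-rich source domain and a fault-scarce target domain while a maximum mean discrepancy (MMD) penalty pulls the two feature distributions together. The network shapes, the linear-kernel MMD, and the loss weighting are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

def mmd(x, y):
    """Linear-kernel maximum mean discrepancy: squared distance between batch means."""
    return ((x.mean(dim=0) - y.mean(dim=0)) ** 2).sum()

class FeatureExtractor(nn.Module):
    """Shared multi-layer feature extractor for SMART-style disk attributes."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Classifier(nn.Module):
    """Healthy/failed head trained on labeled source-domain features."""
    def __init__(self, hidden=64):
        super().__init__()
        self.out = nn.Linear(hidden, 2)

    def forward(self, z):
        return self.out(z)

def train_step(extractor, classifier, opt, xs, ys, xt, lam=1.0):
    """One adaptation step: supervised loss on the source domain plus an
    MMD penalty that aligns source and target feature distributions."""
    opt.zero_grad()
    zs, zt = extractor(xs), extractor(xt)
    loss = nn.functional.cross_entropy(classifier(zs), ys) + lam * mmd(zs, zt)
    loss.backward()
    opt.step()
    return loss.item()
```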

    An improved CTGAN for data processing method of imbalanced disk failure

    This work addresses the problem of insufficient disk failure data and the imbalance between normal and failure samples. Conditional Tabular Generative Adversarial Networks (CTGAN) have been shown to be effective for imbalanced disk failure data, but CTGAN does not learn the internal structure of the failure data well. This paper proposes a fault diagnosis method based on an improved CTGAN: a classifier for category-specific discrimination is added, and the discriminator of the generative adversarial network is built on a residual network. We name it the Residual Conditional Tabular Generative Adversarial Network (RCTGAN). First, a residual network is used to enhance the stability of the system, and RCTGAN synthesizes fake failure data from a small amount of real failure data. Then, the synthesized data is mixed with the real data to balance the amount of normal and failure data. Finally, four classifiers (multilayer perceptron, support vector machine, decision tree, random forest) are trained on the balanced data set, and their performance is evaluated using G-mean. The experimental results show that the data synthesized by RCTGAN further improves the fault diagnosis accuracy of the classifiers.
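    RCTGAN itself does not appear to be publicly released; the sketch below uses the open-source ctgan package as a stand-in for the synthesis step and scikit-learn for the evaluation pipeline described in the abstract: synthesize failure rows, rebalance the training set, train the four classifiers, and score each with G-mean. Column names and hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from ctgan import CTGAN  # open-source CTGAN, used here as a stand-in for the paper's RCTGAN
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

def g_mean(y_true, y_pred):
    """G-mean = sqrt(sensitivity * specificity); robust to class imbalance."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))

def balance_with_gan(train_df, label_col="failure", epochs=300):
    """Fit a tabular GAN on the minority (failure) rows and synthesize enough
    fake failures to match the number of healthy rows."""
    failures = train_df[train_df[label_col] == 1]
    healthy = train_df[train_df[label_col] == 0]
    gan = CTGAN(epochs=epochs)
    gan.fit(failures.drop(columns=[label_col]))
    fake = gan.sample(len(healthy) - len(failures))
    fake[label_col] = 1
    return pd.concat([train_df, fake], ignore_index=True)

def evaluate(train_df, test_df, label_col="failure"):
    """Train the paper's four classifiers on the balanced set and report G-mean on a held-out test set."""
    balanced = balance_with_gan(train_df, label_col)
    X, y = balanced.drop(columns=[label_col]), balanced[label_col]
    Xt, yt = test_df.drop(columns=[label_col]), test_df[label_col]
    models = {
        "MLP": MLPClassifier(max_iter=500),
        "SVM": SVC(),
        "DecisionTree": DecisionTreeClassifier(),
        "RandomForest": RandomForestClassifier(),
    }
    return {name: g_mean(yt, m.fit(X, y).predict(Xt)) for name, m in models.items()}
```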

    A Survey of Methods for Handling Disk Data Imbalance

    Class imbalance exists in many classification problems, and because classifiers are typically optimized for overall accuracy, imbalanced classes can make the minority classes, which often carry higher misclassification costs, difficult to classify. The Backblaze dataset, a widely used hard disk dataset, contains a small amount of failure data and a large amount of healthy data and therefore exhibits serious class imbalance. This paper provides a comprehensive overview of research on imbalanced data classification, organized into three main aspects: data-level methods, algorithm-level methods, and hybrid methods. For each type of method, we summarize and analyze the existing problems, algorithmic ideas, strengths, and weaknesses. We also discuss the challenges of imbalanced data classification and strategies to address them, so that researchers can choose an appropriate method for their needs.
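    To make the survey's three families concrete, here is a minimal scikit-learn/imbalanced-learn sketch of one representative from each: SMOTE oversampling (data-level), class-weighted learning (algorithm-level), and a combination of the two (hybrid). The toy dataset and the choice of these particular techniques are illustrative assumptions; the survey covers many alternatives.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

# A toy stand-in for disk SMART data: roughly 1% "failure" rows, 99% "healthy" rows.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)

# Data-level: rebalance the training set (here, SMOTE oversampling) before fitting any model.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
data_level = RandomForestClassifier().fit(X_bal, y_bal)

# Algorithm-level: leave the data alone and make errors on the rare class costlier instead.
algo_level = RandomForestClassifier(class_weight="balanced").fit(X, y)

# Hybrid: partial resampling combined with a cost-sensitive learner.
X_mix, y_mix = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X, y)
hybrid = RandomForestClassifier(class_weight="balanced").fit(X_mix, y_mix)
```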

    The 1993/1994 NASA Graduate Student Researchers Program

    The NASA Graduate Student Researchers Program (GSRP) attempts to reach a culturally diverse group of promising U.S. graduate students whose research interests are compatible with NASA's programs in space science and aerospace technology. Each year we select approximately 100 new awardees based on competitive evaluation of their academic qualifications, their proposed research plan and/or plan of study, and their planned utilization of NASA research facilities. Fellowships of up to $22,000 are awarded for one year and are renewable, based on satisfactory progress, for a total of three years. Approximately 300 graduate students are, thus, supported by this program at any one time. Students may apply any time during their graduate career or prior to receiving their baccalaureate degree. An applicant must be sponsored by his/her graduate department chair or faculty advisor; this book discusses the GSRP in great detail

    Internet Predictions

    More than a dozen leading experts give their opinions on where the Internet is headed and where it will be in the next decade in terms of technology, policy, and applications. They cover topics ranging from the Internet of Things to climate change to the digital storage of the future. A summary of the articles is available in the Web extras section

    The 1991/92 graduate student researchers program, including the underrepresented minority focus component

    The Graduate Student Researchers Program (GSRP) was expanded in 1987 to include the Underrepresented Minority Focus Component (UMFC). This program was designed to increase minority participation in graduate study and research and, ultimately, in space science and aerospace technology careers. This booklet presents the areas of research activity at NASA facilities for the GSRP and summarizes the objectives of the UMFC.

    The 1995 NASA guide to graduate support

    The future of the United States is in the classrooms of America, and tomorrow's scientific and technological capabilities are derived from today's investments in research. In 1980, NASA initiated the Graduate Student Researchers Program (GSRP) to cultivate additional research ties to the academic community and to support promising students pursuing advanced degrees in science and engineering. Since then, approximately 1300 students have completed the program's requirements. In 1987, the program was expanded to include the Underrepresented Minority and Disabled Focus (UMDF) Component. This program was designed to increase participation of underrepresented groups in graduate study and research and, ultimately, in space science and aerospace technology careers. Approximately 270 minority students have completed the program's requirements while making significant contributions to the nation's aerospace efforts. Continuing to expand fellowship opportunities, NASA announced the Graduate Student Fellowships in Global Change Research in 1990; designed to support the rapid growth in the study of Earth as a system, this program has awarded more than 250 fellowships. And, in 1992, NASA announced opportunities in the multiagency High Performance Computing and Communications (HPCC) Program, designed to accelerate the development and application of massively parallel processing. Approximately five new fellowships will be awarded yearly. This booklet will guide you in your efforts to participate in programs for graduate student support.

    Data-Driven Intelligent Scheduling For Long Running Workloads In Large-Scale Datacenters

    Cloud computing is becoming a fundamental facility of today's society. Large-scale public and private cloud datacenters comprising millions of servers, operated as warehouse-scale computers, support most of the business of Fortune 500 companies and serve billions of users around the world. Unfortunately, the industry-wide average datacenter utilization is as low as 6% to 12%. Low utilization not only hurts the operational and capital components of cost efficiency but also becomes a scaling bottleneck because of the limits on electricity delivered by nearby utilities. Improving multi-resource efficiency across global datacenters is therefore critical and challenging. Additionally, with the commercial success of diverse big data analytics services, enterprise datacenters are evolving to host heterogeneous computation workloads, including online web services, batch processing, machine learning, streaming computing, interactive query, and graph computation, on shared clusters. Most of these are long-running workloads that use long-lived containers to execute tasks. We survey datacenter resource scheduling work over the last 15 years. Most previous work is designed to maximize cluster efficiency for short-lived tasks in batch processing systems such as Hadoop and is not suitable for modern long-running workloads in microservice, Spark, Flink, Pregel, Storm, or TensorFlow-like systems. New, effective scheduling and resource allocation approaches are urgently needed to improve efficiency in large-scale enterprise datacenters. In this dissertation, we are the first to define and identify the problems, challenges, and scenarios of scheduling and resource management for diverse long-running workloads in modern datacenters. These workloads rely on predictive scheduling techniques to perform reservation, auto-scaling, migration, or rescheduling, which pushes us toward more intelligent scheduling techniques built on adequate predictive knowledge. We specify what intelligent scheduling is, which abilities are necessary to achieve it, and how to leverage it to turn NP-hard online scheduling problems into solvable offline scheduling problems. We designed and implemented an intelligent cloud datacenter scheduler that automatically performs resource-to-performance modeling, predictive optimal reservation estimation, and QoS (interference)-aware predictive scheduling to maximize resource efficiency across multiple dimensions (CPU, memory, network, disk I/O) while strictly guaranteeing service level agreements (SLAs) for long-running workloads. Finally, we introduce large-scale co-location techniques for executing long-running and other workloads on the shared global datacenter infrastructure of Alibaba Group, which improve cluster utilization from 10% to roughly 50% on average. This goes far beyond scheduling and involves the evolution of IDC, networking, physical datacenter topology, storage, server hardware, operating systems, and containerization. We demonstrate its effectiveness through analysis of the latest Alibaba public cluster trace from 2017, and we are the first to reveal a global view of the scenarios, challenges, and status of Alibaba's large-scale global datacenters through data, including big promotion events like Double 11. Data-driven intelligent scheduling methodologies and effective infrastructure co-location techniques are critical and necessary to pursue maximum multi-resource efficiency in modern large-scale datacenters, especially for long-running workloads.
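    The dissertation's scheduler is not reproduced here; the following is a minimal, self-contained Python sketch of the general policy it describes: estimate a predictive reservation for a long-running workload from its observed usage, then place it on the node with the most headroom without pushing any resource dimension past a QoS/interference cap. The class names, the quantile-based estimator, and the 0.8 cap are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_cap: float
    mem_cap: float
    cpu_used: float = 0.0
    mem_used: float = 0.0

@dataclass
class Workload:
    name: str
    cpu_history: list  # observed CPU usage samples for this long-running container
    mem_history: list  # observed memory usage samples

def predict_reservation(samples, quantile=0.95):
    """Predictive reservation: size the container to a high quantile of its
    observed usage rather than to its worst case (an illustrative policy)."""
    s = sorted(samples)
    return s[min(int(quantile * len(s)), len(s) - 1)]

def place(workload, nodes, qos_cap=0.8):
    """QoS-aware placement: pick the node with the most remaining headroom,
    never driving any resource dimension above the interference cap."""
    cpu_need = predict_reservation(workload.cpu_history)
    mem_need = predict_reservation(workload.mem_history)
    best, best_headroom = None, -1.0
    for n in nodes:
        cpu_after = (n.cpu_used + cpu_need) / n.cpu_cap
        mem_after = (n.mem_used + mem_need) / n.mem_cap
        if cpu_after > qos_cap or mem_after > qos_cap:
            continue  # placement would violate the QoS/interference cap
        headroom = 1.0 - max(cpu_after, mem_after)
        if headroom > best_headroom:
            best, best_headroom = n, headroom
    if best is not None:
        best.cpu_used += cpu_need
        best.mem_used += mem_need
    return best
```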