46 research outputs found

    Data-level hybrid strategy selection for disk fault prediction model based on multivariate GAN

    Class imbalance is a common problem in classification, where minority-class samples are often more important and more costly to misclassify. Solving the imbalanced classification problem is therefore very important. The SMART dataset, a reliable indicator of a disk's health status, exhibits an evident class imbalance: it comprises a large number of healthy samples and comparatively few defective samples. In this paper, we balance the disk SMART dataset at the data level by mixing and integrating data synthesised by multivariate generative adversarial networks (GANs), obtaining the best balanced dataset for a specific classification model, and we combine this with genetic algorithms to obtain higher disk fault prediction accuracy on that model.

    A Survey of Methods for Handling Disk Data Imbalance

    Class imbalance exists in many classification problems, and since classifiers are typically designed for overall accuracy, imbalance among classes makes classification challenging, with the minority classes carrying higher misclassification costs. The Backblaze dataset, a widely used hard-disk dataset, contains a small amount of failure data and a large amount of healthy data, and so exhibits a serious class imbalance. This paper provides a comprehensive overview of research on imbalanced data classification. The discussion is organized into three main aspects: data-level methods, algorithm-level methods, and hybrid methods. For each type of method, we summarize and analyze the existing problems, algorithmic ideas, strengths, and weaknesses. We also discuss the challenges of imbalanced data classification, along with strategies to address them. This overview should make it easier for researchers to choose the appropriate method for their needs.

    Genome-wide analysis reveals four key transcription factors associated with cadmium stress in creeping bentgrass (Agrostis stolonifera L.)

    Cadmium (Cd) toxicity seriously affects the growth and development of plants, so studies on the uptake, translocation, and accumulation of Cd in plants are crucial for phytoremediation. However, the molecular mechanism of the plant response to Cd stress remains poorly understood. The main objective of this study was to reveal differentially expressed genes (DEGs) under lower (BT2_5) and higher (BT43) Cd concentration treatments in creeping bentgrass. A total of 463,184 unigenes were obtained from creeping bentgrass leaves using RNA sequencing technology. Observation of leaf tissue morphology showed that the higher Cd concentration damages leaf tissues. Four key transcription factor (TF) families, WRKY, bZIP, ERF, and MYB, are associated with Cd stress in creeping bentgrass. Our findings revealed that these four TF families play crucial roles in the creeping bentgrass response to Cd stress. This study focuses mainly on the molecular characteristics of DEGs under Cd stress, using transcriptomic analysis in creeping bentgrass. These results provide novel insight into the regulatory mechanisms of the response to Cd stress and enrich the information available for phytoremediation.

    A Systematic Study of Associations between Supernova Remnants and Molecular Clouds

    We systematically search for evidence of kinematic and spatial correlation of supernova remnant (SNR) and molecular cloud (MC) associations for nearly all SNRs in the coverage of the MWISP CO survey, i.e. 149 SNRs, 170 SNR candidates, and 18 pure pulsar wind nebulae (PWNe) in 1 deg < l < 230 deg and -5.5 deg < b < 5.5 deg. Based on high-quality and unbiased 12CO/13CO/C18O (J = 1-0) survey data, we apply automatic algorithms to identify broad lines and spatial correlations for molecular gas in each SNR region. Of the SNR-MC associations detected previously, 91% are identified in this paper by CO line emission. Overall, as many as 80% of SNRs could be associated with MCs. The proportion of SNRs associated with MCs is high at Galactic longitudes below ~50 deg. Kinematic distances of all SNRs that are associated with MCs are estimated based on the systemic velocities of the associated MCs. The radius of SNRs associated with MCs follows a lognormal distribution, which peaks at ~8.1 pc. The progenitor initial mass of these SNRs follows a power-law distribution with an index of ~-2.3, which is consistent with the Salpeter index of -2.35. We find that SNR-MC associations are mainly distributed in a thin disk along the Galactic plane, while a small fraction is distributed in a thick disk. For SNRs at heights below ~45 pc from the Galactic plane, the average radius is roughly flat with height, whereas above ~45 pc the average radius increases with height. Comment: 77 pages, 20 figures, 4 tables (with machine-readable versions), accepted for publication in ApJ

    Foxtail Mosaic Virus-induced Flowering Assays in Monocot Crops

    Virus-induced flowering (VIF) exploits RNA or DNA viruses to express flowering time genes to induce flowering in plants. Such plant virus-based tools have recently attracted widespread attention for their fundamental and applied uses in flowering physiology and in accelerating breeding in dicotyledonous crops and woody fruit trees. We now extend this technology to a monocot grass and a cereal crop. Using the Foxtail mosaic virus-based VIF system, dubbed FoMViF, we showed that expression of florigenic Flowering Locus T (FT) genes can promote early flowering and spikelet development in proso millet, a C4 grass species with potential for nutritional food and biofuel resources, and in non-vernalized C3 wheat, a major food crop worldwide. Floral and spikelet/grain induction in the two monocot plants was caused by the virally expressed untagged or FLAG-tagged FT orthologues, and the florigenic activity of rice Hd3a was more pronounced than that of its dicotyledonous counterparts in proso millet. The FoMViF system is easy to perform, and its efficacy in inducing flowering and early spikelet/grain production is high. In addition to proso millet and wheat, we envisage that FoMViF will also be applicable to many economically important monocotyledonous food and biofuel crops.

    Uncovering a Phenomenon of Active Hormone Transcriptional Regulation during Early Somatic Embryogenesis in Medicago sativa

    Somatic embryogenesis (SE) is a developmental process in which somatic cells undergo dedifferentiation to become plant stem cells, and redifferentiation to become a whole embryo. SE is a prerequisite for molecular breeding and is an excellent platform to study cell development in the majority of plant species. However, the molecular mechanism involved in the embryonic induction, embryonic, and maturation stages of M. sativa SE is unclear. This study was designed to examine the roles of differentially expressed genes (DEGs) and miRNAs during these three stages. The cut cotyledon (ICE), non-embryogenic callus (NEC), embryogenic callus (EC) and cotyledon embryo (CE) were selected for transcriptome and small RNA sequencing. The results showed that 17,251 DEGs, and 177 known and 110 novel miRNA families, were involved in embryonic induction (ICE to NEC), embryonic (NEC to EC), and maturation (EC to CE). Expression patterns and functional classification analysis showed several novel genes and miRNAs involved in SE. Moreover, embryonic induction is an active process of molecular regulation, and hormonal signal transduction-related pathways are involved throughout SE. Finally, a miRNA–target interaction network was proposed for M. sativa SE. This study provides novel perspectives for understanding the molecular mechanisms of M. sativa SE.

    High-Temperature Disaster Risk Assessment for Urban Communities: A Case Study in Wuhan, China

    High-temperature disaster, a common meteorological disaster, seriously affects people's productivity, life, and health. However, insufficient attention has been paid to this disaster in urban communities. To assess the risk of high-temperature disasters, this study uses remote sensing data and geographic information data to analyze 973 communities in downtown Wuhan with the geographically weighted regression method. First, the study evaluates the distribution characteristics of high temperatures in communities and explores the spatial differences of risks. Second, a metrics and weight system is constructed, from which the main factors are determined. Third, a risk assessment model of high-temperature disasters is established from disaster-causing danger, disaster-generating sensitivity, and disaster-bearing vulnerability. The results show that: (a) the significance of the impact of the built environment on high-temperature disasters is obviously different from its coefficient space differentiation; (b) the risk in the old city is high, whereas that in the area around the river is low; and (c) built environment optimization strategies should be designed specifically for each risk area. The significance of this study is that it develops a high-temperature disaster assessment framework for risk identification, impact differentiation, and difference optimization, and provides theoretical support for urban high-temperature disaster prevention and mitigation.