6 research outputs found

    A Study of Neural Collapse Phenomenon: Grassmannian Frame, Symmetry, Generalization

    In this paper, we extend the original Neural Collapse phenomenon by proving the Generalized Neural Collapse hypothesis. We obtain the Grassmannian Frame structure from the optimization and generalization of classification. This structure maximally separates the features of every two classes on a sphere and does not require a feature dimension larger than the number of classes. Out of curiosity about the symmetry of the Grassmannian Frame, we conduct experiments to explore whether models with different Grassmannian Frames perform differently. As a result, we discover the Symmetric Generalization phenomenon. We provide a theorem to explain the Symmetric Generalization of permutation. However, the question of why different directions of features can lead to such different generalization is still open for future investigation. Comment: 25 pages, 2 figures

    Towards Decision-Friendly AUC: Learning Multi-Classifier with AUCµ

    Area Under the ROC Curve (AUC) is a widely used ranking metric in imbalanced learning due to its insensitivity to label distributions. As a well-known multiclass extension of AUC, Multiclass AUC (MAUC, a.k.a. M-metric) measures the average AUC of multiple binary classifiers. In this paper, we argue that simply optimizing MAUC is far from enough for imbalanced multi-classification. More precisely, MAUC only focuses on learning scoring functions via ranking optimization, while leaving the decision process unconsidered. Therefore, scoring functions that are able to make good decisions might still perform poorly in terms of MAUC. To overcome this issue, we turn to AUCµ, another multiclass variant of AUC, which further takes the decision process into consideration. Motivated by this fact, we propose a surrogate risk optimization framework to improve model performance from the perspective of AUCµ. Practically, we propose a two-stage training framework for multi-classification, where in the first stage a scoring function is learned by maximizing AUCµ, and in the second stage we seek a decision function that improves the F1-metric via our proposed soft F1. Theoretically, we first provide sufficient conditions under which optimizing the surrogate losses leads to the Bayes optimal scoring function. Afterward, we show that the proposed surrogate risk enjoys a generalization bound of order O(1/√N). Experimental results on four benchmark datasets demonstrate the effectiveness of our proposed method in both AUCµ and F1-metric.
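
The abstract describes MAUC as the average AUC of multiple binary classifiers. A minimal sketch of that idea follows, assuming the pairwise (one-vs-one) averaging usually associated with the M-metric; the function names `binary_auc` and `mauc` are hypothetical, labels are assumed to be integers that index the score columns, and this is not the authors' implementation nor the AUCµ variant they propose.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mauc(labels, scores):
    """Average pairwise AUC over all unordered class pairs.

    Assumption: labels are integers 0..C-1 and scores[k][c] is the
    score that sample k receives for class c.
    """
    classes = sorted(set(labels))
    pairs = list(combinations(classes, 2))
    total = 0.0
    for i, j in pairs:
        # AUC of the class-i score at separating class i from class j ...
        a_ij = binary_auc(
            [s[i] for y, s in zip(labels, scores) if y == i],
            [s[i] for y, s in zip(labels, scores) if y == j],
        )
        # ... and of the class-j score at separating class j from class i.
        a_ji = binary_auc(
            [s[j] for y, s in zip(labels, scores) if y == j],
            [s[j] for y, s in zip(labels, scores) if y == i],
        )
        total += 0.5 * (a_ij + a_ji)
    return total / len(pairs)


# Toy data (made up for illustration): three classes, two samples each.
toy_labels = [0, 0, 1, 1, 2, 2]
toy_scores = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
    [0.2, 0.2, 0.6],
]
print(mauc(toy_labels, toy_scores))
```

Note that the function only ranks scores and never picks a predicted class, which is the gap between ranking quality and decision quality that the abstract points to.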

    eForecaster: Unifying Electricity Forecasting with Robust, Flexible, and Explainable Machine Learning Algorithms

    Electricity forecasting is crucial for scheduling and planning future electric load, so as to improve the reliability and safety of the power grid. Despite recent developments in forecasting algorithms in the machine learning community, there is a lack of general and advanced algorithms that specifically consider requirements from the power industry perspective. In this paper, we present eForecaster, a unified AI platform including robust, flexible, and explainable machine learning algorithms for diversified electricity forecasting applications. Since Oct. 2021, multiple commercial bus load, system load, and renewable energy forecasting systems built upon eForecaster have been deployed in seven provinces of China. The deployed systems consistently reduce the average Mean Absolute Error (MAE) by 39.8% to 77.0%, with reduced manual work and explainable guidance. In particular, eForecaster also integrates multiple interpretation methods to uncover the working mechanism of the predictive models, which significantly improves forecast adoption and user satisfaction.

    Genome-Wide Characterization Reveals Variation Potentially Involved in Pathogenicity and Mycotoxins Biosynthesis of Fusarium proliferatum Causing Spikelet Rot Disease in Rice

    Fusarium proliferatum is the primary cause of spikelet rot disease in rice (Oryza sativa L.) in China. The pathogen not only infects a wide range of cereals, causing severe yield losses, but also contaminates grains by producing various mycotoxins that are hazardous to humans and animals. Here, we report for the first time the whole-genome sequence of F. proliferatum strain Fp9, isolated from a rice spikelet. The genome was approximately 43.9 Mb with an average GC content of 48.28%, and it was assembled into 12 scaffolds with an N50 length of 4,402,342 bp. There is a close phylogenetic relationship between F. proliferatum and Fusarium fujikuroi, the causal agent of the bakanae disease of rice. An expansion of genes encoding cell wall-degrading enzymes and major facilitator superfamily (MFS) transporters was observed in F. proliferatum relative to other fungi with different nutritional lifestyles. Species-specific genes responsible for mycotoxin biosynthesis were identified among F. proliferatum and other Fusarium species. The expanded and unique genes are presumed to promote the adaptation of F. proliferatum and its rapid response during host infection. The high-quality genome of F. proliferatum strain Fp9 provides a valuable resource for deciphering the mechanisms of pathogenicity and secondary metabolism, and therefore sheds light on the development of disease management strategies and the detoxification of mycotoxin contamination for spikelet rot disease in rice.
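
The assembly statistics quoted in this abstract (N50 and GC content) follow standard definitions, sketched below. The abstract does not say which tool produced its figures, so the helper names `n50` and `gc_content` are hypothetical, the toy inputs are made up, and excluding ambiguous bases from the GC denominator is an assumption.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def gc_content(seq):
    """Percentage of G and C among unambiguous bases (assumption: N etc. are ignored)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0


# Toy scaffold lengths: total 290, half 145; 80 + 70 = 150 reaches it, so N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))
print(gc_content("GGCCAATT"))
```

A large N50 relative to genome size, as with 12 scaffolds for a genome of about 43.9 Mb, indicates that most of the assembly sits in a few long scaffolds.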

    Data_Sheet_1_Genomic footprints related with adaptation and fumonisins production in Fusarium proliferatum.docx

    Fusarium proliferatum is the principal etiological agent of rice spikelet rot disease (RSRD) in China, causing yield losses and fumonisin contamination in rice. The intraspecific variability and evolutionary pattern of the pathogen are poorly understood. Here, we performed whole-genome resequencing of 67 F. proliferatum strains collected from major rice-growing regions in China. Population structure analysis indicated that the eastern population of F. proliferatum, located along the Yangtze River, had high genetic diversity and a recombinant mode, and it was predicted to be the putative center of origin. The southern and northeastern populations were likely introduced into local populations through gene flow, and the genetic differentiation between them might have been shaped by rice-driven domestication. A total of 121 distinct genomic loci implicating 85 candidate genes were suggestively associated with variation in fumonisin B1 (FB1) production in a genome-wide association study (GWAS). We subsequently tested the function of five candidate genes (gabap, chsD, palA, hxk1, and isw2) mapped in our association study by quantifying FB1 in deletion strains, and the mutants showed an impact on FB1 production compared with the wild-type strain. Together, this is the first study to provide insights into the evolution and adaptation of natural populations of F. proliferatum on rice, as well as the complex genetic architecture of fumonisin biosynthesis.