    Fast Optimization Algorithms for AUC Maximization

    Stochastic optimization algorithms such as stochastic gradient descent (SGD) are well suited to large-scale data analysis because they update the model sequentially and at low per-iteration cost. Much of the existing work focuses on optimizing accuracy; however, accuracy is known to be an inappropriate measure for class-imbalanced data. Area under the ROC curve (AUC) is a standard metric for measuring classification performance in such situations. Developing stochastic learning algorithms that maximize AUC in lieu of accuracy is therefore of both theoretical and practical interest. However, AUC maximization presents a challenge because the learning objective function is defined over pairs of instances from opposite classes. Existing methods overcome this issue and achieve online processing, but at the cost of higher space and time complexity. In this thesis, we develop two novel stochastic algorithms for AUC maximization. The first is an online method referred to as SPAM. In comparison to the previous literature, the algorithm can be applied to non-smooth penalty functions while achieving a convergence rate of O(log T / T). The second is a batch learning method referred to as SPDAM. We establish a linear convergence rate for a sufficiently large batch size. We demonstrate the effectiveness of these algorithms on standard benchmark data sets as well as data sets for anomaly detection tasks.
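To illustrate why the AUC objective is defined over pairs of instances from opposite classes, here is a minimal sketch in plain Python. It is not the SPAM or SPDAM algorithm. The function names `pairwise_auc` and `pairwise_squared_loss` are hypothetical, and the squared surrogate is an assumed example of a convex pairwise loss; the abstract does not state which loss the thesis uses.

```python
def _split_by_class(scores, labels):
    """Separate scores into positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    return pos, neg


def pairwise_auc(scores, labels):
    """Empirical AUC: the fraction of (positive, negative) pairs in which
    the positive instance scores higher. Ties count as half a pair."""
    pos, neg = _split_by_class(scores, labels)
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def pairwise_squared_loss(scores, labels):
    """Assumed convex surrogate: mean of (1 - (s_pos - s_neg))^2 over all
    opposite-class pairs. Minimizing it pushes positives above negatives."""
    pos, neg = _split_by_class(scores, labels)
    total = sum((1.0 - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# Three of the four (positive, negative) pairs are ranked correctly.
print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Both functions loop over every opposite-class pair, so a naive evaluation costs time proportional to the number of positives times the number of negatives. An online learner that sees one instance at a time cannot form these pairs without storing past data, which is the space and time overhead the abstract attributes to existing methods.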

    Patterns of mega-forest fires in east Siberia will become less predictable with climate warming

    Very large fires covering tens to hundreds of hectares, termed mega-fires, have become a prominent feature of the fire regime in taiga forests worldwide, and in Siberia in particular. Here, we applied an array of machine learning algorithms and statistical methods to estimate the relative importance of various factors in observed patterns of Eastern Siberian fires mapped with satellite data. More specifically, we tested linkages of “hot spot” ignitions with 42 variables representing landscape characteristics, climatic factors, and anthropogenic factors such as human population density, locations of settlements, and road networks. Analysis of data spanning seventeen years (2001–2017) showed that during seasons of low or moderately high fire activity, models with the full set of variables predict the locations of fires with a very high probability (AUC = 95%). Sensitivity, or the ratio of correctly predicted fire pixels to the total number of pixels analyzed, declined to 30–40% during warm and dry years of increased fire activity, especially in models driven by anthropogenic variables only. This analysis demonstrates that if warming in Eastern Siberia continues, forest fires will become not only more frequent but also less predictable. We explain this by examining model performance as a function of either temperature or precipitation. This climate effect makes it nearly impossible to separate ignition points from locations that were burnt several hours or even several days earlier. An increase in secondary burnt locations makes it difficult for machine learning algorithms to establish causal links with anthropogenic and other groups of variables.