
    Qualitative Models of Climate Variations Impact on Crop Yields

    Get PDF
    This report presents an application of machine learning methodology to the modeling process. The objective is to identify and explain the impact of weather variations on crop yields, and to test the approach on agricultural and climatic data from the USA. First, weather and non-weather factors are separated by trend identification; two methods of trend identification are considered. Then the importance of the attributes is assessed using the information gain measure, and the attributes are also aggregated into seasons. Finally, four types of classification methods (support vector machines, a nearest-neighbors classifier, and two variants of decision rule induction) are applied to the data, and the results are compared and analyzed. The proposed approach differs from standard approaches to crop yield modeling: it does not require much expert knowledge, nor does it assume anything about the data distributions. All conclusions are drawn from the data, and the final model is built from data alone. This approach is much simpler, yet it maintains high accuracy and performance.
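
    As a rough illustration of the pipeline this abstract describes (detrend the yields, score attributes by information gain, then classify), here is a minimal Python sketch; the file name, column names, and classifier settings are assumptions for illustration, not taken from the report, and only two of the four classifier types are shown:

    # Minimal sketch, assuming a yearly table with weather attributes
    # and a "yield" column; all names here are hypothetical.
    import numpy as np
    import pandas as pd
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    def detrend_yield(years, yields):
        """Remove a linear technology trend so the residual reflects weather."""
        coeffs = np.polyfit(years, yields, deg=1)
        return yields - np.polyval(coeffs, years)

    df = pd.read_csv("crop_weather.csv")          # assumed file, one row per year
    residual = detrend_yield(df["year"], df["yield"])
    y = (residual > 0).astype(int)                # class: above/below trend
    X = df.drop(columns=["year", "yield"])

    # Attribute importance via information gain (mutual information).
    gain = mutual_info_classif(X, y, random_state=0)
    print(sorted(zip(X.columns, gain), key=lambda t: -t[1])[:5])

    # Two of the four classifier types mentioned in the abstract.
    for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
        scores = cross_val_score(clf, X, y, cv=5)
        print(type(clf).__name__, scores.mean())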

    Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families

    Full text link
    We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They posed the question of which families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families: exchangeability can occur only for three classes of natural exponential family distributions, namely the Gaussian, the Gamma, and the Tweedie exponential family of order 3/2. Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information. Comment: 23 pages
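
    For readers unfamiliar with the objects named in this abstract, the standard definitions are as follows (the notation is assumed here, not quoted from the paper):

    \[
      p_{\mathrm{SNML}}(x_t \mid x^{t-1})
        = \frac{\sup_{\theta} p_{\theta}(x^{t-1}, x_t)}
               {\int \sup_{\theta} p_{\theta}(x^{t-1}, x)\,\mathrm{d}x},
      \qquad
      \pi_{\mathrm{Jeffreys}}(\theta) \propto \sqrt{I(\theta)},
    \]

    where I(θ) is the Fisher information. The SNML strategy is exchangeable when the joint distribution its predictions induce on x_1, ..., x_t is invariant under permutations of the observations.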

    Learning to Crawl

    Full text link
    Web crawling is the problem of keeping a cache of webpages fresh, i.e., having the most recent copy available when a page is requested. This problem is usually coupled with the natural restriction that the bandwidth available to the web crawler is limited. The corresponding optimization problem was solved optimally by Azar et al. [2018] under the assumption that, for each webpage, both the elapsed time between two changes and the elapsed time between two requests follow a Poisson distribution with known parameters. In this paper, we study the same control problem but under the assumption that the change rates are unknown a priori, and thus we need to estimate them in an online fashion using only partial observations (i.e., single-bit signals indicating whether the page has changed since the last refresh). As a point of departure, we characterise the conditions under which one can solve the problem with such partial observability. Next, we propose a practical estimator and compute confidence intervals for it in terms of the elapsed time between observations. Finally, we show that the explore-and-commit algorithm achieves an $\mathcal{O}(\sqrt{T})$ regret with a carefully chosen exploration horizon. Our simulation study shows that our online policy scales well and achieves close to optimal performance for a wide range of parameters. Comment: Published at AAAI 202
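
    A minimal Python sketch of the estimation step in this setting: with refresh intervals w_i and single-bit signals c_i, a change rate λ gives P(c_i = 1) = 1 − exp(−λ w_i), so λ can be estimated by maximum likelihood. The estimator below is a plain MLE on simulated numbers; the paper's practical estimator and confidence intervals may differ:

    # Minimal sketch, assuming i.i.d. refresh intervals; a plain MLE,
    # not necessarily the paper's estimator.
    import numpy as np
    from scipy.optimize import minimize_scalar

    def estimate_change_rate(intervals, changed):
        """MLE of lambda from intervals w_i and change bits c_i,
        where P(c_i = 1) = 1 - exp(-lambda * w_i)."""
        w = np.asarray(intervals, dtype=float)
        c = np.asarray(changed, dtype=float)

        def neg_log_lik(lam):
            p = -np.expm1(-lam * w)               # 1 - exp(-lam*w), computed stably
            return -np.sum(c * np.log(p + 1e-12) - (1 - c) * lam * w)

        res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1e3), method="bounded")
        return res.x

    # Simulated check: true rate 0.5, random refresh intervals.
    rng = np.random.default_rng(0)
    w = rng.exponential(2.0, size=5000)
    c = rng.random(5000) < -np.expm1(-0.5 * w)
    print(estimate_change_rate(w, c))             # approx 0.5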

    ‘YOU’RE WRITING ABOUT WHOM?’ STUDYING POLITICAL AND POLICY HISTORY THROUGH THE LIVES OF SECONDARY FIGURES

    Get PDF
    Biography, once a denigrated field among academic historians, is undergoing a revival, at least according to a recent issue of the American Historical Review. For a long time, it has been enticing to write life stories of “the greats” – monarchs, presidents, and even dictators. More recently, with the emergence of social and cultural history, “the grunts” – ordinary people who made possible mass-based movements for change – have begun to receive their due. But what about those in between the greats and the grunts, particularly leaders in the contemporary era who never made it to the top of “the greasy pole,” to invoke Disraeli’s famous phrase? Drawing examples from my forthcoming biography of Paul V. McNutt, an American politician who helped shape events during the era of Franklin D. Roosevelt and Harry S. Truman, this article explores the merits and demerits of writing about secondary figures, that is, supporting actors in larger political dramas. Studies of such men and women make inviting topics – and a natural fit at university presses, especially state and regional ones – partly because their lives and impact have been overlooked by earlier scholars. Usually, secondary figures have left behind a cache of papers from which the historian can begin to reconstruct their stories. Biography as methodology remains inherently integrative; it allows one to combine traditional political history with more recent trends in historiography, such as emphases on the importance of gender, “place,” and the “internationalization” of history. At the same time, however, the anonymity of many secondary figures can prove frustrating as biographers struggle to explain, justify, and secure funding for their research topics. Tracking down sources can also be difficult, involving considerable time and expense in traveling to local, state, national, and overseas archives.

    Comparison of machine learning algorithms used to classify the asteroids observed by all-sky surveys

    Get PDF
    Context. Multifilter photometry from large sky surveys is commonly used to assign asteroid taxonomic types and to study various problems in planetary science. To maximize the science output of those surveys, it is important to use methods that best link the spectro-photometric measurements to asteroid taxonomy. Aims. We aim to determine which machine learning methods are the most suitable for taxonomic classification for various sky surveys. Methods. We utilized five supervised machine learning classifiers: logistic regression, naive Bayes, support vector machines (SVMs), gradient boosting, and multilayer perceptrons (MLPs). These methods were found to reproduce the Bus-DeMeo taxonomy at various rates depending on the set of filters used by each survey. We report several evaluation metrics for a comprehensive comparison (prediction accuracy, balanced accuracy, F1 score, and the Matthews correlation coefficient) for 11 surveys and space missions. Results. Among the methods analyzed, the multilayer perceptron and gradient boosting achieved the highest accuracy, and naive Bayes the lowest, in taxonomic prediction across all surveys. We found that selecting the right machine learning algorithm can improve the success rate by a factor of >2. The best balanced accuracy (approximately 85% for taxonomic type prediction) was found for the Visible and Infrared Survey Telescope for Astronomy (VISTA) and ESA Euclid mission surveys, whose broadband filters best map the 1 μm and 2 μm olivine and pyroxene absorption bands. Conclusions. To achieve the highest accuracy in taxonomic type prediction based on multifilter photometric measurements, we recommend gradient boosting and MLPs optimized for each survey. This can improve the overall success rate even when compared with naive Bayes. A merger of different datasets can further boost the prediction accuracy. For the combination of the Legacy Survey of Space and Time and the VISTA survey, we achieved 90% for taxonomic type prediction. Peer reviewed
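
    A minimal Python sketch of such a classifier comparison, using scikit-learn stand-ins for the five methods and the metrics the abstract reports; the synthetic dataset is a placeholder for the actual photometric colours and Bus-DeMeo labels:

    # Minimal sketch; synthetic data stands in for survey photometry.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import (balanced_accuracy_score, f1_score,
                                 matthews_corrcoef)
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    # Placeholder for colour features X and taxonomic labels y.
    X, y = make_classification(n_samples=600, n_classes=4,
                               n_informative=6, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    classifiers = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": GaussianNB(),
        "svm": SVC(),
        "gradient_boosting": GradientBoostingClassifier(),
        "mlp": MLPClassifier(max_iter=1000),
    }
    for name, clf in classifiers.items():
        pred = clf.fit(X_tr, y_tr).predict(X_te)
        print(name,
              balanced_accuracy_score(y_te, pred),
              f1_score(y_te, pred, average="macro"),
              matthews_corrcoef(y_te, pred))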