Qualitative Models of Climate Variations Impact on Crop Yields
This report presents an application of machine learning methodology to crop modeling. The objective is to identify and explain the impact of weather variations on crop yields, and to test the approach on agricultural and climatic data from the USA.
First, weather and non-weather factors are separated by trend identification; two methods of trend identification are considered. Then the importance of the attributes is assessed using the information gain measure, and the attributes are also aggregated into seasons. Finally, four types of classification methods (support vector machines, a nearest-neighbors classifier, and two variants of decision rule induction) are applied to the data, and the results are compared and analyzed.
The proposed approach differs from standard approaches to crop yield modeling: it requires little expert knowledge and makes no assumptions about the data distributions. All conclusions are drawn from the data, and the final model is built from the data alone. The approach is much simpler, yet it maintains high accuracy and performance.
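As an illustration of the pipeline described above, a minimal scikit-learn sketch of the attribute-ranking and classifier-comparison steps might look as follows. The input file, column names, and the decision tree (a stand-in for the report's two decision-rule induction variants, which have no direct scikit-learn counterpart) are assumptions for illustration, not details taken from the report.

```python
# Hedged sketch of the attribute-ranking and classifier-comparison steps.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier  # stand-in for rule induction

df = pd.read_csv("weather_yields.csv")            # hypothetical input file
X, y = df.drop(columns=["yield_class"]), df["yield_class"]

# Rank attributes by information gain (mutual information with the class).
gain = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
print(gain.sort_values(ascending=False).head(10))

# Compare classifiers by cross-validated accuracy.
models = {
    "SVM": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```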
Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families
We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They posed the question of which families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families: exchangeability can happen only for three classes of natural exponential family distributions, namely the Gaussian, the Gamma, and the Tweedie exponential family of order 3/2.
Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information
Comment: 23 pages
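For context, the SNML strategy at the center of this result has a standard definition, sketched below in our own notation (not reproduced from the paper): the next outcome is predicted in proportion to the maximized likelihood of the extended sequence.

```latex
% Sequential normalized maximum likelihood (SNML): given past outcomes
% x^n = (x_1, ..., x_n), predict x_{n+1} in proportion to the likelihood
% maximized over the parameter of the family.
p_{\mathrm{SNML}}(x_{n+1} \mid x^n)
  = \frac{\sup_{\theta} p_\theta(x_1, \dots, x_n, x_{n+1})}
         {\int \sup_{\theta} p_\theta(x_1, \dots, x_n, x) \,\mathrm{d}x}
```

Exchangeability of the strategy then means that the joint distribution obtained by chaining these conditionals is invariant under permutations of the outcomes, which is the property the paper characterizes.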
Learning to Crawl
Web crawling is the problem of keeping a cache of webpages fresh, i.e., having the most recent copy available when a page is requested. This problem is usually coupled with the natural restriction that the bandwidth available to the web crawler is limited. The corresponding optimization problem was solved optimally by Azar et al. [2018] under the assumption that, for each webpage, both the elapsed time between two changes and the elapsed time between two requests follow a Poisson distribution with known parameters. In this paper, we study the same control problem but under the assumption that the change rates are unknown a priori, and thus we need to estimate them in an online fashion using only partial observations (i.e., single-bit signals indicating whether the page has changed since the last refresh). As a point of departure, we characterise the conditions under which one can solve the problem with such partial observability. Next, we propose a practical estimator and compute confidence intervals for it in terms of the elapsed time between the observations. Finally, we show that the explore-and-commit algorithm achieves sublinear regret with a carefully chosen exploration horizon.
Our simulation study shows that our online policy scales well and achieves close to optimal performance for a wide range of the parameters.
Comment: Published at AAAI 2020
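To make the estimation problem concrete: under the Poisson-change model above, a page refreshed after a window of length w reports a change with probability 1 - exp(-λw), so the unknown rate can be fit by maximum likelihood from the single-bit signals alone. The sketch below is a generic numerical MLE under that assumption; the paper's practical estimator and its confidence intervals may differ.

```python
# Hedged sketch: MLE of a page's Poisson change rate from single-bit
# observations; not necessarily the estimator proposed in the paper.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(lam, w, z):
    """w[i]: time between refreshes i-1 and i; z[i]=1 iff a change was seen.
    P(change observed over a window w) = 1 - exp(-lam * w)."""
    p = -np.expm1(-lam * w)                  # 1 - exp(-lam*w), numerically stable
    return -(z * np.log(p) + (1 - z) * (-lam * w)).sum()

def estimate_rate(w, z):
    # Bounded 1-D search; the log-likelihood is concave in lam for this model.
    res = minimize_scalar(neg_log_likelihood, args=(w, z),
                          bounds=(1e-8, 1e3), method="bounded")
    return res.x

rng = np.random.default_rng(0)
true_lam = 0.5
w = rng.exponential(2.0, size=500)           # refresh intervals
z = (rng.random(500) < -np.expm1(-true_lam * w)).astype(float)
print(estimate_rate(w, z))                   # should land close to 0.5
```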
A Conceptual Framework for E-Branding Strategies in the Non-Profit Sector
Despite the economic significance non-profit organizations (NPOs) have acquired in recent years, the implications of the Internet for NPO marketing seem to have received only little attention from both researchers and practitioners. Although NPO marketing has been the subject of academic research for more than 30 years, NPO branding has been studied for just about 10 years (cf. Hankinson, 2001). Recent research on branding in the non-profit sector includes, for example, an assessment of the impact of brand orientation on non-profit performance (Hankinson, 2001; Hankinson, 2002) and the development of a non-profit brand orientation scale (Ewing and Napoli, 2005). In particular, the small body of literature on e-branding for NPOs calls for an investigation into e-branding strategies for NPOs. Rusch (2002) rhetorically asks if investing in a brand can be seen as a frivolous activity for NPOs in view of their not-for-profit mission, but he concludes that a better understanding of NPO branding will lead to a more effective use of their donations, which eventually furthers the NPOs' causes. Although the peculiar organizational structure of NPOs fosters creativity and innovation, they often lack the motivation to exploit these opportunities commercially. Also, it seems that NPOs have not yet seized the opportunity to fully integrate the Internet into their business processes with a view to enhancing their core competencies, even though NPOs - particularly educational institutions - were actually the first organizations to use the Internet (Clay, 2002). Based on the assumption that well thought-out strategies for internal and external communication will help NPOs to build such e-brands, this paper begins with an outline of relevant aspects of both NPOs and e-communication. The main argument put forward in this paper is that successful e-branding for NPOs is determined by the alignment of intra-organizational and external communication capabilities. The conceptual framework for NPO e-branding we arrived at is based on qualitative interviews with NPOs from different sectors and an examination of their public Web sites. The paper concludes with hands-on recommendations for NPO communication strategies and suggestions for further research.
“YOU’RE WRITING ABOUT WHOM?” STUDYING POLITICAL AND POLICY HISTORY THROUGH THE LIVES OF SECONDARY FIGURES
ABSTRACT: “You’re Writing About Who?”: Studying Political and Policy History through Secondary Figures.
Biography, once a denigrated field among academic historians, is undergoing a revival, at least according to a recent issue of the American Historical Review. For a long time, it has been enticing to write life stories of “the greats”: monarchs, presidents, and even dictators. More recently, with the emergence of social and cultural history, “the grunts”, ordinary people who made mass-based movements for change possible, have begun to receive their due. But what about those in between the greats and the grunts, particularly leaders in the contemporary era who never made it to the top of “the greasy pole,” to invoke Disraeli’s famous phrase? Drawing examples from my forthcoming biography of Paul V. McNutt, an American politician who helped shape events during the era of Franklin D. Roosevelt and Harry S. Truman, this article explores the merits and demerits of writing about secondary figures, that is, supporting actors in larger political dramas. Studies of such men and women make inviting topics, and a natural fit at university presses, especially state and regional ones, partly because their lives and impact have been overlooked by earlier scholars. Usually, secondary figures have left behind a cache of papers from which the historian can begin to reconstruct their stories. Biography as methodology remains inherently integrative; it allows one to combine traditional political history with more recent trends in historiography, such as emphases on the importance of gender, “place,” and the “internationalization” of history. At the same time, however, the anonymity of many secondary figures can prove frustrating as biographers struggle to explain, justify, and secure funding for their research topics. Tracking down sources also can be difficult, involving considerable time and expense in traveling to local, state, national, and overseas archives.
Comparison of machine learning algorithms used to classify the asteroids observed by all-sky surveys
Context. Multifilter photometry from large sky surveys is commonly used to assign asteroid taxonomic types and study various problems in planetary science. To maximize the science output of those surveys, it is important to use methods that best link the spectro-photometric measurements to asteroid taxonomy. Aims. We aim to determine which machine learning methods are the most suitable for taxonomic classification for various sky surveys. Methods. We utilized five supervised machine learning classifiers: logistic regression, naive Bayes, support vector machines (SVMs), gradient boosting, and multilayer perceptrons (MLPs). Those methods were found to reproduce the Bus-DeMeo taxonomy at various rates depending on the set of filters used by each survey. We report several evaluation metrics for a comprehensive comparison (prediction accuracy, balanced accuracy, F1 score, and the Matthews correlation coefficient) for 11 surveys and space missions. Results. Among the methods analyzed, multilayer perceptrons and gradient boosting achieved the highest accuracy and naive Bayes achieved the lowest accuracy in taxonomic prediction across all surveys. We found that selecting the right machine learning algorithm can improve the success rate by a factor of >2. The best balanced accuracy (~85% for taxonomic type prediction) was found for the Visible and Infrared Survey Telescope for Astronomy (VISTA) and the ESA Euclid mission surveys, where broadband filters best map the 1 μm and 2 μm olivine and pyroxene absorption bands. Conclusions. To achieve the highest accuracy in taxonomic type prediction based on multifilter photometric measurements, we recommend the use of gradient boosting and MLPs optimized for each survey. This can improve the overall success rate considerably compared with naive Bayes. A merger of different datasets can further boost the prediction accuracy. For the combination of the Legacy Survey of Space and Time and the VISTA survey, we achieved 90% for taxonomic type prediction.
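A minimal sketch of this kind of multi-metric comparison with scikit-learn is shown below; the synthetic features stand in for survey-specific broadband colours and Bus-DeMeo labels, and every name and parameter here is an illustrative assumption rather than the study's actual setup.

```python
# Hedged sketch of the multi-metric classifier comparison described above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, matthews_corrcoef)

# Synthetic stand-in for broadband colours and taxonomic-type labels.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "gradient boosting": GradientBoostingClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name:>20}  acc={accuracy_score(y_te, pred):.2f}  "
          f"bal={balanced_accuracy_score(y_te, pred):.2f}  "
          f"F1={f1_score(y_te, pred, average='macro'):.2f}  "
          f"MCC={matthews_corrcoef(y_te, pred):.2f}")
```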