8 research outputs found

    Would AI Stocks Estimate Be as Surprised to USDA Stocks Reports As Private Market Analysts?

    For the last fifty years, the USDA survey-based Quarterly Agriculture Stocks (QAS) reports have been the primary source of information on the relative supply of U.S. corn, soybeans, and wheat. Research has examined USDA stock reports and their relevance to the market (e.g., Isengildina-Massa et al., 2021). In addition, private industry analysts estimate expected quarterly grain stocks before USDA releases its reports, and market information firms such as Bloomberg and Reuters publish a subset of these estimates a few days before the USDA reports. Previous research has found that when industry analysts' stock expectations differ significantly from what USDA releases, market prices adjust rapidly toward the USDA survey findings. Many media outlets and previous studies attribute these differences in expectations, and the resulting price changes, to a market surprise (e.g., Karali et al., 2020).

    Market analysts, USDA officials, and researchers have offered four reasons for market surprises in the grain stocks reports. First, USDA surveys may not fully account for grain in transit at the time stocks are surveyed. Second, the market often uses weight (e.g., 60 lbs per bushel) to determine supply, while the survey asks how much volume (e.g., bushels) is on the farm or in commercial storage; when the average weight of a commodity deviates in a given season, surveyed stocks can differ from actual stocks by weight. Third, errors in estimating what portion of existing stocks came from old versus new crop production may cause surprises in the final annual report before the marketing year changes. For example, USDA asks how much old-crop corn is on hand on September 1st, even though some grain taken in by wholesalers by that date is new crop, and discrepancies arise when respondents cannot accurately segregate the new- and old-crop amounts. Fourth, USDA survey-based stock reports contain survey noise, and market analysts may fail to account for that noise in sequential estimates.

    This paper uses AI methods and large datasets on grain movement to understand the primary reason market analysts are frequently surprised by USDA QAS reports. Given the recent surge in grain movement data, available grain quality data, and data on the output of major grain demand sources, particularly at the state level, it is possible to use advances in analyzing high-dimensional data (e.g., random forests, gradient boosting) to develop an objective artificial intelligence (AI) market analyst. The paper explores additional public data sources related to commodity demand and supply in the corn, wheat, and soybean markets and applies AI techniques to determine whether data analytics improves the prediction of USDA's QAS reports for corn, soybeans, and wheat relative to market analysts' estimates. Our primary research objective is to determine whether AI can predict USDA's QAS estimates more accurately than the survey of market analysts historically provided by Bloomberg and Reuters. Our secondary objective is to decompose the surprise by its source. To this end, we use the Extreme Gradient Boosting (XGBoost) machine learning model to predict the stock estimates of the three major commodities (corn, soybeans, and wheat).
    We used grain stocks and production by state, carry-over stocks from the previous year, weekly grain loaded on trains and barges, weekly ethanol production, monthly grain crushed for ethanol, weekly accumulated exports, and market analysts' estimates from Bloomberg and Reuters, covering 2007 through the fourth quarter of 2022. We aggregated these features by quarter to form quarterly stock estimates. After assembling the features, we cross-checked the values against the national reports for the corresponding years and found them consistent, so the features reflect the actual quarterly values underlying the stock estimate. We also grouped each feature according to the 10 agricultural regions. Our machine learning algorithm indicates that production is the most important feature for estimating quarterly stocks, with carry-over and accumulated exports the second and third most important features of the model. We also found that ethanol production and grain exports have an inverse relationship with grain stocks each quarter.
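
The modeling step described above can be illustrated with a short, hypothetical sketch: fit an XGBoost regressor on quarterly features and rank them by importance. The file name, column names, and hyperparameters below are placeholders, not the authors' actual data or settings.

# Minimal sketch (not the authors' code) of fitting an XGBoost regressor to
# quarterly features and inspecting feature importance. The input file,
# column names, and hyperparameters are hypothetical placeholders.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Hypothetical quarterly feature table: one row per commodity-quarter.
df = pd.read_csv("quarterly_features.csv")
features = [
    "state_production", "carry_over_stock", "rail_barge_loadings",
    "ethanol_production", "accumulated_exports", "analyst_estimate",
]
X, y = df[features], df["usda_stock_estimate"]

# Hold out the most recent quarters rather than shuffling, since the data are a time series.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = xgb.XGBRegressor(
    n_estimators=500, max_depth=4, learning_rate=0.05, objective="reg:squarederror"
)
model.fit(X_train, y_train)

# Rank features by importance, analogous to the ordering reported above.
importance = sorted(zip(features, model.feature_importances_), key=lambda t: -t[1])
print(importance)
print("Test R^2:", model.score(X_test, y_test))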

    XGBoost: A Scalable Tree Boosting System

    Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression, and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
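
As a minimal illustration of the system described above, XGBoost accepts sparse (CSR) input directly, which is where its sparsity-aware split finding applies. The data and parameters below are synthetic placeholders, not the benchmarks from the paper.

# Minimal synthetic sketch: train XGBoost directly on a sparse matrix
# without densifying it. All data and settings here are made up.
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

rng = np.random.default_rng(0)
# Synthetic sparse feature matrix: most entries are absent.
X = sp.random(1000, 50, density=0.05, format="csr", random_state=0)
y = rng.normal(size=1000)

dtrain = xgb.DMatrix(X, label=y)  # CSR input is passed as-is
params = {"objective": "reg:squarederror", "max_depth": 4, "eta": 0.1}
booster = xgb.train(params, dtrain, num_boost_round=50)
print(booster.eval(dtrain))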

    Efficient second-order gradient boosting for conditional random fields

    Conditional random fields (CRFs) are an important class of models for accurate structured prediction, but effective design of the feature functions is a major challenge when applying CRF models to real-world data. Gradient boosting, which is used to automatically induce and select feature functions, is a natural candidate solution to the problem. However, it is non-trivial to derive gradient boosting algorithms for CRFs due to the dense Hessian matrices introduced by variable dependencies. Existing approaches thus use only first-order information when optimizing the likelihood and hence face convergence issues. We incorporate second-order information by deriving a Markov chain mixing-rate bound to quantify the dependencies and introduce a gradient boosting algorithm that iteratively optimizes an adaptive upper bound of the objective function. The resulting algorithm induces and selects features for CRFs via functional-space optimization, with provable convergence guarantees. Experimental results on three real-world datasets demonstrate that the mixing-rate-based upper bound is effective for learning CRFs with non-linear potentials.
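
For context, the "second-order information" here is the Hessian of the loss used in Newton-style boosting. A generic sketch of that objective (the standard formulation popularized by XGBoost, not the paper's CRF-specific derivation) is:

\[
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\Big[ g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2 \Big] + \Omega(f_t),
\qquad
g_i = \partial_{\hat{y}_i^{(t-1)}} \ell\big(y_i, \hat{y}_i^{(t-1)}\big),
\quad
h_i = \partial^2_{\hat{y}_i^{(t-1)}} \ell\big(y_i, \hat{y}_i^{(t-1)}\big),
\]

where \(f_t\) is the function added at iteration \(t\), \(\ell\) is a per-example loss, and \(\Omega\) is a regularizer. For a CRF likelihood, dependencies among output variables make the corresponding Hessian dense rather than per-example, which is why the paper replaces it with an adaptive upper bound derived from a Markov chain mixing-rate argument.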