
    A Comparative Study of Ensemble-based Forecasting Models for Stock Index Prediction

    Stock prices, as time series, are often non-linear and non-stationary. This paper presents an ensemble forecasting model that integrates Empirical Mode Decomposition (EMD) and its variant, Ensemble Empirical Mode Decomposition (EEMD), with an Artificial Neural Network (ANN) for short-term forecasts of a stock index. In the first stage, the data are decomposed into a smaller set of Intrinsic Mode Functions (IMFs) and a residue using EMD and EEMD. In the next stage, the IMFs and residue are taken as inputs for the ANN model to train and predict the future stock price. The methodology was tested with weekly Nifty data for a period of 8 years. The results suggest that the ensemble forecast model using aggregation of the decomposed series performs better than traditional ANN and Support Vector Regression models. Further, trading strategies based on EEMD-ANN models yielded better returns on investment than a Buy-and-Hold strategy.

    HotPoint: hot spot prediction server for protein interfaces

    The energy distribution along the protein–protein interface is not homogeneous; certain residues, called 'hot spots', contribute more to the binding free energy. Here, we present a web server, HotPoint, which predicts hot spots in protein interfaces using an empirical model. The empirical model incorporates a few simple rules consisting of occlusion from solvent and total knowledge-based pair potentials of residues. The prediction model is computationally efficient and achieves a high accuracy of 70%. The input to the HotPoint server is a protein complex and two chain identifiers that form an interface. The server provides the hot spot prediction results, a table of residue properties and an interactive 3D visualization of the complex with hot spots highlighted. Results are also downloadable as text files. This web server can be used to analyze any protein–protein interface and can be utilized by researchers working on binding site characterization and rational design of small molecules for protein interactions. HotPoint is accessible at http://prism.ccbb.ku.edu.tr/hotpoint

    Predicting crop yields and soil-plant nitrogen dynamics in the US Corn Belt

    We used the Agricultural Production Systems sIMulator (APSIM) to predict and explain maize and soybean yields, phenology, and soil water and nitrogen (N) dynamics during the growing season in Iowa, USA. Historical, current and forecasted weather data were used to drive simulations, which were released publicly four weeks after planting. In this paper, we (1) describe the methodology used to perform forecasts; (2) evaluate model prediction accuracy against data collected from 10 locations over four years; and (3) identify inputs that are key in forecasting yields and soil N dynamics. We found that the predicted median yield at planting was a very good indicator of end-of-season yields (relative root mean square error [RRMSE] of ~20%). For reference, the prediction at maturity, when all the weather was known, had an RRMSE of 14%. The good prediction at planting time was explained by the existence of shallow water tables, which decreased model sensitivity to unknown summer precipitation by 50–64%. Model initial conditions and management information accounted for one-fourth of the variation in maize yield. End-of-season model evaluations indicated that the model simulated well crop phenology (R² = 0.88), root depth (R² = 0.83), biomass production (R² = 0.93), grain yield (R² = 0.90), plant N uptake (R² = 0.87), soil moisture (R² = 0.42), soil temperature (R² = 0.93), soil nitrate (R² = 0.77), and water table depth (R² = 0.41). We concluded that model set-up by the user (e.g. inclusion of water table), initial conditions, and early season measurements are very important for accurate predictions of soil water, N and crop yields in this environment.
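
The abstract above reports accuracy as RRMSE but does not define it. A minimal sketch follows, assuming the common definition (RMSE divided by the mean of the observed values, expressed as a percentage); the function names and the yield numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rrmse(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the observed mean (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


# Hypothetical yields in Mg/ha, for illustration only.
observed = [10.0, 12.0, 11.0, 9.0]
predicted = [11.0, 11.0, 12.0, 10.0]

print(f"RMSE  = {rmse(observed, predicted):.2f} Mg/ha")
print(f"RRMSE = {rrmse(observed, predicted):.2f} %")
```

Normalizing by the observed mean is what makes errors comparable across variables with different units, such as grain yield and soil nitrate.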

    Maize Yield and Nitrate Loss Prediction with Machine Learning Algorithms

    Pre-season prediction of crop production outcomes such as grain yields and N losses can provide insights to stakeholders when making decisions. Simulation models can assist in scenario planning, but their use is limited because of data requirements and long run times. Thus, there is a need for more computationally expedient approaches to scale up predictions. We evaluated the potential of five machine learning (ML) algorithms as meta-models for a cropping systems simulator (APSIM) to inform future decision-support tool development. We asked: 1) How well do ML meta-models predict maize yield and N losses using pre-season information? 2) How many data are needed to train ML algorithms to achieve acceptable predictions? 3) Which input data variables are most important for accurate prediction? and 4) Do ensembles of ML meta-models improve prediction? The simulated dataset included more than 3 million genotype, environment and management scenarios. Random forests most accurately predicted maize yield and N loss at planting time, with an RRMSE of 14% and 55%, respectively. ML meta-models reasonably reproduced simulated maize yields but not N loss. They also differed in their sensitivities to the size of the training dataset. Across all ML models, yield prediction error decreased by 10–40% as the training dataset increased from 0.5 to 1.8 million data points, whereas N loss prediction error showed no consistent pattern. ML models also differed in their sensitivities to input variables. Averaged across all ML models, weather conditions, soil properties, management information and initial conditions were roughly equally important when predicting yields. Modest prediction improvements resulted from ML ensembles. These results can help accelerate progress in coupling simulation models and ML toward developing dynamic decision support tools for pre-season management.

    Evaluation of statistical and process-based models as nitrogen recommendation tools in maize production systems

    Optimizing nitrogen (N) management in maize (Zea mays L.) production systems is essential to ensure profitability, productivity, and environmental sustainability. However, it represents a challenge because N is highly mobile within the soil-plant-atmospheric system. Therefore, finding the optimum N rate for maize is a difficult task. The overall goal of this research was to evaluate crop model and statistical-based approaches to making N recommendations for maize and quantify prediction accuracy in two major maize production regions: Iowa, USA and Buenos Aires, Argentina. I addressed three questions: 1) how accurately can process-based modeling and statistical-based approaches simulate yields and optimal N rates, 2) how does the accuracy change when models are used as forecasting tools (with limited input data), and 3) which soil, crop, and atmospheric variables are most important to improve understanding of optimum N rate variability from year-to-year and from field-to-field? Data to test crop model predictions included yield response to N from a 16-year field experiment conducted in central Iowa, USA with two crop rotations totaling 31 N-trials. Data to test statistical models included a 5-year yield response to N from central-west Buenos Aires, Argentina with different rotations, soil properties, and landscape positions totaling 51 trials. The statistical-based approach predicted optimal N rates with higher accuracy than process-based models (root mean square error, RMSE, of 42 vs 62 kg N ha⁻¹, respectively). Yields that were predicted at the end of the season had an RMSE that ranged from 1 to 1.3 Mg ha⁻¹. The accuracy of yield predictions at planting decreased more for optimal N rates when using process-based models. Optimal N rate at planting was predicted with similar accuracy to that predicted at the end-of-season (RMSE 60 and 47 kg N ha⁻¹ for process- and statistical-based approach, respectively). Lastly, I found that the spring precipitation (April to June) and the precipitation events greater than 20 mm accumulated from planting to silking largely explained the variability in optimal N rates in both central Iowa and central-west Buenos Aires.

    KFC Server: interactive forecasting of protein interaction hot spots

    The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, or the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user-submitted protein–protein or protein–DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts if the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer able to quickly highlight predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org

    Correlation-Compressed Direct Coupling Analysis

    Learning Ising or Potts models from data has become an important topic in statistical physics and computational biology, with applications to predictions of structural contacts in proteins and other areas of biological data analysis. The corresponding inference problems are challenging since the normalization constant (partition function) of the Ising/Potts distributions cannot be computed efficiently on large instances. Different ways to address this issue have hence given rise to a substantial methodological literature. In this paper we investigate how these methods could be used on much larger datasets than studied previously. We focus on a central aspect, that in practice these inference problems are almost always severely under-sampled, and the operational result is almost always a small set of leading (largest) predictions. We therefore explore an approach where the data is pre-filtered based on empirical correlations, which can be computed directly even for very large problems. Inference is only used on the much smaller instance in a subsequent step of the analysis. We show that in several relevant model classes such a combined approach gives results of almost the same quality as the computationally much more demanding inference on the whole dataset. We also show that results on whole-genome epistatic couplings that were obtained in a recent computation-intensive study can be retrieved by the new approach. The method of this paper hence opens up the possibility to learn parameters describing pair-wise dependencies in whole genomes in a computationally feasible and expedient manner. Comment: 15 pages, including 11 figures
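
The pre-filtering idea in the abstract above can be illustrated with a small sketch. The abstract does not state the filtering rule, so this example assumes one simple possibility: rank all variable pairs by absolute Pearson correlation and keep only the variables that appear in the strongest pairs. The names `pearson`, `prefilter_columns` and `keep_pairs`, as well as the toy spin data, are hypothetical, and the subsequent inference step on the reduced instance is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def prefilter_columns(samples, keep_pairs):
    """Return sorted indices of variables found in the `keep_pairs` most correlated pairs."""
    n_vars = len(samples[0])
    columns = [[row[j] for row in samples] for j in range(n_vars)]
    scored = sorted(
        ((abs(pearson(columns[i], columns[j])), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    kept = set()
    for _, i, j in scored[:keep_pairs]:
        kept.update((i, j))
    return sorted(kept)


# Toy +/-1 spin samples (rows are samples, columns are variables); illustration only.
samples = [
    [1, 1, 1, -1],
    [1, 1, -1, -1],
    [-1, -1, 1, 1],
    [-1, -1, -1, 1],
    [1, 1, 1, 1],
    [-1, -1, -1, -1],
]

selected = prefilter_columns(samples, keep_pairs=1)
print("variables kept for the inference step:", selected)
```

Computing all pairwise correlations is quadratic in the number of variables but requires no partition function, which is why such a filter stays feasible where full inference does not.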

    Risk Assessment and Prediction of Aflatoxin in Agro-Products

    Aflatoxin (AFT), highly toxic and carcinogenic to humans, seriously threatens the consumption safety of agro-products. It is necessary to conduct risk assessment of aflatoxin contamination in agro-food products to identify critical control points (CCPs) and develop prediction, prevention and control theories and technologies. In this chapter, risk assessment and prediction of aflatoxin contamination in peanut are taken as an example. The values under the limit of detection (LOD) were replaced by zero, 1/2 LOD or LOD according to their respective proportion, and the distribution of values higher than the LOD was fitted by @RISK software. AFB1 dietary exposure was evaluated based on non-parametric probability risk assessment and margin of exposure (MOE). A risk ranking method was adopted for mycotoxins based on food risk expectation ranking. Spatial analysis of AFB1 contamination was conducted using a geographic information system (GIS). Average climatic conditions were calculated by the Thiessen polygon method, and the relationship between AFB1 concentration and average pre-harvest climatic conditions was obtained through multiple regression. To fulfill the purposes of reducing cost, increasing efficiency, maximizing the role of risk assessment and prediction, and improving the quality and safety of agricultural products, we will continuously focus on developing advanced and integrated technologies and solutions.