
    Predictive hypothesis identification

    While statistics focusses on hypothesis testing and on estimating (properties of) the true sampling distribution, in machine learning the performance of learning algorithms on future data is the primary issue. In this paper we bridge the gap with a general principle (PHI) that identifies hypotheses with the best predictive performance. This includes predictive point and interval estimation, simple and composite hypothesis testing, (mixture) model selection, and others as special cases. For concrete instantiations we recover well-known methods, variations thereof, and new ones. PHI nicely justifies, reconciles, and blends (a reparametrization-invariant variation of) MAP, ML, MDL, and moment estimation. One particular feature of PHI is that it can genuinely deal with nested hypotheses.

    Identification of potential serum peptide biomarkers of biliary tract cancer using MALDI MS profiling.

    The aim of this discovery study was to identify peptide serum biomarkers for detecting biliary tract cancer (BTC), using samples from healthy volunteers and benign cases of biliary disease as control groups. This work was based on the hypothesis that cancer-specific exopeptidase activities in serum can generate cancer-predictive peptide fragments from circulating proteins during coagulation.

    Bayesian threshold selection for extremal models using measures of surprise

    Statistical extreme value theory is concerned with the use of asymptotically motivated models to describe the extreme values of a process. A number of commonly used models are valid for observed data that exceed some high threshold. However, in practice a suitable threshold is unknown and must be determined for each analysis. While there are many threshold selection methods for univariate extremes, relatively few can be applied in the multivariate setting. In addition, there are only a few Bayesian methods, which are naturally attractive for modelling extremes because such data are scarce. We propose using Bayesian measures of surprise to determine suitable thresholds for extreme value models. Such measures quantify the level of support for the proposed extremal model and threshold without the need to specify any alternative models. This approach is easily implemented for both univariate and multivariate extremes. Comment: To appear in Computational Statistics and Data Analysis.

    Asset Pricing Theories, Models, and Tests

    An important but still partially unanswered question in the investment field is why different assets earn substantially different returns on average. Financial economists have typically addressed this question in the context of theoretically or empirically motivated asset pricing models. Since many of the proposed “risk” theories are plausible, a common practice in the literature is to take the models to the data and perform “horse races” among competing asset pricing specifications. A “good” asset pricing model should produce small pricing (expected return) errors on a set of test assets and should deliver reasonable estimates of the underlying market and economic risk premia. This chapter provides an up-to-date review of the statistical methods that are typically used to estimate, evaluate, and compare competing asset pricing models. The analysis also highlights several pitfalls in current econometric practice and offers suggestions for improving empirical tests.
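As a rough illustration of what “pricing (expected return) errors” means, the sketch below assumes a hypothetical single-factor model. Given each test asset's beta and average excess return, it estimates the risk premium lambda by an OLS cross-sectional regression through the origin (a simplification: many of the tests the chapter reviews include an intercept and several factors) and reports each asset's pricing error alpha_i = mean_return_i - beta_i * lambda. The function names and the numbers are illustrative only and are not taken from the chapter.

```python
import math


def cross_sectional_fit(betas, mean_returns):
    """OLS regression of average excess returns on betas, through the origin.

    Returns the estimated risk premium and the per-asset pricing errors.
    """
    if not betas or len(betas) != len(mean_returns):
        raise ValueError("betas and mean_returns must be non-empty and equal in length")
    # Slope of a no-intercept OLS fit: lambda = sum(b * r) / sum(b * b)
    lam = sum(b * r for b, r in zip(betas, mean_returns)) / sum(b * b for b in betas)
    # Pricing error: average return minus the model-implied expected return
    errors = [r - b * lam for b, r in zip(betas, mean_returns)]
    return lam, errors


def rms_pricing_error(errors):
    """Root-mean-squared pricing error across test assets."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical test assets: betas and average monthly excess returns (%).
betas = [0.8, 1.0, 1.2]
mean_returns = [0.45, 0.50, 0.58]
lam, errors = cross_sectional_fit(betas, mean_returns)
print(f"risk premium: {lam:.4f}")
print("pricing errors:", [round(e, 4) for e in errors])
print(f"RMS pricing error: {rms_pricing_error(errors):.4f}")
```

A model that prices the test assets well leaves these errors close to zero; comparing a summary such as the RMS error across specifications is one simple form of the “horse race” described above.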

    Weak signal identification with semantic web mining

    We investigate the automated identification of weak signals, in Ansoff's sense, to improve strategic planning and technological forecasting. The literature shows that weak signals can be found in an organization's environment and that they appear in different contexts. We use internet information to represent the organization's environment, and we select those websites that are related to a given hypothesis. In contrast to related research, we provide a methodology that uses latent semantic indexing (LSI) to identify weak signals. This improves on existing knowledge-based approaches because LSI takes aspects of meaning into account and is thus able to identify similar textual patterns in different contexts. We introduce a new weak signal maximization approach that replaces the prediction modeling approach commonly used in LSI. It makes it possible to compute the largest number of relevant weak signals, represented by singular value decomposition (SVD) dimensions. A case study identifies and analyses weak signals to predict trends in on-site medical oxygen production, supporting research and development (R&D) planning for a medical oxygen supplier. The results show that the proposed methodology enables organizations to identify weak signals from the internet for a given hypothesis, which helps strategic planners react ahead of time.
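The building block behind LSI is a truncated singular value decomposition of a term-document matrix; each retained singular vector is one of the “SVD dimensions” mentioned above. The sketch below is plain LSI, not the paper's weak signal maximization approach. It assumes a hypothetical four-term, three-document count matrix and extracts only the leading dimension by power iteration on A^T A, to stay dependency-free; the helper names are illustrative.

```python
import math


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def leading_svd_triplet(A, iters=200):
    """Largest singular value sigma and singular vectors u (terms) and
    v (documents) of A, found by power iteration on A^T A."""
    At = transpose(A)
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n  # uniform unit-length start vector
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no nonzero singular value reachable from the start vector")
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return sigma, u, v


# Hypothetical term-document counts: rows are terms, columns are documents.
terms = ["oxygen", "concentrator", "generator", "football"]
A = [
    [1, 1, 0],  # oxygen appears in documents 1 and 2
    [1, 0, 0],  # concentrator only in document 1
    [0, 1, 0],  # generator only in document 2
    [0, 0, 1],  # football only in document 3
]
sigma, u, v = leading_svd_triplet(A)
doc_scores = [sigma * vj for vj in v]  # document coordinates on this dimension
for term, weight in zip(terms, u):
    print(f"{term:12s} {weight:.3f}")
print("documents:", [round(s, 3) for s in doc_scores])
```

Here “concentrator” and “generator” receive equal weight on the leading dimension through their shared co-occurrence with “oxygen”, while the unrelated football document scores zero. A real LSI pipeline would keep several dimensions and use an optimized SVD routine rather than this rank-one loop.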