
    Perturbation selection and influence measures in local influence analysis

    Cook's [J. Roy. Statist. Soc. Ser. B 48 (1986) 133--169] local influence approach based on normal curvature is an important diagnostic tool for assessing the local influence of minor perturbations to a statistical model. However, no rigorous approach has been developed to address two fundamental issues: the selection of an appropriate perturbation and the development of influence measures for objective functions at a point with a nonzero first derivative. The aim of this paper is to develop a differential-geometrical framework for a perturbation model (called the perturbation manifold) and to utilize the associated metric tensor and affine curvatures to resolve these issues. We show that the metric tensor of the perturbation manifold provides important information for selecting an appropriate perturbation of a model. Moreover, we introduce new influence measures that are applicable to objective functions at any point. Examples including linear regression models and linear mixed models are examined to demonstrate the effectiveness of the new influence measures for the identification of influential observations. Comment: Published at http://dx.doi.org/10.1214/009053607000000343 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
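    The central object in this framework is the metric tensor of the perturbation manifold. Below is a minimal sketch of how this quantity is commonly written, with $\ell(\omega \mid Y)$ denoting the log-likelihood of the perturbed model and $\omega^{0}$ the null perturbation; the notation is assumed here rather than taken from the abstract.

```latex
% Fisher-information metric on the perturbation manifold (a sketch, not verbatim from the paper).
g_{ij}(\omega) \;=\; \mathrm{E}_{\omega}\!\left[
    \frac{\partial \ell(\omega \mid Y)}{\partial \omega_i}\,
    \frac{\partial \ell(\omega \mid Y)}{\partial \omega_j}
\right],
\qquad
G(\omega^{0}) \;=\; c\, I_{p} \quad \text{for some } c > 0 .
```

    Under this reading, a perturbation whose metric at the null point is proportional to the identity treats all perturbation components symmetrically, which is the sense in which the metric tensor guides the choice of an appropriate perturbation.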

    Generating Aspect-oriented Multi-document Summarization with Event-Aspect Model

    In this paper, we propose a novel approach to the automatic generation of aspect-oriented summaries from multiple documents. We first develop an event-aspect LDA model to cluster sentences into aspects, then use an extended LexRank algorithm to rank the sentences in each cluster, and finally apply Integer Linear Programming for sentence selection. Key features of our method include the automatic grouping of semantically related sentences and sentence ranking based on an extension of the random walk model. We also implement a new sentence compression algorithm that uses dependency trees instead of parse trees. We compare our method with four baseline methods, and quantitative evaluation based on the ROUGE metric demonstrates its effectiveness and advantages.
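    As an illustration of the ranking step, here is a minimal sketch of a LexRank-style random walk over a sentence-similarity graph; the matrix construction, damping value and convergence settings are assumptions for the sketch, not details from the paper.

```python
import numpy as np

def lexrank(similarity, damping=0.85, tol=1e-6, max_iter=100):
    """Rank sentences by the stationary distribution of a random walk on the
    sentence-similarity graph (the idea behind LexRank); illustrative sketch."""
    n = similarity.shape[0]
    # Row-normalise similarities into a transition matrix.
    row_sums = similarity.sum(axis=1, keepdims=True)
    transition = similarity / np.where(row_sums == 0, 1, row_sums)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_scores = (1 - damping) / n + damping * transition.T @ scores
        if np.abs(new_scores - scores).sum() < tol:
            break
        scores = new_scores
    return scores  # higher score = more central sentence in its cluster
```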

    Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    Feature selection is essential in the medical domain; however, the process is complicated by censoring, the distinctive characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, even though machine learning classifiers are often preferred; classifiers are rarely employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that employ machine learning classifiers, the partial logistic artificial neural network with automatic relevance determination is a well-known method that handles censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased predictions, especially for highly censored data, and other methods cannot cope with high censoring at all. Therefore, in this article, a new hybrid feature selection method is proposed that offers a solution to high levels of censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The new hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce the feature set. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used to construct a multicenter study evaluating the performance of the proposed approach. The results show that the proposed technique outperforms the individual classifiers and variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of the p-values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
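    The multiple-classifier wrapper idea can be sketched as follows, assuming scikit-learn estimators, simple (unweighted) majority voting, and a greedy forward search; the survival-metric weighting and the iterated filter stage described in the abstract are omitted, and all names here are illustrative.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def forward_select(X, y, max_features=10):
    """Greedy forward feature selection wrapped around an SVM/NN/KNN ensemble
    combined by simple majority vote (hard voting). Sketch only."""
    ensemble = VotingClassifier(
        estimators=[("svm", SVC()),
                    ("nn", MLPClassifier(max_iter=500)),
                    ("knn", KNeighborsClassifier())],
        voting="hard")  # simple majority vote
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        # Score each candidate feature when added to the current subset.
        scores = {f: cross_val_score(ensemble, X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:
            break  # no candidate improves the ensemble; stop
        best_score = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected
```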

    A Design Based New Reusable Software Process Model for Component Based Development Environment

    Software development is considered an important part of the software industry. Various metrics, algorithms and reusable process models have been designed, but the ultimate goal is to identify the one, whether a metric, an algorithm or a reusable software process model, that is optimal. For large applications, some components need to be built separately and others need to be modified according to the requirements when searching for the optimal components. Component-based software engineering is now considered the best approach for developing software at low cost, and this approach depends entirely on the optimal selection of components. The aim of this paper is to describe the characteristics of selected state-of-the-art CBSD models and to present a new reusable software process model designed for the optimal selection of components based on a new optimal selection algorithm.

    EVALUATING THE PREDICTIVE CAPABILITY OF NUMERICAL MODELS CONSIDERING ROBUSTNESS TO NON-PROBABILISTIC UNCERTAINTY IN THE INPUT PARAMETERS

    The paradigm of model evaluation is challenged by compensations between the various forms of errors and uncertainties that are inherent to the model development process due to, for instance, imprecise model input parameters, scarcity of experimental data, and lack of knowledge regarding an accurate mathematical representation of the system. When calibrating model input parameters based on fidelity to experiments, such compensations lead to non-unique solutions. In turn, the existence of non-unique solutions makes the selection and use of one 'best' numerical model risky. Therefore, it becomes necessary to evaluate model performance based not only on the fidelity of the predictions to experiments but also on the model's ability to satisfy fidelity threshold requirements in the face of uncertainties. The level of inherent uncertainty need not be known a priori, as the model's predictions can be evaluated for increasing levels of uncertainty, and a model form can be sought that yields the highest probability of satisfying a given fidelity threshold. Building on these concepts, this manuscript presents a probabilistic formulation of a robust-satisfying approach, along with its associated metric. This new formulation evaluates the performance of a model form based on the probability that the model predictions match experimental data within a predefined fidelity threshold when subject to uncertainty in the input parameters. The approach can be used to evaluate the robustness and fidelity of a numerical model as part of a model validation campaign, or to compare multiple candidate model forms as part of a model selection campaign. In this thesis, the conceptual framework and mathematical formulation of this new probabilistic treatment of the robust-satisfying approach are presented. The feasibility and application of the new approach are demonstrated on a structural steel frame with uncertain connection parameters subjected to static loading conditions.
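    A rough sketch of the probabilistic robustness metric described above is given below, assuming that interval (non-probabilistic) bounds on the inputs are explored by uniform Monte Carlo sampling; the sampling scheme and all function names are illustrative, not the thesis' formulation.

```python
import numpy as np

def prob_of_satisfying_fidelity(model, nominal, experiment, uncertainty,
                                threshold, n_samples=10_000, rng=None):
    """Estimate the probability that model predictions stay within a fidelity
    threshold of the experiment when inputs vary within +/- uncertainty around
    their nominal values. Uniform sampling of the interval is an assumption."""
    rng = np.random.default_rng(rng)
    nominal = np.asarray(nominal, dtype=float)
    lo, hi = nominal - uncertainty, nominal + uncertainty
    hits = 0
    for _ in range(n_samples):
        x = rng.uniform(lo, hi)                    # one realisation of the uncertain inputs
        hits += np.all(np.abs(model(x) - experiment) <= threshold)
    return hits / n_samples
```

    Sweeping `uncertainty` upward and re-evaluating this probability mirrors the idea of comparing model forms by how well they hold up as the assumed uncertainty level grows.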

    The Generalized DEA Model of Fundamental Analysis of Public Firms, with Application to Portfolio Selection

    Fundamental analysis is an approach for evaluating a public firm for its investment worthiness by looking at its business at the basic, or fundamental, financial level. The focus of this thesis is on utilizing financial statement data and a new generalization of Data Envelopment Analysis, termed the GDEA model, to determine a relative financial strength (RFS) indicator that represents the underlying business strength of a firm. The approach is based on maximizing a correlation metric between the GDEA-based score of financial strength and stock price performance. The correlation maximization problem is a difficult binary nonlinear optimization that requires iterative re-configuration of financial statement parameters as inputs and outputs. A two-step heuristic algorithm that combines random sampling and local search optimization is developed, and theoretical optimality conditions are derived for checking solutions of the GDEA model. Statistical tests are developed for validating the utility of the RFS indicator for portfolio selection, and the approach is computationally tested and compared with competing approaches. The GDEA model is further extended by incorporating Expert Information on input/output selection. In addition to deriving theoretical properties of the extended model, a new methodology is developed for testing whether such exogenous expert knowledge can be significant in obtaining stronger RFS indicators. Finally, the RFS approach under Expert Information is applied in a case study, involving more than 800 firms covering all sectors of the U.S. stock market, to determine optimized RFS indicators for stock selection. The selected stocks are then used within portfolio optimization models to demonstrate the superiority of the techniques developed in this thesis.
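    The two-step heuristic can be sketched generically, treating the GDEA scoring and correlation computation as a black-box `score` function over binary input/output configurations; all names and settings below are illustrative assumptions, not the thesis' algorithm.

```python
import random

def two_step_search(n_vars, score, n_samples=200, max_flips=50, rng=None):
    """Sketch of a two-step heuristic: (1) random sampling of binary
    input/output configurations, (2) greedy local search by single-bit flips
    from the best sample. `score` is assumed to return the correlation between
    the resulting GDEA financial-strength scores and price performance."""
    rng = rng or random.Random()
    # Step 1: random sampling of candidate configurations.
    best = max((tuple(rng.randint(0, 1) for _ in range(n_vars))
                for _ in range(n_samples)), key=score)
    best_val = score(best)
    # Step 2: local search over single-bit flips of the incumbent.
    for _ in range(max_flips):
        neighbours = [best[:i] + (1 - best[i],) + best[i + 1:] for i in range(n_vars)]
        cand = max(neighbours, key=score)
        if score(cand) <= best_val:
            break  # no improving neighbour; local optimum reached
        best, best_val = cand, score(cand)
    return best, best_val
```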

    Optimal Design of Validation Experiments for Calibration and Validation of Complex Numerical Models

    As prediction of the performance and behavior of complex engineering systems shifts from a primarily empirical approach to the use of complex physics-based numerical models, the role of experimentation is evolving to calibrate, validate, and quantify the uncertainty of the numerical models. Oftentimes these experiments are expensive, placing importance on selecting experimental settings that efficiently calibrate the numerical model with a limited number of experiments. The aim of this thesis is to reduce the experimental resources required to reach predictive maturity in complex numerical models by (i) aiding experimenters in determining the optimal settings for experiments, and (ii) aiding model developers in assessing the predictive maturity of numerical models through a new, more refined coverage metric. Numerical model predictions entail uncertainties, primarily caused by imprecisely known input parameter values, and biases, primarily caused by simplifications and idealizations in the model. Hence, calibration of numerical models involves not only updating parameter values but also inferring the discrepancy bias, or empirically trained error model. Training this error model throughout the domain of applicability becomes possible when experiments conducted at varying settings are available. Of course, for the trained discrepancy bias to be meaningful and a numerical model to be predictively mature, the validation experiments must sufficiently cover the operational domain; otherwise, poor training of the discrepancy bias and overconfidence in model predictions may result. Thus, coverage metrics are used to quantify the ability of a set of validation experiments to represent the entire operational domain. This thesis is composed of two peer-reviewed journal articles. The first article focuses on the optimal design of validation experiments. The ability to improve the predictive maturity of a plasticity material model is assessed for several index-based and distance-based batch sequential design selection criteria through a detailed analysis of discrepancy bias and coverage. Furthermore, the effects of experimental uncertainty, complexity of the discrepancy bias, and initial experimental settings on the performance of each criterion are evaluated. Lastly, a technique that integrates index-based and distance-based selection criteria to both exploit the available knowledge regarding the discrepancy bias and explore the operational domain is evaluated. This article was published in Structural and Multidisciplinary Optimization in 2013. The second article focuses on developing a coverage metric. Four characteristics of an exemplar coverage metric are identified, and the ability of coverage metrics from the literature to satisfy these four criteria is evaluated; no existing metric satisfies all four. As a solution, a new coverage metric is proposed that exhibits satisfactory performance on all four criteria. The performance of the proposed coverage metric is compared to the existing coverage metrics using an application to the plasticity material model as well as a high-dimensional Rosenbrock function. This article was published in Mechanical Systems and Signal Processing in 2014.
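    For intuition about what a distance-based coverage metric measures, a simple proxy is sketched below: the largest distance from any point of the (sampled) operational domain to its nearest validation experiment. This is an illustrative stand-in for the discussion of coverage, not the specific metric proposed in the thesis.

```python
import numpy as np

def coverage_gap(experiment_settings, domain_samples):
    """Largest nearest-neighbor distance from sampled operational-domain points
    to the validation experiments; a large value flags a poorly covered region.
    Illustrative proxy only."""
    E = np.asarray(experiment_settings, dtype=float)   # shape (n_experiments, dim)
    D = np.asarray(domain_samples, dtype=float)        # shape (n_domain_points, dim)
    # Pairwise distances from each domain sample to each experiment.
    dists = np.linalg.norm(D[:, None, :] - E[None, :, :], axis=-1)
    nearest = dists.min(axis=1)
    return nearest.max()
```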