Evaluating hedge fund performance: a stochastic dominance approach
We introduce a general and flexible framework for hedge fund performance evaluation and asset allocation: stochastic dominance (SD) theory. Our approach utilizes statistical tests for stochastic dominance to compare the returns of hedge funds. We form hedge fund portfolios using SD criteria and examine the out-of-sample performance of these portfolios. Compared with portfolios of randomly selected hedge funds and mean-variance efficient hedge funds, our results show that fund selection based on SD criteria greatly improves the performance of the hedge fund portfolio.
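The core SD criterion behind such comparisons can be sketched with empirical CDFs. This is an illustrative helper under simplifying assumptions (a plain second-order dominance check on two return samples), not the paper's formal statistical tests; the function name and tolerance are hypothetical:

```python
import numpy as np

def ssd_dominates(a, b, grid_size=200):
    """Check whether return sample `a` second-order stochastically
    dominates sample `b`, using empirical CDFs on a common grid.
    A SSD-dominates B iff the running integral of A's CDF never
    exceeds B's (illustrative sketch, not a formal test)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.linspace(min(a[0], b[0]), max(a[-1], b[-1]), grid_size)
    # empirical CDFs evaluated on the shared grid
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    # cumulative trapezoidal integrals of the CDFs
    int_a = np.concatenate([[0], np.cumsum((cdf_a[1:] + cdf_a[:-1]) / 2 * np.diff(grid))])
    int_b = np.concatenate([[0], np.cumsum((cdf_b[1:] + cdf_b[:-1]) / 2 * np.diff(grid))])
    return bool(np.all(int_a <= int_b + 1e-12))
```

A fund whose return distribution SSD-dominates another is preferred by every risk-averse investor, which is what makes the criterion attractive for portfolio formation.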
Uplift Modeling with Multiple Treatments and General Response Types
Randomized experiments have been used to assist decision-making in many
areas. They help people select the optimal treatment for the test population
with certain statistical guarantees. However, subjects can show significant
heterogeneity in response to treatments. The problem of customizing treatment
assignment based on subject characteristics is known as uplift modeling,
differential response analysis, or personalized treatment learning in the
literature. A key feature of uplift modeling is that the data is unlabeled. It
is impossible to know whether the chosen treatment is optimal for an individual
subject because response under alternative treatments is unobserved. This
presents a challenge to both the training and the evaluation of uplift models.
In this paper we describe how to obtain an unbiased estimate of the key
performance metric of an uplift model, the expected response. We present a new
uplift algorithm which creates a forest of randomized trees. The trees are
built with a splitting criterion designed to directly optimize their uplift
performance based on the proposed evaluation method. Both the evaluation method
and the algorithm apply to an arbitrary number of treatments and general response
types. Experimental results on synthetic data and industry-provided data show
that our algorithm leads to significant performance improvement over other
applicable methods.
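The unbiased expected-response estimate the abstract refers to can be sketched via inverse-propensity weighting over the randomized assignment: a subject contributes only when the logged treatment matches the policy's choice, reweighted by the known randomization probability. Function and argument names here are hypothetical, and the paper's estimator may differ in detail:

```python
def expected_response(policy, treatments, responses, features, propensities):
    """IPW estimate of the expected response if treatments were assigned
    by `policy`, using data from a randomized experiment.
    `policy(x)` returns a treatment index for features x;
    `propensities[t]` is the known randomization probability of t.
    (Illustrative sketch, not the paper's exact estimator.)"""
    total = 0.0
    for x, t, y in zip(features, treatments, responses):
        if policy(x) == t:
            total += y / propensities[t]
    return total / len(responses)
```

Because assignment is randomized with known probabilities, the reweighting makes the estimate unbiased even though each subject's response under the alternative treatments is never observed.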
Comparison between statistical and dynamical downscaling of rainfall over the Gwadar–Ormara basin, Pakistan
Abstract This paper evaluated and compared the performance of a statistical downscaling method and a dynamical downscaling method to simulate the spatial–temporal rainfall distribution. Outputs from the RegCM4 Regional Climate Model (RCM) and the CanESM2 Atmosphere–Ocean General Circulation Model (AOGCM) were selected for the data-scarce Gwadar–Ormara basin, Pakistan. The evaluation was based on the climatological average and standard deviation for historic (1971–2000) and future (2041–2070) time periods under Representative Concentration Pathway (RCP) 4.5 and 8.5 scenarios. The performance evaluation showed that statistical downscaling is preferred for simulating and projecting rainfall patterns in the study area. Additionally, the Statistical DownScaling Model (SDSM) showed low R2 values in calibration and validation of the simulations with respect to observed data for the historic period. Overall, SDSM generated satisfactory results in simulating the monthly rainfall cycle of the entire basin. In this study, RegCM4 showed large rainfall errors and missed one rainfall season in the historic period. This study also explored whether the grid-based rainfall time series of the Asian Precipitation–Highly Resolved Observational Daily Integration Towards Evaluation (APHRODITE) dataset could be used to enlarge and complement the sample of in situ observed rainfall time series. A spatial correlogram was used for observed and APHRODITE rainfall data to assess the consistency between the two data sources, which resulted in rejecting the APHRODITE data. For the future time period (2041–2070) under the RCP 4.5 and 8.5 scenarios, rainfall projections did not show significant differences between the two downscaling approaches. This may relate to the driving model (CanESM2 AOGCM) and does not necessarily suggest poor performance of downscaling, either statistical or dynamical. Hence, the study recommends evaluating a multi-model ensemble including other GCMs and RCMs for the same study area.
Network tomography based on 1-D projections
Network tomography has been regarded as one of the most promising
methodologies for performance evaluation and diagnosis of the massive and
decentralized Internet. This paper proposes a new estimation approach for
solving a class of inverse problems in network tomography, based on marginal
distributions of a sequence of one-dimensional linear projections of the
observed data. We give a general identifiability result for the proposed method
and study the design of these one-dimensional projections in terms of
statistical efficiency. We show that for a simple Gaussian tomography model,
there is an optimal set of one-dimensional projections such that the estimator
obtained from these projections is asymptotically as efficient as the maximum
likelihood estimator based on the joint distribution of the observed data. For
practical applications, we carry out simulation studies of the proposed method
for two instances of network tomography. The first is for traffic demand
tomography using a Gaussian Origin-Destination traffic model with a power
relation between its mean and variance, and the second is for network delay
tomography where the link delays are to be estimated from the end-to-end path
delays. We compare estimators obtained from our method with those obtained
from the joint distribution and from other lower-dimensional projections, and show
that in both cases, the proposed method yields satisfactory results.

Comment: Published at http://dx.doi.org/10.1214/074921707000000238 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)
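The delay-tomography setting can be illustrated with a toy example: end-to-end path delays are sums of link delays through a routing matrix, and link-level quantities must be inferred from path-level observations. The sketch below recovers per-link mean delays by least squares on a small made-up topology; it shows the inverse problem only, not the abstract's projection-based estimator, and all numbers are hypothetical:

```python
import numpy as np

# Toy delay tomography: each path delay is the sum of its link delays,
# y = A @ x, for routing matrix A. We observe only path delays and
# recover per-link mean delays by least squares on the path means.
rng = np.random.default_rng(0)
A = np.array([[1, 1, 0],   # path 1 traverses links 1 and 2
              [0, 1, 1],   # path 2 traverses links 2 and 3
              [1, 0, 1]])  # path 3 traverses links 1 and 3
true_means = np.array([2.0, 5.0, 1.0])          # per-link mean delays
link_delays = rng.exponential(true_means, size=(5000, 3))
path_delays = link_delays @ A.T                 # observed end-to-end delays
est_means, *_ = np.linalg.lstsq(A, path_delays.mean(axis=0), rcond=None)
```

The paper's contribution is precisely to go beyond such moment matching: by choosing well-designed one-dimensional projections of the observed path delays, the resulting estimator can approach the efficiency of full maximum likelihood.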
RealText-cs - Corpus based domain independent Content Selection model
Content selection is a highly domain-dependent task responsible for retrieving relevant information from a knowledge source given a communicative goal. This paper presents a domain-independent content selection model using keywords as the communicative goal. We employ the DBpedia triple store as our knowledge source, and triples are selected based on weights assigned to each triple. The weights are calculated from the log-likelihood distance between a domain corpus and a general reference corpus. The method was evaluated using keywords extracted from the QALD dataset, and its performance was compared with cross-entropy-based statistical content selection. The evaluation results showed that the proposed method performs 32% better than cross-entropy-based statistical content selection.
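One common way to compute a log-likelihood distance between a domain corpus and a reference corpus is the Dunning-style G2 keyness score; the sketch below assumes that formulation (the paper's exact weighting may differ), with word counts held in plain Counters:

```python
import math
from collections import Counter

def log_likelihood_keyness(word, domain_counts, ref_counts):
    """Dunning-style log-likelihood (G2) score of `word` between a
    domain corpus and a reference corpus. High scores mark words that
    are over- or under-represented in the domain corpus.
    (Illustrative sketch of one standard formulation.)"""
    a = domain_counts[word]          # word frequency in domain corpus
    b = ref_counts[word]             # word frequency in reference corpus
    c = sum(domain_counts.values())  # domain corpus size
    d = sum(ref_counts.values())     # reference corpus size
    e1 = c * (a + b) / (c + d)       # expected frequency in domain
    e2 = d * (a + b) / (c + d)       # expected frequency in reference
    g2 = 0.0
    if a > 0:
        g2 += 2 * a * math.log(a / e1)
    if b > 0:
        g2 += 2 * b * math.log(b / e2)
    return g2
```

A word with identical relative frequency in both corpora scores zero; triples containing high-scoring domain words would then receive larger selection weights.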