EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology to Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
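For readers unfamiliar with the agreement statistic cited above, here is a minimal sketch (with made-up labels, not EveTAR data) of how Cohen's kappa for binary relevance judgments can be computed from two annotators' labels.

```python
# Illustrative sketch (not EveTAR's evaluation code): Cohen's kappa for two
# annotators' binary relevance labels over the same pool of tweets.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(labels_a) | set(labels_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical judgments: 1 = relevant, 0 = not relevant.
ann1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
ann2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
print(round(cohens_kappa(ann1, ann2), 2))  # ~0.58 for this toy data
```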
Universality, limits and predictability of gold-medal performances at the Olympic Games
Inspired by the Games held in ancient Greece, modern Olympics represent the
world's largest pageant of athletic skill and competitive spirit. Performances
of athletes at the Olympic Games mirror, since 1896, human potentialities in
sports, and thus provide an optimal source of information for studying the
evolution of sport achievements and predicting the limits that athletes can
reach. Unfortunately, the models introduced so far for the description of
athlete performances at the Olympics are either sophisticated or unrealistic,
and more importantly, do not provide a unified theory for sport performances.
Here, we address this issue by showing that relative performance improvements
of medal winners at the Olympics are normally distributed, implying that the
evolution of performance values can be described in good approximation as an
exponential approach to an a priori unknown limiting performance value. This
law holds for all specialties in athletics (including running, jumping, and
throwing) and in swimming. We present a self-consistent method, based on normality
hypothesis testing, able to predict limiting performance values in all
specialties. We further quantify the most likely years in which athletes will
breach challenging performance walls in running, jumping, throwing, and
swimming events, as well as the probability that new world records will be
established at the next edition of the Olympic Games.Comment: 8 pages, 3 figures, 1 table. Supporting information files and data
are available at filrad.homelinux.or
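The core modelling idea, that normally distributed relative improvements imply an exponential approach to an a priori unknown limiting performance, can be illustrated with a small sketch. The winning-time values, the Shapiro-Wilk normality test, and the curve-fitting call below are illustrative choices, not the authors' data or method.

```python
# Illustrative sketch (not the authors' code): test normality of relative
# improvements of winning performances and fit an exponential approach
# y(t) = L + (y0 - L) * exp(-k * (t - t0)) to estimate the limiting value L.
import numpy as np
from scipy.stats import shapiro
from scipy.optimize import curve_fit

# Approximate winning 100 m times (seconds), for illustration only.
years = np.array([1968, 1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012])
times = np.array([9.95, 10.14, 10.06, 10.25, 9.99, 9.92, 9.96, 9.84, 9.87, 9.85, 9.69, 9.63])

rel_improvements = -np.diff(times) / times[:-1]   # positive = faster than previous edition
print("Shapiro-Wilk p-value:", shapiro(rel_improvements).pvalue)

def exp_approach(t, limit, scale, rate):
    return limit + scale * np.exp(-rate * (t - years[0]))

# The fit on so few points is only a toy demonstration of the functional form.
params, _ = curve_fit(exp_approach, years, times, p0=(9.4, 0.6, 0.02), maxfev=10000)
print("Estimated limiting performance:", round(params[0], 2), "s")
```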
A frequentist framework of inductive reasoning
Reacting against the limitation of statistics to decision procedures, R. A.
Fisher proposed for inductive reasoning the use of the fiducial distribution, a
parameter-space distribution of epistemological probability transferred
directly from limiting relative frequencies rather than computed according to
the Bayes update rule. The proposal is developed as follows using the
confidence measure of a scalar parameter of interest. (With the restriction to
one-dimensional parameter space, a confidence measure is essentially a fiducial
probability distribution free of complications involving ancillary statistics.)
A betting game establishes a sense in which confidence measures are the only
reliable inferential probability distributions. The equality between the
probabilities encoded in a confidence measure and the coverage rates of the
corresponding confidence intervals ensures that the measure's rule for
assigning confidence levels to hypotheses is uniquely minimax in the game.
Although a confidence measure can be computed without any prior distribution,
previous knowledge can be incorporated into confidence-based reasoning. To
adjust a p-value or confidence interval for prior information, the confidence
measure from the observed data can be combined with one or more independent
confidence measures representing previous agent opinion. (The former confidence
measure may correspond to a posterior distribution with frequentist matching of
coverage probabilities.) The representation of subjective knowledge in terms of
confidence measures rather than prior probability distributions preserves
approximate frequentist validity.
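The claimed equality between the probabilities a confidence measure assigns to hypotheses and the coverage rates of the corresponding confidence intervals can be checked numerically in the simplest setting, a normal mean with known variance. The simulation below is a sketch under that assumption, not code from the paper.

```python
# Illustrative sketch (not from the paper): for a normal mean with known
# variance, the confidence measure is C(theta) = Phi((theta - xbar) / (sigma/sqrt(n))).
# The probability it assigns to the hypothesis "theta <= upper" matches the
# frequentist coverage of the corresponding one-sided interval.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
true_theta, sigma, n = 1.0, 2.0, 25
level = 0.90  # confidence the measure assigns to the one-sided hypothesis

covered, trials = 0, 20000
for _ in range(trials):
    xbar = rng.normal(true_theta, sigma / np.sqrt(n))
    # Upper endpoint of the one-sided interval with confidence `level`.
    upper = xbar + norm.ppf(level) * sigma / np.sqrt(n)
    covered += (true_theta <= upper)

print("nominal:", level, "empirical coverage:", covered / trials)  # ~0.90
```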
Computational framework for longevity risk management
Longevity risk threatens the financial stability of private and government-sponsored defined benefit pension systems as well as social security schemes, in an environment already characterized by persistently low interest rates and heightened financial uncertainty. The mortality experience of countries in the industrialized world suggests a substantial age-time interaction, with the two dominant trends affecting different age groups at different times. From a statistical point of view, this indicates a dependence structure. It is observed that mortality improvements are similar for individuals of contiguous ages (Wills and Sherris, Integrating financial and demographic longevity risk models: an Australian model for financial applications, Discussion Paper PI-0817, 2008). Moreover, considering the dataset by single ages, the correlations between the residuals for adjacent age groups tend to be high (as noted in Denton et al., J Population Econ 18:203-227, 2005). This suggests that there is value in exploring the dependence structure also across time, in other words the inter-period correlation. In this research, we focus on the projection of mortality rates, contravening the most commonly encountered dependence assumption, namely the "lack of dependence" (Denuit et al., Actuarial theory for dependent risks: measures, orders and models, Wiley, New York, 2005). Taking into account the dependence across age and time, which otherwise leads to systematic over-estimation or under-estimation of uncertainty in the estimates (Liu and Braun, J Probability Stat, 813583:15, 2010), the paper analyzes a tailor-made bootstrap methodology for capturing the spatial dependence when deriving confidence intervals for projected mortality rates. We propose a method that leads to a prudent measure of longevity risk, avoiding the structural incompleteness of the ordinary simulation bootstrap methodology, which relies on the assumption of independence.
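As a rough illustration of a bootstrap that respects cross-age dependence (a simplified stand-in for the tailor-made methodology described above, using synthetic data), one can resample entire calendar years of the age-by-year residual matrix rather than individual cells:

```python
# Simplified illustration (not the paper's tailor-made bootstrap): resample
# whole calendar years of an age-by-year residual matrix so that the
# cross-age dependence within each year is preserved, then rebuild
# pseudo-datasets and collect percentile confidence intervals.
import numpy as np

rng = np.random.default_rng(42)
n_ages, n_years = 30, 40
fitted = rng.normal(-4.0, 0.5, size=(n_ages, n_years))     # hypothetical fitted log-rates
residuals = rng.normal(0.0, 0.05, size=(n_ages, n_years))   # hypothetical residuals

def year_block_bootstrap(fitted, residuals, n_boot=1000):
    n_years = residuals.shape[1]
    samples = []
    for _ in range(n_boot):
        cols = rng.integers(0, n_years, size=n_years)        # resample years with replacement
        samples.append(fitted + residuals[:, cols])           # keep each year's age profile intact
    return np.stack(samples)

boot = year_block_bootstrap(fitted, residuals)
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
print(lower.shape, upper.shape)  # (30, 40) each: pointwise 95% intervals
```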
Estimating the incidence of acute infectious intestinal disease in the community in the UK: A retrospective telephone survey
Objectives: To estimate the burden of infectious intestinal disease (IID) in the UK and to determine whether disease burden estimates obtained with a retrospective study design differ from those obtained with a prospective design. Design/Setting: A retrospective telephone survey undertaken in each of the four countries comprising the United Kingdom. Participants were randomly asked about illness in either the past 7 or the past 28 days. Participants: 14,813 individuals, for all of whom we had a legible recording of their agreement to participate. Outcomes: Self-reported IID, defined as loose stools or clinically significant vomiting lasting less than two weeks, in the absence of a known non-infectious cause. Results: The rate of self-reported IID varied substantially depending on whether participants were asked about illness in the previous 7 or 28 days. After standardising for age and sex, and adjusting for the number of interviews completed each month and the relative size of each UK country, the estimated rate of IID in the 7-day recall group was 1,530 cases per 1,000 person-years (95% CI: 1,135 – 2,113), while in the 28-day recall group it was 533 cases per 1,000 person-years (95% CI: 377 – 778). There was no significant variation in rates between the four countries. Rates in this study were also higher than in a related prospective study undertaken at the same time. Conclusions: The estimated burden of disease from IID varied dramatically depending on study design. Retrospective studies of IID give higher estimates of disease burden than prospective studies, and among retrospective studies, longer recall periods give lower estimated rates than shorter recall periods. Caution needs to be exercised when comparing studies of self-reported IID, as small changes in study design or case definition can markedly affect estimated rates.
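For orientation, the following sketch (with hypothetical counts, not the survey's data) shows how self-reported cases within a fixed recall window are converted into a rate per 1,000 person-years.

```python
# Illustrative calculation with hypothetical counts: converting self-reported
# cases in a recall window into a rate per 1,000 person-years.
def rate_per_1000_person_years(cases, respondents, recall_days):
    person_years = respondents * recall_days / 365.25
    return 1000 * cases / person_years

# e.g. 150 cases reported by 5,000 respondents asked about the past 7 days,
# versus 210 cases reported by 5,000 respondents asked about the past 28 days.
print(round(rate_per_1000_person_years(150, 5000, 7)))    # ~1565
print(round(rate_per_1000_person_years(210, 5000, 28)))   # ~548
```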
A demonstration of 'broken' visual space
It has long been assumed that there is a distorted mapping between real and ‘perceived’ space, based on demonstrations of systematic errors in judgements of slant, curvature, direction and separation. Here, we have applied a direct test to the notion of a coherent visual space. In an immersive virtual environment, participants judged the relative distance of two squares displayed in separate intervals. On some trials, the virtual scene expanded by a factor of four between intervals although, in line with recent results, participants did not report any noticeable change in the scene. We found that there was no consistent depth ordering of objects that can explain the distance matches participants made in this environment (e.g. A > B > D yet also A < C < D) and hence no single one-to-one mapping between participants’ perceived space and any real 3D environment. Instead, factors that affect pairwise comparisons of distances dictate participants’ performance. These data contradict, more directly than previous experiments, the idea that the visual system builds and uses a coherent 3D internal representation of a scene.
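One way to formalise the inconsistency described above is to treat each pairwise "farther than" match as a directed edge and check whether the resulting graph is acyclic; a cycle means no single depth ordering can explain the data. The sketch below uses hypothetical judgements, not the study's measurements.

```python
# Illustrative sketch (hypothetical data, not the study's analysis): pairwise
# "x judged farther than y" relations admit a single depth ordering only if
# the induced directed graph has no cycle.
def has_consistent_ordering(farther_than):
    """farther_than: list of (x, y) pairs meaning x was judged farther than y."""
    graph = {}
    for x, y in farther_than:
        graph.setdefault(x, set()).add(y)
        graph.setdefault(y, set())
    WHITE, GREY, BLACK = 0, 1, 2
    state = {node: WHITE for node in graph}

    def dfs(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:          # back edge -> cycle -> inconsistent
                return False
            if state[nxt] == WHITE and not dfs(nxt):
                return False
        state[node] = BLACK
        return True

    return all(dfs(n) for n in graph if state[n] == WHITE)

# A > B > D yet also A < C < D, as in the abstract's example, has no consistent order:
judgements = [("A", "B"), ("B", "D"), ("C", "A"), ("D", "C")]
print(has_consistent_ordering(judgements))  # False
```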
Assessing the impact of a health intervention via user-generated Internet content
Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
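A much-simplified sketch of the counterfactual step is given below: a regression from control-area rate estimates to the target area's rates is fitted on pre-campaign data, then used to project what the target rates would have been without the campaign. The synthetic data, the Ridge regression, and the single target area are illustrative assumptions, not the paper's exact pipeline.

```python
# Simplified sketch of the counterfactual projection (hypothetical data and
# model choice, not the paper's pipeline).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
weeks_pre, weeks_campaign, n_controls = 104, 26, 8

controls_pre = rng.gamma(2.0, 5.0, size=(weeks_pre, n_controls))
target_pre = controls_pre.mean(axis=1) + rng.normal(0, 1, weeks_pre)

controls_campaign = rng.gamma(2.0, 5.0, size=(weeks_campaign, n_controls))
target_campaign_observed = controls_campaign.mean(axis=1) * 0.8   # hypothetical 20% reduction

model = Ridge(alpha=1.0).fit(controls_pre, target_pre)             # learn control -> target mapping
projected = model.predict(controls_campaign)                       # counterfactual: no campaign
impact = (projected - target_campaign_observed).mean() / projected.mean()
print(f"Estimated relative reduction: {impact:.1%}")
```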
Optimally splitting cases for training and testing high dimensional classifiers
Background: We consider the problem of designing a study to develop a predictive classifier from high dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set, and how that proportion impacts the mean squared error (MSE) of the prediction accuracy estimate. Results: We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study for the purpose of better understanding the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts. Conclusions: By applying these approaches to a number of synthetic and real microarray datasets we show that for linear classifiers the optimal proportion depends on the overall number of samples available and the degree of differential expression between the classes. The optimal proportion was found to depend on the full dataset size (n) and classification accuracy, with higher accuracy and smaller n resulting in a larger proportion assigned to the training set. The commonly used strategy of allocating 2/3 of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e. 85% or greater full dataset accuracy). In general, we recommend use of our nonparametric resampling approach for determining the optimal split. This approach can be applied to any dataset, using any predictor development method, to determine the best split.
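In the spirit of the paper's resampling idea (though not the authors' exact algorithm), one can compare candidate training proportions by repeatedly splitting a dataset, training a classifier, and recording how far each test-set accuracy estimate falls from a reference accuracy. The dataset, classifier, and reference choice below are illustrative assumptions.

```python
# Sketch of a resampling comparison of split proportions (not the authors'
# exact algorithm): smaller training sets inflate bias, smaller test sets
# inflate variance; the MSE of the accuracy estimate reflects both.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=500, n_informative=20, random_state=0)

def accuracy_estimates(p_train, repeats=50):
    scores = []
    for seed in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=p_train, random_state=seed, stratify=y)
        clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return np.array(scores)

# Crude stand-in for the "full dataset accuracy" used as the target of the MSE.
reference = accuracy_estimates(0.5).mean()

for p in (0.5, 2 / 3, 0.8):
    est = accuracy_estimates(p)
    mse = np.mean((est - reference) ** 2)   # decomposes into bias^2 + variance
    print(f"train proportion {p:.2f}: MSE of accuracy estimate = {mse:.4f}")
```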
Fluconazole for empiric antifungal therapy in cancer patients with fever and neutropenia
BACKGROUND: Several clinical trials have demonstrated the efficacy of fluconazole as empiric antifungal therapy in cancer patients with fever and neutropenia. Our objective was to assess the frequency and resource utilization associated with treatment failure in cancer patients given empiric fluconazole antifungal therapy in routine inpatient care. METHODS: We performed a retrospective cohort study of cancer patients treated with oral or intravenous fluconazole between 7/97 and 6/01 in a tertiary care hospital. The final study cohort included cancer patients with neutropenia (an absolute neutrophil count below 500 cells/mm(3)) and fever (a temperature above 38°C or 100.4°F) who had received at least 96 hours of parenteral antibacterial therapy prior to initiating fluconazole. Patients' responses to empiric therapy were assessed by reviewing patient charts. RESULTS: Among 103 cancer admissions with fever and neutropenia, treatment failure after initiating empiric fluconazole antifungal therapy occurred in 41% (95% confidence interval (CI) 31% – 50%) of admissions. Patients with a diagnosis of hematological malignancy had an increased risk of treatment failure (OR = 4.6, 95% CI 1.5 – 14.8). When treatment failure occurred, the mean adjusted increases in length of stay and total costs were 7.4 days (95% CI 3.3 – 11.5) and $18,925 (95% CI 3,289 – 34,563), respectively. CONCLUSION: Treatment failure occurred in more than one-third of neutropenic cancer patients receiving fluconazole as empiric antifungal treatment for fever in routine clinical care. The increase in costs when treatment failure occurs is substantial.
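For readers who want to see how an odds ratio such as the one reported above is derived, here is a minimal sketch with hypothetical 2x2 cell counts (not the study's data), using a Wald confidence interval on the log odds ratio.

```python
# Hypothetical 2x2 table, illustrating an odds ratio with a Wald 95% CI.
import math

# rows: haematological malignancy yes/no; columns: treatment failure yes/no
a, b = 30, 20   # failures / non-failures with haematological malignancy
c, d = 12, 41   # failures / non-failures without

or_hat = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(or_hat) - 1.96 * se_log_or)
hi = math.exp(math.log(or_hat) + 1.96 * se_log_or)
print(f"OR = {or_hat:.1f}, 95% CI {lo:.1f} - {hi:.1f}")
```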