
    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    This article introduces a new language-independent approach for creating a large-scale, high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevance and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology to Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms on the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.
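
    The reported agreement figure (an average Cohen's Kappa of 0.71) is a chance-corrected statistic over paired judgments. A minimal sketch of the computation for two annotators labelling the same tweets, with hypothetical binary relevance labels (EveTAR's actual judgment data and label scheme are not reproduced here):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators judging the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently
    # according to their own marginal label distribution.
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((dist_a[k] / n) * (dist_b[k] / n) for k in dist_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical relevance judgments (1 = relevant, 0 = not relevant).
ann1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
ann2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")
```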

    Universality, limits and predictability of gold-medal performances at the Olympic Games

    Inspired by the Games held in ancient Greece, the modern Olympics represent the world's largest pageant of athletic skill and competitive spirit. Performances of athletes at the Olympic Games have mirrored, since 1896, human potentialities in sports, and thus provide an optimal source of information for studying the evolution of sport achievements and predicting the limits that athletes can reach. Unfortunately, the models introduced so far for the description of athlete performances at the Olympics are either sophisticated or unrealistic, and, more importantly, do not provide a unified theory for sport performances. Here, we address this issue by showing that relative performance improvements of medal winners at the Olympics are normally distributed, implying that the evolution of performance values can be described, to a good approximation, as an exponential approach to an a priori unknown limiting performance value. This law holds for all specialties in athletics (including running, jumping, and throwing) and in swimming. We present a self-consistent method, based on normality hypothesis testing, able to predict limiting performance values in all specialties. We further quantify the most likely years in which athletes will breach challenging performance walls in running, jumping, throwing, and swimming events, as well as the probability that new world records will be established at the next edition of the Olympic Games. Comment: 8 pages, 3 figures, 1 table. Supporting information files and data are available at filrad.homelinux.or
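
    To make the model concrete: if relative improvements are normally distributed, winning performances approach a limit L roughly as p(t) = L + (p(0) - L)e^{-rt}. A minimal sketch, with synthetic winning times standing in for real Olympic results, of fitting the limiting value and testing improvements for normality (all data and parameter values here are hypothetical, not the paper's):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import shapiro

def exp_approach(t, L, a, r):
    """Exponential approach to an unknown limiting value L:
    p(t) = L + a * exp(-r * t), with a = p(0) - L."""
    return L + a * np.exp(-r * t)

# Synthetic winning times (seconds) across Olympic editions.
years = np.arange(1948, 2017, 4)
t = (years - years[0]).astype(float)
rng = np.random.default_rng(1)
times = 9.6 + 0.7 * np.exp(-0.04 * t) + rng.normal(0, 0.02, t.size)

# Fit the limiting performance value L from the observed series.
(L_hat, a_hat, r_hat), _ = curve_fit(exp_approach, t, times,
                                     p0=(9.5, 0.8, 0.05))
print(f"estimated limiting time: {L_hat:.2f} s")

# The paper's starting observation: relative improvements of winners
# are approximately normally distributed.
rel_improvements = -np.diff(times) / times[:-1]
stat, pval = shapiro(rel_improvements)
print(f"Shapiro-Wilk p-value for normality: {pval:.2f}")
```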

    A frequentist framework of inductive reasoning

    Reacting against the limitation of statistics to decision procedures, R. A. Fisher proposed for inductive reasoning the use of the fiducial distribution, a parameter-space distribution of epistemological probability transferred directly from limiting relative frequencies rather than computed according to the Bayes update rule. The proposal is developed as follows, using the confidence measure of a scalar parameter of interest. (With the restriction to a one-dimensional parameter space, a confidence measure is essentially a fiducial probability distribution free of complications involving ancillary statistics.) A betting game establishes a sense in which confidence measures are the only reliable inferential probability distributions. The equality between the probabilities encoded in a confidence measure and the coverage rates of the corresponding confidence intervals ensures that the measure's rule for assigning confidence levels to hypotheses is uniquely minimax in the game. Although a confidence measure can be computed without any prior distribution, previous knowledge can be incorporated into confidence-based reasoning. To adjust a p-value or confidence interval for prior information, the confidence measure from the observed data can be combined with one or more independent confidence measures representing previous agent opinion. (The former confidence measure may correspond to a posterior distribution with frequentist matching of coverage probabilities.) The representation of subjective knowledge in terms of confidence measures rather than prior probability distributions preserves approximate frequentist validity. Comment: major revision
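
    In one dimension, a confidence measure can be represented as a confidence distribution: a data-dependent CDF on parameter space whose interval probabilities match coverage rates. A minimal sketch under a normal model with known variance, using a Stouffer-type rule (one standard recipe from the confidence-distribution literature, not necessarily this paper's exact combination) to merge a data-based measure with one encoding prior opinion; all numbers are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def confidence_cdf(theta, xbar, sigma, n):
    """Confidence distribution for a normal mean with known sigma:
    C(theta) = Phi((theta - xbar) / (sigma / sqrt(n)))."""
    return norm.cdf((theta - xbar) / (sigma / np.sqrt(n)))

# Coverage matching: the probability the measure assigns to the
# central 95% confidence interval is exactly 0.95.
xbar, sigma, n = 1.2, 1.0, 25
lo = xbar + norm.ppf(0.025) * sigma / np.sqrt(n)
hi = xbar + norm.ppf(0.975) * sigma / np.sqrt(n)
print(confidence_cdf(hi, xbar, sigma, n)
      - confidence_cdf(lo, xbar, sigma, n))  # -> 0.95

def combine(theta, cdf1, cdf2):
    """Stouffer-type combination of two independent confidence
    distributions for the same parameter."""
    z = (norm.ppf(cdf1(theta)) + norm.ppf(cdf2(theta))) / np.sqrt(2)
    return norm.cdf(z)

# Data-based measure combined with one representing prior opinion.
data  = lambda th: confidence_cdf(th, 1.2, 1.0, 25)
prior = lambda th: confidence_cdf(th, 0.8, 1.0, 9)
print(combine(1.0, data, prior))
```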

    Estimating the incidence of acute infectious intestinal disease in the community in the UK: A retrospective telephone survey

    Objectives: To estimate the burden of intestinal infectious disease (IID) in the UK and determine whether disease burden estimates from a retrospective study design differ from those from a prospective study design. Design/Setting: A retrospective telephone survey undertaken in each of the four countries comprising the United Kingdom. Participants were randomly assigned to be asked about illness in either the past 7 or the past 28 days. Participants: 14,813 individuals, for all of whom we had a legible recording of their agreement to participate. Outcomes: Self-reported IID, defined as loose stools or clinically significant vomiting lasting less than two weeks, in the absence of a known non-infectious cause. Results: The rate of self-reported IID varied substantially depending on whether participants were asked about illness in the previous 7 or 28 days. After standardising for age and sex, and adjusting for the number of interviews completed each month and the relative size of each UK country, the estimated rate of IID in the 7-day recall group was 1,530 cases per 1,000 person-years (95% CI: 1,135 – 2,113), while in the 28-day recall group it was 533 cases per 1,000 person-years (95% CI: 377 – 778). There was no significant variation in rates between the four countries. Rates in this study were also higher than in a related prospective study undertaken at the same time. Conclusions: The estimated burden of disease from IID varied dramatically depending on study design. Retrospective studies of IID give higher estimates of disease burden than prospective studies, and among retrospective studies, longer recall periods give lower estimated rates than shorter ones. Caution needs to be exercised when comparing studies of self-reported IID, as small changes in study design or case definition can markedly affect estimated rates.
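
    The headline rates are conversions of case counts in a recall window to cases per 1,000 person-years. A minimal sketch of the crude conversion (omitting the age/sex standardisation and survey adjustments described above), with hypothetical counts and an approximate Poisson confidence interval:

```python
import math

def iid_rate_per_1000py(cases, participants, recall_days):
    """Crude incidence from a retrospective survey: cases reported in
    a recall window, scaled to cases per 1,000 person-years."""
    person_years = participants * recall_days / 365.25
    rate = cases / person_years * 1000
    # Approximate 95% CI treating the case count as Poisson.
    se_log = 1 / math.sqrt(cases)
    lo = rate * math.exp(-1.96 * se_log)
    hi = rate * math.exp(1.96 * se_log)
    return rate, lo, hi

# Hypothetical numbers: 220 cases reported by 7,500 respondents
# asked about the previous 7 days.
rate, lo, hi = iid_rate_per_1000py(220, 7500, 7)
print(f"{rate:.0f} per 1,000 person-years (95% CI {lo:.0f} - {hi:.0f})")
```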

    Assessing the impact of a health intervention via user-generated Internet content

    Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
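
    The counterfactual logic can be sketched as: regress the intervention area's Internet-derived rates on control areas over the pre-campaign period, project forward, and compare the projection with what was observed. A minimal illustration with synthetic data, and a ridge regression standing in for the paper's nonlinear model:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)

# Synthetic weekly disease-rate estimates (e.g. inferred from search
# queries) for 5 control areas over pre- and post-campaign weeks.
weeks_pre, weeks_post = 52, 20
controls = rng.gamma(2.0, 1.0, size=(weeks_pre + weeks_post, 5))

# Intervention area tracks a mix of the controls before the campaign.
weights = np.array([0.3, 0.2, 0.2, 0.2, 0.1])
target_pre = controls[:weeks_pre] @ weights + rng.normal(0, 0.05, weeks_pre)

# Fit on the pre-campaign period, then project what would have
# happened in the intervention area without a campaign.
model = Ridge(alpha=1.0).fit(controls[:weeks_pre], target_pre)
counterfactual = model.predict(controls[weeks_pre:])

# Observed post-campaign rates (synthetic 15% reduction for the demo).
observed = counterfactual * 0.85 + rng.normal(0, 0.05, weeks_post)

impact = (observed - counterfactual).mean()
print(f"estimated mean impact: {impact:.3f} (negative = reduction)")
```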

    Optimally splitting cases for training and testing high dimensional classifiers

    Background: We consider the problem of designing a study to develop a predictive classifier from high dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set, and how that proportion impacts the mean squared error (MSE) of the prediction accuracy estimate. Results: We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study to better understand the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts. Conclusions: By applying these approaches to a number of synthetic and real microarray datasets we show that for linear classifiers the optimal proportion depends on the overall number of samples available and the degree of differential expression between the classes. The optimal proportion was found to depend on the full dataset size (n) and classification accuracy, with higher accuracy and smaller n resulting in a larger proportion assigned to the training set. The commonly used strategy of allocating 2/3 of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e., 85% or greater full dataset accuracy). In general, we recommend use of our nonparametric resampling approach for determining the optimal split. This approach can be applied to any dataset, using any predictor development method, to determine the best split.
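
    A minimal sketch of the resampling idea: for each candidate training proportion, repeatedly split the data, record the test-set accuracy estimate, and score each proportion by its MSE around a target accuracy. Here the target is approximated by cross-validation on the full dataset, with a synthetic dataset and logistic regression standing in for the paper's settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a real high-dimensional dataset.
X, y = make_classification(n_samples=120, n_features=200,
                           n_informative=10, random_state=0)

# Target the split-sample estimate is trying to hit: the accuracy of
# a classifier built on the full dataset (approximated by 10-fold CV).
target = cross_val_score(LogisticRegression(max_iter=2000), X, y, cv=10).mean()

def split_mse(train_frac, n_rep=100, seed=0):
    """Resampling estimate of the MSE of the test-set accuracy
    estimate for one candidate training proportion."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_rep):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_frac, stratify=y,
            random_state=int(rng.integers(1 << 31)))
        clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    return float(np.mean((np.array(accs) - target) ** 2))

# Compare the commonly used proportions; smaller MSE is better.
for frac in (0.5, 2 / 3, 0.8):
    print(f"train fraction {frac:.2f}: MSE = {split_mse(frac):.4f}")
```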

    Fluconazole for empiric antifungal therapy in cancer patients with fever and neutropenia

    BACKGROUND: Several clinical trials have demonstrated the efficacy of fluconazole as empiric antifungal therapy in cancer patients with fever and neutropenia. Our objective was to assess the frequency and resource utilization associated with treatment failure in cancer patients given empiric fluconazole antifungal therapy in routine inpatient care. METHODS: We performed a retrospective cohort study of cancer patients treated with oral or intravenous fluconazole between July 1997 and June 2001 in a tertiary care hospital. The final study cohort included cancer patients with neutropenia (an absolute neutrophil count below 500 cells/mm(3)) and fever (a temperature above 38°C or 100.4°F) who had received at least 96 hours of parenteral antibacterial therapy prior to initiating fluconazole. Patients' responses to empiric therapy were assessed by reviewing patient charts. RESULTS: Among 103 cancer admissions with fever and neutropenia, treatment failure after initiating empiric fluconazole antifungal therapy occurred in 41% (95% confidence interval (CI) 31% – 50%) of admissions. Patients with a diagnosis of hematological malignancy had an increased risk of treatment failure (OR = 4.6, 95% CI 1.5 – 14.8). When treatment failure occurred, the mean adjusted increases in length of stay and total costs were 7.4 days (95% CI 3.3 – 11.5) and $18,925 (95% CI $3,289 – $34,563), respectively. CONCLUSION: Treatment failure occurred in more than one-third of neutropenic cancer patients on fluconazole as empiric antifungal treatment for fever in routine clinical care, and the increase in costs when treatment failure occurs is substantial.
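
    The reported OR = 4.6 (95% CI 1.5 – 14.8) is a standard 2x2-table odds ratio with a Wald interval on the log scale. A minimal sketch with hypothetical cell counts (the study's actual counts are not given in the abstract):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
       a = failures among exposed,   b = non-failures among exposed,
       c = failures among unexposed, d = non-failures among unexposed."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: treatment failure by hematological malignancy.
or_, lo, hi = odds_ratio_ci(30, 25, 12, 36)
print(f"OR = {or_:.1f} (95% CI {lo:.1f} - {hi:.1f})")
```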