
    Search for Evergreens in Science: A Functional Data Analysis

    Evergreens in science are papers that display a continual rise in annual citations without decline, at least within a sufficiently long time period. Aiming to better understand evergreens in particular and patterns of citation trajectories in general, this paper develops a functional data analysis method to cluster the citation trajectories of a sample of 1,699 research papers published in 1980 in American Physical Society (APS) journals. We propose a functional Poisson regression model for individual papers' citation trajectories and fit the model to the observed 30-year citation histories of individual papers by functional principal component analysis and maximum likelihood estimation. Based on the estimated paper-specific coefficients, we apply the K-means clustering algorithm to group papers, uncovering general types of citation trajectories. The result demonstrates the existence of an evergreen cluster of papers whose annual citations do not decline over 30 years.
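
    The pipeline above lends itself to a compact sketch. The following is a minimal illustration, not the authors' exact implementation: it fits a per-paper Poisson regression of annual citations on a small time basis (a quadratic polynomial stands in for the functional principal components used in the paper) and then clusters the fitted coefficients with K-means. All data here are simulated.

```python
# Minimal sketch of the clustering pipeline described above; the quadratic
# time basis is a stand-in for the paper's FPCA-derived basis.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
years = np.arange(30)                       # 30 years of annual citations
X = np.column_stack([years, years ** 2])    # simplified basis (not FPCA)

# Toy data: 200 papers, each a 30-year vector of annual citation counts.
counts = rng.poisson(lam=5.0, size=(200, 30))

coefs = []
for y in counts:                            # one Poisson GLM per paper
    model = PoissonRegressor(alpha=1e-4).fit(X, y)
    coefs.append(np.append(model.coef_, model.intercept_))

# Cluster papers by their trajectory coefficients to uncover trajectory
# types; an "evergreen" cluster would show no decline in fitted citations.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(np.array(coefs))
```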

    Measuring and Fostering Non-Cognitive Skills in Adolescence: Evidence from Chicago Public Schools and the OneGoal Program

    Recent evidence has established that non-cognitive skills (e.g., persistence and self-control) are valuable in the labor market and are malleable throughout adolescence. Some recent high school interventions have been developed to foster these skills, but there is little evidence on whether they are effective. Using administrative data, we apply two methods to evaluate an intervention called OneGoal, which attempts to help disadvantaged students attend and complete college in part by teaching non-cognitive skills. First, we compare the outcomes of participants and non-participants with similar pre-program cognitive and non-cognitive skills. In doing so, we develop and validate a measure of non-cognitive skill that is based on readily available data and rivals standard measures of cognitive skill in predicting educational attainment. Second, we use an instrumental-variables difference-in-differences approach that exploits the fact that OneGoal was introduced into different schools at different times. We estimate that OneGoal improves academic indicators, increases college enrollment by 10–20 percentage points, and reduces arrest rates by 5 percentage points for males. We demonstrate that improvements in non-cognitive skill account for 15–30 percent of the treatment effects.
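
    The staggered rollout described above maps onto a standard two-way fixed-effects difference-in-differences regression. The sketch below is a hedged illustration of that reduced form on simulated data; it omits the paper's instrumental-variables step, and all variable names and numbers are invented.

```python
# Hypothetical sketch of a staggered-rollout difference-in-differences
# design: school and cohort fixed effects, with "treated" switching on
# once a school has adopted the program. Data here are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "school": rng.integers(0, 10, n),        # 10 schools
    "cohort": rng.integers(2008, 2014, n),   # student entry year
})
rollout = {s: 2008 + s % 6 for s in range(10)}   # staggered adoption years
df["treated"] = (df["cohort"] >= df["school"].map(rollout)).astype(int)
df["enrolled"] = rng.binomial(1, 0.4 + 0.15 * df["treated"])  # toy outcome

# Two-way fixed-effects DiD with standard errors clustered by school.
model = smf.ols("enrolled ~ treated + C(school) + C(cohort)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["school"]})
print(result.params["treated"])   # DiD estimate of the enrollment effect
```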

    Comprehensive Review of Opinion Summarization

    The abundance of opinions on the web has kindled the study of opinion summarization over the last few years, and a variety of techniques and paradigms have been introduced to solve this task. This survey systematically investigates the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of their key weaknesses. This survey also covers the evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, which will help set the direction for future research in this area.

    Spectral gene set enrichment (SGSE)

    Motivation: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, with performance strongly dependent on the clustering algorithm and the number of clusters.
    Results: We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method, with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data.
    Availability: http://cran.r-project.org/web/packages/PCGSE/index.html
    Contact: [email protected] or [email protected]
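
    The final combination step is concrete enough to sketch. Below is a minimal illustration of the weighted Z-method (Stouffer's method) as the abstract describes it; the exact form of the variance scaling by Tracy-Widom p-values is an assumption here, and all inputs are assumed to be precomputed.

```python
# Minimal sketch of combining per-PC gene-set p-values with the weighted
# Z-method. The weight formula (variance times one minus the Tracy-Widom
# p-value) is an assumed reading of the abstract, not the package's code.
import numpy as np
from scipy.stats import norm

def weighted_z(pc_pvalues, pc_variances, tw_pvalues):
    """Combine per-PC p-values into one gene-set-level p-value."""
    z = norm.isf(pc_pvalues)                 # convert p-values to Z-scores
    w = pc_variances * (1.0 - tw_pvalues)    # assumed form of the scaling
    z_comb = np.dot(w, z) / np.sqrt(np.sum(w ** 2))
    return norm.sf(z_comb)                   # back to a combined p-value

# Example: three PCs, gene set strongly associated with the first.
print(weighted_z(np.array([0.001, 0.4, 0.9]),
                 np.array([5.2, 1.3, 0.4]),
                 np.array([0.01, 0.2, 0.6])))
```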

    A user profiling component with the aid of user ontologies

    Abstract: What follows is a contribution to the field of user modeling for adaptive teaching and learning programs, especially in the medical field. The paper outlines existing approaches to the problem of extracting user information in a form that can be exploited by adaptive software. We focus initially on the so-called stereotyping method, which adaptively allocates users into classes reflecting characteristics such as physical data, social background, and computer experience. The user classifications of the stereotyping method are, however, ad hoc and unprincipled, and they can be exploited by the adaptive system only after a large number of trials by various kinds of users. We argue that the remedy is to create a database of user ontologies from which ready-made taxonomies can be derived in such a way as to enable associated software to support a variety of different types of users.
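
    To make the stereotyping method concrete, here is an illustrative sketch of how such a component allocates users into classes from a few observed characteristics. The classes and trigger rules are invented for illustration; the paper's point is precisely that such hand-built taxonomies are ad hoc, which user ontologies would systematize.

```python
# Illustrative stereotype assignment: each stereotype is a named trigger
# over observable user attributes. Classes and rules are invented here.
from dataclasses import dataclass

@dataclass
class User:
    age: int
    profession: str
    computer_experience: str   # e.g. "novice", "intermediate", "expert"

STEREOTYPES = {
    "novice_learner":  lambda u: u.computer_experience == "novice",
    "clinician":       lambda u: u.profession in {"physician", "nurse"},
    "power_user":      lambda u: u.computer_experience == "expert",
}

def classify(user: User) -> list[str]:
    """Return every stereotype whose trigger the user matches."""
    return [name for name, trigger in STEREOTYPES.items() if trigger(user)]

print(classify(User(age=34, profession="nurse", computer_experience="novice")))
```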

    Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods

    Measuring the similarity of short written contexts is a fundamental problem in Natural Language Processing. This article provides a unifying framework by which short context problems can be categorized both by their intended application and their proposed solution. The goal is to show that various problems and methodologies that appear quite different on the surface are in fact closely related. The axes by which these categorizations are made include the format of the contexts (headed versus headless), the way in which the contexts are to be measured (first-order versus second-order similarity), and the information used to represent the features in the contexts (micro versus macro views). The unifying thread that binds together many short context applications and methods is the fact that similarity decisions must be made between contexts that share few (if any) words in common.
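
    The first-order versus second-order distinction is the article's central technical axis, and a small sketch makes it concrete. First-order similarity compares the words two contexts share directly; second-order similarity compares the words that co-occur with those words, so two contexts with little or no overlap can still score as similar. The corpus and contexts below are toy examples.

```python
# Hedged sketch of first-order vs. second-order context similarity,
# built on toy data rather than any corpus used in the article.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

contexts = ["the surgeon operated on the patient",
            "the doctor treated the injured man"]

# First-order: cosine similarity of raw word-occurrence vectors.
vec = CountVectorizer().fit(contexts)
first_order = cosine_similarity(vec.transform(contexts))[0, 1]

# Second-order: represent each context by the average co-occurrence
# vector of its words, estimated from a (toy) background corpus.
corpus = ["surgeon doctor hospital patient",
          "doctor patient injured man treated",
          "surgeon operated patient hospital"]
cv = CountVectorizer().fit(corpus)
X = cv.transform(corpus).toarray()
cooc = X.T @ X                               # word-by-word co-occurrence

def second_order_vec(text):
    idx = [cv.vocabulary_[w] for w in text.split() if w in cv.vocabulary_]
    return cooc[idx].mean(axis=0, keepdims=True)

second_order = cosine_similarity(second_order_vec(contexts[0]),
                                 second_order_vec(contexts[1]))[0, 0]
print(first_order, second_order)
```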