Search for Evergreens in Science: A Functional Data Analysis
Evergreens in science are papers that display a continual rise in annual
citations without decline, at least within a sufficiently long time period.
Aiming to better understand evergreens in particular and patterns of citation
trajectory in general, this paper develops a functional data analysis method to
cluster citation trajectories of a sample of 1699 research papers published in
1980 in the American Physical Society (APS) journals. We propose a functional
Poisson regression model for individual papers' citation trajectories, and fit
the model to the observed 30-year citations of individual papers by functional
principal component analysis and maximum likelihood estimation. Based on the
estimated paper-specific coefficients, we apply the K-means clustering
algorithm to cluster papers into different groups, for uncovering general types
of citation trajectories. The result demonstrates the existence of an evergreen
cluster of papers that do not exhibit any decline in annual citations over 30
years.

Comment: 40 pages, 9 figures
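The pipeline described above (a per-paper Poisson regression on a functional basis, then K-means on the fitted coefficients) can be sketched on synthetic data. The polynomial basis, regularization strength, and cluster count below are illustrative simplifications, not the paper's exact FPCA specification:

```python
# Hedged sketch: cluster synthetic 30-year citation trajectories by
# fitting a per-paper Poisson regression on a simple polynomial basis
# and running K-means on the fitted coefficients. Basis and cluster
# count are illustrative, not the paper's exact specification.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
years = np.arange(1, 31)                       # 30 annual observations
X = np.column_stack([years, years**2]) / 30.0  # quadratic basis

def coef_for(trajectory):
    """Fit log E[c_t] = b0 + b1*t + b2*t^2; return (b0, b1, b2)."""
    m = PoissonRegressor(alpha=1e-3, max_iter=300).fit(X, trajectory)
    return np.r_[m.intercept_, m.coef_]

# Two synthetic groups: steadily rising ("evergreen"-like) trajectories
# versus rise-then-decline trajectories.
evergreen = [rng.poisson(np.exp(0.1 * years / 3)) for _ in range(20)]
declining = [rng.poisson(np.exp(1.5 - 0.01 * (years - 10) ** 2))
             for _ in range(20)]
feats = np.array([coef_for(c) for c in evergreen + declining])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
```

Clustering the low-dimensional coefficient vectors, rather than the raw 30-dimensional count series, is what makes the trajectory shapes (rather than citation volume alone) drive the grouping.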
Measuring and Fostering Non-Cognitive Skills in Adolescence: Evidence from Chicago Public Schools and the OneGoal Program
Recent evidence has established that non-cognitive skills (e.g., persistence and self-control) are valuable in the labor market and are malleable throughout adolescence. Some recent high school interventions have been developed to foster these skills, but there is little evidence on whether they are effective. Using administrative data, we apply two methods to evaluate an intervention called OneGoal, which attempts to help disadvantaged students attend and complete college in part by teaching non-cognitive skills. First, we compare the outcomes of participants and non-participants with similar pre-program cognitive and non-cognitive skills. In doing so, we develop and validate a measure of non-cognitive skill that is based on readily available data and rivals standard measures of cognitive skill in predicting educational attainment. Second, we use an instrumental variable difference-in-differences approach that exploits the fact that OneGoal was introduced into different schools at different times. We estimate that OneGoal improves academic indicators, increases college enrollment by 10–20 percentage points, and reduces arrest rates by 5 percentage points for males. We demonstrate that improvements in non-cognitive skill account for 15–30 percent of the treatment effects.
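The staggered-rollout logic behind the second evaluation strategy can be illustrated with a plain two-way fixed-effects difference-in-differences regression on synthetic school-by-year data. The variable names and the simulated +0.15 effect are invented for the sketch, and the instrumental-variable step the authors use is omitted for brevity:

```python
# Hedged sketch: two-way fixed-effects DiD with staggered adoption.
# Synthetic data only; school/year/enrolled names and the 0.15 effect
# are illustrative, and the paper's IV step is not shown.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
start = {s: 2010 + (s % 3) for s in range(12)}   # staggered start years
rows = []
for s in range(12):
    for year in range(2008, 2015):
        treated = int(year >= start[s])
        # School effect + year trend + 0.15 treatment effect + noise.
        y = (0.4 + 0.02 * (s % 5) + 0.01 * (year - 2008)
             + 0.15 * treated + rng.normal(0, 0.02))
        rows.append(dict(school=s, year=year, treated=treated, enrolled=y))
df = pd.DataFrame(rows)

# OLS with school and year fixed effects via dummy variables.
X = pd.get_dummies(df[["school", "year"]].astype(str), drop_first=True)
X.insert(0, "treated", df["treated"])
X.insert(0, "const", 1.0)
beta, *_ = np.linalg.lstsq(X.to_numpy(float),
                           df["enrolled"].to_numpy(), rcond=None)
effect = beta[1]  # coefficient on "treated"; recovers ~0.15 here
```

Because different schools adopt in different years, the year dummies absorb common trends while the treatment indicator picks up the within-school change at adoption.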
Comprehensive Review of Opinion Summarization
The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. Various techniques and paradigms have been introduced to solve this task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, as this will help set the trend for future research in this area.

Unpublished; not peer reviewed.
Spectral gene set enrichment (SGSE)
Motivation: Gene set testing is typically performed in a supervised context
to quantify the association between groups of genes and a clinical phenotype.
In many cases, however, a gene set-based interpretation of genomic data is
desired in the absence of a phenotype variable. Although methods exist for
unsupervised gene set testing, they predominantly compute enrichment relative
to clusters of the genomic variables with performance strongly dependent on the
clustering algorithm and number of clusters. Results: We propose a novel
method, spectral gene set enrichment (SGSE), for unsupervised competitive
testing of the association between gene sets and empirical data sources. SGSE
first computes the statistical association between gene sets and principal
components (PCs) using our principal component gene set enrichment (PCGSE)
method. The overall statistical association between each gene set and the
spectral structure of the data is then computed by combining the PC-level
p-values using the weighted Z-method with weights set to the PC variance scaled
by Tracy-Widom test p-values. Using simulated data, we show that the SGSE
algorithm can accurately recover spectral features from noisy data. To
illustrate the utility of our method on real data, we demonstrate the superior
performance of the SGSE method relative to standard cluster-based techniques
for testing the association between MSigDB gene sets and the variance structure
of microarray gene expression data.
Availability: http://cran.r-project.org/web/packages/PCGSE/index.html
Contact: [email protected] or [email protected]
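The p-value combination step can be sketched as follows. The weighted Z (Stouffer) formula is standard, but the example inputs and the exact way the Tracy-Widom p-values scale the PC variances are illustrative assumptions, not the precise SGSE weighting:

```python
# Hedged sketch: combine per-PC gene-set p-values with the weighted
# Z-method. Weights here are PC variance scaled by (1 - Tracy-Widom p),
# an illustrative choice; the numbers are placeholders.
import numpy as np
from scipy.stats import norm

def weighted_z(pvals, weights):
    """Stouffer's weighted Z: Z = sum(w_i z_i) / sqrt(sum(w_i^2))."""
    z = norm.isf(np.asarray(pvals, dtype=float))   # one-sided z-scores
    w = np.asarray(weights, dtype=float)
    combined = (w * z).sum() / np.sqrt((w ** 2).sum())
    return norm.sf(combined)                       # combined p-value

# Toy PC-level inputs: a strong signal on the first (high-variance) PC.
pc_pvals = [0.01, 0.20, 0.70]
pc_var = np.array([0.50, 0.30, 0.20])
tw_p = np.array([0.001, 0.04, 0.60])
p = weighted_z(pc_pvals, pc_var * (1 - tw_p))
```

Up-weighting high-variance, statistically significant PCs means the combined p-value is dominated by the components most likely to reflect real spectral structure rather than noise.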
A user profiling component with the aid of user ontologies
Abstract: What follows is a contribution to the field of user modeling for adaptive teaching and learning programs, especially in the medical field. The paper outlines existing approaches to the problem of extracting user information in a form that can be exploited by adaptive software. We focus initially on the so-called stereotyping method, which allocates users into classes adaptively, reflecting characteristics such as physical data, social background, and computer experience. The user classifications of the stereotyping method are, however, ad hoc and unprincipled, and they can be exploited by the adaptive system only after a large number of trials by various kinds of users. We argue that the remedy is to create a database of user ontologies from which ready-made taxonomies can be derived in such a way as to enable associated software to support a variety of different types of users.
Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.

Comment: 23 pages
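The first-order versus second-order distinction can be made concrete with a toy example: first-order similarity compares direct word overlap between two contexts, while second-order similarity compares averaged word co-occurrence vectors, so contexts sharing no words can still match. The vocabulary and co-occurrence vectors below are invented for illustration:

```python
# Hedged sketch: first-order (direct word overlap) vs. second-order
# (co-occurrence vector) similarity of short contexts. All data here
# is a toy example, not from any real corpus.
import numpy as np
from collections import Counter

def cosine(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

def first_order(a, b, vocab):
    """Cosine of raw word-count vectors over a shared vocabulary."""
    va = np.array([Counter(a.split())[w] for w in vocab], float)
    vb = np.array([Counter(b.split())[w] for w in vocab], float)
    return cosine(va, vb)

def second_order(a, b, cooc):
    """Cosine of averaged per-word co-occurrence vectors."""
    va = np.mean([cooc[w] for w in a.split() if w in cooc], axis=0)
    vb = np.mean([cooc[w] for w in b.split() if w in cooc], axis=0)
    return cosine(va, vb)

vocab = ["doctor", "nurse", "physician", "hospital", "clinic"]
# Toy co-occurrence vectors over three background dimensions.
cooc = {
    "doctor":    np.array([3.0, 1.0, 0.0]),
    "physician": np.array([2.5, 1.2, 0.1]),
    "nurse":     np.array([2.0, 2.0, 0.0]),
}
a, b = "doctor", "physician"
fo = first_order(a, b, vocab)   # no shared words: similarity is 0
so = second_order(a, b, cooc)   # similar co-occurrence: close to 1
```

This is exactly the case the abstract highlights: "doctor" and "physician" share no surface words, so a first-order measure fails, while a second-order measure built from contextual features recovers their relatedness.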