Flexible constrained sampling with guarantees for pattern mining
Pattern sampling has been proposed as a potential solution to the infamous
pattern explosion. Instead of enumerating all patterns that satisfy the
constraints, individual patterns are sampled proportional to a given quality
measure. Several sampling algorithms have been proposed, but each of them has
its limitations when it comes to 1) flexibility in terms of quality measures
and constraints that can be used, and/or 2) guarantees with respect to sampling
accuracy. We therefore present Flexics, the first flexible pattern sampler that
supports a broad class of quality measures and constraints, while providing
strong guarantees regarding sampling accuracy. To achieve this, we leverage the
perspective on pattern mining as a constraint satisfaction problem and build
upon the latest advances in sampling solutions in SAT as well as existing
pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
a variety of pattern languages, which allows us to introduce and tackle the
novel task of sampling sets of patterns. We introduce and empirically evaluate
two variants of Flexics: 1) a generic variant that addresses the well-known
itemset sampling task and the novel pattern set sampling task as well as a wide
range of expressive constraints within these tasks, and 2) a specialized
variant that exploits existing frequent itemset techniques to achieve
substantial speed-ups. Experiments show that Flexics is both accurate and
efficient, making it a useful tool for pattern-based data exploration.
Comment: Accepted for publication in Data Mining & Knowledge Discovery journal
(ECML/PKDD 2017 journal track)
The success-index: an alternative approach to the h-index for evaluating an individual's research output
Among the most recent bibliometric indicators for normalizing the differences among fields of science in terms of citation behaviour, Kosmulski (J Informetr 5(3):481-485, 2011) proposed the NSP (number of successful papers) index. According to the authors, NSP deserves much attention for its great simplicity and immediate meaning (equivalent to those of the h-index), while it has the disadvantage of being prone to manipulation and not very efficient in terms of statistical significance. In the first part of the paper, we introduce the success-index, aimed at reducing the NSP-index's limitations, although requiring more computing effort. Next, we present a detailed analysis of the success-index from the point of view of its operational properties and a comparison with those of the h-index. Particularly interesting is the examination of the success-index's scale of measurement, which is much richer than the h-index's. This makes the success-index much more versatile for different types of analysis, e.g., (cross-field) comparisons of the scientific output of (1) individual researchers, (2) researchers with different seniority, (3) research institutions of different size, (4) scientific journals, etc.
An index to quantify an individual's scientific research output that takes into account the effect of multiple coauthorship
I propose the index ħ ("hbar"), defined as the number of papers of an
individual that have citation count larger than or equal to the ħ of all
coauthors of each paper, as a useful index to characterize the scientific
output of a researcher that takes into account the effect of multiple
coauthorship. The bar is higher for ħ.
Comment: A few minor changes from v1. To be published in Scientometrics
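The definition above can be illustrated with a minimal Python sketch. It is not the author's procedure: it assumes each coauthor's hbar value is already known and fixed, whereas under the definition these values depend on one another and would need to be determined self-consistently. The function names and the `(citations, coauthor_hbars)` input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def hbar_given_coauthors(papers: List[Tuple[int, List[int]]]) -> int:
    """Sketch of hbar for one researcher.

    Each paper is (citation_count, [hbar of each coauthor]).
    ASSUMPTION: coauthor hbar values are supplied as fixed inputs.
    Returns the largest k such that at least k papers have a citation
    count >= k and >= the hbar of every coauthor on that paper.
    """
    for k in range(len(papers), 0, -1):
        qualifying = sum(
            1
            for cites, coauthor_hbars in papers
            if cites >= max(k, max(coauthor_hbars, default=0))
        )
        if qualifying >= k:
            return k
    return 0


# Hypothetical record: the 8-citation paper has a coauthor with hbar 9,
# so it does not count towards this researcher's hbar.
papers = [(10, [3]), (8, [9]), (5, []), (4, [2]), (1, [])]
plain_h = h_index([cites for cites, _ in papers])
hbar = hbar_given_coauthors(papers)
```

In this hypothetical record the plain h-index is 4 while the sketched hbar is 3, which matches the remark that the bar is higher for hbar.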
The hw-rank: an h-index variant for ranking web pages
We introduce a novel ranking of search results based on a variant of the h-index for directed information networks such as the Web. The h-index was originally introduced to measure an individual researcher's scientific output and influence, but here a variant of it is applied to assess the "importance" of web pages. Like PageRank, the "importance" of a page is defined by the "importance" of the pages linking to it. However, unlike the computation of PageRank, which involves the whole web graph, computing the h-index for web pages (the hw-rank) is based on a local computation, and only the neighbors of the neighbors of the given node are considered. Preliminary results show a strong correlation between ranking with the hw-rank and with PageRank; moreover, its computation is simpler than that of PageRank. Further, larger-scale experiments are needed in order to assess the applicability of the method.
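A minimal Python sketch of one possible reading of this local computation follows. The abstract does not give the exact formula, so the definition used here is an assumption: the score of a page is the largest h such that at least h pages link to it and each of those pages itself has at least h incoming links. The function name and the `in_links` dictionary layout are hypothetical.

```python
from typing import Dict, List


def hw_rank_sketch(page: str, in_links: Dict[str, List[str]]) -> int:
    """Local h-index-style score for a page in a directed graph.

    in_links maps each page to the list of pages that link to it.
    ASSUMED definition (not confirmed by the abstract): the largest h
    such that at least h in-neighbors each have >= h in-links. Only the
    page's in-neighbors and their in-neighbors are inspected.
    """
    # In-degree of every page that links to the given page.
    neighbor_degrees = sorted(
        (len(in_links.get(neighbor, [])) for neighbor in in_links.get(page, [])),
        reverse=True,
    )
    h = 0
    for rank, degree in enumerate(neighbor_degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical five-page graph.
in_links = {
    "a": ["b", "c", "d"],
    "b": ["c", "d"],
    "c": ["d", "e"],
    "d": ["e"],
    "e": [],
}
score_a = hw_rank_sketch("a", in_links)
score_b = hw_rank_sketch("b", in_links)
score_e = hw_rank_sketch("e", in_links)
```

Unlike PageRank, which iterates over the whole graph, this sketch touches only two hops of incoming links per page.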
Statistical regularities in the rank-citation profile of scientists
Recent science-of-science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile ci(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each ci(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different ci(r) profiles, our results demonstrate the utility of the βi scaling parameter in conjunction with hi for quantifying individual publication impact. We show that the total number of citations Ci tallied from a scientist's Ni papers scales as . Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
Natural disasters and indicators of social cohesion
Do adversarial environmental conditions create social cohesion? We provide new answers to this question by exploiting spatial and temporal variation in exposure to earthquakes across Chile. Using a variety of methods and controlling for a number of socio-economic variables, we find that exposure to earthquakes has a positive effect on several indicators of social cohesion. Social cohesion increases after a big earthquake and slowly erodes in periods when environmental conditions are less adverse. Our results contribute to the current debate on whether and how environmental conditions shape formal and informal institutions.
Flaring Stars in a Non-targeted mm-wave Survey with SPT-3G
We present a flare star catalog from four years of non-targeted
millimeter-wave survey data from the South Pole Telescope (SPT). The data were
taken with the SPT-3G camera and cover a 1500-square-degree region of the sky
from to in right ascension and
to in declination. This region was observed on a
nearly daily cadence from 2019 to 2022 and chosen to avoid the plane of the
Galaxy. A short-duration transient search of this survey yields 111 flaring
events from 66 stars, increasing the number of both flaring events and detected
flare stars by an order of magnitude from the previous SPT-3G data release. We
provide cross-matching to Gaia DR3, as well as matches to X-ray point sources
found in the second ROSAT all-sky survey. We have detected flaring stars across
the main sequence, from early-type A stars to M dwarfs, as well as a large
population of evolved stars. These stars are mostly nearby, spanning 10 to 1000
parsecs in distance. Most of the flare spectral indices are constant or gently
rising as a function of frequency at 95/150/220 GHz. The timescale of these
events can range from minutes to hours, and the peak luminosities
range from to erg s⁻¹ in the SPT-3G frequency bands
- …