128 research outputs found

    Flexible constrained sampling with guarantees for pattern mining

    Get PDF
    Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration.Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track

    The success-index: an alternative approach to the h-index for evaluating an individual's research output

    Get PDF
    Among the most recent bibliometric indicators for normalizing the differences among fields of science in terms of citation behaviour, Kosmulski (J Informetr 5(3):481-485, 2011) proposed the NSP (number of successful paper) index. According to the authors, NSP deserves much attention for its great simplicity and immediate meaning— equivalent to those of the h-index—while it has the disadvantage of being prone to manipulation and not very efficient in terms of statistical significance. In the first part of the paper, we introduce the success-index, aimed at reducing the NSP-index's limitations, although requiring more computing effort. Next, we present a detailed analysis of the success-index from the point of view of its operational properties and a comparison with the h-index's ones. Particularly interesting is the examination of the success-index scale of measurement, which is much richer than the h-index's. This makes success-index much more versatile for different types of analysis—e.g., (cross-field) comparisons of the scientific output of (1) individual researchers, (2) researchers with different seniority, (3) research institutions of different size, (4) scientific journals, etc

    An index to quantify an individual's scientific research output that takes into account the effect of multiple coauthorship

    Full text link
    I propose the index \hbar ("hbar"), defined as the number of papers of an individual that have citation count larger than or equal to the \hbar of all coauthors of each paper, as a useful index to characterize the scientific output of a researcher that takes into account the effect of multiple coauthorship. The bar is higher for \hbar.Comment: A few minor changes from v1. To be published in Scientometric

    The hw-rank: an h-index variant for ranking web pages

    Get PDF
    We introduce a novel ranking of search results based on a variant of the h-index for directed information networks such as the Web. The h-index was originally introduced to measure an individual researcher’s scientific output and influence, but here a variant of it is applied to assess the ‘‘importance’’ of web pages. Like PageRank, the‘‘importance’’ of a page is defined by the ‘‘importance’’ of the pages linking to it. However, unlike the computation of PageRank which involves the whole web graph, computing the h-index for web pages (the hw-rank) is based on a local computation and only the neighbors of the neighbors of the given node are considered. Preliminary results show a strong correlation between ranking with the hw-rank and PageRank, and moreover its computation is simpler and less complex than computation of the PageRank. Further, larger scale experiments are needed in order to assess the applicability of the method

    Statistical regularities in the rank-citation profile of scientists

    Get PDF
    Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile ci(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each ci(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different ci(r) profiles, our results demonstrate the utility of the βi scaling parameter in conjunction with hi for quantifying individual publication impact. We show that the total number of citations Ci tallied from a scientist's Ni papers scales as . Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress

    Natural disasters and indicators of social cohesion

    Get PDF
    Do adversarial environmental conditions create social cohesion? We provide new answers to this question by exploiting spatial and temporal variation in exposure to earthquakes across Chile. Using a variety of methods and controlling for a number of socio-economic variables, we find that exposure to earthquakes has a positive effect on several indicators of social cohesion. Social cohesion increases after a big earthquake and slowly erodes in periods where environmental conditions are less adverse. Our results contribute to the current debate on whether and how environmental conditions shape formal and informal institutions

    Flaring Stars in a Non-targeted mm-wave Survey with SPT-3G

    Full text link
    We present a flare star catalog from four years of non-targeted millimeter-wave survey data from the South Pole Telescope (SPT). The data were taken with the SPT-3G camera and cover a 1500-square-degree region of the sky from 20h40m0s20^{h}40^{m}0^{s} to 3h20m0s3^{h}20^{m}0^{s} in right ascension and 42-42^{\circ} to 70-70^{\circ} in declination. This region was observed on a nearly daily cadence from 2019-2022 and chosen to avoid the plane of the galaxy. A short-duration transient search of this survey yields 111 flaring events from 66 stars, increasing the number of both flaring events and detected flare stars by an order of magnitude from the previous SPT-3G data release. We provide cross-matching to Gaia DR3, as well as matches to X-ray point sources found in the second ROSAT all-sky survey. We have detected flaring stars across the main sequence, from early-type A stars to M dwarfs, as well as a large population of evolved stars. These stars are mostly nearby, spanning 10 to 1000 parsecs in distance. Most of the flare spectral indices are constant or gently rising as a function of frequency at 95/150/220 GHz. The timescale of these events can range from minutes to hours, and the peak νLν\nu L_{\nu} luminosities range from 102710^{27} to 103110^{31} erg s1^{-1} in the SPT-3G frequency bands
    corecore