53 research outputs found

    PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search

    This paper studies density-based clustering of point sets. These methods use dense regions of points to detect clusters of arbitrary shapes. In particular, we study variants of density peaks clustering, a popular type of algorithm that has been shown to work well in practice. Our goal is to cluster large high-dimensional datasets, which are prevalent in practice. Prior solutions are either sequential, and cannot scale to large data, or are specialized for low-dimensional data. This paper unifies the different variants of density peaks clustering into a single framework, PECANN, by abstracting out several key steps common to this class of algorithms. One such key step is to find nearest neighbors that satisfy a predicate function, and one of the main contributions of this paper is an efficient way to do this predicate search using graph-based approximate nearest neighbor search (ANNS). To provide ample parallelism, we propose a doubling search technique that enables points to find an approximate nearest neighbor satisfying the predicate in a small number of rounds. Our technique can be applied to many existing graph-based ANNS algorithms, which can all be plugged into PECANN. We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading. Compared to the state-of-the-art FASTDP algorithm for high-dimensional density peaks clustering, which is sequential, our best algorithm is 45x-734x faster while achieving competitive ARI scores. Compared to the state-of-the-art parallel DPC-based algorithm, which is optimized for low dimensions, we show that PECANN is two orders of magnitude faster. As far as we know, our work is the first to evaluate DPC variants on large high-dimensional real-world image and text embedding datasets
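    The predicate search described in this abstract can be illustrated with a minimal sketch. It is not the PECANN implementation: it assumes the classic density peaks predicate ("the neighbor has strictly higher density"), a simple radius-count density, and an exact brute-force k-nearest-neighbor query standing in for a graph-based ANNS index. The names `radius_density`, `knn`, `dependent_point`, and the starting size `k0` are hypothetical and chosen for illustration only.

```python
import math

def radius_density(points, r):
    """Density of each point = number of other points within distance r (assumed definition)."""
    n = len(points)
    return [sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= r)
            for i in range(n)]

def knn(points, i, k):
    """Exact k nearest neighbors of points[i], nearest first (stand-in for an ANNS query)."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]

def dependent_point(points, density, i, k0=2):
    """Doubling search: nearest neighbor of i with strictly higher density, or None."""
    n = len(points)
    k = k0
    while True:
        k = min(k, n - 1)
        for j in knn(points, i, k):       # candidates are scanned nearest-first
            if density[j] > density[i]:   # the predicate
                return j
        if k >= n - 1:
            return None                   # no denser point exists: i is a density peak
        k *= 2                            # double the candidate set and retry

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
density = radius_density(points, r=1.5)
parents = [dependent_point(points, density, i) for i in range(len(points))]
```

    Each point's search is independent of the others, so the searches can run in parallel, and doubling `k` bounds the number of rounds per point by a logarithm of the dataset size.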

    DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries

    We study the problem of vector set search with vector set queries. This task is analogous to traditional near-neighbor search, with the exception that both the query and each element in the collection are sets of vectors. We identify this problem as a core subroutine for semantic search applications and find that existing solutions are unacceptably slow. Towards this end, we present a new approximate search algorithm, DESSERT (DESSERT Efficiently Searches Sets of Embeddings via Retrieval Tables). DESSERT is a general tool with strong theoretical guarantees and excellent empirical performance. When we integrate DESSERT into ColBERT, a state-of-the-art semantic search model, we find a 2-5x speedup on the MS MARCO and LoTTE retrieval benchmarks with minimal loss in recall, underscoring the effectiveness and practical applicability of our proposal. Comment: Code available, https://github.com/ThirdAIResearch/Desser

    Identifying publications in questionable journals in the context of performance-based research funding

    In this article we discuss the five yearly screenings for publications in questionable journals which have been carried out in the context of the performance-based research funding model in Flanders, Belgium. The Flemish funding model was expanded from 2010 onwards with a comprehensive bibliographic database for research output in the social sciences and humanities. Along with an overview of the procedures followed during the screenings for articles in questionable journals submitted for inclusion in this database, we present a bibliographic analysis of the publications identified. First, we show how the yearly number of publications in questionable journals has evolved over the period 2003–2016. Second, we present a disciplinary classification of the identified journals. In the third part of the results section, three authorship characteristics are discussed: multi-authorship, the seniority (or experience level) of authors in general and of the first author in particular, and the relation of the disciplinary scope of the journal (cognitive classification) with the departmental affiliation of the authors (organizational classification). Our results regarding yearly rates of publications in questionable journals indicate that awareness of the risks of questionable journals does not lead to a turn away from open access in general. The number of publications in open access journals rises every year, while the number of publications in questionable journals decreases from 2012 onwards. We find further that both early career and more senior researchers publish in questionable journals. We show that the average proportion of senior authors contributing to publications in questionable journals is somewhat higher than that for publications in open access journals.
    In addition, this paper yields insight into the extent to which publications in questionable journals pose a threat to the public and political legitimacy of a performance-based research funding system of a western European region. We include concrete suggestions for those tasked with maintaining bibliographic databases and screening for publications in questionable journals.

    Predatory Open Access journals: A review of past screenings within the Flemish performance based research funding system (2014 – 2018)

    From 2013 – 2014 onwards, our group (ECOOM - UAntwerpen) has been monitoring Predatory Open Access publication patterns in Flemish (Belgium) SSH scholarship. In light of the Flemish Performance Based Research Funding System, these screening exercises are conducted to assist university review boards with the decision-making processes concerning what is and what is not to be considered a peer reviewed periodical. Each year, the results of these monitoring exercises are then published as a report and presented to the Authoritative Panel. In the introductory part of this essay, we will present a general background against which these yearly screenings emerged. Second, we will present the sources used and the methods deployed for the yearly screenings. Thereafter, we will briefly present the yearly results these exercises yielded. In the third section, we present a more comprehensive analysis of the results. We conclude by reflecting on the past exercises and the findings presented in this report, and discuss some implications for colleagues and scholars manoeuvring through the contemporary journal landscape.

    Joint Observation of the Galactic Center with MAGIC and CTA-LST-1

    MAGIC is a system of two Imaging Atmospheric Cherenkov Telescopes (IACTs) designed to detect very-high-energy gamma rays, and has been operating in stereoscopic mode since 2009 at the Observatorio del Roque de Los Muchachos in La Palma, Spain. In 2018, the prototype IACT of the Large-Sized Telescope (LST-1) for the Cherenkov Telescope Array, a next-generation ground-based gamma-ray observatory, was inaugurated at the same site, at a distance of approximately 100 meters from the MAGIC telescopes. Using joint observations with MAGIC and LST-1, we developed a dedicated analysis pipeline and established the threefold telescope system via software, achieving the highest sensitivity in the northern hemisphere. Based on this enhanced performance, MAGIC and LST-1 have been jointly and regularly observing the Galactic Center, a region of paramount importance and complexity for IACTs. In particular, the gamma-ray emission from the dynamical center of the Milky Way is under debate. Although previous measurements suggested that the supermassive black hole Sagittarius A* plays a primary role, its radiation mechanism remains unclear, mainly due to limited angular resolution and sensitivity. The enhanced sensitivity of our novel approach is thus expected to provide new insights into the question. Here we present the current status of the data analysis for the joint MAGIC and LST-1 observations of the Galactic Center.