Search CORE

477 research outputs found


Parallel computing in information retrieval - An updated review

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular, we stress the importance of the motivation for using parallel computing in text retrieval. We analyse parallel IR systems using a classification due to Rasmussen [1] and describe some parallel IR systems. We give a description of the retrieval models used in parallel information processing. We describe areas of research which we believe are needed.

City Research Online

Crossref
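One common way to parallelise text retrieval is document partitioning: each worker searches its own slice of the collection, and a broker merges the partial result lists. The sketch below illustrates that pattern only; it is not taken from the review, and the term-overlap scoring function and all names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def score(query_terms, doc_terms):
    # Hypothetical scoring function: number of distinct query terms in the document.
    return len(set(query_terms) & set(doc_terms))

def search_partition(partition, query_terms, k):
    # One worker ranks only its own slice of the collection and keeps a local top-k.
    scored = ((score(query_terms, terms), doc_id) for doc_id, terms in partition)
    return heapq.nlargest(k, scored)

def parallel_search(partitions, query, k=3):
    query_terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(lambda p: search_partition(p, query_terms, k), partitions))
    # Broker step: the global top-k is contained in the union of the local top-k lists.
    return heapq.nlargest(k, (hit for part in partials for hit in part))

partitions = [
    [("d1", "parallel text retrieval".split()), ("d2", "image retrieval".split())],
    [("d3", "parallel computing for text retrieval".split()), ("d4", "database systems".split())],
]
print(parallel_search(partitions, "parallel text retrieval", k=2))  # prints [(3, 'd3'), (3, 'd1')]
```

Threads are used here only to show the structure; because of Python's global interpreter lock, a real system would place partitions on separate processes or machines. Ties are broken by document id because results are (score, id) tuples.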

An Indexing Scheme and Descriptor for 3D Object Retrieval Based on Local Shape Querying

Author: Theoharis Theoharis
van Blokland Bart Iver
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

A binary descriptor indexing scheme based on Hamming distance, called the Hamming tree, is presented for local shape queries. A new binary clutter-resistant descriptor named Quick Intersection Count Change Image (QUICCI) is also introduced. This local shape descriptor is extremely small and fast to compare. Additionally, a novel distance function called Weighted Hamming, applicable to QUICCI images, is proposed for retrieval applications. The effectiveness of the indexing scheme and QUICCI is demonstrated on 828 million QUICCI images derived from the SHREC2017 dataset, while the clutter resistance of QUICCI is shown using the clutterbox experiment. Comment: 13 pages, 13 figures, to be published in a Special Issue in Computers & Graphics

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives
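Because QUICCI descriptors are binary images, comparing two of them reduces to an XOR followed by a bit count. The sketch below assumes each descriptor is packed into a Python integer (an assumption for illustration) and shows plain Hamming distance with a brute-force nearest-neighbour scan, i.e. the linear search that an index such as the Hamming tree is meant to avoid. The paper's Weighted Hamming function is not reproduced here.

```python
def hamming(a: int, b: int) -> int:
    # XOR leaves a 1 wherever the two bit patterns differ; count those bits.
    return bin(a ^ b).count("1")

def knn_hamming(query, descriptors, k):
    # Brute-force baseline: (distance, index) pairs of the k closest descriptors.
    return sorted((hamming(query, d), i) for i, d in enumerate(descriptors))[:k]

descriptors = [0b1011, 0b0011, 0b0100, 0b1111]
print(knn_hamming(0b1011, descriptors, k=2))  # prints [(0, 0), (1, 1)]
```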

Partial 3D Object Retrieval using Local Binary QUICCI Descriptors and Dissimilarity Tree Indexing

Author: Theoharis Theoharis
van Blokland Bart Iver
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

A complete pipeline is presented for accurate and efficient partial 3D object retrieval based on Quick Intersection Count Change Image (QUICCI) binary local descriptors and a novel indexing tree. It is shown how a modification to the QUICCI query descriptor makes it ideal for partial retrieval. An indexing structure called Dissimilarity Tree is proposed which can significantly accelerate searching the large space of local descriptors; this is applicable to QUICCI and other binary descriptors. The index exploits the distribution of bits within descriptors for efficient retrieval. The retrieval pipeline is tested on the artificial part of the SHREC'16 dataset with near-ideal retrieval results. Comment: 19 pages, 17 figures, to be published in Computers & Graphics

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives
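One generic way a tree over binary descriptors can exploit the distribution of bits is to store, per node, the bitwise OR and AND of its members; these give a lower bound on the Hamming distance from a query to any member, so whole nodes can be skipped. This is a hypothetical illustration of bound-based pruning, not the paper's Dissimilarity Tree; the `Group` class and its fields are assumptions.

```python
def popcount(x):
    return bin(x).count("1")

class Group:
    """Hypothetical index node: a set of binary descriptors plus two bit summaries."""

    def __init__(self, descriptors, nbits):
        self.descriptors = list(descriptors)  # assumed non-empty
        self.mask = (1 << nbits) - 1
        self.union, self.intersection = 0, self.mask
        for d in self.descriptors:
            self.union |= d          # bits set in at least one member
            self.intersection &= d   # bits set in every member

    def lower_bound(self, query):
        # Query bits absent from every member mismatch for all members, and bits
        # present in every member but absent from the query mismatch as well.
        missing = query & ~self.union & self.mask
        extra = ~query & self.intersection & self.mask
        return popcount(missing) + popcount(extra)

def nearest(query, groups):
    best, best_dist = None, float("inf")
    # Visit groups in order of their bound; once a bound cannot beat the best
    # distance found so far, no later group can either.
    for grp in sorted(groups, key=lambda g: g.lower_bound(query)):
        if grp.lower_bound(query) >= best_dist:
            break
        for d in grp.descriptors:
            dist = popcount(query ^ d)
            if dist < best_dist:
                best, best_dist = d, dist
    return best, best_dist

groups = [Group([0b0000, 0b0001], nbits=4), Group([0b1110, 0b1111], nbits=4)]
print(nearest(0b1110, groups))  # prints (14, 0)
```

The two mismatch sets in `lower_bound` are disjoint (one has the query bit set, the other has it clear), which is why their counts can simply be added.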

Exploiting the Computational Power of Ternary Content Addressable Memory

Author: Tirdad Kamran
Publication venue: 'University of Waterloo'
Publication date: 01/01/2011

Ternary Content Addressable Memory, or TCAM for short, is a special type of memory that can execute a certain set of operations in parallel on all of its words. Because of its power consumption and relatively small storage capacity, it has only been used in special environments. Over the past few years its cost has decreased and its storage capacity has increased significantly, and these exponential trends are continuing. Hence it can be used in more general environments for larger problems. In this research we study how to exploit its computational power to speed up fundamental problems, although we have barely scratched the surface. The main problems addressed in our research are Boolean matrix multiplication, approximate subset queries using Bloom filters, fixed-universe priority queues and network flow classification. For Boolean matrix multiplication, our simple algorithm has a run time of O(dN^2/w), where N is the size of the square matrices, w is the number of bits in each word of TCAM and d is the maximum number of ones in a row of one of the matrices. For the fixed-universe priority queue problem we propose two data structures: one with constant time complexity and space O((1/ε) n U^ε), and the other in linear space with amortized time complexity O(lg lg U / lg lg lg U), which beats the best possible data structure in the RAM model, namely Y-fast trees. Considering each word of TCAM as a Bloom filter, we modify the hash functions of the Bloom filter and propose a data structure which can use the information capacity of each word of TCAM more efficiently by using the co-occurrence probability of possible members. Finally, in the last chapter we propose a novel technique for network flow classification using TCAM.

University of Waterloo's Institutional Repository
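The shape of the O(dN^2/w) bound can be illustrated with an ordinary word-parallel Boolean matrix product: row i of C is the OR of the rows of B selected by the ones in row i of A, so each of the N output rows needs at most d bitset ORs, each touching about N/w machine words. The sketch below is a RAM-model analogue using Python integers as bitsets; it does not model TCAM or reproduce the thesis's algorithm, and it ignores the cost of packing and unpacking rows.

```python
def bool_matmul(A, B):
    """Boolean product of 0/1 matrices: C[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    n_cols = len(B[0])
    # Pack each row of B into a single integer used as a bitset (bit j = column j).
    b_rows = [sum(1 << j for j, v in enumerate(row) if v) for row in B]
    C = []
    for row in A:
        acc = 0
        # Row i of C is the OR of the rows of B selected by the ones in row i of A;
        # with at most d ones per row this is at most d wide OR operations.
        for k, v in enumerate(row):
            if v:
                acc |= b_rows[k]
        C.append([(acc >> j) & 1 for j in range(n_cols)])
    return C

print(bool_matmul([[1, 0], [1, 1]], [[0, 1], [1, 0]]))  # prints [[0, 1], [1, 1]]
```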

Data Management for Dynamic Multimedia Analytics and Retrieval

Author: Gasser Ralph Marc Philipp
Publication date: 01/01/2023

Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially if search, analysis and analytics in large data corpora are considered. The inherently unstructured nature of the data itself and the curse of dimensionality that afflicts the representations we typically work with in its stead cause a broad range of issues that require sophisticated solutions at different levels. This has given rise to a huge corpus of research that focuses on techniques that allow for effective and efficient multimedia search and exploration. Many of these contributions have led to an array of purpose-built multimedia search systems. However, recent progress in multimedia analytics and interactive multimedia retrieval has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search, and since the concrete requirements cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread notion that data collections are static does not hold if one considers analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should arrive at the desired outcomes of the potentially many different queries. Guided by these shortcomings, and motivated by the fact that similar questions have once been answered for structured data in classical database research, this thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing.
We complement this with a cost model that makes the often implicit trade-off between query execution speed and result quality transparent to the system and the user. We also describe a model for the transactional and durable maintenance of high-dimensional index structures. All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest to converge the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other.

edoc
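The idea of generalising proximity-based queries can be pictured as one shared step (compute a distance from every object to the query) followed by different selection rules: keep the k smallest for a kNN query, or keep everything within a radius for a range query. The sketch below is a hypothetical illustration of that decomposition; `proximity_query` is not Cottontail DB's API, and Euclidean distance is just an assumed default.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def proximity_query(collection, query, distance=euclidean, k=None, radius=None):
    # Shared step of every proximity query: one distance per stored vector.
    scored = sorted((distance(vec, query), key) for key, vec in collection.items())
    # Selection step: a range query filters by radius, a kNN query truncates to k.
    if radius is not None:
        scored = [(d, key) for d, key in scored if d <= radius]
    if k is not None:
        scored = scored[:k]
    return scored

collection = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 0.0)}
print(proximity_query(collection, (0.0, 0.0), k=2))         # prints [(0.0, 'a'), (1.0, 'c')]
print(proximity_query(collection, (0.0, 0.0), radius=1.0))  # prints [(0.0, 'a'), (1.0, 'c')]
```

Keeping the distance function as a parameter is what lets the same operator serve different metrics; an index would replace the full scan in the first step.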

Improving the open cluster census. I. Comparison of clustering algorithms applied to Gaia DR2 data

Author: Hunt Emily L.
Reffert Sabine
Publication venue: 'EDP Sciences'
Publication date: 08/12/2020

The census of open clusters in the Milky Way is in a never-before-seen state of flux. Recent works have reported hundreds of new open clusters thanks to the incredible astrometric quality of the Gaia satellite, but other works have also reported that many open clusters discovered in the pre-Gaia era may be associations. We aim to conduct a comparison of clustering algorithms used to detect open clusters, attempting to statistically quantify their strengths and weaknesses by deriving the sensitivity, specificity, and precision of each as well as their true positive rate against a larger sample. We selected DBSCAN, HDBSCAN, and Gaussian mixture models for further study, owing to their speed and appropriateness for use with Gaia data. We developed a preprocessing pipeline for Gaia data and developed the algorithms further for the specific application to open clusters. We derived detection rates for all 1385 open clusters in the fields in our study as well as more detailed performance statistics for 100 of these open clusters. DBSCAN was sensitive to 50% to 62% of the true positive open clusters in our sample, with generally very good specificity and precision. HDBSCAN traded precision for a higher sensitivity of up to 82%, especially across different distances and scales of open clusters. Gaussian mixture models were slow and only sensitive to 33% of open clusters in our sample, which tended to be larger objects. Additionally, we report on 41 new open cluster candidates detected by HDBSCAN, three of which are closer than 500 pc. When used with additional post-processing to mitigate its false positives, we have found that HDBSCAN is the most sensitive and effective algorithm for recovering open clusters in Gaia data. Our results suggest that many more new and already reported open clusters have yet to be detected in Gaia data. Comment: 28 pages, 13 figures, and 8 tables. Accepted in A&A. Supporting data is available on request until archiving at the CDS is complete

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)
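DBSCAN, one of the algorithms compared above, grows clusters from "core" points that have at least `min_pts` neighbours within distance `eps`; points reachable from a core point but not core themselves become border points, and everything else is noise. The following is a minimal pure-Python sketch of textbook DBSCAN on 2D points, not the authors' Gaia preprocessing pipeline; its neighbour search is O(n^2), so real use would rely on a spatial index or a library implementation.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices within distance eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:
                queue.extend(j_neigh)  # j is a core point, so keep expanding
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # prints [0, 0, 0, 1, 1, 1, -1]
```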

Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation

Author: Gong Ming
Jiang Daxin
Pei Jian
Ren Houxing
Shou Linjun
Zhuang Shengyao
Zuccon Guido
Publication date: 19/01/2023

The Differentiable Search Index (DSI) is an emerging paradigm for information retrieval. Unlike traditional retrieval architectures where index and retrieval are two different and separate components, DSI uses a single transformer model to perform both indexing and retrieval. In this paper, we identify and tackle an important issue of current DSI models: the data distribution mismatch that occurs between the DSI indexing and retrieval processes. Specifically, we argue that, at indexing, current DSI methods learn to build connections between the text of long documents and the identifier of the documents, but then retrieval of document identifiers is based on queries that are commonly much shorter than the indexed documents. This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages. To address this fundamental problem of current DSI models, we propose a simple yet effective indexing framework for DSI, called DSI-QG. When indexing, DSI-QG represents documents with a number of potentially relevant queries generated by a query generation model and re-ranked and filtered by a cross-encoder ranker. The presence of these queries at indexing allows the DSI models to connect a document identifier to a set of queries, hence mitigating data distribution mismatches present between the indexing and the retrieval phases. Empirical results on popular monolingual and cross-lingual passage retrieval datasets show that DSI-QG significantly outperforms the original DSI model. Comment: 11 pages

arXiv.org e-Print Archive
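The indexing side of DSI-QG can be pictured as a data-preparation step: for each document, take the generated queries, re-rank them by a cross-encoder score, keep the best few, and emit (query, document id) training pairs in place of (document text, document id). In the sketch below the query generator and cross-encoder are not shown; their outputs are assumed to be precomputed scores, and the function name is hypothetical.

```python
def build_dsi_qg_index_pairs(candidates, top_k):
    """candidates maps doc_id -> list of (generated_query, ranker_score) pairs."""
    pairs = []
    for doc_id, scored_queries in candidates.items():
        # Re-rank the generated queries by the (assumed precomputed) cross-encoder
        # score and keep only the top_k as the document's representation.
        best = sorted(scored_queries, key=lambda qs: qs[1], reverse=True)[:top_k]
        # Each kept query becomes an indexing example "query -> doc_id", so the
        # model sees short, query-like inputs at indexing time as at retrieval time.
        pairs.extend((query, doc_id) for query, _ in best)
    return pairs

candidates = {
    "doc1": [("what is dsi", 0.9), ("weather today", 0.1), ("dsi indexing", 0.7)],
    "doc2": [("tcam memory", 0.8)],
}
print(build_dsi_qg_index_pairs(candidates, top_k=2))
# prints [('what is dsi', 'doc1'), ('dsi indexing', 'doc1'), ('tcam memory', 'doc2')]
```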

Database support for large-scale multimedia retrieval

Author: Giangreco Ivan
Publication date: 01/01/2018

With the increasing proliferation of recording devices and the resulting abundance of multimedia data available nowadays, searching and managing these ever-growing collections becomes more and more difficult. In order to support retrieval tasks within large multimedia collections, not only the sheer size but also the complexity of data and their associated metadata pose great challenges, in particular from a data management perspective. Conventional approaches to address this task have been shown to have only limited success, particularly due to the lack of support for the given data and the required query paradigms. In the area of multimedia research, the missing support for efficiently and effectively managing multimedia data and metadata has recently been recognised as a stumbling block that constrains further developments in the field. In this thesis, we bridge the gap between the database and the multimedia retrieval research areas. We approach the problem of providing a data management system geared towards large collections of multimedia data and the corresponding query paradigms. To this end, we identify the necessary building blocks for a multimedia data management system which adopts the relational data model and the vector-space model. In essence, we make the following main contributions towards a holistic model of a database system for multimedia data: We introduce an architectural model describing a data management system for multimedia data from a system architecture perspective. We further present a data model which supports the storage of multimedia data and the corresponding metadata, and provides similarity-based search operations. This thesis describes an extensive query model for a very broad range of query paradigms, specifying both logical and executional aspects of a query.
Moreover, we consider the efficiency and scalability of the system in a distribution model and a storage model, and provide a large and diverse set of index structures for high-dimensional data coming from the vector-space model. The developed models crystallise into the scalable multimedia data management system ADAMpro, which has been implemented within the iMotion/vitrivr retrieval stack. We quantitatively evaluate our concepts on collections that exceed the current state of the art. The results underline the benefits of our approach and assist in understanding the role of the introduced concepts. Moreover, the findings provide important implications for future research in the field of multimedia data management.

edoc
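Combining the relational data model with the vector-space model means a single query can mix an ordinary Boolean predicate on metadata with a similarity ranking on feature vectors. The sketch below shows one simple evaluation order (filter first, then rank by cosine similarity); it is a hypothetical illustration with rows as dictionaries and is not ADAMpro's API or query planner.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_knn(rows, predicate, vector_column, query, k):
    # Relational part: a Boolean selection over ordinary metadata attributes.
    candidates = [r for r in rows if predicate(r)]
    # Vector-space part: rank the surviving rows by cosine similarity to the query.
    candidates.sort(key=lambda r: cosine(r[vector_column], query), reverse=True)
    return candidates[:k]

rows = [
    {"id": 1, "type": "video", "feat": (1.0, 0.0)},
    {"id": 2, "type": "image", "feat": (1.0, 0.0)},
    {"id": 3, "type": "video", "feat": (0.0, 1.0)},
    {"id": 4, "type": "video", "feat": (1.0, 1.0)},
]
hits = filtered_knn(rows, lambda r: r["type"] == "video", "feat", (1.0, 0.0), k=2)
print([r["id"] for r in hits])  # prints [1, 4]
```

Filtering before ranking is only one option; when the predicate is weak and a vector index exists, ranking first and filtering afterwards can be cheaper, which is the kind of choice a query model and planner has to make.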

Comparative cellular analysis of motor cortex in human, marmoset and mouse

Author: Aevermann B.D.
Aldridge A.I.
Ament S.A.
Bakken T.E.
Bartlett A.
Behrens M.M.
Bertagnolli D.
Bravo H.C.
Casper T.
Castanon R.G.
Chun J.
Crichton K.
Crow M.
Daigle T.L.
Dalley R.
Dee N.
Dembrow N.
Diep D.
Ding S.L.
Dobin A.
Dong W.X.
Ecker J.R.
Eggermont J.
Fang R.X.
Feng G.P.
Fischer S.
Gillis J.
Goldman M.
Goldy J.
Graybuck L.T.
Hawrylycz M.
Herb B.R.
Hertzano R.
Hodge R.D.
Hof P.R.
Höllt T.
Horwitz G.D.
Hou X.M.
Hu Q.W.
Jorstad N.L.
Kalmbach B.E.
Kancherla J.
Keene C.D.
Kharchenko P.V.
Ko A.L.
Koch C.
Krienen F.M.
Kroll M.
Lake B.B.
Lathia K.
Lein E.S.
Lelieveldt B.P.
Lew B. van
Li Y.E.
Linnarsson S.
Liu C.E.S.
Liu H.Q.
Lucero J.D.
Luo C.Y.
Macosko E.Z.
Mahurkar A.
McCarroll S.A.
McMillen D.
Miller J.A.
Moussa M.
Mukamel E.A.
Nery J.R.
Nicovich P.R.
Niu S.Y.
Orvis J.
Osteen J.K.
Owen S.
Palmer C.R.
Pham T.
Pinto-Duarte A.
Plongthongkum N.
Poirion O.
Preissl S.
Reed N.M.
Regev A.
Ren B.
Rimorin C.
Rivkin A.
Romanow W.J.
Scheuermann R.H.
Sedeño-Cortés A.E.
Siletti K.
Smith K.
Somasundaram S.
Sorensen S.A.
Spain W.J.
Sulc J.
Tasic B.
Tian W.
Tieu M.
Ting J.T.
Torkelson A.
Tung H.R.
Wang X.N.
White O.R.
Xie F.M.
Yanny A.M.
Yao Z.Z.
Zeng H.K.
Zhang K.
Zhang R.E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/10/2021

The primary motor cortex (M1) is essential for voluntary fine-motor control and is functionally conserved across mammals(1). Here, using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei in humans, marmoset monkeys and mice, we demonstrate a broadly conserved cellular makeup of this region, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. The core conserved molecular identities of neuronal and non-neuronal cell types allow us to generate a cross-species consensus classification of cell types, and to infer conserved properties of cell types across species. Despite the overall conservation, however, many species-dependent specializations are apparent, including differences in cell-type proportions, gene expression, DNA methylation and chromatin state. Few cell-type marker genes are conserved across species, revealing a short list of candidate genes and regulatory mechanisms that are responsible for conserved features of homologous cell types, such as the GABAergic chandelier cells. This consensus transcriptomic classification allows us to use patch-seq (a combination of whole-cell patch-clamp recordings, RNA sequencing and morphological characterization) to identify corticospinal Betz cells from layer 5 in non-human primates and humans, and to characterize their highly specialized physiology and anatomy. These findings highlight the robust molecular underpinnings of cell-type diversity in M1 across mammals, and point to the genes and regulatory pathways responsible for the functional identity of cell types and their species-specific adaptations.

Leiden University Scholarly Publications

Comparative cellular analysis of motor cortex in human, marmoset and mouse

Author: Aevermann Brian D
Aldridge Andrew I
Ament Seth A
Bakken Trygve E
Bartlett Anna
Behrens M Margarita
Bertagnolli Darren
Bravo Hector Corrada
Casper Tamara
Castanon Rosa G
Chun Jerold
Crichton Kirsten
Crow Megan
Daigle Tanya L
Dalley Rachel
Dee Nick
Dembrow Nikolai
Diep Dinh
Ding Song-Lin
Dobin Alexander
Dong Weixiu
Ecker Joseph R
Eggermont Jeroen
Fang Rongxin
Feng Guoping
Fischer Stephan
Gillis Jesse
Goldman Melissa
Goldy Jeff
Graybuck Lucas T
Hawrylycz Michael
Herb Brian R
Hertzano Ronna
Hodge Rebecca D
Hof Patrick R
Horwitz Gregory D
Hou Xiaomeng
Hu Qiwen
Höllt Thomas
Jorstad Nikolas L
Kalmbach Brian E
Kancherla Jayaram
Keene C Dirk
Kharchenko Peter V
Ko Andrew L
Koch Christof
Krienen Fenna M
Kroll Matthew
Lake Blue B
Lathia Kanan
Lein Ed S
Lelieveldt Boudewijn P
Li Yang Eric
Linnarsson Sten
Liu Christine S
Liu Hanqing
Lucero Jacinta D
Luo Chongyuan
Macosko Evan Z
Mahurkar Anup
McCarroll Steven A
McMillen Delissa
Miller Jeremy A
Moussa Marmar
Mukamel Eran A
Nery Joseph R
Nicovich Philip R
Niu Sheng-Yong
Orvis Joshua
Osteen Julia K
Owen Scott
Palmer Carter R
Pham Thanh
Pinto-Duarte António
Plongthongkum Nongluk
Poirion Olivier
Preissl Sebastian
Reed Nora M
Regev Aviv
Ren Bing
Rimorin Christine
Rivkin Angeline
Romanow William J
Scheuermann Richard H
Sedeño-Cortés Adriana E
Siletti Kimberly
Smith Kimberly
Somasundaram Saroja
Sorensen Staci A
Spain William J
Sulc Josef
Tasic Bosiljka
Tian Wei
Tieu Michael
Ting Jonathan T
Torkelson Amy
Tung Herman
van Lew Baldur
Wang Xinxin
White Owen R
Xie Fangming
Yanny Anna Marie
Yao Zizhen
Zeng Hongkui
Zhang Kun
Zhang Renee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2021

The primary motor cortex (M1) is essential for voluntary fine-motor control and is functionally conserved across mammals(1). Here, using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei in humans, marmoset monkeys and mice, we demonstrate a broadly conserved cellular makeup of this region, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. The core conserved molecular identities of neuronal and non-neuronal cell types allow us to generate a cross-species consensus classification of cell types, and to infer conserved properties of cell types across species. Despite the overall conservation, however, many species-dependent specializations are apparent, including differences in cell-type proportions, gene expression, DNA methylation and chromatin state. Few cell-type marker genes are conserved across species, revealing a short list of candidate genes and regulatory mechanisms that are responsible for conserved features of homologous cell types, such as the GABAergic chandelier cells. This consensus transcriptomic classification allows us to use patch-seq (a combination of whole-cell patch-clamp recordings, RNA sequencing and morphological characterization) to identify corticospinal Betz cells from layer 5 in non-human primates and humans, and to characterize their highly specialized physiology and anatomy. These findings highlight the robust molecular underpinnings of cell-type diversity in M1 across mammals, and point to the genes and regulatory pathways responsible for the functional identity of cell types and their species-specific adaptations.

DSpace@MIT

Cold Spring Harbor Laboratory Institutional Repository

eScholarship - University of California