Search CORE

5 research outputs found

A keyquery-based classification system for CORE

Authors: Gollub Tim; Hagen Matthias; Stein Benno; Völske Michael
Publication date: 26/04/2017

We apply keyquery-based taxonomy composition to compute a classification system for the CORE dataset, a shared crawl of about 850,000 scientific papers. Keyquery-based taxonomy composition can be understood as a two-phase hierarchical document clustering technique that uses search queries as cluster labels. In the first phase, the document collection is indexed by a reference search engine, and each document is tagged with the search queries for which it is relevant, its so-called keyqueries. In the second phase, a hierarchical clustering is formed from the keyqueries in an iterative process. We use the explicit topic model ESA as the retrieval model for indexing the CORE dataset in the reference search engine. Under the ESA retrieval model, documents are represented as vectors of similarities to Wikipedia articles, an approach that has proven advantageous for text categorization tasks. Our paper presents the generated taxonomy and reports on quantitative properties such as document coverage and processing requirements.
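The first phase (ESA indexing plus keyquery tagging) can be illustrated with a small Python sketch. It assumes a drastically simplified ESA model: three hypothetical concept "articles", raw term counts instead of TF-IDF weights, and cosine similarity throughout. A toy reference search engine then ranks documents by similarity in concept space. The three keyquery conditions used here (the document ranks in the top k, the query returns at least a minimum number of results, and the query contains no shorter query that already qualified) and the parameter values `k=2`, `min_results=2`, `max_len=2` are assumptions of this sketch, not settings from the paper. All names (`esa_vector`, `search`, `keyqueries`) are illustrative, and the iterative second phase is not sketched.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy "Wikipedia" concept articles. Real ESA uses a large set of
# articles with TF-IDF term weights; raw term counts are used here instead.
CONCEPTS = {
    "Web_search_engine": "search engine query index ranking web retrieval",
    "Cluster_analysis": "cluster clustering hierarchical group similarity",
    "Taxonomy": "taxonomy classification hierarchy category label",
}


def bow(text):
    """Bag-of-words term counts of a whitespace-tokenized, lower-cased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


CONCEPT_VECS = {name: bow(text) for name, text in CONCEPTS.items()}


def esa_vector(text):
    """ESA-style representation: similarity of the text to each concept."""
    v = bow(text)
    return {name: cosine(v, cv) for name, cv in CONCEPT_VECS.items()}


def search(query, doc_vecs):
    """Toy reference engine: ids of documents with non-zero similarity to
    the query in concept space, best match first."""
    q = esa_vector(query)
    scored = sorted(((cosine(q, dv), d) for d, dv in doc_vecs.items()), reverse=True)
    return [d for score, d in scored if score > 0]


def keyqueries(doc_id, doc_vecs, vocab, k=2, min_results=2, max_len=2):
    """Queries (term tuples) that rank doc_id among the top k, return at
    least min_results documents, and contain no shorter qualifying query."""
    found = []
    for n in range(1, max_len + 1):
        for q in combinations(sorted(vocab), n):
            if any(set(f) <= set(q) for f in found):
                continue  # minimality: a sub-query is already a keyquery
            results = search(" ".join(q), doc_vecs)
            if len(results) >= min_results and doc_id in results[:k]:
                found.append(q)
    return found


# Hypothetical three-document collection.
DOCS = {
    "d1": "hierarchical clustering of search results",
    "d2": "taxonomy of web search engine queries",
    "d3": "label hierarchy for document classification",
}
DOC_VECS = {d: esa_vector(t) for d, t in DOCS.items()}
VOCAB = {w for t in DOCS.values() for w in t.lower().split()}

for d in DOCS:
    print(d, keyqueries(d, DOC_VECS, VOCAB))
```

Exhaustively enumerating term combinations is only feasible for a toy vocabulary; a collection the size of CORE would need a far more selective way of generating candidate queries.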

Online-Publikationssystem der Bauhaus-Universität Weimar

Digitale Bibliothek Thüringen

A survey on big data indexing strategies

Authors: Abdullahi Ibrahim; Adamu Fatimah; Cottrell R. Les; Habbal Adib M. Monzer; Hassan Suhaidi; White Bebo
Publication date: 01/01/2015

The operations of the Internet have led to a significant growth and accumulation of data known as Big Data. Individuals and organizations that utilize this data had no idea of, and were not prepared for, this data explosion. Hence, the available solutions cannot meet the processing needs of the growing heterogeneous data, which makes information retrieval and the answering of search queries inefficient. Indexing strategies that can support this need are therefore required. A survey of various indexing strategies and of how they are used to solve Big Data management issues can serve as a guide for choosing the strategy best suited to a problem, and as a basis for designing more efficient indexing strategies. The aim of the study is to explore the characteristics of the indexing strategies used in Big Data management by covering the strengths and weaknesses of B-trees and R-trees, among others. This paper covers some popular indexing strategies used for Big Data management and exposes the potential of each by carefully exploring their properties as they relate to problem solving.
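The two structures named in the survey serve different access patterns, which the simplified Python sketch below illustrates. For one-dimensional keys, a sorted list searched with binary search stands in for a balanced B-tree (both support ordered range queries). For two-dimensional data, a single-level, R-tree-like packing groups points into leaves summarized by minimum bounding rectangles (MBRs), so a window query can skip leaves whose MBR misses the window. This is not a full R-tree (there is no node hierarchy, insertion, or splitting), and the class and function names (`SortedKeyIndex`, `FlatRTree`, `mbr`, `intersects`) are hypothetical, not taken from the survey.

```python
from bisect import bisect_left, bisect_right


class SortedKeyIndex:
    """Ordered index over one-dimensional keys. A sorted list searched with
    binary search stands in for a balanced B-tree; both answer range queries
    in logarithmic time plus output size (though list inserts cost O(n))."""

    def __init__(self, records):
        # records: iterable of (key, value) pairs; sort by key only
        self._items = sorted(records, key=lambda r: r[0])
        self._keys = [k for k, _ in self._items]

    def range(self, lo, hi):
        """Values whose key lies in the closed interval [lo, hi]."""
        i = bisect_left(self._keys, lo)
        j = bisect_right(self._keys, hi)
        return [v for _, v in self._items[i:j]]


def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of 2-D points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class FlatRTree:
    """Single-level, R-tree-like index: points are packed into leaves of at
    most leaf_size entries, each summarized by its MBR. A window query scans
    only leaves whose MBR intersects the window."""

    def __init__(self, points, leaf_size=4):
        # Sorting by x (then y) makes each leaf cover a narrow x-strip,
        # a crude stand-in for real R-tree packing or insertion heuristics.
        pts = sorted(points)
        self.leaves = [
            (mbr(pts[i:i + leaf_size]), pts[i:i + leaf_size])
            for i in range(0, len(pts), leaf_size)
        ]

    def query(self, window):
        """Points inside window, plus how many leaves had to be scanned."""
        xmin, ymin, xmax, ymax = window
        hits, scanned = [], 0
        for box, leaf in self.leaves:
            if not intersects(box, window):
                continue  # prune the whole leaf via its bounding rectangle
            scanned += 1
            hits.extend(p for p in leaf if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)
        return hits, scanned


idx = SortedKeyIndex([(2017, "a"), (2015, "b"), (2023, "c"), (2016, "d")])
print(idx.range(2015, 2016))

tree = FlatRTree([(x, y) for x in range(10) for y in range(10)], leaf_size=10)
print(tree.query((2, 2, 3, 3)))
```

In the usage above, the window query touches only 2 of the 10 leaves. Pruning by bounding rectangles is what lets an R-tree skip most of the data for selective window queries; a full R-tree applies the same test recursively over a hierarchy of nodes.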

UUM Repository

TaxoPublish: Towards a solution to automatically personalize taxonomies in e-catalogs

Authors: Angermann Heiko; Ramzan Naeem
Publication venue: 'Elsevier BV'
Publication date: 01/12/2016

Crossref

Research Repository and Portal - University of the West of Scotland

Geographic information extraction from texts

Authors: Hu Xuke; Hu Yingjie; Kersten Jens; Resch Bernd
Publication date: 05/12/2023

A large volume of unstructured text containing valuable geographic information is available online. This information, whether provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop will therefore provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.

Institute of Transport Research:Publications

Dynamic taxonomy composition via keyqueries

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref