
    Managing RDF Graphs using Mapreduce Algorithm with Indexing Solution for Future Direction

    Big RDF (Resource Description Framework) graphs, which populate the semantic web, are the core data structure of big web data, the natural transposition of big data onto the web. Indexing data structures improve processing over big RDF graphs and represent a "baseline operation" of successful web big data analytics, which must process, access and manage RDF graphs while dealing with severe temporal complexity. A solution to this problem is offered by MapReduce-based indexing algorithms, which try to exploit the computational power of the MapReduce processing model to build indexes. This paper provides a survey of state-of-the-art MapReduce-based indexing proposals.
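    The abstract above describes indexing RDF triples with MapReduce but gives no concrete scheme. The following is a minimal sketch of the general idea, assuming a map/reduce pipeline simulated in plain Python rather than an actual Hadoop job; the toy triples and the subject/predicate index layout are illustrative assumptions, not the algorithm surveyed in the paper.

```python
from collections import defaultdict

# Toy RDF triples: (subject, predicate, object).
triples = [
    ("alice", "knows",   "bob"),
    ("alice", "worksAt", "acme"),
    ("bob",   "knows",   "carol"),
]

def map_phase(triple):
    """Emit one (key, triple) pair per access pattern (here: by subject and by predicate)."""
    s, p, o = triple
    yield ("S:" + s, triple)   # subject index entry
    yield ("P:" + p, triple)   # predicate index entry

def reduce_phase(key, values):
    """Group all triples that share a key into one posting list."""
    return key, sorted(values)

# Shuffle step: group map output by key, as a MapReduce runtime would.
groups = defaultdict(list)
for t in triples:
    for key, value in map_phase(t):
        groups[key].append(value)

index = dict(reduce_phase(k, v) for k, v in groups.items())
print(index["P:knows"])   # -> [('alice', 'knows', 'bob'), ('bob', 'knows', 'carol')]
```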

    On the analysis of big data indexing execution strategies

    Efficient response to search queries is crucial for data analysts to obtain timely results from big data spanned over heterogeneous machines. A number of big-data processing frameworks are currently available in which search operations are performed in a distributed and parallel manner; implementing an indexing mechanism on top of them yields a noticeable reduction in overall query processing time. There is therefore a need to assess the feasibility and impact of indexing on query execution performance. This paper investigates the performance of state-of-the-art clustered indexing approaches over the Hadoop framework, the de facto standard for big data processing. Moreover, this study presents a comparative analysis of non-clustered indexing overhead in terms of the time and space taken by the indexing process for data sets of varying volume and increasing Index Hit Ratio. Furthermore, the experiments evaluate the performance of search operations in terms of data access and retrieval time for queries that use indexes. We then validated the obtained results using Petri net mathematical modeling. We used multiple data sets in our experiments to show the impact of growing data volume on indexing and on data search and retrieval performance. The results and the highlighted challenges guide researchers towards better application of indexing mechanisms for data retrieval from big data. Additionally, this study advocates selecting a non-clustered indexing solution to obtain optimized search performance over big data.
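    As a rough illustration of the trade-off discussed above, the sketch below builds a non-clustered index over a synthetic, unordered data set, compares an indexed lookup against a full scan, and computes an index hit ratio taken here to mean the fraction of queried keys the index can serve (an assumption; the paper defines the metric in its own terms).

```python
import random
import time

# Synthetic, unordered data set of (key, payload) records.
records = [(i, f"payload-{i}") for i in range(500_000)]
random.shuffle(records)

# Non-clustered index: key -> position in the (still unsorted) record list.
index = {key: pos for pos, (key, _) in enumerate(records)}

def full_scan(key):
    """Linear scan over the whole data set."""
    return next(rec for rec in records if rec[0] == key)

def indexed_lookup(key):
    """Jump straight to the record via the index."""
    return records[index[key]]

# Hit ratio: fraction of queried keys that are present in the index.
queries = [random.randrange(1_000_000) for _ in range(1_000)]
hit_ratio = sum(q in index for q in queries) / len(queries)

start = time.perf_counter(); full_scan(400_000);      scan_time  = time.perf_counter() - start
start = time.perf_counter(); indexed_lookup(400_000); index_time = time.perf_counter() - start
print(f"hit ratio={hit_ratio:.2f}  full scan={scan_time:.4f}s  indexed={index_time:.6f}s")
```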

    A survey on big data indexing strategies

    The operations of the Internet have led to a significant growth and accumulation of data known as Big Data. Individuals and organizations that utilize this data had not anticipated, nor were they prepared for, this data explosion. Hence, the available solutions cannot meet the processing needs of the growing heterogeneous data, which results in inefficient information retrieval and poor search query results. The design of indexing strategies that can support this need is required. A survey of the various indexing strategies and how they are utilized for solving Big Data management issues can serve as a guide for choosing the strategy best suited to a problem, and can also serve as a base for the design of more efficient indexing strategies. The aim of this study is to explore the characteristics of the indexing strategies used for Big Data manageability by covering some of the weaknesses and strengths of the B-tree, the R-tree and others. This paper covers some popular indexing strategies used for Big Data management and exposes the potential of each by carefully exploring their properties in ways that relate to problem solving.
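    To make the strengths-and-weaknesses discussion concrete, the sketch below uses Python's bisect module over a sorted key list as a stand-in for the leaf level of a B-tree, showing why such structures answer one-dimensional range queries in logarithmic time; the keys are made up, and an R-tree would instead be the natural choice once the indexed items become multidimensional bounding regions.

```python
import bisect

# Keys kept in sorted order, as the leaf level of a B-tree would maintain them.
keys = sorted([17, 3, 42, 8, 23, 15, 4, 36, 29, 11])

def range_query(low, high):
    """Return all keys in [low, high] in O(log n + k), mimicking a B-tree range scan."""
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return keys[lo:hi]

print(range_query(10, 30))   # -> [11, 15, 17, 23, 29]
```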

    Indexing Metric Spaces for Exact Similarity Search

    With the continued digitalization of societal processes, we are seeing an explosion in available data, referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenge when attempting to enable value creation from big data: volume, velocity and variety. Many studies address volume or velocity, while far fewer concern variety. The metric space is well suited to addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all existing metric indexes that support exact similarity search by i) summarizing the partitioning, pruning and validation techniques used in metric indexes, ii) providing time and storage complexity analyses of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Empirical comparison is used to evaluate index performance during search because differences are hard to see from complexity analysis alone and query performance depends on pruning and validation abilities that are tied to the data distribution. This article aims to reveal the strengths and weaknesses of the different indexing techniques in order to offer guidance on selecting an appropriate technique for a given setting and to direct future research on metric indexes.
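    The pruning idea at the heart of exact metric indexing can be shown in a few lines. The sketch below is a minimal pivot-table filter, assuming synthetic 2-D points and Euclidean distance purely for illustration: distances to a handful of pivots are precomputed at indexing time, and the triangle inequality discards objects that cannot possibly fall within the query radius before any exact distance is computed.

```python
import math
import random

def dist(a, b):
    """Euclidean distance; any metric satisfying the triangle inequality would do."""
    return math.dist(a, b)

random.seed(0)
data = [(random.random(), random.random()) for _ in range(10_000)]
pivots = data[:4]                                # a few reference objects
# Precompute each object's distance to every pivot (done once, at indexing time).
pivot_table = [[dist(o, p) for p in pivots] for o in data]

def range_search(q, r):
    """Exact range query: the triangle inequality prunes objects that cannot be within r of q."""
    q_to_p = [dist(q, p) for p in pivots]
    result = []
    for o, o_to_p in zip(data, pivot_table):
        # If |d(q,p) - d(o,p)| > r for some pivot p, then d(q,o) > r, so o is safely skipped.
        if any(abs(qp - op) > r for qp, op in zip(q_to_p, o_to_p)):
            continue
        if dist(q, o) <= r:                      # verify the survivors exactly
            result.append(o)
    return result

print(len(range_search((0.5, 0.5), 0.05)))
```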

    The OTree: multidimensional indexing with efficient data sampling for HPC

    Spatial big data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, many authors have pointed out that the lack of specialized frameworks for multidimensional Big Data is limiting possible applications and precluding many scientific breakthroughs. Paramount to achieving High-Performance Data Analytics is optimizing and reducing the I/O operations required to analyze large data sets. To do so, we need to organize and index the data according to its multidimensional attributes. At the same time, to enable fast and interactive exploratory analysis, it is vital to generate approximate representations of large datasets efficiently. In this paper, we propose the Outlook Tree (or OTree), a novel Multidimensional Indexing with efficient data Sampling (MIS) algorithm. The OTree enables exploratory analysis of large multidimensional datasets with arbitrary precision, a vital feature missing in current distributed data management solutions. Our algorithm reduces the indexing overhead and achieves high performance even for write-intensive HPC applications. Indeed, we use the OTree to store the scientific results of a study on the efficiency of drug inhalers. We then compare the OTree implementation on Apache Cassandra, named Qbeast, with PostgreSQL and plain storage, and demonstrate that our proposal delivers better performance and scalability.
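    The OTree itself is described in the paper; the sketch below is only a generic illustration of the combination the abstract highlights, namely a multidimensional index that also keeps a small per-node sample so exploratory queries can be answered approximately without touching all the data. The grid layout, sample size and reservoir-style replacement are assumptions, not the OTree design.

```python
import random
from collections import defaultdict

random.seed(1)
points = [(random.random(), random.random()) for _ in range(100_000)]

GRID = 8                      # 8x8 cells over the unit square (a stand-in for tree nodes)
SAMPLE_PER_CELL = 32

cells = defaultdict(list)     # full data, indexed by cell
samples = defaultdict(list)   # small per-cell sample for approximate answers

for p in points:
    cell = (int(p[0] * GRID), int(p[1] * GRID))
    cells[cell].append(p)
    s = samples[cell]
    if len(s) < SAMPLE_PER_CELL:
        s.append(p)
    elif random.random() < SAMPLE_PER_CELL / len(cells[cell]):   # reservoir-style replacement
        s[random.randrange(SAMPLE_PER_CELL)] = p

def approx_count(x_lo, x_hi, y_lo, y_hi):
    """Estimate how many points fall in a box using only the per-cell samples."""
    estimate = 0.0
    for cell, s in samples.items():
        inside = sum(x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in s)
        estimate += inside / len(s) * len(cells[cell])
    return estimate

print(round(approx_count(0.0, 0.5, 0.0, 0.5)))   # roughly 25,000 for uniform data
```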

    Document Indexing Strategies in Big Data: A Survey

    Over the past few years, the operations of the Internet have grown significantly, and individuals and organizations were unprepared for the resulting data explosion. Because of the increasing quantity and diversity of digital documents available to end users, mechanisms for their effective and efficient retrieval are of the highest importance. One crucial aspect of such a mechanism is indexing, which allows documents to be located quickly. The problem is that users want to retrieve documents on the basis of context, and individual words provide unreliable evidence about the contextual topic or meaning of a document. Hence, the available solutions cannot meet the processing needs of the growing heterogeneous data, which results in inefficient information retrieval and poor search query results. The design of indexing strategies that can support this need is required. Various indexing strategies are utilized for solving Big Data management issues and can also serve as a base for the design of more efficient indexing strategies. The aim here is to explore document indexing strategies for Big Data manageability. Existing approaches such as Latent Semantic Indexing, Inverted Indexing, Semantic Indexing and the Vector Space Model have their own challenges: high computational demands, large memory consumption, long data processing times, a limited search space, inexact answers, wrong answers due to synonymy and polysemy, or reliance on a formal ontology. This paper describes and compares the various indexing techniques and presents the characteristics and challenges involved.
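    A minimal inverted index makes the challenges listed above easy to see: retrieval is driven by exact term matches, so synonyms and morphological variants are missed and polysemous words match regardless of meaning. The documents and queries below are made up for illustration.

```python
from collections import defaultdict

docs = {
    1: "big data indexing improves query performance",
    2: "inverted index maps each term to the documents containing it",
    3: "semantic indexing tries to capture meaning rather than exact words",
}

# Build the inverted index: term -> set of document ids (postings).
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def search(query):
    """Conjunctive query: documents containing every query term, matched literally."""
    postings = [inverted.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(search("indexing"))        # -> {1, 3}
print(search("index meaning"))   # -> set(): 'index' != 'indexing', no synonym handling
```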

    SmallClient for big data: an indexing framework towards fast data retrieval

    Numerous applications are continuously generating massive amounts of data, and it has become critical to extract useful information from them while maintaining acceptable computing performance. The objective of this work is to design an indexing framework that minimizes indexing overhead and improves query execution and data search performance with an optimum aggregation of computing performance. We propose SmallClient, an indexing framework to speed up query execution. SmallClient has three modules: block creation, index creation and query execution. The block creation module improves data retrieval performance with minimal data uploading overhead. The index creation module allows a maximum number of indexes on a dataset to increase the index hit ratio with minimized indexing overhead. Finally, the query execution module lets incoming queries utilize these indexes. The evaluation shows that SmallClient outperforms a Hadoop full scan with a search-performance improvement of more than 90%. Meanwhile, the indexing overhead of SmallClient is reduced to approximately 50% and 80% for index size and indexing time, respectively.
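    The three-module structure described above can be sketched as follows. All class and method names are hypothetical stand-ins, not SmallClient's actual API: blocks are formed at upload time, per-value indexes map to block identifiers, and queries read only the blocks the index points to instead of scanning the whole data set.

```python
class BlockCreation:
    """Split an incoming dataset into fixed-size blocks at upload time."""
    def __init__(self, block_size=4):
        self.block_size = block_size

    def create_blocks(self, records):
        return [records[i:i + self.block_size]
                for i in range(0, len(records), self.block_size)]

class IndexCreation:
    """Build an index mapping an attribute value to the blocks that contain it."""
    def build(self, blocks, key_fn):
        index = {}
        for block_id, block in enumerate(blocks):
            for record in block:
                index.setdefault(key_fn(record), set()).add(block_id)
        return index

class QueryExecution:
    """Use the index to touch only the relevant blocks instead of scanning everything."""
    def run(self, blocks, index, key, key_fn):
        return [r for b in index.get(key, set()) for r in blocks[b] if key_fn(r) == key]

# Toy usage: index a small record set by city and answer a point query.
records = [{"id": i, "city": c} for i, c in enumerate(["oslo", "lima", "oslo", "cairo"] * 3)]
blocks = BlockCreation().create_blocks(records)
city_index = IndexCreation().build(blocks, key_fn=lambda r: r["city"])
print(QueryExecution().run(blocks, city_index, "lima", key_fn=lambda r: r["city"]))
```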