Indexing collections of XML documents with arbitrary links
In recent years, the popularity of XML has increased significantly. XML is the Extensible Markup Language of the World Wide Web Consortium (W3C). XML is used to represent data in many areas, such as traditional database management systems, e-business environments, and the World Wide Web. XML data, unlike relational and object-oriented data, has no fixed schema that is known in advance and stored separately from the data. XML data is self-describing and can model heterogeneity more naturally than relational or object-oriented data models. Moreover, XML data usually contains XLinks or XPointers to data in other documents (e.g., global links). In addition to XLink or XPointer links, the XML standard allows internal links to be added between different elements in the same XML document using ID/IDREF attributes. The rise in popularity of XML has generated much interest in query processing over graph-structured data. To facilitate efficient evaluation of path expressions, structured indexes have been proposed. However, most variants of structured indexes ignore global or internal document references. They assume a tree-like structure of XML documents that contain no such global or internal links. Extending these indexes to work with large XML graphs that contain global or internal document links firstly requires a lot of computing power for the creation process. Secondly, it also requires a great deal of space in which to store the indexes. As the latter demonstrates, the efficient evaluation of ancestor-descendant queries over arbitrary graphs with long paths is indeed a complex issue. This thesis proposes the HID index (2-Hop cover path Index based on DAG), which is based on the concept of a two-hop cover for a directed graph. The algorithms proposed for creating the HID index in effect scale down the original graph size substantially. As a result, a directed acyclic graph (DAG) with a smaller number of nodes and edges emerges.
This reduces the number of computing steps required for building the index. In addition, computing time and space are reduced as well. The index also permits efficient evaluation of ancestor-descendant relationships. Moreover, the proposed index has an advantage over other comparable indexes: it is optimized for descendant-or-self queries on arbitrary graphs with link relationships, a task that would stress any index structure. Our experiments with real-life XML data show that the HID index provides better performance than other indexes.
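The two-hop cover idea mentioned in this abstract can be illustrated with a small sketch. Each node u keeps a set L_out(u) of "hop" nodes it can reach and a set L_in(u) of hop nodes that can reach it; u reaches v exactly when the two sets share a node. The code below is not the HID construction from the thesis. It assumes the graph is already acyclic and builds the labels with a simple pruned breadth-first search; the function names and the toy graph are hypothetical.

```python
from collections import deque


def build_two_hop_labels(nodes, edges):
    """Build (l_out, l_in) label sets for a DAG using pruned BFS from every node."""
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
    l_out = {n: set() for n in nodes}
    l_in = {n: set() for n in nodes}

    def covered(u, v):
        # True if the labels built so far already prove that u reaches v.
        return not l_out[u].isdisjoint(l_in[v])

    def pruned_bfs(center, neighbours, forward):
        seen = {center}
        queue = deque([center])
        while queue:
            u = queue.popleft()
            known = covered(center, u) if forward else covered(u, center)
            if known:
                continue  # an earlier hop node already covers this pair: prune
            (l_in if forward else l_out)[u].add(center)
            for w in neighbours[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)

    for center in nodes:
        pruned_bfs(center, succ, True)   # center becomes an in-hop of its descendants
        pruned_bfs(center, pred, False)  # center becomes an out-hop of its ancestors
    return l_out, l_in


def reaches(l_out, l_in, u, v):
    """Descendant-or-self test: u reaches v iff the label sets intersect."""
    return not l_out[u].isdisjoint(l_in[v])


# Toy DAG (hypothetical data): a diamond a->b->c, a->d->c, then c->e and f->e.
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e"), ("f", "e")]
L_OUT, L_IN = build_two_hop_labels(NODES, EDGES)
```

A query such as reaches(L_OUT, L_IN, "a", "e") then needs only one set intersection instead of a graph traversal, which is the property that makes two-hop labels attractive for ancestor-descendant queries.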
Sentiment Analysis on Twitter's Big Data Against the Covid-19 Pandemic Using Machine Learning Algorithms
This paper analyzes users' reactions on Twitter to the COVID-19 pandemic, using machine learning and data mining algorithms to classify tweets according to economic and health fears. A large dataset of tweets is explored, extracted, transformed, loaded, cleansed, and analyzed. The proposed framework improves prediction quality with a proposed dictionary that is used to classify tweets. The study compares four supervised machine learning algorithms and finds that people discuss the pandemic's dangers from economic and health perspectives with equal frequency. The Naive Bayes algorithm achieves the highest percentage of correct predictions.
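As a rough illustration of dictionary-based tweet labelling, the sketch below counts how many tokens of a tweet fall into an "economic" or a "health" term list and returns the category with more hits. The paper's actual dictionary and labelling rules are not given in the abstract, so the term lists, the "mixed" and "other" labels, and the function name are all assumptions made for this example.

```python
import re

# Hypothetical term lists; the paper's real dictionary is not reproduced here.
FEAR_TERMS = {
    "economic": {"job", "jobs", "unemployment", "salary", "recession", "rent"},
    "health": {"virus", "infection", "hospital", "symptoms", "vaccine", "death"},
}


def label_tweet(text):
    """Return 'economic', 'health', 'mixed' (tie) or 'other' (no dictionary hit)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {
        label: sum(token in terms for token in tokens)
        for label, terms in FEAR_TERMS.items()
    }
    top = max(scores.values())
    if top == 0:
        return "other"
    winners = [label for label, score in scores.items() if score == top]
    return winners[0] if len(winners) == 1 else "mixed"
```

Labels produced this way could serve as training targets for a supervised classifier such as Naive Bayes, which is the kind of comparison the abstract describes.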
IBRI-CASONTO: Ontology-based semantic search engine
The vast amount of information that is added at a very fast pace to data repositories creates a challenge in extracting correct and accurate information. This has increased competition among developers to gain access to technology that seeks to understand the researcher's intent and the contextual meaning of terms. Arabic semantic search systems, however, are still in their infancy, which could be traced back to the complexity of the Arabic language. It has complex morphological, grammatical, and semantic aspects, as it is a highly inflectional and derivational language. In this paper, we present an ontological search engine called IBRI-CASONTO for the Colleges of Applied Sciences, Oman. Our proposed engine supports both Arabic and English. It also employs two types of search: keyword-based search and semantics-based search. IBRI-CASONTO is based on different technologies such as Resource Description Framework (RDF) data and an ontological graph. The experiments are presented in two sections: the first compares Entity-Search and Classical-Search inside IBRI-CASONTO itself, and the second compares the Entity-Search of IBRI-CASONTO with currently used search engines, such as Kngine, Wolfram Alpha, and Google, the most popular engine nowadays, in order to measure their performance and efficiency.
Efficient evaluation of reachability query for directed acyclic XML graph based on a prime number labelling schema
Many schema labelling approaches have been designed to facilitate querying of XML documents. The proposed algorithms are based on the fact that ancestor–descendant relationships among nodes can be quickly determined. Schema labelling is a family of technologies widely used in indexing trees, graphs, or structured XML graphs, in which a unique identifier is assigned to each node in the tree or graph. The generated identifier is then used in indexing as a reference to the actual node so that structural relationships among the nodes can be quickly captured. In this paper, we extend the prime number schema labelling algorithm to label DAG XML graphs. Our main contribution is scaling down the original XML graph size substantially based on Strongly Connected Component (SCC) principles. Each node in the DAG is labelled with an integer that is the product of the prime number associated with the node and its parent label. The schema does not depend on a spanning tree. Thus, subsumption hierarchies represented in a DAG can be efficiently explored by checking divisibility among the labels. The schema also inherits dynamic update ability and compact size from its predecessors. Our theoretical analysis and experimental results show that the generated labelling schema is efficient and scalable for processing reachability queries on large XML graphs.
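The divisibility test described in this abstract can be sketched as follows. Each node receives its own prime, and its label is that prime multiplied by the combined labels of its parents, so a node u reaches a node v exactly when the label of v is divisible by the label of u. This is a minimal sketch under stated assumptions rather than the paper's algorithm: combining several parent labels with a least common multiple is a choice made here to keep labels small, the input is assumed to be acyclic already (for example after SCC condensation), and all names are hypothetical.

```python
from collections import deque
from math import gcd


def primes():
    """Yield 2, 3, 5, ... by trial division against the primes found so far."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def prime_labels(nodes, edges):
    """Label every node of a DAG, visiting nodes in topological order."""
    parents = {n: [] for n in nodes}
    children = {n: [] for n in nodes}
    for a, b in edges:
        children[a].append(b)
        parents[b].append(a)
    indegree = {n: len(parents[n]) for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    next_prime = primes()
    label = {}
    while queue:
        n = queue.popleft()
        combined = 1
        for p in parents[n]:
            # Least common multiple of the parent labels (assumption, see lead-in).
            combined = combined * label[p] // gcd(combined, label[p])
        label[n] = next(next_prime) * combined
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(label) != len(nodes):
        raise ValueError("graph contains a cycle; condense its SCCs first")
    return label


def reaches(label, u, v):
    """u is an ancestor-or-self of v iff label[u] divides label[v]."""
    return label[v] % label[u] == 0


# Toy DAG (hypothetical data): a diamond a->b->c, a->d->c, then c->e and f->e.
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e"), ("f", "e")]
LABELS = prime_labels(NODES, EDGES)
```

Because every label is the product of the primes of a node's ancestors and its own prime, no spanning tree is needed; the trade-off is that labels grow quickly on deep graphs, which is one reason reducing the graph first matters.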
Dynamic Candidate Solution Boosted Beluga Whale Optimization Algorithm for Biomedical Classification
In many fields, complicated issues can now be solved with the help of Artificial Intelligence (AI) and Machine Learning (ML). One of the more modern Metaheuristic (MH) algorithms used to tackle numerous issues in various fields is the Beluga Whale Optimization (BWO) method. However, BWO lacks diversity, which can lead to entrapment in local optima and premature convergence. This study presents two stages for enhancing the fundamental BWO algorithm. The first stage, which adds Opposition-Based Learning (OBL) to BWO and is referred to as OBWO, helps to expedite the search process and enhance the learning methodology to choose a better generation of candidate solutions for the fundamental BWO. The second stage, referred to as OBWOD, combines the Dynamic Candidate Solution (DCS) and OBWO based on the k-Nearest Neighbor (kNN) classifier to boost variety and improve the consistency of the selected solution by giving potential candidates a chance to solve the given problem with a high fitness value. A comparison study with present optimization algorithms for single-objective bound-constraint optimization problems was conducted to evaluate the performance of the OBWOD algorithm on problems from the 2022 IEEE Congress on Evolutionary Computation (CEC’22) benchmark test suite with a range of dimension sizes. The results of the statistical significance test confirmed that the proposed algorithm is competitive with the compared optimization algorithms. In addition, the OBWOD algorithm surpassed the performance of seven other algorithms with an overall classification accuracy of 85.17% for classifying 10 medical datasets with different dimension sizes according to the performance evaluation matrix.
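The general idea behind Opposition-Based Learning can be sketched in a few lines. For a candidate x inside the bounds [lower, upper], the opposite point is lower + upper - x in every dimension; evaluating both a candidate and its opposite and keeping the fitter ones is a common way to widen the search. The sketch below shows only this generic selection step for a minimization problem. It is not the OBWO or OBWOD procedure from the paper, and the function names and the sphere test function are assumptions chosen for illustration.

```python
def opposite(x, lower, upper):
    """Opposite point of x inside the box [lower, upper], per dimension."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def obl_select(population, lower, upper, fitness):
    """Keep the best len(population) candidates among originals and opposites."""
    candidates = population + [opposite(x, lower, upper) for x in population]
    return sorted(candidates, key=fitness)[: len(population)]


def sphere(x):
    """Simple minimization benchmark: sum of squares (illustrative only)."""
    return sum(v * v for v in x)
```

For example, with bounds [0, 0] and [10, 10], the candidate [9, 8] has the opposite [1, 2], which scores much better on the sphere function and would replace it in the next generation.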
A Schematic Analysis on Selective-RDF Database Stores
RDF has gained great interest in both academia and industry as an important language to describe graph data. With the increasing amount of RDF data becoming available, efficient and scalable management of that data has become a challenge in achieving the semantic web vision. The RDF model has attracted the attention of the database community and researchers, who have proposed various methods to store and query RDF data efficiently. However, current RDF databases suffer from several problems, such as poor performance when querying RDF data. This paper provides a comparative analysis of selected RDF database stores. It provides a precise study of the various means of persistently storing and accessing RDF graphs. Recently there has been major development in query processing, access protocols, and triple-store technologies. The evaluation covers Sesame, a non-memory, non-native store; AllegroGraph, a native store; and the Jena API, a main-memory-based RDF storage system specifically designed to support fast semantic association discovery. The frameworks and applications with the ability to store and query RDF data are analyzed and investigated. Moreover, this paper gives an overview of the features of techniques for storing RDF data; the main purpose of the study is to find a suitable storage system for RDF data.