7 research outputs found

    SenseCluster for exploring large data repositories

    Exploring and making sense of large data repositories has become a daunting task. This is especially the case for end users, who often have limited access to the data because the retrieval process is complex and IT support for developing custom queries and reports is scarce. Consequently, traditional interfaces no longer meet these requirements, and novel interfaces are needed to fully support the sense-making process. In this paper, we follow a design science approach and introduce a query clustering system (SenseCluster) that can serve as a quick exploration tool for making better sense of large data repositories. We also present an evaluation of the effectiveness of our artifact using cognitive walkthroughs.

    Join query enhancement processing (JQPro) with big RDF data on a distributed system using hashing-merge join technique

    Semantic web technologies have emerged across many fields of study in the last few years, and their data are still growing rapidly. In particular, increased capabilities for storing and publishing data in standard open web formats have made the technology much more successful, so that the data are both human-readable and machine-processable. As the number of RDF triples increases, the demand for complex multiple RDF queries is becoming significant. Such complex queries occasionally produce many common subexpressions. It is therefore extremely challenging to reduce the number of RDF queries and the transmission time for a vast amount of related RDF data. Moreover, recent literature shows that join query processing of Big RDF data has introduced many problems with respect to execution time and throughput. Hash-based encoding yields low execution time but takes a long time to load and hence does not load all graphs. This is because the Resource Description Framework (RDF) collects and analyses large data in swarms, and thereby has to deal with the inherent challenge of efficient swarm storage. Effective storage and retrieval that can be applied to large amounts of possibly schema-less data has also proven exceedingly difficult for RDF data storage. For instance, it is particularly difficult to handle semantic and SPARQL query languages, as well as huge and complex graph patterns. To address this problem, a Join Query Processing Model (JQPro) is introduced for Big RDF data. The objectives of this research are to: (i) formulate plan generator algorithms for join query processing on the basis of previous research; (ii) develop an enhanced model of Join Query Processing (JQPro) based on SPARQL and Hadoop MapReduce, using the hashing-merge join technique to process Big RDF data; and (iii) evaluate and compare the performance of the JQPro model with existing models in terms of execution time, throughput, and CPU utilization.
Throughput was used to measure the units of information that the system can process in a given time frame. In addition, CPU utilization was measured as an important resource element in big join query processing, particularly during the map and reduce phases. Furthermore, the hash-join and sort-merge algorithms were used for join query processing because of their capacity to allow more datasets to be joined. Both relations were sorted on the join attributes and the sorted relations were merged, so that tuples with the same value in the join column are grouped together. The sort-merge join algorithm sorts the datasets on the joining attribute and then finds matching tuples by merging the two datasets. Then, a processing framework for RDF queries was introduced and benchmarks were used for performance evaluation. Finally, standard statistical analysis was used to validate and compare the performance of the JQPro model with current models. The synthetic benchmarks Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) v06 were used for measurement. The experiment was carried out on three datasets ranging from 10 million to 1 billion RDF triples, produced by the WatDiv data generator with scale factors of 10, 100 and 1000, respectively. A selective dataset for each experimental query was also used for RDF processing with the LUBM benchmark at sizes of 500, 1000 and 2000 million triples. The results revealed a strong correlation between execution time and throughput, with a strength of 99.9% as confirmed by the Pearson correlation coefficient. Furthermore, the findings show that, compared with gStore, RDF-3X, RDFox and PARJ, the JQPro solution improved performance by 87.77% in terms of execution time.
CPU utilization was significantly increased by the extensive map and reduce code computation. It is therefore inferred that the JQPro solution is timely and innovative, as it provides efficient execution time and CPU utilization, allowing users to perform better queries for Big RDF data processing in a seamless manner.
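
To make the sort-merge join step mentioned in this abstract concrete, here is a minimal single-machine sketch in Python. It is not the JQPro implementation: the function name `sort_merge_join`, the tuple layout `(subject, predicate, object)`, and the sample triples are all assumptions chosen for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # assumed layout: (subject, predicate, object)


def sort_merge_join(
    left: Sequence[Triple],
    right: Sequence[Triple],
    key_left: Callable[[Triple], Any],
    key_right: Callable[[Triple], Any],
) -> List[Tuple[Triple, Triple]]:
    """Join two triple lists on a key by sorting both, then merging."""
    # Step 1: sort both inputs on the joining attribute.
    ls = sorted(left, key=key_left)
    rs = sorted(right, key=key_right)

    joined: List[Tuple[Triple, Triple]] = []
    i = j = 0
    # Step 2: advance two cursors through the sorted inputs.
    while i < len(ls) and j < len(rs):
        kl, kr = key_left(ls[i]), key_right(rs[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(ls) and key_left(ls[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(rs) and key_right(rs[j_end]) == kl:
                j_end += 1
            # Emit every pairing within the matching runs.
            for l_row in ls[i:i_end]:
                for r_row in rs[j:j_end]:
                    joined.append((l_row, r_row))
            i, j = i_end, j_end
    return joined


# Hypothetical data for the pattern: ?x advisor ?y . ?y teacherOf ?c
advisor = [("alice", "advisor", "bob"), ("carol", "advisor", "dave")]
teaches = [
    ("bob", "teacherOf", "db101"),
    ("bob", "teacherOf", "ml201"),
    ("erin", "teacherOf", "os301"),
]
result = sort_merge_join(
    advisor, teaches, key_left=lambda t: t[2], key_right=lambda t: t[0]
)
```

The join key here is the shared variable `?y`, which is the object of the first pattern and the subject of the second. A distributed version would partition both inputs by that key before sorting each partition.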

    JQPro: Join query processing in a distributed system for big RDF data using the hash-merge join technique

    In the last decade, the volume of semantic data has increased exponentially, with Resource Description Framework (RDF) repositories now holding trillions of triples, and RDF datasets continue to grow. As the number of RDF triples increases, complex multiple RDF queries are in significant demand. Such complex queries sometimes produce many common sub-expressions, either within a single query or across multiple queries running as a batch. It is also difficult to minimize the number of RDF queries and the processing time for a large amount of related data in a typical distributed environment. To address this complication, we introduce a join query processing model for big RDF data, called JQPro. By adopting a MapReduce framework in JQPro, we developed three new algorithms for join query processing of RDF data: hash-join, sort-merge, and enhanced MapReduce-join. In our experiments, the JQPro model outperformed two popular systems, gStore and RDF-3X, with respect to average execution time. The JQPro model was also tested against RDF-3X, RDFox, and PARJ using the LUBM benchmark, and it performed better than the other models. In conclusion, JQPro achieved an 87.77% improvement in execution time over the selected models.
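
As an illustration of how a hash-based join fits the MapReduce pattern named in this abstract, the following Python sketch simulates the map, shuffle, and reduce phases in a single process. It is not the JQPro algorithm and does not use the Hadoop API. The function names, the `"L"`/`"R"` tags, and the sample triples are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain, product
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # assumed layout: (subject, predicate, object)
Tagged = Tuple[str, Triple]    # ("L" or "R", triple)


def map_phase(
    triples: Iterable[Triple], tag: str, key_index: int
) -> Iterator[Tuple[str, Tagged]]:
    """Emit (join_key, (tag, triple)) so the reducer knows each row's side."""
    for t in triples:
        yield t[key_index], (tag, t)


def shuffle(pairs: Iterable[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    """Group values by key; the dict plays the role of hash partitioning."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[Tagged]]) -> List[Tuple[Triple, Triple]]:
    """For each key, pair every left-side triple with every right-side triple."""
    joined: List[Tuple[Triple, Triple]] = []
    for values in groups.values():
        lefts = [t for tag, t in values if tag == "L"]
        rights = [t for tag, t in values if tag == "R"]
        joined.extend(product(lefts, rights))
    return joined


def mapreduce_hash_join(
    left: Iterable[Triple], right: Iterable[Triple], left_key: int, right_key: int
) -> List[Tuple[Triple, Triple]]:
    pairs = chain(map_phase(left, "L", left_key), map_phase(right, "R", right_key))
    return reduce_phase(shuffle(pairs))


# Hypothetical data for the pattern: ?x advisor ?y . ?y teacherOf ?c
advisor = [("alice", "advisor", "bob"), ("carol", "advisor", "dave")]
teaches = [("bob", "teacherOf", "db101"), ("erin", "teacherOf", "os301")]
joined = mapreduce_hash_join(advisor, teaches, left_key=2, right_key=0)
```

In a real cluster the shuffle would route each key to a reducer by hashing it, so all triples sharing a join value meet on the same node. Keys that appear on only one side (here `dave` and `erin`) produce no output.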

    Comparative study of the usability of three query formulation interfaces for a relational database

    Using relational databases is a complex activity for the vast majority of users, and users at a financial technology company such as Croesus Finansoft are no exception. In recent years, many researchers and developers have tried to reduce the complexity and improve the usability of user interfaces for querying relational databases. The goal of this research project is to develop a user interface solution that lets users of portfolio management software create database queries more easily than with the current interfaces. The research was carried out in collaboration with Croesus Lab., which works in the financial technology field and offers software products to large Canadian financial institutions. It includes a study of the problems users encounter with the interfaces of a portfolio management software product. These comprise a simplified search criteria interface, based on a filter approach combined with a semi-natural language approach in the form of rigid structured sentences, and an advanced search criteria interface, based on a filter approach in a flexible form presented hierarchically. The research also led to the design of mock-ups and then a user interface prototype based on a flexible form approach to query formulation by diagram. Three cycles of usability tests were carried out during the design process to test the mock-ups and the prototype and to evaluate the results in terms of performance and user satisfaction. In each test cycle, the researcher carefully observed the users' behaviour and took notes, and conducted semi-structured interviews to collect the users' comments and impressions. This information was used to improve the mock-ups. Ten and 12 subjects took part in the first and second usability test cycles on the mock-ups, respectively. The functional interface prototype was then developed, and a third cycle of usability tests was carried out with six subjects.
Finally, we conducted a comparative experimental study to test three hypotheses (H) stating that the new prototype interface gives better results than the two current interfaces in terms of task execution time (H1), number of errors (H2) and number of requests for assistance (H3) from users. Twelve users already familiar with the two current interfaces of the portfolio management software took part in the tests; each had to perform three query-creation tasks with each of the three interfaces. In general, the comments collected from users about the new interface prototype (flexible form for query formulation by diagram) are positive: the interface is considered more intuitive, it requires fewer operations to formulate the various query conditions, and the user is guided when adding the logical operators AND/OR. The results also show that, even though users had no experience with the new interface, unlike the other two, query formulation time tends to be shorter with the new interface for all tasks; however, only one task out of three shows a statistically significant difference between the new interface and the advanced search criteria interface. Hypothesis H1 is therefore partially confirmed (for one interface out of two and one task out of three).
The new interface also significantly reduced the number of errors compared with the advanced search criteria interface for one of the three tasks, so hypothesis H2 is partially confirmed (for one interface out of two and one task out of three). Finally, there is no significant difference between the two current interfaces and the prototype in the number of requests for assistance, so hypothesis H3 is rejected. In the discussion, we try to explain these results, and in the conclusion we propose some directions for future research.