4 research outputs found

    Technology Selection for Big Data and Analytical Applications

    The term Big Data has become pervasive in recent years, as smartphones, televisions, washing machines, refrigerators, smart meters, diverse sensors, eyeglasses, and even clothes connect to the Internet. However, the data they generate is essentially worthless without appropriate data analytics that draws on information retrieval, statistics, and various other techniques. As Big Data is commonly too big for a single person or institution to investigate, appropriate tools are used that go far beyond a traditional data warehouse and that have been developed in recent years. Unfortunately, there is no single solution but a large variety of different tools, each with distinct functionalities, properties, and characteristics. Small and medium-sized companies in particular have a hard time keeping track, as this requires time, skills, money, and specific knowledge that, in combination, result in high entry barriers to Big Data utilization. This paper aims to reduce these barriers by explaining and structuring different classes of technologies and the basic criteria for proper technology selection. It proposes a framework that guides especially small and mid-sized companies through a suitable selection process and that can serve as a basis for further advances.

    Combining Process Guidance and Industrial Feedback for Successfully Deploying Big Data Projects

    Companies are faced with the challenge of handling increasing amounts of digital data to run or improve their business. Although a large set of technical solutions is available to manage such Big Data, many companies lack the maturity to manage such projects, which results in a high failure rate. This paper aims at providing better process guidance for a successful deployment of Big Data projects. Our approach is based on the combination of a set of methodological bricks documented in the literature, from early data mining projects to the present day. It is complemented by lessons learned from pilots conducted in different areas (IT, health, space, food industry), with a focus on two pilots that give a concrete vision of how to drive the implementation, with emphasis on the identification of values, the definition of a relevant strategy, the use of an Agile follow-up, and a progressive rise in maturity.

    Join query enhancement processing (jqpro) with big rdf data on a distributed system using hashing-merge join technique

    Semantic web technologies have emerged in recent years across different fields of study, and their data are still growing rapidly. Specifically, increased data storage and publishing capabilities in standard open web formats have made the technology much more successful, so the data have become readable by humans and processable by machines. The demand for complex multiple RDF queries is becoming significant as the number of RDF triples increases, and such complex queries occasionally produce many common subexpressions. It is therefore extremely challenging to reduce the number of RDF queries and the transmission time for a vast amount of related RDF data. Moreover, recent literature shows that join query processing of Big RDF data introduces many problems with respect to execution time and throughput. Hash-based encoding reduces execution time but takes a long time to load and hence does not load all graphs. This is because the Resource Description Framework (RDF) collects and analyses large data in swarms and therefore has to deal with the inherent challenge of efficient swarm storage. Effective storage and retrieval that can be applied to large amounts of potentially schema-less data has also proven exceedingly difficult for RDF data storage; for instance, it is particularly difficult to handle semantic and SPARQL query languages as well as huge and complex graph patterns. To address this problem, a Join Query Processing Model (JQPro) is introduced for Big RDF data. The objectives of this research are to: (i) formulate plan generator algorithms for join query processing on the basis of previous research; (ii) develop an enhanced Join Query Processing (JQPro) model based on SPARQL and Hadoop MapReduce, using the hashing-merge join technique to process Big RDF data; and (iii) evaluate and compare the performance of the JQPro model with existing models based on execution time, throughput, and CPU utilization. Throughput was employed to measure the units of information that a system can process in a given time frame, and CPU utilization was used as an important resource measure in big join query processing, particularly during the map and reduce phases. Furthermore, the hash-join and sort-merge algorithms were used to generate the join query processing, chosen for their capacity to allow more datasets to be joined: both relations are sorted on the join attributes and the sorted relations are merged, so the join column groups records with the same value. The sort-merge-join algorithm sorts the datasets on the joining attribute and then searches for matching tuples by merging the two datasets. A processing framework for RDF queries was then introduced, and benchmarks were used for performance evaluation. Finally, standard statistical analysis was conducted to validate and compare the performance of the JQPro model with current models. The synthetic benchmarks Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) v06 were used for measurement. The experiment was carried out on three datasets ranging from 10 million to 1 billion RDF triples produced by the WatDiv data generator with scale factors of 10, 100, and 1000, respectively. A selective dataset for each experimental query was also used for RDF processing with the LUBM benchmark at sizes of 500, 1000, and 2000 million triples.
    The results revealed a strong correlation between execution time and throughput, with a strength of 99.9% as confirmed by the Pearson correlation coefficient. Furthermore, the findings show that the JQPro solution was comparable to gStore, RDF-3X, RDFox, and PARJ, with a performance improvement of 87.77% in terms of execution time. CPU utilization was significantly increased by the extensive map and reduce computation. It is therefore inferred that the JQPro solution is timely and innovative, as it provides efficient execution time and CPU utilization, allowing users to run better queries for Big RDF data processing in a seamless manner.
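    Since the abstract describes the hashing-merge join in terms of sorting both relations on the join attribute and then merging the sorted runs, the sketch below illustrates that general sort-merge join idea in Python. It is a minimal illustration only, not the authors' Hadoop/MapReduce JQPro implementation; the triple values, the choice of the subject as the join key, and the name sort_merge_join are assumptions made for the example.

# Illustrative sketch only (not the authors' Hadoop/MapReduce JQPro code):
# a sort-merge join over two lists of RDF triples, joining on a shared
# attribute, here assumed to be the subject. Triple values are made up.

from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def sort_merge_join(left: List[Triple], right: List[Triple],
                    key_pos: int = 0) -> Iterator[Tuple[Triple, Triple]]:
    """Sort both relations on the join attribute, then merge them,
    emitting every pair of triples whose key values match."""
    left = sorted(left, key=lambda t: t[key_pos])
    right = sorted(right, key=lambda t: t[key_pos])
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_pos], right[j][key_pos]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Gather the run of equal keys on the right, then pair it with
            # every left triple carrying the same key (handles duplicates).
            j_end = j
            while j_end < len(right) and right[j_end][key_pos] == lk:
                j_end += 1
            while i < len(left) and left[i][key_pos] == lk:
                for r in right[j:j_end]:
                    yield left[i], r
                i += 1
            j = j_end

# Hypothetical usage: join 'advisor' triples with 'degreeFrom' triples on the subject.
advisors = [("stu1", "advisor", "prof3"), ("stu2", "advisor", "prof1")]
degrees = [("stu1", "degreeFrom", "univ0"), ("stu2", "degreeFrom", "univ5")]
for pair in sort_merge_join(advisors, degrees):
    print(pair)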

    Factors That Drive the Selection of Business Intelligence Tools in South African Financial Services Providers

    Innovation and technology advancements in information systems (IS) result in multiple product offerings and business intelligence (BI) software tools on the market for implementing business intelligence systems (BIS). As a result, a high proportion of organisations fail to employ appropriate and suitable software tools that meet organisational needs, and a high number of BI solution failures and abandoned projects are therefore recorded. Due to such project failures, the benefits associated with BI are not realised, and organisations lose enormous investments in BI solutions as well as competitive advantage. The study aims at discovering and exploring the critical factors influencing the selection of BI tools when embarking on the selection process. This is a quantitative research study in which questionnaire survey data was collected from 92 participants working in South African financial services providers listed on the Johannesburg Stock Exchange (JSE) and appearing in the top 100 by market capitalization. The data was analysed quantitatively using SPSS and SmartPLS-3 software to test the significance of influential factors against the proposed conceptual model that emerged from the literature. The findings showed that a combination of domain technical and non-technical factors is critical. Software tool technical factors (functionality, ease of use, compatibility, availability of an integrated hardware/software package, and availability of source code), vendor technical factors (availability of technical support, technical skills, quality of product, availability of a user manual for important information, tutorials for learning and troubleshooting guides, and experience in using products developed by the same vendor), and opinion non-technical factors (end-users, subordinates, outside personnel acquaintances, and improvement in customer service) emerged as a significant combination of influential factors to be considered. The study contributes to both academia and industry by providing influential determinants for software tool selection. It is hoped that the findings presented will contribute to a greater understanding of the factors influencing the selection of BI tools among researchers and practitioners alike. Furthermore, organisations seeking to select and deliver appropriate BI tools will be better equipped to drive such endeavours.