    Framework for a semantic data transformation in solving data quality issues in big data

    Purpose - Today, organizations and companies are generating a tremendous amount of data. At the same time, an enormous amount of data is being received and acquired from various sources and stored, which brings us to the era of Big Data (BD). BD is a term used to describe massive datasets of diverse formats, created at very high speed, whose management is nearly impossible using traditional database management systems (Kanchi et al., 2015). With the dawn of BD, Data Quality (DQ) has become imperative. Volume, velocity and variety – the initial 3V characteristics of BD – are usually used to describe its main properties. But to extract value (another V property) and make BD effective and efficient for organizational decision making, the significance of a further V of BD, veracity, is gradually coming to light. Veracity directly denotes inconsistency and DQ issues, and today it is the biggest challenge in data analysis compared with aspects such as volume and velocity. Trusting the acquired data goes a long way in implementing decisions from an automated decision-making system, and veracity helps to validate the data acquired (Agarwal, Ravikumar, & Saha, 2016).

    DQ represents an important issue in every business. To be successful, companies need high-quality data on inventory, supplies, customers, vendors and other vital enterprise information in order to run their data analysis applications efficiently (e.g. decision support systems, data mining, customer relationship management) and produce accurate results (McAfee & Brynjolfsson, 2012). During the transformation of huge volumes of data, data mismatch, miscalculation and/or loss of useful data may occur, leading to an unsuccessful data transformation (Tesfagiorgish & JunYi, 2015) and, in turn, to poor data quality. In addition, external data, particularly RDF data, raise further challenges for data transformation compared with the traditional transformation process. For example, a drawback of using BD in the business analysis process is that the data is almost schemaless, while RDF data carry poor or complex schemas. Traditional data transformation tools cannot process such inconsistent and heterogeneous data because they do not support semantic-aware data, they are entirely schema-dependent, and they do not exploit expressive semantic relationships to integrate data from different sources. Thus, BD requires more powerful tools to transform data semantically.

    While research in this area offers various frameworks, to the best of the researchers' knowledge, little work has addressed the transformation side of DQ in BD, and what exists has not gone beyond generally cleansing incoming data (Merino et al., 2016). The proposed framework presents a method for analysing DQ in BD from various domains, applying semantic technologies in the ETL transformation stage to create a semantic model that enables quality in the data.
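
    A minimal sketch of the kind of semantic transformation step such a framework applies in the ETL transformation stage: tabular records are lifted into RDF triples so that downstream quality rules can reason over a shared vocabulary. The EX namespace, the field names and the completeness check below are illustrative assumptions, not the paper's actual model (Python with rdflib):

        from rdflib import Graph, Literal, Namespace, RDF

        EX = Namespace("http://example.org/dq/")  # hypothetical vocabulary

        def rows_to_semantic_model(rows):
            """Lift raw dict rows into an RDF graph, flagging simple DQ issues."""
            g = Graph()
            g.bind("ex", EX)
            for i, row in enumerate(rows):
                subject = EX[f"record/{i}"]
                g.add((subject, RDF.type, EX.Record))
                for field, value in row.items():
                    if value in (None, ""):  # completeness check: a veracity issue
                        g.add((subject, EX.hasQualityIssue, Literal(f"missing {field}")))
                    else:
                        g.add((subject, EX[field], Literal(value)))
            return g

        sample = [{"name": "Acme", "vendorId": "V-17"}, {"name": "", "vendorId": "V-9"}]
        print(rows_to_semantic_model(sample).serialize(format="turtle"))

    Because the records become triples rather than schema-bound rows, heterogeneous sources can be merged and validated against one vocabulary, which is the property the traditional, schema-dependent tools described above lack.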

    An Online Social Network model through Twitter to build a social perception variable to measure the violence in Mexico

    This paper describes the methodology and the model used on Twitter to create an indicator that allows us to denote social perception of violence, a topic of high impact in Mexico. We investigated and validated the keywords that Mexicans used in relation to this topic, within a specific time lapse defined by the researchers. We implemented two analysis levels, the first relative to the sum of tweets, and the second a rate of total tweets per 100,000 inhabitants.
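
    To make the two analysis levels concrete, here is a minimal sketch under assumed inputs: level one counts keyword-matched tweets, and level two normalizes that count per 100,000 inhabitants. The keyword list, states and population figures are illustrative placeholders, not the study's validated data (Python):

        KEYWORDS = {"violencia", "balacera", "secuestro"}  # assumed example keywords

        def count_matching(tweets):
            """Level 1: sum of tweets containing at least one tracked keyword."""
            return sum(1 for t in tweets if KEYWORDS & set(t.lower().split()))

        def rate_per_100k(matching_count, population):
            """Level 2: matched tweets per 100,000 inhabitants."""
            return matching_count * 100_000 / population

        tweets_by_state = {"CDMX": ["hubo una balacera hoy", "buen clima"],
                           "Jalisco": ["reportan secuestro"]}
        population = {"CDMX": 9_209_944, "Jalisco": 8_348_151}  # illustrative figures

        for state, tweets in tweets_by_state.items():
            n = count_matching(tweets)
            print(state, n, round(rate_per_100k(n, population[state]), 3))

    The per-100,000 rate makes regions of very different sizes comparable, which is why the second level exists alongside the raw sum.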

    DTRM: A new reputation mechanism to enhance data trustworthiness for high-performance cloud computing

    This is the author accepted manuscript; the final version is available from Elsevier via the DOI in this record. Cloud computing and the mobile Internet have been the two most influential information technology revolutions, and they intersect in mobile cloud computing (MCC). The burgeoning MCC enables the large-scale collection and processing of big data, which demand trusted, authentic and accurate data to ensure an important but often overlooked aspect of big data: data veracity. Troublesome internal attacks launched by malicious insiders are one key problem that reduces data veracity and remains difficult to handle. To enhance data veracity, and thus improve the performance of big data computing in MCC, this paper proposes a Data Trustworthiness enhanced Reputation Mechanism (DTRM) that can be used to defend against internal attacks. In the DTRM, sensitivity-level based data categories, Metagraph theory based user group division, and reputation transferring methods are integrated into the reputation query and evaluation process. Extensive simulation results based on real datasets show that the DTRM outperforms existing classic reputation mechanisms under bad-mouthing attacks and mobile attacks. This work was supported by the National Natural Science Foundation of China (61602360, 61772008, 61472121), the Pilot Project of Fujian Province (formal industry key project) (2016Y0031), the Foundation of Science and Technology on Information Assurance Laboratory (KJ-14-109) and the Fujian Provincial Key Lab of Network Security and Cryptology Research Fund (15012).
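
    The abstract does not give DTRM's equations, so the following is only a hedged sketch of two ingredients it names: sensitivity-level weighted reputation evaluation and reputation transfer across user groups. The weights, the update rule and the damping factor are assumptions made for illustration, not the paper's mechanism (Python):

        from collections import defaultdict

        SENSITIVITY_WEIGHT = {"low": 0.5, "medium": 1.0, "high": 2.0}  # assumed weights

        class ReputationStore:
            """Reputation per (group, user) pair, with a neutral prior of 0.5."""
            def __init__(self):
                self.score = defaultdict(lambda: 0.5)

            def evaluate(self, group, user, data_ok, sensitivity):
                # Feedback on high-sensitivity data moves the score further
                # than feedback on low-sensitivity data.
                w = 0.1 * SENSITIVITY_WEIGHT[sensitivity]
                target = 1.0 if data_ok else 0.0
                key = (group, user)
                self.score[key] += w * (target - self.score[key])

            def transfer(self, user, src_group, dst_group, damping=0.8):
                # Reputation transfer: a user joining a new group carries a
                # damped version of the reputation earned in the old group.
                self.score[(dst_group, user)] = damping * self.score[(src_group, user)]

        store = ReputationStore()
        store.evaluate("g1", "alice", data_ok=True, sensitivity="high")
        store.transfer("alice", "g1", "g2")
        print(round(store.score[("g2", "alice")], 3))

    Damping on transfer limits how much standing an insider can carry into a new group, which is one simple way such a mechanism can blunt mobile attacks of the kind the evaluation considers.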

    Information System Articulation Development - Managing Veracity Attributes and Quantifying Relationship with Readability of Textual Data

    Textual data are often disorganized or misinterpreted because unstructured Big Data spans multiple dimensions, and managing readable alphanumeric textual data and its analytics is challenging. In spatial dimensions, the facts can be ambiguous and inconsistent, posing interpretation and knowledge-discovery challenges, and the information can be wordy, erratic and noisy. The research aims to assimilate these data characteristics through Information System (IS) artefacts appropriate to data analytics, especially in application domains that involve big data sources. Data heterogeneity and multidimensionality can both enable and preclude IS-guided veracity models in the data integration process, including customer analytics services. The veracity of big data can thus affect visualization and value, including qualitative knowledge enhancement across vast amounts of textual data. The manner in which veracity features are construed in each schematic, semantic and syntactic attribute dimension, across several IS artefacts and relevant documents, can robustly enhance the readability of textual data.
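
    As one concrete way to quantify the readability of textual data, the sketch below computes the standard Flesch Reading Ease score. The syllable counter is a rough heuristic, and the paper's schematic, semantic and syntactic attribute dimensions are not modelled here; this is only an assumed stand-in metric (Python):

        import re

        def count_syllables(word):
            """Rough vowel-group heuristic; good enough for a demo."""
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def flesch_reading_ease(text):
            # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            n = max(1, len(words))
            syllables = sum(count_syllables(w) for w in words)
            return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

        print(round(flesch_reading_ease(
            "Managing readable textual data is challenging. Facts can be noisy."), 1))

    Higher scores indicate easier text, so a score like this could serve as the dependent variable when relating veracity attributes to readability.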