1,343 research outputs found

    Web Log Data Analysis: Converting Unstructured Web Log Data into Structured Data Using Apache Pig

    Data extraction and analysis have recently received significant attention due to the evolution of social media and the large volume of data available in unstructured form. Hadoop and MapReduce have been used continuously to process and analyze large amounts of data. In this paper, Apache Pig, a high-level platform that runs on top of Hadoop for analyzing large volumes of data, is used to analyze unstructured web server log files and convert the information they contain into a structured form. The main purpose of the paper is to extract, transform, and load unstructured data in the Apache Pig framework and to analyze the data and the framework's performance in local mode as well as MapReduce mode. The paper also briefly explains the steps required to analyze unstructured web server log files in Apache Pig and compares the efficiency of processing a large volume of data in MapReduce mode and in local mode.
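
As a rough illustration of what turning an unstructured log line into a structured record can look like, here is a minimal Python sketch. It is not the paper's Pig script. It assumes the logs follow the Common Log Format, and the pattern, the `parse_log_line` helper, and the sample line are illustrative choices rather than details taken from the paper.

```python
import re
from typing import Optional

# Assumption: lines follow the Common Log Format
#   host ident user [timestamp] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Convert the numeric fields; a size of "-" means no body was sent.
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line used only for illustration.
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_log_line(sample)
print(record)
```

Each parsed record is a flat set of named fields (host, timestamp, request, status, size), the kind of structured row that a later load or aggregation step can work with.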

    Systems For Delivering Electric Vehicle Data Analytics

    In recent times, advances in scientific research related to electric vehicles have led to the generation of large amounts of data. This data is mainly logger data collected from various sensors in the vehicle. It is predominantly unstructured and non-relational in nature, and is often referred to as Big Data. Analysis of such data needs a high-performance information technology infrastructure that provides superior computational efficiency and storage capacity. The infrastructure should be scalable to accommodate the growing data and ensure its security over a network. This research proposes an architecture built over Hadoop to effectively support distributed data management over a network for real-time data collection and storage, parallel processing, and faster random read access for information retrieval for decision-making. Once the data is imported into a database, the system can support efficient analysis and visualization of data as per user needs. These analytics can help understand correlations between data parameters under various circumstances. The system provides scalability to support future data accumulation while still performing analytics with low overhead. Overall, these open problems in EV data analytics are taken into consideration and a low-cost architecture for data management is researched.