3 research outputs found

    Big Data – a step change for SDI?

    The globally hyped notion of Big Data has increasingly influenced scientific and technical debates about the handling and management of geospatial information. Accordingly, we see a need to recall what has happened over the past years, to present the recent Big Data landscape from an infrastructural perspective, and to outline the major implications for the SDI community. Our main conclusion is that it would be too simple and naïve to consider only the technological aspects underpinning geospatial (web) services. Instead, we call on SDI researchers, engineers, providers and consumers to develop new methodologies and capacities for dealing with (geo)spatial information as part of broader knowledge infrastructures.

    Designing & Implementing a Java Web Application to Interact with Data Stored in a Distributed File System

    The volume of information grows exponentially every day, and this data must be stored and analyzed. Traditional data warehousing solutions are expensive. Apache Hadoop is a popular open-source framework that combines distributed storage with the MapReduce programming model. This paper describes a performance analysis project that compares Apache Hive, which is built on top of Apache Hadoop, with a traditional database, MySQL. Hive supports the Hive Query Language (HiveQL), a SQL-like declarative language whose queries are translated into MapReduce jobs. These jobs are then executed by Hadoop. Hive also has a system catalog, the Metastore, which is used to index data components. The Hadoop framework is extended to include a duplicate detection system that helps manage multiple copies of the same data at the file level. Java Server Pages and the Java Servlet framework were used to build a Java web application that gives clients a web interface for accessing and analyzing large data sets stored in Apache Hive or MySQL databases.