University of Zagreb. Faculty of Science. Department of Mathematics.
Abstract
This thesis describes the operation and use of Apache Hadoop and its components. The most important component is MapReduce, which is increasingly widely used. Using MapReduce requires understanding how it works and learning some rules for writing MapReduce algorithms, such as the use of a combiner function. To give a complete picture of Hadoop, the thesis also explains Flume, Hive, HDFS, and Oozie. The use of Hadoop and MapReduce is demonstrated through an analysis of the social network Twitter: data were collected according to specified conditions using Apache Flume, processed with Oozie, and queried using Hive.