Geosocial Big Data Analysis Using Python and FOSS4G with the Case Study of Korean Data
Nowadays there is much research on the analysis of geosocial big data, such as geotweets and Foursquare venues, and open-source software (OSS) plays an important role in it. Analyzing geosocial big data involves several distinct steps: data collection, data parsing, data conversion, statistical analysis, visualization, and database management. An integrated system architecture and a compatible analysis environment are therefore key to obtaining relevant analysis results. The Python programming language supports an interoperable analysis environment for these varied software functions and enables geosocial big data to be processed on an integrated platform, while FOSS4G supplies the software environment for geovisualization and management of the collected data. In this study, the process of geosocial big data analysis is introduced through a case study of geotweets and Foursquare venues, and the analysis results are presented for Korean data. For this study, Python API libraries for Twitter (tweepy) and Foursquare (pyfoursquare) were used to collect the geosocial data; Pandas and Simplejson were used to parse and extract the valid records; GDAL and PySAL were used to convert and analyze the GIS data; PyTagCloud and WordCloud were used to visualize the qualitative text; MongoDB was used to store the collected dataset; and QGIS was applied for geovisualization.
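The parsing step described above can be sketched in plain Python. The records below are synthetic stand-ins for the geotweet JSON the Twitter API used to return (in real use they would come from tweepy, which is assumed rather than shown), and the function simply keeps the records that carry coordinates:

```python
import json

# Synthetic geotweet records; the "coordinates" field follows the GeoJSON
# convention of [longitude, latitude]. All values here are invented.
raw = json.dumps([
    {"id": 1, "text": "Seoul tower at night",
     "coordinates": {"type": "Point", "coordinates": [126.9882, 37.5512]}},
    {"id": 2, "text": "no location on this one", "coordinates": None},
    {"id": 3, "text": "Busan beach",
     "coordinates": {"type": "Point", "coordinates": [129.1604, 35.1587]}},
])

def extract_geotweets(payload: str):
    """Keep only records with coordinates, as (id, text, lon, lat) tuples."""
    rows = []
    for rec in json.loads(payload):
        coords = rec.get("coordinates")
        if coords:
            lon, lat = coords["coordinates"]
            rows.append((rec["id"], rec["text"], lon, lat))
    return rows

rows = extract_geotweets(raw)
print(len(rows))  # 2 of the 3 records carry a location
```

In a real pipeline the resulting tuples would be loaded into a Pandas DataFrame or written to MongoDB for the later conversion and visualization steps.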
Discovering Big Data Modelling for Educational World
With the advancement of internet technology all over the world, the demand for online education is growing, and many educational institutions offer various types of online courses and e-content. Analytical models from data mining and computer-science heuristics help in analyzing and visualizing data, predicting student performance, generating recommendations for students as well as teachers, providing feedback to students, identifying related courses, e-content, and books, detecting undesirable student behaviours, developing course content, and planning various other educational activities. Today many educational institutions use data analytics to improve the services they provide. The data access patterns of students, logged and collected from online learning systems, can be explored to find informative relationships in the educational world. A major concern, however, is that the data are exploding, as the numbers of students and courses increase day by day all over the world. Big Data platforms and parallel programming models such as MapReduce may accelerate the analysis of this exploding educational data and the capability to find computational patterns. The paper focuses on a trial of educational modelling based on Big Data techniques.
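The MapReduce pattern the abstract invokes can be illustrated without a cluster. The sketch below counts course accesses from a synthetic log (student IDs and course names are invented) using an explicit map phase and reduce phase, mirroring how the same count would shard across nodes:

```python
from collections import defaultdict

# Synthetic access log of (student_id, course_id) events; all names invented.
log = [("s1", "python101"), ("s2", "python101"), ("s1", "stats200"),
       ("s3", "python101"), ("s2", "stats200"), ("s1", "python101")]

def map_phase(records):
    # Emit a (key, 1) pair per record, like a MapReduce mapper.
    for _, course in records:
        yield course, 1

def reduce_phase(pairs):
    # Sum the values per key, like a reducer after the shuffle step.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

popularity = reduce_phase(map_phase(log))
print(popularity)  # {'python101': 4, 'stats200': 2}
```

On a real platform the mapper and reducer would run in parallel over partitions of the log; the per-key logic stays exactly this simple.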
Performance Analysis of Hadoop MapReduce And Apache Spark for Big Data
In the recent era, information has grown at an exponential rate. To obtain new insights, this information must be carefully interpreted and analyzed, so there is a need for a system that can process data efficiently at all times. Distributed cloud-computing data-processing platforms are important tools for data analytics on a large scale. In this area, Apache Hadoop MapReduce has evolved into the standard. A MapReduce job reads and processes its input data and then writes the result back to the Hadoop Distributed File System (HDFS). The limitations of this programming interface have led to the development of modern dataflow-oriented frameworks such as Apache Spark, which uses Resilient Distributed Datasets (RDDs) to hold data structures in memory. Since RDDs can be kept in memory, algorithms can iterate over their data many times very efficiently.
Cluster computing is a major investment for any organization that chooses to perform Big Data analysis, and MapReduce and Spark are two well-known open-source cluster-computing frameworks for it. Cluster computing hides task complexity behind simple, user-friendly programming while offering low latency. It improves throughput and provides backup uptime should the main system fail; its features include flexibility, task scheduling, higher availability, and faster processing speed. Big Data analytics has become more compute-intensive as data management becomes a major issue for scientific computation, and High-Performance Computing is therefore of great importance for big data processing. The main application of this research work is towards the realization of High-Performance Computing (HPC) for Big Data analysis.
This thesis work investigates the processing capability and efficiency of Hadoop MapReduce and Apache Spark using Cloudera Manager (CM), which provides end-to-end cluster management for the Cloudera Distribution for Apache Hadoop (CDH). The implementation was carried out on Amazon Web Services (AWS), which was used to configure the virtual machines (VMs). Four free-tier-eligible t2.micro Linux instances were launched using Amazon Elastic Compute Cloud (EC2) and configured into a four-node cluster using Secure Shell (SSH).
A Big Data workload is generated and injected, and both MapReduce and Spark jobs are run with different queries such as scan, aggregation, and two-way and three-way joins. The time taken for each task to complete is recorded, observed, and thoroughly analyzed. It was observed that Spark executes jobs faster than MapReduce.
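The query types benchmarked above (scan, aggregation, join) can be sketched over tiny in-memory tables; this illustrates the workload semantics only, not Hadoop or Spark themselves, and both tables and their contents are invented:

```python
# In-memory stand-ins for the benchmark's tables (all rows invented).
users  = [(1, "ana"), (2, "ben"), (3, "caro")]            # (user_id, name)
orders = [(10, 1, 25.0), (11, 1, 40.0), (12, 3, 7.5)]     # (order_id, user_id, total)

def scan(table, predicate):
    # Full scan: keep every row that satisfies the predicate.
    return [row for row in table if predicate(row)]

def aggregation(table, key_idx, val_idx):
    # Group-by-key sum, the shape of a typical aggregation query.
    out = {}
    for row in table:
        out[row[key_idx]] = out.get(row[key_idx], 0.0) + row[val_idx]
    return out

def two_way_join(left, right, l_idx, r_idx):
    # Hash join: build an index on the left table, probe with the right.
    index = {}
    for row in left:
        index.setdefault(row[l_idx], []).append(row)
    return [l + r for r in right for l in index.get(r[r_idx], [])]

big_orders = scan(orders, lambda row: row[2] > 20.0)
spent = aggregation(orders, key_idx=1, val_idx=2)
joined = two_way_join(users, orders, l_idx=0, r_idx=1)
print(spent)   # {1: 65.0, 3: 7.5}
```

In MapReduce each of these becomes at least one map/shuffle/reduce round trip through HDFS, whereas Spark can keep the intermediate RDDs in memory between stages, which is where the observed speed difference comes from.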
Actors vs Shared Memory: two models at work on Big Data application frameworks
This work aims at analyzing how two different concurrency models, namely the shared memory model and the actor model, can influence the development of applications that manage the huge masses of data distinctive of Big Data applications. The paper compares the two models by analyzing a pair of concrete projects based on the MapReduce and Bulk Synchronous Parallel algorithmic schemes, each implemented on two concrete platforms: Akka Cluster and Managed X10. The result is both a conceptual comparison of the models in the Big Data Analytics scenario and an experimental analysis based on concrete executions on a cluster platform.
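The contrast between the two models can be made concrete in a few lines of Python (a toy illustration, not Akka or X10). In the shared-memory version several threads mutate one counter behind a lock; in the actor version a single thread owns the state and reacts only to messages in its inbox:

```python
import queue
import threading

N = 1000  # increments per worker; the workload itself is invented

# Shared-memory model: four threads mutate one counter behind a lock.
counter = 0
lock = threading.Lock()

def shared_worker():
    global counter
    for _ in range(N):
        with lock:            # correctness depends on disciplined locking
            counter += 1

workers = [threading.Thread(target=shared_worker) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

# Actor model: one actor owns the state; others only send it messages.
inbox = queue.Queue()
result = {}

def actor():
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:       # "poison pill" message shuts the actor down
            result["total"] = total
            return
        total += msg

actor_thread = threading.Thread(target=actor)
actor_thread.start()
for _ in range(4 * N):
    inbox.put(1)
inbox.put(None)
actor_thread.join()

print(counter, result["total"])  # both print 4000
```

The trade-off the paper studies is visible even here: the shared-memory version needs a lock at every touch of shared state, while the actor version serializes all state changes through message passing and needs no lock at all.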
Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes
The basic features of some of the most versatile and popular open-source frameworks for machine learning (TensorFlow, Deeplearning4j, and H2O) are considered and compared. A comparative analysis was performed and conclusions were drawn as to the advantages and disadvantages of these platforms. Performance tests on the de facto standard MNIST data set were carried out with the H2O framework for deep learning algorithms designed for CPU and GPU platforms, in single-threaded and multi-threaded modes of operation.
Comment: 4 pages, 6 figures, 4 tables; XIIth International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT 2017), Lviv, Ukraine.
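A single- vs multi-threaded comparison of the kind the abstract describes can be harnessed with the standard library alone. The sketch below times a toy CPU-bound function (a stand-in for a training step, not an H2O or TensorFlow call) in both modes and checks that the results agree:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Toy CPU-bound task standing in for a training step; the size is arbitrary.
def work(n: int) -> int:
    return sum(i * i for i in range(n))

tasks = [50_000] * 8

t0 = time.perf_counter()
single = [work(n) for n in tasks]           # single-threaded mode
t_single = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    multi = list(pool.map(work, tasks))     # multi-threaded mode
t_multi = time.perf_counter() - t0

# For pure-Python CPU-bound work the GIL usually prevents a thread speedup;
# frameworks like H2O gain because their native kernels release the GIL.
assert single == multi
```

The timing numbers, not the results, are what such a benchmark reports; the assertion only confirms the two modes compute the same thing.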
Cloud Computing Solution for Monitoring Arid Rangeland Dynamics: Case of Moroccan Highlands and Southern Acacia Ecosystems
The wide availability of free satellite imagery, together with the recent development of cloud platforms dedicated to big spatial data (Big Data) that integrate image archives from different providers, processing algorithms, distributed processing capabilities, and an application programming interface (API) facilitating scripting and automation, has opened new perspectives for the use of vegetation-observation time series over long time spans and at large, almost planetary, spatial scales.
This work aims at harnessing these technologies to build an automated solution for monitoring rangeland rehabilitation dynamics in arid lands and for assessing the effectiveness of stakeholders' management strategies. The solution is based on a graphical user interface that facilitates the process, and on analysis functions that rely on analysing the temporal trajectories (time series) of different spectral indices derived from satellite images (Landsat or Sentinel) at the required spatial analysis scale.
The solution is implemented in JavaScript using the functions offered by the Google Earth Engine (GEE) API. The graphical user interface of the first prototype can be used from a standard web browser and is accessible even to people without any background in programming languages or remote sensing. The process was tested at two arid sites in Morocco: acacia ecosystems in the southern part of the country, and the highlands of eastern Morocco, mainly at recently rehabilitated sites. It has been qualified as a promising solution.
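The core of such a temporal-trajectory analysis is computing a spectral index per date and fitting a trend to it. The sketch below does this in plain Python for one pixel, with invented reflectance values standing in for the Landsat/Sentinel bands a GEE script would supply:

```python
# NDVI = (NIR - Red) / (NIR + Red); all reflectance values below are invented.
observations = [  # (year, red, nir) for one rehabilitated pixel
    (2015, 0.20, 0.26), (2016, 0.19, 0.27), (2017, 0.18, 0.29),
    (2018, 0.17, 0.31), (2019, 0.16, 0.33),
]

def ndvi(red: float, nir: float) -> float:
    return (nir - red) / (nir + red)

def linear_slope(xs, ys):
    # Ordinary least-squares slope: the per-year trend of the trajectory.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

years = [y for y, _, _ in observations]
series = [ndvi(r, n) for _, r, n in observations]
slope = linear_slope(years, series)
print(round(slope, 4))
```

A positive slope suggests greening consistent with successful rehabilitation; in GEE the same per-pixel fit would be mapped over an image collection rather than looped in Python.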