
    Discovering Big Data Modelling for Educational World

    Abstract: With the advancement of internet technology worldwide, the demand for online education is growing, and many educational institutions offer various types of online courses and e-content. Analytical models from data mining and computer-science heuristics help in analysing and visualising data, predicting student performance, generating recommendations for students as well as teachers, providing feedback to students, identifying related courses, e-content and books, detecting undesirable student behaviours, developing course content and planning various other educational activities. Today many educational institutions use data analytics to improve the services they provide. The data access patterns of students, logged and collected from online learning systems, can be explored to find informative relationships in the educational world. A major concern, however, is that the data are exploding, as the numbers of students and courses grow day by day all over the world. Big Data platforms and parallel programming models like MapReduce may accelerate the analysis of this exploding educational data and the capability to find patterns computationally. The paper focuses on a trial of educational modelling based on Big Data techniques.
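The MapReduce pattern mentioned above can be illustrated with a minimal, single-machine sketch. The log records and course names below are hypothetical, and plain Python stands in for a real Hadoop cluster; the map, shuffle, and reduce phases mirror what the framework would distribute across nodes.

```python
from collections import defaultdict

# Hypothetical student activity log: (student_id, course_id) access events.
logs = [
    ("s1", "math101"), ("s2", "math101"),
    ("s1", "cs200"), ("s3", "math101"), ("s2", "cs200"),
]

# Map phase: emit an intermediate (course_id, 1) pair for every access.
mapped = [(course, 1) for _, course in logs]

# Shuffle phase: group intermediate pairs by key (course_id).
groups = defaultdict(list)
for course, count in mapped:
    groups[course].append(count)

# Reduce phase: sum the counts for each course.
access_counts = {course: sum(counts) for course, counts in groups.items()}

print(access_counts)  # {'math101': 3, 'cs200': 2}
```

On a real cluster the same three phases run in parallel over partitions of the log, which is what makes the approach scale to the exploding data volumes the abstract describes.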

    Performance Analysis of Hadoop MapReduce And Apache Spark for Big Data

    In the recent era, information has grown at an exponential rate. To obtain new insights, this information must be carefully interpreted and analyzed, so there is a need for systems that can process data efficiently at all times. Distributed cloud-computing data-processing platforms are important tools for large-scale data analytics. In this area, Apache Hadoop (High-Availability Distributed Object-Oriented Platform) MapReduce has become the standard. A MapReduce job reads and processes its input data and then writes the results back to the Hadoop Distributed File System (HDFS). The limitations of its programming interface led to the development of modern data-flow-oriented frameworks such as Apache Spark, which uses Resilient Distributed Datasets (RDDs) to keep data structures in memory. Since RDDs can be cached in memory, algorithms can iterate over their data many times very efficiently. Cluster computing is a major investment for any organization that chooses to perform Big Data analysis, and MapReduce and Spark are two well-known open-source cluster-computing frameworks for it. Cluster computing hides task complexity and achieves low latency behind simple, user-friendly programming; it improves performance and throughput and keeps backups available should the main system fail. Its features include flexibility, task scheduling, high availability, and fast processing speed. Big Data analytics has become more compute-intensive as data management becomes a major issue for scientific computation, and High-Performance Computing (HPC) is therefore of great importance for big data processing. The main application of this research work is towards the realization of HPC for Big Data analysis. This thesis investigates the processing capability and efficiency of Hadoop MapReduce and Apache Spark using Cloudera Manager (CM).
Cloudera Manager provides end-to-end cluster management for the Cloudera Distribution for Apache Hadoop (CDH). The implementation was carried out on Amazon Web Services (AWS), which was used to configure the virtual machines (VMs). Four Linux instances of the free-tier-eligible t2.micro type were launched using Amazon Elastic Compute Cloud (EC2) and configured into a four-node cluster using Secure Shell (SSH). A Big Data application is generated and injected while both MapReduce and Spark jobs are run with different queries such as scan, aggregation, and two-way and three-way joins. The time taken for each task to complete was recorded, observed, and thoroughly analyzed. It was observed that Spark executes jobs faster than MapReduce.
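One of the benchmark queries mentioned above, the two-way join, can be sketched in the classic MapReduce "reduce-side join" style. The tables and keys below are hypothetical, and plain Python again stands in for the distributed framework: records from both inputs are tagged with their source in the map phase, co-located by join key in the shuffle, and paired up in the reduce.

```python
from collections import defaultdict

# Hypothetical input tables, both keyed on user_id.
users  = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]

# Map phase: tag each record so the reducer knows which table it came from.
mapped = [(k, ("U", name)) for k, name in users] + \
         [(k, ("O", item)) for k, item in orders]

# Shuffle phase: group tagged records by the join key.
groups = defaultdict(list)
for key, tagged in mapped:
    groups[key].append(tagged)

# Reduce phase: within each key, pair every user record with every order record.
joined = []
for key, records in groups.items():
    names = [v for tag, v in records if tag == "U"]
    items = [v for tag, v in records if tag == "O"]
    joined.extend((name, item) for name in names for item in items)

print(sorted(joined))
# [('alice', 'book'), ('alice', 'pen'), ('bob', 'lamp')]
```

In MapReduce, each reduce pass for such a query re-reads its input from HDFS; Spark's advantage in the thesis's measurements comes largely from keeping the equivalent RDDs cached in memory between stages.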

    Actors vs Shared Memory: two models at work on Big Data application frameworks

    This work analyzes how two different concurrency models, namely the shared-memory model and the actor model, can influence the development of applications that manage the huge masses of data distinctive of Big Data applications. The paper compares the two models by analyzing a pair of concrete projects based on the MapReduce and Bulk Synchronous Parallel algorithmic schemes. Both projects are implemented on two concrete platforms: Akka Cluster and Managed X10. The result is both a conceptual comparison of the models in the Big Data analytics scenario and an experimental analysis based on concrete executions on a cluster platform.
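The contrast between the two concurrency models can be shown in a minimal sketch. This is not the paper's Akka or X10 code; it is a hypothetical Python illustration in which the same counting task is done once with shared state guarded by a lock, and once with an actor that owns its state privately and receives messages through a mailbox.

```python
import threading
import queue

# Shared-memory model: several workers mutate one counter under a lock.
counter = 0
lock = threading.Lock()

def shared_worker(n):
    global counter
    for _ in range(n):
        with lock:  # explicit synchronization protects the shared state
            counter += 1

threads = [threading.Thread(target=shared_worker, args=(1000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

# Actor model: state is private to one actor; others send it messages.
mailbox = queue.Queue()

def counter_actor(results):
    total = 0  # no lock needed: only this actor ever touches `total`
    while True:
        msg = mailbox.get()
        if msg == "stop":
            results.append(total)
            return
        total += msg

results = []
actor = threading.Thread(target=counter_actor, args=(results,))
actor.start()
for _ in range(4 * 1000):
    mailbox.put(1)
mailbox.put("stop")
actor.join()

print(counter, results[0])  # 4000 4000
```

The design trade-off the paper studies shows up even here: the shared-memory version needs explicit synchronization on every update, while the actor version serializes updates through message passing, which is also what makes it natural to distribute across cluster nodes.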

    Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes

    The basic features of some of the most versatile and popular open-source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. A comparative analysis was performed and conclusions drawn about the advantages and disadvantages of these platforms. Performance tests on the de facto standard MNIST data set were carried out with the H2O framework for deep learning algorithms designed for CPU and GPU platforms, in single-threaded and multithreaded modes of operation. Comment: 4 pages, 6 figures, 4 tables; XIIth International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT 2017), Lviv, Ukraine.

    Cloud Computing Solution for Monitoring Arid Rangeland Dynamics: Case of Moroccan Highlands and Southern Acacia Ecosystems

    The wide availability of free satellite imagery and the recent development of cloud platforms dedicated to big spatial data (Big Data), which integrate image archives from different providers, processing algorithms, distributed processing capabilities, and an application programming interface (API) that facilitates scripting and automation, have opened new perspectives for the use of vegetation-observation time series over long time spans and over large, almost planetary, spatial scales. This work aims at harnessing these technologies to build an automated solution for monitoring rangeland rehabilitation dynamics in arid lands and assessing the effectiveness of stakeholders' management strategies. The solution is based on a graphical user interface that facilitates the process and on analysis functions relying on the temporal trajectories (time series) of different spectral indices derived from satellite images (Landsat or Sentinel) at the required spatial analysis scale. It is implemented in JavaScript using the functions offered by the Google Earth Engine (GEE) API. The graphical user interface of the first prototype can be used from a standard web browser and is accessible even to people without any background in programming languages or remote sensing. The process was tested for two arid sites in Morocco: acacia ecosystems in the southern part of the country and recently rehabilitated sites in the eastern highlands. It has been assessed as a promising solution.
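The core of the temporal-trajectory analysis the abstract describes can be sketched outside GEE. The reflectance values and years below are invented for illustration; the sketch computes the standard NDVI spectral index per date and fits an ordinary least-squares slope over time, where a positive slope suggests a greening (rehabilitation) trajectory.

```python
# Hypothetical per-year mean reflectances (red, NIR) over a rehabilitated plot.
series = [
    (2016, 0.30, 0.35),
    (2017, 0.28, 0.38),
    (2018, 0.26, 0.42),
    (2019, 0.24, 0.46),
]

# NDVI = (NIR - Red) / (NIR + Red), the standard vegetation index.
ndvi = [(year, (nir - red) / (nir + red)) for year, red, nir in series]

# Ordinary least-squares slope of NDVI against year:
# a positive slope indicates increasing vegetation cover over time.
n = len(ndvi)
mean_x = sum(y for y, _ in ndvi) / n
mean_y = sum(v for _, v in ndvi) / n
slope = sum((y - mean_x) * (v - mean_y) for y, v in ndvi) \
        / sum((y - mean_x) ** 2 for y, _ in ndvi)

print(f"NDVI trend: {slope:.3f}/year, greening: {slope > 0}")
```

In the actual solution, GEE evaluates this kind of per-pixel index trajectory server-side over the full Landsat or Sentinel archive, so the user only selects the site and period in the browser interface.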