869 research outputs found

    EFFECTIVE AND EFFICIENT WAY OF REDUCE DEPENDENCY ON DATASET WITH THE HELP OF MAPREDUCE ON BIG DATA

    Get PDF
    With the fast development of networks these days organizations has overflowing with the collection of millions of data with big number of combination. This big data challenges over trade troubles. It requires more analysis for the high-performance procedure. The new method of hadoop and MapReduce methods are discussed starting the data mining standpoint. In the proposed research work we have to progress performance through parallelization of different operations such as loading the information, index building and evaluating the queries. Thus the performance analysis is completed with the minimum of three nodes with in the Amazon cloud environment. Hbase is a open source, non-relational and distributed database model. It executes on the pinnacle of Hadoop. It consists of a single key with multiple values. Looping is avoid in retrieving a meticulous data from huge datasets and it consume less amount of time for execute the data. HDFS file system is used to store the data after performing arts the map reduces operations and the execution time is decreased when the amount of nodes gets increased. The performance analysis is tuned with the parameters such as the carrying out complexity

    A Survey on Vertical and Horizontal Scaling Platforms for Big Data Analytics

    Get PDF
    There is no doubt that we are entering the era of big data. The challenge is on how to store, search, and analyze the huge amount of data that is being generated per second. One of the main obstacles to the big data researchers is how to find the appropriate big data analysis platform. The basic aim of this work is to present a complete investigation of all the available platforms for big data analysis in terms of vertical and horizontal scaling, and its compatible framework and applications in detail. Finally, this article will outline some research trends and other open issues in big data analytic

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Full text link
    Socio-economic data mining has a great potential in terms of gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. They can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on both natural systems such as the Earth and its ecosystem, as well as on human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise by individually customized services, which however should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self- organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies or institutions have limits, where the public interest is affected, and public interest is not a sufficient justification to violate human rights of individuals. Privacy is a high good, as confidentiality is, and damaging it would have serious side effects for society.Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c

    A MapReduce-based nearest neighbor approach for big-data-driven traffic flow prediction

    Full text link
    In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system including two key modules, i.e., offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method, combining the current data observed in OPP and the classification results obtained from large-scale historical data in ODT, to generate traffic flow prediction in real time. The empirical study on real-world traffic flow big data using the leave-one-out cross validation method shows that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression, in terms of accuracy, which can be improved 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup
    • …
    corecore