15 research outputs found

    Improved Approaches to Handle Bigdata through Hadoop

    Get PDF
    Big data is an evolving term that describes any voluminous amount of structured, semistructured and unstructured data that has the potential to be mined for information.Today2019;s world produces a large amount of data from various sources, records and from different fields termed as 201C;BIG DATA201D;. Such huge data is to be analyzed, and filtered using various techniques and algorithms to extract the interested and useful data to gain knowledge. In the new era with the boom of both structured and unstructured types of data, in the field of genomics, meteorology, biology, environmental research and many others, it has become difficult to process, manage and analyze patterns using traditional databases and architectures. It requires new technologies and skills to analyze the flow of material and draw conclusions. So, a proper architecture should be understood to gain knowledge about the Big Data. The analysis of Big Data involves multiple distinct phases such as collection, extraction, cleaning, analysis and retrieval

    Analysis and Exploring of different recent trends in processing of Big data

    Get PDF
    Big Data is a whole new concept in information technology driven computing world that manages large and complex datasets through certain data mining processing tools and functional models. Research suggests that tapping the potential of this data can benefit businesses, scientific disciplines and the public sector – contributing to their economic gains as well as development in every sphere. The need is to develop efficient systems that can exploit this potential to the maximum, keeping in mind the current challenges associated with its analysis, structure, scale, timeliness and privacy As we all know the process of data mining is knowledge driven discovery process through which we can extract relevant information about a particular matter of subject. Big size datasets have numerous velocity and volume so it demands research techniques to extract useful information. As the complexity of data increases, there is more challenging work for us to implement big data methodologies. Enterprises face the challenge of processingthese huge chunks of data, and have found that none of the existing centralized architectures can efficiently handle this huge volume of data. This research highlights the concept of Big Data used in datasets, its present scenario place in industry, hidden truths behind development of big data and what will be the future of big data in coming years

    Challenges for MapReduce in Big Data

    Get PDF
    In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped into four main categories corresponding to Big Data tasks types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address identified challenges are presented. Consequently, by identifying issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research

    Neutrosophic association rule mining algorithm for big data analysis

    Get PDF
    Big Data is a large-sized and complex dataset, which cannot be managed using traditional data processing tools. Mining process of big data is the ability to extract valuable information from these large datasets. Association rule mining is a type of data mining process, which is indented to determine interesting associations between items and to establish a set of association rules whose support is greater than a specific threshold

    CHALLENGES IN THE DEPLOYMENT AND OPERATION OF MACHINE LEARNING IN PRACTICE

    Get PDF
    Machine learning has recently emerged as a powerful technique to increase operational efficiency or to develop new value propositions. However, the translation of a prediction algorithm into an operationally usable machine learning model is a time-consuming and in various ways challenging task. In this work, we target to systematically elicit the challenges in deployment and operation to enable broader practical dissemination of machine learning applications. To this end, we first identify relevant challenges with a structured literature analysis. Subsequently, we conduct an interview study with machine learning practitioners across various industries, perform a qualitative content analysis, and identify challenges organized along three distinct categories as well as six overarching clusters. Eventually, results from both literature and interviews are evaluated with a comparative analysis. Key issues identified include automated strategies for data drift detection and handling, standardization of machine learning infrastructure, and appropriate communication and expectation management

    Dağıtık veri yönetim ve işleme mimarisi kullanılarak maikine öğrenmesi uygulamaları gerçekleştirilmesi

    Get PDF
    06.03.2018 tarihli ve 30352 sayılı Resmi Gazetede yayımlanan “Yükseköğretim Kanunu İle Bazı Kanun Ve Kanun Hükmünde Kararnamelerde Değişiklik Yapılması Hakkında Kanun” ile 18.06.2018 tarihli “Lisansüstü Tezlerin Elektronik Ortamda Toplanması, Düzenlenmesi ve Erişime Açılmasına İlişkin Yönerge” gereğince tam metin erişime açılmıştır.Her geçen gün hayatımızda daha çok yer edinen teknolojinin gelişimi ile birlikte, üretilen ve dolayısıyla depolanma ve analiz gerekliliğini beraberinden getiren verilerin bilinen yöntemlerle yönetilmesi ve işlenmesi neredeyse imkânsız hale gelmektedir. Hem veri boyutunda hem de veri çeşitliliğinde artış, bu bağlamda yeni yöntemlerin geliştirilmesini zorunlu hale getirmiştir. Bu tez çalışmasında geleneksel yöntemlerle işlenemeyecek boyut ve çeşitlilikteki veriler için geliştirilmiş olan dağıtık veri yönetim ve analiz araçları kullanılarak makine öğrenmesi uygulamaları geliştirilmektedir. Uygulamalar Google Cloud hizmeti kullanılarak oluşturulmuş Spark kümesi üzerinde pyspark kütüphaneleri kullanılarak gerçekleştirilmektedir. Bu tez çalışmasında iki farklı veri seti kullanılarak makine öğrenmesi uygulamaları gerçekleştirilmektedir. Uygulama-1'de kablosuz sensörlerden elde edilmiş hareket verileri kullanılarak Lojistik Regresyon sınıflandırma algoritması ile makine öğrenmesi uygulaması geliştirilmektedir. Uygulamanın çalıştırılması esnasında kümedeki kaynakların kullanımları gözlenmektedir. Uygulama-2'de çevrimiçi bir turizm acentesinin kontrol panelinden elde edilmiş veriler ile Rastgele Orman ve Gradyan-artırılmış Ağaç algoritmalarının ortalama tıklama maliyeti tahmininde performansları karşılaştırılmaktadır.With the development of technology that takes place more and more every day in our lives, it becomes almost impossible to manage and process the data produced and thus brought about the necessity of storage and analysis. Both the data size and the increase in the variety of data have necessitated the development of new methods in this context. In this thesis, machine learning applications have been developed by using distributed data management and analysis tools which have been developed for data that cannot be processed in traditional management. Applications were implemented using pyspark libraries on the Spark cluster created using the Google Cloud service. In this thesis, machine learning applications were carried out by using two different data sets. The application of machine learning was developed with Logistic Regression classification algorithm by using motion data obtained from wireless sensors in application-1. The use of resources in the cluster was observed during the execution of the application. In the application-2, the average clicks cost estimation performances of Random Forest and Gradient-boosted Tree algorithms were compared by using the data obtained from the control panel of an online tourism agency

    How can AI regulation be effectively enforced? : comparing compliance mechanisms for AI regulation with a multiple-criteria decision analysis

    Get PDF
    Award date: 17 June 2022. Supervisor: Professor Andrea Renda (European University Institute)Newly emerging AI regulations need effective and innovative enforcement and compliance mechanisms to assure that fundamental and human rights are protected when using an AI system. This study compares four different compliance mechanisms namely ‘Real-Time and Automated Conformity Assessment’, ‘Standardization and Certification’, ‘Algorithmic Impact Assessment’ and ‘Algorithmic Auditing’ as well as three different assurers of compliance namely deployers, notified bodies and civil society organisations. With an MCDA, this research has shown that civil society-based compliance mechanisms are believed to be less effective, less feasible and more costly compared to all other compliance mechanisms. Second, external compliance mechanisms (by notified bodies) were rated to be more effective but also more difficult to implement compared to internal compliance mechanisms. Third, algorithmic auditing scored highest among all policy options. Fourth, despite its experimental nature, automated and real-time compliance mechanisms are not scored significantly lower than other compliance mechanisms
    corecore