    New and Existing Approaches Reviewing of Big Data Analysis with Hadoop Tools

    Everybody is connected to social media (Facebook, Twitter, LinkedIn, Instagram, etc.), which generate large quantities of data that traditional applications cannot adequately process. Social media are regarded as an important platform through which many subscribers share information, opinions, and knowledge. Despite these basic attributes, big data also contributes to many issues, such as data collection, storage, movement, updating, reviewing, posting, scanning, visualization, and data protection. To deal with all these problems, there is a need for an adequate system that not only prepares the details but also provides meaningful analysis to take advantage of difficult situations, whether related to business, proper decision-making, health, social media, science, telecommunications, or the environment. From reading previous studies, the authors note that various analyses, such as real-time sentiment analysis, have been carried out with Hadoop and its tools. However, dealing with this big data is a challenging task. Therefore, this type of analysis can be performed efficiently only through the Hadoop ecosystem. The purpose of this paper is to review the literature on big data analysis of social media using the Hadoop framework. The aims are to identify nearly all the analysis tools that exist under the Hadoop umbrella and their orientations, together with their difficulties and the modern methods used to overcome the challenges of big data in offline and real-time processing. Real-time analytics accelerates decision-making and provides access to business metrics and reporting. A comparison between Hadoop and Spark is also presented.
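
The sketch below illustrates the processing model behind these Hadoop tools. It uses plain Python rather than the Hadoop API. It walks through the map, shuffle, and reduce phases that Hadoop MapReduce runs in a distributed fashion, applied to counting hashtags in social-media posts. The function names (`map_phase`, `shuffle`, `reduce_phase`, `hashtag_counts`) and the sample posts are hypothetical illustrations. They are not taken from the reviewed paper or from any Hadoop library.

```python
import re
from collections import defaultdict
from itertools import chain

HASHTAG = re.compile(r"#\w+")


def map_phase(post):
    """Mapper (hypothetical): emit a (hashtag, 1) pair for every hashtag in one post."""
    return [(tag.lower(), 1) for tag in HASHTAG.findall(post)]


def shuffle(pairs):
    """Shuffle/sort: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer (hypothetical): sum the counts emitted for one key."""
    return key, sum(values)


def hashtag_counts(posts):
    """Run map -> shuffle -> reduce in a single process and return {hashtag: count}."""
    mapped = chain.from_iterable(map_phase(p) for p in posts)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


posts = [
    "Loving the new phone #tech #android",
    "#Tech conference was great",
    "Rainy day #weather",
]
print(hashtag_counts(posts))  # {'#android': 1, '#tech': 2, '#weather': 1}
```

On a real cluster, the mapper and reducer would run on many nodes (for example, as executables under Hadoop Streaming), and the framework itself would perform the shuffle. Here everything runs in one process, so the code only illustrates the data flow.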

    Sentiment Analysis of Twitter Data for a Tourism Recommender System in Bangladesh

    The exponentially expanding Digital Universe is generating huge amounts of data containing valuable information. The tourism industry, one of the fastest-growing economic sectors, can benefit from the myriad digital data that travelers generate in every phase of their travel: planning, booking, traveling, feedback, etc. One application of tourism-related data is to provide personalized destination recommendations. The primary objective of this research is to facilitate the business development of a tourism recommendation system for Bangladesh called “JatraLog”. Sentiment-based recommendation is one of the features that will be employed in the recommendation system. This thesis addresses two research goals. The first is to study sentiment analysis as a tourism recommendation tool. The second is to investigate Twitter as a potential source of valuable tourism-related data for providing recommendations for different countries, specifically Bangladesh. Sentiment analysis can be defined as a text classification problem in which a document or text is classified into two groups, positive or negative, and in some cases into a third, neutral group. For this thesis, two sets of tourism-related English-language tweets were collected from Twitter using keywords. The first set contains only the tweets; the second set also contains the geo-location and timestamp of each tweet. The collected tweets were then automatically labeled as positive or negative depending on whether they contained positive or negative emoticons, respectively. After labeling, 90% of the tweets from the first set were used to train a Naive Bayes sentiment classifier, and the remaining 10% were used to test its accuracy. The classifier's accuracy was found to be approximately 86.5%. The second set was used to retrieve the statistical information required to address the second research goal, i.e., investigating Twitter as a potential source of sentiment data for a destination recommendation system.
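
The labeling and training procedure described above can be sketched as follows. It consists of emoticon-based automatic labeling, then a 90%/10% train/test split for a Naive Bayes classifier. The sketch is an illustrative multinomial Naive Bayes with add-one (Laplace) smoothing, written in standard-library Python. The thesis does not specify its emoticon lists, tokenizer, smoothing, or Naive Bayes variant. `POSITIVE`, `NEGATIVE`, `tokenize`, `NaiveBayes`, and `evaluate` are therefore hypothetical choices. Emoticons are stripped before training so the classifier cannot simply relearn the labeling rule.

```python
import math
import random
import re
from collections import Counter

# Hypothetical emoticon lists (the thesis does not list the ones it used).
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}
TOKEN = re.compile(r"[a-z']+")


def label_by_emoticon(tweet):
    """Automatic label: 'pos' or 'neg', or None if emoticons are absent or mixed."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def tokenize(tweet):
    """Remove emoticons (they define the label), lowercase, split into words."""
    for e in POSITIVE | NEGATIVE:
        tweet = tweet.replace(e, " ")
    return TOKEN.findall(tweet.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_score(c):
            # Unnormalized log posterior; words unseen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                for w in tokenize(doc) if w in self.vocab)

        return max(self.classes, key=log_score)


def evaluate(tweets, train_frac=0.9, seed=0):
    """Label by emoticon, shuffle, split 90/10, train, and return test accuracy."""
    labeled = [(t, label_by_emoticon(t)) for t in tweets]
    labeled = [(t, y) for t, y in labeled if y is not None]
    random.Random(seed).shuffle(labeled)
    cut = int(len(labeled) * train_frac)
    train, test = labeled[:cut], labeled[cut:]
    model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
    correct = sum(model.predict(t) == y for t, y in test)
    return correct / len(test) if test else float("nan")


model = NaiveBayes().fit(
    ["great beach trip", "lovely hotel view", "awful delayed flight", "terrible dirty room"],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict("great view"))  # pos
```

`evaluate` returns the fraction of held-out tweets that are classified correctly. It is not expected to reproduce the reported 86.5%, which depends on the thesis's own data and preprocessing.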

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Analysis of electrocardiogram (ECG) signals is therefore an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features, together with a few samples from the ECG signal, as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as ventricular ectopic beats (VEB). We compare the performance of the classifiers against algorithms proposed in the literature. We also make recommendations regarding features, sampling rate, and choice of classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.
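
The sketch below shows one possible way to assemble a feature vector of the kind described above, combining interval features, an amplitude feature, and a few raw ECG samples. It assumes R-peak locations have already been detected. The particular features are hypothetical choices, not the paper's exact feature set: the previous and next RR intervals in seconds, the R-peak amplitude, and `n_samples` evenly spaced samples from a window of ±`half_window` samples around the peak. The function name `beat_features` is likewise hypothetical.

```python
from typing import List, Sequence


def beat_features(signal: Sequence[float], r_peaks: Sequence[int], i: int,
                  fs: float, half_window: int = 20,
                  n_samples: int = 5) -> List[float]:
    """Return a hypothetical feature vector for the i-th detected beat.

    signal      -- ECG amplitudes, one value per sample
    r_peaks     -- sample indices of already-detected R peaks (assumed input)
    fs          -- sampling rate in Hz
    half_window -- half-width, in samples, of the window around the R peak
    n_samples   -- number of evenly spaced raw samples taken from that window
    """
    if not 0 < i < len(r_peaks) - 1:
        raise ValueError("beat needs a previous and a next R peak")
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    r = r_peaks[i]
    if r - half_window < 0 or r + half_window >= len(signal):
        raise ValueError("window around the R peak falls outside the signal")

    # Interval features: RR intervals converted to seconds using fs.
    pre_rr = (r - r_peaks[i - 1]) / fs
    post_rr = (r_peaks[i + 1] - r) / fs

    # Amplitude feature: signal value at the R peak.
    amplitude = signal[r]

    # Raw-sample features: n_samples points spread across [r - half_window, r + half_window].
    step = 2 * half_window / (n_samples - 1)
    samples = [signal[r - half_window + round(k * step)] for k in range(n_samples)]

    return [pre_rr, post_rr, amplitude] + samples


# Synthetic example: three unit spikes 1 s apart at 100 Hz.
fs = 100.0
signal = [0.0] * 300
r_peaks = [50, 150, 250]
for r in r_peaks:
    signal[r] = 1.0
print(beat_features(signal, r_peaks, 1, fs))  # [1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Expressing the RR intervals in seconds through `fs` keeps them comparable across sampling rates. By contrast, `half_window` is counted in samples and would need rescaling if the signal were resampled. Premature ventricular beats typically show a short preceding RR interval followed by a longer, compensatory one. This is one reason interval features are useful for VEB detection.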

    Towards a Workload for Evolutionary Analytics

    Emerging data analysis involves ingesting and exploring new data sets, applying complex functions, and frequently revising queries based on observed prior query answers. We call this new type of analysis evolutionary analytics and identify its properties. This type of analysis is not well represented by current benchmark workloads. In this paper, we present a workload and identify several metrics to test system support for evolutionary analytics. Along with our metrics, we present methodologies for running the workload that capture this analytical scenario. Comment: 10 pages

    Revisiting Ralph Sprague’s Framework for Developing Decision Support Systems

    Ralph H. Sprague Jr. was a leader in the MIS field and helped develop the conceptual foundation for decision support systems (DSS). In this paper, I pay homage to Sprague and his DSS contributions, taking a personal perspective based on my years of working with him. I explore the history of DSS and its evolution. I also present and discuss Sprague’s DSS development framework, with its dialog, data, and models (DDM) paradigm and characteristics. At its core, the development framework remains valid in today’s world of business intelligence and big data analytics. I present and discuss a contemporary reference architecture for business intelligence and analytics (BI/A) in the context of Sprague’s DSS development framework. The practice of decision support continues to evolve. It can be described by a maturity model with DSS, enterprise data warehousing, real-time data warehousing, big data analytics, and the emerging cognitive generation as successive generations. I use a DSS perspective to describe and provide examples of what the forthcoming cognitive generation will bring.

    Essentials of Business Analytics

