7 research outputs found

    Study on log event noise reduction by using Naive Bayes supervised machine learning

    This research addresses which Naive Bayes model would be best at predicting Windows log events that could be considered noise, in other words events that contain no information about malicious activity. With the exploding amount of log data being generated by servers, large corporations and organizations are having an increasingly difficult time analyzing these logs to find evidence of malicious activity in their environment. Fortune 200 and larger corporations today produce terabytes of log events daily, and this volume is growing at a rate that will soon reach petabytes. It is estimated that 80 to 90 percent of these log events could be classified as noise or merely informational; they are not needed for finding evidence of malicious activity. A process that predicts, with a reasonable degree of accuracy, whether log events are noise or non-noise would let the tools that analyze log events for malicious activity filter out noise events and reduce the amount of data that needs to be processed. This research compares the Naive Bayes Bag of Words Multinomial, Multinomial TF-IDF and Multi-Variate Bernoulli models, using feature word sets of different sizes, in predicting Windows noise log events.
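As a rough illustration of the bag-of-words multinomial variant named in the abstract above, the following is a minimal pure-Python sketch with Laplace smoothing. It is not the study's implementation: the tiny `TRAINING` set, the `noise` / `non-noise` example messages, and the function names `train_multinomial_nb` and `predict` are all hypothetical and chosen only to make the idea concrete.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: invented log messages, not taken from the study.
TRAINING = [
    ("service entered the running state", "noise"),
    ("windows update service started successfully", "noise"),
    ("time service synchronized with time source", "noise"),
    ("account failed to log on bad password", "non-noise"),
    ("new user account created and added to administrators", "non-noise"),
    ("audit log was cleared by user", "non-noise"),
]


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, kept deliberately simple)."""
    return text.lower().split()


def train_multinomial_nb(samples, alpha=1.0):
    """Collect class priors and per-class word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    total_docs = sum(class_docs.values())
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        "log_prior": {c: math.log(n / total_docs) for c, n in class_docs.items()},
        "totals": {c: sum(word_counts[c].values()) for c in class_docs},
    }


def predict(model, text):
    """Return the label with the highest log posterior; unseen words are skipped."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue
            count = model["word_counts"][label][tok]
            # Laplace-smoothed estimate of P(token | label)
            score += math.log((count + alpha) / (model["totals"][label] + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_multinomial_nb(TRAINING)
print(predict(model, "print spooler service entered the running state"))
print(predict(model, "account failed to log on"))
```

A TF-IDF or multi-variate Bernoulli variant would change how each token contributes to the score (weighted counts, or presence and absence of every vocabulary word) while keeping the same train and predict structure.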

    Natural Disaster Application on Big Data and Machine Learning: A Review

    Natural disasters are events that are difficult to avoid. There are several ways of reducing the risks of natural disasters. One of them is implementing disaster reduction programs, a concept that several developed countries already apply. In addition to disaster reduction programs, there are several ways to predict or reduce the risks using artificial intelligence technology, including big data, machine learning, and deep learning. These methods currently facilitate tasks in visualizing, analyzing, and predicting natural disasters. This research focuses on conducting a review and on understanding the purpose of machine learning and big data in the area of disaster management and natural disasters. The result of this paper is insight into the use of big data, machine learning, and deep learning in six disaster management areas: early warning damage, damage assessment, monitoring and detection, forecasting and predicting, post-disaster coordination and response, and long-term risk assessment and reduction.

    When Silver Is As Good As Gold: Using Weak Supervision to Train Machine Learning Models on Social Media Data

    Over the last decade, advances in machine learning have led to exponential growth in artificial intelligence, i.e., machine learning models capable of learning from vast amounts of data to perform tasks such as text classification, regression, machine translation, speech recognition, and many others. While massive volumes of data are available, only a percentage of the data is used to train machine learning models, because generating training datasets involves manual curation. The process of labeling data with a ground-truth value is extremely tedious and expensive, and is the major bottleneck of supervised learning. To curtail this, the theory of noisy learning can be employed, where data labeled through heuristics, knowledge bases and weak classifiers is used for training instead of data obtained through manual annotation. The assumption here is that a large volume of training data, which contains noise and is acquired through an automated process, can compensate for the lack of manual labels. In this study, we utilize heuristic-based approaches to create noisy silver standard datasets. We extensively tested the theory of noisy learning on four different applications by training several machine learning models on the silver standard dataset with several sample sizes and class imbalances, and tested the performance using a gold standard dataset. Our evaluations on the four applications indicate the success of silver standard datasets in identifying a gold standard dataset. We conclude the study with evidence that noisy social media data can be utilized for weak supervision.
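To make the idea of heuristic labeling in the abstract above more concrete, here is a minimal sketch of how simple rules might vote to produce noisy "silver" labels. Everything in it is an assumption for illustration: the labels `health` and `other`, the keyword lists, and the function names are hypothetical and are not the heuristics or applications used in the study.

```python
from collections import Counter

ABSTAIN = None  # a heuristic returns this when it has no opinion


def lf_symptom_word(tokens):
    """Hypothetical rule: a symptom keyword suggests a health-related post."""
    return "health" if {"fever", "cough", "headache"} & tokens else ABSTAIN


def lf_first_person_report(tokens):
    """Hypothetical rule: first-person wording plus an illness word."""
    if {"i", "my"} & tokens and {"sick", "fever", "flu"} & tokens:
        return "health"
    return ABSTAIN


def lf_looks_like_ad(tokens):
    """Hypothetical rule: promotional wording suggests the post is not a report."""
    return "other" if {"buy", "sale", "discount"} & tokens else ABSTAIN


LABELING_FUNCTIONS = [lf_symptom_word, lf_first_person_report, lf_looks_like_ad]


def silver_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Majority vote over the heuristics; abstain on no votes or on a tie."""
    tokens = set(text.lower().split())
    votes = Counter()
    for lf in labeling_functions:
        vote = lf(tokens)
        if vote is not ABSTAIN:
            votes[vote] += 1
    if not votes:
        return ABSTAIN
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]


posts = [
    "I have a fever and a bad cough",
    "Buy cough syrup now",
    "Buy tickets now",
    "nice weather today",
]
# Keep only posts that received a silver label; the rest are left unlabeled.
silver_dataset = [(p, silver_label(p)) for p in posts if silver_label(p) is not ABSTAIN]
print(silver_dataset)
```

The resulting pairs could then serve as training data for a supervised model, with a separate manually annotated gold standard set held back for evaluation.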

    Analyzing Twitter Feeds to Facilitate Crises Informatics and Disaster Response During Mass Emergencies

    It is common practice these days for the general public to use various micro-blogging platforms, predominantly Twitter, to share ideas, opinions and information about things and life. Twitter is also increasingly used as a popular channel for sharing information during natural disasters and mass emergencies: to update and communicate the extent of the geographic phenomena, report the affected population and casualties, request or provide volunteering services, and share the status of the disaster recovery process initiated by humanitarian-aid and disaster-management organizations. Recent research in this area has affirmed the potential use of such social media data for various disaster response tasks. Even though social media data is massive, open and free, there is a significant limitation in making sense of this data because of its high volume, variety, velocity, value, variability and veracity. The current work provides a comprehensive framework of text processing and analysis performed on several thousand tweets shared on Twitter during natural disaster events. Specifically, this work employs state-of-the-art machine learning techniques from natural language processing on tweet content to process the enormous volume of data generated at the time of disasters. This study shall serve as a basis for providing useful, actionable information to crisis management and mitigation teams in planning and preparing an effective disaster response, and for facilitating the development of future automated systems for handling crisis situations.

    Réactions des participants à la discussion #JeSuisCharlie sur Twitter : Étude des thèmes et des émotions dans les tweets en français

    In this master's thesis, we look at the digital discourse produced on the microblogging site Twitter. We examine internet users' reactions, as expressed in tweets, to the terrorist attack that took place at the offices of the satirical newspaper Charlie Hebdo in Paris in 2015. More precisely, we observe the themes reflected in the messages, the expression of emotions, and the hashtags used. In the theoretical framework, we first highlight the role Twitter plays in a variety of situations, and especially the importance of hashtags, which connect different people so that they can share information. We then situate our work in the field of discourse analysis from a linguistic point of view (Maingueneau, 2014), with the concepts of positioning (Davies & Harré, 1990; Maingueneau, 2002a) and enunciation (Riegel et al., 1994). Finally, we consider the verbal expression of emotions in linguistics and on Twitter. To answer our research questions, we analyse the French-language tweets sent to the #JeSuisCharlie discussion on 7 January 2015, the evening of the shooting. Our study takes a linguistic perspective and consists of a qualitative digital discourse analysis that draws on enunciation theory (Riegel et al., 1994) and positioning (Davies & Harré, 1990; Maingueneau, 2002a). To observe emotions, we adopt a notion of emotional positioning. The results indicate that the themes reflected are linked to solidarity and freedom of expression. Users mainly express emotions of solidarity, hatred and grief, which are conveyed through markers of subjectivity (identified by Kerbrat-Orecchioni, 1980), place names, and precise information about the date and time. Some of the hashtags also express solidarity, among other things. The most used hashtag alongside #JeSuisCharlie is #CharlieHebdo.

    Using Twitter to mobilise knowledge for First Contact Physiotherapists - A qualitative study

    Background - First Contact Physiotherapists (FCPs) specialise in supporting people who consult with musculoskeletal conditions in National Health Service primary care. Cited FCP role challenges include professional isolation, time demands and changing professional and policy contexts. The evidence-to-practice gap is the delay between research knowledge being created and subsequently used in clinical practice and can result in patients not benefiting from healthcare advances. Knowledge mobilisation aims to close this gap by using different types of best available knowledge to support clinical decision making and optimise care. Twitter, though commonly used, has not yet been explored as a source of knowledge to inform FCP clinical practice. Methods - Semi-structured interviews with UK musculoskeletal FCPs (n=19) took place following purposive and snowball sampling. Data were analysed thematically and the knowledge mobilisation mindlines model was selected as a lens through which to further interpret the data. A Stakeholder Advisory Group including public members informed the study methods, topic guides and dissemination of the findings. Results - This study demonstrates how Twitter can meet FCP needs by providing rapid access to succinct, current and diverse knowledge to inform clinical practice. Twitter provides opportunities to overcome professional isolation and for clinical reassurance from peers. FCPs casually scrolled for knowledge, needed to filter knowledge for credibility and appreciated tweets with images or infographics. FCPs adapt knowledge from Twitter for offline training and clinical practice, however despite their clinical expertise and experience, most did not feel confident or safe to share their own knowledge and opinions online. This was due to witnessing ‘unprofessional’ and hostile behaviour online and misinformation and privacy concerns. Conclusions - Twitter offers a platform to mobilise knowledge to FCPs. Recommendations to enable confident knowledge sharing include FCP and Knowledge Mobiliser training, governance guidance for professional bodies and establishment of FCP Twitter networks.