403 research outputs found

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    Get PDF
    This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to their detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForver software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing them. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam focusing on those that appear in the weblog context, concluding in a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software

    VGI and crowdsourced data credibility analysis using spam email detection techniques

    Get PDF
    Volunteered geographic information (VGI) can be considered a subset of crowdsourced data (CSD) and its popularity has recently increased in a number of application areas. Disaster management is one of its key application areas in which the benefits of VGI and CSD are potentially very high. However, quality issues such as credibility, reliability and relevance are limiting many of the advantages of utilising CSD. Credibility issues arise as CSD come from a variety of heterogeneous sources including both professionals and untrained citizens. VGI and CSD are also highly unstructured and the quality and metadata are often undocumented. In the 2011 Australian floods, the general public and disaster management administrators used the Ushahidi Crowd-mapping platform to extensively communicate flood-related information including hazards, evacuations, emergency services, road closures and property damage. This study assessed the credibility of the Australian Broadcasting Corporation’s Ushahidi CrowdMap dataset using a Naïve Bayesian network approach based on models commonly used in spam email detection systems. The results of the study reveal that the spam email detection approach is potentially useful for CSD credibility detection with an accuracy of over 90% using a forced classification methodology

    Artificial Intelligence and Machine Learning in Cybersecurity: Applications, Challenges, and Opportunities for MIS Academics

    Get PDF
    The availability of massive amounts of data, fast computers, and superior machine learning (ML) algorithms has spurred interest in artificial intelligence (AI). It is no surprise, then, that we observe an increase in the application of AI in cybersecurity. Our survey of AI applications in cybersecurity shows most of the present applications are in the areas of malware identification and classification, intrusion detection, and cybercrime prevention. We should, however, be aware that AI-enabled cybersecurity is not without its drawbacks. Challenges to AI solutions include a shortage of good quality data to train machine learning models, the potential for exploits via adversarial AI/ML, and limited human expertise in AI. However, the rewards in terms of increased accuracy of cyberattack predictions, faster response to cyberattacks, and improved cybersecurity make it worthwhile to overcome these challenges. We present a summary of the current research on the application of AI and ML to improve cybersecurity, challenges that need to be overcome, and research opportunities for academics in management information systems

    Applications in security and evasions in machine learning : a survey

    Get PDF
    In recent years, machine learning (ML) has become an important part to yield security and privacy in various applications. ML is used to address serious issues such as real-time attack detection, data leakage vulnerability assessments and many more. ML extensively supports the demanding requirements of the current scenario of security and privacy across a range of areas such as real-time decision-making, big data processing, reduced cycle time for learning, cost-efficiency and error-free processing. Therefore, in this paper, we review the state of the art approaches where ML is applicable more effectively to fulfill current real-world requirements in security. We examine different security applications' perspectives where ML models play an essential role and compare, with different possible dimensions, their accuracy results. By analyzing ML algorithms in security application it provides a blueprint for an interdisciplinary research area. Even with the use of current sophisticated technology and tools, attackers can evade the ML models by committing adversarial attacks. Therefore, requirements rise to assess the vulnerability in the ML models to cope up with the adversarial attacks at the time of development. Accordingly, as a supplement to this point, we also analyze the different types of adversarial attacks on the ML models. To give proper visualization of security properties, we have represented the threat model and defense strategies against adversarial attack methods. Moreover, we illustrate the adversarial attacks based on the attackers' knowledge about the model and addressed the point of the model at which possible attacks may be committed. Finally, we also investigate different types of properties of the adversarial attacks

    A Client based email phishing detection algorithm: case of phishing attacks in the banking industry

    Get PDF
    Thesis submitted in partial fulfillment of the requirements for the Degree of Master of Science in Information Systems Security (MSc.ISS) at Strathmore UniversityToday, the banking sector has been a target for many phishing attackers. The use of email as an electronic means of communication during working hours and mostly for official purposes has made it a lucrative attack vector. With the rapid growth of technology, phishing techniques have advanced as seen in the millions of cash lost by banks through email phishing yearly. This continues to be the case despite investments in spam filtering tools, monitoring tools as well as creating user awareness, through training of banking staff on how they can easily identify a phishing email. To protect bank users and prevent the financial loses through phishing attacks, it important to understand how phishing works as well as the techniques used to achieve it. Moreover, there is a great need to implement an anti-phishing algorithm that collectively checks against phishing linguistic techniques, existence of malicious links and malicious attachments. This can lead to an increase in the performance and accuracy of the designed tool towards detecting and flagging phishing emails thus preventing them from being read by target. Evolutionary prototyping methodology was applied during this research. The advantages are in the fact that it enabled continuous analysis and supervised learning of the algorithm development until the desired outcome was achieved. This research aimed at understanding the characteristic of phishing emails, towards achieving defence in depth through creation of an algorithm for detecting and flagging phishing emails. In this research, we have implemented a client-based anti-phishing algorithm. The algorithm is able to analyse phishing links, identify malicious email attachments and perform text classification using a NaĂŻve Bayes classifier to identify phishing terms in a new unread email. It then flags the email as malicious and sends it to the spam folder. Therefore the user only gets clean emails in the inbox folder

    Geospatial crowdsourced data fitness analysis for spatial data infrastructure based disaster management actions

    Get PDF
    The reporting of disasters has changed from official media reports to citizen reporters who are at the disaster scene. This kind of crowd based reporting, related to disasters or any other events, is often identified as 'Crowdsourced Data' (CSD). CSD are freely and widely available thanks to the current technological advancements. The quality of CSD is often problematic as it is often created by the citizens of varying skills and backgrounds. CSD is considered unstructured in general, and its quality remains poorly defined. Moreover, the CSD's location availability and the quality of any available locations may be incomplete. The traditional data quality assessment methods and parameters are also often incompatible with the unstructured nature of CSD due to its undocumented nature and missing metadata. Although other research has identified credibility and relevance as possible CSD quality assessment indicators, the available assessment methods for these indicators are still immature. In the 2011 Australian floods, the citizens and disaster management administrators used the Ushahidi Crowd-mapping platform and the Twitter social media platform to extensively communicate flood related information including hazards, evacuations, help services, road closures and property damage. This research designed a CSD quality assessment framework and tested the quality of the 2011 Australian floods' Ushahidi Crowdmap and Twitter data. In particular, it explored a number of aspects namely, location availability and location quality assessment, semantic extraction of hidden location toponyms and the analysis of the credibility and relevance of reports. This research was conducted based on a Design Science (DS) research method which is often utilised in Information Science (IS) based research. Location availability of the Ushahidi Crowdmap and the Twitter data assessed the quality of available locations by comparing three different datasets i.e. Google Maps, OpenStreetMap (OSM) and Queensland Department of Natural Resources and Mines' (QDNRM) road data. Missing locations were semantically extracted using Natural Language Processing (NLP) and gazetteer lookup techniques. The Credibility of Ushahidi Crowdmap dataset was assessed using a naive Bayesian Network (BN) model commonly utilised in spam email detection. CSD relevance was assessed by adapting Geographic Information Retrieval (GIR) relevance assessment techniques which are also utilised in the IT sector. Thematic and geographic relevance were assessed using Term Frequency – Inverse Document Frequency Vector Space Model (TF-IDF VSM) and NLP based on semantic gazetteers. Results of the CSD location comparison showed that the combined use of non-authoritative and authoritative data improved location determination. The semantic location analysis results indicated some improvements of the location availability of the tweets and Crowdmap data; however, the quality of new locations was still uncertain. The results of the credibility analysis revealed that the spam email detection approaches are feasible for CSD credibility detection. However, it was critical to train the model in a controlled environment using structured training including modified training samples. The use of GIR techniques for CSD relevance analysis provided promising results. A separate relevance ranked list of the same CSD data was prepared through manual analysis. The results revealed that the two lists generally agreed which indicated the system's potential to analyse relevance in a similar way to humans. This research showed that the CSD fitness analysis can potentially improve the accuracy, reliability and currency of CSD and may be utilised to fill information gaps available in authoritative sources. The integrated and autonomous CSD qualification framework presented provides a guide for flood disaster first responders and could be adapted to support other forms of emergencies
    • 

    corecore