19 research outputs found

    Incremental Information Gain Analysis of Input Attribute Impact on RBF-Kernel SVM Spam Detection

    Get PDF
    The massive increase of spam is posing a very serious threat to email and SMS, which have become an important means of communication. Not only do spams annoy users, but they also become a security threat. Machine learning techniques have been widely used for spam detection. Email spams can be detected through detecting senders’ behaviour, the contents of an email, subject and source address, etc, while SMS spam detection usually is based on the tokens or features of messages due to short content. However, a comprehensive analysis of email/SMS content may provide cures for users to aware of email/SMS spams. We cannot completely depend on automatic tools to identify all spams. In this paper, we propose an analysis approach based on information entropy and incremental learning to see how various features affect the performance of an RBF-based SVM spam detector, so that to increase our awareness of a spam by sensing the features of a spam. The experiments were carried out on the spambase and SMSSpemCollection databases in UCI machine learning repository. The results show that some features have significant impacts on spam detection, of which users should be aware, and there exists a feature space that achieves Pareto efficiency in True Positive Rate and True Negative Rate

    Controlling Informational Society: A Google Error Analysis!

    Get PDF
    “Informational Society” is unceasingly discussed by all societies’ quadrants. Nevertheless, in spite of illustrating the most recent progress of western societies the complexity to characterize it is well-known. In this societal evolution the “leading role” goes to information, as a polymorphic phenomenon and a polysemantic concept. Given such claim and the need for a multidimensional approach, the overall amount of information available online has reached an unparalleled level, and consequently search engines become exceptionally important. Search engines main stream literature has been debating the following perspectives: technology, user level of expertise and confidence, organizational impact, and just recently power issues. However, the trade-off between informational fluxes versus control has been disregarded. So, our intention is to discuss such gap, and for that, the overall structure of the chapter is: information, search engines, control and its dimensions, and exploit Google as a case study

    The security challenges in the IoT enabled cyber-physical systems and opportunities for evolutionary computing & other computational intelligence

    Get PDF
    Internet of Things (IoT) has given rise to the fourth industrial revolution (Industrie 4.0), and it brings great benefits by connecting people, processes and data. However, cybersecurity has become a critical challenge in the IoT enabled cyber physical systems, from connected supply chain, Big Data produced by huge amount of IoT devices, to industry control systems. Evolutionary computation combining with other computational intelligence will play an important role for cybersecurity, such as artificial immune mechanism for IoT security architecture, data mining/fusion in IoT enabled cyber physical systems, and data driven cybersecurity. This paper provides an overview of security challenges in IoT enabled cyber-physical systems and what evolutionary computation and other computational intelligence technology could contribute for the challenges. The overview could provide clues and guidance for research in IoT security with computational intelligence

    Methods for demoting and detecting Web spam

    Get PDF
    Web spamming has tremendously subverted the ranking mechanism of information retrieval in Web search engines. It manipulates data source maliciously either by contents or links with the intention of contributing negative impacts to Web search results. The altering order of the search results by spammers has increased the difficulty level of searching and time consumption for Web users to retrieve relevant information. In order to improve the quality of Web search engines results, the design of anti-Web spam techniques are developed in this thesis to detect and demote Web spam via trust and distrust and Web spam classification.A comprehensive literature on existing anti-Web spam techniques emphasizing on trust and distrust model and machine learning model is presented. Furthermore, several experiments are conducted to show the vulnerability of ranking algorithm towards Web spam. Two public available Web spam datasets are used for the experiments throughout the thesis - WEBSPAM-UK2006 and WEBSPAM-UK2007.Two link-based trust and distrust model algorithms are presented subsequently: Trust Propagation Rank and Trust Propagation Spam Mass. Both algorithms semi automatically detect and demote Web spam based on limited human experts’ evaluation of non-spam and spam pages. In the experiments, the results for Trust Propagation Rank and Trust Propagation Spam Mass have achieved up to 10.88% and 43.94% improvement over the benchmark algorithms.Thereafter, the weight properties which associated as the linkage between two Web hosts are introduced into the task of Web spam detection. In most studies, the weight properties are involved in ranking mechanism; in this research work, the weight properties are incorporated into distrust based algorithms to detect more spam. The experiments have shown that the weight properties enhanced existing distrust based Web spam detection algorithms for up to 30.26% and 31.30% on both aforementioned datasets.Even though the integration of weight properties has shown significant results in detecting Web spam, the discussion on distrust seed set propagation algorithm is presented to further enhance the Web spam detection experience. Distrust seed set propagation algorithm propagates the distrust score in a wider range to estimate the probability of other unevaluated Web pages for being spam. The experimental results have shown that the algorithm improved the distrust based Web spam detection algorithms up to 19.47% and 25.17% on both datasets.An alternative machine learning classifier - multilayered perceptron neural network is proposed in the thesis to further improve the detection rate of Web spam. In the experiments, the detection rate of Web spam using multilayered perceptron neural network has increased up to 14.02% and 3.53% over the conventional classifier – support vector machines. At the same time, a mechanism to determine the number of hidden neurons for multilayered perceptron neural network is presented in this thesis to simplify the designing process of network structure

    Utilizing Multi-modal Weak Signals to Improve User Stance Inference in Social Media

    Get PDF
    Social media has become an integral component of the daily life. There are millions of various types of content being released into social networks daily. This allows for an interesting view into a users\u27 view on everyday life. Exploring the opinions of users in social media networks has always been an interesting subject for the Natural Language Processing researchers. Knowing the social opinions of a mass will allow anyone to make informed policy or marketing related decisions. This is exactly why it is desirable to find comprehensive social opinions. The nature of social media is complex and therefore obtaining the social opinion becomes a challenging task. Because of how diverse and complex social media networks are, they typically resonate with the actual social connections but in a digital platform. Similar to how users make friends and companions in the real world, the digital platforms enable users to mimic similar social connections. This work mainly looks at how to obtain a comprehensive social opinion out of social media network. Typical social opinion quantifiers will look at text contributions made by users to find the opinions. Currently, it is challenging because the majority of users on social media will be consuming content rather than expressing their opinions out into the world. This makes natural language processing based methods impractical due to not having linguistic features. In our work we look to improve a method named stance inference which can utilize multi-domain features to extract the social opinion. We also introduce a method which can expose users opinions even though they do not have on-topical content. We also note how by introducing weak supervision to an unsupervised task of stance inference we can improve the performance. The weak supervision we bring into the pipeline is through hashtags. We show how hashtags are contextual indicators added by humans which will be much likelier to be related than a topic model. Lastly we introduce disentanglement methods for chronological social media networks which allows one to utilize the methods we introduce above to be applied in these type of platforms

    Trustnet: a Trust and Reputation Management System in Distributed Environments

    Get PDF
    With emerging Internet-scale open content and resource sharing, social networks, and complex cyber-physical systems, trust issues become prominent. Despite their rigorous foundations, conventional network security theories and mechanisms are inadequate at addressing such loosely-defined security issues in decentralized open environments.In this dissertation, we propose a trust and reputation management system architecture and protocols (TrustNet), aimed to define and promote trust as a first-class system parameter on par with communication, computation, and storage performance metrics. To achieve such a breakthrough, we need a fundamentally new design paradigm to seamlessly integrate trust into system design. Our TrustNet initiative represents a bold effort to approach this ultimate goal. TrustNet is built on the top of underlying P2P and mobile ad hoc network layer and provides trust services to higher level applications and middleware. Following the TrustNet architecture, we design, implement, and analyze trust rating, trust aggregation, and trust management strategies. Especially, we propose three trust dissemination protocols and algorithms to meet the urgent needs and explicitly define and formulate end-to-end trust. We formulate trust management problems and propose the H-Trust, VectorTrust, and cTrust scheme to handle trust establishment and aggregation issues. We model trust relations as a trust graph in distributed environment to enhance accuracy and efficiency of trust establishment among peers. Leveraging the distributed Bellman-Ford algorithm, stochastic Markov chain process and H-Index algorithm for fast and lightweight aggregation of trust scores, our scheme are decentralized and self-configurable trust aggregation schemes.To evaluate TrustNet management strategies, we simulated our proposed protocols in both unstructured P2P network and mobile ad hoc network to analyze and simulate trust relationships. We use software generated data as well as real world data sets. Particularly, the student contact patterns on the NUS campus is used as our trust communication model. The simulation results demonstrate the features of trust relationship dissemination in real environments and the efficiency, accuracy, scalability and robustness of the TrustNet system.Computer Science Departmen

    Semantic and pragmatic characterization of learning objects

    Get PDF
    Tese de doutoramento. Engenharia Informática. Universidade do Porto. Faculdade de Engenharia. 201

    Addressing the new generation of spam (Spam 2.0) through Web usage models

    Get PDF
    New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile in social networking websites, a promotional review, a response to a thread in online forums with unsolicited content or a manipulated Wiki page, are examples of new the generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications.The current literature does not address Spam 2.0 in depth and the outcome of efforts to date are inadequate. The aim of this research is to formalise a definition for Spam 2.0 and provide Spam 2.0 filtering solutions. Early-detection, extendibility, robustness and adaptability are key factors in the design of the proposed method.This dissertation provides a comprehensive survey of the state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time effectively capturing the knowledge in the domain of spam filtering.This dissertation proposes three solutions in the area of Spam 2.0 filtering including: (1) characterising and profiling Spam 2.0, (2) Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets and their performance is compared with that of existing Spam 2.0 filtering methods.This work has coined the term ‘Spam 2.0’, provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem
    corecore