40 research outputs found

    Resolving FP-TP Conflict in Digest-Based Collaborative Spam Detection by Use of Negative Selection Algorithm

    Get PDF
    A well-known approach for collaborative spam filtering is to determine which emails belong to the same bulk, e.g. by exploiting their content similarity. This allows, after observing an initial portion of a bulk, for the bulkiness scores to be assigned to the remaining emails from the same bulk. This also allows the individual evidence of spamminess to be joined, if such evidence is generated by collaborating filters or users for some of the emails from an initial portion of the bulk. Usually a database of previously observed emails or email digests is formed and queried upon receiving new emails. Previous evaluations [2,10] of the approach based on the email digests that preserve email content similarity indicate and partially demonstrate that there are ways to make the approach robust to increased obfuscation efforts by spammers. However, for the settings of the parameters that provide good matching between the emails from the same bulk, the unwanted random matching between ham emails and unrelated ham and spam emails stays rather high. This directly translates into a need for use of higher bulkiness thresholds in order to ensure low false positive (FP) detection of ham, which implies that larger initial parts of spam bulks will not be filtered, i.e. true positive (TP) detection will not be very high (FP-TP conflict). In this paper we demonstrate how, by use of the negative selection algorithm, the unwanted random matching between unrelated emails may be decreased at least by an order of magnitude, while preserving the same good matching between the emails from the same bulk. We also show how this translates into an order of magnitude (at least) of less undetected bulky spam emails, under the same ham miss- detection requirements

    Artificial immune system for the Internet

    Get PDF
    We investigate the usability of the Artificial Immune Systems (AIS) approach for solving selected problems in computer networks. Artificial immune systems are created by using the concepts and algorithms inspired by the theory of how the Human Immune System (HIS) works. We consider two applications: detection of routing misbehavior in mobile ad hoc networks, and email spam filtering. In mobile ad hoc networks the multi-hop connectivity is provided by the collaboration of independent nodes. The nodes follow a common protocol in order to build their routing tables and forward the packets of other nodes. As there is no central control, some nodes may defect to follow the common protocol, which would have a negative impact on the overall connectivity in the network. We build an AIS for the detection of routing misbehavior by directly mapping the standard concepts and algorithms used for explaining how the HIS works. The implementation and evaluation in a simulator shows that the AIS mimics well most of the effects observed in the HIS, e.g. the faster secondary reaction to the already encountered misbehavior. However, its effectiveness and practical usability are very constrained, because some particularities of the problem cannot be accounted for by the approach, and because of the computational constrains (reported also in AIS literature) of the used negative selection algorithm. For the spam filtering problem, we apply the AIS concepts and algorithms much more selectively and in a less standard way, and we obtain much better results. We build the AIS for antispam on top of a standard technique for digest-based collaborative email spam filtering. We notice un advantageous and underemphasized technological difference between AISs and the HIS, and we exploit this difference to incorporate the negative selection in an innovative and computationally efficient way. We also improve the representation of the email digests used by the standard collaborative spam filtering scheme. We show that this new representation and the negative selection, when used together, improve significantly the filtering performance of the standard scheme on top of which we build our AIS. Our complete AIS for antispam integrates various innate and adaptive AIS mechanisms, including the mentioned specific use of the negative selection and the use of innate signalling mechanisms (PAMP and danger signals). In this way the AIS takes into account users' profiles, implicit or explicit feedback from the users, and the bulkiness of spam. We show by simulations that the overall AIS is very good both in detecting spam and in avoiding misdetection of good emails. Interestingly, both the innate and adaptive mechanisms prove to be crucial for achieving the good overall performance. We develop and test (within a simulator) our AIS for collaborative spam filtering in the case of email communications. The solution however seems to be well applicable to other types of Internet communications: Internet telephony, chat/sms, forum, news, blog, or web. In all these cases, the aim is to allow the wanted communications (content) and prevent those unwanted from reaching the end users and occupying their time and communication resources. The filtering problems, faced or likely to be faced in the near future by these applications, have or are likely to have the settings similar to those that we have in the email case: need for openness to unknown senders (creators of content, initiators of the communication), bulkiness in receiving spam (many recipients are usually affected by the same spam content), tolerance of the system to a small damage (to small amounts of unfiltered spam), possibility to implicitly or explicitly and in a cheap way obtain a feedback from the recipients about the damage (about spam that they receive), need for strong tolerance to wanted (non-spam) content. Our experiments with the email spam filtering show that our AIS, i.e. the way how we build it, is well fitted to such problem settings

    A Trust Management Framework for Vehicular Ad Hoc Networks

    Get PDF
    The inception of Vehicular Ad Hoc Networks (VANETs) provides an opportunity for road users and public infrastructure to share information that improves the operation of roads and the driver experience. However, such systems can be vulnerable to malicious external entities and legitimate users. Trust management is used to address attacks from legitimate users in accordance with a user’s trust score. Trust models evaluate messages to assign rewards or punishments. This can be used to influence a driver’s future behaviour or, in extremis, block the driver. With receiver-side schemes, various methods are used to evaluate trust including, reputation computation, neighbour recommendations, and storing historical information. However, they incur overhead and add a delay when deciding whether to accept or reject messages. In this thesis, we propose a novel Tamper-Proof Device (TPD) based trust framework for managing trust of multiple drivers at the sender side vehicle that updates trust, stores, and protects information from malicious tampering. The TPD also regulates, rewards, and punishes each specific driver, as required. Furthermore, the trust score determines the classes of message that a driver can access. Dissemination of feedback is only required when there is an attack (conflicting information). A Road-Side Unit (RSU) rules on a dispute, using either the sum of products of trust and feedback or official vehicle data if available. These “untrue attacks” are resolved by an RSU using collaboration, and then providing a fixed amount of reward and punishment, as appropriate. Repeated attacks are addressed by incremental punishments and potentially driver access-blocking when conditions are met. The lack of sophistication in this fixed RSU assessment scheme is then addressed by a novel fuzzy logic-based RSU approach. This determines a fairer level of reward and punishment based on the severity of incident, driver past behaviour, and RSU confidence. The fuzzy RSU controller assesses judgements in such a way as to encourage drivers to improve their behaviour. Although any driver can lie in any situation, we believe that trustworthy drivers are more likely to remain so, and vice versa. We capture this behaviour in a Markov chain model for the sender and reporter driver behaviours where a driver’s truthfulness is influenced by their trust score and trust state. For each trust state, the driver’s likelihood of lying or honesty is set by a probability distribution which is different for each state. This framework is analysed in Veins using various classes of vehicles under different traffic conditions. Results confirm that the framework operates effectively in the presence of untrue and inconsistent attacks. The correct functioning is confirmed with the system appropriately classifying incidents when clarifier vehicles send truthful feedback. The framework is also evaluated against a centralized reputation scheme and the results demonstrate that it outperforms the reputation approach in terms of reduced communication overhead and shorter response time. Next, we perform a set of experiments to evaluate the performance of the fuzzy assessment in Veins. The fuzzy and fixed RSU assessment schemes are compared, and the results show that the fuzzy scheme provides better overall driver behaviour. The Markov chain driver behaviour model is also examined when changing the initial trust score of all drivers

    Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing

    Get PDF
    This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency, and hence, those of the applied frameworks

    B!SON: A Tool for Open Access Journal Recommendation

    Get PDF
    Finding a suitable open access journal to publish scientific work is a complex task: Researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of Predatory Publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project

    Semantic enrichment of knowledge sources supported by domain ontologies

    Get PDF
    This thesis introduces a novel conceptual framework to support the creation of knowledge representations based on enriched Semantic Vectors, using the classical vector space model approach extended with ontological support. One of the primary research challenges addressed here relates to the process of formalization and representation of document contents, where most existing approaches are limited and only take into account the explicit, word-based information in the document. This research explores how traditional knowledge representations can be enriched through incorporation of implicit information derived from the complex relationships (semantic associations) modelled by domain ontologies with the addition of information presented in documents. The relevant achievements pursued by this thesis are the following: (i) conceptualization of a model that enables the semantic enrichment of knowledge sources supported by domain experts; (ii) development of a method for extending the traditional vector space, using domain ontologies; (iii) development of a method to support ontology learning, based on the discovery of new ontological relations expressed in non-structured information sources; (iv) development of a process to evaluate the semantic enrichment; (v) implementation of a proof-of-concept, named SENSE (Semantic Enrichment kNowledge SourcEs), which enables to validate the ideas established under the scope of this thesis; (vi) publication of several scientific articles and the support to 4 master dissertations carried out by the department of Electrical and Computer Engineering from FCT/UNL. It is worth mentioning that the work developed under the semantic referential covered by this thesis has reused relevant achievements within the scope of research European projects, in order to address approaches which are considered scientifically sound and coherent and avoid “reinventing the wheel”.European research projects - CoSpaces (IST-5-034245), CRESCENDO (FP7-234344) and MobiS (FP7-318452

    Digital Forensics AI: on Practicality, Optimality, and Interpretability of Digital Evidence Mining Techniques

    Get PDF
    Digital forensics as a field has progressed alongside technological advancements over the years, just as digital devices have gotten more robust and sophisticated. However, criminals and attackers have devised means for exploiting the vulnerabilities or sophistication of these devices to carry out malicious activities in unprecedented ways. Their belief is that electronic crimes can be committed without identities being revealed or trails being established. Several applications of artificial intelligence (AI) have demonstrated interesting and promising solutions to seemingly intractable societal challenges. This thesis aims to advance the concept of applying AI techniques in digital forensic investigation. Our approach involves experimenting with a complex case scenario in which suspects corresponded by e-mail and deleted, suspiciously, certain communications, presumably to conceal evidence. The purpose is to demonstrate the efficacy of Artificial Neural Networks (ANN) in learning and detecting communication patterns over time, and then predicting the possibility of missing communication(s) along with potential topics of discussion. To do this, we developed a novel approach and included other existing models. The accuracy of our results is evaluated, and their performance on previously unseen data is measured. Second, we proposed conceptualizing the term “Digital Forensics AI” (DFAI) to formalize the application of AI in digital forensics. The objective is to highlight the instruments that facilitate the best evidential outcomes and presentation mechanisms that are adaptable to the probabilistic output of AI models. Finally, we enhanced our notion in support of the application of AI in digital forensics by recommending methodologies and approaches for bridging trust gaps through the development of interpretable models that facilitate the admissibility of digital evidence in legal proceedings

    Digital Forensics AI: on Practicality, Optimality, and Interpretability of Digital Evidence Mining Techniques

    Get PDF
    Digital forensics as a field has progressed alongside technological advancements over the years, just as digital devices have gotten more robust and sophisticated. However, criminals and attackers have devised means for exploiting the vulnerabilities or sophistication of these devices to carry out malicious activities in unprecedented ways. Their belief is that electronic crimes can be committed without identities being revealed or trails being established. Several applications of artificial intelligence (AI) have demonstrated interesting and promising solutions to seemingly intractable societal challenges. This thesis aims to advance the concept of applying AI techniques in digital forensic investigation. Our approach involves experimenting with a complex case scenario in which suspects corresponded by e-mail and deleted, suspiciously, certain communications, presumably to conceal evidence. The purpose is to demonstrate the efficacy of Artificial Neural Networks (ANN) in learning and detecting communication patterns over time, and then predicting the possibility of missing communication(s) along with potential topics of discussion. To do this, we developed a novel approach and included other existing models. The accuracy of our results is evaluated, and their performance on previously unseen data is measured. Second, we proposed conceptualizing the term “Digital Forensics AI” (DFAI) to formalize the application of AI in digital forensics. The objective is to highlight the instruments that facilitate the best evidential outcomes and presentation mechanisms that are adaptable to the probabilistic output of AI models. Finally, we enhanced our notion in support of the application of AI in digital forensics by recommending methodologies and approaches for bridging trust gaps through the development of interpretable models that facilitate the admissibility of digital evidence in legal proceedings

    Applied Metaheuristic Computing

    Get PDF
    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, facility layout planning, among others. This is partly because the classic exact methods are constrained with prior assumptions, and partly due to the heuristics being problem-dependent and lacking generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality, which impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC
    corecore