
    Using online linear classifiers to filter spam Emails

    The performance of two online linear classifiers, the Perceptron and Littlestone's Winnow, is explored on two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study performance for varying numbers of features, along with three feature selection methods: Information Gain (IG), Document Frequency (DF), and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better with IG or DF than with Odds Ratio. It is further demonstrated that, when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron, and both online classifiers perform much better than a standard Naïve Bayes method. The theoretical and practical computational complexity of these two classifiers is very low, and they can be updated adaptively with ease. They outperform most published results while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
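
    For concreteness, the two online update rules compared above can be sketched in a few lines. This is a minimal illustration over sparse bag-of-words vectors, not the paper's exact experimental setup; the learning rate, promotion factor, and threshold values below are illustrative assumptions.

    ```python
    # Minimal sketches of the two mistake-driven online learners compared
    # in the paper; x is a sparse feature vector (index -> value), y is
    # +1 for spam and -1 for legitimate mail.

    def perceptron_update(w, x, y, lr=1.0):
        """Additive rule: on a mistake, move w toward y * x."""
        score = sum(w.get(i, 0.0) * v for i, v in x.items())
        if y * score <= 0:                      # misclassified (or on boundary)
            for i, v in x.items():
                w[i] = w.get(i, 0.0) + lr * y * v
        return w

    def winnow_update(w, x, y, alpha=2.0, theta=1.0):
        """Littlestone's Winnow: multiplicative rule, positive weights only.
        Predict spam when the weighted sum of active features exceeds theta."""
        score = sum(w.get(i, 1.0) * v for i, v in x.items())
        pred = 1 if score > theta else -1
        if pred != y:                           # promote or demote active features
            factor = alpha if y == 1 else 1.0 / alpha
            for i, v in x.items():
                if v > 0:
                    w[i] = w.get(i, 1.0) * factor
        return w
    ```

    The multiplicative updates are what make Winnow cheap to adapt online, consistent with the low complexity the abstract reports.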

    Social Machinery and Intelligence

    Social machines are systems formed by technical and human elements interacting in a structured manner. The use of digital platforms as mediators allows large numbers of human participants to join such mechanisms, creating systems where interconnected digital and human components operate as a single machine capable of highly sophisticated behaviour. Under certain conditions, such systems can be described as autonomous and goal-driven agents, and many examples of modern Artificial Intelligence (AI) can be regarded as instances of this class of mechanisms. We argue that this type of autonomous social machine has provided a new paradigm for the design of intelligent systems, marking a new phase in the field of AI. The consequences of this observation range from the methodological and philosophical to the ethical. On one side, it emphasises the role of Human-Computer Interaction in the design of intelligent systems; on the other, it draws attention to the risks both for individuals and for a society relying on mechanisms that are not necessarily controllable. The difficulty companies face in regulating the spread of misinformation, and the difficulty authorities face in protecting task-workers managed by a software infrastructure, may be just some of the effects of this technological paradigm.

    Last mile delivery

    Last mile delivery is one of the most complex stages of the logistics process because it involves many uncertainties: weather conditions, road conditions, traffic, car accidents, delivery vehicle anomalies, choice of route, avoiding parcel damage and delivery errors, and communication with the retailer or the recipient of the parcel. All of this makes the successful delivery of parcels to the customer's doorstep difficult. In addition, today's consumers have much greater expectations regarding delivery services: they demand to receive their parcels much faster, or to be able to choose the time and place of delivery. All of this increases the cost of last mile delivery, which accounts for 40% of overall supply chain costs. E-commerce giants such as Amazon can invest substantial resources into creating optimal last mile delivery solutions, establishing numerous warehouses throughout countries so that parcels are stored as close to the end user as possible. Companies without such resources, however, may find it difficult to satisfy the delivery expectations of their customers; longer and inflexible waiting times, as well as additional delivery charges, may cause them to quickly lose competitiveness in the market. This means that companies must turn to technological solutions that help them improve their last mile delivery effectively, but at a reasonably low price. Big Data is the basis of all such smart solutions, because collecting large amounts of data makes it possible to extract information and make future predictions on the basis of past patterns.

    KeyForge: Mitigating Email Breaches with Forward-Forgeable Signatures

    Email breaches are commonplace, and they expose a wealth of personal, business, and political data that may have devastating consequences. The current email system allows any attacker who gains access to your email to prove the authenticity of the stolen messages to third parties -- a property arising from a necessary anti-spam / anti-spoofing protocol called DKIM. This exacerbates the problem of email breaches by greatly increasing the potential for attackers to damage the users' reputation, blackmail them, or sell the stolen information to third parties. In this paper, we introduce "non-attributable email", which guarantees that a wide class of adversaries are unable to convince any third party of the authenticity of stolen emails. We formally define non-attributability and present two practical system proposals -- KeyForge and TimeForge -- that provably achieve non-attributability while maintaining the important protection against spam and spoofing that DKIM currently provides. Moreover, we implement KeyForge and demonstrate that the scheme is practical, achieving competitive verification and signing speed while requiring 42% less bandwidth per email than RSA2048.
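
    The core idea can be illustrated with a toy sketch: the mail server signs outgoing mail with short-lived keys and later publishes the expired private keys, so fresh signatures still authenticate the sender (preserving DKIM's anti-spoofing role) while old signatures become forgeable by anyone, and hence non-attributable. This is a conceptual sketch only, not KeyForge's actual construction; the interval length, the ed25519 primitive, and the class and method names are assumptions.

    ```python
    # Conceptual sketch of delayed key disclosure for non-attributable email.
    # Recent signatures verify normally; once an interval's private key is
    # published, any third party could have forged signatures for it.
    import time
    from cryptography.hazmat.primitives.asymmetric import ed25519

    INTERVAL = 15 * 60                       # hypothetical: rotate every 15 min

    class ForwardForgeableSigner:
        def __init__(self):
            self._keys = {}                  # interval index -> private key
            self.expired = {}                # disclosed (now public) private keys

        def _interval(self, t):
            return int(t // INTERVAL)

        def sign(self, message: bytes):
            idx = self._interval(time.time())
            if idx not in self._keys:
                self._keys[idx] = ed25519.Ed25519PrivateKey.generate()
            return idx, self._keys[idx].sign(message)

        def publish_expired(self):
            """Disclose private keys for past intervals: after this, stolen
            emails from those intervals can no longer be proven authentic."""
            now = self._interval(time.time())
            for idx in [i for i in self._keys if i < now]:
                self.expired[idx] = self._keys.pop(idx)
    ```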

    Addressing the new generation of spam (Spam 2.0) through Web usage models

    New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile on a social networking website, a promotional review, a response to a forum thread with unsolicited content, or a manipulated Wiki page are examples of the new generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications. The current literature does not address Spam 2.0 in depth, and the outcome of efforts to date is inadequate. The aim of this research is to formalise a definition of Spam 2.0 and to provide Spam 2.0 filtering solutions. Early detection, extendibility, robustness, and adaptability are key factors in the design of the proposed methods. This dissertation provides a comprehensive survey of state-of-the-art web spam and Spam 2.0 filtering methods to highlight unresolved issues and open problems, while effectively capturing the knowledge in the domain of spam filtering. It proposes three solutions in the area of Spam 2.0 filtering: (1) characterising and profiling Spam 2.0, (2) an Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) an On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets, and their performance is compared with that of existing Spam 2.0 filtering methods. This work has coined the term 'Spam 2.0', provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem.
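
    The abstract does not detail the EDSF and OFSF algorithms, but the general flavour of usage-based filtering can be sketched: derive behavioural features from how content is submitted, then feed them to any off-the-shelf classifier. All feature names and the Session structure below are hypothetical.

    ```python
    # Hedged sketch of usage-based Spam 2.0 filtering: automated spammers
    # typically post fast, browse little, and post often, so session-level
    # behaviour can separate them from genuine contributors.
    from dataclasses import dataclass

    @dataclass
    class Session:
        pages_viewed: int          # pages requested before submitting content
        seconds_on_form: float     # time spent on the submission form
        mouse_events: int          # client-side interaction events observed
        posts_last_hour: int       # recent submissions from this client

    def usage_features(s: Session) -> list[float]:
        """Turn one submission session into a behavioural feature vector."""
        return [float(s.pages_viewed), s.seconds_on_form,
                float(s.mouse_events), float(s.posts_last_hour)]

    # These vectors can then train any standard classifier, e.g.:
    #   sklearn.svm.SVC().fit([usage_features(s) for s in sessions], labels)
    ```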

    Development of a framework for the classification of antibiotics adjuvants

    Master's dissertation in Bioinformatics. Throughout the last decades, bacteria have become increasingly resistant to available antibiotics, leading to a growing need for new antibiotics and new drug development methodologies. In the last 40 years there have been no records of the development of new antibiotics, which has begun to narrow the possible alternatives; finding new antibiotics and bringing them to market is therefore increasingly challenging. One approach is to find compounds that restore or leverage the activity of existing antibiotics against biofilm bacteria. As the information in this field is very limited and there is no database on this theme, machine learning models were used to predict the relevance of documents regarding adjuvants. In this project, the BIOFILMad application (Catalog of antimicrobial adjuvants to tackle biofilms) was developed to help researchers save time in their daily research. The application was built using Django and the Django REST Framework for the backend and React for the frontend. For the backend, a database needed to be constructed, since no existing database focuses entirely on this topic; to populate it, a machine learning model was trained to classify articles. Three different algorithms were used, Support-Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR), each combined with two feature set sizes, 945 and 1890. When analyzing all metrics, the LR-1 model performed best at classifying relevant documents, with an accuracy of 0.8461, a recall of 0.6170, an F1-score of 0.6904, and a precision of 0.7837. This model is the best at correctly identifying relevant documents, as shown by its higher recall compared with the other models, and it was used to populate the database with relevant information. The backend also has a distinctive aggregation feature built on Named Entity Recognition (NER), whose goal is to identify specific entity types, in our case CHEMICAL and DISEASE. Associations between these entities are extracted and presented to the user, saving researchers time; for example, a researcher can see which compounds "Pseudomonas aeruginosa" has already been tested with, thanks to this aggregation feature. The frontend was implemented so that the user can access this aggregation feature, see the articles present in the database, use the machine learning models to classify new documents, and insert them into the database if they are relevant.
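
    The reported relevance-classification pipeline can be approximated with standard tooling. Below is a minimal sketch assuming scikit-learn and a TF-IDF bag-of-words representation; the abstract does not specify the preprocessing or the hyper-parameters of the LR-1 model, so the 945-feature setup, the train/test split, and all parameter values here are illustrative assumptions.

    ```python
    # Sketch of a TF-IDF + Logistic Regression document-relevance classifier,
    # mirroring the kind of model the dissertation describes (LR with 945
    # features); exact settings are assumptions, not the reported LR-1 model.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    def train_relevance_model(abstracts: list[str], labels: list[int]):
        """labels: 1 = relevant to antibiotic adjuvants, 0 = not relevant."""
        vectorizer = TfidfVectorizer(max_features=945, stop_words="english")
        X = vectorizer.fit_transform(abstracts)
        X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2,
                                                  random_state=0)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        # precision/recall/F1, the metrics the dissertation compares models on
        print(classification_report(y_te, model.predict(X_te)))
        return vectorizer, model
    ```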

    Architectures for the Future Networks and the Next Generation Internet: A Survey

    Networking research funding agencies in the USA, Europe, Japan, and other countries are encouraging research on revolutionary networking architectures that may or may not be bound by the restrictions of the current TCP/IP-based Internet. We present a comprehensive survey of such research projects and activities. The topics covered include testbeds for experimentation with new architectures, new security mechanisms, content delivery mechanisms, management and control frameworks, service architectures, and routing mechanisms. Delay/Disruption Tolerant Networks, which allow communication even when a complete end-to-end path is not available, are also discussed.

    Artificial Intelligence in the development of modern infrastructures

    Artificial intelligence (AI) makes it possible for machines to learn from experience, adjust to new inputs, and perform tasks as human beings do. Most of the examples of AI you hear about today, from computers playing chess to self-driving cars, rely heavily on deep learning and natural language processing.