Using online linear classifiers to filter spam Emails
The performance of two online linear classifiers, the Perceptron and Littlestone's Winnow, is explored for two anti-spam filtering benchmark corpora: PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better with IG or DF than with Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow slightly outperforms the Perceptron, and both online classifiers perform much better than a standard Naïve Bayes method. Both classifiers have very low theoretical and implementation complexity and are easily updated adaptively. They outperform most published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
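The additive and multiplicative update rules being compared can be sketched in a few lines. This is a minimal illustrative version over binary bag-of-words features; the threshold and promotion factor shown are common textbook defaults, not the paper's exact settings:

```python
def perceptron_update(w, active, y):
    """One online Perceptron step. `active` is the set of indices of
    features present in the email; y is +1 (spam) or -1 (ham)."""
    pred = 1 if sum(w[i] for i in active) > 0 else -1
    if pred != y:  # additive correction, only on mistakes
        for i in active:
            w[i] += y
    return pred

def winnow_update(w, active, y, theta, alpha=2.0):
    """One online Winnow step. y is 1 (spam) or 0 (ham); weights start
    at 1 and are promoted/demoted multiplicatively on mistakes."""
    pred = 1 if sum(w[i] for i in active) >= theta else 0
    if pred != y:
        factor = alpha if y == 1 else 1.0 / alpha
        for i in active:
            w[i] *= factor
    return pred
```

Because each update touches only the features active in the current message, a single pass is cheap and the filter can adapt to every newly labelled email, which is the low-complexity, easy-adaptation property the abstract highlights.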
Social Machinery and Intelligence
Social machines are systems formed by technical and human elements interacting in a
structured manner. Digital platforms acting as mediators allow large numbers of human participants to join such mechanisms, creating systems in which interconnected digital and human components operate as a single machine capable of highly sophisticated behaviour. Under certain conditions, such systems can be described as autonomous, goal-driven agents. Many examples of modern Artificial Intelligence (AI) can be regarded as instances of this class of mechanisms. We argue that this type of autonomous social machine has provided a new paradigm for the design of intelligent systems, marking a new phase in the field of AI. The consequences of this observation range from the methodological and philosophical to the ethical. On one side, it emphasises the role of Human-Computer Interaction in the design of intelligent systems; on the other, it draws attention to the risks, both for individuals and for society, of relying on mechanisms that are not necessarily controllable. The difficulty companies face in regulating the spread of misinformation, and that authorities face in protecting task-workers managed by a software infrastructure, may be just some of the effects of this technological paradigm.
Last mile delivery
Last mile delivery is one of the most complex stages of the whole logistics process, because it involves many uncertainties: weather conditions, road conditions, traffic, accidents, delivery vehicle anomalies, route choice, avoiding parcel damage and delivery errors, and communication with the retailer or the recipient of the parcel. All this makes the successful delivery of parcels at the customer's doorstep difficult. In addition, today's consumers have much greater expectations of delivery services: they demand to receive their parcels much faster, or to be able to choose the time and place of delivery. All this increases the cost of last mile delivery, which accounts for 40% of overall supply chain costs.
E-commerce giants such as Amazon can invest vast resources in creating optimal last mile delivery solutions, establishing numerous warehouses across countries that enable them to store parcels as close to the end user as possible. However, companies with fewer resources may find it difficult to satisfy their customers' delivery expectations; longer and inflexible waiting times, as well as additional delivery charges, can quickly cost such companies their competitiveness in the market. This means that companies must turn to technological solutions that help them improve their last mile delivery effectively but at a reasonably low price.
Big Data is the basis of all smart solutions, because collecting large amounts of data makes it possible to extract information and make future predictions on the basis of past patterns.
KeyForge: Mitigating Email Breaches with Forward-Forgeable Signatures
Email breaches are commonplace, and they expose a wealth of personal,
business, and political data that may have devastating consequences. The
current email system allows any attacker who gains access to your email to
prove the authenticity of the stolen messages to third parties -- a property
arising from a necessary anti-spam / anti-spoofing protocol called DKIM. This
exacerbates the problem of email breaches by greatly increasing the potential
for attackers to damage the users' reputation, blackmail them, or sell the
stolen information to third parties.
In this paper, we introduce "non-attributable email", which guarantees that a
wide class of adversaries are unable to convince any third party of the
authenticity of stolen emails. We formally define non-attributability, and
present two practical system proposals -- KeyForge and TimeForge -- that
provably achieve non-attributability while maintaining the important protection
against spam and spoofing that is currently provided by DKIM. Moreover, we
implement KeyForge and demonstrate that the scheme is practical, achieving
competitive verification and signing speeds while also requiring 42% less
bandwidth per email than RSA-2048.
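The core idea, delayed disclosure of signing keys, can be illustrated with a deliberately simplified symmetric sketch. The real KeyForge uses public-key forward-forgeable signatures, not MACs, and the class name and rotation logic below are my own assumptions for illustration:

```python
import hmac, hashlib, os

class ExpiringSigner:
    """Toy illustration of delayed key disclosure: once a key is rotated
    out and published, anyone can produce valid tags under it, so old
    signatures no longer prove authorship to third parties."""

    def __init__(self):
        self.key = os.urandom(32)
        self.published = []  # keys disclosed after their validity window

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self.key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        return hmac.compare_digest(self.sign(message), tag)

    def rotate(self):
        # Publishing the old key makes every tag under it forgeable.
        self.published.append(self.key)
        self.key = os.urandom(32)
```

A recipient who verifies within the key's validity window still gets the anti-spam/anti-spoofing guarantee; once the key is published, a stolen tag is worthless as evidence, since anyone holding the published key could have computed it.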
Addressing the new generation of spam (Spam 2.0) through Web usage models
New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile on a social networking website, a promotional review, a response to a thread in an online forum with unsolicited content, or a manipulated wiki page are examples of the new generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications. The current literature does not address Spam 2.0 in depth, and the outcome of efforts to date is inadequate. The aim of this research is to formalise a definition of Spam 2.0 and provide Spam 2.0 filtering solutions. Early detection, extendibility, robustness and adaptability are key factors in the design of the proposed method. This dissertation provides a comprehensive survey of state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time effectively capturing the knowledge in the domain of spam filtering. It proposes three solutions in the area of Spam 2.0 filtering: (1) characterising and profiling Spam 2.0, (2) an Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) an On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets and their performance is compared with that of existing Spam 2.0 filtering methods. This work has coined the term 'Spam 2.0', provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem.
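The web-usage angle on early detection can be illustrated with a toy heuristic; the feature names and thresholds below are illustrative assumptions, not the dissertation's actual EDSF/OFSF features:

```python
def looks_automated(session):
    """Flag a web session whose usage pattern is implausibly fast for a
    human: near-zero time on page before posting, or a form filled with
    no mouse activity faster than typing speed allows."""
    return (session["seconds_on_page"] < 1.0
            or (session["mouse_events"] == 0
                and session["form_fill_seconds"] < 2.0))

human_session = {"seconds_on_page": 45.0, "mouse_events": 120,
                 "form_fill_seconds": 30.0}
bot_session = {"seconds_on_page": 0.3, "mouse_events": 0,
               "form_fill_seconds": 0.1}
```

The point of usage-based filtering is that, unlike content analysis, such behavioural signals are available before or at submission time, which is what makes early detection of Spam 2.0 possible.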
Development of a framework for the classification of antibiotics adjuvants
Master's dissertation in Bioinformatics. Throughout the last decades, bacteria have become increasingly resistant to available antibiotics, leading to a growing need for new antibiotics and new drug development methodologies. In the last 40 years, there have been no records of the development of new antibiotics, which has begun to narrow the possible alternatives. Finding new antibiotics and bringing them to market is therefore increasingly challenging. One approach is to find compounds that restore or enhance the activity of existing antibiotics against biofilm bacteria. As the information in this field is very limited and there is no database covering this theme, machine learning models were used to predict the relevance of documents regarding adjuvants.
In this project, the BIOFILMad - Catalog of antimicrobial adjuvants to tackle biofilms
application was developed to help researchers save time in their daily research. This
application was constructed using Django and Django REST Framework for the backend
and React for the frontend.
As for the backend, a database needed to be constructed, since no existing database focuses entirely on this topic. To populate it, a machine learning model was trained to help classify articles. Three algorithms were used, Support-Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR), each combined with two feature-set sizes, 945 and 1890 features. Across all metrics, the LR-1 model performed best at classifying relevant documents, with an accuracy of 0.8461, a recall of 0.6170, an F1-score of 0.6904, and a precision of 0.7837. Its higher recall compared with the other models shows that it is the best at correctly identifying relevant documents. With this model, our database was populated with relevant information.
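The relevance-classification step can be sketched in miniature. The thesis trained SVM/RF/LR models on 945 or 1890 features; the toy vocabulary, training texts, and hyperparameters below are illustrative assumptions only:

```python
import math
from collections import Counter

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def train_lr(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by stochastic gradient descent on log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "relevant"
            g = p - yi                       # gradient of log-loss w.r.t. z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, x):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

# Tiny illustrative training set: label 1 = relevant to antibiotic adjuvants.
vocab = ["adjuvant", "antibiotic", "biofilm", "football", "weather"]
docs = ["adjuvant restores antibiotic activity against biofilm",
        "biofilm adjuvant synergy with this antibiotic",
        "football match postponed by bad weather",
        "weather forecast for the football season"]
labels = [1, 1, 0, 0]
w, b = train_lr([featurize(d, vocab) for d in docs], labels)
```

In practice one would also hold out a test set and compare accuracy, recall, precision, and F1 across models, as was done to select LR-1.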
Our backend has a unique feature: an aggregation mechanism built with Named Entity Recognition (NER). The goal is to identify specific entity types; in our case, it identifies CHEMICAL and DISEASE entities. Associations between these entities are then formed and presented to the user, saving researchers time. For example, thanks to this aggregation feature a researcher can see which compounds "Pseudomonas aeruginosa" has already been tested with.
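The aggregation idea can be sketched as follows. A trained NER model produces the CHEMICAL/DISEASE mentions in the real system; here a toy dictionary tagger with invented entity lists stands in for it:

```python
from collections import defaultdict

# Toy entity dictionaries standing in for a trained NER model's output.
CHEMICALS = {"ciprofloxacin", "tobramycin"}
DISEASES = {"pseudomonas aeruginosa", "cystic fibrosis"}

def extract_entities(text):
    """Return the CHEMICAL and DISEASE mentions found in one abstract."""
    text = text.lower()
    return ({c for c in CHEMICALS if c in text},
            {d for d in DISEASES if d in text})

def aggregate(abstracts):
    """Map each disease to every chemical it co-occurs with."""
    assoc = defaultdict(set)
    for text in abstracts:
        chems, diseases = extract_entities(text)
        for d in diseases:
            assoc[d] |= chems
    return assoc
```

Querying the resulting mapping for a disease returns the compounds already tested against it, which is the time-saving lookup the aggregation feature provides.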
The frontend was implemented so the user can access this aggregation feature, browse the articles in the database, use the machine learning models to classify new documents,
and insert them into the database if they are relevant.
Architectures for the Future Networks and the Next Generation Internet: A Survey
Networking research funding agencies in the USA, Europe, Japan, and other countries are encouraging research on revolutionary networking architectures that may or may not be bound by the restrictions of the current TCP/IP-based Internet. We present a comprehensive survey of such research projects and activities. The topics covered include various testbeds for experimentation with new architectures, new security mechanisms, content delivery mechanisms, management and control frameworks, service architectures, and routing mechanisms. Delay/disruption tolerant networks, which allow communication even when a complete end-to-end path is not available, are also discussed.
Artificial Intelligence in the development of modern infrastructures
Artificial intelligence (AI) makes it possible for machines to learn from experience, adjust to new inputs and perform tasks the way human beings do. Most of the examples of AI you hear about today, from computers playing chess to self-driving cars, rely heavily on deep learning and natural language processing.