Towards Active SEO (Search Engine Optimization) 2.0
In the age of the writable web, new skills and new practices are emerging. In an environment that allows everyone to communicate information globally, search engine optimization (SEO) is a strategic discipline that aims to generate visibility, internet traffic, and maximum exploitation of a site's publications. Often misperceived as fraud, SEO has evolved into a facilitating tool for anyone who wishes to make their website visible to search engines. In this article we show that it is possible to achieve the first rank in search results for highly competitive keywords, using methods that are quick, sustainable, and legal, while applying the principles of active SEO 2.0. The article also clarifies some of the working principles of search engines, presents some advanced referencing techniques (that are completely ethical and legal), and lays the foundations for an in-depth reflection on the qualities and advantages of these techniques.
Fake News Detection with Deep Diffusive Network Model
In recent years, with the rapid growth of online social networks, fake news created for various commercial and political purposes has appeared in large numbers and spread widely online. Through deceptive wording, such fake news can easily mislead online social network users, and it has already had a tremendous effect on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies and algorithms for detecting fake news articles, creators and subjects in online social networks, and evaluates the corresponding performance. It addresses the challenges introduced by the unknown characteristics of fake news and the diverse connections among news articles, creators and subjects. Based on a detailed data analysis, the paper introduces a novel automatic fake news credibility inference model, named FakeDetector. From a set of explicit and latent features extracted from the textual information, FakeDetector builds a deep diffusive network model to learn the representations of news articles, creators and subjects simultaneously. Extensive experiments on a real-world fake news dataset compare FakeDetector with several state-of-the-art models, and the results demonstrate the effectiveness of the proposed model.
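The actual FakeDetector model is a deep diffusive network trained jointly over articles, creators and subjects, which is well beyond an abstract. As a loose, hypothetical illustration of the simpler underlying idea — inferring credibility from explicit textual features — here is a toy bag-of-words logistic-regression scorer (all data, thresholds and names are invented for this sketch):

```python
import math
from collections import Counter

# Toy training data: headlines labeled 1 = fake, 0 = credible (invented examples).
TRAIN = [
    ("shocking miracle cure doctors hate", 1),
    ("you will not believe this secret trick", 1),
    ("shocking secret exposed click now", 1),
    ("senate passes annual budget bill", 0),
    ("local council approves new library funding", 0),
    ("university publishes annual research report", 0),
]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid float overflow
    return 1.0 / (1.0 + math.exp(-z))

def features(text):
    """Explicit bag-of-words features: counts of lowercase tokens."""
    return Counter(text.lower().split())

def train(data, epochs=200, lr=0.5):
    """Fit per-word logistic-regression weights by stochastic gradient descent."""
    w = {}
    for _ in range(epochs):
        for text, label in data:
            x = features(text)
            p = sigmoid(sum(w.get(t, 0.0) * c for t, c in x.items()))
            for t, c in x.items():
                w[t] = w.get(t, 0.0) + lr * (label - p) * c
    return w

def fake_score(w, text):
    """Probability that a headline is fake under the toy model."""
    return sigmoid(sum(w.get(t, 0.0) * c for t, c in features(text).items()))

w = train(TRAIN)
print(fake_score(w, "shocking secret trick exposed"))   # high -> likely fake
print(fake_score(w, "council publishes budget report"))  # low -> likely credible
```

The real model replaces these hand-counted explicit features with learned latent representations and propagates evidence between connected articles, creators and subjects.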
A pipeline and comparative study of 12 machine learning models for text classification
Text-based communication is highly favoured as a communication method, especially in business environments. As a result, it is often abused: malicious messages, e.g. spam emails, are sent to deceive users into revealing personal information, including online account credentials or banking details. For this reason, many machine learning methods for text classification have been proposed and incorporated into the services of most email providers. However, optimising text classification algorithms and finding the right trade-off in their aggressiveness is still a major research problem.
We present an updated survey of 12 machine learning text classifiers applied
to a public spam corpus. A new pipeline is proposed to optimise hyperparameter
selection and improve the models' performance by applying specific methods
(based on natural language processing) in the preprocessing stage.
Our study aims to provide a new methodology to investigate and optimise the
effect of different feature sizes and hyperparameters in machine learning
classifiers that are widely used in text classification problems. The
classifiers are tested and evaluated on different metrics, including F-score (accuracy), precision, recall, and run time. By analysing all these aspects, we show how the proposed pipeline can be used to achieve good accuracy in spam filtering on the Enron dataset, a widely used public email corpus. Statistical tests and explainability techniques are applied to provide a robust analysis of the proposed pipeline and to interpret the classification outcomes of the 12 machine learning models, also identifying the words that drive the classification results. Our analysis shows that it is possible to identify an effective machine learning model to classify the Enron dataset with an F-score of 94%.
Comment: This article has been accepted for publication in Expert Systems with Applications, April 2022. Published by Elsevier. All data, models, and code used in this work are available on GitHub at https://github.com/Angione-Lab/12-machine-learning-models-for-text-classificatio
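The survey evaluates its classifiers on precision, recall and F-score. For readers unfamiliar with these metrics, a minimal sketch of how they are computed for the positive (spam) class, on invented labels:

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (spam = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented example: 1 = spam, 0 = ham.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(p, r, f)  # 0.75 0.75 0.75
```

The "aggressiveness" trade-off mentioned in the abstract is visible here: lowering a spam classifier's decision threshold raises recall (fewer missed spam) at the cost of precision (more legitimate mail flagged).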
Mustahattuhakukoneoptimointi (Black Hat Search Engine Optimization)
The purpose of search engine optimization is to increase a website's visibility on search engine results pages. Black hat search engine optimization is the use of SEO methods that violate the guidelines set by search engine companies. The goal of this thesis is to determine, on the basis of a literature review, which black hat SEO methods and countermeasures have been studied in the scientific literature. A literature review was chosen as the research strategy so that the examined information would be grounded in scientific material. The systematic search of the literature review was carried out with specific search criteria in four computer science databases, supplemented by additional searches in two other databases. The aim was to include at least forty sources in the review to achieve a suitable scope. The material was selected on the basis of relevance, which was assessed by examining the articles' abstracts and surveying their contents. Black hat methods are divided into on-page methods, link-based methods and other methods, as well as boosting methods and hiding methods. In addition, black hat methods can be used to deliberately lower a website's search result ranking. Excessive keyword use and cloaking appear frequently in the literature. Black hat search engine optimization consumes search engine resources, degrades the quality of web search results, and can promote the spread of harmful web content. Search engine companies can issue a warning to a website that uses black hat methods, lower the website's search ranking, or remove the website from the search index. Legal action can also be pursued to shut down a website that uses black hat SEO. Automated methods make the detection of websites using black hat SEO more efficient.
As black hat methods evolve, countermeasures must also evolve so that the effects of black hat search engine optimization can be reduced.
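The thesis notes that excessive keyword use (keyword stuffing) is one of the most frequently studied black hat methods, and that automated methods help detect offending pages. As a deliberately naive, hypothetical sketch of such automation — real detectors are far more sophisticated — a keyword-density check with an invented threshold:

```python
from collections import Counter

def keyword_density(text, keyword):
    """Fraction of tokens in `text` equal to `keyword` (case-insensitive)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return Counter(tokens)[keyword.lower()] / len(tokens)

def looks_stuffed(text, keyword, threshold=0.15):
    """Flag pages whose keyword density exceeds a threshold (0.15 is arbitrary)."""
    return keyword_density(text, keyword) > threshold

stuffed = "cheap flights cheap flights book cheap flights now cheap flights"
normal = "compare flight prices and book your next trip with confidence"
print(looks_stuffed(stuffed, "cheap"))  # True
print(looks_stuffed(normal, "cheap"))   # False
```

Density alone is easy to evade (synonyms, hidden text, cloaking), which is why the literature pairs such signals with link-based and behavioural features.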
Improving Web Spam Classification using Rank-time Features
In this paper, we study the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them a higher rank, thus increasing their site traffic. Our contributions are twofold. First, we find that the method of dataset construction is crucial for accurate spam classification, and we note that this problem occurs generally in learning problems and can be hard to detect. In particular, we find that ensuring there are no overlapping domains between the test and training sets is necessary to accurately test a web spam classifier. In our case, classification performance can differ by as much as 40% in precision when using non-domain-separated data. Second, we show that rank-time features can improve the performance of a web spam classifier. Our paper is the first to investigate the use of rank-time features, and in particular query-dependent rank-time features, for web spam detection. We show that the use of rank-time and query-dependent features can lead to an increase in accuracy over a classifier trained using page-based content only.
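The paper's first methodological point is that no domain may appear in both the training and test sets, otherwise measured precision is inflated. A minimal sketch of such a domain-separated split, using invented URLs (the paper does not prescribe this exact procedure):

```python
from urllib.parse import urlparse

def domain_separated_split(urls, test_domains):
    """Split pages so that no domain contributes to both train and test sets."""
    train, test = [], []
    for url in urls:
        domain = urlparse(url).netloc
        (test if domain in test_domains else train).append(url)
    return train, test

urls = [
    "http://example.com/a", "http://example.com/b",
    "http://spammy.biz/x", "http://news.org/article",
]
train, test = domain_separated_split(urls, {"spammy.biz"})
print(train)  # the example.com and news.org pages
print(test)   # the spammy.biz page
```

Splitting page-by-page instead would let near-duplicate pages from the same site land on both sides of the split, which is the leakage the authors quantify at up to 40% of precision.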
Avaliação das técnicas de otimização para motores de busca (Evaluation of Search Engine Optimization Techniques)
Master's dissertation in Information Systems.
Search engines are essential tools for finding information on the web. Their purpose is to provide relevant results so that users are satisfied. One of the ways they achieve this is through indexing, which allows them to store useful information. However, due to the enormous number of websites on the web, their indexing policies change periodically. Exactly how they operate is not known and remains a trade secret, and the web pages listed in their search results can be banned from one day to the next. The process of positioning a site in search engines is called SEO (Search Engine Optimization), a set of optimization techniques that websites use to gain visibility in search engines.
The study's goal is to assess current SEO techniques, White Hat SEO and Black Hat SEO. The assessment focuses on White Hat SEO, as these are the techniques accepted by search engines. It includes analyses of three sites with different topics and their implementation of SEO techniques. The initial diagnosis was made with webmaster tools, which revealed the factors that were obstructing visibility in the SERP (Search Engine Results Page). For the implementation of the techniques, analytics tools were used to monitor the behaviour of each site. The complementary part of the evaluation was the creation of a test site based on the guidelines of the best-known search engines. A comparison between the SEO techniques implemented in the analysed sites and in the test site determined their effectiveness.
It is worth noting that a list of the most effective SEO techniques allows website owners to create quality websites, with simple code, a user-friendly interface and optimized content, helping search engines with indexing and users with finding information.
Keywords: Search Engine Optimization, SEO, SEO techniques, SEO techniques
evaluation, black hat SEO, white hat SEO, analytics tools, webmaster tools