Towards Active SEO (Search Engine Optimization) 2.0
In the age of the writable web, new skills and new practices are emerging. In an environment that allows everyone to communicate information globally, search engine optimization (SEO) is a strategic discipline that aims to generate visibility, internet traffic, and maximum exploitation of a site's publications. Often misperceived as fraud, SEO has evolved into a facilitating tool for anyone who wishes to make their website visible to search engines. In this article we show that it is possible to achieve the first rank in search results for highly competitive keywords, using methods that are quick, sustainable, and legal, while applying the principles of active SEO 2.0. The article also clarifies some of the working principles of search engines, presents some advanced referencing techniques (that are completely ethical and legal), and lays the foundations for an in-depth reflection on the qualities and advantages of these techniques.
Fake News Detection with Deep Diffusive Network Model
In recent years, with the rapid growth of online social networks, fake news created for various commercial and political purposes has appeared in large numbers and spread widely online. Through deceptive wording, such fake news can easily mislead online social network users, and it has already had a tremendous effect on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies and algorithms for detecting fake news articles, creators and subjects in online social networks, and evaluates the corresponding performance. It addresses the challenges introduced by the unknown characteristics of fake news and the diverse connections among news articles, creators and subjects. Based on a detailed data analysis, the paper introduces a novel automatic fake news credibility inference model, named FakeDetector. From a set of explicit and latent features extracted from the textual information, FakeDetector builds a deep diffusive network model to learn the representations of news articles, creators and subjects simultaneously. Extensive experiments on a real-world fake news dataset compare FakeDetector with several state-of-the-art models, and the results demonstrate the effectiveness of the proposed model.
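The actual FakeDetector model is a deep diffusive network trained jointly over articles, creators and subjects, which is well beyond an abstract. As a loose, hypothetical illustration of the simpler underlying idea — inferring credibility from explicit textual features — here is a toy bag-of-words logistic-regression scorer (all data, thresholds and names are invented for this sketch):

```python
import math
from collections import Counter

# Toy training data: headlines labeled 1 = fake, 0 = credible (invented examples).
TRAIN = [
    ("shocking miracle cure doctors hate", 1),
    ("you will not believe this secret trick", 1),
    ("shocking secret exposed click now", 1),
    ("senate passes annual budget bill", 0),
    ("local council approves new library funding", 0),
    ("university publishes annual research report", 0),
]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid float overflow
    return 1.0 / (1.0 + math.exp(-z))

def features(text):
    """Explicit bag-of-words features: counts of lowercase tokens."""
    return Counter(text.lower().split())

def train(data, epochs=200, lr=0.5):
    """Fit per-word logistic-regression weights by stochastic gradient descent."""
    w = {}
    for _ in range(epochs):
        for text, label in data:
            x = features(text)
            p = sigmoid(sum(w.get(t, 0.0) * c for t, c in x.items()))
            for t, c in x.items():
                w[t] = w.get(t, 0.0) + lr * (label - p) * c
    return w

def fake_score(w, text):
    """Probability that a headline is fake under the toy model."""
    return sigmoid(sum(w.get(t, 0.0) * c for t, c in features(text).items()))

w = train(TRAIN)
print(fake_score(w, "shocking secret trick exposed"))   # high -> likely fake
print(fake_score(w, "council publishes budget report"))  # low -> likely credible
```

The real model replaces these hand-counted explicit features with learned latent representations and propagates evidence between connected articles, creators and subjects.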
A pipeline and comparative study of 12 machine learning models for text classification
Text-based communication is highly favoured as a communication method, especially in business environments. As a result, it is often abused: malicious messages, e.g. spam emails, are sent to deceive users into revealing personal information, including online account credentials or banking details. For this reason, many machine learning methods for text classification have been proposed and incorporated into the services of most email providers. However, optimising text classification algorithms and finding the right trade-off in their aggressiveness is still a major research problem.
We present an updated survey of 12 machine learning text classifiers applied
to a public spam corpus. A new pipeline is proposed to optimise hyperparameter
selection and improve the models' performance by applying specific methods
(based on natural language processing) in the preprocessing stage.
Our study aims to provide a new methodology to investigate and optimise the
effect of different feature sizes and hyperparameters in machine learning
classifiers that are widely used in text classification problems. The
classifiers are tested and evaluated on different metrics, including F-score (accuracy), precision, recall, and run time. By analysing all these aspects, we show how the proposed pipeline can be used to achieve good accuracy in spam filtering on the Enron dataset, a widely used public email corpus. Statistical tests and explainability techniques are applied to provide a robust analysis of the proposed pipeline and to interpret the classification outcomes of the 12 machine learning models, also identifying the words that drive the classification results. Our analysis shows that it is possible to identify an effective machine learning model to classify the Enron dataset with an F-score of 94%.
Comment: This article has been accepted for publication in Expert Systems with Applications, April 2022. Published by Elsevier. All data, models, and code used in this work are available on GitHub at https://github.com/Angione-Lab/12-machine-learning-models-for-text-classificatio
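The survey evaluates its classifiers on precision, recall and F-score. For readers unfamiliar with these metrics, a minimal sketch of how they are computed for the positive (spam) class, on invented labels:

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (spam = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented example: 1 = spam, 0 = ham.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(p, r, f)  # 0.75 0.75 0.75
```

The "aggressiveness" trade-off mentioned in the abstract is visible here: lowering a spam classifier's decision threshold raises recall (fewer missed spam) at the cost of precision (more legitimate mail flagged).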
Mustahattuhakukoneoptimointi (Black Hat Search Engine Optimization)
The purpose of search engine optimization is to increase a website's visibility on search engine results pages. Black hat search engine optimization is the use of SEO methods that violate the guidelines set by search engine companies. The goal of this thesis is to determine, on the basis of a literature review, which black hat SEO methods and countermeasures have been studied in the scientific literature. A literature review was chosen as the research strategy so that the examined information would be grounded in scientific material. The systematic search of the literature review was carried out with specific search criteria in four computer science databases, supplemented by additional searches in two other databases. The aim was to include at least forty sources in the review to achieve a suitable scope. The material was selected on the basis of relevance, which was assessed by examining the articles' abstracts and surveying their contents. Black hat methods are divided into on-page methods, link-based methods and other methods, as well as boosting methods and hiding methods. In addition, black hat methods can be used to deliberately lower a website's search result ranking. Excessive keyword use and cloaking appear frequently in the literature. Black hat search engine optimization consumes search engine resources, degrades the quality of web search results, and can promote the spread of harmful web content. Search engine companies can issue a warning to a website that uses black hat methods, lower the website's search ranking, or remove the website from the search index. Legal action can also be pursued to shut down a website that uses black hat SEO. Automated methods make the detection of websites using black hat SEO more efficient.
As black hat methods evolve, countermeasures must also evolve so that the effects of black hat search engine optimization can be reduced.
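The thesis notes that excessive keyword use (keyword stuffing) is one of the most frequently studied black hat methods, and that automated methods help detect offending pages. As a deliberately naive, hypothetical sketch of such automation — real detectors are far more sophisticated — a keyword-density check with an invented threshold:

```python
from collections import Counter

def keyword_density(text, keyword):
    """Fraction of tokens in `text` equal to `keyword` (case-insensitive)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return Counter(tokens)[keyword.lower()] / len(tokens)

def looks_stuffed(text, keyword, threshold=0.15):
    """Flag pages whose keyword density exceeds a threshold (0.15 is arbitrary)."""
    return keyword_density(text, keyword) > threshold

stuffed = "cheap flights cheap flights book cheap flights now cheap flights"
normal = "compare flight prices and book your next trip with confidence"
print(looks_stuffed(stuffed, "cheap"))  # True
print(looks_stuffed(normal, "cheap"))   # False
```

Density alone is easy to evade (synonyms, hidden text, cloaking), which is why the literature pairs such signals with link-based and behavioural features.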
Improving Web Spam Classification using Rank-time Features
In this paper, we study the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them a higher rank, thus increasing their site traffic. Our contributions are twofold. First, we find that the method of dataset construction is crucial for accurate spam classification, and we note that this problem occurs generally in learning problems and can be hard to detect. In particular, we find that ensuring there are no overlapping domains between the test and training sets is necessary to accurately test a web spam classifier. In our case, classification performance can differ by as much as 40% in precision when using non-domain-separated data. Second, we show that rank-time features can improve the performance of a web spam classifier. Our paper is the first to investigate the use of rank-time features, and in particular query-dependent rank-time features, for web spam detection. We show that the use of rank-time and query-dependent features can lead to an increase in accuracy over a classifier trained using page-based content only.
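The paper's first methodological point is that no domain may appear in both the training and test sets, otherwise measured precision is inflated. A minimal sketch of such a domain-separated split, using invented URLs (the paper does not prescribe this exact procedure):

```python
from urllib.parse import urlparse

def domain_separated_split(urls, test_domains):
    """Split pages so that no domain contributes to both train and test sets."""
    train, test = [], []
    for url in urls:
        domain = urlparse(url).netloc
        (test if domain in test_domains else train).append(url)
    return train, test

urls = [
    "http://example.com/a", "http://example.com/b",
    "http://spammy.biz/x", "http://news.org/article",
]
train, test = domain_separated_split(urls, {"spammy.biz"})
print(train)  # the example.com and news.org pages
print(test)   # the spammy.biz page
```

Splitting page-by-page instead would let near-duplicate pages from the same site land on both sides of the split, which is the leakage the authors quantify at up to 40% of precision.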
Avaliação das técnicas de otimização para motores de busca (Evaluation of Search Engine Optimization Techniques)
Master's dissertation in Information Systems.
Search engines are essential tools for finding information on the web. Their purpose is to provide relevant results so that users are satisfied. One of the ways they achieve this is through indexing, which allows them to store useful information. However, due to the enormous number of websites on the web, their indexing policies change periodically. Exactly how they operate is not known and remains a trade secret, and the web pages listed in their search results can be banned from one day to the next. The process of positioning a site in search engines is called SEO (Search Engine Optimization), a set of optimization techniques that websites use to gain visibility in search engines.
The study's goal is to assess current SEO techniques, White Hat SEO and Black Hat SEO. The assessment focuses on White Hat SEO, as these are the techniques accepted by search engines. It includes analyses of three sites with different topics and their implementation of SEO techniques. The initial diagnosis was made with webmaster tools, which revealed the factors that were obstructing visibility in the SERP (Search Engine Results Page). For the implementation of the techniques, analytics tools were used to monitor the behaviour of each site. The complementary part of the evaluation was the creation of a test site based on the guidelines of the best-known search engines. A comparison between the SEO techniques implemented in the analysed sites and in the test site determined their effectiveness.
It is worth noting that a list of the most effective SEO techniques allows website owners to create quality websites, with simple code, a user-friendly interface and optimized content, helping search engines with indexing and users with finding information.
Keywords: Search Engine Optimization, SEO, SEO techniques, SEO techniques
evaluation, black hat SEO, white hat SEO, analytics tools, webmaster tools