14 research outputs found

    Towards active SEO (search engine optimization) 2.0

    Get PDF
    In the age of the writable web, new skills and new practices are appearing. In an environment that allows everyone to communicate information globally, search engine optimization (SEO) is a strategic discipline that aims to generate visibility, internet traffic and maximum exploitation of a site's publications. Often misperceived as fraud, SEO has evolved into a facilitating tool for anyone who wishes to have their website indexed and ranked by search engines. In this article we show that it is possible to achieve the first rank in search results for highly competitive keywords. We present methods that are quick, sustainable and legal, applying the principles of active SEO 2.0. The article also clarifies some of the working mechanisms of search engines and some advanced referencing techniques (that are completely ethical and legal), and lays the foundations for an in-depth reflection on the qualities and advantages of these techniques.

    Fake News Detection with Deep Diffusive Network Model

    Get PDF
    In recent years, owing to the rapid growth of online social networks, fake news created for various commercial and political purposes has been appearing in large quantities and spreading widely online. Through deceptive wording, online social network users can easily be misled by such fake news, which has already had a significant effect on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies and algorithms for detecting fake news articles, creators and subjects in online social networks and evaluates the corresponding performance. It addresses the challenges introduced by the unknown characteristics of fake news and the diverse connections among news articles, creators and subjects. Based on a detailed data analysis, the paper introduces a novel automatic fake news credibility inference model, named FakeDetector. Using a set of explicit and latent features extracted from the textual information, FakeDetector builds a deep diffusive network model to learn the representations of news articles, creators and subjects simultaneously. Extensive experiments on a real-world fake news dataset compare FakeDetector with several state-of-the-art models, and the results demonstrate the effectiveness of the proposed model.
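    As a rough illustration of the idea of learning article, creator and subject representations jointly, the sketch below combines an article's text features with embeddings of its creator and subject before credibility classification. It is a minimal sketch only: the layer sizes, the GRU-cell fusion step and the toy data are illustrative assumptions, not the paper's actual FakeDetector architecture.

```python
# Minimal sketch (not the paper's actual FakeDetector architecture): learn
# representations of articles, creators and subjects jointly, letting an
# article's representation absorb information from its creator and subject
# before credibility classification. All layer sizes, the GRU-cell fusion
# step and the toy data below are illustrative assumptions.
import torch
import torch.nn as nn

class JointCredibilityModel(nn.Module):
    def __init__(self, n_creators, n_subjects, text_dim=64, hidden=32):
        super().__init__()
        self.creator_emb = nn.Embedding(n_creators, hidden)   # latent creator features
        self.subject_emb = nn.Embedding(n_subjects, hidden)   # latent subject features
        self.text_proj = nn.Linear(text_dim, hidden)          # explicit text features
        # Stand-in for a diffusive unit: update the article state with the
        # combined creator/subject representations.
        self.diffuse = nn.GRUCell(hidden, hidden)
        self.classify = nn.Linear(hidden, 2)                   # fake vs. credible

    def forward(self, text_feats, creator_ids, subject_ids):
        h_article = torch.tanh(self.text_proj(text_feats))
        h_neighbours = self.creator_emb(creator_ids) + self.subject_emb(subject_ids)
        h_article = self.diffuse(h_neighbours, h_article)
        return self.classify(h_article)

# Toy usage: 4 articles with 64-dim text features, 3 creators, 2 subjects.
model = JointCredibilityModel(n_creators=3, n_subjects=2)
logits = model(torch.randn(4, 64), torch.tensor([0, 1, 2, 0]), torch.tensor([0, 1, 1, 0]))
print(logits.shape)  # torch.Size([4, 2])
```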

    A pipeline and comparative study of 12 machine learning models for text classification

    Get PDF
    Text-based communication is a highly favoured communication method, especially in business environments. As a result, it is often abused to send malicious messages, e.g., spam emails, that deceive users into disclosing personal information, including online account credentials or banking details. For this reason, many machine learning methods for text classification have been proposed and incorporated into the services of most email providers. However, optimising text classification algorithms and finding the right trade-off for their aggressiveness remains a major research problem. We present an updated survey of 12 machine learning text classifiers applied to a public spam corpus. A new pipeline is proposed to optimise hyperparameter selection and improve the models' performance by applying specific methods (based on natural language processing) in the preprocessing stage. Our study provides a methodology for investigating and optimising the effect of different feature sizes and hyperparameters in machine learning classifiers that are widely used for text classification. The classifiers are tested and evaluated on several metrics, including F-score (accuracy), precision, recall, and run time. By analysing all these aspects, we show how the proposed pipeline can be used to achieve good accuracy for spam filtering on the Enron dataset, a widely used public email corpus. Statistical tests and explainability techniques are applied to provide a robust analysis of the proposed pipeline and to interpret the classification outcomes of the 12 machine learning models, also identifying the words that drive the classification results. Our analysis shows that it is possible to identify an effective machine learning model that classifies the Enron dataset with an F-score of 94%. Comment: This article has been accepted for publication in Expert Systems with Applications, April 2022. Published by Elsevier. All data, models, and code used in this work are available on GitHub at https://github.com/Angione-Lab/12-machine-learning-models-for-text-classificatio
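    A rough sketch of this kind of pipeline is given below: TF-IDF features with a bounded vocabulary, a classifier, and a hyperparameter search scored by F-score. The parameter grid, the classifier choice and the toy data are illustrative assumptions, not the paper's exact configuration or the Enron corpus.

```python
# Sketch of a text-classification pipeline with hyperparameter search.
# The grid, classifier and data are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

emails = ["win a free prize now", "meeting moved to 3pm",
          "cheap meds online today", "project report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {
    "tfidf__max_features": [500, 5000],  # the "feature size" dimension studied here
    "clf__C": [0.1, 1.0, 10.0],          # regularisation strength of the classifier
}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=2)
search.fit(emails, labels)
print(search.best_params_, round(search.best_score_, 3))
```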

    Mustahattuhakukoneoptimointi (Black hat search engine optimization)

    Get PDF
    The purpose of search engine optimization is to increase a web page's visibility on search engine results pages. Black hat search engine optimization is the use of optimization methods that violate the guidelines drawn up by search engine companies. The goal of this thesis is to determine, on the basis of a literature review, which black hat search engine optimization methods and countermeasures have been studied in the scientific literature. A literature review was chosen as the research strategy so that the information examined would be based on scientific material. The systematic search was performed with specific search criteria in four computer science databases, supplemented by additional searches in two other databases. The aim was to include at least forty sources in the review to achieve a suitable scope. The material was selected based on relevance, assessed by reading the articles' abstracts and making an overview of their content. Black hat methods are divided into on-page methods, link-based methods and other methods, as well as boosting methods and hiding methods. In addition, black hat methods can be used to deliberately lower a web page's ranking in search results. Excessive keyword use (keyword stuffing) and cloaking appear frequently in the literature. Black hat search engine optimization consumes search engine resources, degrades the quality of web search results and can promote the spread of harmful web content. Search engine companies can issue a warning to a website that uses black hat methods, lower its ranking in search results, or remove it from the search index. Legal action can also be pursued to shut down a website that uses black hat search engine optimization. Automated methods make the detection of websites using black hat search engine optimization more efficient. As black hat methods evolve, countermeasures must evolve as well so that the effects of black hat search engine optimization can be reduced.
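    As a small illustration of the kind of automated countermeasure mentioned above, the sketch below flags likely keyword stuffing by checking how dominant the most frequent term on a page is. The tokenisation and the 0.10 density threshold are illustrative assumptions, not values taken from the reviewed literature.

```python
# Toy keyword-stuffing flag based on the density of the most frequent term.
# The threshold and tokenisation are illustrative assumptions.
import re
from collections import Counter

def keyword_density_flag(page_text: str, threshold: float = 0.10) -> bool:
    """Return True if any single term makes up more than `threshold` of all terms."""
    terms = re.findall(r"[a-z]+", page_text.lower())
    if not terms:
        return False
    _, count = Counter(terms).most_common(1)[0]
    return count / len(terms) > threshold

stuffed = "cheap flights cheap flights book cheap flights today cheap flights deals"
normal = "our travel blog covers destinations, packing tips and airline reviews"
print(keyword_density_flag(stuffed), keyword_density_flag(normal))  # True False
```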

    Improving Web Spam Classification using Rank-time Features

    No full text
    In this paper, we study the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them a higher rank, thus increasing their site traffic. Our contributions are twofold. First, we find that the method of dataset construction is crucial for accurate spam classification, and we note that this problem occurs in learning problems generally and can be hard to detect. In particular, we find that ensuring there are no overlapping domains between the test and training sets is necessary to accurately evaluate a web spam classifier; in our case, classification performance can differ by as much as 40% in precision when using non-domain-separated data. Second, we show that rank-time features can improve the performance of a web spam classifier. Our paper is the first to investigate the use of rank-time features, and in particular query-dependent rank-time features, for web spam detection. We show that the use of rank-time and query-dependent features can lead to an increase in accuracy over a classifier trained using page-based content only.
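    The dataset-construction point above can be illustrated with a domain-grouped split, as in the minimal sketch below: pages are divided so that no domain appears in both the training and the test set. The toy data and the use of scikit-learn's GroupShuffleSplit are illustrative assumptions, not the authors' setup.

```python
# Domain-separated train/test split: no domain is shared between the sets.
# Toy data and library choice are illustrative assumptions.
from sklearn.model_selection import GroupShuffleSplit

pages = ["p1", "p2", "p3", "p4", "p5", "p6"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam page, 0 = normal page
domains = ["a.com", "a.com", "b.org", "b.org", "c.net", "c.net"]

# test_size=1 holds out exactly one whole domain for testing.
splitter = GroupShuffleSplit(n_splits=1, test_size=1, random_state=0)
train_idx, test_idx = next(splitter.split(pages, labels, groups=domains))

# A classifier evaluated this way cannot score well simply by memorising
# domain-specific content.
print(sorted({domains[i] for i in train_idx}), sorted({domains[i] for i in test_idx}))
```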

    Avaliação das técnicas de otimização para motores de busca (Evaluation of search engine optimization techniques)

    Get PDF
    Master's dissertation in Information Systems. Search engines are essential tools for finding information on the web. Their purpose is to provide relevant results that satisfy their users, and one of the ways they achieve this is through indexing, which allows them to store useful information. However, due to the enormous number of websites on the web, indexing policies change periodically. Exactly how search engines operate is unknown and remains a trade secret, and the web pages listed in their search results may be banned from one day to the next. The process of positioning a site in the search engines is called SEO (Search Engine Optimization), a set of optimization techniques that websites use to gain visibility in search engines. This study evaluates current SEO techniques, both White Hat SEO and Black Hat SEO; the evaluation focuses on White Hat SEO, as these are the techniques accepted by search engines. The evaluation includes analyses of three sites with different topics and of their implementation of SEO techniques. The initial diagnosis was made with webmaster tools, which identified the factors hindering visibility in the SERP (Search Engine Results Page). Regarding the implementation of the techniques, analytics tools were used to monitor the behaviour of each site. The complementary part of the evaluation was the creation of a test site based on the guidelines of the best-known search engines. A comparison between the SEO techniques implemented on the analysed sites and on the test site determined their effectiveness. A list of the most effective SEO techniques allows website owners to create quality web pages, with simple code, a user-friendly interface and optimized content, helping search engines with indexing and users with finding information. Keywords: Search Engine Optimization, website optimization, SEO techniques, evaluation of SEO techniques, black hat SEO, white hat SEO, analytics tools, webmaster tools.
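    As a small illustration of the kind of on-page diagnostic a webmaster tool might run when assessing white hat SEO basics, the sketch below checks whether a page has a title of sensible length and a meta description. The regexes and length limits are common rules of thumb, not thresholds taken from this dissertation.

```python
# Toy on-page SEO check: title presence/length and meta description presence.
# Regexes and limits are illustrative assumptions.
import re

def onpage_report(html: str) -> dict:
    title = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    desc = re.search(r'<meta\s+name="description"\s+content="(.*?)"', html, re.I)
    title_text = title.group(1).strip() if title else ""
    return {
        "has_title": bool(title_text),
        "title_length_ok": 10 <= len(title_text) <= 60,
        "has_meta_description": desc is not None,
    }

sample = ('<html><head><title>Evaluating SEO techniques</title>'
          '<meta name="description" content="A comparison of white hat SEO techniques.">'
          '</head></html>')
print(onpage_report(sample))
```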
