7,488 research outputs found
A Taxonomy of Hyperlink Hiding Techniques
Hidden links are designed solely for search engines rather than visitors. To
obtain high search engine rankings, link hiding techniques are commonly used to
profit black-market industries such as illicit game servers, false medical
services, illegal gambling, and other less attractive but high-profit
industries. This paper investigates hyperlink hiding techniques on the Web and
gives a detailed taxonomy. We believe the taxonomy can help develop appropriate
countermeasures. A study of the home pages of 5,583,451 Chinese sites indicates
that link hiding techniques are very prevalent on the Web. We also tried to
explore the attitude of Google towards link hiding spam by analyzing the
PageRank values of the related links. The results show that more should be done
to punish hidden link spam.
Comment: 12 pages, 2 figures
Clustering Analysis within Text Classification Techniques
The paper presents a personal approach to the main applications of classification in the knowledge-based society, using methods and techniques that are widespread in the literature. Text classification is covered in chapter two, which describes the main techniques used along with an integrated taxonomy. The transition to clustering is made through the concept of spatial representation. Starting from the basic elements of geometry and artificial intelligence analysis, spatial representation models are presented. In parallel, the spatial dimension is introduced into the classification process. The main clustering methods are described in an aggregated taxonomy. As an example, spam and ham words are clustered and spatially represented, and the concepts of spam, ham, common, and linkage words are presented and explained in the xOy space representation.
Keywords: Knowledge Societies, Text Classification, Spatial Representation, Artificial Intelligence, Clustering Analysis, Spam Filtering
Folksonomies and clustering in the collaborative system CiteULike
We analyze CiteULike, an online collaborative tagging system where users
bookmark and annotate scientific papers. Such a system can be naturally
represented as a tripartite graph whose nodes represent papers, users and tags
connected by individual tag assignments. The semantics of tags are studied here
in order to uncover the hidden relationships between tags. We find that the
clustering coefficient reflects the semantic patterns among tags, providing
useful ideas for designing more efficient methods of data classification
and spam detection.
Comment: 9 pages, 5 figures, iop style; corrected typo
BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology
This report is a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForever software, the discussion has been extended to include observations on the historical, social and practical value of spam, and proposals for other ways of dealing with spam within the repository without necessarily removing it. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research on spam, focusing on work set in the weblog context. It concludes with a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software.