11 research outputs found

    Определение поискового спама с использованием метода опорных векторов: Detecting web spam using the support vector machine method

    This paper addresses the classification of web spam. It identifies characteristic features of web page content that differ between spam and non-spam pages, and proposes using the support vector machine method to determine whether a web page is spam. The results of the experiments are reported.

    Comprehensive Literature Review on Machine Learning Structures for Web Spam Classification

    Various Web spam features and machine learning structures have been proposed in recent years to classify Web spam. The aim of this paper is to provide a comprehensive comparison of machine learning algorithms within the Web spam detection community. The study experimented with several machine learning algorithms and ensemble meta-algorithms as classifiers, using the area under the receiver operating characteristic curve as the performance measure, on two publicly available datasets (WEBSPAM-UK2006 and WEBSPAM-UK2007). The results show that random forest with variations of AdaBoost achieved 0.937 on WEBSPAM-UK2006 and 0.852 on WEBSPAM-UK2007.

    LDA-BASED PERSONALIZED DOCUMENT RECOMMENDATION


    Fesztivállátogatók véleményeinek számítógéppel támogatott tematikus modellezése – egy kísérlet eredményei: Computer-aided topic modelling based on festival-goers’ opinions – results of an experiment

    In our study, we attempt to determine the typical topics of opinions written by Sziget Festival visitors on Facebook using a computer algorithm, the structured topic model (stm), which applies latent Dirichlet allocation, and to compare them with the topics outlined in our previous research. Based on opinions written in English by Sziget Festival visitors over the last seven years, we modelled nine topics with the algorithm. Their content and scope only partly matched the topics identified in our previous qualitative research. The most important result of our study is that visitor opinions can be examined successfully with computer tools, but the quality of the results depends on the size of the corpus, i.e. the number and length of the analysed posts.

    Antyscam – practical web spam classifier

    To prevent web spam from manipulating search engine results, anti-spam systems use machine learning techniques to detect spam. However, if the learning set for the system is out of date, the quality of classification falls rapidly. We present a web spam recognition system that periodically refreshes the learning set to create an adequate classifier. A new classifier is trained exclusively on data collected during the last period. We have proved that such a strategy is better than incrementally extending the learning set. The system solves the start-up problem of gaps in the learning set by minimising the number of learning examples and utilising external data sets. The system was tested on real data from spam traps and commonly known web services: Quora, Reddit, and Stack Overflow. The test, performed over ten months, shows the stability of the system and an improvement in the results of up to 60 percent at the end of the examined period.

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForever software, the discussion has been extended to include observations on the historical, social and practical value of spam, and proposals for other ways of dealing with spam within the repository without necessarily removing it. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, methods that have been suggested for preventing the introduction of spam into a blog, and research on spam that focuses on the weblog context. It concludes with a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software.

    Green supply chain management and enterprise performance in the South African mining industry

    In the past decade, sustainability issues have emerged as an important area of consideration in business. The mining industry in South Africa has not been spared from this debate, since it is one of the major contributors to the environmental challenges that South Africa and other sub-Saharan countries are facing. The purpose of this study is to explore green supply chain management (GSCM) practices and their relationship with enterprise performance in the South African mining industry. More specifically, this research explores the effect of GSCM practices and other organisational factors on enterprise performance in mining companies that operate in Rustenburg in North West Province. This study adopted a research model relating GSCM practice and enterprise performance through three organisational variables (employee job satisfaction, operational efficiency, and relational efficiency) as moderators. Statistical analyses were based on data collected through survey questionnaires from the supply chain management departments of two South African platinum mining companies. The reliability and validity of the research model were tested with widely accepted statistical tools. To test the hypotheses relating GSCM practice implementation and enterprise performance, correlational analysis and multi-regression models were used. The most anticipated finding of the study was a direct link between GSCM practice implementation and enterprise performance. Indeed, a statistically significant relationship was found between the variables. Furthermore, significant direct relationships were found between GSCM practice implementation and enterprise performance through the three mediating variables of employee job satisfaction, operational efficiency and relational efficiency. This result indicates that enterprise performance will be improved when GSCM enhances employee job satisfaction, operational efficiency and relational efficiency. Graduate School of Business Leadership. MB

    Linked latent dirichlet allocation in web spam filtering

    No full text

    Methods for demoting and detecting Web spam

    Web spamming has tremendously subverted the ranking mechanism of information retrieval in Web search engines. It maliciously manipulates data sources, through either content or links, with the intention of negatively affecting Web search results. The reordering of search results by spammers has made it harder and more time-consuming for Web users to retrieve relevant information. In order to improve the quality of Web search engine results, anti-Web spam techniques are developed in this thesis to detect and demote Web spam via trust and distrust models and Web spam classification. A comprehensive literature review of existing anti-Web spam techniques, emphasizing trust and distrust models and machine learning models, is presented. Furthermore, several experiments are conducted to show the vulnerability of ranking algorithms to Web spam. Two publicly available Web spam datasets are used for the experiments throughout the thesis: WEBSPAM-UK2006 and WEBSPAM-UK2007. Two link-based trust and distrust model algorithms are then presented: Trust Propagation Rank and Trust Propagation Spam Mass. Both algorithms semi-automatically detect and demote Web spam based on limited human expert evaluation of non-spam and spam pages. In the experiments, Trust Propagation Rank and Trust Propagation Spam Mass achieved up to 10.88% and 43.94% improvement over the benchmark algorithms. Thereafter, the weight properties associated with the linkage between two Web hosts are introduced into the task of Web spam detection. In most studies, weight properties are used in ranking mechanisms; in this research work, they are incorporated into distrust-based algorithms to detect more spam. The experiments show that the weight properties enhanced existing distrust-based Web spam detection algorithms by up to 30.26% and 31.30% on the two aforementioned datasets. Even though the integration of weight properties has shown significant results in detecting Web spam, a distrust seed set propagation algorithm is presented to further enhance Web spam detection. The distrust seed set propagation algorithm propagates the distrust score over a wider range to estimate the probability that other unevaluated Web pages are spam. The experimental results show that the algorithm improved the distrust-based Web spam detection algorithms by up to 19.47% and 25.17% on the two datasets. An alternative machine learning classifier, the multilayered perceptron neural network, is proposed in the thesis to further improve the detection rate of Web spam. In the experiments, the detection rate of Web spam using the multilayered perceptron neural network increased by up to 14.02% and 3.53% over the conventional classifier, support vector machines. In addition, a mechanism to determine the number of hidden neurons for the multilayered perceptron neural network is presented in this thesis to simplify the design of the network structure.

    Towards Web Spam Filtering With Neural-based Approaches

    No full text
    The steady growth and popularization of the Web increases competition between websites and creates opportunities for profit in several segments. Thus, there is great interest in keeping a website in a good position in search results. The problem is that many websites use techniques to circumvent the search engines, which degrades the search results and exposes users to dangerous content. Given this scenario, this paper presents a performance evaluation of different artificial neural network models for automatically classifying web spam. We have conducted an empirical experiment using a well-known, large and public web spam database. The results indicate that the evaluated approaches outperform the state-of-the-art web spam filters. © Springer-Verlag Berlin Heidelberg 2012. LNAI 7637, pp. 199-209. Publisher: Springer Verlag.