36 research outputs found

    RESEARCH INTO THE PROSPECTS OF USE AND PRINCIPLES OF CONSTRUCTION OF A MULTI-AGENT SEARCH SYSTEM

    In the conditions of the developing information society, one of the most important remaining tasks is to solve the problem of effective search and collection of information. This is crucial due to the growing diversity of information sources focused on different areas of human activity, which creates a demand for new methods of effective information search. This paper analyzes the principles of functioning of information retrieval systems and, in particular, of a multi-agent search system, drawing on a number of scientific works in the field of information retrieval. The analysis established the prospects of using a distributed multi-agent system to improve search methods, and emphasized its feasibility for improving the accuracy of document evaluation. The advantages of building a distributed multi-agent search engine over centralized search systems were identified. It is also emphasized that multi-agent search can combine different approaches to solving the problems of search-engine intellectualization and personalization. It was concluded that, by using the methodology of building a distributed multi-agent system to improve search methods and, in particular, to construct information retrieval systems, it is possible to ensure that the search engine finds the documents containing the needed information first. In addition, the basic principles of constructing a multi-agent structure for organizing information search were highlighted. The findings and suggestions of this study can be used in research and teaching; in particular, the results obtained can be used to further analyze and refine information search methods.

    Exploring Linguistic Features for Web Spam Detection: A Preliminary Study

    We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007, and we make them publicly available for other researchers. Preliminary analysis indicates that certain linguistic features may be useful for the spam-detection task when combined with features studied elsewhere.

    Survey on Web Spam Detection using Link and Content Based Features

    Web spam is one of the pressing problems of search engines because it severely reduces the quality of Web search results. Web spam also has an economic impact, because spammers obtain what amounts to free advertising for their data or sites on search engines and thereby increase their web traffic volume. In this paper we survey efficient spam detection techniques based on a classifier that combines new link-based features with language models. Link-based features are related to qualitative data extracted from the web pages and to the qualitative properties of the page links. The language-model approach is applied to different sources of information from a web page that belong to the context of a link, in order to provide high-quality indicators of web spam. Specifically, the detection technique applies the Kullback-Leibler divergence to different combinations of these sources of information in order to characterize the relationship between two linked pages.
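The divergence measure named in the abstract above can be sketched as follows. This is a minimal illustration only, assuming smoothed unigram language models built from raw term counts; the surveyed paper's exact choice of information sources and smoothing is not specified here, and the example texts are hypothetical.

```python
from collections import Counter
from math import log

def kl_divergence(source_counts, target_counts, alpha=0.1):
    """Kullback-Leibler divergence D(P || Q) between two unigram
    language models, with additive smoothing so that both models
    assign non-zero probability to every term in the shared vocabulary."""
    vocab = set(source_counts) | set(target_counts)
    p_total = sum(source_counts.values()) + alpha * len(vocab)
    q_total = sum(target_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for term in vocab:
        p = (source_counts.get(term, 0) + alpha) / p_total
        q = (target_counts.get(term, 0) + alpha) / q_total
        divergence += p * log(p / q)
    return divergence

# Compare the anchor-text model of a link with the target page's content;
# a large divergence between the two suggests the link context and the
# linked page talk about different things, one possible spam indicator.
anchor = Counter("cheap watches buy cheap watches".split())
target = Counter("history of mechanical watchmaking in geneva".split())
print(kl_divergence(anchor, target))
```

The asymmetry of the divergence matters here: D(anchor || target) measures how badly the target page's model predicts the anchor text, which is the direction of interest when judging a link.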

    Semantic Web meets Web 2.0 (and vice versa): The Value of the Mundane for the Semantic Web

    Web 2.0, not the Semantic Web, has become the face of “the next generation Web” among the tech-literate set, and even among many in the various research communities involved in the Web. Perceptions in these communities of what the Semantic Web is (and who is involved in it) are often misinformed, if not misguided. In this paper we identify opportunities for Semantic Web activities to connect with the Web 2.0 community; we explore why this connection is of significant benefit to both groups, and identify how these connections open valuable research opportunities “in the real” for the Semantic Web effort.

    Deep Learning for User Comment Moderation

    Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of English Wikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism further improves the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.

    Transforming Message Detection

    The majority of existing spam filtering techniques suffer from several serious disadvantages. Some of them produce many false positives; others are suitable only for email filtering and cannot be used in instant messaging and social networks. Content-based methods therefore seem more efficient. One of them is based on signature retrieval; however, it is not resistant to message changes. There are enhancements (e.g. checksums), but they are extremely time- and resource-consuming. The main objective of this research is therefore to develop a method for detecting transformed messages. To this end we compared spam in several languages, namely English, French, Russian and Italian. For each language, about 1000 messages, including spam and non-spam, were examined, and 135 quantitative features were retrieved. Almost all of these features do not depend on the language. They underlie the first step of the algorithm, which is based on a support vector machine. The next stage is to test the obtained results by applying an N-gram approach. Special attention is paid to word distortion and text alteration. The results obtained indicate the efficiency of the suggested approach.
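The language-independent quantitative features mentioned above can be illustrated with a small sketch. The paper uses 135 features whose exact definitions are not listed in the abstract; the handful below (length, digit ratio, uppercase ratio, punctuation ratio, URL count, mean token length) are hypothetical stand-ins of the same general kind, and the resulting vectors would then be fed to a support vector machine classifier.

```python
import re

def quantitative_features(message: str) -> list[float]:
    """Compute a few language-independent quantitative features of a
    message. Illustrative only; the paper's actual 135-feature set
    is not reproduced here."""
    n = max(len(message), 1)
    tokens = message.split()
    return [
        float(len(message)),                                  # total length
        sum(c.isdigit() for c in message) / n,                # digit ratio
        sum(c.isupper() for c in message) / n,                # uppercase ratio
        sum(not c.isalnum() and not c.isspace()
            for c in message) / n,                            # punctuation ratio
        float(len(re.findall(r"https?://\S+", message))),     # URL count
        sum(len(t) for t in tokens) / max(len(tokens), 1),    # mean token length
    ]

# Spam-like messages tend to score high on digit, uppercase and URL features:
print(quantitative_features("WIN $1000 NOW!!! visit http://spam.example"))
```

Because none of these features inspects specific words, the same extractor can be applied unchanged to English, French, Russian or Italian text, which matches the cross-language design described in the abstract.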