
    Implementation of Clever Crawler

    Duplicate documents on the World Wide Web hurt crawling, indexing, and relevance: search engines return large amounts of redundant data, which wastes storage and resources, degrades ranking quality, and is inconvenient for users. To address this limitation, normalization rules are used to transform all duplicate URLs into the same canonical form, and the result is further refined with a Jaccard similarity function that compares the text content of the URLs; the function defines a threshold value to reduce duplication
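
    A minimal sketch of the two steps this abstract describes: normalization rules that map duplicate URLs to one canonical form, and a Jaccard similarity check with a threshold over page text. The specific rules and the 0.8 cutoff are illustrative assumptions, not values from the paper.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonicalize(url: str) -> str:
    """Apply simple normalization rules so duplicate URLs share one canonical form."""
    parts = urlsplit(url.lower())
    # Sort query parameters and drop the fragment; both commonly make
    # otherwise-identical pages look distinct to a crawler.
    query = urlencode(sorted(parse_qsl(parts.query)))
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two documents."""
    sa, sb = set(a.split()), set(b.split())
    if not (sa or sb):
        return 1.0
    return len(sa & sb) / len(sa | sb)

THRESHOLD = 0.8  # assumed cutoff; pages scoring above it are treated as duplicates

def is_duplicate(doc_a: str, doc_b: str) -> bool:
    return jaccard(doc_a, doc_b) >= THRESHOLD
```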

    Multi-objective resource selection in distributed information retrieval

    In a Distributed Information Retrieval system, a user submits a query to a broker, which determines how to retrieve a given number of documents from the available resource servers. In this paper, we propose a multi-objective model for this resource selection task. The model considers four aspects simultaneously when choosing resources: the documents' relevance to the given query, response time, monetary cost, and similarity between resources. An optimized solution is obtained by comparing the performance of all candidate selections. Some variations of the basic model are also given, which improve its efficiency
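
    The abstract does not give the model's formulation; one common way to trade off the four criteria is a weighted scalarization of the objectives, sketched below. The attribute names and weights are illustrative assumptions, not the paper's model.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    relevance: float   # estimated relevance of the server's documents to the query
    time: float        # expected response time (lower is better)
    cost: float        # monetary cost per query (lower is better)
    similarity: float  # overlap with already-selected resources (lower is better)

def score(r: Resource, w=(1.0, 0.3, 0.2, 0.5)) -> float:
    """Scalarize the four objectives; the weights here are assumed for illustration."""
    return w[0] * r.relevance - w[1] * r.time - w[2] * r.cost - w[3] * r.similarity

def select(resources: list[Resource], k: int) -> list[Resource]:
    """The broker picks the k best-scoring resource servers for the query."""
    return sorted(resources, key=score, reverse=True)[:k]
```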

    Building a Robust Web Application

    Change is inevitable. Software applications must be prepared for that inevitable moment through structured, robust design and architecture. Using a popular n-tier architecture and sound design principles in web applications enables developers to build systems that are prepared for an unknown future. This project highlights and demonstrates robust software development techniques in a prototype web application built on an n-tier architecture. The examples are designed to convey a design philosophy that can be applied to other development efforts
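
    The project's code is not shown in the abstract; the sketch below illustrates the n-tier idea in miniature, with each tier depending only on the tier beneath it so any layer can be swapped without touching the others. All class and method names are hypothetical.

```python
# Data tier: the only layer that knows how records are stored.
class UserRepository:
    def __init__(self):
        self._rows = {1: "Ada"}

    def find(self, user_id: int) -> str | None:
        return self._rows.get(user_id)

# Business tier: rules and validation, storage-agnostic.
class UserService:
    def __init__(self, repo: UserRepository):
        self._repo = repo

    def display_name(self, user_id: int) -> str:
        name = self._repo.find(user_id)
        if name is None:
            raise KeyError(f"no user {user_id}")
        return name.title()

# Presentation tier: formats responses, knows nothing about storage.
class UserController:
    def __init__(self, service: UserService):
        self._service = service

    def get(self, user_id: int) -> dict:
        return {"name": self._service.display_name(user_id)}

print(UserController(UserService(UserRepository())).get(1))  # {'name': 'Ada'}
```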

    Phonetic normalization as a means to improve toxicity detection

    Online communities and electronic means of communication have become ever more prevalent, and with that growth the number of users who exploit them to create and spread harmful, or toxic, content has also increased. Protecting these communities makes moderation a critical matter. While it would be possible to hire a team of moderators, such a team would have to grow continuously, so most platforms turn to automatic detection as a step in their moderation process. Examples of such automatic methods are blacklists and whitelists, but harmful users can easily subvert them. A common subversion technique is substituting a word with a phonetically equivalent word or with a visually similar combination of letters. This thesis offers a novel normalization technique that uses phonetics inside a text normalizer: the normalizer reconstructs a word's pronunciation and infers the intended word from it, removing the signs of subversion. Once phonetically normalized, messages are passed on to existing classification systems for automatic moderation
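
    The thesis's normalizer is not specified in the abstract; a toy version of the idea maps both vocabulary words and incoming tokens to a rough phonetic key, then replaces a token with the vocabulary word that shares its key. The rewrite rules and vocabulary below are illustrative assumptions, not the author's method.

```python
import re

# Rough grapheme-to-sound rewrites; a real normalizer would model pronunciation.
PHONETIC_RULES = [
    (r"ph", "f"), (r"ck", "k"), (r"qu", "kw"),
    (r"[0@]", "o"), (r"4", "a"), (r"3", "e"),
    (r"[1!]", "i"), (r"[5$]", "s"),
    (r"(.)\1+", r"\1"),  # collapse repeated letters ("loooser" -> "loser")
]

def phonetic_key(token: str) -> str:
    key = token.lower()
    for pattern, repl in PHONETIC_RULES:
        key = re.sub(pattern, repl, key)
    return key

# Hypothetical vocabulary of words the classifier should see unobfuscated.
VOCAB = {phonetic_key(w): w for w in ["stupid", "idiot", "loser"]}

def normalize(text: str) -> str:
    """Replace each token with the vocabulary word sharing its phonetic key."""
    return " ".join(VOCAB.get(phonetic_key(t), t) for t in text.split())

print(normalize("you are 5tup1d"))  # -> "you are stupid"
```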