4 research outputs found

    Spam filter evaluation with imprecise ground truth

    When trained and evaluated on accurately labeled datasets, online email spam filters are remarkably effective, achieving error rates an order of magnitude better than classifiers in similar applications. But labels acquired from user feedback or third-party adjudication exhibit higher error rates than the best filters – even filters trained using the same source of labels. It is appropriate to use naturally occurring labels – including errors – as training data in evaluating spam filters. Erroneous labels are problematic, however, when used as ground truth to measure filter effectiveness: any measurement of the filter's error rate will be augmented, and perhaps masked, by the label error rate. Using two natural sources of labels, we demonstrate automatic and semi-automatic methods that reduce the influence of labeling errors on evaluation, yielding substantially more precise measurements of true filter error rates.
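
    A minimal numeric sketch of the problem the abstract describes, assuming filter errors and label errors occur independently (the function names and the independence assumption are illustrative, not from the paper):

        def observed_error(filter_err, label_err):
            """Disagreement rate between a filter and noisy gold-standard labels,
            assuming the two error processes are independent."""
            return filter_err * (1 - label_err) + label_err * (1 - filter_err)

        def corrected_error(observed, label_err):
            """Invert the relation above to estimate the true filter error rate
            from the observed disagreement rate and a known label error rate."""
            return (observed - label_err) / (1 - 2 * label_err)

        # A filter with 0.5% true error, evaluated against labels that are
        # themselves wrong 3% of the time: label noise dominates the measurement.
        obs = observed_error(0.005, 0.03)   # ~0.0347
        est = corrected_error(obs, 0.03)    # recovers ~0.005
        print(f"observed={obs:.4f}, corrected={est:.4f}")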

    Considering noise in the learning of robust predictive models for collaborative filtering

    In recommender systems, natural noise refers to the inconsistencies introduced by a user, and these inconsistencies degrade the recommender's overall performance. To date, data cleansing proposals have emerged that aim to identify and correct these inconsistent ratings. However, approaches that consider noise in the learning process achieve superior quality. In this setting, procedures have arisen that alter the cost function so that minimizing the altered function on noisy data yields the same solution as minimizing the original function on noiseless data. These procedures depend on prior knowledge of the noise distribution, and estimating that distribution requires certain assumptions about the data; in collaborative filtering, these conditions are not satisfied. In this work we propose using these cost functions to construct a predictive model that accounts for noise during learning. In addition, we present: (a) a class-noise generation heuristic for collaborative filtering problems; (b) a quantitative analysis of the noise present in datasets; (c) a robustness analysis of predictive models. To validate the proposal, the three datasets most representative of the problem were selected, and comparisons with state-of-the-art methods were carried out on them. Our results indicate that the proposal achieves prediction quality superior to the other methods on all datasets and remains competitively robust even when compared with a model that knows the noise generator a priori. Finally, this opens a new direction for methods that incorporate noise into the learning process of predictive models for collaborative filtering, and research in this direction deserves consideration.
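
    The cost-function alteration described above matches the spirit of the unbiased-estimator construction for class-conditional label noise; the sketch below is a minimal version of that general construction for binary +1/-1 labels with known flip rates, not necessarily the thesis's exact formulation:

        import numpy as np

        def unbiased_loss(loss, rho_pos, rho_neg):
            """Wrap a base loss so that its expectation under class-conditional
            label noise (flip rates rho_pos, rho_neg) equals the clean loss,
            making the noisy-data minimizer match the clean-data one."""
            denom = 1.0 - rho_pos - rho_neg
            def corrected(score, y):
                rho_y = rho_pos if y == 1 else rho_neg      # flip rate for class y
                rho_other = rho_neg if y == 1 else rho_pos  # flip rate for class -y
                return ((1 - rho_other) * loss(score, y)
                        - rho_y * loss(score, -y)) / denom
            return corrected

        # Usage with an (illustrative) logistic base loss:
        logistic = lambda score, y: np.log1p(np.exp(-y * score))
        robust = unbiased_loss(logistic, rho_pos=0.2, rho_neg=0.1)
        print(robust(1.5, +1))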

    On Design and Evaluation of High-Recall Retrieval Systems for Electronic Discovery

    High-recall retrieval is an information retrieval task model where the goal is to identify, for human consumption, all, or as many as practicable, documents relevant to a particular information need. This thesis investigates the ways in which one can evaluate high-recall retrieval systems and explores several design considerations that should be accounted for when designing such systems for electronic discovery. The primary contribution of this work is a framework for conducting high-recall retrieval experimentation in a controlled and repeatable way. This framework builds upon lessons learned from similar tasks to facilitate the use of retrieval systems on collections that cannot be distributed due to the sensitivity or privacy of the material contained within. Accordingly, a Web API is used to distribute document collections, information needs, and corresponding relevance assessments in a one-document-at-a-time manner. Validation is conducted through the successful deployment of this architecture in the 2015 TREC Total Recall track over the live Web and in controlled environments. Using the runs submitted to the Total Recall track and other test collections, we explore the efficacy of a variety of new and existing effectiveness measures for high-recall retrieval tasks. We find that summarizing the trade-off between recall and the effort required to attain that recall is a non-trivial task and that several measures are sensitive to properties of the test collections themselves. We conclude that the gain curve, a de facto standard, and variants of the gain curve are the most robust to variations in test collection properties for the evaluation of high-recall systems. This thesis also explores the effect that non-authoritative, surrogate assessors can have when training machine learning algorithms. Contrary to popular thought, we find that surrogate assessors appear to be inferior to authoritative assessors due to differences of opinion rather than innate inferiority in their ability to identify relevance. Furthermore, we show that several techniques for diversifying and liberalizing a surrogate assessor's conception of relevance can yield substantial improvement in the surrogate and, in some cases, rival the authority. Finally, we present the results of a user study conducted to investigate the effect that three archetypal high-recall retrieval systems have on judging behaviour. Compared to using random and uncertainty sampling, selecting documents for training using relevance sampling significantly decreases the probability that a user will identify that document as relevant. On the other hand, no substantial difference between the test conditions is observed in the time taken to render such assessments.
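
    As a rough illustration of the gain curve the thesis finds most robust, the sketch below computes recall as a function of review effort from an ordered sequence of relevance judgments (the function name and input format are assumptions for illustration, not the TREC Total Recall tooling):

        def gain_curve(judgments):
            """Recall as a function of review effort for a ranked run.
            `judgments` holds the 0/1 relevance labels in the order the
            documents were reviewed; returns (effort, recall) pairs."""
            total_rel = sum(judgments)
            if total_rel == 0:
                return []
            found, curve = 0, []
            for effort, rel in enumerate(judgments, start=1):
                found += rel
                curve.append((effort, found / total_rel))
            return curve

        # Example: three relevant documents found among six reviewed.
        for effort, recall in gain_curve([1, 0, 1, 0, 0, 1]):
            print(f"after {effort} docs: recall = {recall:.2f}")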