7 research outputs found

Question selection for crowd entity resolution

Author: Arasu A.
Bansal N.
Bilenko M.
Demartini G.
Elmagarmid A. K.
Franklin M. J.
Gomes R.
Hassanzadeh O.
Hernández M. A.
Ipeirotis P. G.
Law E.
Marcus A.
Park H.
Venetis P.
Wang J.
Winkler W.
Yang Y.
Publication venue: 'VLDB Endowment'

Customização em ambientes de qualidade de dados [Customization in data quality environments]

Author: Martinhago Adriana Zanella
Publication date: 01/01/2006

Advisor: Marcos Sfair Sunye. Includes appendices. Dissertation (master's) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 2006. Includes bibliography.

Repositório Digital Institucional da UFPR

Universidade Federal do Paraná

Sistema de recomendação para plataformas de e-learning [Recommender system for e-learning platforms]

Author: Tavares Bruno
Publication venue: Instituto Politécnico do Porto. Instituto Superior de Engenharia do Porto
Publication date: 01/01/2012

As more digital resources become available, finding the most relevant ones becomes harder. A new kind of tool, able to recommend the resources best suited to the user's needs, is therefore increasingly necessary. The objective of this R&D work is to implement an intelligent recommendation module (MRI) for e-learning platforms. The recommendations are based, on the one hand, on the user's profile during the training process and, on the other hand, on the requests made by the user in the form of search queries [Tavares, Faria and Martins, 2012]. E-learning 3.0 is a QREN project developed by a group of organizations whose primary objective is to implement an e-learning platform. This work is part of the e-learning 3.0 project and consists of developing the MRI. The MRI applies techniques already used in other recommender systems to create a hybrid recommender system for the e-learning platform. A user model was built to represent the relevant information about each user. All the information needed to make a recommendation is represented in that model, which is updated whenever necessary. The data in the user model are used to personalize the recommendations produced. The recommendations are divided into two types, formal and non-formal. In formal recommendation, the goal is to make suggestions related to a specific course. In non-formal recommendation, the goal is to make broader suggestions that are not associated with any course. The proposed system is capable of suggesting learning resources, based on the user profile, by combining string similarity techniques, a clustering algorithm and filtering techniques.
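The abstract does not give the module's actual algorithms, so the following is only a minimal sketch of the formal versus non-formal distinction. It assumes a hypothetical `Resource` record and swaps in plain Jaccard similarity over keyword sets for the unspecified similarity, clustering and filtering techniques; every name here (`Resource`, `jaccard`, `recommend`, `profile_terms`) is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical learning resource; not a structure taken from the thesis."""
    title: str
    keywords: frozenset
    course: Optional[str] = None  # None means the resource is not tied to a course


def jaccard(a, b):
    """Jaccard similarity between two sets of tokens (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(resources, profile_terms, query, course=None, top_k=3):
    """Rank resources against the user's profile terms plus the search query.

    With `course` set, only resources of that course are candidates (formal).
    With course=None, every resource is a candidate (non-formal).
    """
    wanted = set(profile_terms) | set(query.lower().split())
    candidates = [r for r in resources if course is None or r.course == course]
    scored = [(jaccard(wanted, set(r.keywords)), r.title) for r in candidates]
    # Filtering step: drop resources that share nothing with the user's terms.
    scored = [(score, title) for score, title in scored if score > 0]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]
```

Calling `recommend(resources, profile_terms, query, course="DB101")` would model a formal recommendation restricted to one course, while omitting `course` would model the broader non-formal case.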

Repositório Científico do Instituto Politécnico do Porto

Employing Trainable String Similarity Metrics for Information Integration

Identifying approximately duplicate objects in databases is an essential step in the information integration process. Most existing approaches have relied on generic or manually tuned distance metrics for estimating the similarity of potential duplicates. In this paper, we present a framework for improving duplicate detection using trainable measures of textual similarity. We propose to employ learnable text distance functions for each data field, and introduce an extended variant of learnable string edit distance based on an Expectation-Maximization (EM) training algorithm. Experimental results on a range of datasets show that this similarity metric is capable of adapting to the specific notions of similarity that are appropriate for different domains. Our overall system, MARLIN, utilizes support vector machines to combine multiple similarity metrics; these are shown to perform better than ensembles of decision trees, which were employed for this task in previous work.
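To make the idea of per-field similarity concrete, here is a minimal sketch that computes one similarity score per data field for a pair of records. It is not MARLIN: it uses classic Levenshtein distance with fixed unit costs rather than edit costs learned by EM, and it stops at the feature vector instead of training a support vector machine on it. The function names and the dictionary record layout are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def field_features(rec_a: dict, rec_b: dict, fields: list) -> list:
    """One similarity score per field; a classifier would consume this vector."""
    return [
        similarity(rec_a.get(f, "").lower(), rec_b.get(f, "").lower())
        for f in fields
    ]
```

In a trainable setting, the fixed unit costs would be replaced by learned, field-specific costs, and the resulting vectors for labeled duplicate and non-duplicate pairs would be used to train the classifier that combines them.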

CiteSeerX

Unsupervised Duplicate Detection Using Sample Non-duplicates

Author: A.P. Dempster
H. Pasula
H.B. Newcombe
I.P. Fellegi
J. Shi
K.W. Church
L. Sachs
M.A. Hernandez
M.D. Larsen
M.G. Elfeky
P. Lehti
R. Baeza-Yates
S. Russell
S. Tejada
V.I. Levenshtein
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Crossref

Active duplicate detection with Bayesian nonparametric models

Author: Matsakis Nicholas E. (Nicholas Elias), 1976-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2010

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 129-137).

When multiple databases are merged, an essential step is identifying sets of records that refer to the same entity. Called duplicate detection, this task is typically tedious to perform manually, and so a variety of automated methods have been developed for partitioning a collection of records into coreference sets. This task is complicated by ambiguous or noisy field values, so systems are typically domain-specific and often fitted to a representative labeled training corpus. Once fitted, such systems can estimate a partition of a similar corpus without human intervention. While this approach has many applications, it is often infeasible to encode the appropriate domain knowledge a priori or to identify suitable training data. To address such cases, this thesis uses an active framework for duplicate detection, wherein the system initially estimates a partition of a test corpus without training, but is then allowed to query a human user about the coreference labeling of a portion of the corpus. The responses to these queries are used to guide the system in producing improved partition estimates and further queries of interest. This thesis describes a complete implementation of this framework with three technical contributions: a domain-independent Bayesian model expressing the relationship between the unobserved partition and the observed field values of a set of database records; a criterion for picking informative queries based on the mutual information between the response and the unobserved partition; and an algorithm for estimating a minimum-error partition under a Bayesian model through a reduction to the well-studied problem of correlation clustering. It also presents experimental results demonstrating the effectiveness of this method in a variety of data domains.

by Nicholas Elias Matsakis. Ph.D.
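The mutual-information query criterion can be illustrated with a small sketch under two simplifying assumptions that are not taken from the thesis: the posterior over partitions is approximated by a list of sampled partitions, and the user answers pairwise "same entity?" queries without error. Under the second assumption the answer is fully determined by the partition, so the mutual information between answer and partition reduces to the entropy of the answer, and the most informative pair is the one whose coreference probability is closest to 0.5. All function names and the sample format are hypothetical.

```python
from itertools import combinations
from math import log2


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no answer that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def coreference_probability(samples, i, j) -> float:
    """Fraction of sampled partitions that place records i and j together.

    Each sample is a list of cluster labels, one label per record.
    """
    together = sum(1 for labels in samples if labels[i] == labels[j])
    return together / len(samples)


def most_informative_pair(samples):
    """Return the record pair whose answer has maximal entropy, and that entropy."""
    n = len(samples[0])
    best_pair, best_score = None, -1.0
    for i, j in combinations(range(n), 2):
        score = binary_entropy(coreference_probability(samples, i, j))
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair, best_score
```

With a noisy user, the conditional entropy of the answer given the partition would no longer be zero and would have to be subtracted from the answer entropy; that case is left out of this sketch.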

DSpace@MIT