Hierarchical structuring of Cultural Heritage objects within large aggregations

Author: C. Gennaro
K. Grieser
M. Hall
N. Aletras
N. Takhirov
P. Papadakos
R. Cilibrasi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Huge amounts of cultural content have been digitised and are available through digital libraries and aggregators like Europeana.eu. However, it is not easy for a user to get an overall picture of what is available or to find related objects. We propose a method for hierarchically structuring cultural objects at different similarity levels. We describe a fast, scalable clustering algorithm with an automated field selection method for finding semantic clusters. We report a qualitative evaluation of the cluster categories based on records from the UK and a quantitative one on the results from the complete Europeana dataset.
Comment: The paper has been published in the proceedings of the TPDL conference, see http://tpdl2013.info. For the final version see http://link.springer.com/chapter/10.1007%2F978-3-642-40501-3_2

arXiv.org e-Print Archive

Crossref

VU Research Portal

FAME: Face Association through Model Evolution

Author: Duygulu Pinar
Golge Eren
Publication date: 10/07/2014

We attack the problem of learning face models for public faces from weakly-labelled images collected from the web by querying a name. The data is very noisy even after face detection, with several irrelevant faces corresponding to other people. We propose a novel method, Face Association through Model Evolution (FAME), that prunes the data iteratively so that the face models associated with a name evolve. The idea is based on capturing the discriminativeness and representativeness of each instance and eliminating the outliers. The final models are used to classify faces on novel datasets with possibly different characteristics. On benchmark datasets, our results are comparable to or better than state-of-the-art studies for the task of face identification.
Comment: Draft version of the stud

arXiv.org e-Print Archive

Crossref

Evolution of a Web-Scale Near Duplicate Image Detection System

Author: Gusev Andrey
Xu Jiajing
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/09/2022

Detecting near duplicate images is fundamental to the content ecosystem of photo sharing web applications. However, such a task is challenging when involving a web-scale image corpus containing billions of images. In this paper, we present an efficient system for detecting near duplicate images across 8 billion images. Our system consists of three stages: candidate generation, candidate selection, and clustering. We also demonstrate that this system can be used to greatly improve the quality of recommendations and search results across a number of real-world applications. In addition, we include the evolution of the system over the course of six years, bringing out experiences and lessons on how new systems are designed to accommodate organic content growth as well as the latest technology. Finally, we are releasing a human-labeled dataset of ~53,000 pairs of images introduced in this paper
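The three stages named in the abstract (candidate generation, candidate selection, clustering) can be illustrated with a toy sketch. Everything concrete here is an assumption for illustration, not the system described in the paper: images are represented by small non-zero embedding vectors, candidates are generated by bucketing on the signs of the first components (a crude stand-in for a locality-sensitive hash), selection uses a cosine-similarity threshold, and clustering is connected components via union-find. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def generate_candidates(embeddings, bits=2):
    """Stage 1: bucket images by the signs of their first `bits` components
    and yield every pair that shares a bucket (hypothetical hashing scheme)."""
    buckets = defaultdict(list)
    for image_id, vec in embeddings.items():
        key = tuple(x >= 0 for x in vec[:bits])
        buckets[key].append(image_id)
    for members in buckets.values():
        yield from combinations(members, 2)


def select_candidates(pairs, embeddings, threshold):
    """Stage 2: keep only pairs whose cosine similarity reaches the threshold."""
    return [(a, b) for a, b in pairs
            if cosine(embeddings[a], embeddings[b]) >= threshold]


def cluster(ids, pairs):
    """Stage 3: merge accepted pairs into clusters with union-find."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    return sorted(sorted(g) for g in groups.values())


def find_near_duplicates(embeddings, threshold=0.95):
    """Run the three stages in order and return the clusters."""
    pairs = generate_candidates(embeddings)
    accepted = select_candidates(pairs, embeddings, threshold)
    return cluster(list(embeddings), accepted)
```

The point of the staging is that the cheap bucketing step limits how many pairs ever reach the more expensive similarity check; a real system at billions of images would need a far more careful hashing and verification design than this sketch.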

arXiv.org e-Print Archive

An Efficient Approach for Finding Near Duplicate Web pages using Minimum Weight Overlapping Method

Author: Das Shine N
Mathew Midhun
Vijayaraghavan Pramod K.
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 29/10/2011

The existence of billions of web documents has severely affected the performance and reliability of web search. Near-duplicate web pages play an important role in this performance degradation when data from heterogeneous sources are integrated. Web mining faces huge problems due to the existence of such documents. These pages increase the index storage space and thereby the serving cost. Introducing efficient methods to detect and remove such documents from the Web not only decreases computation time but also increases the relevance of search results. We present a novel idea for finding near-duplicate web pages that can be incorporated into plagiarism detection, spam detection, and focused web crawling. We propose an efficient method for finding near duplicates of an input web page in a huge repository. The proposed TDW-matrix-based algorithm has three phases: rendering, filtering, and verification. It receives an input web page and a threshold in the first phase, applies prefix filtering and positional filtering to reduce the size of the record set in the second phase, and returns an optimal set of near-duplicate web pages in the verification phase using the Minimum Weight Overlapping (MWO) method. The experimental results show that our algorithm performs well on two benchmark measures, precision and recall, and reduces the size of the competing record set.
DOI: http://dx.doi.org/10.11591/ijece.v1i2.7
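The filter-then-verify structure mentioned in the abstract can be sketched with the standard prefix-filtering idea for set similarity. This is a minimal illustration under stated assumptions, not the paper's algorithm: documents are non-empty token sets, the similarity measure is plain Jaccard (swapped in for the TDW matrix and the MWO method, which the abstract does not specify), positional filtering is omitted, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold):
    """Return the set of id pairs whose Jaccard similarity is >= threshold.

    docs maps a document id to a non-empty set of tokens;
    threshold is assumed to lie in (0, 1].
    """
    # Global token order: rarest tokens first, so prefixes are selective.
    freq = Counter(tok for toks in docs.values() for tok in toks)
    index = defaultdict(list)  # token -> ids whose prefix contains it
    results = set()
    for doc_id, toks in docs.items():
        ordered = sorted(toks, key=lambda t: (freq[t], t))
        n = len(ordered)
        # Two sets with Jaccard >= threshold must share a token within
        # their first n - ceil(threshold * n) + 1 tokens (filtering phase).
        # The small epsilon guards against floating-point round-up.
        prefix_len = n - math.ceil(threshold * n - 1e-9) + 1
        candidates = set()
        for tok in ordered[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(doc_id)
        # Verification phase: compute the exact similarity for candidates only.
        for other in candidates:
            if jaccard(docs[other], toks) >= threshold:
                results.add(tuple(sorted((other, doc_id))))
    return results
```

For example, with a threshold of 0.5, two four-token pages that share three tokens are reported as a pair, while a page with no tokens in common with them is never compared at all, because it shares no prefix token with either.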

IAES journal

Institute of Advanced Engineering and Science

Network Archaeology: Uncovering Ancient Networks from Present-day Interactions

Author: A Ahmed
A Kreimer
A Mithani
A Vazquez
A Vázquez
A Wagner
AC Gavin
AL Barabási
B Manna
BP Kelley
C Tantipathananandh
C Wiuf
Carl Kingsford
DJ de Solla Price
DJ Watts
DS Callaway
E Sprinzak
ED Levy
F Guo
F Hormozdiari
G Palla
H Ebel
H Huang
HA Simon
HB Fraser
I Bezáková
I Ispolatov
J Bar-Ilan
J Dutkowski
J Felsenstein
J Flannick
J Golbeck
J Hopcroft
J Leskovec
JB Pereira-Leal
Joel S. Bader
JW Pinney
JW Thornton
L Hakes
LA Goodman
M Middendorf
P Shannon
R Kumar
R Milo
R Singh
RL Tatusov
S Hanneke
S Kerrien
S Li
S Navlakha
S Redner
Saket Navlakha
T Makino
TA Gibson
U Güldener
WK Kim
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 30/08/2010

Often questions arise about old or extinct networks. What proteins interacted in a long-extinct ancestor species of yeast? Who were the central players in the Last.fm social network 3 years ago? Our ability to answer such questions has been limited by the unavailability of past versions of networks. To overcome these limitations, we propose several algorithms for reconstructing a network's history of growth given only the network as it exists today and a generative model by which the network is believed to have evolved. Our likelihood-based method finds a probable previous state of the network by reversing the forward growth model. This approach retains node identities so that the history of individual nodes can be tracked. We apply these algorithms to uncover older, non-extant biological and social networks believed to have grown via several models, including duplication-mutation with complementarity, forest fire, and preferential attachment. Through experiments on both synthetic and real-world data, we find that our algorithms can estimate node arrival times, identify anchor nodes from which new nodes copy links, and reveal significant features of networks that have long since disappeared.
Comment: 16 pages, 10 figure
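The idea of stepping a network backwards one node at a time can be illustrated with a deliberately crude heuristic. The sketch below is not the likelihood-based method of the paper: it assumes, purely for illustration, that the most recently arrived node is the one with the lowest current degree (loosely motivated by preferential attachment, where older nodes tend to accumulate more links), and it breaks ties by the largest label. The function name and the tie-breaking rule are hypothetical.

```python
def estimate_arrival_order(edges):
    """Estimate node arrival order for an undirected graph.

    edges is an iterable of (u, v) pairs with orderable node labels.
    Returns nodes from estimated earliest to estimated latest arrival.
    """
    # Build an adjacency map of the present-day network.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    removed = []
    while adj:
        # Assumed heuristic: the lowest-degree node arrived last.
        # Sorting in reverse makes min() pick the largest label on ties.
        node = min(sorted(adj, reverse=True), key=lambda n: len(adj[n]))
        # Undo its arrival: delete the node and all of its links.
        for neighbour in adj.pop(node):
            adj[neighbour].discard(node)
        removed.append(node)

    # Nodes removed last are estimated to have arrived first.
    return removed[::-1]
```

Each loop iteration produces one estimated previous state of the network while keeping node identities, which is the property the abstract highlights; replacing the degree heuristic with a model-specific likelihood is where a real reconstruction would differ.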

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central