63 research outputs found

Copy detection mechanisms for digital documents

Author: Garrett J. R.
Griswold G. N.
Héctor García-Molina
James Davis
Kahn R. E.
Manber U.
Sergey Brin
Wheeler D.
Publication venue: 'Association for Computing Machinery (ACM)'

Discovering gene annotations in biomedical text databases

Author: A Cakmak
Ali Cakmak
Burr Settles
Chin-Yew Lin
Deepak Ravichandran
DV Kalashnikov
E Camon
Ellen Riloff
Eugene Agichtein
G Salton
Gideon S Mann
Gultekin Ozsoyoglu
Jiawei Han
JoonHo Lee
K Asakawa
K Asako
KarenSparck Jones
L Lovasz
Michael Fleischman
Oren Etzioni
Philip Resnik
PW Lord
Roy Rada
S Raychaudhuri
S White
Sergey Brin
The Gene Ontology Consortium
Tomonori Izumitani
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is a clear need for automated computational tools that annotate genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. Results: In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products from article abstracts in PubMed, and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns" and a semantic matching framework to locate phrases matching a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision of 78% at a recall of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion: GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieves high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme than "exact matching", with the advantage of locating approximate pattern occurrences with similar semantics. The relatively low recall of our pattern-based approach may be improved either by employing a probabilistic annotation framework based on the annotation neighbourhoods in textual data or, alternatively, by lowering the statistical enrichment threshold for applications that place more value on achieving higher recall.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Theories for influencer identification in complex networks

In social and biological systems, the structural heterogeneity of interaction networks gives rise to the emergence of a small set of influential nodes, or influencers, in a series of dynamical processes. Although much smaller than the entire network, these influencers were observed to be able to shape the collective dynamics of large populations in different contexts. As such, the successful identification of influencers should have profound implications in various real-world spreading dynamics such as viral marketing, epidemic outbreaks and cascading failure. In this chapter, we first summarize the centrality-based approach in finding single influencers in complex networks, and then discuss the more complicated problem of locating multiple influencers from a collective point of view. Progress rooted in collective influence theory, belief-propagation and computer science will be presented. Finally, we present some applications of influencer identification in diverse real-world systems, including online social platforms, scientific publication, brain networks and socioeconomic systems. Comment: 24 pages, 6 figures

arXiv.org e-Print Archive

Crossref

Googling the Grey: Open Data, Web Services, and Semantics

Author: Andrew Baines
Christine Borgman
Cindy Stankowski
Cori Hayden
David Schloen
Dean R. Snow
Eric C. Kansa
Eric Kansa
Francis P. McManamon
Geoffrey C. Bowker
George P. Nicholas
Jennifer Trant
Karl-Heinz Lampe
Keith Kintigh
Kimberly Christen
Margie M. Burton
Martin Doerr
Michael Brown
Robin Boast
Sarah Whitcher Kansa
Sergey Brin
Tim Brody
Timothy J. Barringer
Publication venue: 'Springer Science and Business Media LLC'

Crossref

GIANT: Scalable Creation of a Web-scale Ontology

Author: Adomavicius Gediminas
Brin Sergey
Cordeiro Mário
Devlin Jacob
Doddington George R
Fader Anthony
Frantzi Katerina
Grishman Ralph
Ji Heng
Koo Terry
McClosky David
Mihalcea Rada
Pasca Marius
Pawar Sachin
Ritter Alan
Sha Lei
Smirnova Alisa
Witten Ian H
Zhang Ziqi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/04/2020

Understanding what online users may pay attention to is key to content recommendation and search services. These services will benefit from a highly structured and web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of the online population. Nor is a logically structured ontology maintained among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology containing a large number of natural language phrases that conform to user attention at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present the graph-neural-network-based techniques used in GIANT, and evaluate the proposed methods against a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that Attention Ontology can significantly improve click-through rates in news recommendation. Comment: Accepted as full paper by SIGMOD 202

arXiv.org e-Print Archive

Crossref

Uso de ontologías para la mejora de resultados de motores de búsqueda web (Using ontologies to improve web search engine results)

Author: Bernard Jansen
Craig Silverstein
Dennis Wackerly
Dulce Aguilar-López
Erik Selberg
Gerard Salton
Jaime Bocio
Jorge Morato
Kevin Droegemeier
Mariano Fernández-López
Mark Chignell
Michael Lesk
Michel Dumontier
Mingxia Gao
Natalya Noy
Prasanna Ganesan
Rahul Ramachandran
Sergey Brin
Siddharth Patwardhan
Thomas Gruber
Publication venue: 'Ediciones Profesionales de la Informacion SL'

Crossref

Near Neighbor Search in Large Metric Spaces

Author: Sergey Brin

Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional or, more generally, is represented by a point in a large metric space, and distance calculations are computationally expensive. In this paper we introduce a data structure for this problem called a GNAT (Geometric Near-neighbor Access Tree). It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNATs outperform previous data structures in a number of applications.

CiteSeerX