Characterizing Interdisciplinarity of Researchers and Research Topics Using Web Search Engines
Researchers' networks have been subject to active modeling and analysis.
Earlier literature mostly focused on citation or co-authorship networks
reconstructed from annotated scientific publication databases, which have
several limitations. Recently, general-purpose web search engines have also
been utilized to collect information about social networks. Here we
reconstructed, using web search engines, a network representing the relatedness
of researchers to their peers as well as to various research topics.
Relatedness between researchers and research topics was characterized by the
visibility boost: the increase in a researcher's visibility when the search is
focused on a particular topic. It was observed that researchers who received
high visibility boosts from the same research topic tended to be close to each
other in their network. We calculated correlations between visibility boosts by
research topics and researchers' interdisciplinarity at the individual level
(diversity of topics related to the researcher) and at the social level (the
researcher's centrality in
the researchers' network). We found that visibility boosts by certain research
topics were positively correlated with researchers' individual-level
interdisciplinarity despite their negative correlations with the general
popularity of researchers. It was also found that visibility boosts by
network-related topics had positive correlations with researchers' social-level
interdisciplinarity. Research topics' correlations with researchers'
individual- and social-level interdisciplinarities were found to be nearly
independent of each other. These findings suggest that the notion of
"interdisciplinarity" of a researcher should be understood as a
multi-dimensional concept that should be evaluated using multiple means of
assessment.
Comment: 20 pages, 7 figures. Accepted for publication in PLoS ONE
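The abstract does not give the exact formula for the visibility boost, but one plausible reading can be sketched from search-engine hit counts. The function name, its parameters, and the definition itself are assumptions for illustration, not the paper's method:

```python
def visibility_boost(hits_researcher, hits_researcher_and_topic,
                     hits_total, hits_topic):
    """One plausible reading of 'visibility boost': how much more visible
    a researcher is among pages on a topic than on the web at large.
    The paper's actual definition may differ."""
    visibility_overall = hits_researcher / hits_total
    visibility_in_topic = hits_researcher_and_topic / hits_topic
    return visibility_in_topic / visibility_overall

# A researcher on 1,000 of 10^9 pages overall, but 100 of 10^6 topic pages,
# is 100x more visible within that topic.
visibility_boost(1000, 100, 10**9, 10**6)
```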
Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval
We summarize math search engines and search interfaces produced by the
Document and Pattern Recognition Lab in recent years, and in particular the min
math search interface and the Tangent search engine. Source code for both
systems is publicly available. "The Masses" refers to our emphasis on creating
systems for mathematical non-experts, who may be looking to define unfamiliar
notation, or browse documents based on the visual appearance of formulae rather
than their mathematical semantics.
Comment: Paper for Invited Talk at the 2015 Conference on Intelligent Computer
Mathematics (July, Washington, DC)
Extraction of Keyphrases from Text: Evaluation of Four Algorithms
This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each of the algorithms was evaluated by the degree to which the algorithm's keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft's Word 97, (2) an algorithm based on Eric Brill's part-of-speech tagger, (3) the Summarize feature in Verity's Search 97, and (4) NRC's Extractor algorithm. For all five document collections, NRC's Extractor yields the best match with the manually generated keyphrases.
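The evaluation described above scores an extractor by how well its output matches the manual keyphrases. A minimal sketch of such a score, assuming exact case-insensitive matching (the report's actual criterion may allow stemmed or partial matches):

```python
def keyphrase_match(extracted, target):
    """Fraction of manually assigned keyphrases recovered by an extractor.

    Exact-match scoring after lowercasing; this matching criterion is an
    assumption for illustration.
    """
    extracted = {k.lower() for k in extracted}
    target = {k.lower() for k in target}
    return len(extracted & target) / len(target)

# Recovers "search engines" but misses "ranking" -> 1 of 2 targets matched.
keyphrase_match(["neural networks", "search engines", "indexing"],
                ["Search Engines", "ranking"])
```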
Automated Detection of Usage Errors in non-native English Writing
In an investigation of the use of a novelty detection algorithm for
identifying inappropriate word combinations in a raw English corpus, we
employ an unsupervised detection algorithm based on one-class support
vector machines (OC-SVMs) and extract sentences containing word sequences
whose frequency of appearance is significantly low in native English
writing. Combined with n-gram language models and document categorization
techniques, the OC-SVM classifier assigns given sentences to two groups:
sentences containing errors and those without errors. Accuracies are
79.30% with the bigram model, 86.63% with the trigram model, and 34.34%
with the four-gram model.
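The core signal above is word sequences that are rare in native writing. A minimal stand-in for that frequency criterion (not the OC-SVM itself, and with a toy corpus and threshold as assumptions) can be sketched as:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical toy native corpus; the paper's reference counts come from
# a large corpus of native English writing.
native_corpus = [
    "i am interested in this topic".split(),
    "i am interested in natural language".split(),
    "she is interested in this problem".split(),
]
counts = Counter(g for sent in native_corpus for g in ngrams(sent, 2))

def flag_rare_bigrams(tokens, min_count=1):
    """Return bigrams whose native-corpus frequency is suspiciously low;
    a simplified frequency filter standing in for the OC-SVM decision."""
    return [g for g in ngrams(tokens, 2) if counts[g] < min_count]

# "interested to" never occurs in the native corpus, so it is flagged.
flag_rare_bigrams("i am interested to this topic".split())
```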
Normalized Web Distance and Word Similarity
There is a great deal of work in cognitive psychology, linguistics, and
computer science, about using word (or phrase) frequencies in context in text
corpora to develop measures for word similarity or word association, going back
to at least the 1960s. The goal of this chapter is to introduce the
normalized web distance (NWD) method to determine similarity between words
and phrases. It is a general way to tap the amorphous low-grade knowledge
available for free on the Internet, typed in by local users aiming at personal
gratification of diverse objectives, and yet globally achieving what is
effectively the largest semantic electronic database in the world. Moreover,
this database is available for all by using any search engine that can return
aggregate page-count estimates for a large range of search queries. In the
paper introducing the NWD it was called the `normalized Google distance
(NGD),' but since Google doesn't allow computer searches anymore, we opt for
the more neutral and descriptive NWD.
Comment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural
Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau
Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN
978-142008592
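The NWD is computed from aggregate page counts. A minimal sketch of the standard formula, with f(x), f(y) the page counts for the individual terms, f(x,y) the count for their co-occurrence, and N the (estimated) total number of indexed pages:

```python
from math import log

def nwd(fx, fy, fxy, n):
    """Normalized Web Distance from aggregate page counts.

    fx, fy : page counts for the individual search terms
    fxy    : page count for the terms occurring together
    n      : total number of indexed pages (a rough estimate in practice)
    """
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(n) - min(lx, ly))

# Terms that always co-occur have distance 0; rarely co-occurring
# terms have a larger distance.
nwd(1000, 1000, 1000, 10**9)
```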