Extraction of Keyphrases from Text: Evaluation of Four Algorithms

Author: Turney Peter
Publication date: 01/01/1997

This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each of the algorithms was evaluated by the degree to which the algorithm's keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft's Word 97, (2) an algorithm based on Eric Brill's part-of-speech tagger, (3) the Summarize feature in Verity's Search 97, and (4) NRC's Extractor algorithm. For all five document collections, NRC's Extractor yields the best match with the manually generated keyphrases.
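
The evaluation idea in this abstract, scoring an algorithm by how well its keyphrases match a hand-made target set, can be illustrated with a small sketch. The abstract does not state the exact matching criterion used in the report, so the version below assumes exact matching after lowercasing and whitespace normalization; the function names `normalize` and `match_scores` are hypothetical.

```python
def normalize(phrase: str) -> str:
    """Lowercase a phrase and collapse runs of whitespace (assumed matching rule)."""
    return " ".join(phrase.lower().split())


def match_scores(extracted, target):
    """Return (precision, recall) of extracted keyphrases against a target set.

    A phrase counts as a match only if it is identical to a target phrase
    after normalization. This is an illustrative assumption, not the
    report's documented criterion.
    """
    ext = {normalize(p) for p in extracted}
    tgt = {normalize(p) for p in target}
    hits = len(ext & tgt)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(tgt) if tgt else 0.0
    return precision, recall


if __name__ == "__main__":
    machine = ["Neural Networks", "learning", "data"]
    human = ["neural networks", "machine learning"]
    print(match_scores(machine, human))
```

A stricter or looser rule (for example, matching on word stems) would change the scores, so the choice of matching rule should be reported alongside any comparison.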

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

A Hybrid Radial Basis Function - Pseudospectral Method for Thermal Convection in a 3-D Spherical Shell

Author: Flyer Natasha
Wright Grady B.
Publication date: 01/01/2009

A novel hybrid spectral method that combines radial basis function (RBF) and Chebyshev pseudospectral (PS) methods in a “2+1” approach is presented for numerically simulating thermal convection in a 3-D spherical shell. This is the first study to apply RBFs to a full 3-D physical model in spherical geometry. In addition to being spectrally accurate, RBFs are not defined in terms of any surface-based coordinate system such as spherical coordinates. As a result, when used in the lateral directions, as in this study, they completely circumvent the pole issue, with the further advantage that nodes can be “scattered” over the surface of a sphere. In the radial direction, Chebyshev polynomials are used, which are also spectrally accurate and provide the necessary clustering near the boundaries to resolve boundary layers. Applications of this new hybrid methodology are given to the problem of convection in the Earth’s mantle, which is modeled by a Boussinesq fluid at infinite Prandtl number. To see whether this numerical technique warrants further investigation, the study limits itself to an isoviscous mantle. Benchmark comparisons are presented with other currently used mantle convection codes for Rayleigh numbers 7 × 10^3 and 10^5. The algorithmic simplicity of the code (mostly due to RBFs) allows it to be written in less than 400 lines of Matlab and run on a single workstation. We find that our method is very competitive with those currently used in the literature.
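
The remark that Chebyshev discretization clusters nodes near the boundaries can be seen in a short sketch. It assumes the standard Chebyshev-Gauss-Lobatto points x_j = cos(πj/N) mapped linearly onto a radial interval; the abstract does not say which Chebyshev grid the authors use, and the function name and the shell radii below are hypothetical.

```python
import math


def chebyshev_radial_nodes(n, r_inner, r_outer):
    """Map Chebyshev-Gauss-Lobatto points from [-1, 1] onto [r_inner, r_outer].

    Returns n + 1 radii ordered from r_outer down to r_inner.
    """
    mid = 0.5 * (r_outer + r_inner)
    half = 0.5 * (r_outer - r_inner)
    return [mid + half * math.cos(math.pi * j / n) for j in range(n + 1)]


if __name__ == "__main__":
    # Hypothetical shell radii, chosen only for illustration.
    nodes = chebyshev_radial_nodes(8, 1.0, 2.0)
    gaps = [nodes[j] - nodes[j + 1] for j in range(len(nodes) - 1)]
    # The gaps are smallest next to the two boundaries and largest mid-shell.
    for r, g in zip(nodes, gaps):
        print(f"r = {r:.4f}, spacing to next node = {g:.4f}")
```

The shrinking spacing toward both ends of the interval is what lets a modest number of radial nodes resolve thin boundary layers.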

Oxford University Research Archive

Learning to Extract Keyphrases from Text

Author: Turney Peter
Publication date: 01/01/1999

Many academic journals ask their authors to provide a list of about five to fifteen key words, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a surprisingly wide variety of tasks for which keyphrases are useful, as we discuss in this paper. Recent commercial software, such as Microsoft's Word 97 and Verity's Search 97, includes algorithms that automatically extract keyphrases from documents. In this paper, we approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for this task. The third set of experiments examines the performance of GenEx on the task of metadata generation, relative to the performance of Microsoft's Word 97. The fourth and final set of experiments investigates the performance of GenEx on the task of highlighting, relative to Verity's Search 97. The experimental results support the claim that a specialized learning algorithm (GenEx) can generate better keyphrases than a general-purpose learning algorithm (C4.5) and the non-learning algorithms that are used in commercial software (Word 97 and Search 97).

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

Using noun phrases extraction for the improvement of hybrid clustering with text- and citation-based components. The example of “Information Systems Research”

Author: Glänzel Wolfgang
Meyer Martin S.
Thijs Bart
Publication date: 29/06/2015

The hybrid clustering approach combining lexical and link-based similarities suffered for a long time from the different properties of the underlying networks. We propose a method based on noun phrase extraction using natural language processing to improve the measurement of the lexical component. Term shingles of different lengths are created from each of the extracted noun phrases. Hybrid networks are built based on a weighted combination of the two types of similarities with seven different weights. We conclude that removing all single-term shingles provides the best results at the level of computational feasibility, comparability with bibliographic coupling and also in a community detection application.
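
As a rough illustration of the shingling step mentioned in this abstract, the sketch below builds term shingles from a noun phrase. The abstract does not define shingles precisely; this version assumes they are contiguous word sequences of every length within the phrase, and the function name `term_shingles` and its `min_len` parameter are hypothetical.

```python
def term_shingles(noun_phrase, min_len=1):
    """Return all contiguous word sequences of at least min_len terms.

    Shingles are listed by increasing length, then by position in the phrase.
    """
    terms = noun_phrase.lower().split()
    shingles = []
    for length in range(min_len, len(terms) + 1):
        for start in range(len(terms) - length + 1):
            shingles.append(" ".join(terms[start:start + length]))
    return shingles


if __name__ == "__main__":
    phrase = "hybrid clustering approach"
    print(term_shingles(phrase))             # all shingle lengths
    print(term_shingles(phrase, min_len=2))  # single-term shingles removed
```

Setting `min_len=2` corresponds to the variant the abstract reports as giving the best results, in which all single-term shingles are dropped.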

Kent Academic Repository

Iterative approach to computational enzyme design

Author: Blomberg Rebecca
Chica Roberto A.
Hilvert Donald
Houk Kendall N.
Kiss Gert
Lee Toni M.
Mayo Stephen L.
Privett Heidi K.
Thomas Leonard M.
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2012

A general approach for the computational design of enzymes to catalyze arbitrary reactions is a goal at the forefront of the field of protein design. Recently, computationally designed enzymes have been produced for three chemical reactions through the synthesis and screening of a large number of variants. Here, we present an iterative approach that has led to the development of the most catalytically efficient computationally designed enzyme for the Kemp elimination to date. Previously established computational techniques were used to generate an initial design, HG-1, which was catalytically inactive. Analysis of HG-1 with molecular dynamics simulations (MD) and X-ray crystallography indicated that the inactivity might be due to bound waters and high flexibility of residues within the active site. This analysis guided changes to our design procedure, moved the design deeper into the interior of the protein, and resulted in an active Kemp eliminase, HG-2. The cocrystal structure of this enzyme with a transition state analog (TSA) revealed that the TSA was bound in the active site, interacted with the intended catalytic base in a catalytically relevant manner, but was flipped relative to the design model. MD analysis of HG-2 led to an additional point mutation, HG-3, that produced a further threefold improvement in activity. This iterative approach to computational enzyme design, including detailed MD and structural analysis of both active and inactive designs, promises a more complete understanding of the underlying principles of enzymatic catalysis and furthers progress toward reliably producing active enzymes

PubMed Central

Caltech Authors

ZHAW digitalcollection

Nanoscale alpha-structural domains in the phonon-glass thermoelectric material beta-Zn4Sb3

Author: Billinge S. J.
Božin E. S.
Haile S. M.
Kim H. J.
Snyder G. J.
Publication venue: 'American Physical Society (APS)'
Publication date: 01/04/2007

A study of the local atomic structure of the promising thermoelectric material beta-Zn4Sb3, using atomic pair distribution function (PDF) analysis of x-ray- and neutron-diffraction data, suggests that the material is nanostructured. The local structure of the beta phase closely resembles that of the low-temperature alpha phase. The alpha structure contains ordered zinc interstitial atoms which are not long range ordered in the beta phase. A rough estimate of the domain size from a visual inspection of the PDF is <~10 nm. It is probable that the nanoscale domains found in this study play an important role in the exceptionally low thermal conductivity of beta-Zn4Sb3

Caltech Authors

Exchange biasing of single-domain Ni nanoparticles spontaneously grown in an antiferromagnetic MnO matrix

Author: Abramoff M D
Bérar J F
Daniel P Shoemaker
Madeleine Grossman
Ram Seshadri
Sort J
Suenaga S
Yi J-Y
Publication venue: 'IOP Publishing'
Publication date: 15/10/2007

Exchange biased composites of ferromagnetic single-domain Ni nanoparticles embedded within large grains of MnO have been prepared by reduction of Ni_xMn_{1-x}O_4 phases in flowing hydrogen. The Ni precipitates are 15-30 nm in extent, and the majority are completely encased within the MnO matrix. The manner in which the Ni nanoparticles are spontaneously formed imparts a high ferromagnetic-antiferromagnetic interface/volume ratio, which results in substantial exchange bias effects. Exchange bias fields of up to 100 Oe are observed, in cases where the starting Ni content x in the precursor Ni_xMn_{1-x}O_4 phase is small. For particles of approximately the same size, the exchange bias leads to significant hardening of the magnetization, with the coercive field scaling nearly linearly with the exchange bias field.

Comment: 6 pages PDFLaTeX with 9 figures

arXiv.org e-Print Archive

Crossref