Models of Co-occurrence
A model of co-occurrence in bitext is a boolean predicate that indicates
whether a given pair of word tokens co-occur in corresponding regions of the
bitext space. Co-occurrence is a precondition for the possibility that two
tokens might be mutual translations. Models of co-occurrence are the glue that
binds methods for mapping bitext correspondence with methods for estimating
translation models into an integrated system for exploiting parallel texts.
Different models of co-occurrence are possible, depending on the kind of bitext
map that is available, the language-specific information that is available, and
the assumptions made about the nature of translational equivalence. Although
most statistical translation models are based on models of co-occurrence,
modeling co-occurrence correctly is more difficult than it may at first appear.
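
To make the "boolean predicate" framing concrete, here is a minimal sketch in Python. It assumes the simplest kind of bitext map, a set of aligned segment pairs, and treats two tokens as co-occurring when their segments are aligned. The names `Token`, `cooccur`, and `alignment` are hypothetical and chosen for illustration; the abstract does not specify an implementation, and richer models would use finer-grained regions of the bitext space.

```python
from typing import Set, Tuple

# Assumption: a token is identified by (segment index, position in segment).
Token = Tuple[int, int]


def cooccur(src: Token, tgt: Token, alignment: Set[Tuple[int, int]]) -> bool:
    """Return True if the source and target tokens lie in aligned segments."""
    return (src[0], tgt[0]) in alignment


# Hypothetical bitext map: source segment i corresponds to target segment j.
alignment = {(0, 0), (1, 1), (1, 2)}

# Token 3 of source segment 1 versus token 0 of target segment 2.
print(cooccur((1, 3), (2, 0), alignment))
```

Under this sketch, only token pairs for which `cooccur` holds would be passed on as candidate translation pairs when estimating a translation model.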
Fixed versus Dynamic Co-Occurrence Windows in TextRank Term Weights for Information Retrieval
TextRank is a variant of PageRank typically used in graphs that represent
documents, and where vertices denote terms and edges denote relations between
terms. Quite often the relation between terms is simple term co-occurrence
within a fixed window of k terms. The output of TextRank when applied
iteratively is a score for each vertex, i.e. a term weight, that can be used
for information retrieval (IR) just like conventional term frequency based term
weights. So far, when computing TextRank term weights over co-occurrence
graphs, the window of term co-occurrence is always fixed. This work departs
from this, and considers dynamically adjusted windows of term co-occurrence
that follow the document structure on a sentence- and paragraph-level. The
resulting TextRank term weights are used in a ranking function that re-ranks
1000 initially returned search results in order to improve the precision of the
ranking. Experiments with two IR collections show that adjusting the vicinity
of term co-occurrence when computing TextRank term weights can lead to gains in
early precision.
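
The contrast between a fixed window and a structure-following window can be sketched as below. This is an illustrative Python sketch, not the authors' implementation: the function names, the undirected unweighted graph, the damping factor of 0.85, and the fixed iteration count are all assumptions, and the abstract does not say how the resulting weights enter the ranking function.

```python
from collections import defaultdict
from itertools import combinations


def fixed_window_edges(terms, k):
    """Link distinct terms that fall within the same window of k terms."""
    edges = set()
    for i, t in enumerate(terms):
        for u in terms[i + 1 : i + k]:
            if u != t:
                edges.add(frozenset((t, u)))
    return edges


def sentence_window_edges(sentences):
    """Link distinct terms that occur in the same sentence (dynamic window)."""
    edges = set()
    for sent in sentences:
        for t, u in combinations(set(sent), 2):
            edges.add(frozenset((t, u)))
    return edges


def textrank(edges, damping=0.85, iterations=50):
    """Iterate the PageRank update over an undirected term graph."""
    neighbours = defaultdict(set)
    for edge in edges:
        t, u = tuple(edge)
        neighbours[t].add(u)
        neighbours[u].add(t)
    scores = {t: 1.0 for t in neighbours}
    for _ in range(iterations):
        scores = {
            t: (1 - damping)
            + damping * sum(scores[u] / len(neighbours[u]) for u in neighbours[t])
            for t in neighbours
        }
    return scores


# Toy document, already tokenised into sentences (hypothetical data).
sentences = [["term", "weight", "graph"], ["graph", "window"]]
flat = [t for sent in sentences for t in sent]

fixed_scores = textrank(fixed_window_edges(flat, k=2))
dynamic_scores = textrank(sentence_window_edges(sentences))
```

The two score dictionaries differ only because the edge sets differ: the fixed window links neighbouring terms even across a sentence boundary, while the sentence window links every pair of terms inside a sentence and none across sentences. A paragraph-level window would follow the same pattern with paragraphs in place of sentences.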