Learning Semantic Representations for the Phrase Translation Model
This paper presents a novel semantic-based phrase translation model. A pair
of source and target phrases are projected into continuous-valued vector
representations in a low-dimensional latent semantic space, where their
translation score is computed by the distance between the pair in this new
space. The projection is performed by a multi-layer neural network whose
weights are learned on parallel training data. The learning aims to
directly optimize the quality of end-to-end machine translation results.
Experimental evaluation has been performed on two Europarl translation tasks,
English-French and German-English. The results show that the new semantic-based
phrase translation model significantly improves the performance of a
state-of-the-art phrase-based statistical machine translation system, leading
to a gain of 0.7-1.0 BLEU points.
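The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words phrase encoding, the tanh activations, the layer sizes, the random untrained weights, and the use of negative Euclidean distance as the score are all assumptions made for illustration, and the names `project` and `translation_score` are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def project(bow, layers):
    """Map a bag-of-words phrase vector into the latent space.

    Each layer is a weight matrix followed by tanh (an assumed activation).
    """
    h = bow
    for W in layers:
        h = [math.tanh(v) for v in matvec(W, h)]
    return h


def translation_score(src_bow, tgt_bow, src_layers, tgt_layers):
    """Score a phrase pair by closeness in the latent space.

    Assumption: the score is the negative Euclidean distance, so that
    closer projections receive a higher score.
    """
    y_src = project(src_bow, src_layers)
    y_tgt = project(tgt_bow, tgt_layers)
    return -math.dist(y_src, y_tgt)


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
src_vocab, tgt_vocab, hidden, latent = 6, 5, 4, 3  # toy sizes
src_layers = [random_matrix(hidden, src_vocab, rng),
              random_matrix(latent, hidden, rng)]
tgt_layers = [random_matrix(hidden, tgt_vocab, rng),
              random_matrix(latent, hidden, rng)]

src_phrase = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # word counts over source vocab
tgt_phrase = [0.0, 1.0, 0.0, 0.0, 1.0]       # word counts over target vocab
score = translation_score(src_phrase, tgt_phrase, src_layers, tgt_layers)
```

In the setting the abstract describes, the weights would be trained on parallel data so that this score improves end-to-end translation quality; the sketch only shows the forward computation.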
A Novel Enhanced Move Recognition Algorithm Based on Pre-trained Models with Positional Embeddings
The recognition of abstracts is crucial for effectively locating the content
and clarifying the article. Existing move recognition algorithms lack the
ability to learn word position information to obtain contextual semantics. This
paper proposes a novel enhanced move recognition algorithm with an improved
pre-trained model and a gated network with attention mechanism for unstructured
abstracts of Chinese scientific and technological papers. The proposed
algorithm first performs summary data segmentation and vocabulary training. The
EP-ERNIEAT-GRU framework is leveraged to incorporate word positional
information, facilitating deep semantic learning and targeted feature
extraction. Experimental results demonstrate that the proposed algorithm
achieves 13.37 higher accuracy on the split dataset than on the original
dataset and a 7.55 improvement in accuracy over the basic comparison model.
PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
SimRank is a classic measure of the similarities of nodes in a graph.
Given a node in a graph, a single-source SimRank query returns the SimRank
similarities between that node and each node in the graph. This type of query
has numerous applications in web search and social network analysis, such as
link prediction, web mining, and spam detection.
Existing methods for single-source SimRank queries, however, incur a query cost
at least linear in the number of nodes, which renders them inapplicable for
real-time and interactive analysis.
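To make the notion of a single-source SimRank query concrete, here is a minimal sketch of the standard iterative SimRank recurrence, with one row read off as the answer for a source node. This is a naive baseline for illustration only, not the algorithm proposed in the paper: it computes all pairs and so costs far more than linear time. The decay factor `c=0.6`, the iteration count, the toy graph, and the function names are assumptions.

```python
def simrank(in_neighbors, c=0.6, iterations=10):
    """All-pairs SimRank by fixed-point iteration.

    in_neighbors maps each node to the list of nodes pointing to it.
    s(u, u) = 1; for u != v, s(u, v) is c times the average similarity
    over all pairs of in-neighbors, or 0 if either node has none.
    """
    nodes = list(in_neighbors)
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u][v] = 1.0
                    continue
                iu, iv = in_neighbors[u], in_neighbors[v]
                if not iu or not iv:
                    new[u][v] = 0.0
                    continue
                total = sum(sim[a][b] for a in iu for b in iv)
                new[u][v] = c * total / (len(iu) * len(iv))
        sim = new
    return sim


def single_source(in_neighbors, source, **kwargs):
    """Return the SimRank similarity between source and every node."""
    return simrank(in_neighbors, **kwargs)[source]


# Toy graph: a -> b, a -> c, b -> d, c -> d (stored as in-neighbor lists).
graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
scores = single_source(graph, "b")
```

In this toy graph, `b` and `c` share their only in-neighbor `a`, so `scores["c"]` equals the decay factor, while `scores["a"]` and `scores["d"]` are zero because `a` has no in-neighbors.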
This paper proposes PRSim, an algorithm that exploits the structure of
graphs to efficiently answer single-source SimRank queries. PRSim uses an
index whose size depends on the number of edges in the graph, and
guarantees a query time that depends on the reverse PageRank distribution
of the input graph. In particular, we prove that PRSim runs in sub-linear time
if the degree distribution of the input graph follows a power-law
distribution, a property possessed by many real-world graphs. Based on the
theoretical analysis, we show that the empirical query time of all existing
SimRank algorithms also depends on the reverse PageRank distribution of the
graph. Finally, we present the first experimental study that evaluates the
absolute errors of various SimRank algorithms on large graphs, and we show that
PRSim outperforms the state of the art in terms of query time, accuracy, index
size, and scalability.
Comment: ACM SIGMOD 201