    Using the Web to Overcome Data Sparseness

    This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating that web frequencies correlate with frequencies obtained from a carefully edited, balanced corpus.
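
    The retrieval step is straightforward to picture in code. Below is a minimal Python sketch: the `search_hit_count` helper is a hypothetical stand-in for a real search-engine API call, and Spearman rank correlation is one plausible choice of measure, since the abstract does not name the coefficient used.

```python
from scipy.stats import spearmanr

def search_hit_count(query: str) -> int:
    """Placeholder: a real implementation would call a search-engine API
    and return the reported number of matching pages for the query."""
    raise NotImplementedError

def web_count(w1: str, w2: str) -> int:
    # Quote the bigram as an exact phrase so the engine only counts
    # pages where the two words occur adjacently, in order.
    return search_hit_count(f'"{w1} {w2}"')

def evaluate(bigrams, corpus_counts):
    """Rank-correlate web counts with counts from a balanced corpus."""
    web_counts = [web_count(w1, w2) for w1, w2 in bigrams]
    rho, p = spearmanr(web_counts, corpus_counts)
    return rho, p
```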

    Language technologies and the evolution of the semantic web

    The availability of huge amounts of semantic markup on the Web promises to enable a quantum leap in the level of support available to Web users for locating, aggregating, sharing, interpreting and customizing information. While we cannot claim that a large-scale Semantic Web already exists, a number of applications have been produced which generate and exploit semantic markup, provide advanced search and querying functionalities, and allow the visualization and management of heterogeneous, distributed data. While these tools provide evidence of the feasibility and tremendous potential value of the enterprise, they all suffer from major limitations, primarily to do with the limited scale and heterogeneity of the semantic data they use. Nevertheless, we argue that we are at a key point in the brief history of the Semantic Web and that the very latest demonstrators already give us a glimpse of what future applications will look like. In this paper, we describe the already visible effects of these changes by analyzing the evolution of Semantic Web tools from smart databases towards applications that harness collective intelligence. We also point out that language technology plays an important role in making this evolution sustainable, and we highlight the need for improved support, especially in the area of large-scale linguistic resources.

    A Simple Iterative Algorithm for Parsimonious Binary Kernel Fisher Discrimination

    By applying recent results in optimization theory, variously known as optimization transfer or majorize/minimize algorithms, an algorithm for binary kernel Fisher discriminant analysis is introduced that uses a non-smooth penalty on the coefficients to provide a parsimonious solution. The problem is converted into a smooth optimization that can be solved iteratively with no greater overhead than iteratively re-weighted least squares. The result is simple, easily programmed, and shown to perform, in terms of both accuracy and parsimony, as well as or better than a number of leading machine learning algorithms on two well-studied and substantial benchmarks.
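
    To make the iteration concrete, here is a minimal Python sketch of a majorize/minimize loop for an L1-penalised, least-squares formulation of kernel Fisher discrimination. The choice of penalty, the regression-onto-labels formulation, and the fixed iteration count are illustrative assumptions, not the paper's exact algorithm; the bias term is omitted for brevity.

```python
import numpy as np

def sparse_kfd(K, y, lam=1.0, n_iter=100, eps=1e-8):
    """K: (n, n) kernel matrix; y: labels in {-1, +1}; lam: penalty weight.

    Minimises ||y - K a||^2 + lam * sum(|a_i|) by majorising each |a_i|
    with a quadratic at the current iterate, so every step reduces to a
    weighted ridge-style linear solve, i.e. no heavier than IRLS.
    """
    n = K.shape[0]
    alpha = np.full(n, 1e-3)          # nonzero start keeps the weights finite
    KtK, Kty = K.T @ K, K.T @ y
    for _ in range(n_iter):
        # Majoriser of |a| at a_k: a^2 / (2|a_k|) + const
        W = np.diag(lam / (2.0 * (np.abs(alpha) + eps)))
        alpha = np.linalg.solve(KtK + W, Kty)
    return alpha                      # classify new points via sign(k(x) @ alpha)
```

    Coefficients whose majoriser weight blows up are driven towards zero, which is where the parsimony of the final discriminant comes from.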

    Role of Matrix Factorization Model in Collaborative Filtering Algorithm: A Survey

    Recommendation systems apply information retrieval techniques to select the online information relevant to a given user. Collaborative Filtering (CF) is currently the most widely used approach to building recommendation systems. CF techniques use user behavior, in the form of user-item ratings, as their information source for prediction. CF algorithms face major challenges such as the sparsity of the rating matrix and the ever-growing size of the data. Matrix Factorization (MF) addresses these challenges well. In this paper we present an overview of the role of different MF models in addressing the challenges of CF algorithms, which can serve as a roadmap for research in this area. Comment: 8 pages, 1 figure in IJAFRC, Vol.1, Issue 12, December 201
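
    As background, the basic MF model that such surveys build on factorises the sparse rating matrix into low-rank user and item factors, trained only on observed ratings. A minimal sketch using stochastic gradient descent follows; the hyperparameters are illustrative, not taken from the survey.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=20, lr=0.01, reg=0.05, epochs=20):
    """ratings: iterable of (user, item, rating) triples over observed cells."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))    # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, k))    # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                   # error on this rating
            pu = P[u].copy()                        # old user factors for the item step
            P[u] += lr * (err * Q[i] - reg * P[u])  # gradient steps with
            Q[i] += lr * (err * pu - reg * Q[i])    # L2 regularisation
    return P, Q                                     # predict with P[u] @ Q[i]
```

    Because only observed entries enter the loss, the factor model naturally copes with rating-matrix sparsity, and incremental SGD updates accommodate growing data.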

    Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English

    The necessity of using a fixed-size word vocabulary in order to control the model complexity in state-of-the-art neural machine translation (NMT) systems is an important bottleneck on performance, especially for morphologically rich languages. Conventional methods that aim to overcome this problem by using sub-word or character-level representations rely solely on statistics and disregard the linguistic properties of words, which leads to interruptions in the word structure and causes semantic and syntactic losses. In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language. Our method is based on unsupervised morphology learning and can, in principle, be used for pre-processing any language pair. We also present an alternative word segmentation method based on supervised morphological analysis, which aids us in measuring the accuracy of our model. We evaluate our method on the Turkish-to-English NMT task, where the input language is morphologically rich and agglutinative. We analyze different representation methods in terms of translation accuracy as well as the semantic and syntactic properties of the generated output. Our method obtains a significant improvement of 2.3 BLEU points over the conventional vocabulary reduction technique, showing that it can provide better accuracy in open vocabulary translation of morphologically rich languages. Comment: The 20th Annual Conference of the European Association for Machine Translation (EAMT), Research Paper, 12 page
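
    The general shape of morphology-aware vocabulary reduction can be sketched as follows. The paper learns segmentations with unsupervised morphology learning; the toy suffix list, the greedy suffix-peeling rule, and the "@@" boundary marker below are illustrative assumptions standing in for that learned model.

```python
from collections import Counter

SUFFIXES = ["lar", "ler", "dan", "den", "da", "de"]  # toy Turkish suffixes

def segment(word):
    """Greedily peel known suffixes off the end of the word."""
    parts = []
    while True:
        for suf in sorted(SUFFIXES, key=len, reverse=True):
            if len(word) > len(suf) + 2 and word.endswith(suf):
                parts.insert(0, suf)          # record suffix, innermost first
                word = word[: -len(suf)]
                break
        else:
            break                             # no suffix matched: stop
    return [word] + ["@@" + p for p in parts] # mark bound morphemes

def reduced_vocab(corpus, max_size):
    """Build a fixed-size vocabulary over morpheme-like sub-units."""
    counts = Counter(u for line in corpus
                       for w in line.split()
                       for u in segment(w))
    return {u for u, _ in counts.most_common(max_size)}
```

    For example, segment("evlerden") yields ["ev", "@@ler", "@@den"], splitting at morpheme boundaries rather than at the arbitrary positions a purely statistical sub-word method might choose.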

    Embedding Semantic Relations into Word Representations

    Learning representations for semantic relations is important for various tasks such as analogy detection, relational search, and relation classification. Although there have been several proposals for learning representations for individual words, learning word representations that explicitly capture the semantic relations between words remains underdeveloped. We propose an unsupervised method for learning vector representations for words such that the learnt representations are sensitive to the semantic relations that exist between two words. First, we extract lexical patterns from the co-occurrence contexts of two words in a corpus to represent the semantic relations that exist between those two words. Second, we represent a lexical pattern as the weighted sum of the representations of the words that co-occur with that lexical pattern. Third, we train a binary classifier to detect relationally similar vs. non-similar lexical pattern pairs. The proposed method is unsupervised in the sense that the lexical pattern pairs we use as training data are automatically sampled from a corpus, without requiring any manual intervention. Our proposed method statistically significantly outperforms the current state-of-the-art word representations on three benchmark datasets for proportional analogy detection, demonstrating its ability to accurately capture the semantic relations among words. Comment: International Joint Conferences in AI (IJCAI) 201
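
    The second and third steps can be sketched in a few lines of Python. The pair features and the off-the-shelf logistic-regression classifier below are simplifications: the paper trains the word representations themselves through this objective, whereas the sketch treats them as fixed inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pattern_vector(cooc_words, weights, word_vecs):
    """Embed a lexical pattern as the weighted sum of the vectors of the
    words that co-occur with it in the corpus."""
    return np.sum([w * word_vecs[word]
                   for word, w in zip(cooc_words, weights)], axis=0)

def train_pair_classifier(pattern_pairs, labels):
    """pattern_pairs: (vec_a, vec_b) tuples; labels: 1 = relationally
    similar, 0 = not. Pair features: elementwise product and |difference|."""
    X = np.array([np.concatenate([a * b, np.abs(a - b)])
                  for a, b in pattern_pairs])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```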