Distributed Representations of Sentences and Documents

Authors: Le Quoc V., Mikolov Tomas
Publication date: 22/05/2014

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
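
The two bag-of-words weaknesses named in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the helper names `one_hot` and `euclidean` and the two example sentences are assumptions made for illustration. It shows that one-hot word vectors place "powerful," "strong" and "Paris" at the same distance from each other, and that two sentences with different word order share an identical bag.

```python
import math
from collections import Counter
from itertools import combinations


def one_hot(word, vocabulary):
    """Bag-of-words vector of a single word: 1 at its own index, 0 elsewhere."""
    return [1.0 if word == term else 0.0 for term in vocabulary]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Weakness 1: semantics are ignored, so every pair of distinct words is equally far apart.
vocabulary = ["powerful", "strong", "Paris"]
vectors = {word: one_hot(word, vocabulary) for word in vocabulary}
distances = {
    (a, b): euclidean(vectors[a], vectors[b])
    for a, b in combinations(vocabulary, 2)
}
for pair, distance in distances.items():
    print(pair, round(distance, 4))

# Weakness 2: word order is lost, so these two (hypothetical) sentences get the same bag.
sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"
same_bag = Counter(sentence_a.split()) == Counter(sentence_b.split())
print("identical bags:", same_bag)
```

Every pairwise distance equals the square root of 2, so the representation cannot tell that "powerful" is closer in meaning to "strong" than to "Paris".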

arXiv.org e-Print Archive

CiteSeerX

TED Talk Recommender Using Speech Transcripts

Authors: Kwon Ilbong, Lee Injung, Lee Jae-Gil, Oh Jaehoon, Seonwoo Yeon, Sung Simin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/09/2018

Nowadays, online video platforms mostly recommend related videos by analyzing user-driven data such as viewing patterns, rather than the content of the videos. However, content is more important than any other element when videos aim to deliver knowledge. Therefore, we have developed a web application which recommends related TED lecture videos to users, considering the content of the videos from the transcripts. TED Talk Recommender constructs a network for recommending videos that are similar content-wise and provides a user interface.

Comment: 3 pages
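
The idea of linking talks whose transcripts are similar in content can be sketched as follows. The abstract does not say how similarity is computed, so this sketch assumes cosine similarity over raw term counts. The function names, the parameter `k`, and the three toy transcripts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercased whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_network(transcripts, k=1):
    """Link each talk to its k most similar talks (ties broken by title)."""
    counts = {title: term_counts(text) for title, text in transcripts.items()}
    network = {}
    for title, vector in counts.items():
        scored = [
            (cosine(vector, other_vector), other_title)
            for other_title, other_vector in counts.items()
            if other_title != title
        ]
        scored.sort(key=lambda item: (-item[0], item[1]))
        network[title] = [other_title for _, other_title in scored[:k]]
    return network


# Toy transcripts, invented for illustration only.
transcripts = {
    "talk_a": "neural networks learn representations from data",
    "talk_b": "deep neural networks learn from large data sets",
    "talk_c": "ocean currents shape the global climate",
}
network = build_similarity_network(transcripts, k=1)
print(network)
```

With these toy inputs, `talk_a` and `talk_b` recommend each other because they share most of their terms, while `talk_c` shares none and falls back to the alphabetical tie-break. A real system would likely need stop-word handling and term weighting, which this sketch leaves out.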

arXiv.org e-Print Archive

Crossref