
    MPEG motion picture coding with long-term constraint on distortion variation


    Large-scale document labeling using supervised sequence embedding

    A critical component in the computational treatment of automated document labeling is the choice of an appropriate representation. A proper representation captures specific phenomena of interest in the data while transforming it into a format suitable for a classifier. For a text document, a popular choice is the bag-of-words (BoW) representation, which encodes the presence of unique words with non-zero weights such as TF-IDF. Extending this model to long, overlapping phrases (n-grams) results in an exponential explosion in the dimensionality of the representation. In this work, we develop a model that encodes long phrases in a low-dimensional latent space using a cumulative function of the individual words in each phrase. In contrast to BoW, the parameter space of the proposed model grows linearly with the length of the phrase. The proposed model requires only vector additions and multiplications by scalars to compute the latent representation of phrases, which makes it applicable to large-scale text labeling problems. Several sentiment classification and binary topic categorization problems will be used to empirically evaluate the proposed representation. The same model can also encode the relative spatial distribution of elements in higher-dimensional sequences. To verify this claim, the proposed model will be evaluated on a large-scale image classification dataset, where images are transformed into two-dimensional sequences of quantized image descriptors.

    Ph.D., Computer Science -- Drexel University, 201
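    To illustrate the general idea of a cumulative phrase encoding, the sketch below shows one plausible scheme, not the thesis's actual model: the vocabulary, the decay weighting, and the embedding dimension are all hypothetical. It computes a low-dimensional phrase representation using only vector additions and scalar multiplications, so the cost grows linearly with phrase length rather than exponentially with n-gram order.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy vocabulary and embedding dimension (illustrative only).
    vocab = {"great": 0, "movie": 1, "plot": 2, "boring": 3}
    dim = 8
    # One low-dimensional vector per word; in practice these would be learned.
    word_vectors = rng.normal(size=(len(vocab), dim))

    def phrase_embedding(words, decay=0.5):
        """Encode a phrase as a cumulative weighted sum of its word vectors.

        Earlier words are down-weighted by powers of `decay`, so relative
        position inside the phrase is preserved. Only vector additions and
        multiplications by scalars are used, so the cost is linear in the
        number of words.
        """
        z = np.zeros(dim)
        for w in words:
            z = decay * z + word_vectors[vocab[w]]
        return z

    e1 = phrase_embedding(["great", "movie"])
    e2 = phrase_embedding(["movie", "great"])
    # Word order changes the embedding even though the bag of words is identical,
    # which a plain BoW representation cannot capture.
    assert not np.allclose(e1, e2)
    ```

    Because the recurrence mixes position into the running sum, two phrases with the same words in different orders map to different points in the latent space, while the dimensionality stays fixed at `dim` regardless of phrase length.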