17,797 research outputs found
Fine-graind Image Classification via Combining Vision and Language
Fine-grained image classification is a challenging task due to the large
intra-class variance and small inter-class variance, aiming at recognizing
hundreds of sub-categories belonging to the same basic-level category. Most
existing fine-grained image classification methods generally learn part
detection models to obtain the semantic parts for better classification
accuracy. Despite achieving promising results, these methods mainly have two
limitations: (1) not all the parts which obtained through the part detection
models are beneficial and indispensable for classification, and (2)
fine-grained image classification requires more detailed visual descriptions
which could not be provided by the part locations or attribute annotations. For
addressing the above two limitations, this paper proposes the two-stream model
combining vision and language (CVL) for learning latent semantic
representations. The vision stream learns deep representations from the
original visual information via deep convolutional neural network. The language
stream utilizes the natural language descriptions which could point out the
discriminative parts or characteristics for each image, and provides a flexible
and compact way of encoding the salient visual aspects for distinguishing
sub-categories. Since the two streams are complementary, combining the two
streams can further achieves better classification accuracy. Comparing with 12
state-of-the-art methods on the widely used CUB-200-2011 dataset for
fine-grained image classification, the experimental results demonstrate our CVL
approach achieves the best performance.Comment: 9 pages, to appear in CVPR 201
Short user-generated videos classification using accompanied audio categories
This paper investigates the classification of short user-generated videos (UGVs) using the accompanied audio data since short UGVs accounts for a great proportion of the Internet UGVs and many short UGVs are accompanied by singlecategory soundtracks. We define seven types of UGVs corresponding to seven audio categories respectively. We also investigate three modeling approaches for audio feature representation, namely, single Gaussian (1G), Gaussian mixture (GMM) and Bag-of-Audio-Word (BoAW) models. Then using Support Vector Machine (SVM) with three different distance measurements corresponding to three feature representations, classifiers are trained to categorize the UGVs. The accompanying evaluation results show that these approaches are effective for categorizing the short UGVs based on their audio track. Experimental results show that a GMM representation with approximated Bhattacharyya distance (ABD) measurement produces the best performance, and BoAW representation with chi-square kernel also reports comparable results
Chi-square-based scoring function for categorization of MEDLINE citations
Objectives: Text categorization has been used in biomedical informatics for
identifying documents containing relevant topics of interest. We developed a
simple method that uses a chi-square-based scoring function to determine the
likelihood of MEDLINE citations containing genetic relevant topic. Methods: Our
procedure requires construction of a genetic and a nongenetic domain document
corpus. We used MeSH descriptors assigned to MEDLINE citations for this
categorization task. We compared frequencies of MeSH descriptors between two
corpora applying chi-square test. A MeSH descriptor was considered to be a
positive indicator if its relative observed frequency in the genetic domain
corpus was greater than its relative observed frequency in the nongenetic
domain corpus. The output of the proposed method is a list of scores for all
the citations, with the highest score given to those citations containing MeSH
descriptors typical for the genetic domain. Results: Validation was done on a
set of 734 manually annotated MEDLINE citations. It achieved predictive
accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method
by comparing it to three machine learning algorithms (support vector machines,
decision trees, na\"ive Bayes). Although the differences were not statistically
significantly different, results showed that our chi-square scoring performs as
good as compared machine learning algorithms. Conclusions: We suggest that the
chi-square scoring is an effective solution to help categorize MEDLINE
citations. The algorithm is implemented in the BITOLA literature-based
discovery support system as a preprocessor for gene symbol disambiguation
process.Comment: 34 pages, 2 figure
An Investigation into the Pedagogical Features of Documents
Characterizing the content of a technical document in terms of its learning
utility can be useful for applications related to education, such as generating
reading lists from large collections of documents. We refer to this learning
utility as the "pedagogical value" of the document to the learner. While
pedagogical value is an important concept that has been studied extensively
within the education domain, there has been little work exploring it from a
computational, i.e., natural language processing (NLP), perspective. To allow a
computational exploration of this concept, we introduce the notion of
"pedagogical roles" of documents (e.g., Tutorial and Survey) as an intermediary
component for the study of pedagogical value. Given the lack of available
corpora for our exploration, we create the first annotated corpus of
pedagogical roles and use it to test baseline techniques for automatic
prediction of such roles.Comment: 12th Workshop on Innovative Use of NLP for Building Educational
Applications (BEA) at EMNLP 2017; 12 page
- …