3,696 research outputs found
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
The Ensemble MESH-Term Query Expansion Models Using Multiple LDA Topic Models and ANN Classifiers in Health Information Retrieval
Information retrieval in the health field has several challenges. Health information terminology is difficult for consumers (laypeople) to understand. Formulating a query with professional terms is not easy for consumers because health-related terms are more familiar to health professionals. If health terms related to a query are automatically added, it would help consumers to find relevant information. The proposed query expansion (QE) models show how to expand a query using MeSH (Medical Subject Headings) terms. The documents were represented by MeSH terms (i.e. Bag-of-MeSH), which were included in the full-text articles. And then the MeSH terms were used to generate LDA (Latent Dirichlet Analysis) topic models. A query and the top k retrieved documents were used to find MeSH terms as topic words related to the query.
LDA topic words were filtered by 1) threshold values of topic probability (TP) and word probability (WP) or 2) an ANN (Artificial Neural Network) classifier. Threshold values were effective in an LDA model with a specific number of topics to increase IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by the word score based on (TP *WP) and retrieved document ranking in an LDA model with specific thresholds. The QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model, comparing with the baseline result. However, the threshold values optimized for a particular LDA model did not perform well in other LDA models with different numbers of topics.
An ANN classifier was employed to overcome the weakness of the QE model depending on LDA thresholds by automatically categorizing MeSH terms (positive/negative/neutral) for QE. ANN classifiers were trained on word features related to the LDA model and collection. Two types of QE models (WSW & PWS) using an LDA model and an ANN classifier were proposed: 1) Word Score Weighting (WSW) where the probability of being a positive/negative/neutral word was used to weight the original word score, and 2) Positive Word Selection (PWS) where positive words were identified by the ANN classifier. Forty WSW models showed better average mean infAP and infNDCG scores than the PWS models when the top 7 words were selected for QE. Both approaches based on a binary ANN classifier were effective in increasing infAP and infNDCG, statistically, significantly, compared with the scores of the baseline run. A 3-class classifier performed worse than the binary classifier.
The proposed ensemble QE models integrated multiple ANN classifiers with multiple LDA models. Ensemble QE models combined multiple WSW/PWS models and one or multiple classifiers. Multiple classifiers were more effective in selecting relevant words for QE than one classifier. In ensemble QE (WSW/PWS) models, the top k words added to the original queries were effective to increase infAP and infNDCG scores. The ensemble QE model (WSW) using three classifiers showed statistically significant improvements for infAP and infNDCG in the mean scores for 30 queries when the top 3 words were added. The ensemble QE model (PWS) using four classifiers showed statistically significant improvements for 30 queries in the mean infAP and infNDCG scores
Subject-relevant Document Recommendation: A Reference Topic-Based Approach
Knowledge-intensive workers, such as academic researchers, medical professionals or patent engineers, have a demanding need of searching information relevant to their work. Content-based recommender system (CBRS) makes recommendation by analyzing similarity of textual contents between documents and users’ preferences. Although content-based filtering has been one of the promising approaches to document recommendations, it encounters the over-specialization problem. CBRS tends to recommend documents that are similar to what have been in user’s preference profile. Rationally, citations in an article represent the intellectual/affective balance of the individual interpretation in time and domain understanding. A cited article shall be associated with and may reflect the subject domain of its citing articles. Our study addresses the over-specialization problem to support the information needs of researchers. We propose a Reference Topic-based Document Recommendation (RTDR) technique, which exploits the citation information of a focal user’s preferred documents and thereby recommends documents that are relevant to the subject domain of his or her preference. Our primary evaluation results suggest the outperformance of the proposed RTDR to the benchmarks
Multi-label Ferns for Efficient Recognition of Musical Instruments in Recordings
In this paper we introduce multi-label ferns, and apply this technique for
automatic classification of musical instruments in audio recordings. We compare
the performance of our proposed method to a set of binary random ferns, using
jazz recordings as input data. Our main result is obtaining much faster
classification and higher F-score. We also achieve substantial reduction of the
model size
- …