27 research outputs found
LEARNING WORD RELATEDNESS OVER TIME FOR TEMPORAL RANKING
Queries and ranking with temporal aspects gain significant attention in field of Information Retrieval. While searching for articles published over time, the relevant documents usually occur in certain temporal patterns. Given a query that is implicitly time sensitive, we develop a temporal ranking using the important times of query by drawing from the distribution of query trend relatedness over time. We also combine the model with Dual Embedding Space Model (DESM) in the temporal model according to document timestamp. We apply our model using three temporal word embeddings algorithms to learn relatedness of words from news archive in Bahasa Indonesia: (1) QT-W2V-Rank using Word2Vec (2) QT-OW2V-Rank using OrthoTrans-Word2Vec (3) QT-DBE-Rank using Dynamic Bernoulli Embeddings. The highest score was achieved with static word embeddings learned separately over time, called QT-W2V-Rank, which is 66% in average precision and 68% in early precision. Furthermore, studies of different characteristics of temporal topics showed that QT-W2V-Rank is also more effective in capturing temporal patterns such as spikes, periodicity, and seasonality than the baselines
Gated Recurrent Neural Tensor Network
Recurrent Neural Networks (RNNs), which are a powerful scheme for modeling
temporal and sequential data need to capture long-term dependencies on datasets
and represent them in hidden layers with a powerful model to capture more
information from inputs. For modeling long-term dependencies in a dataset, the
gating mechanism concept can help RNNs remember and forget previous
information. Representing the hidden layers of an RNN with more expressive
operations (i.e., tensor products) helps it learn a more complex relationship
between the current input and the previous hidden layer information. These
ideas can generally improve RNN performances. In this paper, we proposed a
novel RNN architecture that combine the concepts of gating mechanism and the
tensor product into a single model. By combining these two concepts into a
single RNN, our proposed models learn long-term dependencies by modeling with
gating units and obtain more expressive and direct interaction between input
and hidden layers using a tensor product on 3-dimensional array (tensor) weight
parameters. We use Long Short Term Memory (LSTM) RNN and Gated Recurrent Unit
(GRU) RNN and combine them with a tensor product inside their formulations. Our
proposed RNNs, which are called a Long-Short Term Memory Recurrent Neural
Tensor Network (LSTMRNTN) and Gated Recurrent Unit Recurrent Neural Tensor
Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the
tensor product. We conducted experiments with our proposed models on word-level
and character-level language modeling tasks and revealed that our proposed
models significantly improved their performance compared to our baseline
models.Comment: Accepted at IJCNN 2016 URL :
http://ieeexplore.ieee.org/document/7727233
Detecting Controversial Articles on Citizen Journalism
Someone\u27s understanding and stance on a particular controversial topic can be influenced by daily news or articles he consume everyday. Unfortunately, readers usually do not realize that they are reading controversial articles. In this paper, we address the problem of automatically detecting controversial article from citizen journalism media. To solve the problem, we employ a supervised machine learning approach with several hand-crafted features that exploits linguistic information, meta-data of an article, structural information in the commentary section, and sentiment expressed inside the body of an article. The experimental results shows that our proposed method manages to perform the addressed task effectively. The best performance so far is achieved when we use all proposed feature with Logistic Regression as our model (82.89\% in terms of accuracy). Moreover, we found that information from commentary section (structural features) contributes most to the classification task
Detecting Controversial Articles on Citizen Journalism
Someone's understanding and stance on a particular controversial topic can be influenced by daily news or articles he consume everyday. Unfortunately, readers usually do not realize that they are reading controversial articles. In this paper, we address the problem of automatically detecting controversial article from citizen journalism media. To solve the problem, we employ a supervised machine learning approach with several hand-crafted features that exploits linguistic information, meta-data of an article, structural information in the commentary section, and sentiment expressed inside the body of an article. The experimental results shows that our proposed method manages to perform the addressed task effectively. The best performance so far is achieved when we use all proposed feature with Logistic Regression as our model (82.89\% in terms of accuracy). Moreover, we found that information from commentary section (structural features) contributes most to the classification task
The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval
We evaluate the effectiveness of a dictionary-based cross-language text retrieval technique which uses a two-way dictionary for translating queries from their original language into the language of the text documents. As can be expected, the translated queries are not as effective as queries formulated by the users using the same language as the text documents. We then apply a local-feedback technique to expand the translated queries in order to improve their retrieval effectiveness. Our empirical results show that the technique is effective for English-Indonesian and Indonesian-English cross-language retrieval
A query ambiguity model for cross-language information retrieval
EThOS - Electronic Theses Online ServiceGBUnited Kingdo
Location Identification for the Geographic information Retrieval
Abstract. In this paper we identify location names that appear in queries written in Indonesian using geographic gazeeter. We built the gazeeter by collecting geographic information from a number of geographic resources. We translated an Indonesian query set into English using a machine translation technique. We also made an attempt to improve the retrieval effectiveness using a query expansion technique. The result shows that identifying locations in the queries and applying the query expansion technique can help improve the retrieval effectiveness for certain queries