Report on the Second International Workshop on Narrative Extraction from Texts (Text2Story 2019)
The Second International Workshop on Narrative Extraction from Texts (Text2Story'19 [http://text2story19.inesctec.pt/]) was held on the 14th of April 2019, in conjunction with the 41st European Conference on Information Retrieval (ECIR 2019) in Cologne, Germany. The workshop provided a platform for researchers in IR, NLP, and design and visualization to come together and share recent advances in the extraction and formal representation of narratives. The workshop consisted of two invited talks, ten research paper presentations, and a poster and demo session. The proceedings of the workshop are available online at http://ceur-ws.org/Vol-2342/.
Evaluating Similarity Metrics for Latent Twitter Topics
Topic modelling approaches such as LDA, when applied to a tweet corpus, can often generate a topic model containing redundant topics. To evaluate the quality of a topic model in terms of redundancy, topic similarity metrics can be applied to estimate the similarity among topics in a topic model. There are various topic similarity metrics in the literature, e.g., the Jensen-Shannon (JS) divergence-based metric. In this paper, we evaluate the performance of four distance/divergence-based topic similarity metrics and examine how they align with human judgements, including a newly proposed similarity metric that is based on computing word semantic similarity using word embeddings (WE). To obtain human judgements, we conduct a user study through crowdsourcing. Among various insights, our study shows that in general the cosine similarity (CS) and WE-based metrics perform better and appear to be complementary. However, we also find that human assessors cannot easily distinguish between the distance/divergence-based and the semantic similarity-based metrics when identifying similar latent Twitter topics.
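Two of the metric families the abstract mentions can be sketched directly on topic-word distributions. The following is a minimal, self-contained illustration (not the paper's evaluation code): `js_similarity` turns the base-2 Jensen-Shannon distance into a similarity in [0, 1], and `cosine_similarity` is the CS metric; both take a topic as a probability vector over a shared vocabulary.

```python
from math import log2, sqrt

def js_similarity(p, q):
    """1 minus the base-2 Jensen-Shannon distance between two
    topic-word distributions (1.0 = identical, 0.0 = disjoint)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):  # Kullback-Leibler divergence, skipping zero-mass terms
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    jsd = (kl(p, m) + kl(q, m)) / 2   # JS divergence, in [0, 1] bits
    return 1.0 - sqrt(jsd)            # distance -> similarity

def cosine_similarity(p, q):
    """Cosine of the angle between two topic-word vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = sqrt(sum(pi * pi for pi in p)) * sqrt(sum(qi * qi for qi in q))
    return dot / norm

# Toy topics over a 4-word vocabulary: identical vs. non-overlapping
same = js_similarity([.4, .3, .2, .1], [.4, .3, .2, .1])   # 1.0
apart = js_similarity([.5, .5, 0, 0], [0, 0, .5, .5])      # 0.0
```

A WE-based metric would instead compare topics through embeddings of their top words, so it can score two topics as similar even when their word distributions barely overlap, which is one reason the paper finds CS and WE complementary.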
Priors for Diversity and Novelty on Neural Recommender Systems
PRIN is a neural-based recommendation method that allows the incorporation of item prior information into the recommendation process. In this work we study how the system behaves in terms of novelty and diversity under different configurations of item prior probability estimations. Our results show the versatility of the framework and how its behaviour can be adapted to the desired properties, whether accuracy is preferred or diversity and novelty are the desired properties, or how a balance can be achieved with the proper selection of prior estimations.
Funding: Ministerio de Ciencia, Innovación y Universidades (RTI2018-093336-B-C22, FPU17/03210, FPU014/0172); Xunta de Galicia (GPC ED431B 2019/03, ED431G/01).
Understanding Mobile Search Task Relevance and User Behaviour in Context
Improvements in mobile technologies have led to a dramatic change in how and when people access and use information, and are having a profound impact on how users address their daily information needs. Smartphones are rapidly becoming our main method of accessing information and are frequently used to perform 'on-the-go' search tasks. Although research into information retrieval continues to evolve, evaluating search behaviour in context is relatively new. Previous research has studied the effects of context through either self-reported diary studies or quantitative log analysis; however, neither approach is able to accurately capture context of use at the time of searching. In this study, we aim to gain a better understanding of task relevance and search behaviour via a task-based user study (n=31) employing a bespoke Android app. The app allowed us to accurately capture the user's context when completing tasks at different times of the day over the period of a week. Through analysis of the collected data, we gain a better understanding of how using smartphones on the go impacts search behaviour, search performance and task relevance, and whether or not the actual context is an important factor.
Comment: To appear in CHIIR 2019 in Glasgow, UK
Towards Spatial Word Embeddings
Leveraging the textual and spatial data provided in spatio-textual objects (e.g., tweets) has become increasingly important in real-world applications, driven by their growing availability in recent decades (e.g., through smartphones). In this paper, we propose a spatial retrofitting method for word embeddings that can reveal the localised similarity of word pairs as well as the diversity of their localised meanings. Experiments based on the semantic location prediction task show that our method achieves significant improvements over strong baselines.
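The abstract does not give the method's update rule, but spatial retrofitting builds on the classic retrofitting idea: iteratively pull each word's vector toward its neighbours in a graph. The sketch below shows that generic update, assuming a hypothetical `neighbours` graph linking words that co-occur in the same locations; the graph, weights, and data are made-up toys, not the paper's actual model.

```python
def retrofit(vectors, neighbours, alpha=1.0, iterations=10):
    """Classic retrofitting update: each retrofitted vector is a
    weighted average of its original vector (weight alpha) and its
    current graph neighbours (weight 1 each)."""
    original = {w: list(v) for w, v in vectors.items()}
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, nbrs in neighbours.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            for d in range(len(new[word])):
                s = sum(new[n][d] for n in nbrs)
                new[word][d] = (alpha * original[word][d] + s) / (alpha + len(nbrs))
    return new

# Toy spatial graph: "pier" and "beach" co-occur in coastal tweets,
# so their retrofitted vectors are pulled toward each other.
vecs = {"pier": [1.0, 0.0], "beach": [0.0, 1.0]}
graph = {"pier": ["beach"], "beach": ["pier"]}
retro = retrofit(vecs, graph)
```

A spatially aware variant could run this per region, yielding different neighbour graphs (and hence different retrofitted spaces) for different locations, which is one way localised word meanings could emerge.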
Local and global query expansion for hierarchical complex topics
In this work we study local and global methods of query expansion for multifaceted complex topics. We study word-based and entity-based expansion methods and extend these approaches to complex topics using fine-grained expansion on different elements of the hierarchical query structure. As a source of hierarchical complex topics we use the TREC Complex Answer Retrieval (CAR) benchmark data collection. We find that leveraging the hierarchical topic structure is needed for both local and global expansion methods to be effective. Further, the results demonstrate that entity-based expansion methods show significant gains over word-based models alone, with local feedback providing the largest improvement. The results on the CAR paragraph retrieval task demonstrate that expansion models that incorporate both the hierarchical query structure and entity-based expansion result in a greater than 20% improvement over word-based expansion approaches.
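The "fine-grained expansion on different elements of the hierarchical query structure" can be illustrated with a small sketch. This is not the paper's model: `expansion_terms` here is a made-up lookup standing in for whatever local-feedback or entity-based component supplies terms, and the function simply expands each heading of a CAR-style topic path separately rather than the flat query as a whole.

```python
def expand_hierarchical_query(heading_path, expansion_terms, per_heading=2):
    """Build an expanded query from a hierarchical topic path,
    attaching up to `per_heading` expansion terms to each heading
    individually (fine-grained expansion)."""
    query = []
    for heading in heading_path:
        query.append(heading)
        query.extend(expansion_terms.get(heading, [])[:per_heading])
    return " ".join(query)

# Toy CAR-style topic "chocolate / health effects" with hypothetical
# expansion terms (e.g. from pseudo-relevance feedback or linked entities)
q = expand_hierarchical_query(
    ["chocolate", "health effects"],
    {"chocolate": ["cocoa", "cacao"], "health effects": ["flavonoids"]},
)
# -> "chocolate cocoa cacao health effects flavonoids"
```

The contrast with a flat approach is that expansion terms are chosen per heading, so terms relevant to a subtopic ("flavonoids" for "health effects") are not drowned out by the root topic's vocabulary.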
BUSTER: a "BUSiness Transaction Entity Recognition" dataset
Although Natural Language Processing has seen major breakthroughs in the last few years, transferring such advances into real-world business cases can be challenging. One of the reasons is the gap between popular benchmarks and actual data. Lack of supervision, unbalanced classes, noisy data and long documents often affect real problems in vertical domains such as finance, law and health. To support industry-oriented research, we present BUSTER, a BUSiness Transaction Entity Recognition dataset. The dataset consists of 3779 manually annotated documents on financial transactions. We establish several baselines exploiting both general-purpose and domain-specific language models. The best performing model is also used to automatically annotate 6196 documents, which we release as an additional silver corpus to BUSTER.
Comment: The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Industry Track
Encoder-decoder neural network for automatic news headline generation
Recently, artificial neural networks have increasingly been used for the task of automatic text summarization. One particular case of text summarization is the generation of news headlines. Among the many neural network models applied to text generation, the most common…