Query-Specific Knowledge Graphs for Complex Finance Topics
Across the financial domain, researchers answer complex questions by
extensively "searching" for relevant information to generate long-form reports.
This workshop paper discusses automating the construction of query-specific
document and entity knowledge graphs (KGs) for complex research topics. We
focus on the CODEC dataset, where domain experts (1) create challenging
questions, (2) construct long natural language narratives, and (3) iteratively
search and assess the relevance of documents and entities. For the construction
of query-specific KGs, we show that state-of-the-art ranking systems have
headroom for improvement, with specific failings due to a lack of context or
explicit knowledge representation. We demonstrate that entity and document
relevance are positively correlated, and that entity-based query feedback
improves document ranking effectiveness. Furthermore, we construct
query-specific KGs using retrieval and evaluate using CODEC's "ground-truth
graphs", showing the precision and recall trade-offs. Lastly, we point to
future work, including adaptive KG retrieval algorithms and GNN-based weighting
methods, while highlighting key challenges such as high-quality data,
information extraction recall, and the size and sparsity of complex topic
graphs.
Comment: AKBC 2022 Workshop, Knowledge Graphs in Finance and Economics
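The entity-based query feedback described above can be sketched as a small pseudo-relevance-feedback loop: entities linked to the top-ranked documents of a first retrieval pass are folded back into the query before a second pass. This is a toy illustration, assuming a plain term-overlap scorer in place of a real ranker such as BM25; the document IDs and entity names are invented.

```python
from collections import Counter

def score(query_terms, doc_terms):
    """Number of query terms that appear in the document."""
    doc_set = set(doc_terms)
    return sum(1 for t in query_terms if t in doc_set)

def entity_feedback_rank(query, docs, doc_entities, k=2, top_entities=2):
    """First pass ranks docs; entities from the top-k expand the query
    before a second, re-ranking pass."""
    q = query.split()
    first_pass = sorted(docs, key=lambda d: score(q, docs[d]), reverse=True)
    # Collect entities linked to the top-k documents of the first pass.
    counts = Counter(e for d in first_pass[:k] for e in doc_entities[d])
    expanded = q + [e for e, _ in counts.most_common(top_entities)]
    return sorted(docs, key=lambda d: score(expanded, docs[d]), reverse=True)

docs = {"d1": ["rate", "policy", "ECB"],
        "d2": ["policy", "sports"],
        "d3": ["ECB", "rate", "statement"]}
ents = {"d1": ["ECB"], "d2": ["FIFA"], "d3": ["ECB"]}
# Feedback promotes d3, which shares the entity "ECB" with the top result.
ranking = entity_feedback_rank("rate policy", docs, ents)
```

In this toy run, d3 overtakes d2 only after the feedback step, mirroring the paper's observation that entity relevance signals can improve document ranking.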
Conversational Financial Information Retrieval Model (ConFIRM)
With the exponential growth in large language models (LLMs), leveraging their
emergent properties for specialized domains like finance merits exploration.
However, regulated fields such as finance pose unique constraints, requiring
domain-optimized frameworks. We present ConFIRM, an LLM-based conversational
financial information retrieval model tailored for query intent classification
and knowledge base labeling.
ConFIRM comprises two modules:
1) a method to synthesize finance domain-specific question-answer pairs, and
2) an evaluation of parameter-efficient fine-tuning approaches for the query
classification task. We generate a dataset of over 4000 samples, assessing
accuracy on a separate test set.
ConFIRM achieved over 90% accuracy, essential for regulatory compliance.
ConFIRM provides a data-efficient solution to extract precise query intent for
financial dialog systems.
Comment: 10 pages, 2 figures, 2 tables, 2 appendices
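The two ConFIRM-style modules can be caricatured in a few lines: template-filling to synthesize finance question/intent pairs, and a classifier over the synthesized data. This is a hypothetical sketch; the templates, intent labels, and tickers are invented, and a bag-of-words nearest-centroid classifier stands in for the paper's parameter-efficient fine-tuned LLM.

```python
import random
from collections import Counter, defaultdict

TEMPLATES = {  # invented intents and question templates
    "price_lookup": ["what is the price of {t}", "how much does {t} trade at"],
    "news": ["latest news on {t}", "any headlines about {t}"],
}
TICKERS = ["AAPL", "MSFT", "JPM"]

def synthesize(n, seed=0):
    """Generate n (question, intent) pairs by filling templates."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        intent = rng.choice(sorted(TEMPLATES))
        template = rng.choice(TEMPLATES[intent])
        pairs.append((template.format(t=rng.choice(TICKERS)), intent))
    return pairs

def train(pairs):
    """Accumulate a word-count centroid per intent."""
    centroids = defaultdict(Counter)
    for text, intent in pairs:
        centroids[intent].update(text.lower().split())
    return centroids

def classify(centroids, text):
    """Assign the intent whose centroid overlaps the query words most."""
    words = set(text.lower().split())
    return max(centroids, key=lambda i: sum(centroids[i][w] for w in words))

model = train(synthesize(200))
```

The synthesize-then-classify split mirrors the module structure described in the abstract, though the real system evaluates LLM fine-tuning rather than word counts.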
GPT-FinRE: In-context Learning for Financial Relation Extraction using Large Language Models
Relation extraction (RE) is a crucial task in natural language processing
(NLP) that aims to identify and classify relationships between entities
mentioned in text. In the financial domain, relation extraction plays a vital
role in extracting valuable information from financial documents, such as news
articles, earnings reports, and company filings. This paper describes our
solution to relation extraction on one such dataset, REFinD. The dataset was
released along with a shared task as part of the Fourth Workshop on Knowledge
Discovery from Unstructured Data in Financial Services, co-located with SIGIR
2023. In this paper, we employed OpenAI models under the framework of
in-context learning (ICL). We utilized two retrieval strategies to find the top
K relevant in-context learning demonstrations/examples from the training data
for a given test example. The first retrieval mechanism we employed is a
learning-free dense retriever, and the other is a learning-based retriever. We
were able to achieve 3rd rank overall. Our best F1-score is 0.718.
Comment: arXiv admin note: text overlap with arXiv:2305.02105 by other authors
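The learning-free retrieval strategy above amounts to selecting the training examples most similar to the test sentence as in-context demonstrations. A minimal sketch, assuming TF-IDF cosine similarity in place of a neural dense encoder, with invented labeled examples:

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build TF-IDF vectors (as dicts) for a list of texts."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(w for d in docs for w in d)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [{w: c * idf[w] for w, c in d.items()} for d in docs], idf

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_demos(train, test_text, k=2):
    """Return the k training examples closest to the test sentence."""
    vecs, idf = tfidf_vectors([text for text, _ in train])
    q = Counter(test_text.lower().split())
    qv = {w: c * idf.get(w, 1.0) for w, c in q.items()}
    ranked = sorted(zip(train, vecs), key=lambda p: cosine(qv, p[1]),
                    reverse=True)
    return [example for example, _ in ranked[:k]]

train_set = [("Apple acquired Beats", "acquisition"),
             ("Shares of Tesla fell 3%", "price_change"),
             ("Google acquired Fitbit", "acquisition")]
demos = top_k_demos(train_set, "Microsoft acquired GitHub", k=2)
```

The selected demonstrations would then be prepended, with their labels, to the prompt sent to the LLM.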
Topic-dependent sentiment analysis of financial blogs
While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on a more subjective source of information: blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full document classification and that word-based approaches perform better than sentence-based or paragraph-based approaches.
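A word-based sub-document extractor of the kind described can be sketched as keeping a window of words around each mention of the target company and discarding the rest. This is a minimal illustration, not the paper's exact method; the company names "Acme" and "Globex" are invented, and a real system would feed the result to a trained sentiment classifier.

```python
def topic_subdocument(text, company, window=3):
    """Keep words within `window` positions of any mention of `company`."""
    words = text.split()
    keep = set()
    for i, word in enumerate(words):
        if word.strip(".,!?").lower() == company.lower():
            keep.update(range(max(0, i - window),
                              min(len(words), i + window + 1)))
    return " ".join(words[i] for i in sorted(keep))

text = ("Shares of Acme surged today while analysts worried "
        "that Globex might miss earnings")
# Only the Acme-related span survives; the Globex clause is dropped.
sub = topic_subdocument(text, "Acme", window=2)
```

Filtering out the off-topic clause is exactly what guards the classifier against the topic shift the annotation study identified.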
Post processing of multimedia information - concepts, problems, and techniques
Currently, most research work on multimedia information processing is focused on multimedia information storage and retrieval, especially indexing and content-based access of multimedia information. We consider that multimedia information processing should include one more level: post-processing. Here "post-processing" means further processing of retrieved multimedia information, which includes fusion of multimedia information and reasoning with multimedia information to reach new conclusions. In this paper, the three levels of multimedia information processing (storage, retrieval, and post-processing) are discussed. The concepts and problems of multimedia information post-processing are identified. Potential techniques that can be used in post-processing are suggested. By highlighting the problems in multimedia information post-processing, hopefully this paper will stimulate further research on this important but ignored topic.
Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain
Word embeddings have made enormous inroads in recent years in a wide variety
of text mining applications. In this paper, we explore a word embedding-based
architecture for predicting the relevance of a role between two financial
entities within the context of natural language sentences. In this extended
abstract, we propose a pooled approach that uses a collection of sentences to
train word embeddings using the skip-gram word2vec architecture. We use the
word embeddings to obtain context vectors that are assigned one or more labels
based on manual annotations. We train a machine learning classifier using the
labeled context vectors, and use the trained classifier to predict contextual
role relevance on test data. Our approach serves as a good minimal-expertise
baseline for the task as it is simple and intuitive, uses open-source modules,
requires little feature crafting effort, and performs well across roles.
Comment: DSMM 2017 workshop at the ACM SIGMOD conference
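The pipeline above reduces to: mean-pool word embeddings into a context vector, then classify the labeled vectors. A toy sketch, assuming hand-set 2-d embeddings in place of trained skip-gram word2vec vectors and a nearest-centroid classifier in place of the paper's trained classifier; the company names and role labels are invented.

```python
EMB = {  # toy 2-d embeddings; the paper trains these with skip-gram word2vec
    "acquired": [1.0, 0.0], "bought": [0.9, 0.1],   # "deal" direction
    "sued": [0.0, 1.0], "litigation": [0.1, 0.9],   # "legal" direction
}

def context_vector(sentence):
    """Mean-pool the embeddings of known words in the sentence."""
    vecs = [EMB[w] for w in sentence.lower().split() if w in EMB]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def train_centroids(labeled):
    """Average the context vectors belonging to each role label."""
    groups = {}
    for sentence, label in labeled:
        groups.setdefault(label, []).append(context_vector(sentence))
    return {lab: [sum(v[i] for v in vs) / len(vs) for i in range(2)]
            for lab, vs in groups.items()}

def predict(centroids, sentence):
    """Pick the label whose centroid is nearest the context vector."""
    cv = context_vector(sentence)
    return min(centroids,
               key=lambda lab: sum((cv[i] - centroids[lab][i]) ** 2
                                   for i in range(2)))

labeled = [("Acme acquired Globex", "acquisition"),
           ("Acme bought Initech", "acquisition"),
           ("Acme sued Globex", "legal"),
           ("Globex faces litigation", "legal")]
centroids = train_centroids(labeled)
```

Even with unseen surface words ("Initech bought Hooli"), the shared embedding direction carries the role signal, which is the intuition behind the embedding-based baseline.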