Efficient querying and resource management using distributed presence information in converged networks
Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging
The U.S. Securities and Exchange Commission (SEC) mandates all public
companies to file periodic financial statements that should contain numerals
annotated with a particular label from a taxonomy. In this paper, we formulate
the task of automating the assignment of a label to a particular numeral span
in a sentence from an extremely large label set. Towards this task, we release
a dataset, Financial Numeric Extreme Labelling (FNXL), annotated with 2,794
labels. We benchmark the performance of the FNXL dataset by formulating the
task as (a) a sequence labelling problem and (b) a pipeline with span
extraction followed by Extreme Classification. Although the two approaches
perform comparably, the pipeline solution provides a slight edge for the least
frequent labels. Comment: Accepted to ACL'23 Findings Paper
Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling
We study the problem of automatically annotating relevant numerals (GAAP
metrics) occurring in the financial documents with their corresponding XBRL
tags. Different from prior works, we investigate the feasibility of solving
this extreme classification problem using a generative paradigm through
instruction tuning of Large Language Models (LLMs). To this end, we leverage
metric metadata information to frame our target outputs while proposing a
parameter efficient solution for the task using LoRA. We perform experiments on
two recently released financial numeric labeling datasets. Our proposed model,
FLAN-FinXC, achieves new state-of-the-art performances on both the datasets,
outperforming several strong baselines. We explain the better scores of our
proposed model by demonstrating its capability for zero-shot as well as the
least frequently occurring tags. Also, even when we fail to predict the XBRL
tags correctly, our generated output has substantial overlap with the
ground truth in the majority of cases. Comment: This work has been accepted to appear at North American Chapter of
the Association for Computational Linguistics (NAACL), 202
FinRED: A Dataset for Relation Extraction in Financial Domain
Relation extraction models trained on a source domain cannot be applied on a
different target domain due to the mismatch between relation sets. In the
current literature, there is no extensive open-source relation extraction
dataset specific to the finance domain. In this paper, we release FinRED, a
relation extraction dataset curated from financial news and earnings call
transcripts containing relations from the finance domain. FinRED has been
created by mapping Wikidata triplets using the distant supervision method. We
manually annotate the test data to ensure proper evaluation. We also experiment
with various state-of-the-art relation extraction models on this dataset to
create the benchmark. We see a significant drop in their performance on FinRED
compared to the general relation extraction datasets, which indicates that we need
better models for financial relation extraction. Comment: Accepted at FinWeb at WWW'2
TRACCS: Trajectory-Aware Coordinated Urban Crowd-Sourcing
We investigate the problem of large-scale mobile crowd-tasking, where a large pool of citizen crowd-workers are used to perform a variety of location-specific urban logistics tasks. Current approaches to such mobile crowd-tasking are very decentralized: a crowd-tasking platform usually provides each worker a set of available tasks close to the worker’s current location; each worker then independently chooses which tasks she wants to accept and perform. In contrast, we propose TRACCS, a more coordinated task assignment approach, where the crowd-tasking platform assigns a sequence of tasks to each worker, taking into account their expected location trajectory over a wider time horizon, as opposed to just instantaneous location. We formulate such task assignment as an optimization problem, that seeks to maximize the total payoff from all assigned tasks, subject to a maximum bound on the detour (from the expected path) that a worker will experience to complete her assigned tasks. We develop credible computationally-efficient heuristics to address this optimization problem (whose exact solution requires solving a complex integer linear program), and show, via simulations with realistic topologies and commuting patterns, that a specific heuristic (called Greedy-ILS) increases the fraction of assigned tasks by more than 20%, and reduces the average detour overhead by more than 60%, compared to the current decentralized approach.
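The detour-bounded assignment described in the TRACCS abstract can be illustrated with a minimal sketch. This is a plain greedy heuristic written for illustration only; it is not the paper's Greedy-ILS, and the names greedy_assign, route_length, the straight-line expected path, and the Euclidean distance model are all assumptions that do not come from the abstract.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points (an assumed distance model)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def route_length(origin, stops, destination):
    """Length of the path origin -> stops (in order) -> destination."""
    points = [origin, *stops, destination]
    return sum(dist(p, q) for p, q in zip(points, points[1:]))


def greedy_assign(workers, tasks, max_detour):
    """Hypothetical greedy heuristic, not the paper's Greedy-ILS.

    workers: {name: (origin, destination)}, the expected trajectory endpoints.
    tasks:   {name: (location, payoff)}.
    Repeatedly assigns the highest-payoff task that some worker can insert
    into their route without exceeding max_detour (ties: smaller detour).
    """
    routes = {w: [] for w in workers}
    unassigned = dict(tasks)
    while unassigned:
        best = None  # (payoff, -detour, worker, task, insert position)
        for w, (origin, dest) in workers.items():
            direct = dist(origin, dest)
            for t, (loc, payoff) in unassigned.items():
                for pos in range(len(routes[w]) + 1):
                    stops = [tasks[s][0] for s in routes[w]]
                    stops.insert(pos, loc)
                    detour = route_length(origin, stops, dest) - direct
                    if detour <= max_detour:
                        cand = (payoff, -detour, w, t, pos)
                        if best is None or cand[:2] > best[:2]:
                            best = cand
        if best is None:  # no remaining task fits any worker's detour budget
            break
        _, _, w, t, pos = best
        routes[w].insert(pos, t)
        del unassigned[t]
    return routes


workers = {"w1": ((0.0, 0.0), (10.0, 0.0))}
tasks = {"a": ((5.0, 0.0), 3.0), "b": ((5.0, 10.0), 5.0)}
routes = greedy_assign(workers, tasks, max_detour=2.0)
```

In this toy instance task "a" lies on the worker's expected path and is assigned, while task "b" has a higher payoff but would require a detour above the bound and stays unassigned.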
TASKer: Behavioral insights via campus-based experimental mobile crowd-sourcing
National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative