606 research outputs found
Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation
Acquiring noun phrases from running text is useful for many applications,
such as word grouping, terminology indexing, etc. Previous work adopts either
a pure probabilistic approach or a pure rule-based noun phrase grammar to
tackle this problem. In this paper, we apply a probabilistic chunker to decide
the implicit boundaries of constituents and utilize linguistic knowledge to
extract the noun phrases with a finite-state mechanism. The test texts are
drawn from the SUSANNE Corpus, and the results are evaluated automatically
against its parse field. The results of this preliminary experiment are
encouraging.
Comment: 8 pages, PostScript file, Unix compressed, uuencoded
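The finite-state extraction step above can be illustrated with a minimal, hypothetical sketch: a tiny state machine that accepts an optional determiner, any number of adjectives, and one or more nouns over a POS-tagged token sequence. The tag set and accepted pattern here are illustrative assumptions, not the paper's actual grammar.

```python
def extract_nps(tagged):
    """Extract noun phrases with a small finite-state machine over POS tags.

    Illustrative pattern (an assumption, not the paper's grammar):
    optional DT, then any number of JJ, then one or more NN/NNS.
    `tagged` is a list of (word, tag) pairs.
    """
    nps, i, n = [], 0, len(tagged)
    while i < n:
        start = i
        # State 1: optionally consume a determiner.
        if tagged[i][1] == "DT":
            i += 1
        # State 2: consume any number of adjectives.
        while i < n and tagged[i][1] == "JJ":
            i += 1
        # State 3: require at least one noun to accept.
        nouns = i
        while i < n and tagged[i][1] in ("NN", "NNS"):
            i += 1
        if i > nouns:  # accepting state reached: emit the span
            nps.append(" ".join(w for w, _ in tagged[start:i]))
        else:  # no noun found; reject and advance one token
            i = start + 1
    return nps
```

For example, over the tagged sentence "the/DT quick/JJ brown/JJ fox/NN jumps/VBZ over/IN lazy/JJ dogs/NNS" the machine emits "the quick brown fox" and "lazy dogs".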
Numeral Understanding in Financial Tweets for Fine-grained Crowd-based Forecasting
Numerals, which carry much of the information in financial documents, are
crucial for financial decision making, and they play different roles in
financial analysis processes. This paper aims at understanding the meanings of
numerals in financial tweets for fine-grained crowd-based forecasting. We
propose a taxonomy that classifies the numerals in financial tweets into 7
categories, and further extend some of these categories into several
subcategories. Neural network-based models with word- and character-level
encoders are proposed for 7-way and 17-way classification. We perform a
backtest to confirm the effectiveness of the numeric opinions made by the
crowd. This work is the first attempt to understand numerals in financial
social media data, and we provide the first comparison of the fine-grained
opinions of individual investors and analysts based on their forecast prices.
The numeral corpus used in our experiments, called FinNum 1.0, is available
for research purposes.
Comment: Accepted by the 2018 IEEE/WIC/ACM International Conference on Web
Intelligence (WI 2018), Santiago, Chile
Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations
Large language models (LMs) have exhibited superior in-context learning (ICL)
ability, adapting to target tasks when prompted with a few input-output
demonstrations. Towards better ICL, different methods have been proposed to
select representative demonstrations from existing training corpora. However,
such a setting is not aligned with real-world practice, as end-users usually
query LMs without access to demonstration pools. Inspired by evidence
suggesting that LMs' zero-shot capabilities are underrated, and that the role
of demonstrations is primarily to expose models' intrinsic functionalities, we
introduce Self-ICL, a simple framework for zero-shot ICL. Given a test input,
Self-ICL first prompts the model to generate pseudo-inputs. Next, the model
predicts pseudo-labels for the pseudo-inputs via zero-shot prompting. Finally,
we construct pseudo-demonstrations from the pseudo-input-label pairs and
perform ICL for the test input. Evaluation on BIG-Bench Hard shows Self-ICL
steadily surpasses zero-shot and zero-shot chain-of-thought baselines on both
head-to-head and all-task average performance. Our findings suggest the
possibility of bootstrapping LMs' intrinsic capabilities towards better
zero-shot performance.
Comment: Work in progress
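The three Self-ICL steps can be sketched as a short pipeline. This is a minimal, hypothetical sketch: `call_model` is a placeholder for an LLM API call (here mocked so the sketch runs end-to-end), and the prompt wording is an assumption, not the paper's actual prompts.

```python
def call_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client.

    Mocked to return a fixed string so this sketch is runnable offline.
    """
    return "MOCK_OUTPUT"


def self_icl(test_input: str, task_desc: str, num_shots: int = 3) -> str:
    # Step 1: prompt the model to generate pseudo-inputs similar to the test input.
    gen_prompt = (
        f"{task_desc}\nExample input: {test_input}\n"
        f"Write {num_shots} new, diverse inputs for this task, one per line."
    )
    pseudo_inputs = call_model(gen_prompt).splitlines()[:num_shots]

    # Step 2: predict a pseudo-label for each pseudo-input via zero-shot prompting.
    pseudo_pairs = [
        (x, call_model(f"{task_desc}\nInput: {x}\nAnswer:"))
        for x in pseudo_inputs
    ]

    # Step 3: prepend the pseudo-demonstrations and answer the real test input.
    demos = "\n\n".join(f"Input: {x}\nAnswer: {y}" for x, y in pseudo_pairs)
    final_prompt = f"{task_desc}\n\n{demos}\n\nInput: {test_input}\nAnswer:"
    return call_model(final_prompt)
```

With a real model behind `call_model`, the final prompt is an ordinary few-shot ICL prompt whose demonstrations were generated entirely by the model itself.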