47 research outputs found

Fact Check: Analyzing Financial Events from Multilingual News Sources

Author: Dong Ruihai
Ng Tin Lok James
Smyth Barry
Yang Linyi
Publication date: 30/06/2021

The explosion in the sheer magnitude and complexity of financial news data in recent years makes it increasingly challenging for investment analysts to extract valuable insights and perform analysis. We propose FactCheck in finance, a web-based news aggregator with deep learning models, to provide analysts with a holistic view of important financial events from multilingual news sources and extract events using an unsupervised clustering method. A web interface is provided to examine the credibility of news articles using a transformer-based fact-checker. The performance of the fact checker is evaluated using a dataset related to merger and acquisition (M&A) events and is shown to outperform several strong baselines. Comment: Dem

arXiv.org e-Print Archive

Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations

Author: Dong Ruihai
Li Irene
Liu Dairui
Suzumura Toyotaro
Yang Boming
Publication date: 21/07/2023

Precisely recommending candidate news articles to users has always been a core challenge for personalized news recommendation systems. Most recent works primarily focus on using advanced natural language processing techniques to extract semantic information from rich textual data, employing content-based methods derived from local historical news. However, this approach lacks a global perspective, failing to account for users' hidden motivations and behaviors beyond semantic information. To address this challenge, we propose a novel model called GLORY (Global-LOcal news Recommendation sYstem), which combines global representations learned from other users with local representations to enhance personalized recommendation systems. We accomplish this by constructing a Global-aware Historical News Encoder, which includes a global news graph and employs gated graph neural networks to enrich news representations, thereby fusing historical news representations by a historical news aggregator. Similarly, we extend this approach to a Global Candidate News Encoder, utilizing a global entity graph and a candidate news aggregator to enhance candidate news representation. Evaluation results on two public news datasets demonstrate that our method outperforms existing approaches. Furthermore, our model offers more diverse recommendations. Comment: 10 pages, Recsys 202

arXiv.org e-Print Archive

NumHTML : numeric-oriented hierarchical transformer model for multi-task financial forecasting

Author: Dong Ruihai
Li Jiazheng
Smyth Barry
Yang Linyi
Zhang Yue
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 05/01/2022

Financial forecasting has been an important and active area of machine learning research because of the challenges it presents and the potential rewards that even minor improvements in prediction accuracy or forecasting may entail. Traditionally, financial forecasting has heavily relied on quantitative indicators and metrics derived from structured financial statements. Earnings conference call data, including text and audio, is an important source of unstructured data that has been used for various prediction tasks using deep learning and related approaches. However, current deep learning-based methods are limited in the way that they deal with numeric data; numbers are typically treated as plain-text tokens without taking advantage of their underlying numeric structure. This paper describes a numeric-oriented hierarchical transformer model (NumHTML) to predict stock returns and financial risk using multi-modal aligned earnings calls data by taking advantage of the different categories of numbers (monetary, temporal, percentages, etc.) and their magnitude. We present the results of a comprehensive evaluation of NumHTML against several state-of-the-art baselines using a real-world publicly available dataset. The results indicate that NumHTML significantly outperforms the current state-of-the-art across a variety of evaluation metrics and that it has the potential to offer significant financial gains in a practical trading context.

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Association for the Advancement of Artificial Intelligence: AAAI Publications

Lensless polarimetric coded ptychography (pol-CP) for high-resolution, high-throughput birefringence imaging on a chip

Author: Guo Chengfei
Jiang Shaowei
Pandey Rishikesh
Shao Xiaopeng
Song Pengming
Wang Ruihai
Wang Tianbo
Yang Liming
Zhao Qianhao
Zheng Guoan
Publication date: 01/06/2023

Polarimetric imaging provides valuable insights into the polarization state of light interacting with a sample. It can infer crucial birefringence properties of bio-specimens without using any labels, thereby facilitating the diagnosis of diseases such as cancer and osteoarthritis. In this study, we introduce a novel polarimetric coded ptychography (pol-CP) approach that enables high-resolution, high-throughput birefringence imaging on a chip. Our platform deviates from traditional lens-based polarization systems by employing an integrated polarimetric coded sensor for lensless diffraction data acquisition. Utilizing Jones calculus, we quantitatively determine the birefringence retardance and orientation information of bio-specimens from four recovered intensity images. Our portable pol-CP prototype can resolve the 435-nm linewidth on the resolution target and the imaging field of view for a single acquisition is limited only by the detector size of 41 mm^2. The prototype allows for the acquisition of gigapixel birefringence images with a 180-mm^2 field of view in ~3.5 minutes, achieving an imaging throughput comparable to that of a conventional whole slide scanner. To demonstrate its biomedical applications, we perform high-throughput imaging of malaria-infected blood smears, locating parasites using birefringence contrast. We also generate birefringence maps of label-free thyroid smears to identify thyroid follicles. Notably, the recovered birefringence maps emphasize the same regions as autofluorescence images, indicating the potential for rapid on-site evaluation of label-free biopsies. The reported approach offers a portable, turnkey solution for high-resolution, high-throughput polarimetric analysis without using lenses, with potential applications in disease diagnosis, sample screening, and label-free chemical imaging.

arXiv.org e-Print Archive

Status of cardiovascular health among adults in a rural area of Northwest China: Results from a cross-sectional study.

Author: Cao Lei
Dang Shaonong
Li Qiang
Liu Ruru
Marshall Roger J
Pei Leilei
Wang Duolao
Yan Hong
Yang Ruihai
Zhao Yaling
Publication venue: 'Ovid Technologies (Wolters Kluwer Health)'
Publication date: 01/07/2016

The aim of this study was to assess the status of cardiovascular health among a rural population in Northwest China and to determine the associated factors for cardiovascular health. A population-based cross-sectional study was conducted in the rural areas of Hanzhong in Northwest China. Interview, physical examination, and fasting blood glucose and lipid measurements were completed for 2693 adults. The construct of cardiovascular health and the definitions of cardiovascular health metrics proposed by the American Heart Association were used to assess cardiovascular health. The proportions of subjects with cardiovascular health metrics were calculated, adjusting for age and sex. The multiple logistic regression model was used to evaluate the association between ideal cardiovascular health and its associated factors. Only 0.5% (0.0% in men vs 0.9% in women, P = 0.002) of the participants had ideal cardiovascular health, whereas 33.8% (18.0% in men vs 50.0% in women, P < 0.001) and 65.7% (82.0% in men vs 49.1% in women, P < 0.001) of the participants had intermediate and poor cardiovascular health, respectively. The prevalence of poor cardiovascular health increased with increasing age (P < 0.001 for trend). Participants fulfilled, on average, 4.4 (95% confidence interval: 4.2-4.7) of the ideal cardiovascular health metrics. Also, 22.2% of the participants presented with 3 or fewer ideal metrics. Only 19.4% of the participants presented with 6 or more ideal metrics. 24.1% of the participants had all 4 ideal health factors, but only 1.1% of the participants had all 4 ideal health behaviors. Women were more likely to have ideal cardiovascular health, whereas adults aged 35 years or over and those who had a family history of hypertension were less likely to have ideal cardiovascular health. The prevalence of ideal cardiovascular health was extremely low among the rural population in Northwest China. Most adults, especially men and the elderly, had a poor cardiovascular health status. To improve cardiovascular health among the rural population, efforts, especially lifestyle improvements, education and interventions to make healthier food choices, reduce salt intake, increase physical activities, and cease smoking, will be required at the individual, population, and social levels.

LSTM Online Archive

Crossref

PubMed Central

Global land surface temperature influenced by vegetation cover and PM2.5 from 2001 to 2016

Author: Ge Wei
Han Xujun
Li Qiuping
Li Ruihai
Liu Siyao
Ma Mingguo
Qiu Ruiyang
Shi Weiyu
Song Lisheng
Song Zengjng
Tan Chao
Tang Xuguang
Yang Hong
Yu Wenping
Publication venue: 'MDPI AG'
Publication date: 01/12/2018

Land surface temperature (LST) is an important parameter to evaluate environmental changes. In this paper, time series analysis was conducted to estimate the interannual variations in global LST from 2001 to 2016 based on Moderate Resolution Imaging Spectroradiometer (MODIS) LST and normalized difference vegetation index (NDVI) products and fine particulate matter (PM2.5) data from the Atmospheric Composition Analysis Group. The results showed that LST, seasonally integrated normalized difference vegetation index (SINDVI), and PM2.5 increased by 0.17 K, 0.04, and 1.02 µg/m³ in the period of 2001–2016, respectively. During the past 16 years, LST showed an increasing trend in most areas, with two peaks of 1.58 K and 1.85 K at 72°N and 48°S, respectively. Marked warming also appeared in the Arctic. On the contrary, a remarkable decrease in LST occurred in the Antarctic. In most parts of the world, LST was affected by the variation in vegetation cover and air pollutants, which can be detected by satellite. In the Northern Hemisphere, positive relations between SINDVI and LST were found; however, in the Southern Hemisphere, negative correlations were detected. The impact of PM2.5 on LST was more complex. On the whole, LST increased with a small increase in PM2.5 concentrations but decreased with a marked increase in PM2.5. The study provides insights on the complex relationship between vegetation cover, air pollution, and land surface temperature.

Central Archive at the University of Reading

Directory of Open Access Journals

MAEC: A Multimodal Aligned Earnings Conference Call Dataset for Financial Risk Prediction

Author: Dong Ruihai
Li Jiazheng
Smyth Barry
Yang Linyi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/09/2020

The 29th ACM International Conference on Information and Knowledge Management, Virtual Event (CIKM '20), 19-23 October 2020.
In the area of natural language processing, various financial datasets have informed recent research and analysis including financial news, financial reports, social media, and audio data from earnings calls. We introduce a new, large-scale multi-modal, text-audio paired, earnings-call dataset named MAEC, based on S&P 1500 companies. We describe the main features of MAEC, how it was collected and assembled, paying particular attention to the text-audio alignment process used. We present the approach used in this work as providing a suitable framework for processing similar forms of data in the future. The resulting dataset is more than six times larger than those currently available to the research community and we discuss its potential in terms of current and future research challenges and opportunities. All resources of this work are available at https://github.com/Earnings-Call-Dataset/
Science Foundation Ireland; Insight Research Centre

Crossref

Research Repository UCD

Leveraging BERT to Improve the FEARS Index for Stock Forecasting

Author: Dong Ruihai
Ng Tin Lok James
Xu Yang
Yang Linyi
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

The First Workshop on Financial Technology and Natural Language Processing, Macao, China, 12 August 2019.
The Financial and Economic Attitudes Revealed by Search (FEARS) index reflects the attention and sentiment of public investors and is an important factor for predicting stock price return. In this paper, we take into account the semantics of the FEARS search terms by leveraging the Bidirectional Encoder Representations from Transformers (BERT), and further apply a self-attention deep learning model to our refined FEARS seamlessly for stock return prediction. We demonstrate the practical benefits of our approach by comparing it with baseline works.
Science Foundation Ireland

Research Repository UCD

Irish Universities

Research Online