
887 research outputs found

What Are People Asking About COVID-19? A Question Classification Dataset

Author: Huang Chengyu
Vosoughi Soroush
Wei Jason
Wei Jerry
Publication date: 15/09/2020

We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources, which we annotate into 15 question categories and 207 question clusters. The most common questions in our dataset asked about transmission, prevention, and societal effects of COVID, and we found that many questions that appeared in multiple sources were not answered by any FAQ websites of reputable organizations such as the CDC and FDA. We post our dataset publicly at https://github.com/JerryWei03/COVID-Q. For classifying questions into 15 categories, a BERT baseline scored 58.1% accuracy when trained on 20 examples per category, and for a question clustering task, a BERT + triplet loss baseline achieved 49.5% accuracy. We hope COVID-Q can be useful either directly in developing applied systems or as a domain-specific resource for model evaluation. Comment: Published in Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020.

arXiv.org e-Print Archive
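
The COVID-Q abstract above mentions a BERT + triplet loss baseline for question clustering. As a rough illustration of what a triplet loss computes, here is a minimal sketch on plain vectors. The Euclidean distance, the margin value, and the function names are assumptions for illustration; the abstract does not state which distance or margin the authors used.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (an assumed hyperparameter).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: two questions from the same cluster and one
# from a different cluster.
anchor = [0.0, 0.0]
same_cluster = [0.0, 1.0]
other_cluster = [3.0, 4.0]

well_separated = triplet_loss(anchor, same_cluster, other_cluster)
violated = triplet_loss(anchor, other_cluster, same_cluster)
```

In this toy case the first triplet already satisfies the margin, so its loss is zero, while the swapped triplet is penalized.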

NewB: 200,000+ Sentences for Political Bias Detection

Author: Wei Jerry
Publication date: 04/06/2020

We present the Newspaper Bias Dataset (NewB), a text corpus of more than 200,000 sentences from eleven news sources regarding Donald Trump. While previous datasets have labeled sentences as either liberal or conservative, NewB covers the political views of eleven popular media sources, capturing more nuanced political viewpoints than a traditional binary classification system does. We train two state-of-the-art deep learning models to predict the news source of a given sentence from eleven newspapers and find that a recurrent neural network achieved top-1, top-3, and top-5 accuracies of 33.3%, 61.4%, and 77.6%, respectively, significantly outperforming a baseline logistic regression model's accuracies of 18.3%, 42.6%, and 60.8%. Using the news source label of sentences, we analyze the top n-grams with our model to gain meaningful insight into the portrayal of Trump by media sources. We hope that the public release of our dataset will encourage further research in using natural language processing to analyze more complex political biases. Our dataset is posted at https://github.com/JerryWei03/NewB.

arXiv.org e-Print Archive
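
The NewB abstract above reports top-1, top-3, and top-5 accuracies. A minimal sketch of how a top-k accuracy can be computed from per-class scores follows. The function name and the toy scores are hypothetical and are not taken from the NewB code or data.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one list per example.
    labels: list of true class indices, one per example.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 3 classes and 3 sentences (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
top3 = top_k_accuracy(scores, labels, 3)
```

Top-k accuracy can only stay equal or rise as k grows, which matches the ordering of the figures quoted in the abstract.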

Efficient chain structure for high-utility sequential pattern mining

Author: Djenouri Youcef
Fournier-Viger Philippe
Li Yuanfa
Lin Jerry Chun-Wei
Zhang Ji
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, which considers both utility and sequence factors to derive the set of high-utility sequential patterns (HUSPs) from quantitative databases. Several works have been presented to reduce the computational cost through variants of pruning strategies. In this paper, we present an efficient sequence-utility (SU)-chain structure, which can be used to store more relevant information to improve mining performance. Based on the SU-Chain structure, the existing pruning strategies can also be used to prune unpromising candidates early and obtain the qualifying HUSPs. Experiments compare the approach with state-of-the-art HUSPM algorithms, and the results show that the SU-Chain-based model outperforms the existing HUSPM algorithms in terms of runtime and number of candidates generated.

SINTEF Open

NORA - Norwegian Open Research Archives

University of Southern Queensland ePrints
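
The HUSPM abstract above refers to high-utility sequential patterns in quantitative databases. The sketch below illustrates one common way the utility of a pattern is defined: quantity times unit profit, taking the best occurrence in each sequence and summing over sequences. It does not implement the SU-Chain structure or any pruning strategy, and the simplified data layout (one item per event), the function names, and the numbers are assumptions for illustration only.

```python
def pattern_utility_in_sequence(sequence, pattern, profit):
    """Maximum utility of `pattern` over all its occurrences in one sequence.

    sequence: list of (item, quantity) events in time order.
    pattern:  list of items that must appear in this order (gaps allowed).
    profit:   dict mapping item -> unit profit.
    Returns 0 if the pattern does not occur.
    """
    m = len(pattern)
    # best[j] = best utility of matching the first j pattern items so far.
    best = [0] + [None] * m
    for item, qty in sequence:
        # Go backwards so one event is not used for two pattern positions.
        for j in range(m, 0, -1):
            if item == pattern[j - 1] and best[j - 1] is not None:
                candidate = best[j - 1] + qty * profit[item]
                if best[j] is None or candidate > best[j]:
                    best[j] = candidate
    return best[m] if best[m] is not None else 0


def pattern_utility(database, pattern, profit):
    """Sum of the per-sequence maximum utilities over the whole database."""
    return sum(pattern_utility_in_sequence(s, pattern, profit) for s in database)


# Made-up unit profits and quantitative sequences.
profit = {"a": 2, "b": 3, "c": 1}
database = [
    [("a", 1), ("b", 2), ("a", 3), ("b", 1)],
    [("b", 1), ("a", 2)],
    [("a", 2), ("c", 1), ("b", 4)],
]

utility_ab = pattern_utility(database, ["a", "b"], profit)
min_utility = 20  # hypothetical threshold
is_high_utility = utility_ab >= min_utility
```

A pattern whose total utility reaches the chosen threshold would count as a HUSP under this simplified definition.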

Slowly expanding/evolving lesions as a magnetic resonance imaging marker of chronic active multiple sclerosis lesions.

Author: Arnold Douglas L
Barkhof Frederik
Belachew Shibeshih
Bernasconi Corrado
Elliott Colm
Hauser Stephen L
Kappos Ludwig
Wei Wei
Wolinsky Jerry S
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

BACKGROUND: Chronic lesion activity driven by smoldering inflammation is a pathological hallmark of progressive forms of multiple sclerosis (MS). OBJECTIVE: To develop a method for automatic detection of slowly expanding/evolving lesions (SELs) on conventional brain magnetic resonance imaging (MRI) and characterize such SELs in primary progressive MS (PPMS) and relapsing MS (RMS) populations. METHODS: We defined SELs as contiguous regions of existing T2 lesions showing local expansion assessed by the Jacobian determinant of the deformation between reference and follow-up scans. SEL candidates were assigned a heuristic score based on concentricity and constancy of change in T2- and T1-weighted MRIs. SELs were examined in 1334 RMS patients and 555 PPMS patients. RESULTS: Compared with RMS patients, PPMS patients had higher numbers of SELs (p = 0.002) and higher T2 volumes of SELs (p < 0.001). SELs were devoid of gadolinium enhancement. Compared with areas of T2 lesions not classified as SEL, SELs had significantly lower T1 intensity at baseline and larger decrease in T1 intensity over time. CONCLUSION: We suggest that SELs reflect chronic tissue loss in the absence of ongoing acute inflammation. SELs may represent a conventional brain MRI correlate of chronic active MS lesions and a candidate biomarker for smoldering inflammation in MS.

eScholarship - University of California
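
The SEL abstract above assesses local expansion through the Jacobian determinant of a deformation between scans. As a toy illustration of that quantity only, the sketch below computes the determinant for a 2-D displacement field on a unit grid using central differences; values above 1 indicate local expansion and values below 1 indicate contraction. The 2-D setting, the grid layout, and the function name are assumptions. The study's actual registration and scoring pipeline is not described in the abstract and is not reproduced here.

```python
def jacobian_determinant(ux, uy):
    """Jacobian determinant of the mapping (x, y) -> (x + ux, y + uy).

    ux[r][c], uy[r][c]: displacement along x (columns) and y (rows) at grid
    point (r, c), with unit spacing. Border points are left as None because
    central differences need both neighbours.
    """
    rows, cols = len(ux), len(ux[0])
    det = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dux_dx = (ux[r][c + 1] - ux[r][c - 1]) / 2.0
            dux_dy = (ux[r + 1][c] - ux[r - 1][c]) / 2.0
            duy_dx = (uy[r][c + 1] - uy[r][c - 1]) / 2.0
            duy_dy = (uy[r + 1][c] - uy[r - 1][c]) / 2.0
            det[r][c] = (1.0 + dux_dx) * (1.0 + duy_dy) - dux_dy * duy_dx
    return det


# Synthetic field: uniform 10% stretch along both axes.
size = 4
ux = [[0.1 * c for c in range(size)] for r in range(size)]
uy = [[0.1 * r for c in range(size)] for r in range(size)]
det_expand = jacobian_determinant(ux, uy)

# Zero displacement (identity mapping) for comparison.
zero = [[0.0] * size for _ in range(size)]
det_identity = jacobian_determinant(zero, zero)
```

For the uniform 10% stretch the determinant at interior points is close to 1.1 × 1.1 = 1.21, while the identity mapping gives exactly 1.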

FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection

Author: Chang Liang
Fournier-Viger Philippe
Li Hongzhou
Lin Jerry Chun-Wei
Zhang Ji
Zhu Xiaodong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/11/2017

In this paper, we propose a novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which features a deep integration of human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism is developed to allow easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm, which include dense cell selection, location-aware distance thresholding, and final top outlier validation. By doing so, we can mitigate the major difficulty that competing outlier detection methods face in specifying key parameter values, such as the density and distance thresholds. An innovative optimization approach is also proposed to optimize the grid-based space partitioning, which is a critical step of FRIOD. This optimization takes full account of the high-quality outliers detected with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient.

University of Southern Queensland ePrints

Privacy Preserving Utility Mining: A Survey

Author: Chao Han-Chieh
Gan Wensheng
Lin Jerry Chun-Wei
Wang Shyue-Liang
Yu Philip S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/11/2018

In the big data era, collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximizing and privacy preserving, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages.

arXiv.org e-Print Archive

Crossref