What Are People Asking About COVID-19? A Question Classification Dataset
We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources,
which we annotate into 15 question categories and 207 question clusters. The
most common questions in our dataset asked about transmission, prevention, and
societal effects of COVID, and we found that many questions that appeared in
multiple sources were not answered by any FAQ websites of reputable
organizations such as the CDC and FDA. We post our dataset publicly at
https://github.com/JerryWei03/COVID-Q. For classifying questions into 15
categories, a BERT baseline scored 58.1% accuracy when trained on 20 examples
per category, and for a question clustering task, a BERT + triplet loss
baseline achieved 49.5% accuracy. We hope COVID-Q can help either for direct
use in developing applied systems or as a domain-specific resource for model
evaluation.
Comment: Published in Proceedings of the 1st Workshop on NLP for COVID-19 at
ACL 202
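The abstract above mentions a BERT + triplet loss baseline for question clustering. As a rough illustration of the loss itself, here is a minimal sketch assuming Euclidean distance and a margin of 1.0; both are assumptions, since the abstract does not state the distance function or the margin, and the plain lists stand in for sentence embeddings.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on three embedding vectors.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is. The margin value and the
    distance function are assumptions made for this sketch.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-D "embeddings": the positive is close, the negative is far.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Same points with the roles swapped, so the triplet is violated.
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In a clustering setup the anchor and positive would be questions from the same cluster and the negative a question from a different one.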
NewB: 200,000+ Sentences for Political Bias Detection
We present the Newspaper Bias Dataset (NewB), a text corpus of more than
200,000 sentences from eleven news sources regarding Donald Trump. While
previous datasets have labeled sentences as either liberal or conservative,
NewB covers the political views of eleven popular media sources, capturing more
nuanced political viewpoints than a traditional binary classification system
does. We train two state-of-the-art deep learning models to predict the news
source of a given sentence from eleven newspapers and find that a recurrent
neural network achieved top-1, top-3, and top-5 accuracies of 33.3%, 61.4%, and
77.6%, respectively, significantly outperforming a baseline logistic regression
model's accuracies of 18.3%, 42.6%, and 60.8%. Using the news source label of
sentences, we analyze the top n-grams with our model to gain meaningful insight
into the portrayal of Trump by media sources. We hope that the public release of
our dataset will encourage further research in using natural language
processing to analyze more complex political biases.
Our dataset is posted at https://github.com/JerryWei03/NewB
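The NewB abstract reports top-1, top-3, and top-5 accuracies. A small sketch of how top-k accuracy is usually computed may help when reading those figures; the function name and the toy scores below are illustrative and do not come from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scored classes.

    scores: list of per-class score lists, one list per example.
    labels: list of true class indices, one per example.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Three examples over three hypothetical classes.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
```

With eleven candidate sources, as in NewB, top-5 accuracy counts a prediction as correct when the true source is among the five highest-scored sources.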
Efficient chain structure for high-utility sequential pattern mining
High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining that considers both utility and sequence factors to derive the set of high-utility sequential patterns (HUSPs) from quantitative databases. Several works have reduced the computational cost through variants of pruning strategies. In this paper, we present an efficient sequence-utility (SU)-chain structure, which stores more relevant information to improve mining performance. Existing pruning strategies can be applied on top of the SU-Chain structure to prune unpromising candidates early and obtain the qualifying HUSPs. Experimental comparisons with state-of-the-art HUSPM algorithms show that the SU-Chain-based model outperforms them in terms of runtime and number of candidates determined.
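To make the notion of pattern utility concrete, the following is a minimal sketch of the usual HUSPM definition: the utility of a pattern in one quantitative sequence is the maximum, over all of its occurrences, of the summed quantity times unit profit. It is not the SU-Chain structure from the paper. As a simplifying assumption, each pattern element is a single item, and all names (`pattern_utility`, `database_utility`, `profit`) are hypothetical.

```python
NEG_INF = float("-inf")


def pattern_utility(qseq, pattern, profit):
    """Maximum utility of `pattern` over all its occurrences in `qseq`.

    qseq:    list of itemsets, each a dict mapping item -> quantity.
    pattern: list of single items that must appear in this order,
             each in a strictly later itemset than the previous one.
    profit:  dict mapping item -> unit profit (external utility).
    Returns None when the pattern does not occur in the sequence.
    """
    # best[j] = highest utility of matching pattern[:j] so far.
    best = [0] + [NEG_INF] * len(pattern)
    for itemset in qseq:
        # Go from the last pattern position to the first so that one
        # itemset is never used for two consecutive pattern elements.
        for j in range(len(pattern), 0, -1):
            item = pattern[j - 1]
            if item in itemset and best[j - 1] != NEG_INF:
                candidate = best[j - 1] + itemset[item] * profit[item]
                best[j] = max(best[j], candidate)
    return None if best[-1] == NEG_INF else best[-1]


def database_utility(database, pattern, profit):
    """Sum of the pattern's utility over every sequence that contains it."""
    utilities = (pattern_utility(q, pattern, profit) for q in database)
    return sum(u for u in utilities if u is not None)


profit = {"a": 2, "b": 1, "c": 3}
seq1 = [{"a": 1, "b": 2}, {"a": 3}, {"b": 1, "c": 2}, {"b": 4}]
seq2 = [{"a": 1}, {"b": 1}]

u_ab_seq1 = pattern_utility(seq1, ["a", "b"], profit)
u_ca_seq1 = pattern_utility(seq1, ["c", "a"], profit)
u_ab_db = database_utility([seq1, seq2], ["a", "b"], profit)
```

A pattern would count as high-utility when its database utility reaches a user-chosen minimum utility threshold; pruning strategies exist to avoid evaluating every candidate this way.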
Slowly expanding/evolving lesions as a magnetic resonance imaging marker of chronic active multiple sclerosis lesions.
BACKGROUND: Chronic lesion activity driven by smoldering inflammation is a pathological hallmark of progressive forms of multiple sclerosis (MS). OBJECTIVE: To develop a method for automatic detection of slowly expanding/evolving lesions (SELs) on conventional brain magnetic resonance imaging (MRI) and characterize such SELs in primary progressive MS (PPMS) and relapsing MS (RMS) populations. METHODS: We defined SELs as contiguous regions of existing T2 lesions showing local expansion assessed by the Jacobian determinant of the deformation between reference and follow-up scans. SEL candidates were assigned a heuristic score based on concentricity and constancy of change in T2- and T1-weighted MRIs. SELs were examined in 1334 RMS patients and 555 PPMS patients. RESULTS: Compared with RMS patients, PPMS patients had higher numbers of SELs (p = 0.002) and higher T2 volumes of SELs (p < 0.001). SELs were devoid of gadolinium enhancement. Compared with areas of T2 lesions not classified as SEL, SELs had significantly lower T1 intensity at baseline and larger decrease in T1 intensity over time. CONCLUSION: We suggest that SELs reflect chronic tissue loss in the absence of ongoing acute inflammation. SELs may represent a conventional brain MRI correlate of chronic active MS lesions and a candidate biomarker for smoldering inflammation in MS.
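The methods above assess local expansion through the Jacobian determinant of a deformation. As a hedged illustration of that quantity only, the sketch below computes det(I + grad u) for a 2-D displacement field u on a unit grid using central differences. The authors' registration pipeline, the 3-D case, and any thresholds are not reproduced; the function name and the toy field are assumptions.

```python
def jacobian_determinant_2d(ux, uy):
    """Jacobian determinant of the mapping x -> x + u(x) on a unit grid.

    ux, uy: 2-D lists (rows = y, columns = x) holding the x and y
            components of the displacement field u.
    Returns a 2-D list of determinants; border cells are None because
    central differences need a neighbour on each side.
    Values above 1 indicate local expansion, below 1 local contraction.
    """
    rows, cols = len(ux), len(ux[0])
    det = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dux_dx = (ux[r][c + 1] - ux[r][c - 1]) / 2.0
            dux_dy = (ux[r + 1][c] - ux[r - 1][c]) / 2.0
            duy_dx = (uy[r][c + 1] - uy[r][c - 1]) / 2.0
            duy_dy = (uy[r + 1][c] - uy[r - 1][c]) / 2.0
            det[r][c] = (1.0 + dux_dx) * (1.0 + duy_dy) - dux_dy * duy_dx
    return det


# Toy field: uniform 10% stretch in both directions, u(x, y) = 0.1 * (x, y).
size = 5
stretch_x = [[0.1 * c for c in range(size)] for r in range(size)]
stretch_y = [[0.1 * r for c in range(size)] for r in range(size)]
stretch_det = jacobian_determinant_2d(stretch_x, stretch_y)

# Zero displacement leaves every interior determinant at exactly 1.
zeros = [[0.0] * size for _ in range(size)]
identity_det = jacobian_determinant_2d(zeros, zeros)
```

For the uniform stretch, each interior determinant is about 1.1 × 1.1 = 1.21, which is the kind of value above 1 that marks local expansion.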
FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection
In this paper, we propose a novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which features a deep integration of human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism is developed to allow easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm, which include dense cell selection, location-aware distance thresholding, and final top outlier validation. By doing so, we can mitigate the major difficulty that competing outlier detection methods have in specifying key parameter values, such as the density and distance thresholds. An innovative optimization approach is also proposed to optimize the grid-based space partitioning, which is a critical step of FRIOD. Such optimization fully considers the high-quality outliers it detects with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient.
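The abstract names grid-based space partitioning and dense cell selection without giving details. The sketch below shows one generic way such a step can work, assuming equal-width cells and a fixed density threshold: points in cells that fall below the threshold become outlier candidates. This is an assumption for illustration and not FRIOD's actual algorithm; in FRIOD the thresholds are set interactively, and all names here are hypothetical.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Grid cell index of a point, using equal-width cells per dimension."""
    return tuple(int(coord // cell_size) for coord in point)


def dense_cells(points, cell_size, density_threshold):
    """Cells that contain at least `density_threshold` points."""
    counts = Counter(cell_of(p, cell_size) for p in points)
    return {cell for cell, n in counts.items() if n >= density_threshold}


def outlier_candidates(points, cell_size, density_threshold):
    """Points lying outside every dense cell (candidates, not final outliers)."""
    dense = dense_cells(points, cell_size, density_threshold)
    return [p for p in points if cell_of(p, cell_size) not in dense]


toy_points = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.9), (5.5, 5.5)]
toy_dense = dense_cells(toy_points, cell_size=1.0, density_threshold=3)
toy_candidates = outlier_candidates(toy_points, cell_size=1.0, density_threshold=3)
```

The result depends heavily on `cell_size` and `density_threshold`, which is the parameter-setting difficulty the abstract says human interaction is meant to ease.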
Privacy Preserving Utility Mining: A Survey
In the big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve a better trade-off between utility maximization and privacy preservation,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.
Comment: 2018 IEEE International Conference on Big Data, 10 page