Empirical Study of Deep Learning for Text Classification in Legal Document Review
Predictive coding has been widely used in legal matters to find relevant or privileged documents in large sets of electronically stored information, significantly saving time and cost. Logistic Regression (LR) and Support Vector Machines (SVM) are two popular machine learning algorithms used in predictive coding. Recently, deep learning has received a lot of attention in many industries. This paper reports our preliminary studies of deep learning for legal document review. Specifically, we conducted experiments comparing the results of a convolutional neural network (CNN) with those obtained using an SVM algorithm on four datasets from real legal matters. Our results show that the CNN performed better with larger training datasets and is a fitting method for text classification in the legal industry.
Comment: 2018 IEEE International Conference on Big Data (Big Data)
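To make the predictive-coding setup concrete, here is a minimal sketch of the kind of linear text classifier (logistic regression on bag-of-words counts) the abstract mentions as a baseline. The four toy documents, their labels, and the learning rate are hypothetical illustrations, not the paper's corpora or settings.

```python
import math
from collections import Counter

# Hypothetical mini-corpus: 1 = responsive/privileged, 0 = not relevant.
docs = [
    ("attorney client privileged memo", 1),
    ("privileged legal advice draft", 1),
    ("quarterly sales report numbers", 0),
    ("marketing budget spreadsheet", 0),
]

vocab = sorted({w for text, _ in docs for w in text.split()})

def featurize(text):
    """Bag-of-words count vector over the fixed vocabulary."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

X = [featurize(text) for text, _ in docs]
y = [label for _, label in docs]

# Plain stochastic gradient descent on the logistic loss.
w = [0.0] * len(vocab)
b = 0.0
lr = 0.5
for _ in range(200):
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        err = p - yi
        b -= lr * err
        w = [wj - lr * err * xj for wj, xj in zip(w, xi)]

def predict(text):
    """1 if the document scores as responsive/privileged, else 0."""
    xi = featurize(text)
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 if z > 0 else 0
```

In production predictive coding, the same decision rule would be learned by an LR or SVM library over millions of features; the CNN in the paper replaces the hand-built count features with learned ones.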
Scalable Privacy-Compliant Virality Prediction on Twitter
Twitter's digital town hall has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention on Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage, and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy, and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, high-accuracy supervisory signals, and multilingual sentiment prediction while respecting every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including tweets' visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focuses on features available early, the model is immediately applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysis
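The core technique named above, a Gradient Boosted Regression Tree, can be sketched in miniature with depth-1 trees (stumps). The two features (follower count, posting hour) and the engagement targets below are made-up toy values, not the paper's dataset or model configuration.

```python
# Six hypothetical tweets: [follower_count, hour_posted] -> retweet count.
X = [[100, 9], [5000, 12], [300, 23], [20000, 18], [50, 3], [800, 15]]
y = [2.0, 30.0, 4.0, 120.0, 1.0, 10.0]

def fit_stump(X, residuals):
    """Pick the (feature, threshold) split minimising squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[f] <= t]
            right = [r for x, r in zip(X, residuals) if x[f] > t]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    _, f, t, lm, rm = best
    return lambda x: lm if x[f] <= t else rm

# Boosting: each stump fits the residuals of the current ensemble.
base = sum(y) / len(y)
shrinkage = 0.3
pred = [base] * len(X)
stumps = []
for _ in range(50):
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    stump = fit_stump(X, residuals)
    stumps.append(stump)
    pred = [pi + shrinkage * stump(x) for pi, x in zip(pred, X)]

def score(x):
    """Predicted engagement for one tweet's feature vector."""
    return base + shrinkage * sum(s(x) for s in stumps)

# Virality ranking: sort tweets by predicted engagement, highest first.
ranking = sorted(range(len(X)), key=lambda i: -score(X[i]))
```

Because each stump is a single human-readable split, the ensemble's decisions can be inspected split by split, which is one source of the explainability the abstract claims for tree-based rankers.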
An Interpretable Machine Vision Approach to Human Activity Recognition using Photoplethysmograph Sensor Data
The current gold standard for human activity recognition (HAR) is based on the use of cameras. However, the poor scalability of camera systems renders them impractical for the goal of wider adoption of HAR in mobile computing contexts. Consequently, researchers instead rely on wearable sensors, in particular inertial sensors. A particularly prevalent wearable is the smartwatch, which, thanks to its integrated inertial and optical sensing capabilities, holds great potential for realising better HAR in a non-obtrusive way. This paper seeks to simplify the wearable approach to HAR by determining whether the wrist-mounted optical sensor typically found in a smartwatch or similar device can, on its own, serve as a useful source of data for activity recognition. This approach has the potential to eliminate the need for the inertial sensing element, which would in turn reduce the cost and complexity of smartwatches and fitness trackers. It could commoditise the hardware requirements for HAR while retaining both heart rate monitoring and activity capture, all from a single optical sensor. Our approach relies on the adoption of machine vision for activity recognition based on suitably scaled plots of the optical signals. We take this approach to produce classifications that are easily explainable and interpretable by non-technical users. More specifically, images of photoplethysmography signal time series are used to retrain the penultimate layer of a convolutional neural network initially trained on the ImageNet database. We then use the 2048-dimensional features from the penultimate layer as input to a support vector machine. The experiment yielded an average classification accuracy of 92.3%. This result outperforms that of an optical and inertial sensor combined (78%) and illustrates the capability of HAR systems using...
Comment: 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science
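The first step of the pipeline above, turning a 1-D photoplethysmography (PPG) signal into a plot image that a pretrained CNN can consume, can be sketched as follows. The synthetic sine-based signal and the 32x32 resolution are illustrative assumptions, not the paper's actual preprocessing.

```python
import math

WIDTH, HEIGHT = 32, 32  # illustrative image resolution

# Hypothetical PPG-like signal: a clean periodic pulse waveform.
signal = [math.sin(2 * math.pi * 1.2 * t / WIDTH) for t in range(WIDTH)]

def signal_to_image(signal, width=WIDTH, height=HEIGHT):
    """Scale the signal to the image height and mark one pixel per column,
    producing a binary 'plot' of the waveform (top row = maximum value)."""
    lo, hi = min(signal), max(signal)
    img = [[0] * width for _ in range(height)]
    for x, v in enumerate(signal[:width]):
        row = round((hi - v) / (hi - lo) * (height - 1))
        img[row][x] = 1
    return img

img = signal_to_image(signal)
# Each column of img now carries exactly one "ink" pixel of the waveform.
```

In the paper's full pipeline, such plot images would then be fed through an ImageNet-pretrained CNN, and the 2048-dimensional penultimate-layer activations would train the downstream support vector machine.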