A Short Review of Ethical Challenges in Clinical Natural Language Processing
Clinical NLP has an immense potential in contributing to how clinical
practice will be revolutionized by the advent of large scale processing of
clinical records. However, this potential has remained largely untapped due to
slow progress primarily caused by strict data access policies for researchers.
In this paper, we discuss the concern for privacy and the measures it entails.
We also suggest sources of less sensitive data. Finally, we draw attention to
biases that can compromise the validity of empirical research and lead to
socially harmful applications.
Comment: First Workshop on Ethics in Natural Language Processing (EACL'17)
NASA aviation safety reporting system
The origins and development of the NASA Aviation Safety Reporting System (ASRS) are briefly reviewed. The results of the first quarter's activity are summarized and discussed. Examples are given of bulletins describing potential air safety hazards, and of the disposition of these bulletins. During the first quarter of operation, the ASRS received 1464 reports; 1407 provided data relevant to air safety. All reports are being processed for entry into the ASRS database. During the reporting period, 130 alert bulletins describing possible problems in the aviation system were generated and disseminated. Responses were received from the FAA and others regarding 108 of the alert bulletins. Action was being taken with respect to 70 of the 108 responses received. Further studies are planned in a number of areas, including human factors problems related to automation of the ground and airborne portions of the national aviation system.
Easy over Hard: A Case Study on Deep Learning
While deep learning is an exciting new technique, the benefits of this method
need to be assessed with respect to its computational cost. This is
particularly important for deep learning since these learners need hours (to
weeks) to train the model. Such long training time limits the ability of (a)~a
researcher to test the stability of their conclusion via repeated runs with
different random seeds; and (b)~other researchers to repeat, improve, or even
refute that original work.
For example, recently, deep learning was used to find which questions in the
Stack Overflow programmer discussion forum can be linked together. That deep
learning system took 14 hours to execute. We show here that a very simple
optimizer called DE, used to fine-tune an SVM, can achieve similar (and
sometimes better) results. The DE approach terminated in 10 minutes, i.e. 84
times faster than the deep learning method.
We offer these results as a cautionary tale to the software analytics
community and suggest that not every new innovation should be applied without
critical analysis. If researchers deploy some new and expensive process, that
work should be baselined against some simpler and faster alternatives.
Comment: 12 pages, 6 figures, accepted at FSE201
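The DE optimizer mentioned in the abstract is differential evolution: a population of candidate parameter vectors is improved by mutation (scaled differences between population members), crossover, and greedy selection. A minimal, dependency-free sketch follows; the two-parameter `surrogate_error` objective is a hypothetical stand-in for the SVM validation error the paper tunes, not the authors' actual setup.

```python
# Minimal sketch of differential evolution (DE). The objective here is a
# hypothetical surrogate for SVM validation error, used so the example
# runs without any ML dependencies.
import random

def differential_evolution(objective, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=50, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise the population uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([x for j, x in enumerate(pop) if j != i], 3)
            # Mutation: perturb a with the scaled difference of b and c,
            # then clip back into the search bounds.
            mutant = [min(max(a[d] + F * (b[d] - c[d]), bounds[d][0]),
                          bounds[d][1]) for d in range(dim)]
            # Crossover: mix mutant and current member coordinate-wise.
            trial = [mutant[d] if rng.random() < CR else pop[i][d]
                     for d in range(dim)]
            s = objective(trial)
            if s < scores[i]:  # Greedy selection: keep the better vector.
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

# Hypothetical surrogate for SVM validation error, minimised near (1, -2).
def surrogate_error(params):
    c, g = params
    return (c - 1.0) ** 2 + (g + 2.0) ** 2

best_params, best_err = differential_evolution(surrogate_error,
                                               bounds=[(-5, 5), (-5, 5)])
print(best_params, best_err)
```

In the paper's setting, the candidate vector would hold SVM hyperparameters (e.g. C and a kernel width) and the objective would be the model's error on a held-out split; the cheapness of each DE generation relative to a deep-network training run is what produces the reported speedup.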
Review and synthesis of problems and directions for large scale geographic information system development
Problems and directions for large scale geographic information system development were reviewed, and the general problems associated with automated geographic information systems and spatial data handling were addressed.
Adversarial Removal of Demographic Attributes from Text Data
Recent advances in Representation Learning and Adversarial Training seem to
succeed in removing unwanted features from the learned representation. We show
that demographic information of authors is encoded in -- and can be recovered
from -- the intermediate representations learned by text-based neural
classifiers. The implication is that decisions of classifiers trained on
textual data are not agnostic to -- and likely condition on -- demographic
attributes. When attempting to remove such demographic information using
adversarial training, we find that while the adversarial component achieves
chance-level development-set accuracy during training, a post-hoc classifier,
trained on the encoded sentences from the first part, still manages to reach
substantially higher classification accuracies on the same data. This behavior
is consistent across several tasks, demographic properties and datasets. We
explore several techniques to improve the effectiveness of the adversarial
component. Our main conclusion is a cautionary one: do not rely on
adversarial training to achieve representations invariant to sensitive
features - …