34,882 research outputs found
Learning Credible Models
In many settings, it is important that a model be capable of providing
reasons for its predictions (i.e., the model must be interpretable). However,
the model's reasoning may not conform with well-established knowledge. In such
cases, while interpretable, the model lacks \textit{credibility}. In this work,
we formally define credibility in the linear setting and focus on techniques
for learning models that are both accurate and credible. In particular, we
propose a regularization penalty, expert yielded estimates (EYE), that
incorporates expert knowledge about well-known relationships among covariates
and the outcome of interest. We give both theoretical and empirical results
comparing our proposed method to several other regularization techniques.
Across a range of settings, experiments on both synthetic and real data show
that models learned using the EYE penalty are significantly more credible than
those learned using other penalties. Applied to a large-scale patient risk
stratification task, our proposed technique results in a model whose top
features overlap significantly with known clinical risk factors, while still
achieving good predictive performance
Econometrics meets sentiment : an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software
Computerized Analysis of Magnetic Resonance Images to Study Cerebral Anatomy in Developing Neonates
The study of cerebral anatomy in developing neonates is of great importance for
the understanding of brain development during the early period of life. This
dissertation therefore focuses on three challenges in the modelling of cerebral
anatomy in neonates during brain development. The methods that have been
developed all use Magnetic Resonance Images (MRI) as source data.
To facilitate study of vascular development in the neonatal period, a set of image
analysis algorithms are developed to automatically extract and model cerebral
vessel trees. The whole process consists of cerebral vessel tracking from
automatically placed seed points, vessel tree generation, and vasculature
registration and matching. These algorithms have been tested on clinical Time-of-
Flight (TOF) MR angiographic datasets.
To facilitate study of the neonatal cortex a complete cerebral cortex segmentation
and reconstruction pipeline has been developed. Segmentation of the neonatal
cortex is not effectively done by existing algorithms designed for the adult brain
because the contrast between grey and white matter is reversed. This causes pixels
containing tissue mixtures to be incorrectly labelled by conventional methods. The
neonatal cortical segmentation method that has been developed is based on a novel
expectation-maximization (EM) method with explicit correction for mislabelled
partial volume voxels. Based on the resulting cortical segmentation, an implicit
surface evolution technique is adopted for the reconstruction of the cortex in
neonates. The performance of the method is investigated by performing a detailed
landmark study.
To facilitate study of cortical development, a cortical surface registration algorithm
for aligning the cortical surface is developed. The method first inflates extracted
cortical surfaces and then performs a non-rigid surface registration using free-form
deformations (FFDs) to remove residual alignment. Validation experiments using
data labelled by an expert observer demonstrate that the method can capture local
changes and follow the growth of specific sulcus
A recurrent neural network for classification of unevenly sampled variable stars
Astronomical surveys of celestial sources produce streams of noisy time
series measuring flux versus time ("light curves"). Unlike in many other
physical domains, however, large (and source-specific) temporal gaps in data
arise naturally due to intranight cadence choices as well as diurnal and
seasonal constraints. With nightly observations of millions of variable stars
and transients from upcoming surveys, efficient and accurate discovery and
classification techniques on noisy, irregularly sampled data must be employed
with minimal human-in-the-loop involvement. Machine learning for inference
tasks on such data traditionally requires the laborious hand-coding of
domain-specific numerical summaries of raw data ("features"). Here we present a
novel unsupervised autoencoding recurrent neural network (RNN) that makes
explicit use of sampling times and known heteroskedastic noise properties. When
trained on optical variable star catalogs, this network produces supervised
classification models that rival other best-in-class approaches. We find that
autoencoded features learned on one time-domain survey perform nearly as well
when applied to another survey. These networks can continue to learn from new
unlabeled observations and may be used in other unsupervised tasks such as
forecasting and anomaly detection.Comment: 23 pages, 14 figures. The published version is at Nature Astronomy
(https://www.nature.com/articles/s41550-017-0321-z). Source code for models,
experiments, and figures at
https://github.com/bnaul/IrregularTimeSeriesAutoencoderPaper (Zenodo Code
DOI: 10.5281/zenodo.1045560
Weakly-supervised appraisal analysis
This article is concerned with the computational treatment of Appraisal, a Systemic Functional Linguistic theory of the types of language employed to communicate opinion in English. The theory considers aspects such as Attitude (how writers communicate their point of view), Engagement (how writers align themselves with respect to the opinions of others) and Graduation (how writers amplify or diminish their attitudes and engagements). To analyse text according to the theory we employ a weakly-supervised approach to text classification, which involves comparing the similarity of words with prototypical examples of classes. We evaluate the method's performance using a collection of book reviews annotated according to the Appraisal theory
Extracting Implicit Social Relation for Social Recommendation Techniques in User Rating Prediction
Recommendation plays an increasingly important role in our daily lives.
Recommender systems automatically suggest items to users that might be
interesting for them. Recent studies illustrate that incorporating social trust
in Matrix Factorization methods demonstrably improves accuracy of rating
prediction. Such approaches mainly use the trust scores explicitly expressed by
users. However, it is often challenging to have users provide explicit trust
scores of each other. There exist quite a few works, which propose Trust
Metrics to compute and predict trust scores between users based on their
interactions. In this paper, first we present how social relation can be
extracted from users' ratings to items by describing Hellinger distance between
users in recommender systems. Then, we propose to incorporate the predicted
trust scores into social matrix factorization models. By analyzing social
relation extraction from three well-known real-world datasets, which both:
trust and recommendation data available, we conclude that using the implicit
social relation in social recommendation techniques has almost the same
performance compared to the actual trust scores explicitly expressed by users.
Hence, we build our method, called Hell-TrustSVD, on top of the
state-of-the-art social recommendation technique to incorporate both the
extracted implicit social relations and ratings given by users on the
prediction of items for an active user. To the best of our knowledge, this is
the first work to extend TrustSVD with extracted social trust information. The
experimental results support the idea of employing implicit trust into matrix
factorization whenever explicit trust is not available, can perform much better
than the state-of-the-art approaches in user rating prediction
- ā¦