1,085 research outputs found
Low- and high-resource opinion summarization
Customer reviews play a vital role in the online purchasing decisions we make. The reviews
express user opinions that are useful for setting realistic expectations and uncovering important
details about products. However, some products receive hundreds or even thousands of
reviews, making them time-consuming to read. Moreover, many reviews contain uninformative
content, such as irrelevant personal experiences. Automatic summarization offers an
alternative – short text summaries capturing the essential information expressed in reviews.
Automatically produced summaries can reflect overall or particular opinions and be tailored to
user preferences. Besides being presented on major e-commerce platforms, home assistants
can also vocalize them. This approach can improve user satisfaction by assisting in making
faster and better decisions.
Modern summarization approaches are based on neural networks, often requiring thousands of
annotated samples for training. However, human-written summaries for products are expensive
to produce because annotators need to read many reviews. This has led to annotated data
scarcity where only a few datasets are available. Data scarcity is the central theme of our
works, and we propose a number of approaches to alleviate the problem. The thesis consists
of two parts where we discuss low- and high-resource data settings.
In the first part, we propose self-supervised learning methods applied to customer reviews
and few-shot methods for learning from small annotated datasets. Customer reviews without
summaries are available in large quantities, contain a breadth of in-domain specifics, and
provide a powerful training signal. We show that reviews can be used for learning summarizers
via a self-supervised objective. Further, we address two main challenges associated with
learning from small annotated datasets. First, large models rapidly overfit on small datasets
leading to poor generalization. Second, it is not possible to learn a wide range of in-domain
specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to
subtle semantic mistakes in generated summaries, such as ‘great dead on arrival battery.’ We
address the first challenge by explicitly modeling summary properties (e.g., content coverage
and sentiment alignment). Furthermore, we leverage small modules – adapters – that are
more robust to overfitting. As we show, despite their size, these modules can be used to
store in-domain knowledge to reduce semantic mistakes. Lastly, we propose a simple method
for learning personalized summarizers based on aspects, such as ‘price,’ ‘battery life,’ and
‘resolution.’ This task is harder to learn, and we present a few-shot method for training a
query-based summarizer on small annotated datasets.
In the second part, we focus on the high-resource setting and present a large dataset with
summaries collected from various online resources. The dataset has more than 33,000 humanwritten
summaries, where each is linked up to thousands of reviews. This, however, makes it
challenging to apply an ‘expensive’ deep encoder due to memory and computational costs. To
address this problem, we propose selecting small subsets of informative reviews. Only these
subsets are encoded by the deep encoder and subsequently summarized. We show that the
selector and summarizer can be trained end-to-end via amortized inference and policy gradient
methods
On the Utility of Representation Learning Algorithms for Myoelectric Interfacing
Electrical activity produced by muscles during voluntary movement is a reflection of the firing patterns of relevant motor neurons and, by extension, the latent motor intent driving the movement. Once transduced via electromyography (EMG) and converted into digital form, this activity can be processed to provide an estimate of the original motor intent and is as such a feasible basis for non-invasive efferent neural interfacing. EMG-based motor intent decoding has so far received the most attention in the field of upper-limb prosthetics, where alternative means of interfacing are scarce and the utility of better control apparent. Whereas myoelectric prostheses have been available since the 1960s, available EMG control interfaces still lag behind the mechanical capabilities of the artificial limbs they are intended to steer—a gap at least partially due to limitations in current methods for translating EMG into appropriate motion commands. As the relationship between EMG signals and concurrent effector kinematics is highly non-linear and apparently stochastic, finding ways to accurately extract and combine relevant information from across electrode sites is still an active area of inquiry.This dissertation comprises an introduction and eight papers that explore issues afflicting the status quo of myoelectric decoding and possible solutions, all related through their use of learning algorithms and deep Artificial Neural Network (ANN) models. Paper I presents a Convolutional Neural Network (CNN) for multi-label movement decoding of high-density surface EMG (HD-sEMG) signals. Inspired by the successful use of CNNs in Paper I and the work of others, Paper II presents a method for automatic design of CNN architectures for use in myocontrol. Paper III introduces an ANN architecture with an appertaining training framework from which simultaneous and proportional control emerges. Paper Iv introduce a dataset of HD-sEMG signals for use with learning algorithms. Paper v applies a Recurrent Neural Network (RNN) model to decode finger forces from intramuscular EMG. Paper vI introduces a Transformer model for myoelectric interfacing that do not need additional training data to function with previously unseen users. Paper vII compares the performance of a Long Short-Term Memory (LSTM) network to that of classical pattern recognition algorithms. Lastly, paper vIII describes a framework for synthesizing EMG from multi-articulate gestures intended to reduce training burden
Continuous Estimation of Smoking Lapse Risk from Noisy Wrist Sensor Data Using Sparse and Positive-Only Labels
Estimating the imminent risk of adverse health behaviors provides opportunities for developing effective behavioral intervention mechanisms to prevent the occurrence of the target behavior. One of the key goals is to find opportune moments for intervention by passively detecting the rising risk of an imminent adverse behavior. Significant progress in mobile health research and the ability to continuously sense internal and external states of individual health and behavior has paved the way for detecting diverse risk factors from mobile sensor data. The next frontier in this research is to account for the combined effects of these risk factors to produce a composite risk score of adverse behaviors using wearable sensors convenient for daily use. Developing a machine learning-based model for assessing the risk of smoking lapse in the natural environment faces significant outstanding challenges requiring the development of novel and unique methodologies for each of them. The first challenge is coming up with an accurate representation of noisy and incomplete sensor data to encode the present and historical influence of behavioral cues, mental states, and the interactions of individuals with their ever-changing environment. The next noteworthy challenge is the absence of confirmed negative labels of low-risk states and adequate precise annotations of high-risk states. Finally, the model should work on convenient wearable devices to facilitate widespread adoption in research and practice. In this dissertation, we develop methods that account for the multi-faceted nature of smoking lapse behavior to train and evaluate a machine learning model capable of estimating composite risk scores in the natural environment. We first develop mRisk, which combines the effects of various mHealth biomarkers such as stress, physical activity, and location history in producing the risk of smoking lapse using sequential deep neural networks. We propose an event-based encoding of sensor data to reduce the effect of noises and then present an approach to efficiently model the historical influence of recent and past sensor-derived contexts on the likelihood of smoking lapse. To circumvent the lack of confirmed negative labels (i.e., annotated low-risk moments) and only a few positive labels (i.e., sensor-based detection of smoking lapse corroborated by self-reports), we propose a new loss function to accurately optimize the models. We build the mRisk models using biomarker (stress, physical activity) streams derived from chest-worn sensors. Adapting the models to work with less invasive and more convenient wrist-based sensors requires adapting the biomarker detection models to work with wrist-worn sensor data. To that end, we develop robust stress and activity inference methodologies from noisy wrist-sensor data. We first propose CQP, which quantifies wrist-sensor collected PPG data quality. Next, we show that integrating CQP within the inference pipeline improves accuracy-yield trade-offs associated with stress detection from wrist-worn PPG sensors in the natural environment. mRisk also requires sensor-based precise detection of smoking events and confirmation through self-reports to extract positive labels. Hence, we develop rSmoke, an orientation-invariant smoking detection model that is robust to the variations in sensor data resulting from orientation switches in the field. We train the proposed mRisk risk estimation models using the wrist-based inferences of lapse risk factors. To evaluate the utility of the risk models, we simulate the delivery of intelligent smoking interventions to at-risk participants as informed by the composite risk scores. Our results demonstrate the envisaged impact of machine learning-based models operating on wrist-worn wearable sensor data to output continuous smoking lapse risk scores. The novel methodologies we propose throughout this dissertation help instigate a new frontier in smoking research that can potentially improve the smoking abstinence rate in participants willing to quit
Data-proximal complementary -TV reconstruction for limited data CT
In a number of tomographic applications, data cannot be fully acquired,
resulting in a severely underdetermined image reconstruction. In such cases,
conventional methods lead to reconstructions with significant artifacts. To
overcome these artifacts, regularization methods are applied that incorporate
additional information. An important example is TV reconstruction, which is
known to be efficient at compensating for missing data and reducing
reconstruction artifacts. At the same time, however, tomographic data is also
contaminated by noise, which poses an additional challenge. The use of a single
regularizer must therefore account for both the missing data and the noise.
However, a particular regularizer may not be ideal for both tasks. For example,
the TV regularizer is a poor choice for noise reduction across multiple scales,
in which case curvelet regularization methods are well suited. To
address this issue, in this paper we introduce a novel variational
regularization framework that combines the advantages of different
regularizers. The basic idea of our framework is to perform reconstruction in
two stages, where the first stage mainly aims at accurate reconstruction in the
presence of noise, and the second stage aims at artifact reduction. Both
reconstruction stages are connected by a data proximity condition. The proposed
method is implemented and tested for limited-view CT using a combined
curvelet-TV approach. We define and implement a curvelet transform adapted to
the limited-view problem and illustrate the advantages of our approach in
numerical experiments
Beam scanning by liquid-crystal biasing in a modified SIW structure
A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW) modified to work as a Groove Gap Waveguide, with radiating slots etched on the upper broad wall, that radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, enabling the possibility to lay several antennas in parallel and achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium
Learning Weakly Convex Regularizers for Convergent Image-Reconstruction Algorithms
We propose to learn non-convex regularizers with a prescribed upper bound on
their weak-convexity modulus. Such regularizers give rise to variational
denoisers that minimize a convex energy. They rely on few parameters (less than
15,000) and offer a signal-processing interpretation as they mimic handcrafted
sparsity-promoting regularizers. Through numerical experiments, we show that
such denoisers outperform convex-regularization methods as well as the popular
BM3D denoiser. Additionally, the learned regularizer can be deployed to solve
inverse problems with iterative schemes that provably converge. For both CT and
MRI reconstruction, the regularizer generalizes well and offers an excellent
tradeoff between performance, number of parameters, guarantees, and
interpretability when compared to other data-driven approaches
Curvature corrected tangent space-based approximation of manifold-valued data
When generalizing schemes for real-valued data approximation or decomposition
to data living in Riemannian manifolds, tangent space-based schemes are very
attractive for the simple reason that these spaces are linear. An open
challenge is to do this in such a way that the generalized scheme is applicable
to general Riemannian manifolds, is global-geometry aware and is
computationally feasible. Existing schemes have been unable to account for all
three of these key factors at the same time.
In this work, we take a systematic approach to developing a framework that is
able to account for all three factors. First, we will restrict ourselves to the
-- still general -- class of symmetric Riemannian manifolds and show how
curvature affects general manifold-valued tensor approximation schemes. Next,
we show how the latter observations can be used in a general strategy for
developing approximation schemes that are also global-geometry aware. Finally,
having general applicability and global-geometry awareness taken into account
we restrict ourselves once more in a case study on low-rank approximation. Here
we show how computational feasibility can be achieved and propose the
curvature-corrected truncated higher-order singular value decomposition
(CC-tHOSVD), whose performance is subsequently tested in numerical experiments
with both synthetic and real data living in symmetric Riemannian manifolds with
both positive and negative curvature
Geometric Data Analysis: Advancements of the Statistical Methodology and Applications
Data analysis has become fundamental to our society and comes in multiple facets and approaches. Nevertheless, in research and applications, the focus was primarily on data from Euclidean vector spaces. Consequently, the majority of methods that are applied today are not suited for more general data types. Driven by needs from fields like image processing, (medical) shape analysis, and network analysis, more and more attention has recently been given to data from non-Euclidean spaces–particularly (curved) manifolds. It has led to the field of geometric data analysis whose methods explicitly take the structure (for example, the topology and geometry) of the underlying space into account.
This thesis contributes to the methodology of geometric data analysis by generalizing several fundamental notions from multivariate statistics to manifolds. We thereby focus on two different viewpoints.
First, we use Riemannian structures to derive a novel regression scheme for general manifolds that relies on splines of generalized Bézier curves. It can accurately model non-geodesic relationships, for example, time-dependent trends with saturation effects or cyclic trends. Since Bézier curves can be evaluated with the constructive de Casteljau algorithm, working with data from manifolds of high dimensions (for example, a hundred thousand or more) is feasible. Relying on the regression, we further develop
a hierarchical statistical model for an adequate analysis of longitudinal data in manifolds, and a method to control for confounding variables.
We secondly focus on data that is not only manifold- but even Lie group-valued, which is frequently the case in applications. We can only achieve this by endowing the group with an affine connection structure that is generally not Riemannian. Utilizing it, we derive generalizations of several well-known dissimilarity measures between data distributions that can be used for various tasks, including hypothesis testing. Invariance under data translations is proven, and a connection to continuous distributions is given for one measure.
A further central contribution of this thesis is that it shows use cases for all notions in real-world applications, particularly in problems from shape analysis in medical imaging and archaeology. We can replicate or further quantify several known findings for shape changes of the femur and the right hippocampus under osteoarthritis and Alzheimer's, respectively. Furthermore, in an archaeological application, we obtain new insights into the construction principles of ancient sundials. Last but not least, we use the geometric structure underlying human brain connectomes to predict cognitive scores. Utilizing a sample selection procedure, we obtain state-of-the-art results
What's in a Prior? Learned Proximal Networks for Inverse Problems
Proximal operators are ubiquitous in inverse problems, commonly appearing as
part of algorithmic strategies to regularize problems that are otherwise
ill-posed. Modern deep learning models have been brought to bear for these
tasks too, as in the framework of plug-and-play or deep unrolling, where they
loosely resemble proximal operators. Yet, something essential is lost in
employing these purely data-driven approaches: there is no guarantee that a
general deep network represents the proximal operator of any function, nor is
there any characterization of the function for which the network might provide
some approximate proximal. This not only makes guaranteeing convergence of
iterative schemes challenging but, more fundamentally, complicates the analysis
of what has been learned by these networks about their training data. Herein we
provide a framework to develop learned proximal networks (LPN), prove that they
provide exact proximal operators for a data-driven nonconvex regularizer, and
show how a new training strategy, dubbed proximal matching, provably promotes
the recovery of the log-prior of the true data distribution. Such LPN provide
general, unsupervised, expressive proximal operators that can be used for
general inverse problems with convergence guarantees. We illustrate our results
in a series of cases of increasing complexity, demonstrating that these models
not only result in state-of-the-art performance, but provide a window into the
resulting priors learned from data
- …