Deep Learning With Sentiment Inference For Discourse-Oriented Opinion Analysis
Opinions are omnipresent in written and spoken text ranging from editorials, reviews, blogs, guides, and informal conversations to written and broadcast news. However, past research in NLP has mainly addressed explicit opinion expressions, ignoring implicit opinions. As a result, research in opinion analysis has plateaued at a somewhat superficial level, providing methods that only recognize what is explicitly said and do not understand what is implied.
In this dissertation, we develop machine learning models for two tasks that presumably support propagation of sentiment in discourse, beyond one sentence. The first task we address is opinion role labeling, i.e., the task of detecting who expressed a given attitude toward what or whom. The second task is abstract anaphora resolution, i.e., the task of finding a (typically) non-nominal antecedent of pronouns and noun phrases that refer to abstract objects like facts, events, actions, or situations in the preceding discourse.
We propose a neural model for labeling of opinion holders and targets and circumvent the problems that arise from the limited labeled data. In particular, we extend the baseline model with different multi-task learning frameworks. We obtain clear performance improvements using semantic role labeling as the auxiliary task. We conduct a thorough analysis to demonstrate how multi-task learning helps, what has been solved for the task, and what is next. We show that future developments should improve the ability of the models to capture long-range dependencies and consider other auxiliary tasks such as dependency parsing or recognizing textual entailment. We emphasize that future improvements can be measured more reliably if opinion expressions with missing roles are curated and if the evaluation considers all mentions in opinion role coreference chains as well as discontinuous roles.
To the best of our knowledge, we propose the first abstract anaphora resolution model that handles the unrestricted phenomenon in a realistic setting.
We cast abstract anaphora resolution as the task of learning attributes of the relation that holds between the sentence with the abstract anaphor and its antecedent. We propose a Mention-Ranking siamese-LSTM model (MR-LSTM) for learning what characterizes the mentioned relation in a data-driven fashion. The current resources for abstract anaphora resolution are quite limited. However, we can train our models without conventional data for abstract anaphora resolution. In particular, we can train our models on many instances of antecedent-anaphoric sentence pairs. Such pairs can be automatically extracted from parsed corpora by searching for a common construction which consists of a verb with an embedded sentence (complement or adverbial), applying a simple transformation that replaces the embedded sentence with an abstract anaphor, and using the cut-off embedded sentence as the antecedent. We refer to the extracted data as silver data.
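The silver-data extraction described above can be illustrated with a minimal sketch. It assumes the sentence is already tokenized and that a parser has supplied the token span of the embedded sentence; the function name `make_silver_pair`, the default anaphor `this`, and the choice to strip a leading complementizer from the antecedent are hypothetical and are not taken from the dissertation.

```python
from typing import List, Tuple

# Hypothetical list of complementizers to strip from the cut-off clause.
COMPLEMENTIZERS = {"that", "whether", "if"}


def make_silver_pair(
    tokens: List[str],
    clause_span: Tuple[int, int],
    anaphor: str = "this",
) -> Tuple[str, str]:
    """Turn one parsed sentence into an (anaphoric sentence, antecedent) pair.

    tokens      -- the tokenized sentence
    clause_span -- (start, end) token indices of the embedded sentence,
                   end exclusive; assumed to come from a syntactic parser
    anaphor     -- the abstract anaphor that replaces the embedded sentence
    """
    start, end = clause_span
    if not (0 <= start < end <= len(tokens)):
        raise ValueError("clause_span must be a non-empty range inside the sentence")

    # The cut-off embedded sentence becomes the antecedent.
    antecedent = tokens[start:end]
    if len(antecedent) > 1 and antecedent[0].lower() in COMPLEMENTIZERS:
        antecedent = antecedent[1:]

    # The embedded sentence is replaced by the abstract anaphor.
    anaphoric_sentence = tokens[:start] + [anaphor] + tokens[end:]
    return " ".join(anaphoric_sentence), " ".join(antecedent)


example_tokens = "He doubts that the results will hold up .".split()
silver_sentence, silver_antecedent = make_silver_pair(example_tokens, (2, 8))
```

Here the verb "doubts" takes the embedded sentence "that the results will hold up"; replacing it with the anaphor yields the anaphoric sentence, and the clause itself is kept as the antecedent.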
We evaluate our MR-LSTM models in a realistic task setup in which models need to rank embedded sentences and verb phrases from the sentence with the anaphor as well as a few preceding sentences. We report the first benchmark results on an abstract anaphora subset of the ARRAU corpus \citep{uryupina_et_al_2016}, which presents a greater challenge due to a mixture of nominal and pronominal anaphors as well as a greater range of confounders. We also use two additional evaluation datasets: a subset of the CoNLL-12 shared task dataset \citep{pradhan_et_al_2012} and a subset of the ASN corpus \citep{kolhatkar_et_al_2013_crowdsourcing}. We show that our MR-LSTM models outperform the baselines on all evaluation datasets, except for events in the CoNLL-12 dataset. We conclude that training on the small-scale gold data works well if we encounter the same type of anaphors at evaluation time. However, the gold training data contains only six shell nouns and events, and thus resolution of anaphors in the ARRAU corpus, which covers a variety of anaphor types, benefits from the silver data. Our MR-LSTM models for resolution of abstract anaphors outperform prior work on shell noun resolution \citep{kolhatkar_et_al_2013} in its restricted task setup. Finally, we try to get the best out of the gold and silver training data by mixing them. Moreover, we speculate that we could improve training on a mixture if we: (i) handle artifacts in the silver data with adversarial training and (ii) use multi-task learning to enable our models to make ranking decisions dependent on the type of anaphor. These proposals give us mixed results, and hence a robust mixed training strategy remains a challenge.
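The ranking setup can be sketched as follows: every candidate antecedent is scored against the sentence containing the anaphor, and the candidates are sorted by score. This is a minimal illustration only. The token-overlap scorer is a hypothetical stand-in for a learned model such as MR-LSTM, and success@k is a common way to evaluate rankings that is assumed here, not necessarily the metric used in the dissertation.

```python
from typing import Callable, List, Sequence


def overlap_score(anaphoric_sentence: str, candidate: str) -> float:
    """Hypothetical stand-in scorer: Jaccard overlap of lowercased tokens."""
    sent = set(anaphoric_sentence.lower().split())
    cand = set(candidate.lower().split())
    union = sent | cand
    return len(sent & cand) / len(union) if union else 0.0


def rank_candidates(
    anaphoric_sentence: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
) -> List[str]:
    """Return the candidates sorted from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda cand: score(anaphoric_sentence, cand),
        reverse=True,
    )


def success_at_k(ranked: Sequence[str], gold: str, k: int) -> bool:
    """True if the gold antecedent is among the top-k ranked candidates."""
    return gold in ranked[:k]


sentence = "She doubts this because the data were noisy ."
candidates = ["the results will hold up", "the data were noisy", "went home"]
ranked = rank_candidates(sentence, candidates, overlap_score)
```

In this toy example the surface-overlap scorer prefers a clause from the anaphoric sentence itself over the intended antecedent "the results will hold up", which illustrates why confounders make the realistic setup harder than ranking a few hand-picked candidates.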
Federal Register
Daily publication of the U.S. Office of the Federal Register contains rules and regulations, proposed legislation and rule changes, and other notices, including "Presidential proclamations and Executive Orders, Federal agency documents having general applicability and legal effect, documents required to be published by act of Congress, and other Federal agency documents of public interest" (p. ii). Table of Contents starts on page iii.
Proteomics and protein activity profiling: an investigation into the salivary proteome and kinase activities in various systems using mass spectrometry
Protein identification and quantitation using mass spectrometry has evolved as the dominant technique for studying the protein complement of a system: cell, tissue or organism. The proteomics of body fluids is a very active research area as there is great potential for protein biomarker discovery; application of such technologies would revolutionise medical practice and treatment. Saliva, through its non-intrusive nature of sampling, is an ideal body fluid for disease diagnosis, screening and monitoring. Gingivitis is a gum disease with symptoms including bleeding, swollen, and receding gums. After dental decay, gingivitis is estimated to be the most common disease worldwide, and around 40% of the population in the US are reported to have gingivitis. The end goal of this project was to identify salivary biomarkers for gingivitis.
This dissertation presents an investigation of: 1) the salivary proteome; 2) developments and applications of a mass spectrometry kinase assay; and 3) salivary biomarkers for gingivitis using proteomics and kinase activities.
The soluble portion of the human salivary proteome (saliva supernatant) has been studied by several research groups, but very few proteomic studies have been performed on the insoluble, cellular and bacterial portion of saliva. Presented here is the first global proteomics study performed on the saliva residue and supernatant from the same test subject. A total of 834 and 1426 proteins were identified in the saliva supernatant and residue, respectively. A global analysis of protein complexes in saliva was also performed and is the first such analysis to date. KAYAK ('Kinase ActivitY Assay for Kinome analysis') was further developed for its application on a number of cell types, tissue types, and a variety of organisms. Proof of concept work for in-gel kinase activity/kinase abundance correlation profiling using blue native gels was performed, and experiments using anion exchange chromatographic kinase activity/kinase abundance correlation profiling were performed to identify kinase-substrate pairs. KAYAK applications included the analysis of kinase activities in Saccharomyces cerevisiae, Drosophila, mouse, and human saliva, in which significant kinase activity was detected in the saliva supernatant, a novel finding. Finally, gingivitis was induced in patients, and the saliva samples were analysed using proteomics and kinase activity profiling. Although this work is ongoing, preliminary data indicate that there are increases in various inflammatory proteins, certain bacteria and also in the activity of particular kinases as a result of the induction of gingivitis.
The overall study provided insights into the salivary proteome for both the human and bacterial complement, and revealed the presence of significant kinase activity in saliva. In the induced gingivitis study, almost half of all the proteins identified in the residue were from bacteria (1274 bacterial proteins, 198 species identified), and there may be more potential for biomarker discovery for certain diseases in the saliva residue than in the supernatant. A very large overlap was observed between the human proteins in the saliva supernatant and residue, indicating that many of the salivary proteins originate from lysed cells. The origin of the kinase activity in the saliva supernatant is not known but is also proposed to originate predominantly from lysed cells. A range of novel KAYAK applications have been investigated, demonstrating that KAYAK has a wide variety of future uses ranging from target compound evaluation in pharmaceutical companies to patient testing in the clinic.
Cyber-Physical Systems of Systems: Foundations – A Conceptual Model and Some Derivations: The AMADEOS Legacy
Computer Systems Organization and Communication Networks; Software Engineering; Complex Systems; Information Systems Applications (incl. Internet); Computer Application