31 research outputs found
DCU's experiments in NTCIR-8 IR4QA task
We describe DCU's participation in the NTCIR-8 IR4QA
task [16]. This task is a cross-language information retrieval(CLIR) task from English to Simplified Chinese which seeks to provide relevant documents for later cross language question answering (CLQA) tasks. For the IR4QA task, we submitted 5 official runs including two monolingual runs and three CLIR runs. For the monolingual retrieval we tested two information retrieval models. The results show that the KL-Divergence language model method performs better than the Okapi BM25 model for the Simplified Chinese retrieval task. This agrees with our previous CLIR experimental results at NTCIR-5. For the CLIR task, we compare query translation and document translation methods. In the query translation based runs, we tested a method for query expansion from external resource (QEE) before query translation. Our result for this run is slightly lower than the run without QEE. Our results show that the document translation method achieves 68.24% MAP performance compared to our best query translation run. For the document translation method, we found that the main issue is the lack of named entity translation in the documents since we do not have a suitable parallel corpus for training data for the statistical machine translation system. Our best CLIR run comes from the combination of query translation using Google translate and the KL-Divergence language model retrieval method. It achieves 79.94% MAP relative to our best monolingual run
Predicting visual context for unsupervised event segmentation in continuous photo-streams
Segmenting video content into events provides semantic structures for
indexing, retrieval, and summarization. Since motion cues are not available in
continuous photo-streams, and annotations in lifelogging are scarce and costly,
the frames are usually clustered into events by comparing the visual features
between them in an unsupervised way. However, such methodologies are
ineffective to deal with heterogeneous events, e.g. taking a walk, and
temporary changes in the sight direction, e.g. at a meeting. To address these
limitations, we propose Contextual Event Segmentation (CES), a novel
segmentation paradigm that uses an LSTM-based generative network to model the
photo-stream sequences, predict their visual context, and track their
evolution. CES decides whether a frame is an event boundary by comparing the
visual context generated from the frames in the past, to the visual context
predicted from the future. We implemented CES on a new and massive lifelogging
dataset consisting of more than 1.5 million images spanning over 1,723 days.
Experiments on the popular EDUB-Seg dataset show that our model outperforms the
state-of-the-art by over 16% in f-measure. Furthermore, CES' performance is
only 3 points below that of human annotators.Comment: Accepted for publication at the 2018 ACM Multimedia Conference (MM
'18
A knowledge regularized hierarchical approach for emotion cause analysis
Emotion cause analysis, which aims to identify the reasons behind emotions, is a key topic in sentiment analysis. A variety of neural network models have been proposed recently, however, these previous models mostly focus on the learning architecture with local textual information, ignoring the discourse and prior knowledge, which play crucial roles in human text comprehension. In this paper, we propose a new method to extract emotion cause with a hierarchical neural model and knowledge-based regularizations, which aims to incorporate discourse context information and restrain the parameters by sentiment lexicon and common knowledge. The experimental results demonstrate that our proposed method achieves the state-of-the-art performance on two public datasets in different languages (Chinese and English), outperforming a number of competitive baselines by at least 2.08% in F-measure
Making Presentation Math Computable
This Open-Access-book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Sciences, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de-facto standard to typeset mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the Sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions. First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. Second, it demonstrates the first context-aware LaTeX to CAS translation framework LaCASt. Third, the thesis provides a novel approach to evaluate the performance for LaTeX to CAS translations on large-scaled datasets with an automatic verification of equations in digital mathematical libraries. This is an open access book
Making Presentation Math Computable
This Open-Access-book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Sciences, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de-facto standard to typeset mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the Sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions. First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. Second, it demonstrates the first context-aware LaTeX to CAS translation framework LaCASt. Third, the thesis provides a novel approach to evaluate the performance for LaTeX to CAS translations on large-scaled datasets with an automatic verification of equations in digital mathematical libraries. This is an open access book
Tree-Based Representation and Generation of Natural and Mathematical Language
Mathematical language in scientific communications and educational scenarios
is important yet relatively understudied compared to natural languages. Recent
works on mathematical language focus either on representing stand-alone
mathematical expressions, especially in their natural tree format, or
mathematical reasoning in pre-trained natural language models. Existing works
on jointly modeling and generating natural and mathematical languages simply
treat mathematical expressions as text, without accounting for the rigid
structural properties of mathematical expressions. In this paper, we propose a
series of modifications to existing language models to jointly represent and
generate text and math: representing mathematical expressions as sequences of
node tokens in their operator tree format, using math symbol and tree position
embeddings to preserve the semantic and structural properties of mathematical
expressions, and using a constrained decoding method to generate mathematically
valid expressions. We ground our modifications in GPT-2, resulting in a model
MathGPT, and demonstrate that it outperforms baselines on mathematical
expression generation tasks
Developing a distributed electronic health-record store for India
The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India