Large-scale Hierarchical Alignment for Data-driven Text Rewriting
We propose a simple unsupervised method for extracting pseudo-parallel
monolingual sentence pairs from comparable corpora representative of two
different text styles, such as news articles and scientific papers. Our
approach does not require a seed parallel corpus, but instead relies solely on
hierarchical search over pre-trained embeddings of documents and sentences. We
demonstrate the effectiveness of our method through automatic and extrinsic
evaluation on text simplification from the normal to the Simple Wikipedia. We
show that pseudo-parallel sentences extracted with our method not only
supplement existing parallel data, but can even lead to competitive performance
on their own.
Comment: RANLP 201
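The hierarchical search idea above can be sketched as follows: match documents across the two styles first, then compare sentences only within matched document pairs, pruning the vast majority of sentence comparisons. This is a toy illustration; the thresholds, two-dimensional embeddings, and function names are assumptions, not the paper's implementation.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def extract_pseudo_parallel(docs_a, docs_b, doc_threshold=0.8, sent_threshold=0.8):
    """Two-level search: match documents across the two styles first, then
    match sentences only within similar document pairs.

    docs_a / docs_b: lists of (doc_embedding, [(sent_embedding, sent_text), ...]).
    """
    pairs = []
    for doc_emb_a, sents_a in docs_a:
        for doc_emb_b, sents_b in docs_b:
            if cosine(doc_emb_a, doc_emb_b) < doc_threshold:
                continue  # prune: skip sentence search for dissimilar documents
            for emb_a, text_a in sents_a:
                # best-matching sentence within the matched document
                best = max(sents_b, key=lambda s: cosine(emb_a, s[0]))
                if cosine(emb_a, best[0]) >= sent_threshold:
                    pairs.append((text_a, best[1]))
    return pairs
```

The pruning at the document level is what makes the search tractable without a seed parallel corpus: sentence-level comparison happens only inside document pairs that already look comparable.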
Character-level Chinese-English Translation through ASCII Encoding
Character-level Neural Machine Translation (NMT) models have recently
achieved impressive results on many language pairs. They mainly do well for
Indo-European language pairs, where the languages share the same writing
system. However, for translating between Chinese and English, the gap between
the two different writing systems poses a major challenge because of a lack of
systematic correspondence between the individual linguistic units. In this
paper, we enable character-level NMT for Chinese, by breaking down Chinese
characters into linguistic units similar to those of Indo-European languages. We
use the Wubi encoding scheme, which preserves the original shape and semantic
information of the characters, while also being reversible. We show promising
results from training Wubi-based models on the character- and subword-level
with recurrent as well as convolutional models.
Comment: 7 pages, 3 figures, 3rd Conference on Machine Translation (WMT18), 2018
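The reversibility property can be illustrated with a toy encoder that maps each character to an ASCII code and back. The codes below are made up for illustration (they are not real Wubi codes), and the underscore delimiter is an assumption to keep decoding unambiguous; it is not the paper's actual scheme.

```python
# Illustrative character-to-ASCII codes in the spirit of Wubi (NOT real Wubi codes).
CODES = {"中": "khk", "国": "lgyi"}
ASCII_TO_CHAR = {v: k for k, v in CODES.items()}

def encode(text):
    """Map a Chinese string to delimiter-separated ASCII codes."""
    return "_".join(CODES[c] for c in text)

def decode(encoded):
    """Invert encode(); reversibility is the key property of the scheme."""
    return "".join(ASCII_TO_CHAR[tok] for tok in encoded.split("_"))
```

Once characters are ASCII sequences, standard character- and subword-level NMT machinery applies directly.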
SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation
Scientific writing involves retrieving, summarizing, and citing relevant
papers, which can be time-consuming processes in large and rapidly evolving
fields. By making these processes interoperable, natural language processing
(NLP) provides opportunities for creating end-to-end assistive writing tools.
We propose SciLit, a pipeline that automatically recommends relevant papers,
extracts highlights, and suggests a reference sentence as a citation of a
paper, taking into consideration the user-provided context and keywords. SciLit
efficiently recommends papers from large databases of hundreds of millions of
papers using a two-stage pre-fetching and re-ranking literature search system
that flexibly deals with addition and removal of a paper database. We provide a
convenient user interface that displays the recommended papers as extractive
summaries and that offers abstractively-generated citing sentences which are
aligned with the provided context and which mention the chosen keyword(s). Our
assistive tool for literature discovery and scientific writing is available at
https://scilit.vercel.app
Comment: Accepted at ACL 2023 System Demonstrations
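The two-stage pre-fetching and re-ranking design can be sketched roughly as follows. The function names, the dot-product prefetch score, and the candidate counts are illustrative assumptions, not SciLit's actual implementation; `score_fn` stands in for an expensive neural reranker.

```python
import numpy as np

def search(query_emb, corpus, score_fn, prefetch_k=100, final_k=5):
    """Two-stage retrieval: cheap embedding prefetch, then expensive rerank.

    corpus: list of (doc_id, doc_embedding, doc_text) tuples.
    score_fn: costly scorer (e.g. a neural cross-encoder), applied only
    to the small prefetched candidate set rather than the whole corpus.
    """
    # Stage 1: prefetch by dot-product similarity over the full corpus.
    candidates = sorted(corpus, key=lambda d: -float(query_emb @ d[1]))[:prefetch_k]
    # Stage 2: rerank only the candidates with the expensive model.
    reranked = sorted(candidates, key=lambda d: -score_fn(query_emb, d))
    return [d[0] for d in reranked[:final_k]]
```

Because only the prefetch index touches the whole database, adding or removing a paper database amounts to updating that index; the reranker is unchanged.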
Spike Correlations in a Songbird Agree with a Simple Markov Population Model
The relationships between neural activity at the single-cell and the population levels are of central importance for understanding neural codes. In many sensory systems, collective behaviors in large cell groups can be described by pairwise spike correlations. Here, we test whether in a highly specialized premotor system of songbirds, pairwise spike correlations themselves can be seen as a simple corollary of an underlying random process. We test hypotheses on connectivity and network dynamics in the motor pathway of zebra finches using a high-level population model that is independent of detailed single-neuron properties. We assume that neural population activity evolves along a finite set of states during singing, and that during sleep population activity randomly switches back and forth between song states and a single resting state. Individual spike trains are generated by associating with each of the population states a particular firing mode, such as bursting or tonic firing. With an overall modification of one or two simple control parameters, the Markov model is able to reproduce observed firing statistics and spike correlations in different neuron types and behavioral states. Our results suggest that song- and sleep-related firing patterns are identical on short time scales and result from random sampling of a unique underlying theme. The efficiency of our population model may apply also to other neural systems in which population hypotheses can be tested on recordings from small neuron groups.
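The sleep-state switching described above can be caricatured as a tiny Markov chain that hops between a resting state and one of a finite set of song states. The transition probabilities, state representation, and function name below are illustrative assumptions, not fitted parameters from the paper.

```python
import random

def simulate_sleep(n_song_states, p_enter=0.1, p_exit=0.5, steps=1000, seed=0):
    """Toy population-state trajectory during sleep: the population randomly
    switches between a resting state (None) and one of n_song_states song
    states. Each state would be associated with a firing mode (e.g. bursting
    or tonic firing) to generate individual spike trains.

    p_enter: per-step probability of jumping from rest into a random song state.
    p_exit:  per-step probability of falling back to rest.
    """
    rng = random.Random(seed)
    state, trajectory = None, []
    for _ in range(steps):
        if state is None:
            if rng.random() < p_enter:
                state = rng.randrange(n_song_states)  # replay a random song state
        elif rng.random() < p_exit:
            state = None  # fall back to rest
        trajectory.append(state)
    return trajectory
```

In this picture, song and sleep differ only in how states are sequenced (deterministic progression versus random switching), which is why short-time-scale firing patterns can look identical in both behavioral states.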
Bilateral neurotoxic lesions in NCM before tutoring onset do not prevent successful tutor song learning
Sensorimotor learning crucially depends on the ability to acquire a sensory memory for shaping motor commands. Such learning is conveniently studied in young songbirds when they memorize the song of an adult singer and gradually transform their own vocalizations toward the memorized target song. Here we study the involvement of the Caudal Medial Nidopallium (NCM), a higher auditory cortical area, in acquisition of a song memory. NCM has previously been shown to be involved in tutor song memorization. To study the necessity of NCM in this process, we perform large irreversible NCM lesions using ibotenic acid injections in about 40-day-old juvenile zebra finches, before their first exposure to tutor song. Surprisingly, NCM-lesioned juveniles successfully copied the tutor song at least as well as untreated control animals, showing that a fully intact NCM is not required for tutor song memory formation and normal song development.
MemSum: Extractive Summarization of Long Documents using Multi-step Episodic Markov Decision Processes
We introduce MemSum (Multi-step Episodic Markov decision process extractive SUMmarizer), a reinforcement-learning-based extractive summarizer enriched at any given time step with information on the current extraction history. Similar to previous models in this vein, MemSum iteratively selects sentences into the summary. Our innovation is in considering a broader information set when summarizing that would intuitively also be used by humans in this task: 1) the text content of the sentence, 2) the global text context of the rest of the document, and 3) the extraction history consisting of the set of sentences that have already been extracted. With a lightweight architecture, MemSum nonetheless obtains state-of-the-art test-set performance (ROUGE score) on long document datasets (PubMed, arXiv, and GovReport). Supporting analysis demonstrates that the added awareness of extraction history gives MemSum robustness against redundancy in the source document.
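The history-aware extraction loop can be sketched as follows. Here `score_fn` stands in for the learned policy, and the stop rule and names are assumptions for illustration; the point is that the scorer sees the extraction history at every step, which is what lets it avoid redundant selections.

```python
def extract_summary(sentences, score_fn, max_sents=3):
    """Iterative extraction in the spirit of MemSum: at each step the policy
    scores every remaining sentence given (a) its text, (b) the rest of the
    document, and (c) the extraction history, and may also decide to stop.

    score_fn(sentences, i, history) -> float; a non-positive best score
    plays the role of the stop action in this sketch.
    """
    history, remaining = [], list(range(len(sentences)))
    while remaining and len(history) < max_sents:
        best = max(remaining, key=lambda i: score_fn(sentences, i, history))
        if score_fn(sentences, best, history) <= 0:
            break  # stop action: nothing left worth extracting
        history.append(best)
        remaining.remove(best)
    return [sentences[i] for i in history]
```

A scorer that penalizes sentences already covered by the history reproduces the redundancy-avoidance behavior the abstract describes.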
MemSum-DQA: Adapting An Efficient Long Document Extractive Summarizer for Document Question Answering
We introduce MemSum-DQA, an efficient system for document question answering (DQA) that leverages MemSum, a long document extractive summarizer. By prefixing each text block in the parsed document with the provided question and question type, MemSum-DQA selectively extracts text blocks as answers from documents. On full-document answering tasks, this approach yields a 9% improvement in exact match accuracy over prior state-of-the-art baselines. Notably, MemSum-DQA excels in addressing questions related to child-relationship understanding, underscoring the potential of extractive summarization techniques for DQA tasks.
Comment: This paper is the technical research paper of the CIKM 2023 DocIU challenges. The authors received the CIKM 2023 DocIU Winner Award, sponsored by Google, Microsoft, and the Centre for data-driven geoscience.
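The question-prefixing step can be illustrated in a few lines; the separator format below is an assumption for illustration, not the system's actual input template.

```python
def build_inputs(question, question_type, blocks):
    """MemSum-DQA-style input construction: prefix every parsed text block
    with the question and its type, so the extractor can condition its
    block-selection decisions on them."""
    return [f"{question_type} | {question} | {block}" for block in blocks]
```

With inputs built this way, answering reduces to the same extractive selection problem MemSum already solves, only over question-conditioned blocks instead of plain sentences.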
Local Citation Recommendation with Hierarchical-Attention Text Encoder and SciBERT-based Reranking
The goal of local citation recommendation is to recommend a missing reference from the local citation context and optionally also from the global context. To balance the tradeoff between speed and accuracy of citation recommendation in the context of a large-scale paper database, a viable approach is to first prefetch a limited number of relevant documents using efficient ranking methods and then to perform a fine-grained reranking using more sophisticated models. In that vein, BM25 has been found to be a tough-to-beat approach to prefetching, which is why recent work has focused mainly on the reranking step. Even so, we explore prefetching with nearest neighbor search among text embeddings constructed by a hierarchical attention network. When coupled with a SciBERT reranker fine-tuned on local citation recommendation tasks, our Hierarchical Attention encoder (HAtten) achieves high prefetch recall for a given number of candidates to be reranked. Consequently, our reranker requires fewer prefetch candidates to rerank, yet still achieves state-of-the-art performance on various local citation recommendation datasets such as ACL-200, FullTextPeerRead, RefSeer, and arXiv.
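Prefetch recall for a given candidate budget is the quantity the speed/accuracy tradeoff hinges on: if the true reference is in the top-k prefetched candidates, the reranker still has a chance to surface it. A minimal sketch of that metric, with an assumed data layout:

```python
def prefetch_recall_at_k(queries, k):
    """Fraction of queries whose ground-truth cited paper appears among the
    top-k prefetched candidates.

    queries: list of (ranked_candidate_ids, true_id) pairs, where
    ranked_candidate_ids is the prefetcher's output in descending score order.
    """
    hits = sum(true_id in ranked[:k] for ranked, true_id in queries)
    return hits / len(queries)
```

A prefetcher with higher recall at small k lets the reranker process fewer candidates for the same end-to-end accuracy, which is exactly the advantage claimed for HAtten over BM25.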
Correlative Microscopy of Densely Labeled Projection Neurons Using Neural Tracers
Three-dimensional morphological information about neural microcircuits is of high interest in neuroscience, but acquiring this information remains challenging. A promising new correlative technique for brain imaging is array tomography (Micheva and Smith, 2007), in which series of ultrathin brain sections are treated with fluorescent antibodies against neurotransmitters and synaptic proteins. Treated sections are repeatedly imaged in the fluorescence light microscope (FLM) and then in the electron microscope (EM). We explore a similar correlative imaging technique in which we differentially label distinct populations of projection neurons, the key routers of electrical signals in the brain. In songbirds, projection neurons can easily be labeled using neural tracers, because the vocal control areas are segregated into separate nuclei. We inject tracers into areas afferent and efferent to the main premotor area for vocal production, HVC, to retrogradely and anterogradely label different classes of projection neurons. We optimize tissue preparation protocols to achieve high fluorescence contrast in the FLM and good ultrastructure in the EM (using osmium tetroxide). Although tracer fluorescence is lost during EM preparation, we localize the tracer molecules after fixation and embedding by using fluorescent antibodies against them. We detect signals mainly in somata and dendrites, allowing us to classify synapses within a single ultrathin section as belonging to a particular type of projection neuron. Our method can be used to provide statistical information about connectivity among different neuron classes, and to elucidate how signals in the brain are processed and routed among different areas.