161 research outputs found
Interactive cross-language document selection
The problem of finding documents written in a language that the searcher cannot read is perhaps the most challenging application of cross-language information retrieval technology. In interactive applications, that task involves at least two steps: (1) the machine locates promising documents in a collection that is larger than the searcher could scan, and (2) the searcher recognizes documents relevant to their intended use from among those nominated by the machine. This article presents the results of experiments designed to explore three techniques for supporting interactive relevance assessment: (1) full machine translation, (2) rapid term-by-term translation, and (3) focused phrase translation. Machine translation was found to better support this task than term-by-term translation, and focused phrase translation further improved recall without an adverse effect on precision. The article concludes with an assessment of the strengths and weaknesses of the evaluation framework used in this study and some remarks on implications of these results for future evaluation campaigns.
Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation
With the uptake of algorithmic personalization in the news domain, news
organizations increasingly trust automated systems with previously considered
editorial responsibilities, e.g., prioritizing news to readers. In this paper
we study an automated news recommender system in the context of a news
organization's editorial values. We conduct and present two online studies with
a news recommender system, which span one and a half months and involve over
1,200 users. In our first study we explore how our news recommender steers
reading behavior in the context of editorial values such as serendipity,
dynamism, diversity, and coverage. Next, we present an intervention study where
we extend our news recommender to steer our readers to more dynamic reading
behavior. We find that (i) our recommender system yields more diverse reading
behavior and yields a higher coverage of articles compared to non-personalized
editorial rankings, and (ii) we can successfully incorporate dynamism in our
recommender system as a re-ranking method, effectively steering our readers to
more dynamic articles without hurting our recommender system's accuracy. Comment: To appear in UMAP 202
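The re-ranking idea in the abstract above can be illustrated with a minimal sketch. It assumes each candidate article carries a relevance score from the base recommender and a separate "dynamism" score, both in [0, 1]; the `Candidate` type, the `rerank` function, the linear blend, and the `weight` parameter are hypothetical illustrations, not the method used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    article_id: str
    relevance: float  # score from the base recommender (assumed to lie in [0, 1])
    dynamism: float   # hypothetical editorial "dynamism" score (assumed in [0, 1])


def rerank(candidates, weight=0.3, top_k=None):
    """Re-rank candidates by a convex combination of relevance and dynamism.

    weight=0.0 reproduces the base ranking; larger weights favor dynamic articles.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    ranked = sorted(
        candidates,
        key=lambda c: (1.0 - weight) * c.relevance + weight * c.dynamism,
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


if __name__ == "__main__":
    pool = [
        Candidate("a", relevance=0.9, dynamism=0.1),
        Candidate("b", relevance=0.8, dynamism=0.9),
        Candidate("c", relevance=0.5, dynamism=0.5),
    ]
    print([c.article_id for c in rerank(pool, weight=0.0)])
    print([c.article_id for c in rerank(pool, weight=0.3)])
```

A single blending weight makes the trade-off between accuracy and the editorial value explicit, so it could be tuned against whichever accuracy metric a deployment tracks.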
Personalized classification for keyword-based category profiles
Personalized classification refers to allowing users to define their own categories and automating the assignment of documents to these categories. In this paper, we examine the use of keywords to define personalized categories and propose using a Support Vector Machine (SVM) to perform personalized classification. Two scenarios have been investigated. The first assumes that the personalized categories are defined in a flat category space. The second assumes that each personalized category is defined within a pre-defined general category that provides a more specific context for the personalized category. The training documents for personalized categories are obtained from a training document pool using a search engine and a set of keywords. In our experiments, the second scenario delivered better classification results. We also conclude that the number of keywords used can be very small and that increasing it does not always lead to better classification performance.
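The training-data step described above (retrieving documents from a pool with a few keywords, optionally restricted to a general category) might look roughly like the following. This is a sketch under stated assumptions: plain keyword-overlap counting stands in for the search engine, the `select_training_docs` function and the document fields `text` and `category` are hypothetical, and the SVM training that would follow is not shown.

```python
def select_training_docs(pool, keywords, general_category=None, top_n=10):
    """Pick training documents for a personalized category.

    pool: list of dicts with 'text' and 'category' keys (hypothetical schema).
    keywords: the user's keywords defining the personalized category.
    general_category: if given, only documents from that general category
        are considered (the second scenario); None means a flat category space.
    """
    wanted = {k.lower() for k in keywords}
    scored = []
    for doc in pool:
        if general_category is not None and doc["category"] != general_category:
            continue
        tokens = doc["text"].lower().split()
        # Keyword-overlap count: a stand-in for a real search engine's score.
        score = sum(1 for t in tokens if t in wanted)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


if __name__ == "__main__":
    pool = [
        {"text": "jaguar speed in the wild", "category": "animals"},
        {"text": "jaguar engine speed test", "category": "cars"},
        {"text": "stock market report", "category": "finance"},
    ]
    flat = select_training_docs(pool, ["jaguar", "speed"])
    scoped = select_training_docs(pool, ["jaguar", "speed"], general_category="cars")
    print(len(flat), len(scoped))
```

The example pool shows why the general category can help: the same keywords match documents from unrelated topics in a flat space, while the scoped query keeps only those in the intended context.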
Learning with Weak Supervision for Email Intent Detection
Email remains one of the most frequently used means of online communication.
People spend a significant amount of time every day on emails to exchange
information, manage tasks and schedule events. Previous work has studied
different ways for improving email productivity by prioritizing emails,
suggesting automatic replies or identifying intents to recommend appropriate
actions. The problem has been mostly posed as a supervised learning problem
where models of different complexities were proposed to classify an email
message into a predefined taxonomy of intents or classes. The need for labeled
data has always been one of the largest bottlenecks in training supervised
models. This is especially the case for many real-world tasks, such as email
intent classification, where large scale annotated examples are either hard to
acquire or unavailable due to privacy or data access constraints. Email users
often take actions in response to intents expressed in an email (e.g., setting
up a meeting in response to an email with a scheduling request). Such actions
can be inferred from user interaction logs. In this paper, we propose to
leverage user actions as a source of weak supervision, in addition to a limited
set of annotated examples, to detect intents in emails. We develop an
end-to-end robust deep neural network model for email intent identification
that leverages both clean annotated data and noisy weak supervision along with
a self-paced learning mechanism. Extensive experiments on three different
intent detection tasks show that our approach can effectively leverage the
weakly supervised data to improve intent detection in emails. Comment: 10 pages, 3 figures
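One common way to combine clean labels with noisy weak labels under self-paced learning is to admit only those weak examples whose current loss falls below a threshold, then raise the threshold as training proceeds. The sketch below illustrates that generic scheme on precomputed per-example losses; the function names, the hard 0/1 weighting, and the `weak_weight` parameter are assumptions for illustration and do not reproduce the paper's neural model.

```python
def self_paced_weights(losses, threshold):
    """Hard self-paced weighting: keep examples whose loss is below the threshold."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def combined_loss(clean_losses, weak_losses, threshold, weak_weight=0.5):
    """Mean loss on clean data plus a down-weighted mean over selected weak data.

    clean_losses: per-example losses on annotated examples.
    weak_losses: per-example losses on weakly labeled examples.
    threshold: self-paced "age" parameter; raising it admits harder examples.
    weak_weight: hypothetical scaling factor for the weak-supervision term.
    """
    weights = self_paced_weights(weak_losses, threshold)
    clean_term = sum(clean_losses) / len(clean_losses) if clean_losses else 0.0
    selected = sum(weights)
    if selected:
        weak_term = sum(w * l for w, l in zip(weights, weak_losses)) / selected
    else:
        weak_term = 0.0
    return clean_term + weak_weight * weak_term


if __name__ == "__main__":
    clean = [0.2, 0.4]
    weak = [0.1, 2.0, 0.3]  # the 2.0 entry mimics a likely mislabeled example
    print(combined_loss(clean, weak, threshold=1.0))
    print(combined_loss(clean, weak, threshold=3.0))
```

With a low threshold the high-loss weak example is ignored, which is how the scheme limits the influence of noisy labels early in training.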
Deep sequencing reveals the complex and coordinated transcriptional regulation of genes related to grain quality in rice cultivars
Background: Milling yield and eating quality are two important grain quality traits in rice. To identify the genes involved in these two traits, we performed a deep transcriptional analysis of developing seeds using both massively parallel signature sequencing (MPSS) and sequencing-by-synthesis (SBS). Five MPSS and five SBS libraries were constructed from 6-day-old developing seeds of Cypress (high milling yield), LaGrue (low milling yield), Ilpumbyeo (high eating quality), YR15965 (low eating quality), and Nipponbare (control). Results: The transcriptomes revealed by MPSS and SBS had a high correlation coefficient (0.81 to 0.90), and about 70% of the transcripts were identified in both types of libraries. SBS, however, identified 30% more transcripts than MPSS. Among the highly expressed genes in Cypress and Ilpumbyeo, over 100 conserved cis-regulatory elements were identified. Numerous specifically expressed transcription factor (TF) genes were identified in Cypress (282), LaGrue (312), Ilpumbyeo (363), YR15965 (260), and Nipponbare (357). Many key grain quality-related genes (i.e., genes involved in starch metabolism, aspartate amino acid metabolism, storage and allergenic protein synthesis, and seed maturation) that were expressed at high levels underwent alternative splicing and produced antisense transcripts in either Cypress or Ilpumbyeo. Further, a time-course RT-PCR analysis confirmed a higher expression level of genes involved in starch metabolism, such as those encoding ADP glucose pyrophosphorylase (AGPase) and granule bound starch synthase I (GBSS I), in Cypress than in LaGrue during early seed development. Conclusion: This study represents the most comprehensive analysis of the developing seed transcriptome of rice available to date. Using two high throughput sequencing methods, we identified many differentially expressed genes that may affect milling yield or eating quality in rice. Many of the identified genes are involved in the biosynthesis of starch, aspartate family amino acids, and storage proteins. Some of the differentially expressed genes could be useful for the development of molecular markers if they are located in a known QTL region for milling yield or eating quality in the rice genome. Therefore, our comprehensive and deep survey of the developing seed transcriptome in five rice cultivars has provided a rich genomic resource for further elucidating the molecular basis of grain quality in rice.
Molecular Dynamics Simulation of Phosphorylated KID Post-Translational Modification
BACKGROUND: The kinase-inducible domain (KID), a transcriptional activator, can stimulate target gene expression in signal transduction by associating with the KID-interacting domain (KIX). NMR spectra suggest that apo-KID is an unstructured protein. After post-translational modification by phosphorylation, KID undergoes a transition from a disordered to a well-folded protein upon binding to KIX. However, the mechanism of folding coupled to binding is poorly understood. METHODOLOGY: To gain insight into the mechanism, we performed ten trajectories of explicit-solvent molecular dynamics (MD) for both bound and apo phosphorylated KID (pKID). Ten MD simulations are sufficient to capture the average properties of protein folding and unfolding. CONCLUSIONS: Room-temperature MD simulations suggest that pKID becomes more rigid and stable upon KIX binding. Kinetic analysis of high-temperature MD simulations shows that bound pKID and apo-pKID unfold via a three-state and a two-state process, respectively. Both kinetics and free energy landscape analyses indicate that bound pKID folds in the order of KIX access, initiation of pKID tertiary folding, folding of helix alpha(B), folding of helix alpha(A), completion of pKID tertiary folding, and finalization of pKID-KIX binding. Our data show that the folding pathway of apo-pKID differs from that of the bound state: the order in which helices alpha(A) and alpha(B) fold is swapped. Here we also show that Asn139, Asp140 and Leu141, which have large Phi-values, are key residues in the folding of bound pKID. Our results are in good agreement with NMR experimental observations and provide significant insight into the general mechanisms of binding-induced protein folding and other conformational adjustments in post-translational modification.
Cross-lingual C*ST*RD: English access to Hindi information
We present C*ST*RD, a cross-language information delivery system that supports cross-language information retrieval, information space visualization and navigation, machine translation, and text summarization of single documents and clusters of documents. C*ST*RD was assembled and trained within 1 month, in the context of DARPA’s Surprise Language Exercise, which selected as its source a heretofore unstudied language, Hindi. Given the brief time, we could not create deep Hindi capabilities for all the modules, but instead experimented with combining shallow Hindi capabilities, or even English-only modules, into one integrated system. Various possible configurations, with different tradeoffs in processing speed and ease of use, enable the rapid deployment of C*ST*RD to new languages under various conditions.