5 research outputs found
Knowledge-guided Text Structuring in Clinical Trials
Clinical trial records are valuable resources for the analysis of patients and
diseases. Information extraction from free text such as eligibility criteria
and summary of results and conclusions in clinical trials would better support
computer-based eligibility query formulation and electronic patient screening.
Previous research has focused on extracting information from eligibility
criteria, with usually a single pair of medical entity and attribute, but
seldom considering other kinds of free text with multiple entities, attributes
and relations that are more complex for parsing. In this paper, we propose a
knowledge-guided text structuring framework with an automatically generated
knowledge base as training corpus and word dependency relations as context
information to transfer free text into formal, computer-interpretable
representations. Experimental results show that our method can achieve overall
high precision and recall, demonstrating the effectiveness and efficiency of
the proposed method.
A Common Gene Expression Signature Analysis Method for Multiple Types of Cancer
Mining gene expression profiles has proven valuable for identifying
signatures serving as surrogates of cancer phenotypes. However, the
similarities of such signatures across different cancer types have not been
strong enough to conclude that they represent a universal biological mechanism
shared among multiple cancer types. Here we describe a network-based approach
that explores gene-to-gene connections in multiple cancer datasets while
maximizing the overall association of the subnetwork with clinical outcomes.
With the dataset of The Cancer Genome Atlas (TCGA), we studied the
characteristics of common gene expression of three types of cancers: Rectum
adenocarcinoma (READ), Breast invasive carcinoma (BRCA) and Colon
adenocarcinoma (COAD). By analyzing several pairs of highly correlated genes
after filtering and clustering work, we found that the co-expressed genes
across multiple types of cancers point to particular biological mechanisms
related to cancer cell progression, suggesting that they represent important
attributes of cancer in need of being elucidated for potential applications in
diagnostic, prognostic and therapeutic products applicable to multiple cancer
types.
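The step of finding highly correlated gene pairs shared across cancer types can be illustrated with a minimal sketch. It covers only the pairwise correlation and intersection idea, not the network construction or the association with clinical outcomes described above. The function names, the 0.9 threshold, and the toy expression values are assumptions made for illustration; none of them come from the paper or from TCGA.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def correlated_pairs(expression, threshold=0.9):
    """Return gene pairs whose absolute correlation meets the threshold.

    `expression` maps a gene name to its expression values across samples.
    The threshold is a hypothetical choice, not a value from the paper.
    """
    pairs = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            pairs.add((g1, g2))
    return pairs


def common_pairs(datasets, threshold=0.9):
    """Intersect the highly correlated pairs found in every dataset."""
    per_dataset = [correlated_pairs(d, threshold) for d in datasets.values()]
    return set.intersection(*per_dataset) if per_dataset else set()


# Toy, made-up expression values (not TCGA data).
toy = {
    "READ": {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 1, 3, 2]},
    "BRCA": {"G1": [2, 3, 5, 7], "G2": [1, 2, 4, 6], "G3": [5, 5, 1, 2]},
    "COAD": {"G1": [1, 3, 4, 6], "G2": [2, 5, 7, 11], "G3": [3, 9, 2, 4]},
}
shared = common_pairs(toy)
print(shared)
```

In this toy input only the G1/G2 pair is strongly correlated in all three datasets, so it is the only pair that survives the intersection.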
Context Aware Image Annotation in Active Learning
Image annotation for active learning is labor-intensive. Various automatic
and semi-automatic labeling methods have been proposed to reduce labeling cost, but
a reduction in the number of labeled instances does not guarantee a reduction
in cost because the queries that are most valuable to the learner may be the
most difficult or ambiguous cases, and therefore the most expensive for an
oracle to label accurately. In this paper, we try to solve this problem by
using image metadata to offer the oracle more clues about the image during the
annotation process. We propose a Context Aware Image Annotation Framework
(CAIAF) that uses image metadata as a similarity metric to cluster images into
groups for annotation. We also present useful metadata information as context
for each image on the annotation interface. Experiments show that CAIAF reduces
annotation cost compared to the conventional framework, while maintaining high
classification performance.
Comment: arXiv admin note: text overlap with arXiv:1508.07647, arXiv:1207.3809
by other authors
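Grouping images by metadata similarity before annotation can be sketched as follows. This is an assumption-laden illustration rather than the CAIAF implementation: the abstract does not say which similarity measure or clustering algorithm is used, so Jaccard similarity over metadata values and a greedy single-pass grouping are swapped in here. The function names, the 0.5 threshold, and the sample metadata are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of metadata values."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_by_metadata(images, threshold=0.5):
    """Greedily put each image in the first group whose seed is similar enough.

    `images` maps an image id to a set of metadata values. The first image
    placed in a group acts as that group's seed (an illustrative choice).
    """
    groups = []  # list of (seed metadata, [image ids])
    for image_id, meta in images.items():
        for seed, members in groups:
            if jaccard(seed, meta) >= threshold:
                members.append(image_id)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append((meta, [image_id]))
    return [members for _, members in groups]


# Made-up metadata for illustration only.
images = {
    "img1": {"beach", "2014-07", "gps:nice"},
    "img2": {"beach", "2014-07", "gps:cannes"},
    "img3": {"office", "2015-01", "gps:paris"},
    "img4": {"office", "2015-01", "gps:paris"},
}
batches = group_by_metadata(images)
print(batches)
```

Each resulting batch could then be shown to the oracle together, with the shared metadata displayed as context.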
Conversational Structure Aware and Context Sensitive Topic Model for Online Discussions
Millions of online discussions are generated every day on social media
platforms. Topic modelling is an efficient way of better understanding large
text datasets at scale. Conventional topic models have had limited success in
online discussions, and to overcome their limitations, we use the discussion
thread tree structure and propose a "popularity" metric to quantify the number
of replies to a comment to extend the frequency of word occurrences, and the
"transitivity" concept to characterize topic dependency among nodes in a nested
discussion thread. We build a Conversational Structure Aware Topic Model
(CSATM) based on popularity and transitivity to infer topics and their
assignments to comments. Experiments on real forum datasets are used to
demonstrate improved performance for topic extraction with six different
measurements of coherence and impressive accuracy for topic assignments.
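One way to read the "popularity" idea is that words in a comment count more often when the comment attracts more replies. The sketch below shows that reading only. The weighting rule (1 + number of direct replies), the function names, and the sample thread are assumptions for illustration; the abstract does not give the exact formula used in CSATM, and the transitivity component is not modeled here.

```python
from collections import Counter


def count_replies(comments):
    """Count direct replies per comment id from (id, parent_id, text) rows."""
    replies = Counter()
    for _, parent_id, _ in comments:
        if parent_id is not None:
            replies[parent_id] += 1
    return replies


def popularity_weighted_counts(comments):
    """Word counts where each comment's words count (1 + replies) times.

    The (1 + replies) weight is a hypothetical choice for this sketch.
    """
    replies = count_replies(comments)
    counts = Counter()
    for comment_id, _, text in comments:
        weight = 1 + replies[comment_id]
        for word in text.lower().split():
            counts[word] += weight
    return counts


# Made-up discussion thread: (comment id, parent id, text).
thread = [
    ("c1", None, "battery life is poor"),
    ("c2", "c1", "battery drains fast"),
    ("c3", "c1", "screen is great"),
    ("c4", "c2", "fast charging helps"),
]
weighted = popularity_weighted_counts(thread)
print(weighted.most_common(3))
```

The weighted counts could then stand in for raw word frequencies as input to a topic model, so that heavily discussed comments contribute more to the inferred topics.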
Eliminating Search Intent Bias in Learning to Rank
Click-through data has proven to be a valuable resource for improving
search-ranking quality. Search engines can easily collect click data, but
biases introduced in the data can make it difficult to use the data
effectively. In order to measure the effects of biases, many click models have
been proposed in the literature. However, none of the models can explain the
observation that users with different search intent (e.g., informational,
navigational, etc.) have different click behaviors. In this paper, we study how
differences in user search intent influence click activities and determine
that there exists a bias between user search intent and document
relevance. Based on this observation, we propose a search intent bias
hypothesis that can be applied to most existing click models to improve their
ability to learn unbiased relevance. Experimental results demonstrate that
after adopting the search intent hypothesis, click models can better interpret
user clicks and substantially improve retrieval performance.