55 research outputs found

    The information retrieval challenge of human digital memories

    Today people are storing increasing amounts of personal information in digital format. While storing such information is becoming straightforward, retrieval from the vast personal archives that this creates poses significant challenges. Existing retrieval techniques are good at retrieving from non-personal spaces, such as the World Wide Web. However, they are not sufficient for retrieving items from these new unstructured spaces, which contain items that are personal to the individual, of which the user has personal memories, and with which the user has previously interacted. We believe that there are new and exciting possibilities for retrieval from personal archives. Memory cues act as triggers for individuals in the remembering process, so a better understanding of memory cues will enable us to design new and effective retrieval algorithms and systems for personal archives. Context data, such as time and location, is already proving to play a key part in this special retrieval domain, for example in searching personal photo archives, and we believe there are many other rich sources of context that can be exploited for retrieval from personal archives.

    Venturing into the labyrinth: the information retrieval challenge of human digital memories

    Advances in digital capture and storage technologies mean that it is now possible to capture and store one's entire life experiences in a Human Digital Memory (HDM). However, these vast personal archives are of little benefit if an individual cannot locate and retrieve significant items from them. While HDMs potentially offer exciting opportunities to support users in their activities by providing access to information stored from previous experiences, we believe that the features of HDM datasets present new research challenges for information retrieval, which must be addressed if these possibilities are to be realised. Specifically, we postulate that effective retrieval from HDMs must exploit the rich sources of context data which can be captured and associated with the items stored within them. Users' memories of experiences stored within their memory archive will often be linked to these context features. We suggest how such contextual metadata can be exploited within the retrieval process.
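
    As a rough illustration of the idea in the abstract above, the sketch below filters a small archive by remembered context (location and time range) before ranking by query-term overlap. It is not taken from the paper: the `MemoryItem` type, the `retrieve` function, the term-overlap score, and the sample data are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MemoryItem:
    """A hypothetical archive item: text plus captured context metadata."""
    text: str
    timestamp: datetime
    location: str


def retrieve(items, query, location=None, start=None, end=None):
    """Filter by remembered context, then rank by query-term overlap."""
    terms = set(query.lower().split())
    scored = []
    for item in items:
        # Context filters: skip items that contradict what the user remembers.
        if location is not None and item.location != location:
            continue
        if start is not None and item.timestamp < start:
            continue
        if end is not None and item.timestamp > end:
            continue
        # Simple content score: number of query terms present in the item.
        overlap = len(terms & set(item.text.lower().split()))
        if overlap > 0:
            scored.append((overlap, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]


# Illustrative data only.
archive = [
    MemoryItem("budget meeting notes", datetime(2008, 3, 4, 10), "Dublin"),
    MemoryItem("holiday photo beach", datetime(2008, 7, 19, 15), "Galway"),
    MemoryItem("meeting agenda draft", datetime(2008, 7, 20, 9), "Galway"),
]

results = retrieve(archive, "meeting notes", location="Galway")
```

    Without the location cue, both meeting-related items match the query; with it, only the item recorded in Galway remains, which is the kind of narrowing that context metadata is meant to provide.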

    The use of provenance in information retrieval

    The volume of electronic information that users accumulate is steadily rising. A recent study [2] found that there were on average 32,000 pieces of information (e-mails, web pages, documents, etc.) for each user. The problem of organizin

    New approaches to SNV/InDel calling and haplotyping for next-generation sequencing data

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2021. 2. 박근수.

    Several tools have been developed for calling variants from next-generation sequencing (NGS) data. Although they are generally accurate and reliable, most of them have room for improvement, especially when calling variants in datasets with low read depth coverage. In addition, the somatic variants predicted by different somatic variant callers tend to have very low concordance rates. First, we propose a new tool (RDscan) for improving germline and somatic variant calling in next-generation sequencing data. RDscan removes misaligned reads, repositions reads, and calculates a score (RDscore) based on the read depth distribution. With RDscore, RDscan improves the precision of variant callers by removing false variants. When we tested our new tool with the latest variant calling algorithms, accuracy improved for most of the algorithms. After screening variants with RDscan, calling accuracies increased for germline variants in 11 out of 12 cases and for somatic variants in 21 out of 24 cases. For the best set of variants produced by optimizing the parameters of each algorithm using the known truth sets, RDscan increased the calling accuracies for germline variants in 5 out of 12 cases and for somatic variants in 21 out of 24 cases.

    Some applications require information on multiple variants in a single genome strand (haplotyping). In particular, precise haplotyping for human leukocyte antigen (HLA) genes is of great clinical importance. Several recent studies showed that next-generation sequencing-based methods are a feasible and promising technique for haplotyping of human leukocyte antigen genes. To date, however, no method with sufficient read depth has completely solved the allele phasing issue. Second, we developed a new method (HLAscan) for HLA haplotyping using NGS data. HLAscan aligns reads to HLA sequences from the IMGT/HLA database of the international ImMunoGeneTics project. The distribution of aligned reads is used to calculate a score function that determines correctly phased alleles by progressively removing false-positive alleles. Comparative HLA typing tests using public datasets from the 1000 Genomes Project and the International HapMap Project demonstrated that HLAscan could perform HLA typing more accurately than previously reported NGS-based methods. We also applied HLAscan to a family dataset with various coverage depths generated on the Illumina HiSeq X-TEN platform. HLAscan identified allele types of HLA-A, -B, -C, -DQB1, and -DRB1 with 100% accuracy for sequences at ≥ 90× depth, and the overall accuracy was 96.9%.

    Contents: Abstract; Contents; List of Figures; List of Tables
    Chapter 1 Introduction: 1.1 Background; 1.2 Problem Statement; 1.3 Previous Works and New Results; 1.4 Organization
    Chapter 2 SNV and InDel Calling: 2.1 Preliminaries; 2.2 Germline Variant Calling Algorithm; 2.3 Somatic Variant Calling Algorithm; 2.4 Results; 2.5 Discussions
    Chapter 3 Haplotyping for MHC region: 3.1 Preliminaries; 3.2 Haplotyping Algorithm; 3.3 Results; 3.4 Discussions
    Chapter 4 Conclusion: 4.1 Summary; 4.2 Future Directions
    Bibliography
    Docto

    This Time It's Personal: from PIM to the Perfect Digital Assistant

    Interacting with digital PIM tools like calendars, to-do lists, address books, bookmarks and so on is a highly manual, often repetitive and frequently tedious process. Despite increases in memory and processor power over the past two decades of personal computing, not much has changed in the way we engage with such applications. We must still manually decompose frequently performed tasks into multiple smaller, data-specific processes if we want to be able to recall or reuse the information in some meaningful way. "Meeting with Yves at 5 in Stata about blah" breaks down into rigid, fixed semantics in separate applications: data to be recorded in calendar fields, address book fields and, as for the blah, something that does not necessarily exist as a PIM application data structure. We argue that a reason Personal Information Management tools may be so manual, and so effectively fragmented, is that they are not personal enough. If our information systems were more personal, that is, if they knew us in a manner similar to the way a personal assistant would know and support us, then our tools would be more helpful: an assistive PIM tool would gather together the necessary material in support of our meeting with Yves. We have therefore been investigating possible paths towards PIM tools that work for us, rather than tools that seemingly make us work for them. To that end, in the following sections we consider how we may develop a framework for PIM tools as "perfect digital assistants" (PDA). Our impetus has been to explore how, by considering the affordances of a real-world personal assistant, we can conceptualize a design framework, and from there a development program, for a digital simulacrum of such an assistant that is not for some far-off future, but for the much nearer term.

    Workshop on evaluating personal search

    The first ECIR workshop on Evaluating Personal Search was held on 18th April 2011 in Dublin, Ireland. The workshop consisted of six oral paper presentations and several discussion sessions. This report presents an overview of the scope and contents of the workshop and outlines the major outcomes.

    HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data

    BACKGROUND: The human leukocyte antigen (HLA) system is a genomic region involved in regulating the human immune system by encoding cell membrane major histocompatibility complex (MHC) proteins that are responsible for self-recognition. Understanding the variation in this region provides important insights into autoimmune disorders, disease susceptibility, oncological immunotherapy, regenerative medicine, transplant rejection, and toxicogenomics. Traditional approaches to HLA typing are low throughput, target only a few genes, are labor intensive and costly, or require specialized protocols. RNA sequencing promises a relatively inexpensive, high-throughput solution for HLA calling across all genes, with the bonus of complete transcriptome information and widespread availability of historical data. Existing tools have been limited in their ability to accurately and comprehensively call HLA genes from RNA-seq data. RESULTS: We created HLAProfiler ( https://github.com/ExpressionAnalysis/HLAProfiler ), a k-mer profile-based method for HLA calling in RNA-seq data which can identify rare and common HLA alleles with >99% accuracy at two-field precision in both biological and simulated data. For 68% of novel alleles not present in the reference database, HLAProfiler can correctly identify the two-field precision or exact coding sequence, a significant advance over existing algorithms. CONCLUSIONS: HLAProfiler allows for accurate HLA calls in RNA-seq data, reliably expanding the utility of these data in HLA-related research and enabling advances across a broad range of disciplines. Additionally, by using the observed data to identify potential novel alleles and update partial alleles, HLAProfiler will facilitate further improvements to the existing database of reference HLA alleles. HLAProfiler is available at https://expressionanalysis.github.io/HLAProfiler/
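
    For readers unfamiliar with the term, a k-mer profile is simply a table of counts of every length-k substring in a sequence. The sketch below shows the general idea and a naive way to compare an observed profile against reference profiles. It is not HLAProfiler's actual algorithm or API: the function names, the similarity measure, and the toy sequences are assumptions made purely for illustration.

```python
from collections import Counter


def kmer_profile(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def profile_similarity(a, b):
    """Fraction of k-mer occurrences shared by two profiles (0.0 to 1.0)."""
    shared = sum((a & b).values())  # multiset intersection of the counts
    total = max(sum(a.values()), sum(b.values()))
    return shared / total if total else 0.0


# Toy sequences, not real HLA alleles.
read_profile = kmer_profile("ACGTACGT", 3)
references = {
    "allele_1": kmer_profile("ACGTACGT", 3),
    "allele_2": kmer_profile("ACGTTCGA", 3),
}

# Pick the reference whose profile is most similar to the observed one.
best = max(
    references,
    key=lambda name: profile_similarity(read_profile, references[name]),
)
```

    A real caller would have to handle sequencing errors, paired alleles, and expression-dependent coverage, none of which this toy comparison attempts.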

    The role of places and spaces in lifelog retrieval

    Finding relevant, interesting items when searching or browsing within a large multi-modal personal lifelog archive is a significant challenge. The use of contextual cues to filter the collection and aid in the determination of relevant content is often suggested as a means to address such challenges. This work presents an exploration of the various locations, garnered through context logging, in which several participants engaged in personal information access over a 15-month period. We investigate the implications of the varying data accessed across multiple locations for context-based retrieval from such collections. Our analysis highlights that a large number of spaces and places may be used for information access, but a high volume of content is accessed in few