54 research outputs found

    Reverse Nearest Neighbor Heat Maps: A Tool for Influence Exploration

    Full text link
    We study the problem of constructing a reverse nearest neighbor (RNN) heat map by finding the RNN set of every point in a two-dimensional space. Based on the RNN set of a point, we obtain a quantitative influence (i.e., heat) for the point. The heat map provides a global view on the influence distribution in the space, and hence supports exploratory analyses in many applications such as marketing and resource management. To construct such a heat map, we first reduce it to a problem called Region Coloring (RC), which divides the space into disjoint regions within which all the points have the same RNN set. We then propose a novel algorithm named CREST that efficiently solves the RC problem by labeling each region with the heat value of its containing points. In CREST, we propose innovative techniques to avoid processing expensive RNN queries and greatly reduce the number of region labeling operations. We perform detailed analyses on the complexity of CREST and lower bounds of the RC problem, and prove that CREST is asymptotically optimal in the worst case. Extensive experiments with both real and synthetic data sets demonstrate that CREST outperforms alternative algorithms by several orders of magnitude.Comment: Accepted to appear in ICDE 201

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Get PDF
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.Peer reviewe

    Finishing the euchromatic sequence of the human genome

    Get PDF
    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human enome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    No full text
    Funder: NCI U24CA211006Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts

    31st Annual Meeting and Associated Programs of the Society for Immunotherapy of Cancer (SITC 2016) : part two

    Get PDF
    Background The immunological escape of tumors represents one of the main ob- stacles to the treatment of malignancies. The blockade of PD-1 or CTLA-4 receptors represented a milestone in the history of immunotherapy. However, immune checkpoint inhibitors seem to be effective in specific cohorts of patients. It has been proposed that their efficacy relies on the presence of an immunological response. Thus, we hypothesized that disruption of the PD-L1/PD-1 axis would synergize with our oncolytic vaccine platform PeptiCRAd. Methods We used murine B16OVA in vivo tumor models and flow cytometry analysis to investigate the immunological background. Results First, we found that high-burden B16OVA tumors were refractory to combination immunotherapy. However, with a more aggressive schedule, tumors with a lower burden were more susceptible to the combination of PeptiCRAd and PD-L1 blockade. The therapy signifi- cantly increased the median survival of mice (Fig. 7). Interestingly, the reduced growth of contralaterally injected B16F10 cells sug- gested the presence of a long lasting immunological memory also against non-targeted antigens. Concerning the functional state of tumor infiltrating lymphocytes (TILs), we found that all the immune therapies would enhance the percentage of activated (PD-1pos TIM- 3neg) T lymphocytes and reduce the amount of exhausted (PD-1pos TIM-3pos) cells compared to placebo. As expected, we found that PeptiCRAd monotherapy could increase the number of antigen spe- cific CD8+ T cells compared to other treatments. However, only the combination with PD-L1 blockade could significantly increase the ra- tio between activated and exhausted pentamer positive cells (p= 0.0058), suggesting that by disrupting the PD-1/PD-L1 axis we could decrease the amount of dysfunctional antigen specific T cells. We ob- served that the anatomical location deeply influenced the state of CD4+ and CD8+ T lymphocytes. In fact, TIM-3 expression was in- creased by 2 fold on TILs compared to splenic and lymphoid T cells. In the CD8+ compartment, the expression of PD-1 on the surface seemed to be restricted to the tumor micro-environment, while CD4 + T cells had a high expression of PD-1 also in lymphoid organs. Interestingly, we found that the levels of PD-1 were significantly higher on CD8+ T cells than on CD4+ T cells into the tumor micro- environment (p < 0.0001). Conclusions In conclusion, we demonstrated that the efficacy of immune check- point inhibitors might be strongly enhanced by their combination with cancer vaccines. PeptiCRAd was able to increase the number of antigen-specific T cells and PD-L1 blockade prevented their exhaus- tion, resulting in long-lasting immunological memory and increased median survival

    Destination Prediction by Identifying and Clustering Prominent Features from Public Trajectory Datasets

    No full text
    Destination prediction is an essential task in many location-based services (LBS) such as providing targeted advertisements and route recommendations. Most existing solutions were generative methods that model the problem as a series of probabilistic events that are then used to compute the destination probability using Bayes’ rule. In contrast, we propose a discriminative method that chooses the most prominent features found in a public trajectory dataset, clusters the trajectories into groups based on these features, and performs destination prediction queries accordingly. Our method is more concise and simple than existing methods while achieving better runtime efficiency and prediction accuracy as verified by experimental studies

    Destination Prediction by Sub-Trajectory Synthesis and Privacy Protection Against Such Prediction

    No full text
    Abstract — Destination prediction is an essential task for many emerging location based applications such as recommending sightseeing places and targeted advertising based on destination. A common approach to destination prediction is to derive the probability of a location being the destination based on historical trajectories. However, existing techniques using this approach suffer from the “data sparsity problem”, i.e., the available historical trajectories is far from being able to cover all possible trajectories. This problem considerably limits the number of query trajectories that can obtain predicted destinations. We propose a novel method named Sub-Trajectory Synthesis (SubSyn) algorithm to address the data sparsity problem. SubSyn algorithm first decomposes historical trajectories into sub-trajectories comprising two neighbouring locations, and then connects the sub-trajectories into “synthesised ” trajectories. The number of query trajectories that can have predicted destinations is exponentially increased by this means. Experiments based on real datasets show that SubSyn algorithm can predict destinations for up to ten times more query trajectories than a baseline algorithm while the SubSyn prediction algorithm runs over two orders of magnitude faster than the baseline algorithm. In this paper, we also consider the privacy protection issue in case an adversary uses SubSyn algorithm to derive sensitive location information of users. We propose an efficient algorithm to select a minimum number of locations a user has to hide on her trajectory in order to avoid privacy leak. Experiments also validate the high efficiency of the privacy protection algorithm. I

    DesTeller: A System for Destination Prediction Based on Trajectories with Privacy Protection

    No full text
    Destination prediction is an essential task for a number of emerging location based applications such as recommending sightseeing places and sending targeted advertisements. A common approach to destination prediction is to derive the probability of a location being the destination based on historical trajectories. However, existing techniques suffer from the “data sparsity problem”, i.e., the number of available historical trajectories is far from sufficient to cover all possible trajectories. This problem considerably limits the amount of query trajectories whose predicted destinations can be inferred. In this demonstration, we showcase a system named “DesTeller” that is interactive, user-friendly, publicly accessible, and capable of answering real-time queries. The underlying algorithm Sub-Trajectory Synthesis (SubSyn) successfully addressed the data sparsity problem and is able to predict destinations for almost every query submitted by travellers. We also consider the privacy protection issue in case an adversary uses SubSyn algorithm to derive sensitive location information of users.

    World Wide Web manuscript No. (will be inserted by the editor) The Min-dist Location Selection and Facility Replacement Queries

    Get PDF
    Abstract We propose and study a new type of location optimization problem, the min-dist location selection problem: given a set of clients and a set of existing facilities, we select a location from a given set of potential locations for establishing a new facility, so that the average distance between a client and her nearest facility is minimized. The problem has a wide range of applications in urban development simulation, massively multiplayer online games, and decision support systems. We also investigate a variant of the problem, where we consider replacing (instead of adding) a facility while achieving the same optimization goal. We call this variant the min-dist facility replacement problem. We explore two common approaches to location optimization problems and present methods based on those approaches for solving the min-dist location selection problem. However, those methods either need to maintain an extra index or fall short in efficiency. To address their drawbacks, we propose a novel method (named MND), which has very close performance to the fastest method but does not need an extra index. We then utilize the key idea behind MND to approach the min-dist facility replacement problem, which results in two algorithms names MSND and RID. We provide a detailed comparative cost analysis and conduct extensive experiments on the various algorithms. The results show that MND and RID outperform their competitors by orders of magnitude

    Sex differences in oncogenic mutational processes

    Get PDF
    Sex differences have been observed in multiple facets of cancer epidemiology, treatment and biology, and in most cancers outside the sex organs. Efforts to link these clinical differences to specific molecular features have focused on somatic mutations within the coding regions of the genome. Here we report a pan-cancer analysis of sex differences in whole genomes of 1983 tumours of 28 subtypes as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. We both confirm the results of exome studies, and also uncover previously undescribed sex differences. These include sex-biases in coding and non-coding cancer drivers, mutation prevalence and strikingly, in mutational signatures related to underlying mutational processes. These results underline the pervasiveness of molecular sex differences and strengthen the call for increased consideration of sex in molecular cancer research.Sex differences have been observed in multiple facets of cancer epidemiology, treatment and biology, and in most cancers outside the sex organs. Efforts to link these clinical differences to specific molecular features have focused on somatic mutations within the coding regions of the genome. Here we report a pan-cancer analysis of sex differences in whole genomes of 1983 tumours of 28 subtypes as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. We both confirm the results of exome studies, and also uncover previously undescribed sex differences. These include sex-biases in coding and non-coding cancer drivers, mutation prevalence and strikingly, in mutational signatures related to underlying mutational processes. These results underline the pervasiveness of molecular sex differences and strengthen the call for increased consideration of sex in molecular cancer research.Peer reviewe
    corecore