
    Exploring Communities in Large Profiled Graphs

    Given a graph G and a vertex q ∈ G, the community search (CS) problem aims to efficiently find a subgraph of G whose vertices are closely related to q. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph, that is, a graph in which each vertex has labels arranged in a hierarchical manner. Extensive experiments show that PCS can identify communities with themes that are common to their vertices, and is more effective than existing CS approaches. As a naive solution for PCS is highly expensive, we have also developed a tree index, which facilitates efficient and online solutions for PCS.

    Model-free screening procedure for ultrahigh-dimensional survival data based on Hilbert-Schmidt independence criterion

    Selecting the active variables that have a significant impact on the event of interest is an important problem in the statistical analysis of ultrahigh-dimensional data. The sure independence screening procedure has been demonstrated to be an effective method for reducing the dimensionality of data from a large scale to a relatively moderate scale. For censored survival data, the existing screening methods mainly adopt the Kaplan-Meier estimator to handle censoring, which may not perform well in scenarios with a heavy censoring rate. In this article, we propose a model-free screening procedure based on the Hilbert-Schmidt independence criterion (HSIC). The proposed method avoids the complication of specifying an actual model from a large number of covariates. Compared with existing screening procedures, this new approach has several advantages. First, it does not involve the Kaplan-Meier estimator, so its performance is much more robust in cases with a heavy censoring rate. Second, the empirical estimate of HSIC is very simple, as it depends only on the trace of a product of Gram matrices. Third, the proposed procedure does not require any complicated numerical optimization, so the corresponding calculation is simple and fast. Finally, the proposed procedure, which employs the kernel method, is substantially more resistant to outliers. Extensive simulation studies demonstrate that the proposed method performs favorably compared with the existing methods. As an illustration, we apply the proposed method to analyze the diffuse large-B-cell lymphoma (DLBCL) data and the ovarian cancer data.
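The abstract above notes that the empirical HSIC depends only on the trace of a product of Gram matrices. A minimal sketch of that computation follows, assuming the commonly used biased estimator trace(K H L H) / (n - 1)^2 with H = I - (1/n) 11^T, a Gaussian kernel, and scalar uncensored samples. The function names, the kernel choice, and the bandwidth are illustrative assumptions, and the sketch does not reproduce the paper's handling of censored survival times.

```python
import math


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix with entries exp(-(x_i - x_j)^2 / (2 * bandwidth^2)).

    The Gaussian kernel and the bandwidth are assumptions for this sketch.
    """
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs]
            for a in xs]


def center(gram):
    """Return H @ gram @ H for H = I - (1/n) * ones, without building H."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)]
            for i in range(n)]


def empirical_hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    k_centered = center(gaussian_gram(xs, bandwidth))  # H K H
    l_gram = gaussian_gram(ys, bandwidth)              # L
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2
```

In a screening setting, one would presumably compute such a statistic between each covariate and the response and keep the covariates with the largest values; that ranking step is an assumption here, not a detail stated in the abstract.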

    An Augmented Index-based Efficient Community Search for Large Directed Graphs

    Given a graph G and a query vertex q, the topic of community search (CS), which aims to retrieve a dense subgraph of G containing q, has gained much attention. Most existing works focus on undirected graphs, overlooking the rich information carried by the edge directions. Recently, the problem of community search over directed graphs (or CSD problem) has been studied; it finds a connected subgraph containing q, where the in-degree and out-degree of each vertex within the subgraph are at least k and l, respectively. However, existing solutions are inefficient, especially on large graphs. To tackle this issue, in this paper, we propose a novel index called D-Forest, which allows a CSD query to be completed within the optimal time cost. We further propose efficient index construction methods. Extensive experiments on six real large graphs show that our index-based query algorithm is up to two orders of magnitude faster than existing solutions. Comment: Full version of our IJCAI20 paper
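The CSD definition in the abstract above (a connected subgraph containing q in which every vertex has in-degree at least k and out-degree at least l) can be illustrated with a naive index-free baseline. The sketch below is not D-Forest. It repeatedly peels vertices that violate the degree constraints and then returns the component containing q. The function name `csd_query`, the edge-list input, and the use of weak connectivity are assumptions made for illustration.

```python
from collections import deque


def csd_query(edges, q, k, l):
    """Naive CSD baseline: peel to the (k, l)-core, then take q's component.

    edges: iterable of directed (u, v) pairs. Returns a set of vertices,
    empty if q does not survive the peeling. Connectivity is taken to be
    weak connectivity, which is an assumption of this sketch.
    """
    out_nbrs, in_nbrs = {}, {}
    for u, v in edges:
        for w in (u, v):
            out_nbrs.setdefault(w, set())
            in_nbrs.setdefault(w, set())
        if u != v:  # self-loops are ignored
            out_nbrs[u].add(v)
            in_nbrs[v].add(u)

    in_deg = {v: len(in_nbrs[v]) for v in in_nbrs}
    out_deg = {v: len(out_nbrs[v]) for v in out_nbrs}

    # Schedule every vertex that already violates a degree constraint.
    queue = deque(v for v in in_deg if in_deg[v] < k or out_deg[v] < l)
    removed = set(queue)
    while queue:
        v = queue.popleft()
        for w in out_nbrs[v]:      # v -> w disappears: w loses an in-edge
            if w not in removed:
                in_deg[w] -= 1
                if in_deg[w] < k:
                    removed.add(w)
                    queue.append(w)
        for w in in_nbrs[v]:       # w -> v disappears: w loses an out-edge
            if w not in removed:
                out_deg[w] -= 1
                if out_deg[w] < l:
                    removed.add(w)
                    queue.append(w)

    if q not in in_deg or q in removed:
        return set()

    # Breadth-first search over surviving vertices, ignoring edge direction.
    community, frontier = {q}, deque([q])
    while frontier:
        v = frontier.popleft()
        for w in out_nbrs[v] | in_nbrs[v]:
            if w not in removed and w not in community:
                community.add(w)
                frontier.append(w)
    return community
```

Each query of this baseline scans the whole graph, which is the kind of cost an index such as the one described in the abstract is meant to avoid.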

    Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models

    Prompting, which casts downstream applications as language modeling tasks, has been shown to be sample efficient compared to standard fine-tuning with pre-trained models. However, one pitfall of prompting is the need for manually designed patterns, whose outcome can be unintuitive and which require large validation sets to tune. To tackle the challenge, we propose AutoSeq, a fully automatic prompting method: (1) We adopt natural language prompts on sequence-to-sequence models, enabling free-form generation and a larger label search space; (2) We propose label sequences -- phrases with indefinite lengths to verbalize the labels -- which eliminate the need for manual templates and are more expressive than single label words; (3) We use beam search to automatically generate a large number of label sequence candidates and propose contrastive re-ranking to get the best combinations. AutoSeq significantly outperforms other no-manual-design methods, such as soft prompt tuning, adapter tuning, and automatic search on single label words; the generated label sequences are even better than curated manual ones on a variety of tasks. Our method reveals the potential of sequence-to-sequence models in few-shot learning and sheds light on a path to generic and automatic prompting. The source code of this paper can be obtained from https://github.com/thunlp/Seq2Seq-Prompt. Comment: Accepted to COLING 202

    Impact of Enteromorpha Blooms on Aquaculture Research Off Qianliyan Island, Yellow Sea, China

    Between 2008 and 2016, there were mass summer blooms of Enteromorpha in the Yellow Sea, China. They covered an area of thousands of square kilometers annually and lasted an average of 90 days. Remote sensing data, model predictions, and marine ecological data measured by ships before, during, and after the Enteromorpha blooms were used in this study of the Qianliyan Island area. Underwater robots surveyed trepang, wrinkled abalone, and the ecological status of the seabed. We found that the time taken by Enteromorpha to cover the Qianliyan Island area was a relevant factor, as were changes in sea surface temperature (SST). The blooms raised the inorganic nitrogen, reactive phosphate, and heavy metal content of the upper, middle, and bottom layers of the seawater; dissolved oxygen (DO) and pH were reduced; and the dominant animal and plant populations changed. Enteromorpha sedimentation during outbreaks was measured by benthos sampling. Underwater robot observation showed considerable growth in starfish numbers. All of this directly influenced the regional ecological environment. Numbers of trepang and wrinkled abalone declined over the years. Global warming and SST anomalies are the two main reasons for the frequent marine disasters. The national aquatic germplasm resources of Qianliyan should be protected from the blooms.

    Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

    Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length by compressing multiple hidden vectors into one, and are trained with the original PLMs frozen. Different from traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% of computational costs using only 0.9% additional parameters with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs. Comment: Accepted by Findings of EMNL

    Characteristics of multicystic biliary hamartoma: A case report

    Introduction: Multicystic biliary hamartoma (MCBH) is a very rare benign hepatic neoplasm that manifests as a localized cystic-solid mass. Only 17 cases have been described in the literature to date. MCBH diagnosis is currently dependent on imaging and pathology following surgical resection, and no precise standards are in place. Case Presentation: This case study involves a middle-aged male patient with a history of drinking but no other liver diseases. A routine ultrasound examination showed a 6.0 × 5.5 cm inhomogeneous echo mass in the right lobe of the liver. The patient experienced no discomfort or other symptoms, and blood tests were normal. Imaging revealed a localized cystic-solid neoplasm in segment 6 of the liver that did not have the features of a malignant tumor. Surgical resection was performed. Based on imaging, macroscopic examination, and histological results, a final diagnosis of MCBH was made. Conclusion: The imaging and pathological features of MCBH were summarized based on the case reports published to date. Because imaging is a non-invasive examination, the imaging features will aid in the diagnosis of MCBH. Furthermore, these features, along with tumor size and patient symptoms, will help clinicians choose between surgical resection and follow-up for individual patients.

    Emergent Modularity in Pre-trained Transformers

    This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes. (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, such that each module works for its corresponding function. Given the enormous number of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, in which the neurons specialized in a certain function are clustered. Moreover, perturbing the activations of functional experts significantly affects the corresponding function. Finally, we study how modularity emerges during pre-training, and find that the modular structure is stabilized at the early stage, which is faster than neuron stabilization. This suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis. Comment: Findings of ACL 202