Exploring Communities in Large Profiled Graphs

Author: Chen Xiaojun
Chen Yankai
Cheng Reynold
Fang Yixiang
Li Yun
Zhang Jie
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Given a graph G and a vertex q ∈ G, the community search (CS) problem aims to efficiently find a subgraph of G whose vertices are closely related to q. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph. This is a graph in which each vertex has labels arranged in a hierarchical manner. Extensive experiments show that PCS can identify communities with themes that are common to their vertices, and is more effective than existing CS approaches. As a naive solution for PCS is highly expensive, we have also developed a tree index, which facilitates efficient and online solutions for PCS.

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Model-free screening procedure for ultrahigh-dimensional survival data based on Hilbert-Schmidt independence criterion

Author: Li Xuerui
Liu Yanyan
Peng Yankai
Zhang Jing
Publication date: 26/03/2023

Selecting the active variables that have a significant impact on the event of interest is an important problem in the statistical analysis of ultrahigh-dimensional data. The sure independence screening procedure has been demonstrated to be an effective method for reducing the dimensionality of data from a large scale to a relatively moderate scale. For censored survival data, existing screening methods mainly adopt the Kaplan-Meier estimator to handle censoring, which may not perform well in scenarios with a heavy censoring rate. In this article, we propose a model-free screening procedure based on the Hilbert-Schmidt independence criterion (HSIC). The proposed method avoids the complication of specifying an actual model from a large number of covariates. Compared with existing screening procedures, this new approach has several advantages. First, it does not involve the Kaplan-Meier estimator, so its performance is much more robust in cases with a heavy censoring rate. Second, the empirical estimate of HSIC is very simple, as it depends only on the trace of a product of Gram matrices. In addition, the proposed procedure does not require any complicated numerical optimization, so the corresponding calculation is simple and fast. Finally, the proposed procedure, which employs the kernel method, is substantially more resistant to outliers. Extensive simulation studies demonstrate that the proposed method performs favorably compared with existing methods. As an illustration, we apply the proposed method to analyze the diffuse large-B-cell lymphoma (DLBCL) data and the ovarian cancer data.
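
The abstract notes that the empirical HSIC estimate depends only on the trace of a product of Gram matrices. A minimal sketch of that generic estimate follows, and it is not the paper's screening procedure for censored data. It assumes scalar samples, a Gaussian kernel with a hypothetical `bandwidth` parameter, and the common normalization tr(KHLH) / (n - 1)^2 with centering matrix H = I - (1/n)11^T. The function names are illustrative.

```python
import math


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix with entries exp(-(a - b)^2 / (2 * bandwidth^2)) for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs] for a in xs]


def double_center(gram):
    """Return H G H, where H = I - (1/n) * ones, without forming H explicitly."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2 (normalization is an assumption)."""
    n = len(xs)
    k = gaussian_gram(xs, bandwidth)
    hlh = double_center(gaussian_gram(ys, bandwidth))
    # trace(K @ HLH) = sum over i, j of K[i][j] * HLH[j][i]
    trace = sum(k[i][j] * hlh[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2
```

In a screening setting, one would compute such a score between each covariate and the response and rank covariates by it. How the paper adapts the response side to censoring is not stated in the abstract and is not modeled here.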

arXiv.org e-Print Archive

An Augmented Index-based Efficient Community Search for Large Directed Graphs

Author: Cao Xin
Chen Yankai
Fang Yixiang
King Irwin
Zhang Jie
Publication date: 16/11/2023

Given a graph G and a query vertex q, the topic of community search (CS), which aims to retrieve a dense subgraph of G containing q, has gained much attention. Most existing works focus on undirected graphs, which overlooks the rich information carried by the edge directions. Recently, the problem of community search over directed graphs (the CSD problem) has been studied; it finds a connected subgraph containing q, where the in-degree and out-degree of each vertex within the subgraph are at least k and l, respectively. However, existing solutions are inefficient, especially on large graphs. To tackle this issue, we propose a novel index called D-Forest, which allows a CSD query to be completed within the optimal time cost. We further propose efficient index construction methods. Extensive experiments on six real large graphs show that our index-based query algorithm is up to two orders of magnitude faster than existing solutions.
Comment: Full version of our IJCAI20 paper
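
The CSD definition above (every vertex has in-degree at least k and out-degree at least l inside a connected subgraph containing q) can be illustrated with a simple index-free peeling baseline. This sketch is not the D-Forest index from the paper. It assumes the graph is given as a list of directed edge pairs and that "connected" means weakly connected, and the function name `csd_query` is hypothetical.

```python
from collections import deque


def csd_query(edges, q, k, l):
    """Return the vertices of the weakly connected (k, l)-core component containing q.

    Returns an empty set if q does not survive the peeling.
    """
    out_nbrs, in_nbrs = {}, {}
    for u, v in edges:
        out_nbrs.setdefault(u, set()).add(v)
        out_nbrs.setdefault(v, set())
        in_nbrs.setdefault(v, set()).add(u)
        in_nbrs.setdefault(u, set())

    alive = set(out_nbrs)
    in_deg = {v: len(in_nbrs[v]) for v in alive}
    out_deg = {v: len(out_nbrs[v]) for v in alive}

    # Peel vertices that violate either degree constraint until none remain.
    queue = deque(v for v in alive if in_deg[v] < k or out_deg[v] < l)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue
        alive.remove(v)
        for w in out_nbrs[v]:
            if w in alive:
                in_deg[w] -= 1
                if in_deg[w] < k:
                    queue.append(w)
        for w in in_nbrs[v]:
            if w in alive:
                out_deg[w] -= 1
                if out_deg[w] < l:
                    queue.append(w)

    if q not in alive:
        return set()

    # Breadth-first search from q over surviving vertices, ignoring edge direction.
    community = {q}
    frontier = deque([q])
    while frontier:
        v = frontier.popleft()
        for w in out_nbrs[v] | in_nbrs[v]:
            if w in alive and w not in community:
                community.add(w)
                frontier.append(w)
    return community
```

Each query here scans the whole graph, which is the kind of cost an index such as the one proposed in the paper is meant to avoid.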

arXiv.org e-Print Archive

Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models

Author: Gao Tianyu
Lin Yankai
Liu Zhiyuan
Sun Maosong
Yu Zichun
Zhang Zhengyan
Zhou Jie
Publication date: 19/09/2022

Prompting, which casts downstream applications as language modeling tasks, has been shown to be sample-efficient compared to standard fine-tuning with pre-trained models. However, one pitfall of prompting is the need for manually designed patterns, whose outcome can be unintuitive and which require large validation sets to tune. To tackle this challenge, we propose AutoSeq, a fully automatic prompting method: (1) We adopt natural language prompts on sequence-to-sequence models, enabling free-form generation and a larger label search space; (2) We propose label sequences (phrases of indefinite length that verbalize the labels), which eliminate the need for manual templates and are more expressive than single label words; (3) We use beam search to automatically generate a large number of label sequence candidates and propose contrastive re-ranking to get the best combinations. AutoSeq significantly outperforms other no-manual-design methods, such as soft prompt tuning, adapter tuning, and automatic search on single label words; the generated label sequences are even better than curated manual ones on a variety of tasks. Our method reveals the potential of sequence-to-sequence models in few-shot learning and sheds light on a path to generic and automatic prompting. The source code of this paper can be obtained from https://github.com/thunlp/Seq2Seq-Prompt.
Comment: Accepted to COLING 202

arXiv.org e-Print Archive

Impact of Enteromorpha Blooms on Aquaculture Research Off Qianliyan Island, Yellow Sea, China

Author: Chawei Hou
Diansheng Ji
Hongyang Yu
Jie Guo
Ling Ji
Tianlong Zhang
Yankai Mu
Publication venue: 'IntechOpen'
Publication date: 20/12/2017

Between 2008 and 2016, there were mass summer blooms of Enteromorpha in the Yellow Sea, China. They covered an area of thousands of square kilometers annually, lasting an average of 90 days. Remote sensing data, model predictions, and marine environment ecological data measured by ships before, during, and after the Enteromorpha blooms were used in this study of the Qianliyan Island area. Underwater robots surveyed trepang, wrinkled abalone, and the submarine ecological status. We found that the time taken by Enteromorpha to cover the Qianliyan Island area was relevant, as were changes in sea surface temperature (SST). Enteromorpha caused a rise in inorganic nitrogen, reactive phosphate, and heavy metal content in the upper, middle, and bottom layers of sea water; dissolved oxygen (DO) and pH were reduced; and there were changes in the dominant animal and plant populations. Enteromorpha sedimentation during outbreaks was measured by benthos sampling. Considerable growth in starfish numbers was observed by underwater robot. All of this directly influenced the regional ecological environment. Numbers of trepang and wrinkled abalone declined over the years. Global warming and SST anomalies are the two main reasons for the frequent marine disasters. The national aquatic germplasm resources of Qianliyan should be protected from the blooms.

IntechOpen

Crossref

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

Author: Han Xu
Lin Yankai
Liu Zhiyuan
Luo Yuqi
Sun Maosong
Xiao Chaojun
Xie Ruobing
Zhang Pengle
Zhang Wenbin
Zhang Zhengyan
Zhou Jie
Publication date: 24/10/2023

Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks, but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length by compressing multiple hidden vectors into one, and are trained with the original PLMs frozen. Unlike traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly reducing storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% of computational costs using only 0.9% additional parameters, with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs.
Comment: Accepted by Findings of EMNL

arXiv.org e-Print Archive

An acetyl-L-carnitine switch on mitochondrial dysfunction and rescue in the metabolomics study on aluminum oxide nanoparticles

Author: Chengcheng Zhang
Hongbao Yang
Qingtao Meng
Rui Chen
Shenshen Wu
Shizhi Wang
Xiaobo Li
Xin Zhang
Yankai Xia
Publication venue: Springer Nature

Springer - Publisher Connector

Characteristics of multicystic biliary hamartoma: A case report

Author: Guiqiu Liu
Jia Lian
Jun Li
Lixia Sun
Weijuan Hu
Yankai Yang
Ye Zhang
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2023

Introduction: Multicystic biliary hamartoma (MCBH) is a very rare hepatic benign neoplasm that manifests as a localized cystic-solid mass. Only 17 cases have been described in the literature to date. MCBH diagnosis is currently dependent on imaging and pathology following surgical resection and no precise standards are in place.
Case Presentation: This case study involves a middle-aged male patient with a history of drinking but no other liver diseases. A routine ultrasound examination showed a 6.0 × 5.5 cm inhomogeneous echo mass in the right lobe of the liver. The patient experienced no discomfort or other symptoms, and blood tests were normal. Imaging revealed a localized cystic-solid neoplasm in segment 6 of the liver that did not have the features of a malignant tumor. Surgical resection was performed. Based on imaging, macroscopic examination, and histological results, a final diagnosis of MCBH was made.
Conclusion: The imaging and pathological features of MCBH were summarized based on the published case reports to date. As a non-invasive examination, the imaging features will aid in the diagnosis of MCBH. Furthermore, these features, along with tumor size and patient symptoms, will facilitate clinicians in selecting surgical resection or follow-up for individual patients.

Directory of Open Access Journals

Emergent Modularity in Pre-trained Transformers

Author: Han Xu
Lin Yankai
Liu Zhiyuan
Sun Maosong
Wang Xiaozhi
Xiao Chaojun
Xie Ruobing
Zeng Zhiyuan
Zhang Zhengyan
Zhou Jie
Publication date: 28/05/2023

This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes. (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, such that each module works for its corresponding function. Given the enormous number of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, in which the neurons specialized in a certain function are clustered. Moreover, perturbing the activations of functional experts significantly affects the corresponding function. Finally, we study how modularity emerges during pre-training, and find that the modular structure is stabilized at an early stage, earlier than neuron stabilization. This suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis.
Comment: Findings of ACL 202

arXiv.org e-Print Archive