SCNrank: spectral clustering for network-based ranking to reveal potential drug targets and its application in pancreatic ductal adenocarcinoma
Background: Pancreatic ductal adenocarcinoma (PDAC) is the most common pancreatic malignancy. Due to its wide heterogeneity, PDAC acts aggressively and responds poorly to most chemotherapies, creating an urgent need for new therapeutic strategies. Cell lines have been used as the foundation for drug development and disease modeling. CRISPR-Cas9 plays a key role in every step of drug discovery, from target identification and validation to preclinical cancer cell testing. Using cell-line models and CRISPR-Cas9 technology together makes drug target prediction feasible. However, there is still a large gap between predicted results and actionable targets in real tumors. Biological network models provide a powerful means of mimicking genetic interactions in real biological systems, which can benefit gene perturbation studies and potential target identification for treating PDAC. Nevertheless, building a network model that takes cell-line data and CRISPR-Cas9 data as input and accurately predicts potential targets that will respond well in real tissue remains an unsolved problem.
Methods: We developed a novel algorithm, 'Spectral Clustering for Network-based target Ranking' (SCNrank), that systematically integrates three types of data to prioritize potential drug targets for PDAC: expression profiles from tumor tissue, normal tissue and PDAC cell lines; a protein-protein interaction (PPI) network; and CRISPR-Cas9 data. The algorithm can be divided into three steps: 1. Using the STRING PPI network as a skeleton, SCNrank constructs tissue-specific networks for PDAC tumor and normal pancreas tissues from expression profiles; 2. With the same network skeleton, SCNrank constructs cell-line-specific networks using the PDAC cell-line expression profiles and CRISPR-Cas9 data from pancreatic cancer cell lines; 3. SCNrank applies a novel spectral clustering approach to reduce data dimension and generate gene clusters that carry common features from both networks. Finally, SCNrank applies a scoring scheme called the 'Target Influence score' (TI), which estimates a given target's influence on the cluster it belongs to, to score and rank each drug target.
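The clustering idea in step 3 can be illustrated with a generic two-way spectral partition. This is only a minimal sketch, not the SCNrank algorithm itself: the abstract does not specify how the networks are weighted or combined, so the toy graph, the edge weights and the function name `spectral_bipartition` are all assumptions made for illustration, and the TI score is not implemented here.

```python
import numpy as np

def spectral_bipartition(adjacency):
    """Split a weighted undirected graph into two clusters.

    Generic textbook technique (sign of the Fiedler vector of the
    normalized Laplacian); NOT the specific procedure used by SCNrank.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    # D^{-1/2}, leaving isolated nodes (degree 0) at zero
    inv_sqrt = np.zeros_like(degree)
    positive = degree > 0
    inv_sqrt[positive] = 1.0 / np.sqrt(degree[positive])
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Hypothetical toy network: two tightly connected gene triangles
# (nodes 0-2 and 3-5) joined by one weak edge between nodes 2 and 3.
weights = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    weights[i, j] = weights[j, i] = w

labels = spectral_bipartition(weights)
print(labels)  # the two triangles receive different labels
```

In a setting like the one described above, the edge weights would presumably be derived from expression and CRISPR-Cas9 data on the PPI skeleton, and more than two clusters would be extracted by using several eigenvectors rather than one.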
Results: We applied SCNrank to analyze 263 expression profiles, CRISPR-Cas9 data from 22 different pancreatic cancer cell lines and the STRING protein-protein interaction (PPI) network. With SCNrank, we successfully constructed an integrated tissue PDAC network and an integrated cell-line PDAC network, both of which contain 4414 selected genes that are overexpressed in tumor tissue samples. After clustering, the 4414 genes are distributed into 198 clusters, which include 367 targets of FDA-approved drugs. These drug targets are all scored and ranked by their TI scores, which we defined to measure their influence on the network. We validated top-ranked targets in three ways: first, we mapped them onto the existing clinical drug targets of PDAC to measure concordance; second, we performed enrichment analysis on these drug targets and the clusters they belong to, to reveal functional associations between clusters and PDAC; third, we performed survival analysis for the top-ranked targets to connect targets with clinical outcomes. Survival analysis reveals that overexpression of three top-ranked genes, PGK1, HMMR and POLE2, significantly increases the risk of death in PDAC patients. SCNrank is an unbiased algorithm that systematically integrates multiple types of omics data to select and rank potential drug targets. SCNrank shows great capability in predicting drug targets for PDAC. Pancreatic cancer-associated gene candidates predicted by our SCNrank approach have the potential to guide genetics-based anti-pancreatic drug discovery.
Inductive Graph Neural Networks for Spatiotemporal Kriging
Time series forecasting and spatiotemporal kriging are the two most important
tasks in spatiotemporal data analysis. Recent research on graph neural networks
has made substantial progress in time series forecasting, while little
attention has been paid to the kriging problem -- recovering signals for
unsampled locations/sensors. Most existing scalable kriging methods (e.g.,
matrix/tensor completion) are transductive, and thus full retraining is
required when we have a new sensor to interpolate. In this paper, we develop an
Inductive Graph Neural Network Kriging (IGNNK) model to recover data for
unsampled sensors on a network/graph structure. To generalize the effect of
distance and reachability, we generate random subgraphs as samples and
reconstruct the corresponding adjacency matrix for each sample. By
reconstructing all signals on each sample subgraph, IGNNK can effectively learn
the spatial message passing mechanism. Empirical results on several real-world
spatiotemporal datasets demonstrate the effectiveness of our model. In
addition, we also find that the learned model can be successfully transferred
to the same type of kriging tasks on an unseen dataset. Our results show that:
1) GNN is an efficient and effective tool for spatial kriging; 2) inductive
GNNs can be trained using dynamic adjacency matrices; 3) a trained model can be
transferred to new graph structures and 4) IGNNK can be used to generate
virtual sensors. Comment: AAAI 202
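The training-sample construction described in this abstract (random subgraphs, a matching adjacency matrix per sample, and reconstruction of all signals) might be sketched as follows. This is a hedged illustration only: the function name `sample_training_subgraph`, the zero-masking of some sensors to imitate unsampled locations, and the random toy data are assumptions, and no GNN is trained here.

```python
import numpy as np

def sample_training_subgraph(signals, adjacency, n_nodes, n_masked, rng):
    """Draw one training sample from a sensor network.

    signals:   array of shape (time_steps, sensors)
    adjacency: array of shape (sensors, sensors)
    Returns the masked input, the full reconstruction target, the
    sub-adjacency matrix for the sampled sensors, and the mask.
    """
    # Pick a random subset of sensors to form the subgraph
    chosen = rng.choice(signals.shape[1], size=n_nodes, replace=False)
    # Rebuild the adjacency matrix restricted to the sampled sensors
    sub_adj = adjacency[np.ix_(chosen, chosen)]
    target = signals[:, chosen]
    # Assumption: hide a few sensors by zeroing their signals so that a
    # model would have to recover them from their neighbours
    mask = np.ones(n_nodes)
    mask[rng.choice(n_nodes, size=n_masked, replace=False)] = 0.0
    masked_input = target * mask
    return masked_input, target, sub_adj, mask

# Hypothetical toy data: 10 time steps, 6 sensors, symmetric random weights
rng = np.random.default_rng(0)
signals = rng.random((10, 6))
adjacency = rng.random((6, 6))
adjacency = (adjacency + adjacency.T) / 2

masked_input, target, sub_adj, mask = sample_training_subgraph(
    signals, adjacency, n_nodes=4, n_masked=1, rng=rng
)
print(masked_input.shape, sub_adj.shape, mask)
```

Because each sample carries its own adjacency matrix, a model trained on such samples is not tied to one fixed graph, which is consistent with the inductive and transfer claims made above.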
Efficient Public Key Searchable Encryption Schemes from Standard Hard Lattice Problems for Cloud Computing
Cloud storage and computing offer significant convenience and management efficiency in the information era. Privacy protection is a major challenge in cloud computing. Public key encryption with keyword search (PEKS) is an ingenious tool for ensuring both privacy and functionality in certain scenarios, such as private data retrieval in the cloud. Despite the attention they have received, PEKS schemes still face several challenges in practical applications, such as low computational efficiency, high end-to-end delay, vulnerability to inside keyword guessing attacks (IKGA), and key management defects in multi-user environments.
In this work, we introduce three Ring-LWE/ISIS based PEKS schemes: (1) Our basic PEKS scheme achieves a high level of security in the standard model. (2) Our PAEKS scheme uses the sender's private key to generate an authentication when encrypting, which allows it to resist IKGA. (3) Our IB-PAEKS scheme not only resists IKGA but also significantly reduces the complexity of key management in practical applications. Experimental results indicate that the first scheme provides lower end-to-end delay and higher computational efficiency than similar schemes, and that our last two schemes provide stronger security properties with little additional overhead.
The Genomes of Oryza sativa: A History of Duplications
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Synthesis of one-dimensional GaN nanorods on Si(111) substrates by magnetron sputtering
GaN nanorods have been successfully synthesized on Si(111) substrates by
magnetron sputtering through ammoniating the Ga2O3/ZnO films at
950 °C in a quartz tube. The GaN nanorods have been characterized by
X-ray diffraction (XRD), scanning electron microscopy (SEM), field-emission
transmission electron microscopy (FETEM), Fourier transform infrared (FTIR)
spectroscopy and fluorescence spectrophotometry. The results show that the
nanorods have a pure hexagonal wurtzite GaN structure, with lengths of
several micrometers and diameters ranging from 100 nm to 750 nm, and that the
growth direction of the GaN nanorods is perpendicular to the (101) plane. The
photoluminescence (PL) spectrum indicates good emission properties
for the nanorods. Finally, the growth mechanism is briefly discussed.