Efficient selection of globally optimal rules on large imbalanced data based on rule coverage relationship analysis
Copyright © SIAM. Rule-based anomaly and fraud detection systems often suffer from massive numbers of false alerts across a huge number of enterprise transactions. A crucial and challenging problem is to effectively select a globally optimal rule set that can capture very rare anomalies dispersed in large-scale background transactions. Existing rule selection methods suffer significantly from complex rule interactions and overlaps in large imbalanced data, and often lead to a very high false positive rate. In this paper, we analyze the interactions and relationships between rules and their coverage of transactions, and propose a novel metric, Max Coverage Gain. Max Coverage Gain selects the optimal rule set by evaluating each rule's contribution to overall performance, cutting out rules that are locally significant but globally redundant without any negative impact on recall. An effective algorithm, MCGminer, is then designed with a series of built-in mechanisms and pruning strategies to handle complex rule interactions and reduce computational complexity in identifying the globally optimal rule set. Substantial experiments on 13 UCI data sets and a real-time online banking transactional database demonstrate that MCGminer achieves significant improvements in accuracy, scalability, stability and efficiency on large imbalanced data compared with several state-of-the-art rule selection techniques.
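The abstract does not define Max Coverage Gain, so the following is only a minimal sketch of the general idea of dropping globally redundant rules. It assumes a plain greedy selection by marginal anomaly coverage; it is not the MCGminer algorithm. The function name `select_rules_by_coverage_gain`, the rule names and the transaction ids are all hypothetical.

```python
from typing import Dict, List, Set


def select_rules_by_coverage_gain(
    rule_hits: Dict[str, Set[int]], anomalies: Set[int]
) -> List[str]:
    """Greedily pick rules that add the most not-yet-covered anomalies.

    rule_hits maps a (hypothetical) rule name to the transaction ids it fires on.
    Ties on gain are broken in favour of the rule with fewer false alerts.
    """
    covered: Set[int] = set()
    selected: List[str] = []
    remaining = dict(rule_hits)

    def score(name: str):
        hits = remaining[name]
        gain = len((hits & anomalies) - covered)   # newly covered anomalies
        false_alerts = len(hits - anomalies)       # normal transactions flagged
        return (gain, -false_alerts)

    while remaining:
        # sorted() makes the choice deterministic when scores are equal
        best = max(sorted(remaining), key=score)
        if score(best)[0] == 0:
            break  # every remaining rule is globally redundant
        selected.append(best)
        covered |= remaining[best] & anomalies
        del remaining[best]
    return selected


# Toy data (assumed): transactions 1, 2, 3 are the true anomalies.
rules = {
    "r1": {1, 2, 10, 11},
    "r2": {2, 3, 12},
    "r3": {1, 2},
    "r4": {3, 20, 21, 22},
}
anomalies = {1, 2, 3}
chosen = select_rules_by_coverage_gain(rules, anomalies)
print(chosen)
```

In this toy case the selected subset still covers every anomaly that the full rule set covers, while the rules that only add false alerts are left out.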
Robust textual data streams mining based on continuous transfer learning
Copyright © SIAM. In a textual data stream environment, concept drift can occur at any time. Existing approaches that partition streams into chunks can run into problems if a chunk boundary does not coincide with the change point, which is impossible to predict. Since concept drift can occur at any point in the stream, it will certainly occur within chunks; this is called random concept drift. This paper proposes an approach, called the chunk level-based concept drift method (CLCD), that overcomes this chunking problem by continuously monitoring chunk characteristics and revising the classifier through transfer learning in a positive and unlabeled (PU) textual data stream environment. The proposed approach works in three steps. In the first step, we propose core vocabulary-based criteria to justify and identify random concept drift. In the second step, we put forward an extension of LELC (PU learning by extracting likely positive and negative microclusters) [1], called soft-LELC, to extract representative examples from unlabeled data and assign a confidence score to each extracted example. The confidence score represents the degree to which an example belongs to its corresponding class. In the third step, we set up a transfer learning-based SVM to build an accurate classifier for the chunks where concept drift was identified in the first step. Extensive experiments show that CLCD can capture random concept drift and outperforms state-of-the-art methods in positive and unlabeled textual data stream environments.
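The abstract does not spell out the core vocabulary-based criteria, so the sketch below only illustrates one plausible reading of the first step. It assumes that a chunk's core vocabulary is its k most frequent words and that drift is flagged when the Jaccard overlap with a reference vocabulary falls below a threshold. The function names, `k`, `threshold` and the sample texts are assumptions, not part of CLCD.

```python
from collections import Counter
from typing import List, Set


def core_vocabulary(docs: List[str], k: int) -> Set[str]:
    """Return the k most frequent lower-cased words (ties broken alphabetically)."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:k]}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drift_detected(
    reference_docs: List[str], chunk_docs: List[str], k: int = 5, threshold: float = 0.5
) -> bool:
    """Flag drift when the core vocabularies overlap less than the threshold."""
    overlap = jaccard(core_vocabulary(reference_docs, k), core_vocabulary(chunk_docs, k))
    return overlap < threshold


# Toy chunks (assumed data).
reference = ["stock market rally", "market stock prices rise"]
same_topic = ["stock prices rally", "market rise stock"]
new_topic = ["football match score", "team wins football match"]

print(drift_detected(reference, same_topic))
print(drift_detected(reference, new_topic))
```

A check like this can be run on any window of the stream rather than only at chunk boundaries, which is the property the abstract emphasizes.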
Cloning, expression, and functional analysis of human dopamine D1 receptors
Aim: To construct an HEK293 cell line stably expressing human dopamine D1 receptor (D1R). Methods: cDNA was amplified by RT-PCR using total RNA from human embryo brain tissue as the template. The PCR products were subcloned into the plasmid pcDNA3 and cloned into the plasmid pcDNA3.1. The cloned D1R cDNA was sequenced and stably expressed in HEK293 cells. Expression of D1R in HEK293 cells was monitored by the [3H]SCH23390 binding assay. The function of D1R was studied by the cAMP accumulation assay, CRE-SEAP reporter gene activity assay, and intracellular calcium assay. Results: An HEK293 cell line stably expressing human D1R was obtained. A saturation radioligand binding experiment with [3H]SCH23390 demonstrated that the Kd and Bmax values were 1.5±0.2 nmol/L and 2.94±0.15 nmol/g of protein, respectively. In the [3H]SCH23390 competition assay, the D1R agonist SKF38393 displaced [3H]SCH23390 with an IC50 value of 2.0 (1.5–2.8) μmol/L. SKF38393 increased the intracellular cAMP level and CRE-SEAP activity through D1R expressed in HEK293 cells in a concentration-dependent manner, with EC50 values of 0.25 (0.12–0.53) μmol/L for cAMP, and 0.39 (0.27–0.57) μmol/L at 6 h and 0.59 (0.22–1.58) μmol/L at 12 h for CRE-SEAP. SKF38393 also increased the intracellular calcium level in a concentration-dependent manner, with an EC50 value of 27 (8.6–70) nmol/L. Conclusion: An HEK293 cell line stably expressing human D1R was obtained successfully. The study also demonstrated that the CRE-SEAP activity assay could be substituted for the cAMP accumulation assay for measuring increases in cAMP levels. Thus, both intracellular calcium measurements and the CRE-SEAP activity assay are suitable for high-throughput screening in drug research. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/75187/1/j.1745-7254.2005.00017.x.pd
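The abstract does not say which model was fitted to the saturation data. As a worked illustration only, the standard one-site binding isotherm (an assumption here) relates specific binding $B$ to the free radioligand concentration $[L]$ through the two reported parameters:

```latex
B = \frac{B_{\max}\,[L]}{K_d + [L]},
\qquad
[L] = K_d = 1.5~\text{nmol/L}
\;\Rightarrow\;
B = \frac{B_{\max}}{2} = \frac{2.94}{2} = 1.47~\text{nmol/g of protein}
```

Under this assumed model, $K_d$ is the radioligand concentration at which half of the receptor sites are occupied, and $B_{\max}$ is the plateau reached at saturating concentrations.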
Knowledge-preserving incremental social event detection via heterogeneous GNNs
Social events provide valuable insights into group social behaviors and public concerns and therefore have many applications in fields such as product recommendation and crisis management. The complexity and streaming nature of social messages make it appealing to address social event detection in an incremental learning setting, where acquiring, preserving, and extending knowledge are major concerns. Most existing methods, including those based on incremental clustering and community detection, learn limited amounts of knowledge as they ignore the rich semantics and structural information contained in social data. Moreover, they cannot memorize previously acquired knowledge. In this paper, we propose a novel Knowledge-Preserving Incremental Heterogeneous Graph Neural Network (KPGNN) for incremental social event detection. To acquire more knowledge, KPGNN models complex social messages into unified social graphs to facilitate data utilization and explores the expressive power of GNNs for knowledge extraction. To continuously adapt to the incoming data, KPGNN adopts contrastive loss terms that cope with a changing number of event classes. It also leverages the inductive learning ability of GNNs to efficiently detect events and extend its knowledge from previously unseen data. To deal with large social streams, KPGNN adopts a mini-batch subgraph sampling strategy for scalable training, and periodically removes obsolete data to maintain a dynamic embedding space. KPGNN requires no feature engineering and has few hyperparameters to tune. Extensive experimental results demonstrate the superiority of KPGNN over various baselines.
Graph Learning based Recommender Systems: A Review
Recent years have witnessed the fast development of the emerging topic of Graph Learning based Recommender Systems (GLRS). GLRS employ advanced graph learning approaches to model users' preferences and intentions as well as items' characteristics for recommendations. Unlike other RS approaches, including content-based filtering and collaborative filtering, GLRS are built on graphs where the important objects, e.g., users, items, and attributes, are either explicitly or implicitly connected. With the rapid development of graph learning techniques, exploring and exploiting homogeneous or heterogeneous relations in graphs is a promising direction for building more effective RS. In this paper, we provide a systematic review of GLRS by discussing how they extract important knowledge from graph-based representations to improve the accuracy, reliability and explainability of the recommendations. First, we characterize and formalize GLRS, and then summarize and categorize the key challenges and main progress in this novel research area.
Effects of sample handling and storage on quantitative lipid analysis in human serum
There is sparse information about specific storage and handling protocols that minimize analytical error and variability in samples evaluated by targeted metabolomics. Variance components that affect quantitative lipid analysis in a set of human serum samples were determined. The effects of freeze-thaw, extraction state, storage temperature, and freeze-thaw prior to density-based lipoprotein fractionation were quantified. The quantification of high abundance metabolites, representing the biologically relevant lipid species in humans, was highly repeatable (with coefficients of variation as low as 0.01 and 0.02) and largely unaffected by 1–3 freeze-thaw cycles (with 0–8% of metabolites affected in each lipid class). Extraction state had effects on total lipid class amounts, including decreased diacylglycerol and increased phosphatidylethanolamine in thawed compared with frozen samples. The effects of storage temperature over 1 week were minimal, with 0–4% of metabolites affected by storage at 4°C, −20°C, or −80°C in most lipid classes, and 19% of metabolites in diacylglycerol affected by storage at −20°C. Freezing prior to lipoprotein fractionation by density ultracentrifugation decreased HDL free cholesterol by 37% and VLDL free fatty acid by 36%, and increased LDL cholesterol ester by 35% compared with fresh samples. These findings suggest that density-based fractionation should preferably be undertaken in fresh serum samples because up to 37% variability in HDL and LDL cholesterol could result from a single freeze-thaw cycle. Conversely, quantitative lipid analysis within unfractionated serum is minimally affected even with repeated freeze-thaw cycles
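For readers unfamiliar with the repeatability figure quoted above, the coefficient of variation is the standard deviation of replicate measurements divided by their mean. The following is a minimal sketch with made-up replicate values; the numbers are not data from the study, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(values) / mean(values)


# Hypothetical replicate measurements of one lipid, in arbitrary units.
replicates = [98.0, 100.0, 102.0]
cv = coefficient_of_variation(replicates)
print(round(cv, 4))
```

A value of 0.02 here means the replicates scatter by about 2% of their mean, which is the scale of the lowest coefficients of variation the abstract reports for high abundance metabolites.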
Design and testing of hydrophobic core/hydrophilic shell nano/micro particles for drug-eluting stent coating
In this study, we designed a novel drug-eluting coating for vascular implants consisting of a core coating of the anti-proliferative drug docetaxel (DTX) and a shell coating of the platelet glycoprotein IIb/IIIa receptor monoclonal antibody SZ-21. The core/shell structure was sprayed onto the surface of 316L stainless steel stents using a coaxial electrospray process with the aim of creating a coating that exhibited a differential release of the two drugs. The prepared stents displayed a uniform coating consisting of nano/micro particles. In vitro drug release experiments were performed, and we demonstrated that a biphasic mathematical model was capable of capturing the data, indicating that the release of the two drugs conformed to a diffusion-controlled release system. We demonstrated that our coating was capable of inhibiting the adhesion and activation of platelets, as well as the proliferation and migration of smooth muscle cells (SMCs), indicating its good biocompatibility and anti-proliferation qualities. In an in vivo porcine coronary artery model, the SZ-21/DTX drug-loaded hydrophobic core/hydrophilic shell particle coating stents were observed to promote re-endothelialization and inhibit neointimal hyperplasia. This core/shell particle-coated stent may serve as part of a new strategy for the differential release of different functional drugs to sequentially target thrombosis and in-stent restenosis during the vascular repair process and ensure rapid re-endothelialization in the field of cardiovascular disease
Plant-RRBS, a bisulfite and next-generation sequencing-based methylome profiling method enriching for coverage of cytosine positions
Background: Cytosine methylation in plant genomes is important for the regulation of gene transcription and transposon activity. Genome-wide methylomes are studied upon mutation of the DNA methyltransferases, adaptation to environmental stresses or during development. However, from basic biology to breeding programs, there is a need to monitor multiple samples to determine transgenerational methylation inheritance or differential cytosine methylation. Methylome data obtained by sodium hydrogen sulfite (bisulfite)-conversion and next-generation sequencing (NGS) provide genome-wide information on cytosine methylation. However, a profiling method that detects cytosine methylation state dispersed over the genome would allow high-throughput analysis of multiple plant samples with distinct epigenetic signatures. We use specific restriction endonucleases to enrich for cytosine coverage in a bisulfite and NGS-based profiling method, which was compared to whole-genome bisulfite sequencing of the same plant material.
Methods: We established an effective methylome profiling method in plants, termed plant-reduced representation bisulfite sequencing (plant-RRBS), using optimized double restriction endonuclease digestion, fragment end repair, adapter ligation, followed by bisulfite conversion, PCR amplification and NGS. We report a performant laboratory protocol and a straightforward bioinformatics data analysis pipeline for plant-RRBS, applicable for any reference-sequenced plant species.
Results: As a proof of concept, methylome profiling was performed using an Oryza sativa ssp. indica pure breeding line and a derived epigenetically altered line (epiline). Plant-RRBS detects methylation levels at tens of millions of cytosine positions deduced from bisulfite conversion in multiple samples. To evaluate the method, the coverage of cytosine positions, the intra-line similarity and the differential cytosine methylation levels between the pure breeding line and the epiline were determined. Plant-RRBS reproducibly covers commonly up to one fourth of the cytosine positions in the rice genome when using MspI-DpnII within a group of five biological replicates of a line. The method predominantly detects cytosine methylation in putative promoter regions and not-annotated regions in rice.
Conclusions: Plant-RRBS offers high-throughput and broad, genome-dispersed methylation detection by effective read number generation obtained from reproducibly covered genome fractions using optimized endonuclease combinations, facilitating comparative analyses of multi-sample studies for cytosine methylation and transgenerational stability in experimental material and plant breeding populations.