45 research outputs found

    DualTable: A Hybrid Storage Model for Update Optimization in Hive

    Hive is the most mature and prevalent data warehouse tool providing a SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems, such as Smart Grid applications, usually require complicated business logic and involve many data manipulation operations like updates and deletes. Hive cannot offer sufficient support for these while preserving high query performance. Hive using the Hadoop Distributed File System (HDFS) for storage cannot implement data manipulation efficiently, and Hive on HBase suffers from poor query performance even though it can support faster data manipulation. There is a project based on Hive issue Hive-5317 to support update operations, but it has not been finished in Hive's latest version. Since this ACID-compliant extension adopts the same data storage format on HDFS, the update performance problem is not solved. In this paper, we propose a hybrid storage model called DualTable, which combines the efficient streaming reads of HDFS and the random write capability of HBase. Hive on DualTable provides better data manipulation support and preserves query performance at the same time. Experiments on a TPC-H data set and on a real smart grid data set show that Hive on DualTable is up to 10 times faster than Hive when executing update and delete operations. Comment: accepted by industry session of ICDE201

    Regular Patterns for Proteome-Wide Distribution of Protein Abundance across Species

    A proteome of a bio-entity, such as a cell, tissue, organ, or organism, consists of proteins of diverse abundance. The principle that determines the abundance of different proteins in a proteome is of fundamental significance for an understanding of the building blocks of the bio-entity. Here, we report three regular patterns in the proteome-wide distribution of protein abundance across species such as human, mouse, fly, worm, yeast, and bacteria: in most cases, protein abundance is positively correlated with the protein's origination time or sequence conservation during evolution; it is negatively correlated with the protein's domain number and positively correlated with domain coverage in protein structure, and the correlations became stronger during the course of evolution; protein abundance can be further stratified by the function of the protein, whereby proteins that act on material conversion and transportation (mass category) are more abundant than those that act on information modulation (information category). Thus, protein abundance is intrinsically related to the protein's inherent characters of evolution, structure, and function.

    Stochastic dynamics of aircraft ground taxiing via improved physics-informed neural networks

    In this paper, the stochastic propagation of aircraft taxiing under excitation from an uneven runway is investigated based on physics-informed neural networks (PINNs). In particular, we applied PINNs with layer-wise locally adaptive activation functions (L-LAAF) and a learning rate decay strategy to address the challenging task of parameter identification for some aircraft systems. Specifically, the accuracy and effectiveness of the proposed method in solving the time-dependent Fokker-Planck equation for these systems are first demonstrated. Subsequently, the proposed method is used to identify the damping coefficient of the landing gear and the aircraft body weight. Through numerical experiments and comparisons, we demonstrate that incorporating L-LAAF and learning rate decay strategies can further enhance the performance of the network. Numerical simulation based on the Monte Carlo method validates the approach. The development of physics-based deep learning techniques for aircraft system parameter identification can help researchers better understand and control the behavior of systems, providing effective solutions for optimizing system design.

    Proteomic overview of hepatocellular carcinoma cell lines and generation of the spectral library

    Measurement(s): Proteome of hepatocellular carcinoma cell lines. Technology Type(s): Liquid chromatography-tandem mass spectrometry. Sample Characteristic - Organism: Homo sapiens.

    Noninvasive urinary protein signatures combined clinical information associated with microvascular invasion risk in HCC patients

    BACKGROUND: Microvascular invasion (MVI) is the main factor affecting the prognosis of patients with hepatocellular carcinoma (HCC). The aim of this study was to identify accurate diagnostic biomarkers from urinary protein signatures for preoperative prediction. METHODS: We conducted label-free quantitative proteomic studies on urine samples of 91 HCC patients and 22 healthy controls. We identified candidate biomarkers capable of predicting MVI status and combined them with patient clinical information to construct a preoperative nomogram for predicting MVI status in the training cohort. Then, the nomogram was validated in the testing cohort (n = 23). Expression levels of biomarkers were further confirmed by enzyme-linked immunosorbent assay (ELISA) in an independent validation HCC cohort (n = 57). RESULTS: Urinary proteomic features of healthy controls are mainly characterized by active metabolic processes. Cell adhesion and cell proliferation-related pathways, such as extracellular matrix organization, cell–cell adhesion, and cell–cell junction organization, were highly defined in the HCC group, which confirms the malignant phenotype of HCC patients. A preoperative MVI status prediction model for HCC patients was established based on the expression levels of four proteins (CETP, HGFL, L1CAM, and LAIR2) combined with tumor diameter, serum AFP, and GGT concentrations. The nomogram achieved good concordance indexes of 0.809 and 0.783 in predicting MVI in the training and testing cohorts, respectively. CONCLUSIONS: The four-protein-related nomogram in urine samples is a promising preoperative prediction model for the MVI status of HCC patients. Using the model, the risk for an individual patient to harbor MVI can be determined.

    RUPE-phospho: Rapid Ultrasound-Assisted Peptide-Identification-Enhanced Phosphoproteomics Workflow for Microscale Samples

    Global phosphoproteome profiling can provide insights into cellular signaling and disease pathogenesis. To achieve comprehensive phosphoproteomic analyses with minute quantities of material, we developed a rapid and sensitive phosphoproteomics sample preparation strategy based on ultrasound. We found that ultrasonication-assisted digestion can significantly improve peptide identification by 20% due to the generation of longer peptides that can be detected by mass spectrometry. By integrating this rapid ultrasound-assisted peptide-identification-enhanced proteomic method (RUPE) with streamlined phosphopeptide enrichment steps, we established RUPE-phospho, a fast and efficient strategy to characterize protein phosphorylation in mass-limited samples. This approach dramatically reduces sample loss and processing time: 24 samples can be processed in 3 h; 5325 phosphosites, 4549 phosphopeptides, and 1888 phosphoproteins were quantified from 5 μg of human embryonic kidney (HEK) 293T cell lysate. In addition, 9219 phosphosites were quantified from 1–2 mg of OCT-embedded mouse brain with a 120 min streamlined RUPE-phospho workflow. RUPE-phospho facilitates phosphoproteome profiling for microscale samples and will provide a powerful tool for proteomics-driven precision medicine research.

    Autophagy and biotransformation affect sorafenib resistance in hepatocellular carcinoma

    As sorafenib is a first-line drug for treating advanced hepatocellular carcinoma, sorafenib resistance has historically attracted attention. However, most of this attention has been focused on a series of mechanisms related to drug resistance arising after sorafenib treatment. In this study, we used proteomic techniques to explore the potential mechanisms by which pretreatment factors affect sorafenib resistance. The degree of activation of the redundant PI3K/AKT pathway, biotransformation capacity, and autophagy level in hepatocellular carcinoma patients prior to sorafenib treatment might affect their sensitivity to sorafenib, with ADH1A and STING1 as key molecules. These three factors could interact mechanistically to promote tumor cell survival, might be malignant features of tumor cells, and are associated with hepatocellular carcinoma prognosis. Our study suggests possible avenues of therapeutic intervention for patients with sorafenib resistance and the potential application of immunotherapy with the aim of improving the survival of such patients.

    Additional file 2: of The aspirin-induced long non-coding RNA OLA1P2 blocks phosphorylated STAT3 homodimer formation

    All supplementary figures associated with lncRNA OLA1P2. Figure S1. Dysregulation of genes in primary cultured colon cancer cells transfected with shRNA-OLA1P2. Figure S2. OLA1P2 affected STAT3 target expression. Figure S3. OLA1P2 affected the translocation of the phosphorylated STAT3 protein. Figure S4. OLA1P2 interacted directly with phosphorylated STAT3 (Tyr705). Figure S5. The transcriptional activity of the phosphorylated STAT3 (Tyr705) protein was affected by OLA1P2. Figure S6. OLA1P2 suppressed cancer cell proliferation and mediated the aspirin-induced anti-invasive phenotype. Figure S7. OLA1P2 mediated the aspirin-induced anti-metastatic phenotype. Figure S8. The expression levels of OLA1P2, FOXD3, and phosphorylated STAT3 (Tyr705) in clinical tumor tissues. Figure S9. Clinical pathological features correlation analysis. (PDF 10719 kb)

    A Highly Efficient and Visualized Method for Glycan Enrichment by Self-Assembling Pyrene Derivative Functionalized Free Graphene Oxide

    Protein glycosylation plays key roles in many biological processes, such as cell growth, differentiation, and cell–cell recognition. Therefore, global structure profiling of glycans is very important for investigating the biological significance and roles of glycans in disease occurrence and development. Mass spectrometry (MS) is currently the most powerful technique for structure analysis of oligosaccharides, but the limited availability of glycans/glycoproteins from natural sources restricts the wide adoption of this technique in large-scale glycan profiling. Though various enrichment methods have been developed, most methods rely on the weak physical affinity between glycans and adsorbents, which yields insufficient enrichment efficiency. Furthermore, the lack of monitoring of the extent/completeness of enrichment may lead to incomplete enrichment unless repeated sample loading and prolonged incubation are adopted, which limits sample handling throughput. Here, we report a rapid, highly efficient, and visualized approach for glycan enrichment using 1-pyrenebutyryl chloride functionalized free graphene oxide (PCGO). In this approach, glycan capturing is achieved by reversible covalent bond formation between the hydroxyl groups of glycans and the acyl chloride groups on graphene oxide (GO) introduced by π–π stacking of 1-pyrenebutyryl chloride on the GO surface. The multiple hydroxyl groups of glycans lead to cross-linking and self-assembly of free PCGO sheets into visible aggregation within 30 s, therefore achieving simple visual monitoring of the enrichment process. Improved enrichment efficiency is achieved by the large specific surface area of free PCGO and heavy functionalization of highly active 1-pyrenebutyryl chloride. Application of this method in enrichment of standard oligosaccharides or N-glycans released from glycoproteins results in remarkably increased MS signal intensity (approximately 50 times), S/N, and number of glycoforms identified.