40 research outputs found

    Extracting Protocol Format as State Machine via Controlled Static Loop Analysis

    Reverse engineering of protocol message formats is critical for many security applications. Mainstream techniques use dynamic analysis and inherit its low-coverage problem: the inferred message formats reflect only the features of their inputs. To achieve high coverage, we use static analysis to infer message formats from the implementation of protocol parsers. In this work, we focus on a class of extremely challenging protocols whose formats are described via constraint-enhanced regular expressions and parsed using finite-state machines. Such state machines are often implemented as complicated parsing loops, which are inherently difficult to analyze via conventional static analysis. Our new technique extracts a state machine by regarding each loop iteration as a state and the dependency between loop iterations as state transitions. To achieve high (i.e., path-sensitive) precision while avoiding path explosion, the analysis is controlled to merge as many paths as possible based on carefully designed rules. The evaluation results show that we can infer a state machine and, thus, the message formats, in five minutes with over 90% precision and recall, far better than the state of the art. We also applied the state machines to enhance protocol fuzzers, which improved by 20% to 230% in terms of coverage and detected ten more zero-days than the baselines.

    FoxO gene family evolution in vertebrates

    Background: Forkhead box, class O (FoxO) belongs to the large family of forkhead transcription factors that are characterized by a conserved forkhead box DNA-binding domain. To date, the FoxO group has four mammalian members: FoxO1, FoxO3a, FoxO4 and FoxO6, which are orthologs of DAF16, an insulin-responsive transcription factor involved in regulating longevity of worms and flies. The degree of homology between these four members is high, especially in the forkhead domain, which contains the DNA-binding interface. Yet, mouse FoxO knockouts have revealed that each FoxO gene has its own unique role in physiological processes. Whether the functional divergences are primarily due to adaptive selection pressure or relaxed selective constraint remains an open question. As such, this study aims to address the evolutionary mode of FoxO, which may lead to the functional divergence. Results: Sequence similarity searches were performed in genome and scaffold data to identify homologues of FoxO in vertebrates. Phylogenetic analysis was used to characterize the evolutionary history of the family, identifying two duplications early in vertebrate evolution. To determine the mode of evolution in vertebrates, we performed a rigorous statistical analysis of FoxO gene sequences, including relative rate ratio tests, branch-specific dN/dS ratio tests, site-specific dN/dS ratio tests, branch-site dN/dS ratio tests and analysis of clade-level amino acid conservation/variation patterns. Our results suggest that FoxO is constrained by strong purifying selection, except for four sites in FoxO6 that have undergone positive Darwinian selection. The functional divergence in this family is best explained by either relaxed purifying selection or positive selection. Conclusion: We present a phylogeny describing the evolutionary history of the FoxO gene family and show that the genes have evolved through duplications followed by purifying selection, except for four sites in FoxO6 fixed by positive selection, which lie mostly within the non-conserved optimal PKB motif in the C-terminal part. Relaxed selection may also play an important role in the functional differentiation that followed the gene duplications.

    Nova+: Generative Language Models for Binaries

    Generative large language models (LLMs) pre-trained on code have shown impressive effectiveness in code generation, program repair, and document analysis. However, existing generative LLMs focus on source code and are not specialized for binaries. There are three main challenges for LLMs to model and learn binary code: hexadecimal values, complex global dependencies, and compiler optimization levels. To bring the benefit of LLMs to the binary domain, we develop Nova and Nova+, which are LLMs pre-trained on binary corpora. Nova is pre-trained with the standard language modeling task, showing significantly better capability on five benchmarks for three downstream tasks: binary code similarity detection (BCSD), binary code translation (BCT), and binary code recovery (BCR), over GPT-3.5 and other existing techniques. We build Nova+ to further boost Nova using two new pre-training tasks, i.e., optimization generation and optimization level prediction, which are designed to learn binary optimization and align equivalent binaries. Nova+ shows overall the best performance for all three downstream tasks on five benchmarks, demonstrating the contributions of the new pre-training tasks.

    ReCGiP, a database of reproduction candidate genes in pigs based on bibliomics

    Background: Reproduction in pigs is one of the most economically important traits. To improve reproductive performance, numerous studies have focused on the identification of candidate genes. However, it is hard for any one reader to go through all of the literature thoroughly. We have therefore developed a database, named ReCGiP, that provides candidate genes for reproduction research in pigs by mining and processing the existing biological literature on humans and pigs. Description: Based on text mining and comparative genomics, ReCGiP presents diverse information on reproduction-relevant genes in human and pig. The genes were sorted by their degree of relevance to the reproduction topics and were visualized in a gene co-occurrence network, where two genes were connected if they were co-cited in a PubMed abstract. The 'hub' genes, which had more 'neighbors', were thought to have more important functions and could be identified by users in their web browser. In addition, ReCGiP provided integrated GO annotation, OMIM and biological pathway information collected from the Internet. Both pig and human gene information can be found in the database, which is now available. Conclusions: ReCGiP is a unique database providing information on reproduction-related genes for pig. It can be used in molecular genetics, genetic linkage mapping, and the breeding of pigs and other livestock. Moreover, it can be used as a reference for human reproduction research.
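
The co-occurrence network described in this abstract can be illustrated with a minimal sketch. It is not ReCGiP's implementation: the function names and the gene symbols are hypothetical, and the input is assumed to be a list of gene mentions already extracted from each abstract.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(abstract_genes):
    """Map each gene to the set of genes co-cited with it in at least one abstract."""
    neighbors = defaultdict(set)
    for genes in abstract_genes:
        # Every unordered pair of distinct genes in one abstract becomes an edge.
        for a, b in combinations(sorted(set(genes)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def rank_hubs(neighbors):
    """Sort genes by neighbor count, highest first; ties are broken alphabetically."""
    return sorted(neighbors, key=lambda g: (-len(neighbors[g]), g))


# Hypothetical gene mentions per abstract (made-up symbols, not ReCGiP data).
abstracts = [
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_D", "GENE_B"],
    ["GENE_E"],
]
network = build_cooccurrence(abstracts)
hubs = rank_hubs(network)
```

A gene that is never co-cited with another gene (GENE_E here) has no edges and so does not appear in the network at all.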

    Detecting Backdoors in Pre-trained Encoders

    Self-supervised learning in computer vision trains on unlabeled data, such as images or (image, text) pairs, to obtain an image encoder that learns high-quality embeddings for input data. Emerging backdoor attacks towards encoders expose crucial vulnerabilities of self-supervised learning, since downstream classifiers (even further trained on clean data) may inherit backdoor behaviors from encoders. Existing backdoor detection methods mainly focus on supervised learning settings and cannot handle pre-trained encoders, especially when input labels are not available. In this paper, we propose DECREE, the first backdoor detection approach for pre-trained encoders, requiring neither classifier headers nor input labels. We evaluate DECREE on over 400 encoders trojaned under 3 paradigms. We show the effectiveness of our method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million image-text pairs. Our method consistently has a high detection accuracy even if we have only limited or no access to the pre-training dataset. Comment: Accepted at CVPR 2023. Code is available at https://github.com/GiantSeaweed/DECRE

    Regression-based approach for testing the association between multi-region haplotype configuration and complex trait

    Background: It is quite common that the genetic architecture of complex traits involves many genes and their interactions. Therefore, dealing with multiple unlinked genomic regions simultaneously is desirable. Results: In this paper we develop a regression-based approach to assess the interactions of haplotypes that belong to different unlinked regions, and we use score statistics to test the null hypothesis of no genetic association. Additionally, multiple marker combinations at each unlinked region are considered. The multiple tests are handled via the minP approach. The P value of the "best" multi-region multi-marker configuration is corrected via Monte-Carlo simulations. Through simulation studies, we assess the performance of the proposed approach and demonstrate its validity and power in testing for haplotype interaction association. Conclusion: Our simulations showed that, for a binary trait without covariates, our proposed methods are as powerful as, or even more powerful than, htr and hapcc, which are part of the FAMHAP program. Additionally, our model can be applied to a wider variety of traits and allows adjustment for other covariates. To test their validity, our methods are applied to analyze the association between four unlinked candidate genes and pig meat quality.
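
As a rough illustration of the minP idea mentioned in this abstract, the sketch below corrects the smallest observed P value against the distribution of minimum P values from simulated null replicates. It is a generic sketch, not the authors' procedure: the function name, the add-one correction, and all numbers are assumptions, and each simulated replicate is assumed to already provide one P value per marker configuration.

```python
def minp_adjusted(observed_pvalues, null_pvalue_sets):
    """Monte-Carlo minP correction for the best (smallest) observed P value.

    observed_pvalues: one P value per tested configuration.
    null_pvalue_sets: one list of P values per simulated null replicate.
    """
    observed_min = min(observed_pvalues)
    # Count replicates whose best configuration looks at least as extreme.
    hits = sum(1 for null in null_pvalue_sets if min(null) <= observed_min)
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return (hits + 1) / (len(null_pvalue_sets) + 1)


# Hypothetical P values, for illustration only.
observed = [0.04, 0.01, 0.20]
null_sets = [
    [0.50, 0.30, 0.02],
    [0.005, 0.60, 0.70],
    [0.20, 0.40, 0.90],
    [0.03, 0.01, 0.50],
]
adjusted = minp_adjusted(observed, null_sets)
```

Two of the four replicates have a minimum P value at or below 0.01, so the corrected value is (2 + 1) / (4 + 1) = 0.6. In practice far more replicates would be simulated.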

    Assessment of Autozygosity Derived From Runs of Homozygosity in Jinhua Pigs Disclosed by Sequencing Data

    The Jinhua pig, a well-known Chinese indigenous breed, has evolved as a pig breed with excellent meat quality, greater disease resistance, and higher prolificacy. The reduction in the number of Jinhua pigs over the past years has raised concerns about inbreeding. Runs of homozygosity (ROH) along the genome have been applied to quantify individual autozygosity to improve the understanding of inbreeding depression and identify genes associated with traits of interest. Here, we investigated the occurrence and distribution of ROH using next-generation sequencing data to characterize autozygosity in 202 Jinhua pigs, as well as to identify the genomic regions with high ROH frequencies within individuals. The average inbreeding coefficient, based on ROH longer than 1 Mb, was 0.168 ± 0.052. In total, 18,690 ROH were identified in all individuals, among which shorter segments (1–5 Mb) predominated. Individual ROH autosome coverage ranged from 5.32 to 29.14% in the Jinhua population. On average, approximately 16.8% of the whole genome was covered by ROH segments, with the lowest coverage on SSC11 and the highest coverage on SSC17. A total of 824 SNPs (about 0.5%) and 11 ROH island regions were identified (occurring in over 45% of the samples). Genes associated with reproduction (HOXA3, HOXA7, HOXA10, and HOXA11), meat quality (MYOD1, LPIN3, and CTNNBL1), appetite (NUCB2) and disease resistance traits (MUC4, MUC13, MUC20, LMLN, ITGB5, HEG1, SLC12A8, and MYLK) were identified in ROH islands. Moreover, several quantitative trait loci for ham weight and ham fat thickness were detected. Genes in ROH islands suggested, at least partially, selection for economic traits and environmental adaptation, and should be the subject of future investigation. These findings contribute to the understanding of the effects of environmental and artificial selection in shaping the distribution of functional variants in the pig genome.
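
The abstract does not spell out how the ROH-based inbreeding coefficient is computed. A commonly used definition, assumed here, is the total length of qualifying ROH segments divided by the length of the autosomal genome covered. The sketch below follows that definition; the function name, segment lengths, and genome length are hypothetical and are not taken from the study.

```python
def f_roh(roh_lengths_bp, genome_length_bp, min_length_bp=1_000_000):
    """Fraction of the genome covered by ROH segments longer than min_length_bp."""
    kept = [length for length in roh_lengths_bp if length > min_length_bp]
    return sum(kept) / genome_length_bp


# Hypothetical individual: four ROH segments on a made-up 100 Mb genome.
segments_bp = [1_500_000, 3_000_000, 800_000, 5_500_000]
genome_bp = 100_000_000
coefficient = f_roh(segments_bp, genome_bp)
```

The 0.8 Mb segment falls below the 1 Mb threshold and is ignored, so the remaining 10 Mb of ROH gives a coefficient of 0.1. Under this definition the coefficient equals the genome fraction covered by ROH, which matches the abstract's paired figures of 0.168 and approximately 16.8%.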

    Large-Scale Qualitative and Quantitative Assessment of Dityrosine Crosslinking Omics in Response to Endogenous and Exogenous Hydrogen Peroxide in Escherichia coli

    Excessive hydrogen peroxide causes oxidative stress in cells. The oxidation of two tyrosine residues in proteins can generate o,o′-dityrosine, a putative biomarker for protein oxidation, which plays critical roles in a variety of organisms. Thus far, few studies have investigated dityrosine crosslinking under endogenous or exogenous oxidative conditions at the proteome level, and its physiological function remains largely unknown. In this study, to investigate dityrosine crosslinking qualitatively and quantitatively, two mutant Escherichia coli strains and one mutant strain supplemented with H2O2 were used as models for endogenous and exogenous oxidative stress, respectively. By integrating high-resolution liquid chromatography-mass spectrometry and bioinformatic analysis, we created the largest dityrosine crosslinking dataset in E. coli to date, identifying 71 dityrosine crosslinks and 410 dityrosine loop links on 352 proteins. The dityrosine-linked proteins are mainly involved in taurine and hypotaurine metabolism, the citrate cycle, glyoxylate and dicarboxylate metabolism, carbon metabolism, etc., suggesting that dityrosine crosslinking may play a critical role in regulating the metabolic pathways in response to oxidative stress. In conclusion, we report the most comprehensive survey of dityrosine crosslinking in E. coli to date, which is of great significance for revealing its function in oxidative stress.

    Interlayer Difference of Bilayer-Stacked MoS2 Structure: Probing by Photoluminescence and Raman Spectroscopy

    This work reports the interlayer difference in exciton and phonon performance between the top and bottom layers of a bilayer-stacked two-dimensional material structure (BSS). Through photoluminescence (PL) and Raman spectroscopy, we find that, compared to the bottom layer, the top layer of the BSS demonstrates a PL redshift, a redshift of the Raman $E_{2g}^{1}$ mode, and lower PL intensity. Spatial inhomogeneity of PL and Raman signals is also observed in the BSS. Based on theoretical analysis, these exotic effects can be attributed to substrate-coupling-induced strain and doping. Our findings provide pertinent insight into film-substrate interaction, and are of great significance to research on bilayer-stacked structures, including twisted bilayer structures and van der Waals hetero- and homostructures.

    The Molecular Evolutionary Patterns of the Insulin/FOXO Signaling Pathway

    The insulin/insulin-like growth factor-1 (IGF1)/FOXO (IIF) signal transduction pathway plays a core role in the endocrine system. Although the components of this pathway have been well characterized, its evolutionary pattern remains poorly understood. Here, we perform a comprehensive analysis to study whether differences in signal transduction elements exist, whether the genes are subject to equivalent evolutionary forces, and how natural selection shapes the evolutionary pattern of proteins in an interacting system. Our results demonstrate that most IIF pathway components are present throughout all animal phyla investigated here, and they are under strong selective constraint. Remarkably, we detect that the components in the middle of the pathway undergo stronger purifying selection, which differs from previous similar reports. We also find that dN/dS may be influenced by quite complicated factors, including codon bias and protein length, among others.