Control energy of complex networks towards distinct mixture states
Controlling complex networked systems is a real-world puzzle that remains largely unsolved. Despite recent progress in understanding the structural characteristics of network control energy, the roles of the target state and of system dynamics have not been explored. We examine how varying the final state mixture affects the control energy of canonical and conformity-incorporated dynamical systems. We find that the control energy required to drive a network to an identical final state is lower than that required to reach a non-identical final state. We also demonstrate that it is easier to achieve full control in a conformity-based dynamical network. Finally, we determine the optimal control strategy in terms of the network's hierarchical structure. Our work offers a realistic understanding of control energy across final state mixtures and sheds light on controlling complex systems.
This work was funded by The National Natural Science Foundation of China (Grant Nos. 61763013, 61703159, 61403421), The Natural Science Foundation of Jiangxi Province (No. 20171BAB212017), The Measurement and Control of Aircraft at Sea Laboratory (No. FOM2016OF010), and China Scholarship Council (201708360048). The Boston University Center for Polymer Studies is supported by NSF Grants PHY-1505000, CMMI-1125290, and CHE-1213217, and by DTRA Grant HDTRA1-14-1-0017.
Published version
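The abstract does not spell out how control energy is computed. For the canonical linear dynamics commonly assumed in this literature, dx/dt = Ax + Bu, the minimum energy needed to steer the state from x_0 to x_f in time T is d^T W(T)^{-1} d, where d = x_f - e^{AT} x_0 and W(T) is the finite-horizon controllability Gramian. The Python sketch below computes this quantity under that assumption. It does not model the paper's conformity-incorporated dynamics. The 3-node chain, its self-decay terms, the single driver node, and the helper names `finite_gramian` and `min_control_energy` are all hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def finite_gramian(A: np.ndarray, B: np.ndarray, T: float) -> np.ndarray:
    """Finite-horizon controllability Gramian W(T) = int_0^T e^{At} B B^T e^{A^T t} dt.

    Uses Van Loan's block-exponential trick: for M = [[-A, B B^T], [0, A^T]],
    expm(M T) has upper-right block e^{-AT} W(T) and lower-right block e^{A^T T}.
    """
    n = A.shape[0]
    M = np.zeros((2 * n, 2 * n))
    M[:n, :n] = -A
    M[:n, n:] = B @ B.T
    M[n:, n:] = A.T
    E = expm(M * T)
    return E[n:, n:].T @ E[:n, n:]  # e^{AT} (e^{-AT} W(T)) = W(T)


def min_control_energy(A, B, x0, xf, T):
    """Minimum of int_0^T ||u(t)||^2 dt steering dx/dt = A x + B u from x0 to xf in time T."""
    W = finite_gramian(A, B, T)
    d = xf - expm(A * T) @ x0  # part of the target not reached by free evolution
    return float(d @ np.linalg.solve(W, d))


# Hypothetical 3-node chain 0 -> 1 -> 2 with unit self-decay; node 0 is the only driver.
A = np.array([[-1.0, 0.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
B = np.array([[1.0], [0.0], [0.0]])
x0 = np.zeros(3)
identical = np.ones(3)               # every node ends at the same value
mixed = np.array([1.0, -1.0, 1.0])   # a non-identical final state
print(min_control_energy(A, B, x0, identical, T=2.0))
print(min_control_energy(A, B, x0, mixed, T=2.0))
```

Comparing the two printed energies for a given network is one way to probe the identical versus non-identical contrast that the abstract reports. This toy example makes no claim to reproduce the paper's results.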
Refining Decompiled C Code with Large Language Models
A C decompiler converts an executable into source code. The recovered C
source code, once re-compiled, is expected to produce an executable with the
same functionality as the original executable. With over twenty years of
development, C decompilers have been widely used in production to support
reverse engineering applications. Despite the rapid development of C
decompilers, it is widely acknowledged that decompiler outputs are mainly used
for human consumption and are not suitable for automatic recompilation. Often,
a substantial amount of manual effort is required to fix the decompiler outputs
before they can be recompiled and executed properly.
This paper is motivated by the recent success of large language models (LLMs)
in comprehending dense corpora of natural language. To alleviate the tedious,
costly and often error-prone manual effort in fixing decompiler outputs, we
investigate the feasibility of using LLMs to augment decompiler outputs, thus
delivering recompilable decompilation. Note that, unlike previous
efforts that focus on augmenting decompiler outputs with higher readability
(e.g., recovering type/variable names), we focus on augmenting decompiler
outputs with recompilability, i.e., generating code that can be recompiled
into an executable with the same functionality as the original executable.
We conduct a pilot study to characterize the obstacles in recompiling the
outputs of IDA-Pro, the de facto standard commercial C decompiler. We then propose a
two-step, hybrid approach to augmenting decompiler outputs with LLMs. We
evaluate our approach on a set of popular C test cases and show that it
achieves a recompilation success rate of over 75% with moderate effort, whereas
none of IDA-Pro's original outputs can be recompiled. We conclude with a
discussion of the limitations of our approach and promising future research
directions.
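The abstract names a two-step, hybrid approach but does not describe either step. Purely as an illustration, the Python sketch below shows one plausible shape for such a pipeline. The first step is a rule-based pass that prepends typedefs for IDA-style type names such as `_DWORD` and `__int64`, which a stock GCC on Linux does not define. The second step is a compile-and-patch loop. The stub table, `add_type_stubs`, `recompile_with_repair`, and the injected `compile_fn` and `llm_fix` callables are all hypothetical, and no real compiler or LLM API is called.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical stubs for IDA-style type names; sizes assume a typical 64-bit Linux (LP64) target.
IDA_TYPE_STUBS = {
    "_BYTE": "typedef unsigned char _BYTE;",
    "_WORD": "typedef unsigned short _WORD;",
    "_DWORD": "typedef unsigned int _DWORD;",
    "_QWORD": "typedef unsigned long long _QWORD;",
    "__int64": "typedef long long __int64;",
}


def add_type_stubs(code: str) -> str:
    """Rule-based step: prepend a typedef for every IDA-style type name the code mentions."""
    needed = [stub for name, stub in IDA_TYPE_STUBS.items()
              if re.search(r"\b" + re.escape(name) + r"\b", code)]
    return "\n".join(needed + [code]) if needed else code


def recompile_with_repair(
    code: str,
    compile_fn: Callable[[str], Tuple[bool, str]],  # returns (success, compiler log)
    llm_fix: Callable[[str, str], str],             # (code, log) -> patched code
    max_rounds: int = 3,
) -> Optional[str]:
    """Apply the rule-based stubs, then alternate compile attempts and LLM patches."""
    candidate = add_type_stubs(code)
    for attempt in range(max_rounds + 1):
        ok, log = compile_fn(candidate)
        if ok:
            return candidate
        if attempt < max_rounds:
            candidate = llm_fix(candidate, log)
    return None  # still not recompilable after max_rounds patches
```

In practice `compile_fn` might wrap a compiler invocation through `subprocess`, and `llm_fix` might wrap a model call. Both are left abstract here so that the control flow can be exercised with stand-ins.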
Comparing One with Many -- Solving Binary2source Function Matching Under Function Inlining
Binary2source function matching is a fundamental task for many security
applications, including Software Component Analysis (SCA). Existing
binary2source matching works adopt a "1-to-1" mechanism, in which one binary
function is matched against one source function. However, we found that, due
to function inlining, the mapping can be "1-to-n" (one query binary function
maps to multiple source functions).
To help conduct binary2source function matching under function inlining, we
propose a method named O2NMatcher to generate Source Function Sets (SFSs) as
the matching target for binary functions with inlining. We first propose a
model named ECOCCJ48 for inlined call site prediction. To train this model, we
leverage compilable open-source software (OSS) projects to generate a dataset
with labeled call sites
(inlined or not), extract several features from the call sites, and design a
compiler-opt-based multi-label classifier by inspecting the inlining
correlations between different compilations. Then, we use this model to predict
the labels of call sites in uncompilable OSS projects without compiling them
and obtain the labeled function call graphs of these projects. Next, we regard
the construction of SFSs as a sub-tree generation problem and design root node
selection and edge extension rules to construct SFSs automatically. Finally,
these SFSs are added to the corpus of source functions and compared with
binary functions with inlining. We conduct several experiments to evaluate the
effectiveness of O2NMatcher, and the results show that our method improves the
performance of existing works by 6% and outperforms all state-of-the-art works.
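The abstract frames SFS construction as sub-tree generation over a labeled function call graph, using root-node selection and edge-extension rules. It does not state what those rules are. Below is a minimal Python sketch under simplified, hypothetical rules. A root is any function with at least one call site predicted as inlined, and the SFS grows from that root along inlined edges only. The call-graph encoding and the names `select_roots` and `build_sfs` are illustrative assumptions, not O2NMatcher's actual rules.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Labeled call graph: caller -> [(callee, predicted_inlined)]; labels would come
# from a call-site classifier such as the one the abstract describes.
CallGraph = Dict[str, List[Tuple[str, bool]]]


def select_roots(cg: CallGraph) -> List[str]:
    """Hypothetical root rule: any function with at least one call site predicted as inlined."""
    return sorted(f for f, calls in cg.items() if any(inl for _, inl in calls))


def build_sfs(cg: CallGraph, root: str) -> Set[str]:
    """Hypothetical edge-extension rule: follow inlined edges transitively from the root."""
    sfs, queue = {root}, deque([root])
    while queue:
        caller = queue.popleft()
        for callee, inlined in cg.get(caller, []):
            if inlined and callee not in sfs:
                sfs.add(callee)
                queue.append(callee)
    return sfs


cg: CallGraph = {
    "main": [("parse", True), ("log", False)],
    "parse": [("next_tok", True)],
    "log": [],
}
for root in select_roots(cg):
    print(root, sorted(build_sfs(cg, root)))
```

Here `main` yields the set {main, parse, next_tok} and `log` is excluded because its call site is predicted as not inlined. Each such set would then join the source-function corpus as a matching target for binary functions.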
A robust QTL mapping procedure
In quantitative-trait linkage studies using experimental crosses, the conventional normal location-shift model or other parameterizations may be unnecessarily restrictive. We generalize the mapping problem to a genuinely nonparametric setup and provide a robust estimation procedure for the situation where the underlying phenotype distributions are completely unspecified. Classical Wilcoxon-Mann-Whitney statistics are employed for point and interval estimation of QTL positions and effects.
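As a rough illustration of how Wilcoxon-Mann-Whitney statistics can locate and size a QTL without distributional assumptions, the Python sketch below runs a simple marker-by-marker scan. It assumes a two-genotype cross (for example, a backcross coded 0/1). At each marker it computes the Mann-Whitney U between the two genotype groups, standardizes it using the no-ties null variance, and reports U/(n0 n1), which estimates P(Y1 > Y0) plus half the tie probability, as a nonparametric effect size. This is a crude stand-in for the paper's procedure: it uses no interval mapping between markers and no confidence intervals, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U = number of pairs (x_i, y_j) with y_j > x_i, counting ties as 1/2."""
    return sum(1.0 if b > a else 0.5 if b == a else 0.0 for a in x for b in y)


def marker_scan(pheno: Sequence[float],
                genotypes: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """Return (z, theta) per marker: standardized U and theta = U / (n0 * n1)."""
    results = []
    for g in genotypes:
        y0 = [p for p, gi in zip(pheno, g) if gi == 0]
        y1 = [p for p, gi in zip(pheno, g) if gi == 1]
        n0, n1 = len(y0), len(y1)
        if n0 == 0 or n1 == 0:  # monomorphic marker carries no information
            results.append((0.0, float("nan")))
            continue
        u = mann_whitney_u(y0, y1)
        sd = math.sqrt(n0 * n1 * (n0 + n1 + 1) / 12.0)  # null SD without tie correction
        results.append(((u - n0 * n1 / 2.0) / sd, u / (n0 * n1)))
    return results


def best_marker(results: List[Tuple[float, float]]) -> int:
    """Point estimate of QTL position: the marker with the largest |z|."""
    return max(range(len(results)), key=lambda i: abs(results[i][0]))


pheno = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
markers = [[0, 1, 0, 1, 0, 1], [0, 0, 0, 1, 1, 1]]
scan = marker_scan(pheno, markers)
print(scan, best_marker(scan))
```

Because only ranks enter U, the scan is unaffected by monotone transformations of the phenotype. This rank invariance is the robustness property that motivates the nonparametric approach.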
Conservation and variation in mitochondrial genomes of gastropods Oncomelania hupensis and Tricula hortensis, intermediate host snails of Schistosoma in China
The complete mitochondrial genomes of intermediate host snails for Schistosoma in China were sequenced: the subspecies Oncomelania hupensis hupensis (two shell types) and O. hupensis robertsoni, intermediate hosts for S. japonicum, and Tricula hortensis, the intermediate host of S. sinensium. The four genomes have exactly the same gene order as other caenogastropods, containing 13 protein-coding genes and 22 transfer RNAs. Gene size, start codon and termination codon are mostly the same for all protein-coding genes. However, pairwise sequence alignments revealed quite different degrees of variation. The ribbed-shelled O. hupensis hupensis and the smooth-shelled, varix-bearing O. hupensis hupensis showed a low genetic distance (3.1% for protein-coding genes), but codon usage differed markedly between the mitochondrial genomes of these two types of snails, implying that their genetic difference may be larger than previously recognized. The mean genetic distance between O. hupensis hupensis and O. hupensis robertsoni was 12% for protein-coding genes, indicating a higher degree of genetic difference. Taking their differences in morphology and distribution into account as well, we propose that O. hupensis hupensis and O. hupensis robertsoni be regarded as separate species. The ribbed-shelled O. hupensis hupensis and the smooth-shelled O. hupensis robertsoni clustered phylogenetically in the same clade, which in turn clustered with T. hortensis, confirming their close relationship. However, species or subspecies of Oncomelania from Southeast Asian countries should be included in future studies to resolve the phylogenetic relationships and origin of all snails in the genus. (C) 2010 Elsevier Inc. All rights reserved.
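The abstract reports genetic distances of 3.1% and 12% for protein-coding genes but does not name the distance measure. One common choice is the uncorrected p-distance: the proportion of aligned sites at which two sequences differ. The authors may instead have used a model-corrected distance such as Kimura 2-parameter. The Python sketch below assumes p-distance, skips gap and ambiguity sites, and averages over all between-group pairs to obtain a mean between-group distance. The names `p_distance` and `mean_between` are hypothetical helpers.

```python
from itertools import product
from statistics import mean
from typing import Sequence


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance over aligned sites where both bases are unambiguous A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:  # skip gaps ('-') and ambiguity codes
            compared += 1
            differing += int(a != b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_between(group_a: Sequence[str], group_b: Sequence[str]) -> float:
    """Mean p-distance over every pair with one sequence from each group."""
    return mean(p_distance(a, b) for a, b in product(group_a, group_b))


print(p_distance("ATGCTAGC-A", "ATGTTAGCTA"))  # one difference among 9 gap-free sites
```

Multiplying by 100 gives the percentage form used in the abstract. Real comparisons would run on full concatenated protein-coding alignments rather than toy strings.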
- …