An optimized TOPS+ comparison method for enhanced TOPS models
This article has been made available through the Brunel Open Access Publishing Fund.
Background
Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization; we call the resulting optimized method the advanced TOPS+ comparison method, or advTOPS+.
Results
We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model. It treats loops as secondary structure elements (SSEs) in addition to helices and strands, represents ligands as first-class objects, and describes interactions between SSEs, and between SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method.
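As a rough illustration of what a string-style description with arc annotations might look like, the sketch below encodes each element (helix, strand, loop or ligand) as a token carrying its incoming and outgoing arc counts, and compares two token sequences with a plain edit distance. The one-letter codes, the `Element` class and the use of Levenshtein distance are assumptions made for this example only; the abstract does not specify the actual TOPS+ alphabet or comparison algorithm.

```python
from dataclasses import dataclass

# Hypothetical one-letter codes; the real TOPS+ alphabet may differ.
HELIX, STRAND, LOOP, LIGAND = "H", "E", "L", "G"


@dataclass(frozen=True)
class Element:
    kind: str          # one of the codes above
    arcs_in: int = 0   # number of incoming interaction arcs
    arcs_out: int = 0  # number of outgoing interaction arcs


def tokens(elements):
    """Turn a structure into a sequence of comparable tokens."""
    return [(e.kind, e.arcs_in, e.arcs_out) for e in elements]


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            cost = 0 if ta == tb else 1
            curr.append(min(prev[j] + 1,          # delete ta
                            curr[j - 1] + 1,      # insert tb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalised similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Two toy structures: the second has one extra ligand element.
p = [Element(STRAND, arcs_out=1), Element(LOOP), Element(HELIX),
     Element(LOOP), Element(STRAND, arcs_in=1)]
q = p + [Element(LIGAND, arcs_in=1)]
print(similarity(tokens(p), tokens(q)))
```

Treating arc counts as part of the token means two elements of the same type but with different interaction patterns count as a mismatch, which is one simple way to let the annotations influence the score.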
Conclusions
Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] compared to our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta, a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share many analogous topics, such as semantics
extraction, summarization, and classification. This work utilizes these ideas
to address two important code similarity comparison problems. (I) Given a pair
of basic blocks for different instruction set architectures (ISAs), determining
whether their semantics is similar or not; and (II) given a piece of code of
interest, determining if it is contained in another piece of assembly code for
a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system INNEREYE and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency and scalability. The case studies utilizing the system
demonstrate that our solution to Problem II is effective. Moreover, this
research showcases how to apply ideas and techniques from NLP to large-scale
binary code analysis.
Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium
201
The estimation of Human Capital in structural models with flexible specification
The present paper focuses on statistical models for estimating Human Capital (HC) at a disaggregated level (workers, households, graduates). The more recent literature on HC as a latent variable holds that HC can reasonably be considered a broad, multi-dimensional, non-observable construct that depends on several interrelated causes and is indirectly measured by many observed indicators. In this perspective, latent variable models have assumed a prominent role in the social science literature for the study of interrelationships among phenomena. However, traditional estimation methods are subject to several limitations: stringent distributional assumptions, improper solutions, and factor score indeterminacy for Covariance Structure Analysis, and the lack of a global optimization procedure for the Partial Least Squares approach. To avoid these limitations, new approaches to structural equation modelling based on Component Analysis, which estimates latent variables as exact linear combinations of observed variables by minimizing a single criterion, have been proposed in the literature. However, these methods are limited to modelling particular types of relationships among sets of variables. In this paper, we propose a class of models that makes it possible to specify and fit a variety of relationships among latent variables and endogenous indicators. Specifically, we extend this new class of models to allow for covariate effects on the endogenous indicators. Finally, we illustrate an application that measures, in a realistic structural model, the causal impact of formal HC accumulated during higher education on the initial earnings of University of Milan (Italy) graduates.
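To make the phrase "latent variables as exact linear combinations of observed variables" concrete, the sketch below computes a composite score as a weighted sum of standardized indicators. The indicator names, the toy values and the hand-picked weights are all hypothetical; in the component-based methods the abstract refers to, the weights would be estimated by minimizing a criterion rather than fixed in advance, and this example does not attempt that estimation step.

```python
from statistics import mean, pstdev


def standardize(column):
    """Centre a column and scale it to unit (population) standard deviation."""
    m, s = mean(column), pstdev(column)
    if s == 0:
        raise ValueError("constant indicator cannot be standardized")
    return [(x - m) / s for x in column]


def composite_scores(indicators, weights):
    """Latent score per observation: sum over indicators of weight * z-score.

    indicators: dict mapping indicator name -> list of observed values
    weights:    dict mapping indicator name -> weight (assumed given here)
    """
    z = {name: standardize(col) for name, col in indicators.items()}
    n = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]


# Toy data for three graduates; names and numbers are illustrative only.
indicators = {
    "years_of_study": [3, 5, 4],
    "final_grade": [100, 110, 105],
}
weights = {"years_of_study": 0.5, "final_grade": 0.5}

scores = composite_scores(indicators, weights)
print(scores)
```

Because each score is an explicit weighted sum, every observation receives a single determinate value, which is the property that avoids the factor score indeterminacy mentioned above.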