41,287 research outputs found
Oii-web: An interactive online programming contest training system
In this paper we report our experience, related to the online training for the
Italian and International Olympiads in Informatics. We developed an interactive online
system, based on CMS, the grading system used in several major programming contests
including the International Olympiads in Informatics (IOI), and used it in three distinct
context: training students for the Italian Olympiads in Informatics (OII), training teachers
in order to be able to assist students for the OII, and training the Italian team for the
IOI. The system, that is freely available, proved to be a game changer for the whole italian
olympiads in informatics ecosystem: in one year, we almost doubled the participation to
OII, from 13k to 21k secondary school students.
The system is developed basing on the Contest Management System (CMS, http://cms-
dev.github.io/), so it is highly available to extensions supporting, for instance, the pro-
duction of feedback on problems solutions submitted by trainees. The system is also freely
available, with the idea of allowing for support to alternative necessities and developmen
DHLP 1&2: Giraph based distributed label propagation algorithms on heterogeneous drug-related networks
Background and Objective: Heterogeneous complex networks are large graphs
consisting of different types of nodes and edges. The knowledge extraction from
these networks is complicated. Moreover, the scale of these networks is
steadily increasing. Thus, scalable methods are required. Methods: In this
paper, two distributed label propagation algorithms for heterogeneous networks,
namely DHLP-1 and DHLP-2 have been introduced. Biological networks are one type
of the heterogeneous complex networks. As a case study, we have measured the
efficiency of our proposed DHLP-1 and DHLP-2 algorithms on a biological network
consisting of drugs, diseases, and targets. The subject we have studied in this
network is drug repositioning but our algorithms can be used as general methods
for heterogeneous networks other than the biological network. Results: We
compared the proposed algorithms with similar non-distributed versions of them
namely MINProp and Heter-LP. The experiments revealed the good performance of
the algorithms in terms of running time and accuracy.Comment: Source code available for Apache Giraph on Hadoo
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share a lot of analogical topics, such as semantics
extraction, summarization, and classification. This work utilizes these ideas
to address two important code similarity comparison problems. (I) Given a pair
of basic blocks for different instruction set architectures (ISAs), determining
whether their semantics is similar or not; and (II) given a piece of code of
interest, determining if it is contained in another piece of assembly code for
a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system INNEREYE and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency and scalability. And the case studies utilizing the system
demonstrate that our solution to Problem II is effective. Moreover, this
research showcases how to apply ideas and techniques from NLP to large-scale
binary code analysis.Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium
201
Identifying Web Tables - Supporting a Neglected Type of Content on the Web
The abundance of the data in the Internet facilitates the improvement of
extraction and processing tools. The trend in the open data publishing
encourages the adoption of structured formats like CSV and RDF. However, there
is still a plethora of unstructured data on the Web which we assume contain
semantics. For this reason, we propose an approach to derive semantics from web
tables which are still the most popular publishing tool on the Web. The paper
also discusses methods and services of unstructured data extraction and
processing as well as machine learning techniques to enhance such a workflow.
The eventual result is a framework to process, publish and visualize linked
open data. The software enables tables extraction from various open data
sources in the HTML format and an automatic export to the RDF format making the
data linked. The paper also gives the evaluation of machine learning techniques
in conjunction with string similarity functions to be applied in a tables
recognition task.Comment: 9 pages, 4 figure
- …