Two-Dimensional Source Coding by Means of Subblock Enumeration
A technique of lossless compression via substring enumeration (CSE) attains compression ratios comparable to those of popular lossless compressors for one-dimensional (1D) sources. CSE uses a probabilistic model built from the circular string of an input source to encode that source. CSE is applicable to two-dimensional (2D) sources, such as images, by treating a line of pixels of the 2D source as a symbol of an extended alphabet. At the initial step of the CSE encoding process, we need to output the number of occurrences of every symbol of the extended alphabet, so the time complexity grows exponentially as the size of the source increases. To reduce the time complexity, we propose a new CSE that encodes a 2D source block by block instead of line by line. The proposed CSE uses the flat torus of an input 2D source as a probabilistic model for encoding the source, instead of the circular string of the source. Moreover, we analyze the limit of the average codeword length of the proposed CSE for general sources.
Comment: 5 pages, Submitted to ISIT201
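The flat-torus model reads the 2D source with wraparound in both directions, so every position contributes one k × k subblock. A minimal sketch of that subblock enumeration, with illustrative function names not taken from the paper:

```python
def torus_subblocks(grid, k):
    """Enumerate all k x k subblocks of a 2D source read on its flat torus
    (row and column indices wrap around), one block per starting position."""
    h, w = len(grid), len(grid[0])
    blocks = []
    for i in range(h):
        for j in range(w):
            block = tuple(
                tuple(grid[(i + di) % h][(j + dj) % w] for dj in range(k))
                for di in range(k)
            )
            blocks.append(block)
    return blocks
```

Counting how often each block value occurs in this list gives the occurrence statistics a block-by-block encoder would work from.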
A Universal Two-Dimensional Source Coding by Means of Subblock Enumeration
The technique of lossless compression via substring enumeration (CSE) is a kind of enumerative code and uses a probabilistic model built from the circular string of an input source for encoding a one-dimensional (1D) source. CSE is applicable to two-dimensional (2D) sources, such as images, by treating a line of pixels of a 2D source as a symbol of an extended alphabet. At the initial step of the CSE encoding process, we need to output the number of occurrences of every symbol of the extended alphabet, so the time complexity increases exponentially as the size of the source grows. To reduce computational time, we can rearrange the pixels of a 2D source into a 1D string along a space-filling curve such as a Hilbert curve. However, information about adjacent cells in the 2D source may be lost in the conversion. To reduce the time complexity and compress a 2D source without converting it to a 1D source, we propose a new CSE that encodes a 2D source in a block-by-block fashion instead of in a line-by-line fashion. The proposed algorithm uses the flat torus of an input 2D source as a probabilistic model instead of the circular string of the source. Moreover, we prove the asymptotic optimality of the proposed algorithm for 2D general sources.
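The Hilbert-curve rearrangement that the abstract contrasts against can be sketched with the standard index-to-coordinate conversion; this is generic code, not from the paper, and it assumes the side length is a power of two:

```python
def hilbert_d2xy(order, d):
    """Map a 1D Hilbert-curve index d to (x, y) on a 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so the curve stays connected
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def linearize(grid):
    """Flatten a square 2D source into a 1D sequence along the Hilbert curve."""
    n = len(grid)
    order = n.bit_length() - 1  # assumes n is a power of two
    return [grid[y][x] for x, y in (hilbert_d2xy(order, d) for d in range(n * n))]
```

Because consecutive curve indices map to adjacent cells, much 2D locality survives the flattening, but, as the abstract notes, adjacency information is still partly lost.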
Compression by Substring Enumeration Using Sorted Contingency Tables
This paper proposes two variants of improved Compression by Substring Enumeration (CSE) with a finite alphabet. In previous studies on CSE, an encoder uses inequalities that evaluate the number of occurrences of a substring or a minimal forbidden word (MFW) to be encoded. The inequalities are derived from a contingency table containing the number of occurrences of a substring or an MFW. Moreover, the codeword length of a substring or an MFW grows with the difference between the upper and lower bounds deduced from the inequalities; however, the lower bound is not tight. Therefore, we derive a new tight lower bound based on the contingency table and consequently propose a new CSE algorithm using the new inequality. We also propose a new encoding order of substrings and MFWs based on a sorted contingency table, in which both the row and column marginal totals are sorted in descending order instead of the lexicographical order used in previous studies, and we propose the first CSE algorithm to use this encoding order. Experimental results show that the compression ratios of all files of the Calgary corpus under the proposed algorithms are better than those of a previous study on CSE with a finite alphabet. Moreover, the compression ratios under the second proposed CSE are better than or equal to those of a well-known compressor for 11 of the 14 files in the corpus.
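For a binary alphabet, the contingency table for a substring w collects the circular occurrence counts C(awb) for symbols a, b, together with the marginals C(aw) and C(wb). A sketch of building such a table (function names are illustrative, not from the paper):

```python
def circular_count(s, w):
    """Count occurrences of substring w in the circular string s."""
    n = len(s)
    ext = s + s[:len(w) - 1]  # unwrap just enough for wraparound matches
    return sum(1 for i in range(n) if ext[i:i + len(w)] == w)

def contingency_table(s, w):
    """2x2 table of counts C(a + w + b) over the binary alphabet,
    plus the row marginals C(a + w) and column marginals C(w + b)."""
    alphabet = "01"
    cell = {(a, b): circular_count(s, a + w + b) for a in alphabet for b in alphabet}
    row = {a: circular_count(s, a + w) for a in alphabet}
    col = {b: circular_count(s, w + b) for b in alphabet}
    return cell, row, col
```

On a circular string every occurrence of aw is followed by exactly one symbol, so each row of cells sums to its marginal; the inequalities the encoder uses bound each cell given the marginals.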
Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools
This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external-memory suffix sorting using the induced sorting principle, and the third part distributed external-memory suffix sorting with a new distributed algorithmic big-data framework named Thrill.
Comment: 396 pages, dissertation, Karlsruher Institut für Technologie (2018). arXiv admin note: text overlap with arXiv:1101.3448 by other authors
Advanced Threat Intelligence: Interpretation of Anomalous Behavior in Ubiquitous Kernel Processes
Targeted attacks on digital infrastructures are a rising threat against the confidentiality, integrity, and availability of both IT systems and sensitive data. With the emergence of advanced persistent threats (APTs), identifying and understanding such attacks has become an increasingly difficult task. Current signature-based systems are heavily reliant on fixed patterns that struggle with unknown or evasive applications, while behavior-based solutions usually leave most of the interpretative work to a human analyst.
This thesis presents a multi-stage system able to detect and classify anomalous behavior within a user session by observing and analyzing ubiquitous kernel processes. Application candidates suitable for monitoring are initially selected through an adapted sentiment mining process using a score based on the log likelihood ratio (LLR). For transparent anomaly detection within a corpus of associated events, the author utilizes star structures, a bipartite representation designed to approximate the edit distance between graphs. Templates describing nominal behavior are generated automatically and are used for the computation of both an anomaly score and a report containing all deviating events. The extracted anomalies are classified using the Random Forest (RF) and Support Vector Machine (SVM) algorithms. Ultimately, the newly labeled patterns are mapped to a dedicated APT attacker–defender model that considers objectives, actions, actors, as well as assets, thereby bridging the gap between attack indicators and detailed threat semantics. This enables both risk assessment and decision support for mitigating targeted attacks.
Results show that the prototype system is capable of identifying 99.8% of all star structure anomalies as benign or malicious. In multi-class scenarios that seek to associate each anomaly with a distinct attack pattern belonging to a particular APT stage we achieve a solid accuracy of 95.7%. Furthermore, we demonstrate that 88.3% of observed attacks could be identified by analyzing and classifying a single ubiquitous Windows process for a mere 10 seconds, thereby eliminating the necessity to monitor each and every (unknown) application running on a system.
With its semantic take on threat detection and classification, the proposed system offers a formal as well as technical solution to an information security challenge of great significance. The financial support by the Christian Doppler Research Association, the Austrian Federal Ministry for Digital and Economic Affairs, and the National Foundation for Research, Technology and Development is gratefully acknowledged.
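The abstract does not spell out the LLR score used for candidate selection; a common choice for such mining steps is Dunning's log-likelihood ratio (the G² statistic) over a 2 × 2 contingency table, sketched here under that assumption:

```python
import math

def llr_2x2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table:
    k11 = joint occurrences, k12/k21 = one-sided occurrences, k22 = neither."""
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22  # row marginals
    c1, c2 = k11 + k21, k12 + k22  # column marginals

    def term(k, row, col):
        if k == 0:
            return 0.0
        expected = row * col / n  # count expected under independence
        return k * math.log(k / expected)

    return 2 * (term(k11, r1, c1) + term(k12, r1, c2)
                + term(k21, r2, c1) + term(k22, r2, c2))
```

A score near zero means the two events co-occur no more than chance predicts; large scores flag strongly associated candidates worth monitoring.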
Reference retrieval based on user induced dynamic clustering
PhD Thesis. The problem of mechanically retrieving references to documents, as a first step to fulfilling the information need of a researcher, is tackled through the design of an interactive computer program. A view of reference retrieval is presented which embraces the browsing activity. In fact, browsing is considered important and regarded as ubiquitous. Thus, for successful retrieval (in many circumstances), a device which permits conversation is needed. Approaches to automatic (delegated) retrieval are surveyed, as are on-line systems which support interaction. This type of interaction usually consists of iteration, under the user's control, in the query formulation process.
A program has been constructed to try out another approach to man-machine dialogue in this field. The machine builds a model of the user's interest, and chooses references for display according to its current state. The model is expressed in terms of the program's knowledge of the network of references and literature of the field, namely associated subject descriptors, authors and any other entity of potential interest. The user need not formulate a query: the model varies as a consequence of his reactions to references shown to him. The model can be regarded as a binary classification induced by the user's messages.
The program has been used experimentally with a small collection of references and the structured vocabulary from the MEDLARS system. A brief account of the program design methodology is also given.
Office for Scientific and Technical Information (OSTI)
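The thesis gives no algorithm here, but the loop it describes, show a reference, read the user's reaction, adjust the interest model, can be sketched as a weight update over the entities (descriptors, authors) attached to each reference; every name below is illustrative:

```python
from collections import defaultdict

def update_model(weights, entities, liked):
    """Raise or lower the interest weight of every entity (subject
    descriptor, author, ...) attached to the reference just shown."""
    delta = 1 if liked else -1
    for e in entities:
        weights[e] += delta

def choose_next(references, weights, seen):
    """Display next the unseen reference whose entities score highest
    under the current model state."""
    candidates = [r for r in references if r["id"] not in seen]
    return max(candidates, key=lambda r: sum(weights[e] for e in r["entities"]))
```

Thresholding the same entity scores at zero yields the binary relevant/non-relevant classification the abstract says the user's messages induce.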
Cyber Security and Critical Infrastructures
This book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed their innovative solutions, consistent with the interdisciplinary nature of cybersecurity. The book contains 16 articles: an editorial explaining current challenges, innovative solutions, and real-world experiences involving critical infrastructure; 15 original papers that present state-of-the-art innovative solutions to attacks on critical systems; and a review of security and privacy issues in cloud, edge, and fog computing.
Computer Aided Verification
The open access two-volume set LNCS 11561 and 11562 constitutes the refereed proceedings of the 31st International Conference on Computer Aided Verification, CAV 2019, held in New York City, USA, in July 2019. The 52 full papers presented together with 13 tool papers and 2 case studies were carefully reviewed and selected from 258 submissions. The papers were organized in the following topical sections. Part I: automata and timed systems; security and hyperproperties; synthesis; model checking; cyber-physical systems and machine learning; probabilistic systems; runtime techniques; dynamical, hybrid, and reactive systems. Part II: logics, decision procedures, and solvers; numerical programs; verification; distributed systems and networks; verification and invariants; and concurrency.