85 research outputs found

    On the Structure and Complexity of Rational Sets of Regular Languages

    Get PDF
    In a recent thread of papers, we have introduced FQL, a precise specification language for test coverage, and developed the test case generation engine FShell for ANSI C. In essence, an FQL test specification amounts to a set of regular languages, each of which has to be matched by at least one test execution. To describe such sets of regular languages, the FQL semantics uses an automata-theoretic concept known as rational sets of regular languages (RSRLs). RSRLs are automata whose alphabet consists of regular expressions. Thus, the language accepted by the automaton is a set of regular expressions. In this paper, we study RSRLs from a theoretic point of view. More specifically, we analyze RSRL closure properties under common set theoretic operations, and the complexity of membership checking, i.e., whether a regular language is an element of a RSRL. For all questions we investigate both the general case and the case of finite sets of regular languages. Although a few properties are left as open problems, the paper provides a systematic semantic foundation for the test specification language FQL

    Proactive Detection of Computer Worms Using Model Checking

    Get PDF
    Although recent estimates are speaking of 200,000 different viruses, worms, and Trojan horses, the majority of them are variants of previously existing malware. As these variants mostly differ in their binary representation rather than their functionality, they can be recognized by analyzing the program behavior, even though they are not covered by the signature databases of current antivirus tools. Proactive malware detectors mitigate this risk by detection procedures that use a single signature to detect whole classes of functionally related malware without signature updates. It is evident that the quality of proactive detection procedures depends on their ability to analyze the semantics of the binary. In this paper, we propose the use of model checkinga well-established software verification techniquefor proactive malware detection. We describe a tool that extracts an annotated control flow graph from the binary and automatically verifies it against a formal malware specification. To this end, we introduce the new specification language CTPL, which balances the high expressive power needed for malware signatures with efficient model checking algorithms. Our experiments demonstrate that our technique indeed is able to recognize variants of existing malware with a low risk of false positives. © 2006 IEEE

    Robust and Noise Resistant Wrapper Induction

    Get PDF
    Wrapper induction is the problem of automatically inferring a query from annotated web pages of the same template. This query should not only select the annotated content accurately but also other content following the same template. Beyond accurately matching the template, we consider two additional requirements: (1) wrappers should be robust against a large class of changes to the web pages, and (2) the induction process should be noise resistant, i.e., tolerate slightly erroneous (e.g., machine generated) samples. Key to our approach is a query language that is powerful enough to permit accurate selection, but limited enough to force noisy samples to be generalized into wrappers that select the likely intended items. We introduce such a language as subset of XPATH and show that even for such a restricted language, inducing optimal queries according to a suitable scoring is infeasible. Nevertheless, our wrapper induction framework infers highly robust and noise resistant queries. We evaluate the queries on snapshots from web pages that change over time as provided by the Internet Archive, and show that the induced queries are as robust as the human-made queries. The queries often survive hundreds sometimes thousands of days, with many changes to the relative position of the selected nodes (including changes on template level). This is due to the few and discriminative anchor (intermediately selected) nodes of the generated queries. The queries are highly resistant against positive noise (up to 50%) and negative noise (up to 20%)

    Proactive Detection of Computer Worms Using Model Checking

    Full text link

    PEACE-ful Web Event Extraction and Processing

    Get PDF
    Abstract. PEACE, our proposed tool, integrates complex event processing and web extraction into a unified framework to handle web event advertisements and to run a notification service atop. Its bitemporal schemata distinguish occurrence and detection time, enabling PEACE to deal with updates and delayed announcements, as often occurring on the web. To consolidate the arising event streams, PEACE combines simple events into complex ones. Depending on their occurrence and detection time, these complex events trigger actions to be executed. We demonstrate PEACE's capabilities with a business trip scenario, involving as raw events business trips, flight bookings, scheduled flights, and flight arrivals and departures. These events are scrapped from the web and combined into complex events, triggering actions to be executed, such as updating facebook status messages. Our demonstrator records and reruns event sequences at different speeds to show the system dealing with complex scenarios spanning several days

    Bitemporal Complex Event Processing of Web Event Advertisements

    Get PDF
    Abstract. The web is the largest bulletin board of the world. Events of all types, from flight arrivals to business meetings, are announced on this board. Tracking and reacting to such event announcements, however, is a tedious manual task, only slightly alleviated by email or similar notifications. Announcements are published with human readers in mind, and updates or delayed announcements are frequent. These characteristics have hampered attempts at automatic tracking. PEACE provides the first integrated framework for event processing on top of web event ads. Given a schema of events to be tracked, the framework populates this schema through compact wrappers for event announcement sources. These wrappers produce events including updates and retractions. PEACE then queries these events to detect complex events, often combining announcements from multiple sources. To deal with updates and delayed announcements, PEACE's schemas are bitemporal so as to distinguish between occurrence and detection time. This allows complex event specifications to track updates and to react to differences in occurrence and detection time. Our evaluation shows that extracting the event from an announcement dominates the processing of PEACE and that the complex event processor deals with several event announcement sources even with moderate resources. We further show, that simple restrictions on the complex event specifications suffice to guarantee that PEACE only requires a constant buffer to process arbitrarily many event announcements

    Closure properties and complexity of rational sets of regular languages

    Get PDF
    This work received funding in part by the National Research Network RiSE on Rigorous Systems Engineering (Austrian Science Fund (FWF): S11403-N23), by the Vienna Science and Technology Fund (WWTF) through grant PROSEED, by an Erwin Schrödinger Fellowship (Austrian Science Fund (FWF): J3696-N26), and by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007–2013)/ERC grant agreement DIADEM no. 246858
    corecore