32,240 research outputs found
A Framework for Specifying and Monitoring User Tasks
Knowledge about user task execution can help systems better reason about when to interrupt users. To enable recognition and forecasting of task execution, we develop a novel framework for specifying and monitoring user task sequences. For task specification, our framework provides an XML-based language with tags inspired by regular expressions. For task monitoring, our framework provides an event handler that manages events from any instrumented application and a monitor that observes a user's transitions within and among specified tasks. The monitor supports multiple active tasks and multiple instances of the same task. The use of our framework will enable systems to consider a user's position within a task model when reasoning about when to interrupt
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
Comparing P2PTV Traffic Classifiers
Peer-to-Peer IP Television (P2PTV) applications represent one of the fastest growing application classes on the Internet, both in terms of their popularity and in terms of the amount of traffic they generate. While network operators require monitoring tools that can effectively analyze the traffic produced by these systems, few techniques have been tested on these mostly closed-source, proprietary applications. In this paper we examine the properties of three traffic classifiers applied to the problem of identifying P2PTV traffic. We report on extensive experiments conducted on traffic traces with reliable ground truth information, highlighting the benefits and shortcomings of each approach. The results show that not only their performance in terms of accuracy can vary significantly, but also that their usability features suggest different effective aspects that can be integrate
PDF-Malware Detection: A Survey and Taxonomy of Current Techniques
Portable Document Format, more commonly known as PDF, has become, in the last 20 years, a standard for document exchange and dissemination due its portable nature and widespread adoption. The flexibility and power of this format are not only leveraged by benign users, but from hackers as well who have been working to exploit various types of vulnerabilities, overcome security restrictions, and then transform the PDF format in one among the leading malicious code spread vectors. Analyzing the content of malicious PDF files to extract the main features that characterize the malware identity and behavior, is a fundamental task for modern threat intelligence platforms that need to learn how to automatically identify new attacks. This paper surveys existing state of the art about systems for the detection of malicious PDF files and organizes them in a taxonomy that separately considers the used approaches and the data analyzed to detect the presence of malicious code. © Springer International Publishing AG, part of Springer Nature 2018
Sequence Mining and Pattern Analysis in Drilling Reports with Deep Natural Language Processing
Drilling activities in the oil and gas industry have been reported over
decades for thousands of wells on a daily basis, yet the analysis of this text
at large-scale for information retrieval, sequence mining, and pattern analysis
is very challenging. Drilling reports contain interpretations written by
drillers from noting measurements in downhole sensors and surface equipment,
and can be used for operation optimization and accident mitigation. In this
initial work, a methodology is proposed for automatic classification of
sentences written in drilling reports into three relevant labels (EVENT,
SYMPTOM and ACTION) for hundreds of wells in an actual field. Some of the main
challenges in the text corpus were overcome, which include the high frequency
of technical symbols, mistyping/abbreviation of technical terms, and the
presence of incomplete sentences in the drilling reports. We obtain
state-of-the-art classification accuracy within this technical language and
illustrate advanced queries enabled by the tool.Comment: 7 pages, 14 figures, technical repor
Quantitative Regular Expressions for Arrhythmia Detection Algorithms
Motivated by the problem of verifying the correctness of arrhythmia-detection
algorithms, we present a formalization of these algorithms in the language of
Quantitative Regular Expressions. QREs are a flexible formal language for
specifying complex numerical queries over data streams, with provable runtime
and memory consumption guarantees. The medical-device algorithms of interest
include peak detection (where a peak in a cardiac signal indicates a heartbeat)
and various discriminators, each of which uses a feature of the cardiac signal
to distinguish fatal from non-fatal arrhythmias. Expressing these algorithms'
desired output in current temporal logics, and implementing them via monitor
synthesis, is cumbersome, error-prone, computationally expensive, and sometimes
infeasible.
In contrast, we show that a range of peak detectors (in both the time and
wavelet domains) and various discriminators at the heart of today's
arrhythmia-detection devices are easily expressible in QREs. The fact that one
formalism (QREs) is used to describe the desired end-to-end operation of an
arrhythmia detector opens the way to formal analysis and rigorous testing of
these detectors' correctness and performance. Such analysis could alleviate the
regulatory burden on device developers when modifying their algorithms. The
performance of the peak-detection QREs is demonstrated by running them on real
patient data, on which they yield results on par with those provided by a
cardiologist.Comment: CMSB 2017: 15th Conference on Computational Methods for Systems
Biolog
Applications of Machine Learning to Threat Intelligence, Intrusion Detection and Malware
Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between the malicious programs. Adversarial machine learning will also be discussed, because while ML can be used to solve problems or reduce analyst workload, it also introduces new attack surfaces
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
- âŠ