
    Documents as functions

    Treating variable data documents as functions over their data bindings opens opportunities for building more powerful, robust and flexible document architectures to meet the needs arising from the confluence of developments in document engineering, digital printing technologies and marketing analysis. This thesis describes a combination of several XML-based technologies both to represent and to process variable documents and their data, leading to extensible, high-quality and 'higher-order' document generation solutions. The architecture (DDF) uses XML uniformly throughout the documents and their processing tools, with different semantic spaces interleaved through namespacing. An XML-based functional programming language (XSLT) is used to describe all intra-document variability and to implement most of the tools. Document layout intent is declared within a document as a hierarchical set of combinators attached to a tree-based graphical presentation. Evaluating a document bound to an instance of data involves using a compiler to create an executable from the document, running this with the data instance as argument to create a new document with its layout intent described, and then resolving that layout with an extensible layout processor. The use of these technologies, together with suitable design paradigms and coding protocols, makes it possible to construct documents that not only have high flexibility and quality, but also behave in higher-order ways. A document can be partially bound to data and evaluated, modifying its presentation while still remaining variably responsive to future data. Layout intent can be re-satisfied as presentation trees are modified by programmatic sections embedded within them. The key enablers are described and illustrated through examples.
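
    The following is a minimal, illustrative sketch (not the DDF architecture itself) of the core idea of treating a variable document as a function over its data bindings: an XSLT stylesheet plays the role of the document, and binding it to an XML data instance yields a concrete presentation. It assumes Python with the lxml package; the element names (customer, name, balance) are invented for the example.

```python
from lxml import etree

# A "document as a function": an XSLT stylesheet describing intra-document
# variability. Applying it to a data instance yields a concrete document.
VARIABLE_DOCUMENT = etree.XML("""
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/customer">
    <letter>
      <greeting>Dear <xsl:value-of select="name"/>,</greeting>
      <body>Your balance is <xsl:value-of select="balance"/>.</body>
    </letter>
  </xsl:template>
</xsl:stylesheet>
""")

# Bind the document to a data instance and evaluate it.
bind = etree.XSLT(VARIABLE_DOCUMENT)
data_instance = etree.XML(
    "<customer><name>Alice</name><balance>42.00</balance></customer>"
)
concrete_document = bind(data_instance)

print(etree.tostring(concrete_document, pretty_print=True).decode())
```

    Partial binding and layout resolution in the thesis go well beyond this, but the reading of the stylesheet as a function from data to document is the same.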

    A Survey of Paraphrasing and Textual Entailment Methods

    Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources. Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201
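
    As a small illustration of the relationship stated above (a sketch, not a method from the survey): given any textual-entailment recogniser, paraphrase recognition can be framed as entailment in both directions. The entailment predicate below is a deliberately naive word-overlap stand-in.

```python
def naive_entails(premise: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Toy entailment check: does the premise cover most of the hypothesis words?

    Real recognisers use lexical, syntactic and semantic evidence; this
    placeholder only measures word overlap.
    """
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(h & p) / max(len(h), 1) >= threshold


def is_paraphrase(a: str, b: str) -> bool:
    """Paraphrasing viewed as bidirectional textual entailment."""
    return naive_entails(a, b) and naive_entails(b, a)


print(is_paraphrase("the cat sat on the mat", "on the mat the cat sat"))  # True
print(is_paraphrase("the cat sat on the mat", "the cat sat"))             # False
```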

    Leveraging Semantic Annotations for Event-focused Search & Summarization

    Today, in this Big Data era, overwhelming amounts of textual information across different sources, with a high degree of redundancy, have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
    • We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt.
    • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event (a simplified sketch follows this abstract).
    • To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
    Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.
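
    The ILP formulation in the thesis jointly reasons over text, time, geolocations, and entities; the sketch below strips that down to the most basic extractive-summarization ILP (maximise sentence salience under a length budget) just to show the shape of such a program. It assumes the PuLP library; the candidate sentences, salience scores and lengths are invented.

```python
import pulp

# Toy candidate sentences with (salience, length in words); values are invented.
candidates = [
    ("Quake strikes off the coast at 05:12 local time.", 0.9, 9),
    ("Residents reported shaking for about 30 seconds.", 0.5, 7),
    ("The quake struck early in the morning.", 0.4, 7),
    ("Authorities issued a tsunami warning for coastal areas.", 0.8, 8),
]
budget = 20  # maximum digest length in words

prob = pulp.LpProblem("event_digest", pulp.LpMaximize)
x = [pulp.LpVariable(f"pick_{i}", cat="Binary") for i in range(len(candidates))]

# Objective: total salience of the selected sentences.
prob += pulp.lpSum(sal * x[i] for i, (_, sal, _) in enumerate(candidates))
# Constraint: the selected sentences must fit the length budget.
prob += pulp.lpSum(length * x[i] for i, (_, _, length) in enumerate(candidates)) <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
digest = [candidates[i][0] for i in range(len(candidates)) if x[i].value() > 0.5]
print("\n".join(digest))
```

    The thesis's formulation additionally performs global inference over the time, geolocation, and entity annotations of the event, which this skeleton omits.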

    Merging remotely sensed data with geophysical models

    Thesis (Ph.D.) University of Alaska Fairbanks, 1996. Geophysical models are usually derived from the idealistic viewpoint that all required external parameters are, in principle, measurable. The models are then driven with the best available data for those parameters. In some cases, there are few measurements available, because of factors such as the location of the phenomena modeled. Satellite imagery provides a synoptic overview of a particular environment, supplying spatial and temporal variability as well as spectral data, making this an ideal source of data for some models. In other cases, although frequent satellite image observations are available, they are of little use to the modeler, because they do not provide values for the parameters demanded by the model. This thesis contains two examples of geophysical models that were derived expressly to utilize measurements and qualitative observations taken from satellite images as the major driving elements of the model. The methodology consists of designing a model such that it can be 'run' by numerical data extracted from image data sets, and using the image data for verification of the model or adjustment of parameters. The first example is a thermodynamic model of springtime removal of nearshore ice from an Arctic river delta area, using the Mackenzie River as a study site. In this example, a multi-date sequence of AVHRR images is used to provide the spatial and temporal patterns of melt, allowing the required physical observations in the model to be parameterized and tested. The second example is a dynamic model simulating the evolution of a volcanic ash cloud under the influence of atmospheric winds. In this case, AVHRR images are used to determine the position and size of the ash cloud as a function of time, allowing tuning of parameters and verification of the model.
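
    Neither model from the thesis is reproduced here; the sketch below only illustrates the stated methodology of driving and tuning a model with image-derived numbers. A made-up one-parameter melt model is calibrated by grid search against melt fractions that would, in practice, be extracted from a multi-date AVHRR sequence; all values and names are hypothetical.

```python
# Hypothetical observations: (cumulative degree-days, melt fraction derived from imagery).
observations = [(5.0, 0.10), (12.0, 0.30), (20.0, 0.55), (30.0, 0.85)]

def melt_model(degree_days: float, k: float) -> float:
    """Invented one-parameter model: melt fraction grows linearly with degree-days, capped at 1."""
    return min(1.0, k * degree_days)

def calibrate(obs, k_values):
    """Pick the coefficient k that minimises squared error against image-derived observations."""
    def sse(k):
        return sum((melt_model(dd, k) - frac) ** 2 for dd, frac in obs)
    return min(k_values, key=sse)

best_k = calibrate(observations, [i / 1000 for i in range(1, 101)])  # search k in 0.001..0.100
print(f"calibrated k = {best_k:.3f}")
# The calibrated model can then be run forward and verified against later images.
```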

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, although their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. As a statement of research in progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.

    Execution level Java software and hardware for the NPS autonomous underwater vehicle

    Autonomous underwater vehicles (AUVs) have great potential for use by the United States Marine Corps and United States Navy. When performing amphibious operations, underwater mines present a danger for the forces going ashore. The use of underwater vehicles for the detection of these mines and for signaling to the Amphibious Ready Group is therefore very attractive. With advancements in hardware and object-oriented language technology, more complicated and robust software can be developed. The Naval Postgraduate School Center for AUV Research has been designing, building, operating, and researching AUVs since 1987. Each generation of vehicles has provided substantial increases in operational capability and in the level of sophistication of the hardware and software. With the advancements in real-time language support, object-oriented technology, and cost-efficient, high-performance hardware, this thesis lays the foundations for developing a software system for the execution level using the Java language. We look into the Java real-time specifications and extensions to become familiar with Java's capabilities for real-time support, and study Java boards and their application to embedded real-time systems. We developed an object-oriented design for the execution-level control software and implemented the design in Java. A testing phase is still in progress. http://archive.org/details/executionlevelja109455509 Captain, United States Marine Corps. Approved for public release; distribution is unlimited.

    Automated Digital Forensic Triage: Rapid Detection of Anti-Forensic Tools

    We live in the information age. Our world is interconnected by digital devices and electronic communication. As such, criminals are finding opportunities to exploit our information-rich electronic data. In 2014, the estimated annual cost from computer-related crime was more than 800 billion dollars. Examples include the theft of intellectual property, electronic fraud, identity theft and the distribution of illicit material. Digital forensics grew out of necessity to combat computer crime and involves the investigation and analysis of electronic data after a suspected criminal act. Challenges in digital forensics exist due to constant changes in technology. Investigation challenges include exponential growth in the number of cases and the size of targets; for example, forensic practitioners must analyse multi-terabyte cases comprising numerous digital devices. A variety of applied challenges also exist, due to continual technological advancements; for example, anti-forensic tools, including the malicious use of encryption or data wiping tools, hinder digital investigations by hiding or removing evidence. In response, the objective of the research reported here was to automate the effective and efficient detection of anti-forensic tools. A design science research methodology was selected as it provides an applied research method to design, implement and evaluate an innovative Information Technology (IT) artifact to solve a specified problem. The research objective required that a system be designed and implemented to perform automated detection of digital artifacts (e.g., data files and Windows Registry entries) on a target data set. The goal of the system is to automatically determine if an anti-forensic tool is present or absent, in order to prioritise additional in-depth investigation. The system performs rapid forensic triage, suitable for execution against multiple investigation targets, providing an analyst with high-level information regarding potential malicious anti-forensic tool usage. The system is divided into two main stages: 1) Design and implementation of a solution to automate creation of an application profile (application software reference set) of known unique digital artifacts; and 2) Digital artifact matching between the created reference set and a target data set. Two tools were designed and implemented: 1) A live differential analysis tool, named LiveDiff, to reverse engineer application software with a specific emphasis on digital forensic requirements; 2) A digital artifact matching framework, named Vestigium, to correlate digital artifact metadata and detect anti-forensic tool presence. In addition, a forensic data abstraction, named Application Profile XML (APXML), was designed to store and distribute digital artifact metadata. An associated Application Programming Interface (API), named apxml.py, was authored to provide automated processing of APXML documents. Together, the tools provided an automated triage system to detect anti-forensic tool presence on an investigation target. A two-phase approach was employed in order to assess the research products. The first phase of experimental testing involved demonstration in a controlled laboratory environment. First, the LiveDiff tool was used to create application profiles for three anti-forensic tools. The automated data collection and comparison procedure was more effective and efficient than previous approaches. Two data reduction techniques were tested to remove irrelevant operating system noise: application profile intersection and dynamic blacklisting were found to be effective in this regard. Second, the profiles were used as input to Vestigium and automated digital artifact matching was performed against authored known data sets. The results established the desired system functionality, and the demonstration then led to refinements of the system, as per the cyclical nature of design science. The second phase of experimental testing involved evaluation using two additional data sets to establish effectiveness and efficiency in a real-world investigation scenario. First, a public data set was subjected to testing to provide research reproducibility, as well as to evaluate system effectiveness in a variety of complex detection scenarios. Results showed the ability to detect anti-forensic tools using a different version than that included in the application profile and on a different Windows operating system version. Both are scenarios where traditional hash set analysis fails. Furthermore, Vestigium was able to detect residual and deleted information, even after a tool had been uninstalled by the user. The efficiency of the system was determined and refinements made, resulting in an implementation that can meet forensic triage requirements. Second, a real-world data set was constructed using a collection of second-hand hard drives. The goal was to test the system using unpredictable and diverse data to provide more robust findings in an uncontrolled environment. The system detected one anti-forensic tool on the data set and processed all input data successfully without error, further validating system design and implementation. The key outcome of this research is the design and implementation of an automated system to detect anti-forensic tool presence on a target data set. Evaluation suggested the solution was both effective and efficient, adhering to forensic triage requirements. Furthermore, techniques not previously utilised in forensic analysis were designed and applied throughout the research: dynamic blacklisting and profile intersection removed irrelevant operating system noise from application profiles; metadata matching methods resulted in efficient digital artifact detection, and path normalisation aided full path correlation in complex matching scenarios. The system was subjected to rigorous experimental testing on three data sets that comprised more than 10 terabytes of data. The ultimate outcome is a practically implemented solution that has been executed on hundreds of forensic disk images, thousands of Windows Registry hives, more than 10 million data files, and approximately 50 million Registry entries. The research has resulted in the design of a scalable triage system implemented as a set of computer forensic tools.
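
    The sketch below is a simplified, hypothetical illustration of the triage idea described above, not the LiveDiff/Vestigium/APXML implementation: a reference set for a tool is built by intersecting artifact captures from repeated installations and subtracting baseline operating-system noise (dynamic blacklisting), and the target's artifact paths are then matched after normalisation. All data structures, paths and function names are invented.

```python
import re

def normalise(path: str) -> str:
    """Normalise an artifact path: forward slashes, lower case, user profile folded to a placeholder."""
    p = path.replace("\\", "/").lower()
    return re.sub(r"^c:/users/[^/]+/", "c:/users/<user>/", p)

def build_reference_set(captures: list[set[str]], baseline_noise: set[str]) -> set[str]:
    """Intersect repeated captures, then drop baseline noise (dynamic blacklisting)."""
    reference = set.intersection(*[{normalise(p) for p in c} for c in captures])
    return reference - {normalise(p) for p in baseline_noise}

def triage(reference: set[str], target_artifacts: set[str], threshold: float = 0.5) -> tuple[bool, float]:
    """Flag the target for deeper investigation if enough reference artifacts are present."""
    target = {normalise(p) for p in target_artifacts}
    score = len(reference & target) / len(reference) if reference else 0.0
    return score >= threshold, score

# Invented example: two captures of a wiping tool's installation, OS noise, and a target listing.
capture_1 = {r"C:\Program Files\Wiper\wiper.exe", r"C:\Users\Alice\AppData\wiper.log", r"C:\Windows\Temp\tmp001"}
capture_2 = {r"C:\Program Files\Wiper\wiper.exe", r"C:\Users\Bob\AppData\wiper.log", r"C:\Windows\Temp\tmp002"}
noise     = {r"C:\Windows\Temp\tmp001", r"C:\Windows\Temp\tmp002"}
target    = {r"c:\program files\wiper\WIPER.EXE", r"c:\windows\notepad.exe"}

ref = build_reference_set([capture_1, capture_2], noise)
flagged, score = triage(ref, target)
print(flagged, round(score, 2))
```

    Matching on normalised artifact metadata rather than file hashes is what allows detection across tool versions and Windows versions, as described above.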

    The IP Law Book Review, vol. 2 #1, September 2011

    Reviews and Reviewers:
    • MAKING AND UNMAKING INTELLECTUAL PROPERTY: CREATIVE PRODUCTION IN LEGAL AND CULTURAL PERSPECTIVE, edited by Mario Biagioli, Peter Jaszi, and Martha Woodmansee. Reviewed by Rebecca Tushnet, Georgetown University Law School.
    • COPYRIGHT LAW AND THE PUBLIC INTEREST IN THE NINETEENTH CENTURY, by Isabella Alexander. Reviewed by H. Tomas Gomez-Arostegui, Lewis & Clark Law School.
    • THE GLOBAL GOVERNANCE OF KNOWLEDGE: PATENT OFFICES AND THEIR CLIENTS, by Peter Drahos. Reviewed by Margo A. Bagley, University of Virginia School of Law.
    • TRADEMARK LAW AND THEORY: A HANDBOOK OF CONTEMPORARY RESEARCH, edited by Graeme B. Dinwoodie and Mark D. Janis. Reviewed by Leah Chan Grinvald, Saint Louis University School of Law.
    • FIGURES OF INVENTION: A HISTORY OF MODERN PATENT LAW, by Alain Pottage and Brad Sherman. Reviewed by Oren Bracha, The University of Texas School of Law.

    Semantic multimedia modelling & interpretation for annotation

    The emergence of multimedia-enabled devices, particularly the incorporation of cameras in mobile phones, together with the accelerated revolution in low-cost storage devices, has boosted the multimedia data production rate drastically. Witnessing such a ubiquity of digital images and videos, the research community has been highlighting the issue of their effective utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community has been progressively shifting its emphasis to the personalization of these media. The main impediment in image and video analysis is the semantic gap, the discrepancy between a user's high-level interpretation of an image or video and its low-level computational interpretation. Content-based image and video annotation systems are remarkably susceptible to the semantic gap due to their reliance on low-level visual features for delineating semantically rich image and video content. However, visual similarity is not semantic similarity, so there is a need to break through this dilemma in an alternative way. The semantic gap can be narrowed by incorporating high-level and user-generated information into the annotation. High-level descriptions of images and videos are more capable of capturing the semantic meaning of multimedia content, but it is not always possible to collect this information. It is commonly agreed that the problem of high-level semantic annotation of multimedia is still far from being solved. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high-level annotation and intends to bridge the gap between visual features and semantics. It proposes a framework for annotation enhancement and refinement for object/concept-annotated image and video datasets. The overall theme is to first purify the datasets of noisy keywords and then expand the concepts lexically and common-sensically to fill the vocabulary and lexical gap, achieving high-level semantics for the corpus. The dissertation also explores a novel approach for high-level semantic (HLS) propagation through image corpora. HLS propagation takes advantage of semantic intensity (SI), the concept-dominance factor of an image, together with annotation-based semantic similarity between images: an image is a combination of various concepts, some of which are more dominant than others, and the semantic similarity of a pair of images is based on their SI values and the semantic similarity of their concepts. Moreover, HLS propagation exploits clustering techniques to group similar images, so that a single effort by a human expert to assign high-level semantics to a randomly selected image is propagated to other images through clustering. The investigation has been carried out on the LabelMe image and LabelMe video datasets. Experiments show that the proposed approaches yield a noticeable improvement towards bridging the semantic gap and that the proposed system outperforms traditional systems.
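
    A toy sketch of the propagation idea described above (my own simplification, not the dissertation's algorithm): each image carries concept tags with semantic-intensity weights, pairwise similarity is computed from SI-weighted tag overlap, images are grouped by a simple similarity threshold, and a high-level label assigned by hand to one image in a group is propagated to the rest. Concept names, weights, and the threshold are invented.

```python
# Each image: {concept: semantic intensity (SI), i.e. how dominant the concept is}.
images = {
    "img1": {"beach": 0.7, "sea": 0.2, "person": 0.1},
    "img2": {"sea": 0.5, "beach": 0.4, "boat": 0.1},
    "img3": {"street": 0.6, "car": 0.3, "person": 0.1},
    "img4": {"car": 0.5, "street": 0.4, "tree": 0.1},
}

def si_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """SI-weighted overlap: shared concepts count by their minimum intensity."""
    shared = sum(min(a[c], b[c]) for c in a.keys() & b.keys())
    total = (sum(a.values()) + sum(b.values())) / 2
    return shared / total if total else 0.0

def cluster(imgs: dict[str, dict[str, float]], threshold: float = 0.4) -> list[set[str]]:
    """Greedy threshold clustering: attach an image to the first compatible group."""
    groups: list[set[str]] = []
    for name, tags in imgs.items():
        for group in groups:
            if all(si_similarity(tags, imgs[other]) >= threshold for other in group):
                group.add(name)
                break
        else:
            groups.append({name})
    return groups

def propagate(groups, manual_labels: dict[str, str]) -> dict[str, str]:
    """Spread each manually assigned high-level label to the rest of its group."""
    labels = {}
    for group in groups:
        hls = next((manual_labels[m] for m in group if m in manual_labels), None)
        if hls:
            labels.update({m: hls for m in group})
    return labels

groups = cluster(images)
print(propagate(groups, {"img1": "seaside holiday", "img3": "urban traffic"}))
```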

    Exploiting Context-Dependent Quality Metadata for Linked Data Source Selection

    The traditional Web is evolving into the Web of Data, which consists of huge collections of structured data spread over poorly controlled distributed data sources. Live queries are needed to get current information out of this global data space. In live query processing, source selection deserves attention since it allows us to identify the sources that are likely to contain the relevant data. The thesis proposes a source selection technique in the context of live query processing on Linked Open Data that takes into account the context of the request and the quality of the data contained in the sources, in order to enhance both the relevance of the answers (since the context enables a better interpretation of the request) and their quality (the answers being obtained by processing the request on the selected sources). Specifically, the thesis proposes an extension of the QTree indexing structure, originally proposed as a data summary to support source selection based on source content, to take quality and contextual information into account. With reference to a specific case study, the thesis also contributes an approach, relying on the Luzzu framework, to assess the quality of a source with respect to a given context (according to different quality dimensions). An experimental evaluation of the proposed techniques is also provided.
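
    Below is a small, hypothetical sketch of the kind of context- and quality-aware source ranking discussed above; it does not reproduce the QTree extension or the Luzzu-based assessment. Sources are scored by combining a content-match estimate with a context-dependent quality score, and only the top-ranked sources are selected for live query processing. All numbers, predicates and names are invented.

```python
# Hypothetical source summaries: predicates each source is known to cover,
# plus pre-computed quality scores per context (e.g. from a Luzzu-style assessment).
sources = {
    "src_a": {"predicates": {"dbo:population", "dbo:country"},
              "quality": {"tourism": 0.6, "statistics": 0.9}},
    "src_b": {"predicates": {"dbo:population", "geo:lat", "geo:long"},
              "quality": {"tourism": 0.8, "statistics": 0.5}},
    "src_c": {"predicates": {"foaf:name"},
              "quality": {"tourism": 0.4, "statistics": 0.4}},
}

def rank_sources(query_predicates: set[str], context: str, alpha: float = 0.6) -> list[tuple[str, float]]:
    """Rank sources by a weighted mix of content match and context-dependent quality."""
    scored = []
    for name, summary in sources.items():
        content = len(summary["predicates"] & query_predicates) / len(query_predicates)
        quality = summary["quality"].get(context, 0.0)
        scored.append((name, alpha * content + (1 - alpha) * quality))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Select sources for a live query asking for population and coordinates, in a "tourism" context.
query = {"dbo:population", "geo:lat", "geo:long"}
print(rank_sources(query, context="tourism")[:2])
```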