
    An Investigation into World Wide Web Publishing with the Hypertext Markup Language

    The purpose of this thesis project was to test and demonstrate the World Wide Web as a publishing vehicle by creating a Web presence for the School of Printing Management and Sciences. To reach this goal, a full understanding of the Hypertext Markup Language first had to be gained. Once this was accomplished, issues regarding the integration of mixed-media elements within an HTML document were investigated. Once a prototype of the HTML document was complete, the mixed-media elements were tested and evaluated for proper integration and contextual cohesiveness. Many issues regarding the implementation of mixed-media elements, such as file size and file format, were addressed during testing. An additional goal of this project was a comprehensive description of the methodology for creating and maintaining a World Wide Web publishing presence. This addresses navigational software, structuring HTML documents, hypertext linking, HTML style issues and limitations, effective integration of mixed-media elements, inline and external image issues, testing documents, advertising documents, strategies for determining proper file sizes and formats of mixed-media elements, integrating supplemental programs, World Wide Web server issues, installing HTML and mixed-media files onto a World Wide Web server, and so on. The Web site located at http://www.rit.edu/~spms served as the vehicle for the investigation. Results of the study revealed the issues involved in providing data that serves users across a wide range of computer systems, with different bandwidth restrictions, using a myriad of computer software. Specific standards apply that alleviate much of the guesswork; however, publishing on the Internet remains as challenging as it is rewarding. The Web's format and the opportunity to reach millions of potential customers are creating new types of publishing ventures in true gold-rush fashion. The Web is being touted as the fourth medium, and some suggest it will have as great an impact on society as print, radio and television. The growth of the Web is explosive and will assuredly continue. Upon completion of this study, the author remains skeptical whether the World Wide Web is the medium of the future. It has, however, created a trend that will forever reshape the publishing world and the way information seekers receive their data. Publishing will change from a commodity-based market, where prices are based upon cost, to a service market, where prices are based upon the value of the information. Each reader requiring information tailored to their specific choice will pay for what they select; no more paying for an entire magazine or newspaper and reading only one article. The future of information dissemination is electronic, interactive and selective. Whether the delivery mechanism will be the World Wide Web remains to be seen.
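    One recurring practical concern in the methodology summarised above is choosing file sizes and formats of mixed-media elements that remain usable over limited bandwidth. As a rough, hypothetical illustration (the extensions and byte budgets below are assumptions, not figures from the thesis), a small Python script can audit a directory of assets before they are installed on a Web server:

```python
# Hypothetical sketch: audit mixed-media assets before installing them on a Web server.
# The allowed formats and size budgets are illustrative assumptions, not figures from the thesis.
from pathlib import Path

SIZE_BUDGET_BYTES = {
    ".gif": 50_000,    # assumed budget for inline images
    ".jpg": 100_000,   # assumed budget for external images
    ".au": 500_000,    # assumed budget for audio clips
}

def oversized_assets(root: str):
    """Yield (path, size, budget) for every asset that exceeds its assumed budget."""
    for path in Path(root).rglob("*"):
        budget = SIZE_BUDGET_BYTES.get(path.suffix.lower())
        if budget is not None and path.is_file() and path.stat().st_size > budget:
            yield path, path.stat().st_size, budget

if __name__ == "__main__":
    for path, size, budget in oversized_assets("htdocs"):
        print(f"{path}: {size} bytes exceeds the assumed {budget}-byte budget")
```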

    Investigation into the use of the World Wide Web as an interface for distributing electronic documents to and from a remote digital color printing site

    The World Wide Web and the Internet are the most talked-about and fastest-growing media for information and electronic document distribution. Their growth has had, and will continue to have, a great impact on all forms of media, due to their potential to reach millions of individuals. This project demonstrates the capability of the World Wide Web to serve not only as a publishing vehicle, but also as a means of communication and document distribution to a digital color printing facility. To show this, a Web site was built that incorporated the utilities needed for the successful exchange of data, such as links to additional software applications available on the Web, downloadable ICC Color Management profiles of the digital color press, a hypertext job estimate/information form, an FTP server for uploads, and directions on how to use the service and create the appropriate files. The result is a functional Web-based printing facility that eliminates the restrictions associated with geographical boundaries. The test of whether this site functioned properly was the successful use of the aforementioned applications and tools to create actual documents. Those documents, when put through the developed workflow, had to exhibit the designers' original intent when reproduced on a remote digital press and compared to their originals reproduced on that same press. The written portion of this thesis documents the procedures and the rationale behind the methodology used.
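    The abstract mentions an FTP server for uploads as part of the exchange workflow. A minimal sketch of such a submission step, assuming placeholder host, credentials, directory and file names rather than details from the thesis, might look like this in Python:

```python
# Hypothetical sketch: submit a print job to a remote digital colour printing site
# over FTP. Host name, credentials, upload directory and file names are placeholders.
from ftplib import FTP

def submit_job(host: str, user: str, password: str, document: str, job_form: str):
    """Upload a print-ready document together with its job estimate/information form."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd("incoming")                      # assumed upload directory
        for local_name in (document, job_form):
            with open(local_name, "rb") as fh:
                ftp.storbinary(f"STOR {local_name}", fh)

if __name__ == "__main__":
    submit_job("print.example.edu", "designer", "secret",
               "brochure_press_ready.pdf", "job_info.txt")
```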

    Applying Model Checking Techniques to Temporal Queries over World Wide Web

    We propose the use of temporal logic formulas for the investigation of the World Wide Web (WWW). The Semantic Web is known as a next-generation Web technology and is built on RDF. The semantics of RDF is given by the Knowledge Interchange Format, whose expressive power is equal to that of First-Order Logic (FOL). Because the expressive power of general FOL is not sufficient for describing temporal properties, we extend FOL to FO^2 or to other useful temporal logics using the Kripke structure extracted from RDF documents. The main advantage of our method is that it provides a means of querying a database system with rich, powerful temporal operators, and it also guarantees the correctness of a system if the system passes the examinations imposed by model checking on its temporal properties.
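    The abstract does not spell out how the Kripke structure is extracted or checked; as a loose illustration under assumed conventions (RDF subjects and objects treated as states, triples as transitions), the following Python sketch checks a simple EF-style reachability property by breadth-first search:

```python
# Loose illustration (not the authors' construction): treat RDF resources as
# Kripke states, triples as transitions, and check a simple reachability
# property of the form "EF p" by breadth-first search.
from collections import defaultdict, deque

def kripke_from_triples(triples):
    """Build a successor relation from (subject, predicate, object) triples."""
    successors = defaultdict(set)
    for s, _p, o in triples:
        successors[s].add(o)
    return successors

def ef_holds(successors, start, prop):
    """EF prop: some state satisfying `prop` is reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if prop(state):
            return True
        for nxt in successors[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Example: is a state representing ex:alice reachable from ex:doc1?
triples = [("ex:doc1", "ex:links", "ex:doc2"), ("ex:doc2", "ex:author", "ex:alice")]
print(ef_holds(kripke_from_triples(triples), "ex:doc1", lambda s: s == "ex:alice"))
```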

    Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

    Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention. Comment: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty. An appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html contains test data.
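    The paper's method is not reproduced here, but one simplified ingredient of finding candidate page pairs in parallel translation is matching URLs that differ only by a language marker; the sketch below illustrates that idea with assumed markers and example URLs (the full technique also compares document structure before accepting a pair):

```python
# Simplified illustration of one candidate-generation step for finding pages in
# parallel translation: pair URLs that differ only by a language marker
# (e.g. "/en/" vs "/fr/"). This is a sketch, not the paper's full method.
import re

LANGUAGE_MARKERS = [("en", "fr"), ("english", "french")]  # assumed markers

def candidate_pairs(urls):
    """Yield (url_a, url_b) pairs whose only difference is a language marker."""
    url_set = set(urls)
    for url in urls:
        for a, b in LANGUAGE_MARKERS:
            pattern = rf"\b{a}\b"
            if re.search(pattern, url):
                counterpart = re.sub(pattern, b, url)
                if counterpart in url_set:
                    yield url, counterpart

urls = ["http://site.example/en/about.html", "http://site.example/fr/about.html"]
print(list(candidate_pairs(urls)))
```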

    Investigating people: a qualitative analysis of the search behaviours of open-source intelligence analysts

    The Internet and the World Wide Web have become integral parts of the lives of many modern individuals, enabling almost instantaneous communication, sharing and broadcasting of thoughts, feelings and opinions. Much of this information is publicly facing and, as such, can be utilised in a multitude of online investigations, ranging from employee vetting and credit checking to counter-terrorism and fraud prevention/detection. However, the search needs and behaviours of these investigators are not well documented in the literature. In order to address this gap, an in-depth qualitative study was carried out in cooperation with a leading investigation company. The research contribution is an initial identification of Open-Source Intelligence investigator search behaviours and the procedures and practices that they undertake, along with an overview of the difficulties and challenges that they encounter as part of their domain. This lays the foundation for future research into the varied domain of Open-Source Intelligence gathering.

    Knowledge extraction from unstructured data and classification through distributed ontologies

    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several barriers to accessing published information and has become an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships among them that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext references) that only humans were able to understand. A growing desire to programmatically access pieces of data implicitly enclosed in documents has characterized the latest efforts of the Web research community. Direct access means structured data, thus enabling computing machinery to easily exploit the linking of different data sources. It has become crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data has become the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision.

    In the literature these problems have been addressed with several proposals and standards, which mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overload and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to managing large-scale RDF repositories and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge, and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased using a data redundancy algorithm that replicates each RDF triple on multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load-balancing algorithm is used to maintain a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT.

    Recently, the process of data structuring has gained more and more attention when applied to the large volume of textual information spread across the Web, such as legacy data, newspapers, scientific papers or (micro-)blog posts. This process mainly consists of three steps: (i) the extraction from the text of atomic pieces of information, called named entities; (ii) the classification of these pieces of information through ontologies; (iii) their disambiguation through Uniform Resource Identifiers (URIs) identifying real-world objects. As a step towards interconnecting the Web to real-world objects via named entities, different techniques have been proposed. The second objective of this work is to propose a comparison of these approaches in order to highlight strengths and weaknesses in different scenarios, such as scientific papers, newspapers, or user-generated content. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web User Interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project.

    Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues under investigation in this work. It then proposes an architecture to tackle the single-point-of-failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it, especially for the annotation task. Hence, this work describes an annotation tool for web editing and audio and video annotation, with a web front-end User Interface powered by a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as a technology that encompasses existing solutions in the named entity extraction field, and the NERD ontology is presented as a reference ontology in the field. Finally, this work highlights three use cases whose purpose is to reduce the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of data, and a scientific conference venue enhancer plugin on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens different research doors that mainly join two research communities: the Semantic Web and Natural Language Processing communities. The Web provides a considerable amount of data on which NLP techniques may shed light. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from raw text to real-world objects.
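    As a minimal sketch of the triple-placement idea described above, assuming a SHA-1 ring hash and a replication factor of three rather than the dissertation's actual parameters, each RDF triple can be mapped to a position on the DHT ring and stored on the next few successor peers:

```python
# Minimal sketch of DHT triple placement: each RDF triple is hashed onto a ring
# of peers and stored on the next `replicas` successor nodes, so that a peer
# failure does not lose the triple. Hashing scheme and replication factor are
# assumptions, not the dissertation's actual parameters.
import hashlib
from bisect import bisect_right

def ring_position(value: str) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def place_triple(triple, node_ids, replicas=3):
    """Return the peers responsible for storing `triple` (subject, predicate, object)."""
    ring = sorted(node_ids, key=ring_position)
    positions = [ring_position(n) for n in ring]
    key = ring_position(" ".join(triple))
    start = bisect_right(positions, key) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(replicas, len(ring)))]

nodes = ["peer-a", "peer-b", "peer-c", "peer-d"]
print(place_triple(("ex:alice", "foaf:knows", "ex:bob"), nodes))
```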

    International Legal Collections at U.S. Academic Law School Libraries

    This study examines how law librarians are participating in the process of creating new fields of international legal research and training. It investigates the current state of international legal collections at twelve public and private U.S. academic law school libraries, illuminating in the process some of the significant shifts that characterize the nature of professional librarianship and information science in the twenty-first century. Included in the study is a discussion of the reference works, research guides, and databases that make up these international legal collections. This is followed by a brief assessment of the trends and challenges faced by librarians who work in the field of professional legal education and scholarship.

    Image retrieval by hypertext links

    This paper presents a model for the retrieval of images from a large World Wide Web-based collection. Rather than considering complex visual recognition algorithms, the model presented is based on combining evidence from the text content and hypertext structure of the Web. The paper shows that certain types of query are amply served by this form of representation. It also presents a novel means of gathering relevance judgements.
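    The paper's exact retrieval model is not given in the abstract; purely as an illustration of combining the two evidence sources it names (text content and hypertext structure), a simple scoring function might weight query-term matches in the text surrounding an image reference together with matches in the anchor text of incoming links. All names and the weighting below are assumptions:

```python
# Illustrative sketch only, not the paper's model: score an image by (a) query
# terms in the text surrounding its reference and (b) query terms in the anchor
# text of pages linking to it, combined with an assumed weight.
def score_image(query_terms, surrounding_text, anchor_texts, link_weight=0.5):
    """Higher scores mean more query terms found in text and link evidence."""
    terms = {t.lower() for t in query_terms}
    text_hits = sum(w.lower() in terms for w in surrounding_text.split())
    anchor_hits = sum(w.lower() in terms
                      for anchor in anchor_texts
                      for w in anchor.split())
    return text_hits + link_weight * anchor_hits

print(score_image(["castle", "scotland"],
                  "A photograph of Edinburgh castle at dusk",
                  ["Scotland travel photos", "castle gallery"]))
```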

    On the evolution of digital evidence: novel approaches for cyber investigation

    Nowadays the Internet is the fulcrum of our world, and the World Wide Web is the key to accessing it. We develop relationships on social networks and entrust sensitive documents to online services. Desktop applications are being replaced by fully-fledged web applications that can be accessed from any device. This is possible thanks to new web technologies that are being introduced at a very fast pace. However, these advances come at a price. Today, the web is the principal means used by cyber-criminals to perform attacks against people and organizations. In a context where information is extremely dynamic and volatile, the fight against cyber-crime is becoming more and more difficult. This work is divided into two main parts, both aimed at fueling research against cybercrime. The first part takes a forensic perspective and exposes serious limitations of current investigation approaches when dealing with modern digital information. In particular, it shows how it is possible to leverage common Internet services in order to forge digital evidence, which can be exploited by a cyber-criminal to claim an alibi. A novel technique to track cyber-criminal activities on the Internet is then proposed, aimed at the acquisition and analysis of information from highly dynamic services such as online social networks. The second part is concerned with the investigation of criminal activities on the web. Aiming at raising awareness of upcoming threats, novel techniques for the obfuscation of web-based attacks are presented. These attacks leverage the same cutting-edge technology used nowadays to build pleasant and fully-featured web applications. Finally, a comprehensive study of today's top menaces on the web, namely exploit kits, is presented. The result of this study has been the design of new techniques and tools that can be employed by modern honeyclients to better identify and analyze these menaces in the wild. [edited by author]