
    Modular Web Queries — From Rules to Stores

    Even with all the progress in Semantic Web technology, accessing Web data remains challenging, with new Web query languages and approaches appearing regularly. Yet most of these languages, including W3C approaches such as XQuery and SPARQL, do little to cope with the explosion in data size and in the diversity and richness of schemata on the Web. In this paper we propose a straightforward step toward improving this situation that is simple to realize and yet effective: advanced module systems that make it possible to partition both (a) the evaluation and (b) the conceptual design of complex Web queries. They provide the query programmer with a powerful but easy-to-use high-level abstraction for packaging, encapsulating, and reusing conceptually related parts of a Web query (in our case, rules). The proposed module system combines ease of use, thanks to a simple core concept, the partitioning of rules and their consequences into flexible "stores", with ease of deployment, thanks to a reduction semantics. We focus on extending the rule-based Semantic Web query language Xcerpt with such a module system, though the same approach can be applied to other (rule-based) languages as well.
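    The core idea of partitioning rules and their consequences into stores can be illustrated with a minimal sketch in Python (the Rule, Store, and reduce_modules names are illustrative assumptions, not Xcerpt syntax):

        # Minimal sketch (illustrative, not Xcerpt): each store holds its own
        # rules and the consequences they derive, so a module can be evaluated
        # in isolation; the reduction step compiles the modules away.

        class Rule:
            def __init__(self, head, body):
                self.head, self.body = head, body   # body: set of required facts

        class Store:
            def __init__(self, rules):
                self.rules = list(rules)
                self.facts = set()                  # consequences local to this store

            def saturate(self, inputs=frozenset()):
                """Fire this store's rules to a fixpoint over its own facts."""
                self.facts |= set(inputs)
                changed = True
                while changed:
                    changed = False
                    for r in self.rules:
                        if r.body <= self.facts and r.head not in self.facts:
                            self.facts.add(r.head)
                            changed = True
                return self.facts

        def reduce_modules(stores):
            """Reduction semantics: merge every store's rules into one flat
            program, so no new evaluation machinery is needed."""
            return Store([r for s in stores for r in s.rules])

    A query importing a module would then see only the consequences derived in that module's store; the reduction step sketches why such a module system can be compiled away rather than requiring a new query evaluator.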

    Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

    Visually-situated language is ubiquitous -- sources range from textbooks with diagrams to web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to this diversity, previous work has typically relied on domain-specific recipes with limited sharing of the underlying data, model architectures, and objectives. We present Pix2Struct, a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct is pretrained by learning to parse masked screenshots of web pages into simplified HTML. The web, with its richness of visual elements cleanly reflected in the HTML structure, provides a large source of pretraining data well suited to the diversity of downstream tasks. Intuitively, this objective subsumes common pretraining signals such as OCR, language modeling, and image captioning. In addition to the novel pretraining strategy, we introduce a variable-resolution input representation and a more flexible integration of language and vision inputs, where language prompts such as questions are rendered directly on top of the input image. For the first time, we show that a single pretrained model can achieve state-of-the-art results in six out of nine tasks across four domains: documents, illustrations, user interfaces, and natural images.
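    One concrete ingredient of the approach, rendering a language prompt such as a question directly onto the input image, is easy to picture; here is a hedged sketch using Pillow (the helper name and layout are assumptions, not the authors' code):

        # Sketch (assumed helper, not the authors' code): paste the question as
        # a text header above the screenshot, so a single image-to-text model
        # sees both the visual context and the prompt as pixels.

        from PIL import Image, ImageDraw

        def render_prompt_on_image(screenshot, question, header_height=40):
            w, h = screenshot.size
            canvas = Image.new("RGB", (w, h + header_height), "white")
            ImageDraw.Draw(canvas).text((10, 10), question, fill="black")
            canvas.paste(screenshot, (0, header_height))
            return canvas

        # The combined image is then fed to the image-to-text model, which
        # generates the answer as text.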

    CSR Communication Model: Official Website as a Means of Stakeholder Dialogue, by Ati Harmoni, Sri Wulan Windu Ratih, and Purwanti

    This paper investigates the ways in which the corporations included in the Bisnis-27 Index disclose information on CSR through their official websites, identifies management's requirements for CSR communication through the website, and evaluates the level of use of website features within the framework of Media Richness Theory. The study was conducted by observing the CSR information presented on the web; the survey was followed by case studies of two of the companies. The case studies identified management's needs for web-based CSR communication through the Media Richness framework. Management's CSR communication requirements concern timeliness, presentation and organization, accessibility, and interaction, while the website features capable of facilitating these needs are immediacy, multiple cues, language variety, multiple addressability, personal sources, computer-processable memory, external recordability, and concurrency. The study showed that the CSR communication requirements considered critical by management result in moderate to high use of the web features, while the requirements considered non-critical result in low to moderate usage of web features.

    The Internet as a Service Channel in the Public Sector: A Substitute or Complement of Traditional Service Channels?

    The Internet has been used as a channel for public service delivery since the mid-1990s. During the first years of its existence it was believed to be the service channel of the future, one that would make all other channels obsolete. Until now, however, the telephone and face-to-face contact are still used more frequently and are rated higher. By comparing various studies recently conducted in a number of countries, this paper suggests that the characteristics of the Internet make it a suitable channel for basic transactions and simple information provision, and that the telephone and face-to-face contact remain prevalent at least for ambiguous and complex tasks. The Internet might therefore be a complementary channel rather than a substitute for traditional channels. The research findings are interpreted by means of Media Richness Theory, the Social Influence model, and Channel Expansion Theory.

    Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

    This paper proposes to advance the current state of the art in automatic Language Resource (LR) building by taking three elements into consideration: (i) the knowledge available in existing LRs, (ii) the vast amount of information available through the collaborative paradigm that has emerged from Web 2.0, and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish, and Parole-Simple-Clips for Italian) is extended with Named Entities (NEs) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem currently affecting the field of Computational Linguistics, by using the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification, and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583, and 125,806 NEs for English, Spanish, and Italian respectively. Finally, to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, the number of NEs acquired, and the richness of the information represented.
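    As a loose illustration of the final encoding step, here is a sketch that serialises one NE entry in an LMF-like XML structure (the element and attribute names loosely follow LMF conventions but are assumptions, as are the example identifiers):

        # Sketch: build one lexicon entry linking a named entity to an ontology
        # sense. Element/attribute names are assumptions loosely following LMF.

        import xml.etree.ElementTree as ET

        def lmf_entry(lemma, language, ontology, sense_id):
            entry = ET.Element("LexicalEntry")
            ET.SubElement(entry, "Lemma",
                          {"writtenForm": lemma, "language": language})
            ET.SubElement(entry, "Sense",
                          {"ontology": ontology, "idref": sense_id})
            return entry

        # e.g. an English NE mapped to a SUMO concept (identifiers illustrative)
        print(ET.tostring(lmf_entry("Rome", "en", "SUMO", "City"), "unicode"))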

    Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection

    In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection. Our system relies on a variety of engineered features originally used to detect propaganda, based on the assumption that biased messages are propagandistic in the sense that they promote a particular political cause or viewpoint. We trained a logistic regression model with features ranging from a simple bag of words to vocabulary richness and text readability features. Our system achieved 72.9% accuracy on the manually annotated test data and 60.8% on the test data annotated with distant supervision. Additional experiments showed that significant performance improvements can be achieved with better feature pre-processing.
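    The described feature setup, a bag of words combined with vocabulary-richness and readability measures feeding a logistic regression, can be approximated in a few lines; here is a minimal sketch with scikit-learn, where the two hand-crafted measures (type-token ratio, average word length) are common stand-ins rather than the team's exact features:

        # Minimal sketch of the described setup; the hand-crafted features are
        # common stand-ins (type-token ratio, average word length), not the
        # team's exact feature set.

        import numpy as np
        from scipy.sparse import csr_matrix, hstack
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression

        def extra_features(texts):
            rows = []
            for t in texts:
                words = t.split()
                ttr = len(set(words)) / max(len(words), 1)  # vocabulary richness
                avg_len = float(np.mean([len(w) for w in words])) if words else 0.0
                rows.append([ttr, avg_len])                 # crude readability proxy
            return csr_matrix(rows)

        texts = ["one hyperpartisan article text", "one mainstream article text"]
        labels = [1, 0]
        X = hstack([CountVectorizer().fit_transform(texts), extra_features(texts)])
        clf = LogisticRegression(max_iter=1000).fit(X, labels)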

    A theory of contracts for web services

    Contracts are behavioural descriptions of Web services. We devise a theory of contracts that formalises the compatibility of a client with a service, and the safe replacement of one service with another. The use of contracts statically ensures the successful completion of every possible interaction between compatible clients and services.

    The technical device that underlies the theory is the definition of filters, which are explicit coercions that prevent some possible behaviours of services and, in doing so, make services compatible with different usage scenarios. We show that filters can be seen as proofs of a sound and complete subcontracting deduction system which simultaneously refines and extends Hennessy's classical axiomatisation of the must testing preorder. The relation is decidable, and the decision algorithm is obtained via a cut-elimination process that proves the coherence of subcontracting as a logical system.

    Despite the richness of the technical development, the resulting approach is based on simple ideas and basic intuitions. Remarkably, its application is mostly independent of the language used to program the services or the clients. We also outline the possible practical impact of this work and the perspectives for future research that it opens.
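    The role of filters can be made concrete with a deliberately simplified toy model in Python (a crude assumption, not the paper's calculus: a contract is flattened to the set of actions a party may perform):

        # Toy model (simplified assumption, not the paper's calculus): a
        # contract is flattened to the set of actions a party may perform.

        def compatible(client_handles, service_actions):
            """Every action the service may perform is handled by the client,
            so every interaction completes successfully."""
            return service_actions <= client_handles

        def filter_service(service_actions, allowed):
            """A filter is an explicit coercion that prevents some behaviours
            of the service, possibly making it fit a new client."""
            return service_actions & allowed

        service = {"reply", "ack", "bulk_stream"}
        client = {"reply", "ack"}                   # cannot handle bulk_stream
        assert not compatible(client, service)
        assert compatible(client, filter_service(service, client))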