
    A personalized web page content filtering model based on segmentation

    In view of the massive explosion of content on the World Wide Web from diverse sources, content filtering tools have become essential. Filtering web page content is especially significant when pages are accessed by minors. Traditional web page blocking systems follow a Boolean methodology: either the full page is displayed or it is blocked completely. With the increasing dynamism of web pages, it is now common for different portions of a page to hold different types of content at different times. This paper proposes a model that blocks content at a fine-grained level: instead of blocking the whole page, it blocks only those segments that hold the objectionable content. The advantages of this method over traditional approaches are the fine-grained level of blocking and the automatic identification of the portions of the page to be blocked. Experiments on the proposed model indicate 88% accuracy in filtering out the target segments.
    Comment: 11 pages, 6 figures
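    To make the segment-level idea concrete, the sketch below splits a page into <div> segments and replaces only the offending ones. It is a minimal illustration, not the paper's model: the keyword blocklist stands in for whatever classifier produced the reported 88% accuracy, and all names are hypothetical.

```python
# Minimal sketch of segment-level blocking (assumes BeautifulSoup is installed).
# Segmentation is a naive <div>-level split; the keyword matcher below is a
# hypothetical stand-in for the paper's content classifier.
from bs4 import BeautifulSoup

BLOCKLIST = {"gambling", "violence"}  # hypothetical blocked-topic keywords

def is_objectionable(text: str) -> bool:
    """Stand-in classifier: flag a segment containing any blocked keyword."""
    return bool(set(text.lower().split()) & BLOCKLIST)

def filter_page(html: str, placeholder: str = "[blocked segment]") -> str:
    """Blank out only the offending segments, keeping the rest of the page."""
    soup = BeautifulSoup(html, "html.parser")
    for segment in soup.find_all("div"):
        if is_objectionable(segment.get_text(" ", strip=True)):
            segment.clear()              # drop the segment's contents...
            segment.append(placeholder)  # ...and leave a visible placeholder
    return str(soup)
```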

    Automated information extraction from web APIs documentation

    A fundamental characteristic of Web APIs is that, de facto, providers hardly follow any standard practices when implementing, publishing, and documenting their APIs. As a consequence, the discovery and use of these services by third parties is significantly hampered. In order to achieve further automation in exploiting Web APIs, we present an approach for automatically extracting relevant technical information from the Web pages that document them. In particular, we have devised two algorithms that automatically extract technical details such as operation names, operation descriptions, and URI templates from the documentation of Web APIs adopting either RPC or RESTful interfaces. The algorithms, which exploit advanced DOM processing as well as state-of-the-art Information Extraction and Natural Language Processing techniques, have been evaluated against a detailed dataset, exhibiting high precision and recall (around 90% for both REST and RPC APIs) and outperforming state-of-the-art information extraction algorithms.
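    As a hedged illustration of one sub-task described above, the sketch below pulls URI templates out of an HTML documentation page. The paper combines DOM processing with Information Extraction and NLP; this toy version only scans <code> and <pre> elements with a regular expression, and the example document is invented.

```python
# Toy extraction of URI templates (e.g. /users/{id}/posts) from API docs.
# Assumes BeautifulSoup; the regex and example page are illustrative only.
import re
from bs4 import BeautifulSoup

URI_TEMPLATE = re.compile(r"/\w+(?:/(?:\{\w+\}|\w+))+")

def extract_uri_templates(html: str) -> list[str]:
    """Scan code-like elements for path-shaped strings with {placeholders}."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for node in soup.find_all(["code", "pre"]):
        found.extend(URI_TEMPLATE.findall(node.get_text()))
    return sorted(set(found))

doc = "<p>Create a post:</p><code>POST /users/{user_id}/posts</code>"
print(extract_uri_templates(doc))  # ['/users/{user_id}/posts']
```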

    Extracting Informative Textual Parts from Web Pages Containing User-Generated Content

    The vast amount of user-generated content on the Web has increased the need for automatically processing the content of web pages. Web page segmentation and noise (non-informative segment) removal are important pre-processing steps in a variety of applications such as sentiment analysis, text summarization, and information retrieval. Currently, these two tasks tend to be handled separately, or are handled together without regard for the diversity of web corpora or for web page type detection. We present a unified approach that provides robust identification of the informative textual parts of web pages along with accurate page type detection. The proposed algorithm takes into account visual and non-visual characteristics of a web page and is able to remove noisy parts from three major categories of pages containing user-generated content (News, Blogs, Discussions). Using a human-annotated corpus covering diverse topics, domains, and templates, we demonstrate the learning abilities of our algorithm and examine its effectiveness both in extracting the informative textual parts and as a rule-based classifier for web page type detection in a realistic web setting.
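    The abstract does not disclose the algorithm's features, but a common baseline for this task scores DOM blocks by text length and link density. The sketch below shows only that baseline, as a hedged stand-in; the paper's learned model, visual features, and page-type detector are not reproduced, and the thresholds are invented.

```python
# Baseline noise removal: keep blocks that are long enough and not dominated
# by anchor text. Thresholds are illustrative, not from the paper.
from bs4 import BeautifulSoup

def informative_blocks(html: str, min_chars: int = 80,
                       max_link_density: float = 0.3) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    kept = []
    for block in soup.find_all(["p", "div"]):
        text = block.get_text(" ", strip=True)
        if len(text) < min_chars:
            continue  # too short to be an informative segment
        link_chars = sum(len(a.get_text(strip=True)) for a in block.find_all("a"))
        if link_chars / len(text) > max_link_density:
            continue  # mostly link text: likely navigation or boilerplate
        kept.append(text)
    return kept
```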

    Currency security and forensics: a survey

    By definition, the word currency refers to an agreed medium of exchange; a nation's currency is the formal medium enforced by the elected governing entity. Throughout history, issuers have faced one common threat: counterfeiting. Despite technological advancements, eliminating counterfeit production remains a distant goal. Scientific determination of authenticity requires a deep understanding of the raw materials and manufacturing processes involved. This survey synthesizes the current literature on the technology and mechanics of currency manufacture and security, while identifying gaps in that literature. Ultimately, a robust currency is desired.

    Human interaction with digital ink: legibility measurement and structural analysis

    Literature suggests that it is possible to design and implement pen-based computer interfaces that resemble the use of pen and paper. These interfaces appear to allow users freedom in expressing ideas and seem to be familiar and easy to use. Different ideas have been put forward concerning this type of interface; however, despite the commonality of aims and problems faced, there does not appear to be a common approach to their design and implementation. This thesis aims to progress the development of pen-based computer interfaces that resemble the use of pen and paper. To do this, a conceptual model is proposed for interfaces that enable interaction with "digital ink". This conceptual model is used to organize and analyse the broad range of literature related to pen-based interfaces, and to identify topics that are not sufficiently addressed by published research. Two issues highlighted by the model, digital ink legibility and digital ink structuring, are then investigated. In the first investigation, methods are devised to objectively and subjectively measure the legibility of handwritten script. These methods are then piloted in experiments that vary the horizontal rendering resolution of handwritten script displayed on a computer screen. Script legibility is shown to decrease once the rendering resolution drops below a threshold value. In the second investigation, the clustering of digital ink strokes into words is addressed. A method of rating the accuracy of clustering algorithms is proposed: the percentage of words spoiled. Using a clustering algorithm based on the geometric features of both the ink strokes and the gaps between them, the clustering error rate is found to vary among different writers. The work contributes a conceptual interface model, methods of measuring digital ink legibility, and techniques for investigating stroke clustering features to the field of digital ink interaction research.
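    A minimal sketch of the two ideas named above, stroke-to-word clustering and the "percentage of words spoiled" metric, follows. It reduces each stroke to a horizontal extent and uses a single gap threshold, which is far cruder than the geometric features the thesis investigates.

```python
# Gap-threshold word clustering over strokes given as (x_min, x_max) extents,
# ordered left to right, plus the percentage-of-words-spoiled error measure.
def cluster_strokes(extents: list[tuple[float, float]],
                    gap_threshold: float) -> list[list[int]]:
    """Start a new word whenever the gap to the previous stroke is too wide."""
    if not extents:
        return []
    words, current = [], [0]
    for i in range(1, len(extents)):
        gap = extents[i][0] - extents[i - 1][1]
        if gap > gap_threshold:
            words.append(current)
            current = []
        current.append(i)
    words.append(current)
    return words

def percent_words_spoiled(predicted: list[list[int]],
                          truth: list[list[int]]) -> float:
    """A word is spoiled unless its exact stroke grouping was recovered."""
    pred = {tuple(w) for w in predicted}
    spoiled = sum(1 for w in truth if tuple(w) not in pred)
    return 100.0 * spoiled / len(truth)
```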

    Computer-aided music therapy evaluation: investigating and testing the Music Therapy Logbook Prototype 1 system

    This thesis describes the investigation and testing of a prototype music therapy practice evaluation system: Music Therapy Logbook, Prototype 1. Such a system is intended to be used by music therapists as an aid to their existing evaluation techniques. The investigation of user needs, the multi-disciplinary teamwork, the pre-field and field recording tests, and the computational music analysis tests are each presented in turn, preceded by an in-depth literature review of historical and existing music therapy evaluation methods. A final chapter presents investigative design work for proposed user interface software pages for the Music Therapy Logbook system. Four surveys are presented (n = 6, n = 10, n = 44, n = 125). These gathered information on current music therapy evaluation methods, therapists' suggested functions for the system, and therapists' attitudes towards using the proposed automatic and semi-automatic music therapy evaluation functions, some of which were tested during the research period. The results indicate enthusiasm for using the system to record individual music therapy sessions, create written notes linked to recordings, and undertake automatic and/or semi-automatic computer-aided music therapy analysis, the main purpose of which is to quantify changes in a therapist's and patient's use of music over time (Streeter, 2010). Simulated music therapy improvisations were recorded and analysed. The system was then used by a music therapist working in a neuro-disability unit to record individual therapy sessions with patients with acquired brain injuries. These recordings constitute the first music therapy audio recordings to employ multi-track audio recording techniques using existing radio microphone technology. The computational music analysis tests applied to the recordings are the first such tests applied to recordings of music therapy sessions in which an individual patient played acoustic, rather than MIDI, instruments. The findings show it is possible to gather objective evidence of changes in a patient's and therapist's use of music over time using the Music Therapy Logbook Prototype 1 system.
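    As a loose illustration of the kind of objective, over-time measure the thesis argues for, the sketch below tracks a single activity statistic (note onsets per minute) across sessions. The onset timestamps are assumed to come from an upstream audio analysis step and the data are invented; none of the Logbook system's actual analyses are reproduced here.

```python
# Hypothetical per-session onset data; a rising onsets-per-minute figure would
# be one crude, objective sign of increasing musical activity over time.
def onsets_per_minute(onsets_sec: list[float], duration_sec: float) -> float:
    return 60.0 * len(onsets_sec) / duration_sec

sessions = {  # session name -> (onset timestamps in seconds, session length)
    "session_1": ([3.1, 9.8, 15.2, 40.0], 300.0),
    "session_2": ([2.0, 5.5, 7.1, 12.9, 30.3, 55.0], 300.0),
}
for name, (onsets, duration) in sessions.items():
    print(f"{name}: {onsets_per_minute(onsets, duration):.1f} onsets/min")
```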