On-Device Information Extraction from Screenshots in form of tags

Author: Agarwal Ankur
Changmai Benu
Goyal Manoj
Kumar Sumit
Mohanty Debi
Moharana Sukumar
Ramena Gopi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/01/2020

We propose a method to make mobile screenshots easily searchable. In this paper, we present the workflow in which we: 1) preprocessed a collection of screenshots, 2) identified the script present in each image, 3) extracted unstructured text from the images, 4) identified the language of the extracted text, 5) extracted keywords from the text, 6) identified tags based on image features, 7) expanded the tag set by identifying related keywords, and 8) associated the ranked tags with the relevant images and indexed them to make the images searchable on device. We built the pipeline to support multiple languages and executed it on-device, which addresses privacy concerns. We developed novel architectures for components in the pipeline and optimized performance and memory for on-device computation. We observed from experimentation that the solution developed can reduce overall user effort and improve end user experience while searching, whose results are published.
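The later stages of the workflow above (keyword extraction, tag expansion, indexing and search) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes text has already been extracted from each screenshot, and the stopword list, the `RELATED` expansion table, the frequency-based ranking and all function names are hypothetical stand-ins for the paper's models.

```python
from collections import Counter, defaultdict

# Hypothetical stand-ins: the paper's own models are not described in the abstract.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "for", "on", "is", "your"}
RELATED = {"flight": ["travel"], "invoice": ["bill", "payment"]}  # assumed expansion table


def extract_keywords(text, top_k=3):
    """Rank non-stopword tokens by frequency (a simple stand-in for step 5)."""
    tokens = [t.strip(".,:;!?").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    # Sort by descending count, then alphabetically for deterministic output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def expand_tags(tags):
    """Add related keywords from the lookup table (a stand-in for step 7)."""
    expanded = list(tags)
    for tag in tags:
        for extra in RELATED.get(tag, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def build_index(screenshots):
    """Map each tag to the set of screenshot ids carrying it (a stand-in for step 8)."""
    index = defaultdict(set)
    for shot_id, text in screenshots.items():
        for tag in expand_tags(extract_keywords(text)):
            index[tag].add(shot_id)
    return index


def search(index, query):
    """Return the sorted ids of screenshots tagged with the query term."""
    return sorted(index.get(query.lower(), set()))


screenshots = {
    "shot1.png": "Flight booking confirmed. Flight departs on Monday.",
    "shot2.png": "Invoice total due. Invoice number 1042.",
}
index = build_index(screenshots)
print(search(index, "travel"))   # matched through the expanded tag set
print(search(index, "payment"))
```

Because expansion happens at indexing time, a query such as "travel" finds a screenshot whose text only mentions "flight"; the trade-off is a larger index stored on the device.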

arXiv.org e-Print Archive

Crossref

Automating Content Extraction of HTML Documents

Author: Grimm Peter
Gupta Suhit
Kaiser Gail E.
Starren Justin
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2004

Web pages often contain clutter (such as unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction of 'useful and relevant' content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, and text summarization. Most approaches to making content more readable involve changing font size or removing HTML and data components such as images, which takes away from a webpage's inherent look and feel. Unlike 'Content Reformatting', which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses 'Content Extraction'. We have developed a framework that employs an easily extensible set of techniques. It incorporates advantages of previous work on content extraction. Our key insight is to work with DOM trees, a W3C specified interface that allows programs to dynamically access document structure, rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy to extract content from HTML web pages. This proxy can be used both centrally, administered for groups of users, and by individuals for personal browsers. After receiving feedback from users about the proxy, we have also created a revised version with improved performance and accessibility in mind.
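The idea of working on a DOM tree rather than raw markup can be illustrated with a short sketch. It is not the authors' proxy: it assumes the input is already well-formed XHTML (so the standard-library `xml.dom.minidom` parser accepts it), and the `CLUTTER_TAGS` rule and the function names are hypothetical.

```python
from xml.dom import minidom

# Assumed clutter rule for illustration: drop these elements wholesale.
CLUTTER_TAGS = {"script", "img", "iframe", "nav", "aside"}


def prune(node):
    """Remove clutter elements from the DOM subtree rooted at node, in place."""
    for child in list(node.childNodes):  # copy the list: we mutate while iterating
        if child.nodeType == child.ELEMENT_NODE and child.tagName.lower() in CLUTTER_TAGS:
            node.removeChild(child)
        else:
            prune(child)


def text_of(node):
    """Collect the non-empty text nodes that remain, in document order."""
    if node.nodeType == node.TEXT_NODE:
        return [node.data.strip()] if node.data.strip() else []
    parts = []
    for child in node.childNodes:
        parts.extend(text_of(child))
    return parts


def extract_content(xhtml):
    """Parse well-formed XHTML into a DOM tree, prune clutter, return the text."""
    doc = minidom.parseString(xhtml)
    prune(doc.documentElement)
    return " ".join(text_of(doc.documentElement))


page = (
    "<html><body>"
    "<nav><a href='/'>Home</a><a href='/ads'>Deals</a></nav>"
    "<div><h1>Title</h1><p>Body of the article.</p><img src='banner.png'/></div>"
    "<aside>Sponsored links</aside>"
    "</body></html>"
)
print(extract_content(page))
```

Operating on nodes means a whole subtree (for example a navigation bar and every link inside it) is removed in one step, which is awkward to do reliably with pattern matching on raw HTML. Real-world pages are rarely well-formed, so a tolerant HTML parser would be needed in practice.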

Columbia University Academic Commons

Medical Analysis Question and Answering Application for Internet Enabled Mobile Devices

Author: Nguyen Loc
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2011

Mobile devices such as smartphones, the iPhone, and the iPad have become more popular in recent years. With access to the Internet through cellular or Wi-Fi networks, these mobile devices can make use of the vast amount of information available on the Internet. Unlike a desktop or laptop computer, an Internet-enabled mobile device is designed to be carried around and available to the owner almost instantly at any moment of the day. Despite this advantage and potential, searching for information with a mobile device remains a difficult task. Mobile device users have to juggle different search engines through the web browser to find useful information. The small screen of most mobile devices makes it very difficult to interact with search engines using the device's web browser. This project focuses on the development of a mobile application that attempts to make information searching easier for mobile device users. In addition, the application is designed to answer search queries related to the medical field. Search queries in the medical field require more selective results, and the results must come from trusted sources before they can be used. The application addresses this problem by introducing a database of community knowledge designed to store credible medical articles that could help answer medical search queries. This would help minimize the need to look for answers on the Internet, which can often contain unreliable information. In addition to answering search queries, the application also allows users to store personal medical records and conveniently share that information as needed by email.

SJSU ScholarWorks

Retrieving Ambiguous Sounds Using Perceptual Timbral Attributes in Audio Production Environments

Author: Correya Albin Andrew

For over a decade, one of the well-identified problems within audio production environments has been the effective retrieval and management of sound libraries. Most self-recorded and commercially produced sound libraries are well structured in terms of metadata and textual descriptions, allowing traditional text-based retrieval approaches to obtain satisfactory results. However, traditional information retrieval techniques are limited in retrieving ambiguous sound collections (i.e., sounds with no identifiable origin, foley sounds, synthesized sound effects, abstract sounds) due to the difficulty of describing them textually and the complex psychoacoustic nature of the sound. Early psychoacoustical studies propose perceptual acoustical qualities as an effective way of describing this category of sounds [1]. In Music Information Retrieval (MIR) studies, this problem has mostly been studied and explored in the context of content-based audio retrieval. However, we observed that most commercially available systems integrate neither advanced content-based sound descriptions nor the visualization and interface design approaches that have evolved in recent years. Our research mainly aimed to investigate two things: 1. the development of an audio retrieval system incorporating high-level timbral features as search parameters; 2. a user-centered approach to integrating these features into audio production pipelines using expert-user studies. In this project, we present a prototype that is similar to traditional sound browsers (list-based browsing), with the added functionality of filtering and ranking sounds by perceptual timbral features such as brightness, depth, roughness and hardness. Our main focus was on the retrieval process by timbral features.
Inspired by the recent focus on user-centered systems ([2], [3]) in the MIR community, we conducted in-depth interviews and a qualitative evaluation of the system with expert users in order to identify the underlying problems. Our studies observed the potential applications of high-level perceptual timbral features in audio production pipelines using a probe system and expert-user studies. We also outlined future guidelines and possible improvements to the system from the outcomes of this research.
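Filtering and ranking a sound list by timbral features, as the prototype is described, might look like the sketch below. It is illustrative only: the `Sound` record, the 0-to-1 feature scale, the threshold-style filter and the example values are assumptions, and the abstract does not say how the features are computed.

```python
from dataclasses import dataclass


@dataclass
class Sound:
    name: str
    # Hypothetical precomputed timbral scores, assumed to lie in [0, 1].
    brightness: float
    depth: float
    roughness: float
    hardness: float


def filter_and_rank(sounds, rank_by, minimums=None):
    """Keep sounds meeting every minimum score, then sort by one feature, highest first."""
    minimums = minimums or {}
    kept = [
        s for s in sounds
        if all(getattr(s, feature) >= floor for feature, floor in minimums.items())
    ]
    return sorted(kept, key=lambda s: getattr(s, rank_by), reverse=True)


library = [
    Sound("whoosh", brightness=0.8, depth=0.2, roughness=0.3, hardness=0.1),
    Sound("rumble", brightness=0.1, depth=0.9, roughness=0.6, hardness=0.4),
    Sound("clank", brightness=0.7, depth=0.3, roughness=0.8, hardness=0.9),
]

# Hard-sounding material only, brightest first.
for sound in filter_and_rank(library, rank_by="brightness", minimums={"hardness": 0.3}):
    print(sound.name, sound.brightness)
```

This keeps the familiar list-based browsing model: the result is still a flat list, only its membership and order depend on perceptual attributes instead of text metadata.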

ZENODO

Owl Eyes: Spotting UI Display Issues via Visual Understanding

Author: Atif
Berson Alex
Breiman Leo
Chen Chunyang
Chen Jieshan
Chen Sen
Chunyang Chen
Gao Yi
Goodfellow Ian
Ioffe Sergey
Jieshan Chen
Ki Taeyeon
Kotsiantis Sotiris B
LeCun Yann
Manning Christopher D
Martin James
Moran Kevin
Oda Yusuke
Siegel Sidney
Simard Patrice Y.
Simonyan Karen
Wang Junjie
Zhang Tao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/09/2020

A Graphical User Interface (GUI) provides a visual bridge between a software application and end users, through which they can interact with each other. With the development of technology and aesthetics, the visual effects of GUIs are increasingly attractive. However, such GUI complexity poses a great challenge to GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screens, and missing images always occur during GUI rendering on different devices due to software or hardware compatibility. They negatively influence app usability, resulting in poor user experience. To detect these issues, we propose a novel approach, OwlEye, based on deep learning for modelling visual information of the GUI screenshot. OwlEye can therefore detect GUIs with display issues and also locate the detailed region of the issue in the given GUI to guide developers in fixing the bug. We manually construct a large-scale labelled dataset of 4,470 GUI screenshots with UI display issues and develop a heuristics-based data augmentation method to boost the performance of OwlEye. The evaluation demonstrates that OwlEye can achieve 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing these issues. We also evaluate OwlEye with popular Android apps on Google Play and F-droid, and successfully uncover 57 previously-undetected UI display issues, with 26 of them being confirmed or fixed so far. Comment: Accepted to 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 20
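For readers unfamiliar with the detection metrics quoted above, the sketch below shows how precision and recall follow from confusion counts. The counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper's evaluation.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of flagged screenshots that are truly buggy
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of buggy screenshots that were flagged
    return precision, recall


# Hypothetical counts, not the paper's data.
p, r = precision_recall(tp=40, fp=10, fn=10)
print(f"precision={p:.2f} recall={r:.2f}")
```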

arXiv.org e-Print Archive

Crossref

DOM-based Content Extraction of HTML Documents

Author: Grimm Peter
Gupta Suhit
Kaiser Gail E.
Neistadt David
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2002

Web pages often contain clutter around the body of the article as well as distracting features that take away from the true information that the user is pursuing. This can range from pop-up ads to flashy banners to unnecessary images and links scattered around the screen. Extraction of 'useful and relevant' content from web pages has many applications, ranging from lightweight environments, like cell phone and PDA browsing, to speech rendering for the visually impaired, to text summarization. Most approaches to removing the clutter or making the content more readable involve either changing the size of the font or simply removing certain HTML-denoted components like images, thus taking away from the webpage's inherent look and feel. Unlike Content Reformatting, which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses Content Extraction. We have developed a framework that employs an easily extensible set of techniques that incorporate advantages of previous work on content extraction while limiting the disadvantages. Our key insight is to work with the Document Object Model tree (after parsing and correcting the HTML), rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy that anyone can use to extract content from HTML web pages for their own purposes.

CiteSeerX

Crossref

Columbia University Academic Commons