MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters
Recognition of Bangla handwritten compound characters has been an important
problem for many years. Application-oriented research in machine learning and
deep learning has gained growing interest, most notably in handwriting
recognition, which has far-reaching applications such as Bangla OCR.
MatriVasha is a project for recognizing a wide range of Bangla handwritten
compound characters. Compound character recognition is an important topic
because of its varied applications, including the reliable digitization of
old forms and records. Unfortunately, there is a lack of a comprehensive
dataset that covers all types of Bangla compound characters. MatriVasha is an
attempt to fill this gap; the task is challenging because each person writes
character shapes in a unique style. The proposed dataset targets the
recognition of 120 (one hundred twenty) Bangla compound characters and
consists of 2,552 (two thousand five hundred fifty-two) isolated handwritten
characters written by unique writers from across Bangladesh. Because the
samples cover a variety of districts and age groups and include an equal
number of male and female writers, the dataset also supports district-, age-,
and gender-based research. To date, our proposed dataset is the most
extensive dataset for Bangla compound characters, and it is intended to serve
as a benchmark for handwritten Bangla compound character recognition. In the
future, this dataset will be made publicly available to help widen the
research.
Comment: 19 figures, 2 tables
A survey of comics research in computer science
Graphical novels such as comics and manga are well known all over the world.
The digital transition has started to change the way people read comics: more
and more on smartphones and tablets, and less and less on paper. In recent
years, a wide variety of research about comics has been proposed and might
change the way comics are created, distributed, and read in the years to
come. Early work focused on low-level document image analysis: comic books
are indeed complex, containing text, drawings, balloons, panels,
onomatopoeia, etc. Different fields of computer science, such as multimedia,
artificial intelligence, and human-computer interaction, have covered
research about user interaction and content generation, each with its own set
of values. In this paper, we review previous research about comics in
computer science, state what has been done, and give some insights into the
main outlooks.
SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering
Version information plays an important role in spreadsheet understanding,
maintenance, and quality improvement. However, end users rarely use version
control tools to document spreadsheet version information. Thus, the
version information is missing, and different versions of a spreadsheet
coexist as individual, similar spreadsheets. Existing approaches try to
recover spreadsheet version information by clustering these similar
spreadsheets based on spreadsheet filenames or related email conversations.
However, the applicability and accuracy of existing clustering approaches are
limited because the necessary information (e.g., filenames and email
conversations) is usually missing. We inspected the versioned spreadsheets in
VEnron, which is extracted from the Enron Corporation. In VEnron, the different
versions of a spreadsheet are clustered into an evolution group. We observed
that the versioned spreadsheets in each evolution group exhibit certain common
features (e.g., similar table headers and worksheet names). Based on this
observation, we proposed an automatic clustering algorithm, SpreadCluster.
SpreadCluster learns feature criteria from the versioned spreadsheets in
VEnron, and then automatically clusters spreadsheets with similar features
into the same evolution group. We applied SpreadCluster to all
spreadsheets in the Enron corpus. The evaluation result shows that
SpreadCluster can cluster spreadsheets with higher precision and recall
than the filename-based approach used by VEnron. Based on the clustering result
by SpreadCluster, we further created a new versioned spreadsheet corpus
VEnron2, which is much bigger than VEnron. We also applied SpreadCluster on the
other two spreadsheet corpora FUSE and EUSES. The results show that
SpreadCluster can cluster the versioned spreadsheets in these two corpora with
high precision.
Comment: 12 pages, MSR 201
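The feature-based grouping idea can be sketched as a toy (this is illustrative, not the paper's implementation: the feature sets, the Jaccard measure, and the 0.6 threshold are all assumptions here):

```python
# Illustrative sketch of SpreadCluster-style grouping: spreadsheets whose
# header/worksheet-name features overlap beyond a threshold are placed in
# the same evolution group.

def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_spreadsheets(features, threshold=0.6):
    """Greedy single-link clustering: a spreadsheet joins an existing
    group if it is similar enough to any member of that group."""
    groups = []  # each group is a list of (name, feature_set)
    for name, feats in features.items():
        placed = False
        for group in groups:
            if any(jaccard(feats, g) >= threshold for _, g in group):
                group.append((name, feats))
                placed = True
                break
        if not placed:
            groups.append([(name, feats)])
    return [[name for name, _ in g] for g in groups]

# Toy features: table headers plus worksheet names (hypothetical data).
features = {
    "budget_v1.xls": {"Month", "Amount", "Dept", "Sheet1"},
    "budget_v2.xls": {"Month", "Amount", "Dept", "Notes", "Sheet1"},
    "trades.xls":    {"Ticker", "Qty", "Price", "Trades"},
}
print(cluster_spreadsheets(features))
# → [['budget_v1.xls', 'budget_v2.xls'], ['trades.xls']]
```

The two budget files share 4 of 5 features (Jaccard 0.8), so they land in one evolution group; the trades file shares none and forms its own.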
The State-of-the-Art of Set Visualization
Sets comprise a generic data model that has been used in a variety of data analysis problems. Such problems involve analysing and visualizing set relations between multiple sets defined over the same collection of elements. However, visualizing sets is a non-trivial problem due to the large number of possible relations between them. We provide a systematic overview of state-of-the-art techniques for visualizing different kinds of set relations. We classify these techniques into six main categories according to the visual representations they use and the tasks they support. We compare the categories to provide guidance for choosing an appropriate technique for a given problem. Finally, we identify challenges in this area that need further research and propose possible directions to address these challenges. Further resources on set visualization are available at http://www.setviz.net
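The set relations such techniques draw can be made concrete with a small computation; the sets and the exclusive-region decomposition below are illustrative assumptions, roughly the quantities an UpSet-style view would display:

```python
from itertools import combinations

# Hypothetical sets defined over a shared collection of elements.
sets = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {4, 5, 6},
}

def exclusive_intersections(named_sets):
    """Size of each exclusive region: elements belonging to exactly this
    combination of sets and to no other set."""
    names = list(named_sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(named_sets[n] for n in combo))
            others = [named_sets[n] for n in names if n not in combo]
            outside = set.union(*others) if others else set()
            regions[combo] = len(inside - outside)
    return regions

print(exclusive_intersections(sets))
# e.g. ('A',) → 2 (elements 1, 2), ('A', 'B', 'C') → 1 (element 4)
```

The exclusive regions partition the union, which is why the region sizes sum to the number of distinct elements overall.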
PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.
Comment: Submitted to Parallel Computing, Elsevier
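The RTCG idea can be sketched in a few lines: Python assembles CUDA C source specialized to the problem at run time, and PyCUDA's pycuda.compiler.SourceModule then JIT-compiles it on the device. The kernel, its template, and the specialization parameter below are illustrative assumptions; only the generation step is shown, since compilation requires a GPU.

```python
# Run-time code generation in the PyCUDA style: Python builds CUDA C
# source as a string, specialized at run time (here, to a fixed block
# size baked into the kernel as a constant).

KERNEL_TEMPLATE = """
__global__ void saxpy_%(n)d(float a, const float *x, float *y)
{
    int i = blockIdx.x * %(n)d + threadIdx.x;
    if (threadIdx.x < %(n)d)
        y[i] = a * x[i] + y[i];
}
"""

def generate_saxpy(block_size):
    """Specialize the kernel source for a given block size."""
    return KERNEL_TEMPLATE % {"n": block_size}

src = generate_saxpy(256)
print(src)

# With a CUDA device available, PyCUDA would compile and fetch it:
#   from pycuda.compiler import SourceModule
#   mod = SourceModule(src)
#   saxpy = mod.get_function("saxpy_256")
```

Because the block size is a compile-time constant in the generated source, the GPU compiler can fold and optimize it, which is a large part of RTCG's appeal over one statically compiled general-purpose kernel.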
Searching and organizing images across languages
With the continual growth of users on the Web from a wide range of countries,
supporting such users in their search of cultural heritage collections will
grow in importance. In the next few years, the growth areas of Internet users
will come from the Indian sub-continent and China. Consequently, if holders
of cultural heritage collections wish their content to be viewable by the
full range of users coming to the Internet, the range of languages that they
need to support will have to grow. This paper will present recent work
conducted at the University of Sheffield (and now being implemented in
BRICKS) on how to use automatic translation to provide search and
organisation facilities for a historical image search engine. The system
allows users to search for images in seven different languages, providing
means for the user to examine translated image captions and browse retrieved
images organised by categories written in their native language.
Engineering simulations for cancer systems biology
Computer simulation can be used to inform in vivo and in vitro experimentation, enabling rapid, low-cost hypothesis generation and directing experimental design in order to test those hypotheses. In this way, in silico models become a scientific instrument for investigation, and so should be developed to high standards, be carefully calibrated, and have their findings presented in such a way that they may be reproduced. Here, we outline a framework that supports developing simulations as scientific instruments, and we select cancer systems biology as an exemplar domain, with a particular focus on cellular signalling models. We consider the challenges of lack of data, incomplete knowledge, and modelling in the context of a rapidly changing knowledge base. Our framework comprises a process to clearly separate scientific and engineering concerns in model and simulation development, and an argumentation approach to documenting models as a rigorous way of recording assumptions and knowledge gaps. We propose interactive, dynamic visualisation tools to enable the biological community to interact with cellular signalling models directly for experimental design. There is a mismatch in scale between these cellular models and the tissue structures that are affected by tumours, and bridging this gap requires substantial computational resources. We present concurrent programming as a technology to link scales without losing important details through model simplification. We discuss the value of combining this technology, interactive visualisation, argumentation, and model separation to support the development of multi-scale models that represent biologically plausible cells arranged in biologically plausible structures, modelling cell behaviour, interactions, and response to therapeutic interventions.
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make
it possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential for cross-fertilization, i.e., the
possibility of reusing Web Data Extraction techniques originally designed to
work in a given domain in other domains.
Comment: Knowledge-Based Systems
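A minimal wrapper-style extractor in the spirit the survey describes can be sketched with the Python standard library; the HTML snippet and the class-based extraction rule are illustrative assumptions, not a technique taken from the survey:

```python
from html.parser import HTMLParser

# Illustrative ad-hoc wrapper: pull the text of every element carrying a
# target class attribute, the kind of page-specific rule that Web Data
# Extraction systems generalize or learn automatically.

class ClassTextExtractor(HTMLParser):
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.capture = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if dict(attrs).get("class") == self.target_class:
            self.capture = True

    def handle_data(self, data):
        if self.capture:
            self.results.append(data.strip())
            self.capture = False

html = ('<ul><li class="price">$9.99</li>'
        '<li class="name">Widget</li>'
        '<li class="price">$4.50</li></ul>')
p = ClassTextExtractor("price")
p.feed(html)
print(p.results)  # → ['$9.99', '$4.50']
```

Real systems must also cope with malformed markup, pagination, and layout drift, which is exactly why learned or induction-based extraction techniques exist alongside hand-written wrappers like this one.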
An End-User Development Perspective on State-of-the-Art Web Development Tools
We reviewed and analyzed nine commercially available web development tools from the perspective of suitability for end-user development to compare and contrast alternative and best-of-breed approaches for particular problem areas within web application development (Getting Started, Workflow, Level of Abstraction, Layout, Database, Application Logic, Testing and Debugging, Learning and Scaling, Security, Collaboration, and Deployment). End-user development involves the creation of dynamic websites with support for features like authentication, conditional display, and searching/sorting by casual web developers who have some experience creating static websites but little or no programming knowledge. We found that current tools do not lack functionality, but rather have a variety of problems in ease of use for end users who are nonprogrammers. In particular, while many tools offer wizards and other features designed to facilitate specific aspects of end-user development, none of the tools that we reviewed supports a holistic approach to web application development. We discuss the implications of these problems and conclude with recommendations for the design of improved web development tools that would lower the entry barrier into web programming.