Search CORE

2,258 research outputs found

Automatic Palaeographic Exploration of Genizah Manuscripts

Author: Choueka Yaakov
Dershowitz Nachum
German Tanya
Potikha Liza
Shweka Roni
Wolf Lior
Publication venue: Books on Demand (BoD)
Publication date: 01/01/2011

The Cairo Genizah is a collection of handwritten documents containing approximately 350,000 fragments of mainly Jewish texts, discovered in the late 19th century. The fragments are today spread across some 75 libraries and private collections worldwide, but there is an ongoing effort to document and catalogue all extant fragments. Palaeographic information plays a key role in the study of the Genizah collection. Script style, and more specifically handwriting, can be used to identify fragments that might originate from the same original work. Such matched fragments, commonly referred to as "joins", are currently identified manually by experts, and presumably only a small fraction of existing joins have been discovered to date. In this work, we show that automatic handwriting matching functions, obtained from non-specific features using a corpus of writing samples, can perform this task quite reliably. In addition, we explore the problem of grouping various Genizah documents by script style, without being provided any prior information about the relevant styles. The automatically obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases where the method fails, the failure is due to apparent similarities between related scripts.

Kölner UniversitätsPublikationsServer

Possibility Theory-Based Approach to Spam Email Detection

Author: Ma Wanli
Nguyen Thien
Sharma Dharmendra
Tran Dat
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Crossref

University of Canberra Research Repository

Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes

Author: Tsai Chun-Yu
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2017

We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding. First, for quick video browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube-like videos into short visual segments (8-12 sec) and textual words (fewer than 10 terms). In the 2013 TRECVID Multimedia Event Recounting competition, this system placed first in recognition time efficiency, while remaining above average in description accuracy. Secondly, we demonstrate the summarization of large amounts of online international news videos. In order to understand an international event such as the Ebola virus, AirAsia Flight 8501, or the Zika virus comprehensively, we present a novel and efficient constrained tensor factorization algorithm that first represents a video archive of multimedia news stories concerning a news event as a sparse tensor of order 4. The dimensions correspond to extracted visual memes, verbal tags, time periods, and cultures. The iterative algorithm approximately but accurately extracts coherent quad-clusters, each of which represents a significant summary of an important independent aspect of the news event. We give examples of quad-clusters extracted from tensors with at least 10^8 entries derived from international news coverage. We show that the method is fast, can be tuned to give preference to any subset of its four dimensions, and exceeds three existing methods in performance. Thirdly, noting that the co-occurrence of visual memes and tags in our summarization result is sparse, we show how to model cross-cultural visual meme influence based on normalized PageRank, which more accurately captures the rates at which visual memes are reposted in a specified time period in a specified culture.
Lastly, we establish the correspondences of videos and text descriptions in different cultures by reliable visual cues, detect culture-specific tags for visual memes, and then annotate videos in a cultural setting. Starting with any video with little or no text in one culture (say, the US), we select candidate annotations in the text of another culture (say, China) to annotate the US video. By analyzing the similarity of images annotated by those candidates, we can derive a set of proper tags from the viewpoint of the other culture (China). We illustrate culture-based annotation examples using segments of international news. We evaluate the generated tags by cross-cultural tag frequency, tag precision, and user studies.
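
The order-4 sparse tensor described in the abstract above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the thesis's implementation: the observation tuples, the names `build_sparse_tensor` and `marginal`, and the choice of a dictionary of counts as the storage format are all hypothetical. It shows the data layout only and does not implement the constrained factorization itself.

```python
from collections import Counter

# Hypothetical observations, one per (visual meme, tag, time period, culture)
# co-occurrence found in a news video archive. All values are made up.
observations = [
    ("meme_01", "outbreak", "2014-W40", "US"),
    ("meme_01", "outbreak", "2014-W40", "US"),
    ("meme_01", "quarantine", "2014-W41", "China"),
    ("meme_02", "vaccine", "2014-W41", "US"),
]


def build_sparse_tensor(records):
    """Count co-occurrences; only nonzero cells of the order-4 tensor are stored."""
    return Counter(records)


def marginal(tensor, modes):
    """Sum the tensor over every mode except the ones listed in `modes`."""
    out = Counter()
    for index, value in tensor.items():
        out[tuple(index[m] for m in modes)] += value
    return out


tensor = build_sparse_tensor(observations)
# Collapse tags and time periods to see how often each meme appears per culture.
by_meme_culture = marginal(tensor, modes=(0, 3))
```

Storing only the nonzero cells keeps memory proportional to the number of observed co-occurrences rather than to the product of the four dimension sizes, which matters when the dense tensor would have on the order of 10^8 entries.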

Columbia University Academic Commons

CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model

Author: Bai Jinfeng
Ji Zhilong
Shan Shiguang
Wang Zhongqi
Zhang Jie
Publication date: 09/04/2023

With the development of deep generative models, recent years have seen great success in Chinese landscape painting generation. However, few works focus on controllable Chinese landscape painting generation, owing to the lack of data and limited modeling capabilities. In this work, we propose a controllable Chinese landscape painting generation method named CCLAP, which can generate paintings with specific content and style based on the Latent Diffusion Model. Specifically, it consists of two cascaded modules, i.e., a content generator and a style aggregator. The content generator module ensures that the content of the generated paintings is specific to the input text, while the style aggregator module generates paintings in a style corresponding to a reference image. Moreover, a new dataset of Chinese landscape paintings named CLAP is collected for comprehensive evaluation. Both the qualitative and quantitative results demonstrate that our method achieves state-of-the-art performance, especially in artful composition and artistic conception. Code is available at https://github.com/Robin-WZQ/CCLAP. Comment: 8 pages, 13 figures.

arXiv.org e-Print Archive

La encrucijada del patrimonio manuscrito de África Oriental: ¿Del polvo a lo digital o polvo digital?

Author: Krätli Graziano
Publication venue: 'Edicions de la Universitat de Barcelona'
Publication date: 02/02/2016

Ever since colonial times, West African Arabic manuscripts have been the object of ambiguous attention on the part of administrators, conservators and scholars. As a result, collections were first concealed or disclosed (depending on the predatory or protective nature of Western attentions), then targeted by modern scientific initiatives focused on bibliographic description, content analysis and preservation. A review of their accomplishments and shortcomings will help explain how many such projects failed to meet, or even understand, the expectations of their intended and potential users. Where they did meet such expectations, they misunderstood or underestimated the nature of the tools they employed and the rapidly evolving technological and cultural environment that nurtures and supports them. Only by understanding these evolving trends and realities, and therefore engaging information professionals equipped with the appropriate knowledge and skills to take advantage of them, will new initiatives to preserve and document West African Arabic manuscript heritage succeed in providing continuous and relevant access to its intellectual content and material culture.

Revistes Científiques de la Universitat de Barcelona

Adaptive Feature Extraction Method for Degraded Character Recognition

Author: Junji Yamato
Minako Sawaki
Minoru Mori
Publication venue: 'IntechOpen'
Publication date: 17/08/2010

IntechOpen

Historical stereotypes and histories of stereotypes

Author: Knights Mark
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2014

Warwick Research Archives Portal Repository

A novel image matching approach for word spotting

Author: Shah Muhammad Ismail
Publication date: 01/01/2009

Word spotting has been adopted by various researchers as a complementary technique to Optical Character Recognition for document analysis and retrieval. Its applications include document indexing, image retrieval and information filtering. The important factors in word spotting techniques are pre-processing, the selection and extraction of proper features, and image matching algorithms. The Correlation Similarity Measure (CORR) algorithm is considered to be a fast matching algorithm, originally defined for finding similarities between binary patterns. In the word spotting literature, the CORR algorithm has been used successfully to compare the Gradient, Structural and Concavity (GSC) binary features extracted from binary word images. However, the problem with this approach is that binarization of images leads to a loss of very useful information. Furthermore, before extracting GSC binary features, the word images must be skew corrected and slant normalized, which is not only difficult but in some cases impossible in Arabic and modified Arabic scripts. We present a new approach in which the CORR algorithm is used innovatively to compare gray-scale word images. In this approach, binarization of images, skew correction and slant normalization of word images are not required at all. Various features, i.e., projection profiles, word profiles and transitional features, are extracted from the gray-scale word images and converted into their binary equivalents, which are compared via the CORR algorithm with greater speed and higher accuracy. The experiments were conducted on gray-scale versions of newly created handwritten databases of the Pashto and Dari languages, written in modified Arabic scripts. For each of these languages we used 4599 words relating to 21 different word classes collected from 219 writers. The average precision rates achieved for Pashto and Dari were 93.18% and 93.75%, respectively. The time taken for matching a pair of images was 1.43 milliseconds. In addition, we present the handwritten databases for two well-known Indo-Iranian languages, i.e., Pashto and Dari. These are large databases which contain six types of data, i.e., Dates, Isolated Digits, Numeral Strings, Isolated Characters, Different Words and Special Symbols, written by native speakers of the corresponding languages.
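
To make the idea of comparing binary feature vectors with a correlation measure concrete, here is a minimal sketch. The abstract does not give the CORR formula, so this assumes one common form for binary vectors: the phi (Pearson) coefficient computed from the four match counts and rescaled to the range [0, 1]. The function name `corr_similarity` and the convention of returning 0.5 when a vector is constant are assumptions of this sketch, not details taken from the thesis.

```python
import math


def corr_similarity(a, b):
    """Correlation similarity between two equal-length 0/1 vectors, in [0, 1].

    Assumed form: 0.5 + phi / 2, where phi is the Pearson correlation of the
    two binary vectors expressed through the counts s11, s00, s10 and s01.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")

    s11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    s00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    s10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    s01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)

    denom = math.sqrt((s10 + s11) * (s01 + s00) * (s11 + s01) * (s00 + s10))
    if denom == 0:
        # One vector is all zeros or all ones: correlation is undefined,
        # so this sketch returns the neutral midpoint (an assumption).
        return 0.5
    return 0.5 + (s11 * s00 - s10 * s01) / (2 * denom)


# Hypothetical binary feature vectors for three word images.
query = [1, 0, 1, 1, 0, 0, 1, 0]
same_word = [1, 0, 1, 1, 0, 0, 1, 0]
other_word = [0, 1, 0, 0, 1, 1, 0, 1]

score_same = corr_similarity(query, same_word)
score_other = corr_similarity(query, other_word)
```

Because the measure needs only four counts per pair, it is cheap to evaluate, which is consistent with the abstract's emphasis on matching speed.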

Concordia University Research Repository