10,287 research outputs found

    Counterfeit Detection with Multispectral Imaging

    Multispectral imaging is becoming more practical for a variety of applications because it provides highly specific information through non-destructive analysis. Multispectral imaging cameras detect light reflectance in different spectral bands at visible and nonvisible wavelengths. Based on the differing amounts of reflectance in each band, information about the subject can be deduced. This thesis decomposes and analyzes counterfeit detection applications of multispectral imaging. Relations between light reflectance and objects’ features will be addressed. The process of the analysis will be broken down to show how this information can be used to provide more insight on the object. This technology provides desired and viable information that can greatly improve multiple fields. For this paper, the multispectral imaging research process of element solution concentrations and counterfeit detection applications of multispectral imaging will be discussed. BaySpec’s OCI-M Ultra Compact Multispectral Imager is used for data collection. This camera is capable of capturing light reflectance at wavelengths of 400 – 1000 nm. Further research opportunities in developing self-automated unmanned aerial vehicles for precision agriculture and extending counterfeit detection applications will also be explored.

    Chinese-Catalan: A neural machine translation approach based on pivoting and attention mechanisms

    This article innovatively addresses machine translation from Chinese to Catalan using neural pivot strategies trained without any direct parallel data. The Catalan language is very similar to Spanish from a linguistic point of view, which motivates the use of Spanish as the pivot language. Regarding neural architecture, we use the latest state of the art, the Transformer model, which is based only on attention mechanisms. Additionally, this work provides new resources to the community, consisting of a human-developed gold standard of 4,000 sentences between Catalan and Chinese and all the other United Nations official languages (Arabic, English, French, Russian, and Spanish). Results show that the standard pseudo-corpus or synthetic pivot approach performs better than cascade.
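The difference between the two pivot strategies named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' system: the dictionary lookups stand in for trained translation models, and the names `cascade`, `build_pseudo_corpus`, `toy_zh_to_es`, and `toy_es_to_ca` are hypothetical. In a cascade, two systems are chained at translation time; in the pseudo-corpus approach, the Spanish side of a Chinese-Spanish corpus is machine-translated into Catalan once, and the resulting synthetic Chinese-Catalan pairs would then be used to train a direct model.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]


def cascade(zh_to_es: Translator, es_to_ca: Translator) -> Translator:
    """Cascade pivoting: chain Chinese->Spanish and Spanish->Catalan at inference time."""
    return lambda zh: es_to_ca(zh_to_es(zh))


def build_pseudo_corpus(
    zh_es_pairs: List[Tuple[str, str]], es_to_ca: Translator
) -> List[Tuple[str, str]]:
    """Pseudo-corpus pivoting: translate the Spanish side into Catalan to obtain
    synthetic Chinese-Catalan training pairs for a direct model."""
    return [(zh, es_to_ca(es)) for zh, es in zh_es_pairs]


# Toy stand-ins for trained models (assumption: real systems would be neural).
_ZH_ES = {"你好": "hola", "谢谢": "gracias"}
_ES_CA = {"hola": "hola", "gracias": "gràcies"}


def toy_zh_to_es(sentence: str) -> str:
    return _ZH_ES[sentence]


def toy_es_to_ca(sentence: str) -> str:
    return _ES_CA[sentence]


if __name__ == "__main__":
    translate = cascade(toy_zh_to_es, toy_es_to_ca)
    print(translate("谢谢"))
    print(build_pseudo_corpus(list(_ZH_ES.items()), toy_es_to_ca))
```

In the cascade, errors of the first system are passed to the second at every translation; in the pseudo-corpus approach, the pivot translation is done once, offline, on the training data.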

    GROUNDTRUTH GENERATION AND DOCUMENT IMAGE DEGRADATION

    The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, however, there is a tremendous need to be able to rapidly generate data in new languages and scripts, without the need to develop specialized systems. We have developed a system that uses the language support of the MS Windows operating system combined with custom print drivers to render TIFF images simultaneously with Windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth, including location, font information, and content, in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several often-encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate the effectiveness

    Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR

    The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: (1) how to create a simple, language-independent corpus-based stemmer, (2) how to identify sub-words and which types of sub-words are suitable as indexing units, and (3) how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful when few language-specific resources are available. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best; for Bengali and Marathi, overlapping 3-grams obtain the best result; and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length. The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages.
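Two of the sub-word indexing units mentioned in this abstract, overlapping character n-grams and k-prefixes, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, tokenization is plain whitespace splitting, and slicing operates on Unicode code points, which for Bengali, Hindi, and Marathi text does not necessarily coincide with the character units the authors used.

```python
from typing import List


def overlapping_ngrams(word: str, n: int = 3) -> List[str]:
    """Return all overlapping character n-grams of a word.

    Words no longer than n are kept whole (an assumption of this sketch).
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def k_prefix(word: str, k: int = 4) -> str:
    """Return the first k characters of a word (the whole word if shorter)."""
    return word[:k]


def index_terms(text: str, unit: str = "3gram") -> List[str]:
    """Turn a text into index terms using the chosen sub-word unit."""
    terms: List[str] = []
    for word in text.lower().split():
        if unit == "3gram":
            terms.extend(overlapping_ngrams(word, 3))
        elif unit == "4prefix":
            terms.append(k_prefix(word, 4))
        else:
            terms.append(word)  # plain word indexing as the baseline
    return terms


if __name__ == "__main__":
    print(index_terms("information retrieval", unit="3gram"))
    print(index_terms("information retrieval", unit="4prefix"))
```

The sketch also shows the effect described in the abstract: n-gram indexing produces several index terms per word form, so the number of index terms grows while their average length shrinks.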

    Intelligent Agents for Retrieving Chinese Web Financial News

    As the popularity of the World Wide Web increases, many newspapers expand their services by providing news on the Web in order to stay competitive and increase revenue. The Web provides real-time dissemination of financial news to investors. However, most investors find it difficult to search for the financial information of interest in the huge Web information space. Most commercial search engines are not user friendly and do not provide tailor-made intelligent agents to search for relevant Web documents on behalf of users. Users have to exert a lot of effort to submit an appropriate query to obtain the information they want. Intelligent agents that learn user preferences and monitor the postings of Web information providers are desired. In this paper, we present an intelligent agent that utilizes user profiles and user feedback to search for Chinese Web financial news articles on behalf of users. A Chinese indexing component is developed to index the continuously fetched Chinese financial news articles. User profiles capture the basic knowledge of user preferences based on the sources of news articles, the regions of the news reported, the categories of related industries, the listed companies, and user-specified keywords. User feedback captures the semantics of the user-rated news articles. Based on these inputs, the search engine ranks the top 20 news articles that users are most interested in. Experiments were conducted to measure the performance of the agents based on the inputs from user profiles and user feedback.