    Optimizing the performance of a server-based classification for a large business document flow

    The document categorization problem in the case of a large business document flow is considered. Textual and visual embeddings were employed for classification. Textual embeddings were extracted using Tesseract OCR, and the Viola-Jones method was applied to generate visual embeddings. This paper describes the techniques used to optimize the performance of the implemented classification algorithm, which was executed on servers with Intel CPUs. For the single-threaded implementation, both high-level and low-level optimizations were performed. High-level optimization was based on parametrizing the recognition algorithms and on the use of intermediate data. Low-level optimization was carried out with compiler tools that enable extended SIMD instruction sets. Parallelization using several multithreaded applications running on multiple servers is also described. The proposed solution was tested on the authors' own test datasets of business documents. The proposed method can be applied in modern information systems to analyze the content of a large flow of digital document images.
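    The abstract mentions several multithreaded applications running on multiple servers. The following is a minimal Python sketch of that layout, not the authors' implementation. The sharding rule (CRC32 of a document ID), all function names, and the `classify_document` stub are hypothetical; in a real system the stub would wrap OCR, visual feature extraction and the classifier. Each shard stands in for one server, and each server processes its shard with a thread pool.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def assign_server(doc_id: str, n_servers: int) -> int:
    """Map a document ID to a server index with a stable hash (CRC32)."""
    return zlib.crc32(doc_id.encode("utf-8")) % n_servers


def shard(doc_ids: list[str], n_servers: int) -> list[list[str]]:
    """Split the document flow into one work list per server."""
    shards: list[list[str]] = [[] for _ in range(n_servers)]
    for doc_id in doc_ids:
        shards[assign_server(doc_id, n_servers)].append(doc_id)
    return shards


def classify_document(doc_id: str) -> str:
    """Hypothetical stand-in for OCR + visual features + classification."""
    return "invoice" if doc_id.startswith("inv") else "other"


def run_server(doc_ids: list[str], n_threads: int) -> dict[str, str]:
    """Process one server's shard with a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return dict(zip(doc_ids, pool.map(classify_document, doc_ids)))


def classify_flow(doc_ids: list[str], n_servers: int = 2,
                  n_threads: int = 4) -> dict[str, str]:
    """Simulate all servers locally and merge their per-shard results."""
    results: dict[str, str] = {}
    for server_docs in shard(doc_ids, n_servers):
        results.update(run_server(server_docs, n_threads))
    return results


if __name__ == "__main__":
    flow = ["inv-001", "memo-17", "inv-002", "letter-3"]
    print(classify_flow(flow, n_servers=2, n_threads=2))
```

    A stable hash keeps a given document on the same server across runs. In CPython, threads pay off only when the per-document work runs outside the interpreter (for example in native OCR code); otherwise a process pool would be the usual substitute.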

    Efficiently identifying top k similar entities

    With the rapid growth of genomic studies, a growing body of successful research integrates tools and technologies from interdisciplinary sciences. Computational biology, or bioinformatics, is one such field that successfully applies computational tools to capture and transcribe biological data. In genomic studies specifically, the detection and analysis of co-occurring mutations is a leading area of study. Concurrently, in recent years, computer science and information technology have seen increased interest in association analysis and co-occurrence computation. The traditional method of finding the top similar entities examines every possible pair of entities, which leads to a prohibitive quadratic time complexity. Most existing approaches also require a similarity measure and a threshold to be fixed beforehand in order to retrieve the top similar entities, and these parameters are not always easy to tune. An adaptive method could therefore have wider applicability for identifying the most similar pairs of mutations (or of entities in general). In this thesis, we present an algorithm that efficiently identifies the top k similar pairs of mutations using co-occurrence as the similarity measure. Our approach uses an upper-bound condition to iteratively prune the search space, thereby addressing the quadratic complexity. Empirical evaluations demonstrate the computational efficiency of the proposed approach in terms of execution time and accuracy, particularly on large datasets. We also evaluate the impact of parameters such as input size and k on execution time in top-k approaches. This study concludes that systematically pruning the search space using an adaptive threshold condition optimizes the process of identifying the top similar pairs of entities.
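    The pruning idea can be illustrated with a small Python sketch. This is not the thesis's algorithm; it assumes each entity (e.g. a mutation) is represented by the set of samples in which it occurs, and that co-occurrence is the size of the intersection of two such sets. Since |S_a ∩ S_b| ≤ min(|S_a|, |S_b|), sorting entities by support in descending order yields an upper bound for every remaining pair, and the current k-th best count acts as an adaptive threshold: once the bound cannot beat it, the rest of the loop is skipped. The function name `top_k_cooccurring` is hypothetical.

```python
import heapq


def top_k_cooccurring(occurrences: dict[str, set[int]],
                      k: int) -> list[tuple[int, str, str]]:
    """Return up to k (count, a, b) pairs with the largest co-occurrence."""
    if k <= 0:
        return []
    # Sort entities by support (number of samples), largest first.
    items = sorted(occurrences.items(), key=lambda kv: len(kv[1]), reverse=True)
    heap: list[tuple[int, str, str]] = []  # min-heap holding the current top k
    for i, (a, sa) in enumerate(items):
        if i + 1 >= len(items):
            break
        # Best possible partner for any later pair is items[i + 1]; its support
        # bounds every remaining co-occurrence, so stop if it cannot win.
        if len(heap) == k and len(items[i + 1][1]) <= heap[0][0]:
            break
        for b, sb in items[i + 1:]:
            # |sa & sb| <= |sb| because supports are non-increasing.
            if len(heap) == k and len(sb) <= heap[0][0]:
                break
            count = len(sa & sb)
            if len(heap) < k:
                heapq.heappush(heap, (count, a, b))
            elif count > heap[0][0]:
                heapq.heapreplace(heap, (count, a, b))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    samples = {
        "m1": {1, 2, 3, 4},
        "m2": {1, 2, 3},
        "m3": {2, 3, 4},
        "m4": {5},
    }
    print(top_k_cooccurring(samples, k=2))
    # [(3, 'm1', 'm3'), (3, 'm1', 'm2')]
```

    In this toy input the pairs involving `m4` and the pair `m2`/`m3` are never intersected, because their bounds cannot exceed the current second-best count of 3.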

    Detection and management of redundancy for information retrieval

    The growth of the web, authoring software, and electronic publishing has led to the emergence of a new type of document collection that is decentralised, amorphous, dynamic, and anarchic. In such collections, redundancy is a significant issue. Documents can spread and propagate across such collections without any control or moderation. Redundancy can interfere with the information retrieval process, reducing user amenity in accessing information from these collections, and thus must be effectively managed. The precise definition of redundancy varies with the application. We restrict ourselves to documents that are co-derivative: those that share a common heritage, and hence contain passages of common text. We explore document fingerprinting, a well-known technique for the detection of co-derivative document pairs. Our new lossless fingerprinting algorithm improves the effectiveness of a range of document fingerprinting approaches. We empirically show that our algorithm can be highly effective at discovering co-derivative document pairs in large collections. We study the occurrence and management of redundancy in a range of application domains. On the web, we find that document fingerprinting is able to identify widespread redundancy, and that this redundancy has a significant detrimental effect on the quality of search results. Based on user studies, we suggest that redundancy is most appropriately managed as a postprocessing step on the ranked list and explain how and why this should be done. In genomic sequence homology search, we explain why the existing techniques for redundancy discovery are increasingly inefficient, and present a critique of the current approaches to redundancy management. We show how document fingerprinting with a modified version of our algorithm provides significant efficiency improvements, and propose a new approach to redundancy management based on wildcards.
We demonstrate that our scheme provides the benefits of existing techniques without their deficiencies. Redundancy in distributed information retrieval systems, where different parts of the collection are searched by autonomous servers, cannot be effectively managed using traditional fingerprinting techniques. We thus propose a new data structure, the grainy hash vector, for redundancy detection and management in this environment. We show in preliminary tests that the grainy hash vector is able to accurately detect a good proportion of redundant document pairs while maintaining low resource usage.
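    For readers unfamiliar with document fingerprinting, a generic version can be sketched in a few lines of Python. This is not the author's lossless algorithm, nor the grainy hash vector; it simply hashes every word n-gram of each document (no fingerprints are discarded), builds an inverted index from fingerprints to documents, and reports pairs that share at least `min_shared` fingerprints as candidate co-derivatives. The n-gram length, the threshold, and the use of a truncated MD5 digest as the hash are illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def fingerprints(text: str, n: int = 3) -> set[int]:
    """Hash every word n-gram of a document to a 64-bit integer."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {int.from_bytes(hashlib.md5(g.encode("utf-8")).digest()[:8], "big")
            for g in grams}


def co_derivative_pairs(docs: dict[str, str], n: int = 3,
                        min_shared: int = 2) -> dict[tuple[str, str], int]:
    """Return document pairs sharing at least min_shared fingerprints."""
    # Inverted index: fingerprint -> documents containing that n-gram.
    index: dict[int, list[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        for fp in fingerprints(text, n):
            index[fp].append(doc_id)
    shared: dict[tuple[str, str], int] = defaultdict(int)
    for ids in index.values():
        if len(ids) < 2:
            continue  # unique to one document: cannot indicate co-derivation
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return {pair: c for pair, c in shared.items() if c >= min_shared}


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "a quick brown fox jumps over a sleeping cat",
        "c": "completely unrelated text about zebrafish behaviour",
    }
    print(co_derivative_pairs(docs))  # {('a', 'b'): 3}
```

    Skipping fingerprints that occur in only one document keeps the pair-counting step proportional to the amount of shared text rather than to the collection size.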

    ZEBRAFISH AS AN INNOVATIVE MODEL TO SCREEN THE BEHAVIOURAL EFFECTS OF NOVEL DRUGS

    Zebrafish (Danio rerio) is an emerging animal model and an alternative to rodents for studying human diseases. Its characteristic shoaling behaviour, in which individuals form a tight group and swim together, may represent an excellent model for studying social behaviour. Zebrafish also appear to be a good model for studying learning and memory. The neuropeptides oxytocin (OT) and arginine vasopressin (AVP) are two of the most-studied brain signaling molecules encoding information relevant to social behaviour. Isotocin (ISO) and vasotocin (AVT) are the equivalent neurohypophysial hormones in fish, regulating reproductive and social behaviour. On this basis, we compared the effects of OT and AVP with those of ISO and AVT on shoaling, on the fear response to a predator, and on learning and memory. Social behaviour was studied using the Nacre mutant zebrafish. Since these peptides are known to affect anxiety in humans and rodents, the same compounds were also tested on the fear response to a predator, using Astronotus ocellatus as the stimulus fish. OT (2-40 ng/kg), ISO (0.1-10 ng/kg), AVP (0.5-40 ng/kg) and AVT (0.001-20 ng/kg) were given i.m. 10 min before each test. AVT and AVP were more potent in eliciting anxiolytic than social effects, whereas ISO and OT were equally potent. To investigate the mechanism of action, different antagonists were given 10 min before each peptide: the OT receptor antagonist Desgly (0.00001-1 ng/kg), the V1a receptor subtype AVP antagonist SR 49059 (0.00001-20 ng/kg) and the V1b receptor subtype antagonist SSR 149415 (0.00001-1 ng/kg). In both tests, all the peptides increased social preference and decreased the fear response in a dose-dependent manner, with dose-response curves fitted by symmetrical parabolas. Pre-treatment with SR 49059, SSR 149415 and Desgly dose-dependently blocked the pro-social and anxiolytic effects induced by each peptide. The least selective antagonist appeared to be SSR 149415.
None of the neuropeptides changed swimming activity. Neuronal nicotinic acetylcholine receptors (nAChRs) play a modulatory role in cognition, and zebrafish provide a preclinical model for studying these cognitive processes. Moreover, the nicotinic receptor has been characterized in this teleost fish. Using a T-maze task, we investigated the effect of cholinergic drugs on spatial memory in zebrafish. Nicotine (0.0002-0.2 mg/kg), given i.p. 20 min before the test, improved the mean running time difference, showing an inverted-U dose-response function. Selective and non-selective nAChR antagonists, injected i.p. 10 min before nicotine, were used to identify the receptor subunits involved in spatial memory. Nicotine-induced cognitive enhancement was reduced by the selective nAChR subtype antagonists MLA (0.01 mg/kg) for the α7 subunit, MII (0.1 mg/kg) for the α6β2 subunit and DhβE (0.01 mg/kg) for the α4β2 subunit, by the non-selective antagonist mecamylamine (0.1 mg/kg) and by the muscarinic antagonist scopolamine (0.025 mg/kg), with DhβE being more active than MLA or MII. None of the nicotinic drugs changed swimming activity. Another important cognitive process is selective attention, which can be assessed in rodents with the novel object recognition (NOR) test. In the standard version of this test, the choice of objects is critical. To overcome this limitation of NOR, we created a modified version, the virtual object recognition test (VORT), in mice, in which 3D objects were replaced with stationary geometrical 2D shapes presented on the 3.5-inch widescreen displays of two iPods. VORT yielded a discrimination index comparable to that of NOR. We identified 2D shapes that were highly discriminated and others that were not. Mice were able to distinguish among different movements (horizontal, vertical or oblique).
In fact, shapes previously found indistinguishable when stationary were better discriminated when moving. Secondly, we focused on zebrafish, which learn readily and have better visual acuity. Based on these abilities, we investigated in VORT whether zebrafish, like mice, were able to discriminate different geometrical 2D shapes (circle, square or triangle) presented on iPod screens placed at the sides of a water tank. To evaluate whether moving 2D shapes increased the attention of zebrafish, specific movements were applied to the same geometrical shapes. We found that zebrafish, like mice, were able to discriminate different geometrical 2D shapes, both stationary and with different movements. In particular, the discrimination index for shapes not previously discriminated increased when they were moving. Finally, we investigated whether memory performance could be improved by treatment with nicotine, in both mice (0.1 mg/kg) and zebrafish (0.02 mg/kg), or worsened by scopolamine (0.25 mg/kg for mice and 0.025 mg/kg for zebrafish) or by mecamylamine (1 mg/kg). Nicotine improved the discrimination index for stationary shapes previously not discriminated, while anticholinergic drugs impaired episodic memory in both species. Taken together, these findings showed the pro-social and anxiolytic properties of the OT/AVP system, mediated by different receptors, and confirmed the important role of the cholinergic system in memory acquisition and consolidation in zebrafish, as in mammals. Moreover, we showed for the first time that both mice and zebrafish can discriminate not only geometrical shapes but also different movements in VORT, allowing direct comparison between animal models and humans in the study of attention. Zebrafish open a new avenue of research for rapidly screening new compounds for the treatment of abnormal social behaviours (including those seen in autism or schizophrenia) and neurodegenerative diseases.

    Preface


    Artificial intelligence and its application in architectural design

    No abstract available.

    Spokane Intercollegiate Research Conference 2021


    16th SC@RUG 2019 proceedings 2018-2019
