3 research outputs found

    Korean-Chinese Person Name Translation for Cross Language Information Retrieval

    PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200

    New similarity measures for ligand-based virtual screening

    Drug discovery using virtual screening techniques relies on the “molecular similarity principle,” which states that structurally similar molecules tend to have more similar physicochemical and biological properties than dissimilar molecules. Most existing virtual screening methods use similarity measures such as the standard Tanimoto coefficient. However, these conventional similarity measures are inadequate, and their results are not satisfactory to researchers. This research investigated new similarity measures and developed a novel similarity measure and a molecule ranking method to retrieve molecules more efficiently. First, a new similarity measure was derived from existing similarity measures, with a focus on preferred similarity concepts. Second, new similarity measures were developed by reweighting some bit-strings, giving strong consideration both to features present in the compared molecules and to features absent from both compared molecules. The final approach investigated ranking methods in order to develop an alternative ranking method. The study compared the similarity measures and ranking methods with benchmark coefficients such as Tanimoto, Cosine, Dice, and Simple Matching (SM). The approaches were tested on standard data sets such as the MDL Drug Data Report (MDDR), the Directory of Useful Decoys (DUD), and Maximum Unbiased Validation (MUV). The overall results showed that the new similarity measures and ranking methods outperformed the conventional industry-standard Tanimoto-based similarity search approach. The similarity measures are thus likely to support the lead optimization and lead identification processes better than methods based on the Tanimoto coefficient.
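For orientation, the four benchmark coefficients named in the abstract can be sketched on binary fingerprints as below. This is a minimal illustration of the textbook definitions only, not the new measures proposed in the work. The representation (a fingerprint as a set of "on" bit positions) and the names `fp_a`, `fp_b`, and `n_bits` are assumptions made for the example.

```python
from math import sqrt

def _counts(fp_a, fp_b, n_bits):
    """Return (a, b, c, d): bits on in A, on in B, on in both, off in both."""
    a = len(fp_a)
    b = len(fp_b)
    c = len(fp_a & fp_b)
    d = n_bits - len(fp_a | fp_b)
    return a, b, c, d

def tanimoto(fp_a, fp_b, n_bits):
    a, b, c, _ = _counts(fp_a, fp_b, n_bits)
    denom = a + b - c
    return c / denom if denom else 0.0

def dice(fp_a, fp_b, n_bits):
    a, b, c, _ = _counts(fp_a, fp_b, n_bits)
    return 2 * c / (a + b) if (a + b) else 0.0

def cosine(fp_a, fp_b, n_bits):
    a, b, c, _ = _counts(fp_a, fp_b, n_bits)
    return c / sqrt(a * b) if a and b else 0.0

def simple_matching(fp_a, fp_b, n_bits):
    # Unlike the other three, SM also rewards features absent from both molecules.
    _, _, c, d = _counts(fp_a, fp_b, n_bits)
    return (c + d) / n_bits if n_bits else 0.0

# Hypothetical 8-bit fingerprints, given as sets of "on" bit positions.
fp_a = {0, 1, 2, 3}
fp_b = {2, 3, 4}
for name, fn in [("Tanimoto", tanimoto), ("Dice", dice),
                 ("Cosine", cosine), ("SM", simple_matching)]:
    print(f"{name}: {fn(fp_a, fp_b, 8):.3f}")
```

Of these, only Simple Matching counts shared absences (the `d` term), which is the kind of information the abstract says the reweighted measures give strong consideration.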

    Data Structures for Fast Access Control in ECM Systems

    While many access control models have been proposed, little work has been done on the efficiency of access control systems. Because the access control sub-system of an Enterprise Content Management (ECM) system may be a bottleneck, we investigate the representation of permissions to improve its efficiency. Observing that there are many browsing-oriented permission request queries, we choose to implement a subject-oriented representation (i.e., maintaining a permission list for each subject). Additionally, we notice that with breadth-first ID numbering we may encounter many contiguous IDs under one object (e.g., a folder). To exploit these two characteristics, this thesis presents a space-efficient data structure specifically tailored for representing permission lists in ECM systems. Besides being space efficient, our data structure makes checking, granting, or revoking a permission very fast. It also supports fast union of two or more permission lists (used to determine the effective permissions inherited from a user's groups). In addition, our data structure scales with increases in the number of objects and subjects. We evaluate our representation by comparing it against a bitmap-based representation and a hash-table-based representation, using random ID numbering and breadth-first ID numbering, respectively. Our experimental tests on both synthetic and real-world data show that the hash table outperforms our representation for regular permission queries (i.e., querying permissions on a single object each time) as well as for browsing-oriented queries with random ID numbering. However, our tests also show that 1) with breadth-first ID numbering, our representation supports faster browsing-oriented queries while consuming only half the space of the hash-table-based representation, and 2) our representation is much more space and time efficient than the bitmap-based representation for our application.
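The abstract does not describe the internals of the proposed data structure. As a rough illustration of why contiguous object IDs help a subject-oriented permission list, the sketch below stores each subject's permitted IDs as sorted, disjoint, inclusive ranges. The class name `RangePermissionList` and all of its methods are hypothetical and are not taken from the thesis.

```python
from bisect import bisect_right

class RangePermissionList:
    """Permitted object IDs for one subject, kept as sorted disjoint ranges."""

    def __init__(self):
        self._starts = []  # sorted start ID of each range
        self._ends = []    # inclusive end ID of each range

    def check(self, obj_id):
        """Binary-search for the range that could contain obj_id."""
        i = bisect_right(self._starts, obj_id) - 1
        return i >= 0 and obj_id <= self._ends[i]

    def grant_range(self, lo, hi):
        """Grant IDs lo..hi, merging overlapping or adjacent ranges."""
        i = bisect_right(self._starts, lo) - 1
        if i >= 0 and self._ends[i] >= lo - 1:
            lo = self._starts[i]      # extend the range on the left
        else:
            i += 1                    # new range goes after range i
        j = i
        while j < len(self._starts) and self._starts[j] <= hi + 1:
            hi = max(hi, self._ends[j])
            j += 1
        self._starts[i:j] = [lo]
        self._ends[i:j] = [hi]

    def union(self, other):
        """Effective permissions from two lists (e.g., a user and a group)."""
        result = RangePermissionList()
        for lst in (self, other):
            for lo, hi in zip(lst._starts, lst._ends):
                result.grant_range(lo, hi)
        return result

    def ranges(self):
        return list(zip(self._starts, self._ends))

# Hypothetical example: a folder whose children received contiguous IDs.
user = RangePermissionList()
user.grant_range(10, 14)
user.grant_range(15, 19)   # adjacent, so it merges into one range
group = RangePermissionList()
group.grant_range(18, 25)
group.grant_range(40, 40)
effective = user.union(group)
print(user.ranges(), effective.ranges(), effective.check(22))
```

With breadth-first numbering, the children of one folder collapse into a single range, so a browsing-oriented query over a folder touches few entries. With random numbering, most ranges hold a single ID and the advantage disappears, which is consistent with the hash table performing better in that setting.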