
    The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists

    The DAVID gene functional classification tool uses a novel fuzzy clustering algorithm to condense a list of genes or associated biological terms into organized classes of related genes or biology, called biological modules

    Metadata Architecture for Digital Libraries: Conceptual framework for Indian Digital Libraries

    This paper describes an approach to developing a metadata solution for digital library architecture for resource description and retrieval. It deals with the concept of metadata [2], the different metadata standards (Dublin Core in particular [5]), the digital library environment, computer network capabilities, etc. The paper also discusses two digital library architecture protocols for resource description and retrieval, STARTS (Stanford Protocol Proposal for Internet Retrieval and Search) [8] and SODA (Smart Objects, Dumb Archives) [13], to arrive at a possible protocol that would help to build Indian digital libraries [5]. In proposing the new architecture, the paper addresses the existing Indian environment with respect to information sources and users' queries of those sources [5.1], which are feasible for launching this protocol for information processing and retrieval. This is a pilot study that the author carried out during his Fulbright fellowship at the College of Library Information Studies, University of Maryland, College Park, MD, in 1999-2000

    Next Generation of Product Search and Discovery

    Online shopping has become an important part of people’s daily life with the rapid development of e-commerce. In some domains such as books, electronics, and CD/DVDs, online shopping has surpassed or even replaced the traditional shopping method. Compared with traditional retailing, e-commerce is information intensive. One of the key factors for success in e-business is making it easy for consumers to discover a product. Conventionally, a product search engine based on a keyword search or category browser is provided to help users find the product information they need. The general goal of a product search system is to enable users to quickly locate information of interest and to minimize users’ efforts in search and navigation. In this process human factors play a significant role. Finding product information can be a tricky task and may require an intelligent use of search engines and a non-trivial navigation of multilayer categories. Searching for useful product information can be frustrating for many users, especially inexperienced ones. This dissertation focuses on developing a new visual product search system that effectively extracts the properties of unstructured products and presents the possible items of attraction to users so that they can quickly locate the ones they would most likely be interested in. We designed and developed a feature extraction algorithm that retains product color and local pattern features, and the experimental evaluation on the benchmark dataset demonstrated that it is robust against common geometric and photometric visual distortions. In addition, rather than ignoring product text information, we investigated and developed a ranking model learned via a unified probabilistic hypergraph that is capable of capturing correlations among product visual content and textual content.
    Moreover, we proposed and designed a fuzzy hierarchical co-clustering algorithm for collaborative filtering product recommendation. Via this method, users can be automatically grouped into different interest communities based on their behaviors. Then, a customized recommendation can be performed according to these implicitly detected relations. In summary, the developed search system performs much better in a visual unstructured product search when compared with state-of-the-art approaches. With the comprehensive ranking scheme and the collaborative filtering recommendation module, the user’s overhead in locating the information of value is reduced, and the user’s experience of seeking useful product information is optimized

    Constructing 3D faces from natural language interface

    This thesis presents a system by which 3D images of human faces can be constructed using a natural language interface. The driving force behind the project was the need to create a system whereby a machine could produce artistic images from verbal or composed descriptions. This research is the first to look at constructing and modifying facial image artwork using a natural language interface. Specialised modules have been developed to control geometry of 3D polygonal head models in a commercial modeller from natural language descriptions. These modules were produced from research on human physiognomy, 3D modelling techniques and tools, facial modelling and natural language processing. [Continues.]

    Style Classification of Rabbinic Literature for Detection of Lost Midrash Tanhuma Material

    Midrash collections are complex rabbinic works that consist of text in multiple languages, which evolved through long processes of unstable oral and written transmission. Determining the origin of a given passage in such a compilation is not always straightforward and is often a matter of dispute among scholars, yet it is essential for scholars' understanding of the passage and its relationship to other texts in the rabbinic corpus. To help solve this problem, we propose a system for classification of rabbinic literature based on its style, leveraging recent advances in natural language processing for Hebrew texts. Additionally, we demonstrate how this method can be applied to uncover lost material from a specific midrash genre, Tanḥuma-Yelammedenu, that has been preserved in later anthologies

    Text Segmentation in Web Images Using Colour Perception and Topological Features

    The research presented in this thesis addresses the problem of Text Segmentation in Web images. Text is routinely created in image form (headers, banners etc.) on Web pages, as an attempt to overcome the stylistic limitations of HTML. This text, however, has a potentially high semantic value in terms of indexing and searching for the corresponding Web pages. As current search engine technology does not allow for text extraction and recognition in images, the text in image form is ignored. Moreover, it is desirable to obtain a uniform representation of all visible text of a Web page (for applications such as voice browsing or automated content analysis). This thesis presents two methods for text segmentation in Web images using colour perception and topological features. The nature of Web images and the implicit problems for text segmentation are described, and a study is performed to assess the magnitude of the problem and establish the need for automated text segmentation methods. Two segmentation methods are subsequently presented: the Split-and-Merge segmentation method and the Fuzzy segmentation method. Although approached in a distinctly different way in each method, the safe assumption that a human being should be able to read the text in any given Web image is the foundation of both methods’ reasoning. This anthropocentric character of the methods, along with the use of topological features of connected components, forms the underlying working principle of the methods. An approach for classifying the connected components resulting from the segmentation methods as either characters or parts of the background is also presented

    Searching Spontaneous Conversational Speech: Proceedings of ACM SIGIR Workshop (SSCS2008)


    Information Management and Improvement of Citation Indices

    Bibliometrics and citation analysis have become an important set of methods for library and information science, as well as an exceptional source of information and knowledge for many other areas. Their main sources are citation indices, which are bibliographic databases like Web of Science, Scopus, Google Scholar, etc. However, bibliographical databases lack perfection and standardization. There are several software tools that perform useful information management and bibliometric analysis by importing data from them. A comparison has been carried out to identify which of them perform certain pre-processing tasks. Usually, they are not strong enough to detect all the duplications, mistakes, misspellings and variant names, leaving the user with the tedious and time-consuming task of correcting the data. Furthermore, some of them do not import datasets from different citation indices, but mainly from Web of Science (WoS). A new software tool, called STICCI.eu (Software Tool for Improving and Converting Citation Indices - enhancing uniformity), which is freely available online, has been created to solve these problems. STICCI.eu is able to do conversions between bibliographical citation formats (WoS, Scopus, CSV, BibTeX, RIS), correct the usual mistakes appearing in those databases, detect duplications, misspellings, etc., identify and transform the full or abbreviated titles of the journals, homogenize toponymical names of countries and relevant cities or regions and list the processed data in terms of the most cited authors, journals, references, etc
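
To illustrate the kind of duplicate detection this abstract mentions, here is a minimal Python sketch that groups citation records whose titles match after normalization. It is not STICCI.eu's actual implementation, which the abstract does not describe; the record fields (`title`, `year`) and the helper names `normalize` and `find_duplicates` are assumptions made for this example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", unaccented.lower())
    return " ".join(cleaned.split())


def find_duplicates(records):
    """Return lists of record indices sharing a normalized (title, year) key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = (normalize(record["title"]), record["year"])
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical records, as they might look after import from two indices.
records = [
    {"title": "Información y Bibliometría", "year": 2010},
    {"title": "Informacion y bibliometria.", "year": 2010},
    {"title": "Citation Analysis", "year": 2011},
]
duplicates = find_duplicates(records)
print(duplicates)
```

Exact matching on a normalized key catches differences in case, accents, and punctuation only. Detecting misspellings or variant names would need approximate matching, which this sketch leaves out.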

    Symbolic and Visual Retrieval of Mathematical Notation using Formula Graph Symbol Pair Matching and Structural Alignment

    Large data collections containing millions of math formulae in different formats are available on-line. Retrieving math expressions from these collections is challenging. We propose a framework for retrieval of mathematical notation using symbol pairs extracted from visual and semantic representations of mathematical expressions in the symbolic domain for retrieval of text documents. We further adapt our model for retrieval of mathematical notation in images and lecture videos. Graph-based representations are used in each modality to describe math formulas. For symbolic formula retrieval, where the structure is known, we use symbol layout trees and operator trees. For image-based formula retrieval, since the structure is unknown, we use a more general Line of Sight graph representation. Paths in these graphs define symbol pair tuples that are used as the entries for our inverted index of mathematical notation. Our retrieval framework uses a three-stage approach with a fast selection of candidates as the first layer, a more detailed matching algorithm with similarity metric computation in the second stage, and finally, when relevance assessments are available, an optional third layer with linear regression for estimation of relevance using multiple similarity scores for final re-ranking. Our model has been evaluated using large collections of documents, and preliminary results are presented for videos and cross-modal search. The proposed framework can be adapted for other domains like chemistry or technical diagrams where two visually similar elements from a collection are usually related to each other
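
The idea of indexing symbol pairs taken from paths in a symbol layout tree can be sketched in Python as below. This is a toy illustration under stated assumptions, not the authors' system: the tree encoding `(symbol, [(relation, child), ...])`, the relation labels `"n"` (next) and `"a"` (above), and the function names are all hypothetical, and ranking by shared pair count stands in only for the fast first-stage candidate selection.

```python
from collections import Counter, defaultdict


def descendants(tree):
    """Yield (relation_path, symbol) for every node below the root."""
    _, children = tree
    for relation, child in children:
        yield (relation,), child[0]
        for path, symbol in descendants(child):
            yield (relation,) + path, symbol


def symbol_pairs(tree):
    """Collect (ancestor, descendant, path) tuples for all downward paths."""
    symbol, children = tree
    pairs = [(symbol, desc, "".join(path)) for path, desc in descendants(tree)]
    for _, child in children:
        pairs.extend(symbol_pairs(child))
    return pairs


def build_index(formulas):
    """Map each symbol pair tuple to the set of formula ids containing it."""
    index = defaultdict(set)
    for formula_id, tree in formulas.items():
        for pair in symbol_pairs(tree):
            index[pair].add(formula_id)
    return index


def candidates(index, query_tree):
    """Rank formulas by the number of distinct query pairs they share."""
    hits = Counter()
    for pair in set(symbol_pairs(query_tree)):
        for formula_id in index.get(pair, ()):
            hits[formula_id] += 1
    return hits.most_common()


# Hypothetical collection: f1 encodes "x^2 + y", f2 encodes "x + z".
formulas = {
    "f1": ("x", [("a", ("2", [])), ("n", ("+", [("n", ("y", []))]))]),
    "f2": ("x", [("n", ("+", [("n", ("z", []))]))]),
}
index = build_index(formulas)
query = ("x", [("n", ("+", [("n", ("y", []))]))])  # "x + y"
print(candidates(index, query))
```

A fuller pipeline would pass these candidates to a more detailed structural matching stage and, where relevance judgments exist, to a learned re-ranker, as the abstract outlines.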