
    HIERARCHICAL LEARNING OF DISCRIMINATIVE FEATURES AND CLASSIFIERS FOR LARGE-SCALE VISUAL RECOGNITION

    Enabling computers to recognize objects in images has been a long-standing and tremendously challenging problem in computer vision. Beyond the difficulties resulting from huge appearance variations, large-scale visual recognition poses unprecedented challenges when the number of visual categories reaches the thousands and the number of images grows to the millions. This dissertation contributes to addressing several of these challenges. First, we develop an automatic image-text alignment method to collect massive amounts of labeled images from the Web for training visual concept classifiers. Specifically, we first crawl a large number of cross-media Web pages containing Web images and their auxiliary texts, and then segment them into a collection of image-text pairs. We then show that near-duplicate image clustering according to visual similarity can significantly reduce the uncertainty about the relatedness of Web images' semantics to their auxiliary text terms or phrases. Finally, we empirically demonstrate that a random walk over a newly proposed phrase correlation network helps to achieve more precise image-text alignment by refining the relevance scores between Web images and their auxiliary text terms. Second, we propose a visual tree model that reduces the computational complexity of a large-scale visual recognition system by hierarchically organizing and learning the classifiers for a large number of visual categories in a tree structure. Unlike previous tree models, such as the label tree, our visual tree model does not require training a huge number of classifiers in advance, which is computationally expensive. Nevertheless, we experimentally show that the proposed visual tree achieves recognition accuracy and efficiency comparable to, or even better than, other tree models.
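The efficiency gain of such a tree comes from evaluating only one branch classifier per level rather than one classifier per category. A minimal sketch of tree-structured prediction (the node layout and the linear branch classifiers here are illustrative assumptions, not the dissertation's actual model):

```python
import numpy as np

class Node:
    """A tree node: internal nodes hold one linear classifier per child
    branch; leaves hold a category label. (Hypothetical layout.)"""
    def __init__(self, children=None, weights=None, label=None):
        self.children = children  # list of child Nodes (internal nodes only)
        self.weights = weights    # (n_children, dim) branch classifiers
        self.label = label        # category name (leaves only)

def predict(node, x):
    """Descend from the root to a leaf: prediction costs O(tree depth)
    classifier evaluations instead of one evaluation per category."""
    while node.label is None:
        branch = int(np.argmax(node.weights @ x))  # pick best-scoring child
        node = node.children[branch]
    return node.label
```

With K categories in a balanced tree of branching factor b, prediction needs roughly log_b K classifier evaluations instead of K, which is where the claimed efficiency at thousands of categories would come from.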
Third, we present a joint dictionary learning (JDL) algorithm which exploits inter-category visual correlations to learn more discriminative dictionaries for image content representation. Given a group of visually correlated categories, JDL simultaneously learns one common dictionary and multiple category-specific dictionaries to explicitly separate the shared visual atoms from the category-specific ones. We accordingly develop three classification schemes to make full use of the dictionaries learned by JDL for visual content representation in the task of image categorization. Experiments on two image data sets, containing 17 and 1,000 categories respectively, demonstrate the effectiveness of the proposed algorithm. In the last part of the dissertation, we develop a novel data-driven algorithm to quantitatively characterize the semantic gaps of different visual concepts for learning-complexity estimation and inference-model selection. The semantic gaps are estimated directly in the visual feature space, since that space is common to both concept classifier training and automatic concept detection. We show that the quantitative characterization of the semantic gaps helps to automatically select more effective inference models for classifier training, which further improves recognition accuracy.
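One natural way to use a common-plus-specific dictionary pair at test time is reconstruction-error classification. The sketch below illustrates that idea under stated assumptions: least-squares coding stands in for the sparse coding JDL would actually use, and the dictionary shapes are invented for illustration.

```python
import numpy as np

def residual(x, D):
    """Reconstruction error of feature x coded over dictionary D
    (columns are atoms). Least squares is a stand-in for sparse coding."""
    a, *_ = np.linalg.lstsq(D, x, rcond=None)
    return float(np.linalg.norm(x - D @ a))

def classify(x, D_common, D_specific):
    """Assign x to the category whose combined [common | specific]
    dictionary reconstructs it with the smallest residual."""
    scores = [residual(x, np.hstack([D_common, Dk])) for Dk in D_specific]
    return int(np.argmin(scores))
```

Because the shared atoms live in `D_common`, only the category-specific atoms differentiate the residuals, which is the discriminative effect the separation is designed to produce.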

    Integrating Protein Data Resources through Semantic Web Services

    Understanding the function of every protein is a major objective of bioinformatics. Currently, a large amount of information (e.g., sequence, structure and dynamics) associated with protein function is being produced by experiments and predictions. Integrating these diverse data about protein sequence, structure, dynamics and other protein features allows further exploration and establishment of the relationships between protein sequence, structure, dynamics and function, and thereby helps to control the function of target proteins. However, information integration in protein data resources faces challenges at the technology level, for interfacing heterogeneous data formats and standards, and at the application level, for semantic interpretation of dissimilar data and queries. In this research, a semantic web services infrastructure for flexible and user-oriented integration of protein data resources, called Web Services for Protein data resources (WSP), is proposed. This infrastructure includes a method for modeling protein web services, a service publication algorithm, an efficient service discovery (matching) algorithm, and an optimal service chaining algorithm. Rather than relying on syntactic matching, the matching algorithm discovers services based on their similarity to the requested service. Therefore, users can locate services that semantically match their data requirements even if the two are syntactically distinct. Furthermore, WSP supports a workflow-based approach for service integration. The chaining algorithm selects and chains services based on the criteria of service accuracy and data interoperability, generating a web services workflow that automatically integrates the results from individual services. A number of experiments are conducted to evaluate the performance of the matching algorithm. The results reveal that the algorithm can discover services with reasonable performance.
Also, a composite service that integrates protein dynamics and conservation is demonstrated using the WSP infrastructure.
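Semantic (rather than syntactic) matching can be pictured as scoring a request's concepts against each service's advertised concepts over a small ontology, so that a service matches even when names differ. A toy sketch, in which the concept names, the hierarchy, and the similarity weights are all invented placeholders rather than WSP's actual algorithm:

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
HIERARCHY = {"ProteinSequence": "Sequence", "ProteinStructure": "Structure"}

def concept_similarity(a, b):
    """1.0 for an exact match, 0.8 if one concept subsumes the other, else 0."""
    if a == b:
        return 1.0
    if HIERARCHY.get(a) == b or HIERARCHY.get(b) == a:
        return 0.8
    return 0.0

def match_score(wanted, offered):
    """Mean best-match similarity of each requested concept against the
    service's offered concepts (0 = no match, 1 = all exact matches)."""
    if not wanted:
        return 0.0
    return sum(max(concept_similarity(w, o) for o in offered)
               for w in wanted) / len(wanted)

def discover(wanted, registry, threshold=0.5):
    """Return service names ranked by semantic match score, best first."""
    scored = [(name, match_score(wanted, out)) for name, out in registry.items()]
    return [name for name, s in sorted(scored, key=lambda t: -t[1]) if s >= threshold]
```

A request for the general concept `Sequence` would still surface a service advertising the more specific `ProteinSequence`, which is the kind of syntactically-distinct-but-semantically-matching discovery the abstract describes.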

    CHR Grammars

    A grammar formalism based upon CHR is proposed, analogously to the way Definite Clause Grammars are defined and implemented on top of Prolog. These grammars execute as robust bottom-up parsers with an inherent treatment of ambiguity and a high flexibility to model various linguistic phenomena. The formalism extends previous logic-programming-based grammars with a form of context-sensitive rules and the possibility to include extra-grammatical hypotheses in both the head and body of grammar rules. Among the applications are straightforward implementations of Assumption Grammars and abduction under integrity constraints for language analysis. CHR grammars appear to be a powerful tool for the specification and implementation of language processors and may be proposed as a new standard for bottom-up grammars in logic programming. To appear in Theory and Practice of Logic Programming (TPLP), 2005 (36 pp.).
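The bottom-up execution model with inherent ambiguity handling can be approximated in a few lines: lexical constraints populate a store, and grammar rules fire to a fixpoint, so all derivable analyses coexist in the store. A toy Python sketch (the grammar and lexicon are invented; real CHR grammars run on a Prolog CHR engine and additionally support context-sensitive rules and extra-grammatical hypotheses):

```python
# CHR-style bottom-up parsing: grammar rules add items to a constraint
# store of (category, start, end) spans until a fixpoint is reached.
RULES = {("Det", "N"): "NP", ("NP", "V"): "S"}      # hypothetical toy grammar
LEXICON = {"the": "Det", "dog": "N", "barks": "V"}  # hypothetical toy lexicon

def parse(words):
    """Seed the store with lexical items, then exhaustively apply binary
    rules to adjacent spans, as a CHR propagation engine would."""
    store = {(LEXICON[w], i, i + 1) for i, w in enumerate(words)}
    changed = True
    while changed:
        changed = False
        for (c1, i, j) in list(store):
            for (c2, j2, k) in list(store):
                if j == j2 and (c1, c2) in RULES:   # adjacent constituents
                    item = (RULES[(c1, c2)], i, k)
                    if item not in store:
                        store.add(item)
                        changed = True
    return store
```

Because derived items are never removed, an ambiguous sentence simply leaves several overlapping analyses in the final store, mirroring the formalism's inherent treatment of ambiguity.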

    USING RESTRICTED NATURAL LANGUAGE FOR DATA RETRIEVAL: A PLAN FOR FIELD EVALUATION

    One strategy that has been proposed for dealing with the growing backlog of application development is to give casual users languages for interacting directly with databases. Yet there is little agreement on the form such languages should take. Should they be natural-like, conforming closely to a user's native tongue, or should they be structured to take advantage of the characteristics of formal languages? This paper presents the rationale for and design of a field evaluation of natural language for data retrieval. The natural language system and application are described, along with the research design of the project. The results of the first part of the study, a laboratory experiment investigating whether users perform better with an artificial or a natural language, suggest that after equal amounts of training no difference in subject performance between the languages is found on a paper-and-pencil test. The insights gained to date are summarized. Information Systems Working Papers Series

    Sentiment analysis of clinical narratives: A scoping review

    A clinical sentiment is a judgment, thought or attitude prompted by an observation with respect to the health of an individual. Sentiment analysis has drawn attention in the healthcare domain for secondary use of data from clinical narratives, with a variety of applications including predicting the likelihood of emerging mental illnesses or clinical outcomes. The current state of research has not yet been summarized. This study presents results from a scoping review aiming to provide an overview of sentiment analysis of clinical narratives, in order to summarize existing research and identify open research gaps. The scoping review was carried out in line with the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) guideline. Studies were identified by searching four electronic databases (e.g., PubMed, IEEE Xplore) and by backward and forward reference-list checking of the included studies. We extracted information on use cases, methods and tools applied, datasets used, and the performance of the sentiment analysis approaches. Of 1,200 citations retrieved, 29 unique studies covering a period of 8 years were included in the review. Most studies apply general-domain tools (e.g., TextBlob) and sentiment lexicons (e.g., SentiWordNet) to realize use cases such as prediction of clinical outcomes; others propose new domain-specific sentiment analysis approaches based on machine learning. Accuracy values between 71.5% and 88.2% are reported. Data used for evaluation and testing are often retrieved from the MIMIC databases or i2b2 challenges. The latest developments related to artificial neural networks are not yet fully considered in this domain. We conclude that future research should focus on developing a gold-standard sentiment lexicon adapted to the specific characteristics of clinical narratives.
Efforts have to be made either to augment existing labeled data sets of clinical narratives or to create new high-quality ones. Lastly, the suitability of state-of-the-art machine learning methods for natural language processing, and in particular transformer-based models, should be investigated for sentiment analysis of clinical narratives.
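The lexicon-based approaches the review surveys reduce to a simple pattern: sum per-token polarity scores, flipping them under negation cues, which matter greatly in clinical text ("denies pain" is reassuring, not negative). A minimal sketch, where the lexicon entries, scores, and negator list are invented placeholders rather than SentiWordNet values:

```python
# Hypothetical mini-lexicon: token -> polarity in [-1, 1].
LEXICON = {"improved": 1.0, "stable": 0.5, "pain": -1.0,
           "deteriorated": -1.0, "distress": -0.8}
NEGATORS = {"no", "denies", "without", "not"}  # common clinical negation cues

def polarity(text):
    """Mean polarity of lexicon tokens in text; a preceding negator
    flips a token's score. Returns 0.0 if no lexicon token occurs."""
    tokens = text.lower().replace(",", " ").split()
    total, hits = 0.0, 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            score = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:  # e.g. "denies pain"
                score = -score
            total += score
            hits += 1
    return total / hits if hits else 0.0
```

This illustrates why a gold-standard lexicon adapted to clinical language is the gap the review identifies: general-domain scores and a one-token negation window are crude approximations of how clinicians actually phrase findings.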

    Traveling of Requirements in the Development of Packaged Software: An Investigation of Work Design and Uncertainty

    Software requirements, and how they are constructed, shared and translated across software organizations, express uncertainties that software developers need to address through appropriate structuring of the process and the organization at large. To gain new insights into this important phenomenon, we rely on the theory of work design and the traveling metaphor to undertake an in-depth qualitative inquiry into recurrent development of packaged software for the utility industry. Using the particular context of software provider GridCo, we examine how requirements are constructed, shared, and translated as they travel across vertical and horizontal boundaries. In revealing insights into these practices, we contribute to theory by conceptualizing how requirements travel, not just locally, but across organizations and time, thereby uncovering new knowledge about responses to requirement uncertainty in the development of packaged software. We also contribute to theory by providing narrative accounts of in situ requirements processes and by revealing the practical consequences of organization structure for managing uncertainty.