11,283 research outputs found

Aerospace Medicine and Biology: A continuing bibliography, supplement 191

A bibliographical list of 182 reports, articles, and other documents introduced into the NASA scientific and technical information system in February 1979 is presented

NASA Technical Reports Server

Candidate terms for a thesaurus : a case study of sources of terms in the field of Library and Information Science

Author: Chandran D.
Publication venue: DRTC
Publication date: 01/01/1975

The choice of candidate terms from different sources of information, such as dictionaries, encyclopaedias, textbooks, indexing and abstracting periodicals, and classification schemes, is discussed. The availability of such sources in the field of library and information science, and their helpfulness in choosing candidate terms and fixing the interrelationships between them, are also discussed. It is observed that reference sources such as dictionaries and encyclopaedias, textbooks, and classification schemes provide terms that are stabilised in the field, whereas indexing and abstracting services provide terms of recent origin and current usage. Thus a thesaurus for information retrieval should judiciously choose candidate terms from a variety of sources.

Librarians' Digital Library

Human-in-the-Loop Learning From Crowdsourcing and Social Media

Author: Liu Tong
Publication venue: RIT Scholar Works
Publication date: 01/06/2020

Computational social studies using public social media data have become increasingly popular because of the large amount of user-generated data available. The richness of social media data, coupled with noise and subjectivity, raises significant challenges for computationally studying social issues in a feasible and scalable manner. Machine learning problems are, as a result, often subjective or ambiguous when humans are involved. That is, humans solving the same problems might come to legitimate but completely different conclusions, based on their personal experiences and beliefs. When building supervised learning models, particularly when using crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This inevitably hides a rich source of diversity and subjectivity of opinions about the labels. Label distribution learning associates with each data item a probability distribution over the labels for that item, and can thus preserve the diversity of opinions, beliefs, etc. that conventional learning hides or ignores. We propose a humans-in-the-loop learning framework to model and study large volumes of unlabeled subjective social media data with less human effort. We study various annotation tasks given to crowdsourced annotators and methods for aggregating their contributions in a manner that preserves subjectivity and disagreement. We introduce a strategy for learning label distributions with only five-to-ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. We conduct experiments using our learning framework on data related to two subjective social issues (work and employment, and suicide prevention) that touch many people worldwide. Our methods can be applied to a broad variety of problems, particularly social problems. 
Our experimental results suggest that specific label aggregation methods can help provide reliable representative semantics at the population level
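
The contrast this abstract draws between a single ground-truth label and a label distribution can be illustrated with a minimal sketch. It is not the author's method: the function names, the example labels, and the assumption that semantically related items have already been grouped are all hypothetical, chosen only to show how pooling sparse annotations keeps disagreement visible.

```python
from collections import Counter


def majority_label(annotations):
    """Conventional reduction: keep only the most frequent label."""
    return Counter(annotations).most_common(1)[0][0]


def label_distribution(annotations, labels):
    """Empirical probability of each label among one item's annotations."""
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    return {label: counts[label] / len(annotations) for label in labels}


def pooled_distribution(related_items, labels):
    """Pool the annotations of several related items into one distribution.

    How items are judged 'semantically related' is NOT specified here;
    the grouping is assumed to be given (hypothetical input).
    """
    pooled = [a for item in related_items for a in item]
    return label_distribution(pooled, labels)


# Hypothetical labels and annotations for illustration only.
LABELS = ["positive", "negative", "neutral"]
item_a = ["positive", "positive", "negative", "neutral", "positive"]
item_b = ["negative", "negative", "positive", "neutral", "neutral"]

single = majority_label(item_a)                       # disagreement is discarded
dist_a = label_distribution(item_a, LABELS)           # disagreement is preserved
pooled = pooled_distribution([item_a, item_b], LABELS)
```

Here `majority_label` discards the two dissenting annotations on `item_a`, whereas `label_distribution` retains them as probability mass, and `pooled_distribution` estimates one distribution from ten annotations spread over two items.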

RIT Scholar Works

Digital ecosystems

Author: Briscoe Gerard
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/07/2009

We view Digital Ecosystems as the digital counterparts of biological ecosystems, which are considered to be robust, self-organising and scalable architectures that can automatically solve complex, dynamic problems. This work is therefore concerned with the creation, investigation, and optimisation of Digital Ecosystems, exploiting the self-organising properties of biological ecosystems. First, we created the Digital Ecosystem, a novel optimisation technique inspired by biological ecosystems, where the optimisation works at two levels: a first optimisation, the migration of agents which are distributed in a decentralised peer-to-peer network, operates continuously in time; this process feeds a second optimisation, based on evolutionary computing, that operates locally on single peers and is aimed at finding solutions to satisfy locally relevant constraints. We then investigated its self-organising aspects, starting with an extension to the definition of Physical Complexity to include the evolving agent populations of our Digital Ecosystem. Next, we established stability of evolving agent populations over time, by extending the Chli-DeWilde definition of agent stability to include evolutionary dynamics. Further, we evaluated the diversity of the software agents within evolving agent populations, relative to the environment provided by the user base. To conclude, we considered alternative augmentations to optimise and accelerate our Digital Ecosystem, by studying the accelerating effect of a clustering catalyst on the evolutionary dynamics of our Digital Ecosystem, through the direct acceleration of the evolutionary processes. We also studied the optimising effect of targeted migration on the ecological dynamics of our Digital Ecosystem, through the indirect and emergent optimisation of the agent migration patterns. 
Overall, we have advanced the understanding of creating Digital Ecosystems, the self-organisation that occurs within them, and the optimisation of their Ecosystem-Oriented Architecture

Spiral - Imperial College Digital Repository

SENTIMENT STRENGTH AND TOPIC RECOGNITION IN SENTIMENT ANALYSIS

Author: Adeborna Esi A.R.
Publication venue: Scholars' Mine
Publication date: 01/01/2023

Current sentiment analysis methods focus on determining the sentiment polarities (negative, neutral or positive) in users’ sentiments. However, in order to correctly classify users’ sentiments into their right polarities, the strengths of these sentiments must be considered. In addition to classifying users’ sentiments into their correct polarities, it is important to determine the sources and topics under which users’ sentiments fall. Sentiment strength helps us to understand the levels of customer satisfaction toward products and services. Sentiment topics, on the other hand, help to determine the specific product/service areas associated with user sentiments. This paper proposes two sentiment analysis approaches. First, an approach is proposed that determines the sentiment strength expressed by consumers on a scale (highly positive, +5, to highly negative, -5). The approach includes a novel algorithm to compute the strength of sentiment polarity for each text by including the weights of the words used in the texts. Second, a sentiment mining approach that detects sentiment topics from text is proposed. The approach includes a sentiment topic recognition model that is based on Correlated Topic Models (CTM) with the Variational Expectation-Maximization (VEM) algorithm. Finally, the effectiveness and efficiency of these models are validated using airline data from Twitter and a customer review dataset from amazon.com --Abstract, p. ii

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

A survey of visual preprocessing and shape representation techniques

Author: Olshausen Bruno A.

Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention)

NASA Technical Reports Server

Acquiring and Maintaining Knowledge by Natural Multimodal Dialog

Author: Holzapfel Hartwig
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

KITopen

Data mining in soft computing framework: a survey

Author: Mitra P.
Mitra S.
Pal S. K.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2002

The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included

Article Segmentation in Digitised Newspapers

Author: Naoum Andrew
Publication venue: Faculty of Engineering and Information Technologies, School of Computer Science
Publication date: 01/01/2020

Digitisation projects preserve and make available vast quantities of historical text. Among these, newspapers are an invaluable resource for the study of human culture and history. Article segmentation identifies each region in a digitised newspaper page that contains an article. Digital humanities, information retrieval (IR), and natural language processing (NLP) applications over digitised archives improve access to text and allow automatic information extraction. The lack of article segmentation impedes these applications. We contribute a thorough review of the existing approaches to article segmentation. Our analysis reveals divergent interpretations of the task, and inconsistent and often ambiguously defined evaluation metrics, making comparisons between systems challenging. We solve these issues by contributing a detailed task definition that examines the nuances and intricacies of article segmentation that are not immediately apparent. We provide practical guidelines on handling borderline cases and devise a new evaluation framework that allows insightful comparison of existing and future approaches. Our review also reveals that the lack of large datasets hinders meaningful evaluation and limits machine learning approaches. We solve these problems by contributing a distant supervision method for generating large datasets for article segmentation. We manually annotate a portion of our dataset and show that our method produces article segmentations over characters nearly as well as costly human annotators. We reimplement the seminal textual approach to article segmentation (Aiello and Pegoretti, 2006) and show that it does not generalise well when evaluated on a large dataset. We contribute a framework for textual article segmentation that divides the task into two distinct phases: block representation and clustering. 
We propose several techniques for block representation and contribute a novel highly-compressed semantic representation called similarity embeddings. We evaluate and compare different clustering techniques, and innovatively apply label propagation (Zhu and Ghahramani, 2002) to spread headline labels to similar blocks. Our similarity embeddings and label propagation approach substantially outperforms Aiello and Pegoretti but still falls short of human performance. Exploring visual approaches to article segmentation, we reimplement and analyse the state-of-the-art Bansal et al. (2014) approach. We contribute an innovative 2D Markov model approach that captures reading order dependencies and reduces the structured labelling problem to a Markov chain that we decode with Viterbi (1967). Our approach substantially outperforms Bansal et al., achieves accuracy as good as human annotators, and establishes a new state of the art in article segmentation. Our task definition, evaluation framework, and distant supervision dataset will encourage progress in the task of article segmentation. Our state-of-the-art textual and visual approaches will allow sophisticated IR and NLP applications over digitised newspaper archives, supporting research in the digital humanities
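
The abstract mentions decoding a Markov chain with Viterbi. The following is a generic sketch of Viterbi decoding, not a reconstruction of the thesis's 2D Markov model: the two states ("start" of an article, "cont" for continuation), the observation names, and all probabilities are hypothetical values picked only to make the example runnable. All probabilities are assumed to be strictly positive so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations.

    Works in log space; every probability passed in must be > 0.
    """
    # best[t][s] = (log-probability of the best path ending in s at t, predecessor)
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Choose the predecessor r that maximises the path score into s.
            score, back = max(
                (prev[r][0] + math.log(trans_p[r][s]), r) for r in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), back)
        best.append(column)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical model: blocks in reading order either start an article or continue one.
STATES = ["start", "cont"]
START_P = {"start": 0.9, "cont": 0.1}
TRANS_P = {
    "start": {"start": 0.2, "cont": 0.8},
    "cont": {"start": 0.3, "cont": 0.7},
}
EMIT_P = {
    "start": {"headline": 0.8, "body": 0.2},
    "cont": {"headline": 0.1, "body": 0.9},
}

blocks = ["headline", "body", "body", "headline", "body"]
decoded = viterbi(blocks, STATES, START_P, TRANS_P, EMIT_P)
# decoded == ["start", "cont", "cont", "start", "cont"]
```

With these toy parameters, each headline-like block is decoded as the start of a new article and each body-like block as a continuation, which is the kind of sequential labelling the abstract describes in general terms.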

Sydney eScholarship

Human-Centered Content-Based Image Retrieval

Author: Broek Egon L. van den
Publication venue: Nijmegen Institute for Cognition and Information (NICI), Radboud University Nijmegen, Nijmegen, The Netherlands
Publication date: 01/01/2005

Retrieval of images that lack (suitable) annotations cannot be achieved through (traditional) Information Retrieval (IR) techniques. Access to such collections can be achieved by applying computer vision techniques to the IR problem, an approach known as Content-Based Image Retrieval (CBIR). In contrast with most purely technological approaches, the thesis Human-Centered Content-Based Image Retrieval approaches the problem from a human/user-centered perspective. Psychophysical experiments were conducted in which people were asked to categorize colors. The data gathered from these experiments was fed to a Fast Exact Euclidean Distance (FEED) transform (Schouten & Van den Broek, 2004), which enabled the segmentation of color space based on human perception (Van den Broek et al., 2008). This unique color space segmentation was exploited for texture analysis and image segmentation, and subsequently for full-featured CBIR. In addition, a unique CBIR benchmark was developed (Van den Broek et al., 2004, 2005). This benchmark was used to explore which parameters of the CBIR process (e.g., color and distance measures) influence retrieval results, and how. In contrast with other research, users' judgements were used as the metric. The online IR and CBIR system Multimedia for Art Retrieval (M4ART) (URL: http://www.m4art.org) has been (partly) founded on the techniques discussed in this thesis.
References:
- Broek, E.L. van den, Kisters, P.M.F., and Vuurpijl, L.G. (2004). The utilization of human color categorization for content-based image retrieval. Proceedings of SPIE (Human Vision and Electronic Imaging), 5292, 351-362. [see also Chapter 7]
- Broek, E.L. van den, Kisters, P.M.F., and Vuurpijl, L.G. (2005). Content-Based Image Retrieval Benchmarking: Utilizing Color Categories and Color Distributions. Journal of Imaging Science and Technology, 49(3), 293-301. [see also Chapter 8]
- Broek, E.L. van den, Schouten, Th.E., and Kisters, P.M.F. (2008). Modeling Human Color Categorization. Pattern Recognition Letters, 29(8), 1136-1144. [see also Chapter 5]
- Schouten, Th.E. and Broek, E.L. van den (2004). Fast Exact Euclidean Distance (FEED) transformation. In J. Kittler, M. Petrou, and M. Nixon (Eds.), Proceedings of the 17th IEEE International Conference on Pattern Recognition (ICPR 2004), Vol. 3, pp. 594-597. August 23-26, Cambridge, United Kingdom. [see also Appendix C]

VU Research Portal

Radboud Repository

University of Twente Research Information