Recent Advances in Social Data and Artificial Intelligence 2019
The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, recent advances in social data and artificial intelligence, and potentially their links to cyberspace.
Two-Level Text Classification Using Hybrid Machine Learning Techniques
Nowadays, documents are increasingly being associated with multi-level
category hierarchies rather than a flat category scheme. To access these
documents in real time, we need fast automatic methods to navigate these
hierarchies. Today’s vast data repositories, such as the web, also contain many
broad domains of data that are quite distinct from each other, e.g., medicine,
education, sports and politics. Each domain constitutes a subspace of the data
within which the documents are similar to each other but quite distinct from the
documents in another subspace. The data within these domains is frequently
further divided into many subcategories.
Subspace learning is a technique popular in non-text domains, such as image
recognition, for increasing speed and accuracy. Subspace analysis lends
itself naturally to the idea of hybrid classifiers. Each subspace can be
processed by a classifier best suited to the characteristics of that particular
subspace. Using only a subset of the dimensions, rather than the complete set
of full-space feature dimensions, can improve classifier performance.
This thesis presents a novel hybrid parallel architecture using separate
classifiers trained on separate subspaces to improve two-level text
classification. The classifier to be used on a particular input and the relevant
feature subset to be extracted are determined dynamically by a novel
method based on the maximum significance value. A novel vector
representation that enhances the distinction between classes within the
subspace is also developed. This novel system, the Hybrid Parallel Classifier,
was compared against several single-classifier baselines, such as the
Multilayer Perceptron, and was found to be faster and to achieve higher
two-level classification accuracy. The performance gain was even larger
for more complex category hierarchies.
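The routing idea in this abstract can be illustrated with a short sketch. It does not reproduce the thesis's method. The abstract does not define the significance value, so the sketch assumes it is the fraction of a document's tokens that fall in a domain's feature subset. The domains, vocabularies, centroids, and both second-level classifiers are hypothetical stand-ins. They are there only to show that each subspace can use a different classifier on its own reduced feature set.

```python
import math
from collections import Counter

# Hypothetical first-level subspaces, each with its own feature subset.
SUBSPACES = {
    "medicine": ["patient", "clinical", "drug", "surgery"],
    "sports": ["match", "goal", "league", "coach"],
}


def project(tokens, domain):
    """Count only the terms in the domain's feature subset (reduced vector)."""
    counts = Counter(tokens)
    return [counts[term] for term in SUBSPACES[domain]]


def significance(tokens, domain):
    """Assumed significance value: share of tokens that fall in the subspace."""
    return sum(project(tokens, domain)) / max(len(tokens), 1)


def medicine_classifier(vec):
    """Hypothetical nearest-centroid classifier for the medicine subspace."""
    centroids = {"pharmacology": [0, 0, 3, 0], "surgery": [1, 0, 0, 3]}
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def sports_classifier(vec):
    """Hypothetical rule-based classifier for the sports subspace."""
    n_match, n_goal, n_league, n_coach = vec
    return "management" if n_coach > max(n_match, n_goal, n_league) else "football"


# "Hybrid": a different classifier type is plugged in per subspace.
CLASSIFIERS = {"medicine": medicine_classifier, "sports": sports_classifier}


def classify(tokens):
    """Pick the subspace with maximum significance, then classify within it."""
    domain = max(SUBSPACES, key=lambda d: significance(tokens, d))
    return domain, CLASSIFIERS[domain](project(tokens, domain))


print(classify("the patient needs surgery after surgery".split()))
```

Each second-level classifier sees only the four features of its own subspace. This is the reduced-dimension setting the abstract describes. In a real system, the per-subspace models would be trained rather than hand-written.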
Robust Machine Learning by Transforming and Augmenting Imperfect Training Data
Machine Learning (ML) is an expressive framework for turning data into
computer programs. Across many problem domains -- both in industry and policy
settings -- the types of computer programs needed for accurate prediction or
optimal control are difficult to write by hand. On the other hand, collecting
instances of desired system behavior may be relatively more feasible. This
makes ML broadly appealing, but also induces data sensitivities that often
manifest as unexpected failure modes during deployment. In this sense, the
training data available tend to be imperfect for the task at hand. This thesis
explores several data sensitivities of modern machine learning and how to
address them. We begin by discussing how to prevent ML from codifying prior
human discrimination measured in the training data, where we take a fair
representation learning approach. We then discuss the problem of learning from
data containing spurious features, which provide predictive fidelity during
training but are unreliable upon deployment. Here we observe that insofar as
standard training methods tend to learn such features, this propensity can be
leveraged to search for partitions of training data that expose this
inconsistency, ultimately promoting learning algorithms invariant to spurious
features. Finally, we turn our attention to reinforcement learning from data
with insufficient coverage over all possible states and actions. To address the
coverage issue, we discuss how causal priors can be used to model the
single-step dynamics of the setting where data are collected. This enables a
new type of data augmentation where observed trajectories are stitched together
to produce new but plausible counterfactual trajectories.
Comment: A thesis submitted in conformity with the requirements for the degree
of Doctor of Philosophy, Department of Computer Science, University of
Toronto
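The augmentation idea in this abstract can be illustrated at the level of single transitions. The sketch is a simplification under stated assumptions, not the thesis's algorithm. It assumes a state made of two components that a causal prior declares independent: each component's next value depends only on its own current value and its own action. Under that prior, the A-part of one observed transition can be recombined with the B-part of another, and chaining such transitions would yield new trajectories. The `Transition` type, the two-component layout, and the additive `true_step` dynamics used to check plausibility are all hypothetical.

```python
from typing import NamedTuple, Tuple

Pair = Tuple[float, float]


class Transition(NamedTuple):
    state: Pair       # (component A, component B)
    action: Pair      # (action on A, action on B)
    next_state: Pair  # observed successor state


def recombine(t1: Transition, t2: Transition) -> Transition:
    """Counterfactual transition: A-parts from t1, B-parts from t2.

    Only valid under the assumed causal prior that A and B do not interact
    during this step; if they do, the result may be implausible.
    """
    return Transition(
        state=(t1.state[0], t2.state[1]),
        action=(t1.action[0], t2.action[1]),
        next_state=(t1.next_state[0], t2.next_state[1]),
    )


def true_step(state: Pair, action: Pair) -> Pair:
    """Hypothetical ground-truth dynamics with independent components."""
    return (state[0] + action[0], state[1] + action[1])


# Two observed transitions, each consistent with true_step.
t1 = Transition(state=(0.0, 10.0), action=(1.0, 0.0), next_state=(1.0, 10.0))
t2 = Transition(state=(5.0, 3.0), action=(0.0, -1.0), next_state=(5.0, 2.0))
new = recombine(t1, t2)

# The recombined transition was never observed but agrees with the dynamics.
print(true_step(new.state, new.action) == new.next_state)  # True
```

If the independence assumption is wrong, the recombined transition can violate the real dynamics. This is why the causal prior carries the weight of the approach.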
Expert Finding in Disparate Environments
Providing knowledge workers with access to experts and communities of practice is central to expertise sharing and crucial to effective organizational performance, adaptation, and even survival. However, in complex work environments, it is difficult to know who knows what across heterogeneous groups, disparate locations, and asynchronous work. Whereas expert finding has traditionally been a manual operation, there is increasing interest in policy and technical infrastructure that makes work visible and supports automated tools for locating expertise.
Expert finding is a multidisciplinary problem that cuts across knowledge management, organizational analysis, and information retrieval. Recently, a number of expert finders have emerged. However, many of these tools are limited because they are extensions of traditional information retrieval systems and primarily exploit artifact information. This thesis explores a new class of expert finders that use organizational context as a basis for assessing expertise and for conferring trust in the system. The hypothesis is that expertise can be inferred from assessments of work behavior and work derivatives (e.g., artifacts).
The Expert Locator, developed within a live organizational environment, is a model-based prototype that exploits organizational work context. The system associates expertise ratings with experts’ signaling behavior. It is extensible, so that signaling behavior from multiple activity-space contexts can be fused into aggregate retrieval scores. Post-retrieval analysis supports evidence review and personal network browsing, aiding users in both detection and selection. During operational evaluation, the prototype generated high-precision searches across a range of topics and was sensitive to organizational role, ranking true experts (i.e., authorities) higher than brokers providing referrals. Precision increased with the number of activity spaces used in the model but varied across queries. The highest-performing queries are characterized by high-specificity terms and low organizational diffusion among retrieved experts; essentially, the highest-rated experts are situated within organizational niches.
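The abstract says that signaling behavior from multiple activity spaces is fused into aggregate retrieval scores, but it does not give the fusion rule. The sketch below assumes a standard data-fusion approach instead: min-max normalization within each activity space, followed by a weighted sum, often called CombSUM. The activity-space names, people, scores, and weights are all hypothetical.

```python
from collections import defaultdict


def minmax(scores):
    """Min-max normalize one activity space's per-person scores to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All people in this space are tied; give each full credit.
        return {person: 1.0 for person in scores}
    return {person: (s - lo) / (hi - lo) for person, s in scores.items()}


def fuse(space_scores, weights=None):
    """Weighted CombSUM over activity spaces, ranked by descending score.

    A person absent from a space receives nothing from that space.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    totals = defaultdict(float)
    for space, scores in space_scores.items():
        w = 1.0 if weights is None else weights.get(space, 1.0)
        for person, s in minmax(scores).items():
            totals[person] += w * s
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical per-space evidence for one query.
space_scores = {
    "documents": {"alice": 8, "bob": 3, "carol": 1},
    "email": {"alice": 5, "bob": 9, "dave": 2},
    "projects": {"alice": 4, "carol": 4},
}
print([person for person, _ in fuse(space_scores)])
```

Normalizing each space first keeps a space with large raw counts from dominating the sum. The per-space weights are where the relative trust in each kind of signaling behavior would be expressed.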
30th International Conference on Information Modelling and Knowledge Bases
Information modelling is an increasingly important topic for researchers, designers, and users of information systems. The amount and complexity of information itself, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing. Conceptual modelling is one of the sub-areas of information modelling. The aim of this conference is to bring together experts from different areas of computer science and other disciplines who share an interest in understanding and solving problems in information modelling and knowledge bases, as well as in applying research results to practice. We also aim to recognize and study new areas of modelling and knowledge bases that deserve more attention. Therefore, philosophy and logic, cognitive science, knowledge management, linguistics, and management science are also relevant areas. The conference will feature three categories of presentations: full papers, short papers, and position papers.
Advances in Knowledge Discovery and Data Mining, Part II
19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II
- …