Fuzzy Interval-Valued Multi Criteria Based Decision Making for Ranking Features in Multi-Modal 3D Face Recognition
Soodamani Ramalingam, 'Fuzzy interval-valued multi criteria based decision making for ranking features in multi-modal 3D face recognition', Fuzzy Sets and Systems, in-press version available online 13 June 2017. This is an Open Access paper, made available under the Creative Commons license CC BY 4.0: https://creativecommons.org/licenses/by/4.0/
This paper describes an application of multi-criteria decision making (MCDM) to the multi-modal fusion of features in a 3D face recognition system. A decision-making process is outlined that is based on the performance of multi-modal features in a face recognition task involving a set of 3D face databases. In particular, the fuzzy interval-valued MCDM technique TOPSIS is applied for ranking and deciding on the best choice of multi-modal features at the decision stage. It provides a formal mechanism for benchmarking their performance against a set of criteria. The technique demonstrates its ability to scale up to additional multi-modal features.
Peer reviewed
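The ranking step described above can be illustrated with a minimal sketch of classic (crisp) TOPSIS; note the paper itself uses a fuzzy interval-valued variant, and the decision matrix, criteria, and weights below are hypothetical placeholders, not values from the paper.

```python
# Minimal crisp TOPSIS sketch: rank alternatives (e.g. feature sets) by
# closeness to the ideal solution. Illustrative only; the paper uses a
# fuzzy interval-valued variant, and all numbers here are hypothetical.
import math

def topsis(matrix, weights, benefit):
    """matrix: rows = alternatives, cols = criteria scores.
    benefit[j] is True if criterion j is to be maximised."""
    ncols = len(matrix[0])
    # 1. Vector-normalise each column, then apply criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(ncols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(ncols)] for row in matrix]
    # 2. Ideal and anti-ideal solutions per criterion.
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(zip(*v))]
    # 3. Closeness coefficient: d(worst) / (d(best) + d(worst)).
    scores = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical example: three multi-modal feature sets scored on
# recognition accuracy (benefit) and compute cost (cost criterion).
scores = topsis([[0.95, 40], [0.90, 10], [0.85, 5]],
                weights=[0.7, 0.3], benefit=[True, False])
ranking = sorted(range(3), key=lambda i: -scores[i])
```

The closeness coefficient lies in [0, 1]; the alternative closest to the ideal and farthest from the anti-ideal ranks first.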
Robust Modeling of Epistemic Mental States
This work identifies and advances research challenges in relating facial features and their temporal dynamics to epistemic mental states in dyadic conversations. The epistemic states considered are Agreement, Concentration, Thoughtful, Certain, and Interest. In this paper, we perform a number of statistical analyses and simulations to identify the relationship between facial features and epistemic states. Non-linear relations are found to be more prevalent, while temporal features derived from the original facial features show a strong correlation with intensity changes. We then propose a novel prediction framework that takes facial features and their non-linear relation scores as input and predicts the different epistemic states in videos. The prediction of epistemic states is boosted when the classification of emotion-change regions (rising, falling, or steady-state) is incorporated with the temporal features. The proposed predictive models predict the epistemic states with significantly improved accuracy: the correlation coefficient (CoERR) is 0.827 for Agreement, 0.901 for Concentration, 0.794 for Thoughtful, 0.854 for Certain, and 0.913 for Interest.
Comment: Accepted for publication in Multimedia Tools and Applications, Special Issue: Socio-Affective Technologies
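The reported figures are correlation coefficients between predicted and annotated state intensities. A minimal sketch of that evaluation, computing a Pearson correlation over hypothetical prediction/annotation pairs (the data below is invented for illustration, not from the paper):

```python
# Sketch of the evaluation metric: Pearson correlation between predicted
# and annotated epistemic-state intensities. Data is hypothetical.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

predicted = [0.2, 0.5, 0.7, 0.9, 0.4]   # hypothetical model outputs
annotated = [0.1, 0.6, 0.8, 0.8, 0.3]   # hypothetical ground truth
r = pearson(predicted, annotated)        # close to 1 => strong agreement
```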
PROCEEDINGS OF THE 2nd INTERNATIONAL SEMINAR ON LINGUISTICS (ISOL-2): Language and Civilization
ISOL is a biennial international seminar held by the Linguistics Graduate Program of the Faculty of Humanities, Andalas University, in collaboration with the Linguistic Society of Indonesia (MLI), Unand Chapter. ISOL aims to provide a discussion platform for linguists and language observers across Indonesia. Its main objective is to enhance the exchange of research and new approaches in language studies. The seminar is also open to interested participants from outside Indonesia. The theme of the 2nd ISOL is Language and Civilization.
Civilization is the process by which a society or place reaches an advanced stage of social development and organization. It is also defined as the society, culture, and way of life of a particular area. Over time, the word civilization has come to imply something beyond organization alone: a particular shared way of thinking about the world, as well as a reflection of that world in art, literature, drama and a host of other cultural happenings. Language is itself a social construct, a component of social reality. Thus, like all social constructs and conventions, it can be changed.
A civilization is any complex state society characterized by urban development, social stratification, symbolic forms of communication, and a perceived separation from and domination over the natural environment. To advance civilization is to construct a new social reality, which emerges through language. In other words, social reality is the operational expression of words and the meanings that society has agreed upon for them.
Conversational artificial intelligence - demystifying statistical vs linguistic NLP solutions
This paper aims to demystify the hype and attention around chatbots and their association with conversational artificial intelligence. Both are slowly emerging as a real presence in our lives thanks to impressive technological developments in machine learning, deep learning and natural language understanding. However, what is under the hood, and how far and to what extent can chatbot/conversational AI solutions work? That is our question. Natural language is the most easily understood knowledge representation for people, but certainly not the best for computers, because of its inherently ambiguous, complex and dynamic nature. We critique the knowledge representation of heavily statistical chatbot solutions against linguistic alternatives. To react intelligently to the user, natural language solutions must critically consider other factors such as context, memory, intelligent understanding, previous experience, and personalized knowledge of the user. We delve into the spectrum of conversational interfaces and focus on a strong artificial intelligence concept. This is explored via a text-based conversational software agent with a deep strategic role: to hold a conversation, to enable the mechanisms needed to plan and decide what to do next, and to manage the dialogue to achieve a goal. To demonstrate this, a deep linguistically aware and knowledge-aware text-based conversational agent (LING-CSA) presents a proof of concept of a non-statistical conversational AI solution.
Deep Hypernetworks for Learning from Dynamic Multimodal Data
Doctoral dissertation, Department of Electrical and Computer Engineering, Graduate School of Seoul National University, February 2015. Advisor: Byoung-Tak Zhang.
Recent advancements in information and communication technology have led to an explosive increase of data. In particular, unlike traditional data, which are structured and unimodal, recent data generated from dynamic environments are characterized by high dimensionality, multimodality, and structurelessness, as well as huge scale. Learning from non-stationary multimodal data is essential for solving many difficult problems in artificial intelligence. However, despite many successful reports, existing machine learning methods have mainly focused on solving practical problems represented by large-scale but static databases, such as image classification, tagging, and retrieval.
A hypernetwork is a probabilistic graphical model that represents an empirical distribution using a hypergraph structure, a large collection of hyperedges encoding associations among variables. This representation makes the model suitable for characterizing complex relationships between features with a population of building blocks. However, since a hypernetwork spans a huge combinatorial feature space, the model requires a large number of hyperedges to handle multimodal large-scale data and thus faces a scalability problem.
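The core idea of a population of hyperedges can be sketched in a few lines: hyperedges are small (variable, value) subsets sampled from training examples, and a new example is scored by how many stored hyperedges it matches. This is a toy illustration of the representation only; the variable names, data, and sampling scheme below are hypothetical, not the dissertation's graph Monte-Carlo procedure.

```python
# Toy hypernetwork sketch: hyperedges are small random (variable, value)
# subsets sampled from binary training examples; an example is scored by
# counting the stored hyperedges it matches. All data is hypothetical.
import random

def sample_hyperedges(data, order=2, n_edges=200, seed=0):
    rng = random.Random(seed)
    edges = []
    for _ in range(n_edges):
        row = rng.choice(data)                     # pick a training example
        idx = rng.sample(range(len(row)), order)   # pick `order` variables
        edges.append(tuple((i, row[i]) for i in sorted(idx)))
    return edges

def score(example, edges):
    # Count hyperedges whose (variable, value) pairs all match the example.
    return sum(all(example[i] == v for i, v in e) for e in edges)

data = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1]]  # hypothetical examples
edges = sample_hyperedges(data)
# Examples resembling the training data match many more hyperedges
# than examples that mix values never seen together.
```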
In this dissertation, we propose a deep architecture of hypernetworks, deep hypernetworks, for dealing with this scalability issue when learning from multimodal data with non-stationary properties, such as videos. Deep hypernetworks address the issue through abstraction at multiple levels, using a hierarchy of multiple hypergraphs. We use a stochastic method based on Monte-Carlo simulation, graph Monte-Carlo, for efficiently constructing hypergraphs that represent the empirical distribution of the observed data. The structure of a deep hypernetwork continuously changes as learning proceeds, and this flexibility contrasts with other deep learning models. The proposed model learns from the data incrementally, thus handling non-stationary properties such as concept drift. The abstract representations in the learned models play the role of multimodal knowledge about the data, which is used for content-aware crossmodal transformation, including vision-language conversion. We view vision-language conversion as machine translation, and thus formulate vision-language translation in terms of statistical machine translation. Since knowledge of the video stories is used for translation, we call this story-aware vision-language translation.
We evaluate deep hypernetworks on large-scale vision-language multimodal data, including benchmarking datasets and cartoon video series. The experimental results show that deep hypernetworks effectively represent visual-linguistic information abstracted at multiple levels of the data contents, as well as the associations between vision and language. We explain how the introduction of a hierarchy deals with the scalability and non-stationarity properties. In addition, we present story-aware vision-language translation on cartoon videos by generating scene images from sentences and descriptive subtitles from scene images. Furthermore, we discuss the meaning of our model for lifelong learning and directions for improvement towards achieving human-level artificial intelligence.
1 Introduction
1.1 Background and Motivation
1.2 Problems to be Addressed
1.3 The Proposed Approach and its Contribution
1.4 Organization of the Dissertation
2 Related Work
2.1 Multimodal Learning
2.2 Models for Learning from Multimodal Data
2.2.1 Topic Model-Based Multimodal Learning
2.2.2 Deep Network-Based Multimodal Learning
2.3 Higher-Order Graphical Models
2.3.1 Hypernetwork Models
2.3.2 Bayesian Evolutionary Learning of Hypernetworks
3 Multimodal Hypernetworks for Text-to-Image Retrievals
3.1 Overview
3.2 Hypernetworks for Multimodal Associations
3.2.1 Multimodal Hypernetworks
3.2.2 Incremental Learning of Multimodal Hypernetworks
3.3 Text-to-Image Crossmodal Inference
3.3.1 Representation of Textual-Visual Data
3.3.2 Text-to-Image Query Expansion
3.4 Text-to-Image Retrieval via Multimodal Hypernetworks
3.4.1 Data and Experimental Settings
3.4.2 Text-to-Image Retrieval Performance
3.4.3 Incremental Learning for Text-to-Image Retrieval
3.5 Summary
4 Deep Hypernetworks for Multimodal Concept Learning from Cartoon Videos
4.1 Overview
4.2 Visual-Linguistic Concept Representation of Cartoon Videos
4.3 Deep Hypernetworks for Modeling Visual-Linguistic Concepts
4.3.1 Sparse Population Coding
4.3.2 Deep Hypernetworks for Concept Hierarchies
4.3.3 Implication of Deep Hypernetworks on Cognitive Modeling
4.4 Learning of Deep Hypernetworks
4.4.1 Problem Space of Deep Hypernetworks
4.4.2 Graph Monte-Carlo Simulation
4.4.3 Learning of Concept Layers
4.4.4 Incremental Concept Construction
4.5 Incremental Concept Construction from Cartoon Videos
4.5.1 Data Description and Parameter Setup
4.5.2 Concept Representation and Development
4.5.3 Character Classification via Concept Learning
4.5.4 Vision-Language Conversion via Concept Learning
4.6 Summary
5 Story-aware Vision-Language Translation using Deep Concept Hierarchies
5.1 Overview
5.2 Vision-Language Conversion as a Machine Translation
5.2.1 Statistical Machine Translation
5.2.2 Vision-Language Translation
5.3 Story-aware Vision-Language Translation using Deep Concept Hierarchies
5.3.1 Story-aware Vision-Language Translation
5.3.2 Vision-to-Language Translation
5.3.3 Language-to-Vision Translation
5.4 Story-aware Vision-Language Translation on Cartoon Videos
5.4.1 Data and Experimental Setting
5.4.2 Scene-to-Sentence Generation
5.4.3 Sentence-to-Scene Generation
5.4.4 Visual-Linguistic Story Summarization of Cartoon Videos
5.5 Summary
6 Concluding Remarks
6.1 Summary of the Dissertation
6.2 Directions for Further Research
Bibliography
Abstract in Korean
Linguistic and Structural Basis of Engineering Design Knowledge
Artefact descriptions are the primary carriers of engineering design
knowledge that is both an outcome and a driver of the design process. While an
artefact could be described in different connotations, the design process
requires a description to embody engineering design knowledge, which is
expressed in the text through intricate placement of entities and
relationships. As large-language models learn from all kinds of text merely as
a sequence of characters/tokens, these are yet to generate text that embodies
explicit engineering design facts. Existing ontological design theories are
less likely to guide the large-language models whose applications are currently
limited to ideation and learning purposes. In this article, we explicate
engineering design knowledge as knowledge graphs from a large sample of 33,881
patent documents. We examine the constituents of these knowledge graphs to
understand the linguistic and structural basis of engineering design knowledge.
In terms of linguistic basis, we observe that entities and relationships could
be generalised to 64 and 24 linguistic syntaxes. While relationships mainly
capture attributes ('of'), structure ('in', 'with'), purpose ('to', 'for'),
hierarchy ('include'), exemplification ('such as'), and behaviour ('to',
'from'), the hierarchical relationships could specifically be identified using
75 unique syntaxes. To understand the structural basis, we draw inspiration
from various studies on biological/ecological networks and discover motifs from
patent knowledge graphs. We identify four 3-node and four 4-node patterns that
could further be converged and simplified into sequence [->...->], aggregation
[->...]. Expected to guide large-language model
based design tools, we propose few regulatory precepts for concretising
abstract entities and relationships within subgraphs, while explicating
hierarchical structures
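The motif-discovery idea above can be sketched by classifying 3-node patterns in a directed knowledge graph as sequence (a -> b -> c) or aggregation (a -> c <- b). The tiny graph and its entity names below are hypothetical, not drawn from the patent data, and the sketch only handles triples containing exactly two acyclic edges.

```python
# Sketch of 3-node motif classification in a directed knowledge graph:
# sequence (a -> b -> c) vs aggregation (a -> c <- b). Hypothetical data.
from itertools import combinations

def classify(edges):
    nodes = sorted({n for e in edges for n in e})
    eset = set(edges)
    counts = {"sequence": 0, "aggregation": 0}
    for trio in combinations(nodes, 3):
        # Edges of the induced 3-node subgraph.
        sub = [(a, b) for a, b in eset if a in trio and b in trio]
        if len(sub) != 2:
            continue
        indeg = {n: sum(b == n for _, b in sub) for n in trio}
        outdeg = {n: sum(a == n for a, _ in sub) for n in trio}
        if 2 in indeg.values():
            counts["aggregation"] += 1    # two edges converge on one node
        elif any(indeg[n] == 1 and outdeg[n] == 1 for n in trio):
            counts["sequence"] += 1       # one node relays: a -> b -> c
    return counts

# Hypothetical subgraph: system -> controller -> valve, sensor -> valve.
edges = [("system", "controller"), ("controller", "valve"),
         ("sensor", "valve")]
counts = classify(edges)  # one sequence, one aggregation
```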