31,975 research outputs found
A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.5 page(s
A systematic review of data quality issues in knowledge discovery tasks
Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafĂo mas fundamental es la exploraciĂłn de los grandes volĂșmenes de datos y la extracciĂłn de conocimiento Ăștil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisiĂłn sistemĂĄtica de los asuntos de calidad de datos en las ĂĄreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrĂcola conocida como la roya del cafĂ©.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust
FSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for Classification
This paper introduces a novel real-time Fuzzy Supervised Learning with Binary
Meta-Feature (FSL-BM) for big data classification task. The study of real-time
algorithms addresses several major concerns, which are namely: accuracy, memory
consumption, and ability to stretch assumptions and time complexity. Attaining
a fast computational model providing fuzzy logic and supervised learning is one
of the main challenges in the machine learning. In this research paper, we
present FSL-BM algorithm as an efficient solution of supervised learning with
fuzzy logic processing using binary meta-feature representation using Hamming
Distance and Hash function to relax assumptions. While many studies focused on
reducing time complexity and increasing accuracy during the last decade, the
novel contribution of this proposed solution comes through integration of
Hamming Distance, Hash function, binary meta-features, binary classification to
provide real time supervised method. Hash Tables (HT) component gives a fast
access to existing indices; and therefore, the generation of new indices in a
constant time complexity, which supersedes existing fuzzy supervised algorithms
with better or comparable results. To summarize, the main contribution of this
technique for real-time Fuzzy Supervised Learning is to represent hypothesis
through binary input as meta-feature space and creating the Fuzzy Supervised
Hash table to train and validate model.Comment: FICC201
Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors
Knowledge bases provide applications with the benefit of easily accessible,
systematic relational knowledge but often suffer in practice from their
incompleteness and lack of knowledge of new entities and relations. Much work
has focused on building or extending them by finding patterns in large
unannotated text corpora. In contrast, here we mainly aim to complete a
knowledge base by predicting additional true relationships between entities,
based on generalizations that can be discerned in the given knowledgebase. We
introduce a neural tensor network (NTN) model which predicts new relationship
entries that can be added to the database. This model can be improved by
initializing entity representations with word vectors learned in an
unsupervised fashion from text, and when doing this, existing relations can
even be queried for entities that were not present in the database. Our model
generalizes and outperforms existing models for this problem, and can classify
unseen relationships in WordNet with an accuracy of 75.8%
- âŠ