Knowledge data discovery and data mining in a design environment
Designers, in the process of satisfying design requirements, generally encounter difficulties in, firstly, understanding the problem and, secondly, finding a solution [Cross 1998]. Often an understanding of the problem and a feasible solution are developed simultaneously, by proposing a solution and gauging the extent to which it satisfies the specific requirements. Past design cases have long been recognised as a source of support for future design activities; however, the varying degrees of similarity and dissimilarity between previous and current design requirements and solutions have restrained the effectiveness of utilising past design solutions. The knowledge embedded within past designs provides a source of experience with the potential to be utilised in future developments, provided that the ability to structure and manipulate that knowledge can be made a reality. Providing the ability to manipulate past design knowledge allows the ranging viewpoints experienced by a designer during a design process to be reflected and supported. Data Mining systems are gaining acceptance in several domains but to date remain largely unrecognised in terms of their potential to support design activities. The focus of this paper is to introduce the functionality offered by Data Mining tools, and to evaluate the level of support that may be achieved in manipulating and utilising experiential knowledge to satisfy designers' ranging perspectives throughout a product's development.
ProtNN: Fast and Accurate Nearest Neighbor Protein Function Prediction based on Graph Embedding in Structural and Topological Space
Studying the function of proteins is important for understanding the
molecular mechanisms of life. The number of publicly available protein
structures has grown extremely large. Still, the determination of
the function of a protein structure remains a difficult, costly, and
time-consuming task. The difficulties are often due to the essential role of spatial
and topological structures in the determination of protein functions in living
cells. In this paper, we propose ProtNN, a novel approach for protein function
prediction. Given an unannotated protein structure and a set of annotated
proteins, ProtNN finds the nearest neighbor annotated structures based on
protein-graph pairwise similarities. Given a query protein, ProtNN finds the
nearest neighbor reference proteins based on a graph representation model and a
pairwise similarity between the vector embeddings of the query and reference
protein-graphs in structural and topological spaces. ProtNN assigns to the
query protein the function with the highest number of votes across the set of k
nearest neighbor reference proteins, where k is a user-defined parameter.
Experimental evaluation demonstrates that ProtNN is able to accurately classify
several datasets in an extremely fast runtime compared to state-of-the-art
approaches. We further show that ProtNN is able to scale up to a whole PDB
dataset in a single-process mode with no parallelization, with a runtime gain
on the order of thousands of times compared to state-of-the-art
approaches.
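The voting step described in this abstract can be illustrated with a minimal sketch. This is not ProtNN's implementation: the abstract does not specify the embedding attributes or the similarity measure, so the toy vectors, the Euclidean distance, and the names `euclidean` and `knn_predict` below are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, references, k=3):
    """Return the majority label among the k references nearest to `query`.

    `references` is a list of (embedding, label) pairs. In the setting of the
    abstract, each embedding would be a vector describing a protein graph and
    each label a function annotation; here both are hypothetical toy values.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort the references by distance to the query and keep the k closest.
    nearest = sorted(references, key=lambda ref: euclidean(query, ref[0]))[:k]
    # Count one vote per neighbor; on a tie, the label seen first
    # (that is, belonging to the closer neighbor) wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional embeddings with made-up labels.
references = [
    ([0.0, 0.0], "A"),
    ([0.0, 1.0], "A"),
    ([5.0, 5.0], "B"),
    ([6.0, 5.0], "B"),
    ([5.0, 6.0], "B"),
]
prediction = knn_predict([0.2, 0.1], references, k=3)
```

Here `k` plays the role of the user-defined parameter mentioned in the abstract; a brute-force sort is used for clarity, not speed.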
Deep learning for time series classification: a review
Time Series Classification (TSC) is an important and challenging problem in
data mining. With the increase of time series data availability, hundreds of
TSC algorithms have been proposed. Among these methods, only a few have
considered Deep Neural Networks (DNNs) to perform this task. This is surprising
as deep learning has seen very successful applications in recent years. DNNs
have indeed revolutionized the field of computer vision especially with the
advent of novel deeper architectures such as Residual and Convolutional Neural
Networks. Apart from images, sequential data such as text and audio can also be
processed with DNNs to reach state-of-the-art performance for document
classification and speech recognition. In this article, we study the current
state-of-the-art performance of deep learning algorithms for TSC by presenting
an empirical study of the most recent DNN architectures for TSC. We give an
overview of the most successful deep learning applications in various time
series domains under a unified taxonomy of DNNs for TSC. We also provide an
open source deep learning framework to the TSC community where we implemented
each of the compared approaches and evaluated them on a univariate TSC
benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By
training 8,730 deep learning models on 97 time series datasets, we propose the
most exhaustive study of DNNs for TSC to date.
Comment: Accepted at Data Mining and Knowledge Discovery
Mining web data for competency management
We present CORDER (COmmunity Relation Discovery by named Entity Recognition), an unsupervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments.