Classifying pairs with trees for supervised biological network inference
Networks are ubiquitous in biology, and computational approaches for inferring
them have been widely investigated. In particular, supervised machine
learning methods can be used to complete a partially known network by
integrating various measurements. Two main supervised frameworks have been
proposed: the local approach, which trains a separate model for each network
node, and the global approach, which trains a single model over pairs of nodes.
Here, we systematically investigate, theoretically and empirically, the
use of tree-based ensemble methods in the context of these two
approaches for biological network inference. We first formalize the problem of
network inference as classification of pairs, unifying in the process
homogeneous and bipartite graphs and discussing two main sampling schemes. We
then present the global and the local approaches, extending the latter for the
prediction of interactions between two unseen network nodes, and discuss their
specializations to tree-based ensemble methods, highlighting their
interpretability and drawing links with clustering techniques. Extensive
computational experiments with these methods on various biological networks
show that they are competitive with existing methods. Comment: 22 pages
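The two ways of turning network inference into classification of pairs can be sketched as follows. This is a minimal illustration for a homogeneous graph, assuming node features are given as a dict of numeric lists and known interactions as a set of unordered pairs; the function names and input formats are hypothetical, a pair is represented by plain concatenation in sorted node order, and the tree-based learner itself is left out.

```python
from itertools import combinations


def build_pair_dataset(features, edges):
    """Global approach: one training example per unordered pair of nodes.

    features: dict mapping node id -> list of floats (hypothetical format)
    edges: set of frozensets {u, v}, the known interactions
    Returns the pairs, their feature vectors and their 0/1 labels.
    """
    pairs, X, y = [], [], []
    for u, v in combinations(sorted(features), 2):
        pairs.append((u, v))
        # Pair representation: concatenation of both node descriptions.
        X.append(features[u] + features[v])
        y.append(1 if frozenset((u, v)) in edges else 0)
    return pairs, X, y


def build_local_datasets(features, edges):
    """Local approach: a separate training set for each node u.

    For node u, every other node v is one example described by v's
    features, labelled 1 if u and v are known to interact.
    """
    datasets = {}
    for u in features:
        X, y = [], []
        for v in sorted(features):
            if v == u:
                continue
            X.append(features[v])
            y.append(1 if frozenset((u, v)) in edges else 0)
        datasets[u] = (X, y)
    return datasets


if __name__ == "__main__":
    feats = {"a": [0.1], "b": [0.2], "c": [0.3]}
    known = {frozenset(("a", "b"))}
    print(build_pair_dataset(feats, known))
    print(build_local_datasets(feats, known)["a"])
```

Any classifier, such as a tree ensemble, could then be fit on `X` and `y`: in the global case a single model scores every pair, while in the local case one model is fit per node.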
NeuroSVM: A Graphical User Interface for Identification of Liver Patients
Diagnosing liver infection at an early stage is important for effective
treatment. Devices such as sensors are now used to detect infections, and
accurate classification techniques are required to identify disease samples
automatically. In this context, this study applies data mining approaches to
distinguish liver patients from healthy individuals. Four algorithms (Naive
Bayes, Bagging, Random Forest and SVM) were implemented for classification on
the R platform. To further improve classification accuracy, a hybrid NeuroSVM
model was developed by combining SVM with a feed-forward artificial neural
network (ANN). The performance of the hybrid model was evaluated using
statistical measures such as root mean square error (RMSE) and mean absolute
percentage error (MAPE). The model achieved a prediction accuracy of 98.83%,
and the results suggest that the hybrid model improved prediction accuracy. To
help the medical community predict liver disease in patients, a graphical user
interface (GUI) was developed in R and deployed as a package in a local R
repository, where users can run predictions. Comment: 9 pages, 6 figures
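RMSE and MAPE are standard error measures; a minimal sketch of both follows, assuming true values and predictions are equal-length numeric sequences. MAPE divides by the true value, so it is undefined whenever an actual value is zero (for example, if classes were coded 0/1); the sketch rejects that case. How the study encodes its classes is not stated here.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent.

    Undefined when an actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


if __name__ == "__main__":
    print(rmse([0, 0], [3, 4]))  # sqrt((9 + 16) / 2), about 3.536
    print(mape([2, 4], [1, 5]))  # 100 * (0.5 + 0.25) / 2 = 37.5
```

Note that both measures quantify error magnitude, whereas the 98.83% figure above is an accuracy, i.e. the share of correctly classified samples.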
Deep tree-ensembles for multi-output prediction
Recently, deep neural networks have advanced the state of the art in various
scientific fields and provided solutions to long-standing problems across
multiple application domains. Nevertheless, they also have weaknesses: their
optimal performance depends on massive amounts of training data and on tuning
a large number of parameters. As a countermeasure, deep-forest methods have
recently been proposed as efficient, small-scale solutions. However, these
approaches simply use label classification probabilities as induced features
and focus primarily on traditional classification and regression tasks,
leaving multi-output prediction under-explored. Moreover, recent work has
demonstrated that tree-embeddings provide highly informative representations,
especially in structured output prediction. Building on this, we propose a
novel deep tree-ensemble (DTE) model, in which every layer enriches the
original feature set with a representation learning component based on
tree-embeddings. In this paper, we specifically focus on two structured output
prediction tasks, namely multi-label classification and multi-target
regression. Experiments on multiple benchmark datasets show that our method
outperforms state-of-the-art methods in both tasks.
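The layer-wise idea of enriching the original features with a tree-embedding can be sketched as follows. This is a toy under stated assumptions, not the DTE method: the abstract does not say how DTE's trees are built, so here each tree is a totally random, unsupervised oblivious tree whose embedding is a one-hot vector of the leaf an example falls into, and the per-layer predictor is omitted. All function names and parameters are hypothetical.

```python
import random


def fit_random_oblivious_trees(X, n_trees, depth, seed):
    """Draw random (feature, threshold) splits; one list of splits per tree.

    Toy assumption: trees are totally random and ignore the targets.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    trees = []
    for _ in range(n_trees):
        splits = []
        for _ in range(depth):
            f = rng.randrange(n_features)
            column = [row[f] for row in X]
            splits.append((f, rng.uniform(min(column), max(column))))
        trees.append(splits)
    return trees


def embed(trees, row):
    """Tree-embedding: concatenated one-hot leaf indicators, one block per tree."""
    out = []
    for splits in trees:
        leaf = 0
        for f, t in splits:
            # Each split contributes one bit of the leaf index.
            leaf = 2 * leaf + (1 if row[f] > t else 0)
        one_hot = [0] * (2 ** len(splits))
        one_hot[leaf] = 1
        out.extend(one_hot)
    return out


def deep_tree_features(X, n_layers=2, n_trees=5, depth=2):
    """Stack layers: each keeps the ORIGINAL features and appends an
    embedding computed from the previous layer's representation."""
    current = [list(row) for row in X]
    for layer in range(n_layers):
        trees = fit_random_oblivious_trees(current, n_trees, depth, seed=layer)
        current = [list(orig) + embed(trees, row) for orig, row in zip(X, current)]
    return current


if __name__ == "__main__":
    data = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    for row in deep_tree_features(data, n_layers=2, n_trees=3, depth=2):
        print(row)
```

Because every layer re-appends the original features, later layers can never lose the raw inputs; only the embedding block changes from layer to layer.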
An Overview of the Use of Neural Networks for Data Mining Tasks
In recent years, the field of data mining has seen considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methods whose classification, prediction and pattern recognition capabilities have been utilised successfully in many fields, including science, engineering, medicine, business, banking and telecommunication. From a data mining perspective, this paper highlights the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their use in bioinformatics and financial data analysis tasks.
EC3: Combining Clustering and Classification for Ensemble Learning
Classification and clustering algorithms have each proven successful in
different contexts, and each has its own advantages and limitations. For
instance, although classification algorithms are more powerful than
clustering methods at predicting the class labels of objects, they perform
poorly when reliable, manually labeled data are scarce. Conversely, although
clustering algorithms do not produce label information for objects, they
provide supplementary constraints (e.g., if two objects are clustered
together, they are more likely to share the same label) that can be leveraged
to predict the labels of a set of unknown objects. Therefore, systematically
using both types of algorithms together can improve prediction performance.
In this paper,
we propose a novel algorithm, called EC3, that merges classification and
clustering to support both binary and multi-class classification. EC3 is
based on a principled combination of multiple classification and multiple
clustering methods through an optimization function. We theoretically
establish the convexity and optimality of the problem and solve it using block
coordinate descent. We additionally propose iEC3, a variant of
EC3 that handles imbalanced training data. We perform an extensive experimental
analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known
standalone classifiers, 5 ensemble classifiers, and 2 existing methods that
merge classification and clustering) on 13 standard benchmark datasets. Our
methods outperform the other baselines on every dataset, achieving up to 10%
higher AUC. Moreover, our methods are faster than the best baseline (by a
factor of 1.21) and more resilient to noise and class imbalance. Comment: 14
pages, 7 figures, 11 tables
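The clustering constraint described above (objects clustered together are likely to share a label) can be illustrated with a simple heuristic. This is not EC3: the abstract does not give EC3's objective, and here a plain weighted average stands in for its convex optimization solved by block coordinate descent; the imbalance handling of iEC3 is not covered either. The function names, input formats, and the blending weight `alpha` are hypothetical.

```python
from collections import defaultdict


def cluster_means(probs, labels):
    """Mean probability vector of the objects in each cluster."""
    groups = defaultdict(list)
    for p, c in zip(probs, labels):
        groups[c].append(p)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}


def combine(classifier_probs, clusterings, alpha=0.5):
    """Blend classifier outputs with cluster-level consensus.

    classifier_probs: list over classifiers of per-object probability vectors
    clusterings: list over clustering methods of per-object cluster ids
    alpha: weight on the classifiers' own estimate (hypothetical parameter)
    """
    n = len(classifier_probs[0])
    # 1. Average the classifiers' probability vectors object by object.
    base = []
    for i in range(n):
        vectors = [model[i] for model in classifier_probs]
        base.append([sum(col) / len(vectors) for col in zip(*vectors)])
    # 2. For each clustering, map every object to its cluster's mean vector.
    smoothed = []
    for labels in clusterings:
        means = cluster_means(base, labels)
        smoothed.append([means[c] for c in labels])
    # 3. Blend the object's own estimate with the average cluster consensus.
    result = []
    for i in range(n):
        consensus = [sum(col) / len(smoothed)
                     for col in zip(*(s[i] for s in smoothed))]
        result.append([alpha * b + (1 - alpha) * c
                       for b, c in zip(base[i], consensus)])
    return result


if __name__ == "__main__":
    probs = [[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]]  # one classifier, 3 objects
    print(combine(probs, [[0, 0, 1]], alpha=0.5))
```

In this toy run the uncertain second object is pulled toward the first class because it shares a cluster with a confidently labeled object, which is the kind of supplementary constraint the abstract describes.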