CompareML: A Novel Approach to Supporting Preliminary Data Analysis Decision Making
There are a large number of machine learning algorithms, as well as a wide range of libraries and services for building predictive models. With machine learning and artificial intelligence playing a major role in engineering problems, practising engineers entering the field are often so overwhelmed by the multitude of possibilities that they face significant difficulties before any actual work can begin. Datasets have intrinsic properties that make it hard to select the algorithm best suited to a specific objective, and the ever-increasing number of providers makes this selection even harder. These were the reasons underlying the design of CompareML, an approach to supporting the evaluation and comparison of machine learning libraries and services without requiring deep machine learning knowledge. CompareML makes it easy to compare the performance of different models built with well-known classification and regression algorithms already made available by some of the most widely used providers. It facilitates the practical application of artificial intelligence methods and techniques, letting a practising engineer decide whether they can resolve hitherto intractable problems. Thus, researchers and engineering practitioners can uncover the potential of their datasets for inferring new knowledge by selecting the most appropriate machine learning algorithm and determining the provider best suited to their data.
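The kind of side-by-side comparison the abstract describes can be illustrated with a minimal sketch. This is not CompareML's actual interface (which spans multiple providers); it simply cross-validates a few well-known scikit-learn classifiers on one dataset and ranks them, the core decision the tool supports. Dataset and model choices here are illustrative.

```python
# Hypothetical sketch of comparing well-known classifiers on one dataset,
# in the spirit of what CompareML automates across providers.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy per model.
results = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

A real comparison service would repeat this over algorithms from several providers and over regression as well as classification tasks.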
Active Learning with Statistical Models
For many types of machine learning algorithms, one can compute the
statistically `optimal' way to select training data. In this paper, we review
how optimal data selection techniques have been used with feedforward neural
networks. We then show how the same principles may be used to select data for
two alternative, statistically-based learning architectures: mixtures of
Gaussians and locally weighted regression. While the techniques for neural
networks are computationally expensive and approximate, the techniques for
mixtures of Gaussians and locally weighted regression are both efficient and
accurate. Empirically, we observe that the optimality criterion sharply
decreases the number of training examples the learner needs in order to achieve
good performance.
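The paper's idea, choosing the next training point by a statistical criterion rather than at random, can be sketched for one of its settings. The code below is a simplified illustration, not the paper's exact criterion: it fits a locally weighted (kernel-weighted) regressor and greedily queries the candidate point where a crude predictive-variance estimate is largest. The target function, bandwidth, and variance heuristic are all illustrative assumptions.

```python
import numpy as np

f = lambda x: np.sin(3 * x)        # hidden target function (illustrative)
pool = np.linspace(0, 2, 201)      # candidate query points

# Start with two labelled points; greedily query where the predictive
# variance of a locally weighted regressor is largest -- a simple
# stand-in for the paper's variance-minimising selection criterion.
X = [0.0, 2.0]
y = [f(0.0), f(2.0)]

def lwr_predict(xq, X, y, tau=0.3):
    X, y = np.asarray(X), np.asarray(y)
    w = np.exp(-(X - xq) ** 2 / (2 * tau ** 2))   # Gaussian kernel weights
    mean = np.sum(w * y) / np.sum(w)
    # Weighted residual variance plus a term that grows where data is sparse.
    var = np.sum(w * (y - mean) ** 2) / np.sum(w) + 1.0 / np.sum(w)
    return mean, var

for _ in range(8):
    variances = [lwr_predict(xq, X, y)[1] for xq in pool]
    xq = pool[int(np.argmax(variances))]
    X.append(float(xq))
    y.append(f(xq))                # "label" the chosen point
```

Each queried point lands where the model is least certain, which is why such criteria need far fewer labelled examples than random sampling.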
Sentiment Analysis using an ensemble of Feature Selection Algorithms
Sentiment analysis, an active research area in text mining, is commonly used to determine the opinion of a person about a service or product. It is the process of using computation to identify and categorize opinions expressed in a piece of text. Individuals post their opinions via reviews, tweets, comments, or discussions, which constitute unstructured information. Sentiment analysis yields an overall conclusion from reviews that helps clients, individuals, and organizations make decisions. The primary aim of this paper is to apply an ensemble approach to feature reduction methods from natural language processing and to analyse the results. An ensemble approach combines two or more methodologies. The feature reduction methods used are Principal Component Analysis (PCA) for feature extraction and the Pearson chi-squared statistical test for feature selection. The main contribution of this paper is to test whether the combined use of careful feature selection and existing classification methodologies can yield better accuracy.
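Combining PCA-based feature extraction with chi-squared feature selection, as the abstract describes, can be sketched with scikit-learn. This is a hedged illustration rather than the paper's exact pipeline: the toy review corpus, component counts, and classifier are all assumptions. Note that the chi-squared test requires non-negative features, which raw term counts satisfy.

```python
# Illustrative sketch (not the paper's exact pipeline): PCA feature
# extraction and chi-squared feature selection combined via FeatureUnion.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import FeatureUnion
from sklearn.naive_bayes import GaussianNB

docs = ["great product loved it", "terrible waste of money",
        "excellent quality very happy", "awful broke immediately",
        "happy with this great buy", "bad quality terrible fit"]
labels = np.array([1, 0, 1, 0, 1, 0])   # 1 = positive, 0 = negative

# Term counts are non-negative, as the chi-squared test requires.
X = CountVectorizer().fit_transform(docs).toarray()

combined = FeatureUnion([
    ("pca", PCA(n_components=2)),        # feature extraction
    ("chi2", SelectKBest(chi2, k=3)),    # feature selection
]).fit_transform(X, labels)

clf = GaussianNB().fit(combined, labels)
acc = clf.score(combined, labels)
print(f"training accuracy on toy corpus: {acc:.2f}")
```

On a real corpus one would evaluate with held-out data; the point here is only the shape of the ensemble of reduction methods.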
LLAMA: Leveraging Learning to Automatically Manage Algorithms
Algorithm portfolio and selection approaches have achieved remarkable
improvements over single solvers. However, the implementation of such systems
is often highly customised and specific to the problem domain. This makes it
difficult for researchers to explore different techniques for their specific
problems. We present LLAMA, a modular and extensible toolkit implemented as an
R package that facilitates the exploration of a range of different portfolio
techniques on any problem domain. It implements the algorithm selection
approaches most commonly used in the literature and leverages the extensive
library of machine learning algorithms and techniques in R. We describe the
current capabilities and limitations of the toolkit and illustrate its usage on
a set of example SAT problems.
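LLAMA itself is an R toolkit, but the core idea it packages, learning to predict which algorithm will perform best from instance features, can be sketched briefly. The sketch below uses Python with synthetic data (all feature names, runtimes, and the two-solver setup are invented for illustration): a classifier is trained to pick the faster of two hypothetical solvers from instance features.

```python
# Illustrative sketch of the algorithm-selection idea behind toolkits
# like LLAMA: learn a mapping from instance features to the best solver.
# All data below is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 200
features = rng.random((n, 2))   # e.g. structural features of SAT instances

# Synthetic runtimes: solver A is fast on low-feature-0 instances,
# solver B on high-feature-0 instances, plus a little noise.
runtime_a = features[:, 0] + 0.1 * rng.random(n)
runtime_b = 1.0 - features[:, 0] + 0.1 * rng.random(n)
best = (runtime_b < runtime_a).astype(int)   # 0 = solver A, 1 = solver B

# Train a selector on 150 instances, evaluate on the remaining 50.
selector = RandomForestClassifier(random_state=0).fit(features[:150], best[:150])
acc = selector.score(features[150:], best[150:])
print(f"holdout accuracy of the learned selector: {acc:.2f}")
```

A full portfolio system such as LLAMA additionally supports regression on runtimes, pairwise cost-sensitive models, and pre-solving schedules, all over the same feature-to-performance data.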