782 research outputs found
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field
How automated image analysis techniques help scientists in species identification and classification?
Identification of taxonomy at a specific level is time consuming and reliant upon expert ecologists. Hence the demand for automated species identification increÂased over the last two decades. Automation of data classification is primarily focussed on images while incorporating and analysing image data has recently become easier due to developments in computational technology. Research efÂforts on identification of species include specimens’ image processing, extraction of identical features, followed by classifying them into correct categories. In this paper, we discuss recent automated species identification systems, mainly for categorising and evaluating their methods. We reviewed and compared different methods in step by step scheme of automated identification and classification systems of species images. The selection of methods is influenced by many variables such as level of classification, number of training data and complexity of images. The aim of writing this paper is to provide researchers and scientists an extensive background study on work related to automated species identification, focusing on pattern recognition techniques in building such systems for biodiversity studies. (Folia Morphol 2018; 77, 2: 179–193
A New Feature Selection Method Based on Class Association Rule
Feature selection is a key process for supervised learning algorithms. It involves discarding irrelevant attributes from the training dataset from which the models are derived. One of the vital feature selection approaches is Filtering, which often uses mathematical models to compute the relevance for each feature in the training dataset and then sorts the features into descending order based on their computed scores. However, most Filtering methods face several challenges including, but not limited to, merely considering feature-class correlation when defining a feature’s relevance; additionally, not recommending which subset of features to retain. Leaving this decision to the end-user may be impractical for multiple reasons such as the experience required in the application domain, care, accuracy, and time. In this research, we propose a new hybrid Filtering method called Class Association Rule Filter (CARF) that deals with the aforementioned issues by identifying relevant features through the Class Association Rule Mining approach and then using these rules to define weights for the available features in the training dataset. More crucially, we propose a new procedure based on mutual information within the CARF method which suggests the subset of features to be retained by the end-user, hence reducing time and effort. Empirical evaluation using small, medium, and large datasets that belong to various dissimilar domains reveals that CARF was able to reduce the dimensionality of the search space when contrasted with other common Filtering methods. More importantly, the classification models devised by the different machine learning algorithms against the subsets of features selected by CARF were highly competitive in terms of various performance measures. These results indeed reflect the quality of the subsets of features selected by CARF and show the impact of the new cut-off procedure proposed
- …