Statistics in the Big Data era
It is estimated that about 90% of the data currently available were produced over the last two years, and that only 0.5% of them are effectively analysed and used. Yet these data can be a great source of wealth, the oil of the 21st century, when analysed with the right approach. In this article, we illustrate some specific features of these data and the great interest they can hold for many fields. We then consider some of the challenges they pose for statistical analysis, suggesting some strategies.
An incremental approach to MSE-based feature selection
Feature selection plays an important role in classification systems. Using the classifier error rate as the evaluation function, feature selection is integrated with incremental training. A neural network classifier is implemented with an incremental training approach to detect and discard irrelevant features. By learning attributes one after another, the classifier can directly identify the attributes that make no contribution to classification. These attributes are marked and considered for removal. Combined with a Minimum Squared Error (MSE) based feature ranking scheme, four batch removal methods based on the classifier error rate have been developed to discard irrelevant features. These feature selection methods significantly reduce the computational complexity of searching among a large number of possible solutions. Experimental results show that the methods perform well on several benchmark problems compared with other feature selection methods. The selected subsets are further validated by a Constructive Backpropagation (CBP) classifier, which confirms increased classification accuracy and reduced training cost.
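The general idea of ranking features by MSE and then discarding them in batches while watching the classifier error rate can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the four removal methods, so the code below assumes one simple variant, and it substitutes a least-squares linear classifier for the incrementally trained neural network. The function names (`fit_linear`, `error_rate`, `mse_rank`, `batch_remove`), the `batch` and `tol` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Fit a least-squares linear model with a bias term (stand-in classifier)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def error_rate(w, X, y):
    """Fraction of samples misclassified when thresholding the output at 0.5."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    pred = (A @ w > 0.5).astype(float)
    return float(np.mean(pred != y))


def mse_rank(X, y):
    """Rank features by the MSE of a single-feature fit, best (lowest MSE) first."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, [j]]
        w = fit_linear(col, y)
        A = np.hstack([col, np.ones((len(y), 1))])
        scores.append(np.mean((A @ w - y) ** 2))
    return np.argsort(scores)


def batch_remove(X_tr, y_tr, X_va, y_va, batch=1, tol=0.02):
    """Drop the worst-ranked features in batches until validation error rises."""
    keep = list(mse_rank(X_tr, y_tr))  # best-ranked feature first
    base = error_rate(fit_linear(X_tr[:, keep], y_tr), X_va[:, keep], y_va)
    while len(keep) > batch:
        trial = keep[:-batch]  # tentatively discard the lowest-ranked batch
        err = error_rate(fit_linear(X_tr[:, trial], y_tr), X_va[:, trial], y_va)
        if err > base + tol:  # removal hurt the classifier: stop here
            break
        keep = trial
        base = min(base, err)
    return sorted(int(j) for j in keep)


# Synthetic data (assumption): features 0 and 1 determine the label,
# features 2 to 4 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_tr, y_tr, X_va, y_va = X[:400], y[:400], X[400:], y[400:]

ranking = [int(j) for j in mse_rank(X_tr, y_tr)]
kept = batch_remove(X_tr, y_tr, X_va, y_va)
print("ranking (best first):", ranking)
print("kept features:", kept)
```

Removing features from the bottom of the ranking means each candidate subset needs only one retraining, which is the source of the reduced search cost compared with evaluating arbitrary subsets.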