Data Mining Methods for Large Data

P. Arumugam, P. Jose

Abstract


Classifying high-dimensional data is challenging because the data are large, complex to handle, heterogeneous and hierarchical. Feature selection plays a vital role for large datasets: it reduces the data set without affecting classifier accuracy, and it increases classification efficiency by keeping the important features and discarding those that are irrelevant or correlated. Feature selection is therefore applied as a preprocessing step before a classifier is trained on a data set. A good choice of features leads to high classification accuracy and minimizes computational cost. Although different kinds of feature selection methods have been investigated for selecting and fitting features, the algorithm that maximizes classification accuracy should be preferred. In this paper, the initial subset selection is based on the integration of particle swarm optimization (PSO) and a decision tree (DT). The novel approach aims to reduce training time and to optimize the accuracy of the support vector machine (SVM) classifier automatically. The proposed model is used to select a minimum number of features while providing high classification accuracy on large datasets.
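
The search over feature subsets described above can be illustrated with a minimal sketch of a binary PSO, in which each particle is a 0/1 mask over the features. This is an assumption-laden illustration, not the authors' implementation: the abstract does not state the PSO variant, its parameters, or the fitness function. The names `binary_pso`, `toy_fitness` and `INFORMATIVE` are hypothetical, and the toy fitness merely stands in for what would plausibly be the validation accuracy of a decision tree trained on the selected features.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=20, n_iters=50, seed=0,
               w=0.7, c1=1.5, c2=1.5):
    """Binary PSO sketch: maximize fitness(mask) over 0/1 feature masks.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    # Random initial masks and zero velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid well-behaved
                vel[i][d] = v
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Hypothetical stand-in for a classifier-based fitness: pretend features
# 0, 3 and 5 are the informative ones and penalize large subsets.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(1 for j in INFORMATIVE if mask[j])
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)

best_mask, best_fit = binary_pso(toy_fitness, n_features=10)
selected = [j for j, bit in enumerate(best_mask) if bit]
print("selected features:", selected, "fitness:", round(best_fit, 3))
```

In a pipeline of the kind the abstract outlines, `toy_fitness` would be replaced by a function that trains a decision tree on the masked features and returns its accuracy, and the final mask would then be used to train the SVM classifier. The small penalty on subset size reflects the stated goal of selecting a minimum number of features.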

Keywords: Feature selection, decision tree, classification, PSO, SVM

Cite this Article

Arumugam P, Jose P. Data Mining Methods for Large Data. Research & Reviews: Journal of Statistics. 2018; 7(1): 30s–35sp.


DOI: https://doi.org/10.37591/rrjost.v7i1.824
