INTERACTION OF ENSEMBLE FEATURE TECHNIQUES WITH INCREMENTAL LEARNING USING STOCHASTIC GRADIENT DESCENT OPTIMIZATION ON NEWS CATEGORIZATION
Abstract
Text categorization is difficult because of the large number of features contained in text documents. A single feature technique can yield irrelevant, redundant, and noisy features of high dimensionality, which increases computational cost when each feature is treated independently; such features may become relevant only when combined with other feature techniques, creating interactions between features. In this paper, different feature techniques are combined to enhance news categorization. Feature selection methods filter the features but scale poorly to high dimensionality, leaving irrelevant features and limiting interpretability. Feature reduction techniques extract relevant features to obtain a low-dimensional representation, combining features so as to minimise error and maximise the retained variance. Feature reduction alone, however, does not address missing values and noise, which affect many features in nonlinear models on large data sets. Therefore, Stochastic Gradient Descent (SGD) optimization with L1 regularization is applied to mitigate the effect of missing values and noisy gradients. The results show that SGD is least affected by dataset sparsity and that the SGD algorithm provides strong predictions when handling sparse data. Finally, performance evaluations are reported.