A Reliable Method to Reduce Observations and Variables When Building Neural Network Models
(Gerardo Colmenares; Rafael Pérez)
Abstract

This paper describes a method for reducing the number of observations and variables in large data sets, so that reliable neural network models can be built from the reduced data in less time. The method can also be used to select, from an original data set, representative data to train, test, and validate models. It applies stratification and principal component analysis to select representative observations and to eliminate redundant variables. Neural network models built from the reduced data sets perform very similarly to models built from the entire data set, and significantly better and more consistently than models built from data sets reduced at random. The paper also compares reduction by stratification alone with reduction by stratification plus principal component analysis.
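As a rough illustration of the idea, the following Python sketch reduces a data set in two steps: stratified selection of observations, then principal component analysis on the selected rows. The abstract does not say how strata are formed or how the principal components are used, so everything specific here is an assumption made for illustration: the function name `reduce_dataset`, strata defined as quantile bins of the target `y`, the sampling fraction `frac`, the `var_kept` threshold, and replacing the original variables with component scores rather than discarding individual variables. It should not be read as the authors' procedure.

```python
import numpy as np

def reduce_dataset(X, y, n_strata=5, frac=0.2, var_kept=0.95, seed=0):
    """Hypothetical sketch: stratified row selection followed by PCA.

    Assumptions (not taken from the paper): strata are quantile bins of y,
    each stratum is sampled at the same fraction, and variables are replaced
    by the leading principal component scores.
    """
    rng = np.random.default_rng(seed)

    # Step 1: assign each observation to a stratum by quantile bin of y.
    inner_edges = np.quantile(y, np.linspace(0, 1, n_strata + 1)[1:-1])
    strata = np.digitize(y, inner_edges)

    # Sample the same fraction from every stratum, without replacement.
    keep = []
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        k = max(1, int(round(frac * idx.size)))
        keep.append(rng.choice(idx, size=k, replace=False))
    keep = np.sort(np.concatenate(keep))

    # Step 2: PCA (via SVD) on the standardized selected observations.
    Xs = X[keep]
    mu = Xs.mean(axis=0)
    sd = Xs.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    Z = (Xs - mu) / sd
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)

    # Keep the smallest number of components reaching var_kept of variance.
    explained = S**2 / np.sum(S**2)
    n_comp = int(np.searchsorted(np.cumsum(explained), var_kept) + 1)
    n_comp = min(n_comp, len(S))

    scores = Z @ Vt[:n_comp].T
    return keep, scores, n_comp
```

Calling `reduce_dataset(X, y)` returns the indices of the selected observations, their component scores, and the number of components retained; the scores and `y[keep]` could then serve as the reduced training set. Sampling a fixed fraction per stratum is only one possible design choice, made here so that the reduced set covers the full range of the target.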