Missing data imputation for the Uruguay population and housing census using spatial statistics techniques

Riaño, María Eugenia

SaberEs

Print version ISSN 1852-4418; On-line version ISSN 1852-4222

Abstract

RIANO, María Eugenia. Missing data imputation using spatial statistics techniques applied to Uruguay census of population and housing. SaberEs [online]. 2019, vol.11, n.2, pp.153-169. ISSN 1852-4418.

The quality and coverage of the Uruguay National Census were evaluated positively overall, meeting international standards. However, the data collection process faced some difficulties, and the omissions are concentrated in socioeconomically vulnerable segments. This could affect the algorithm the government uses to select the beneficiary population of cash-transfer programs. Because both the target population and the omission itself follow a heterogeneous spatial pattern, regions must be defined for imputing the missing data. The regions are obtained by means of spatial oblique decision trees, and spatial autoregressive (SAR) models are fitted for each region. The models are assessed using cross-validation, and the results are compared with the performance of a global model for the whole map. Except for one region, the models that minimize the cross-validation error show a similar lag across regions. The cross-validation error of the global model is quite similar; nevertheless, the Moran test detects spatial autocorrelation in the residuals. Hence, the missing data are imputed by region with local SAR models, selecting the lag according to the cross-validation error. The results show that the census data underestimate the target population by approximately 5%.
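The abstract relies on the Moran test to decide whether residuals still carry spatial autocorrelation. The sketch below shows one way to compute Moran's I, $I = \frac{n}{S_0}\frac{\sum_i\sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$ with $z_i = x_i - \bar{x}$ and $S_0 = \sum_i\sum_j w_{ij}$, together with a permutation-based pseudo p-value, in plain Python. It assumes a toy 3x3 grid of areal units, rook-contiguity neighbours and a row-standardised weight matrix. None of these choices come from the paper. The function names (`rook_weights`, `morans_i`, `expected_morans_i`, `moran_permutation_test`) are hypothetical, and the residual values are invented. The paper does not say which variant of the Moran test it uses, so the permutation test is only one common option.

```python
"""Sketch: Moran's I on model residuals over a toy grid of areal units.

Assumptions (illustrative only, not taken from the paper): units are the
cells of a rectangular grid, neighbours follow rook contiguity, weights are
row-standardised, and the residual values are invented for the example.
"""
import random


def rook_weights(n_rows, n_cols):
    """Row-standardised rook-contiguity weights as an n x n nested list."""
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Neighbours share an edge (up, down, left, right) inside the grid.
            nbrs = [
                (r + dr) * n_cols + (c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
            for j in nbrs:
                w[i][j] = 1.0 / len(nbrs)  # each row sums to 1
    return w


def morans_i(values, w):
    """I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i**2, with z = x - mean(x)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def expected_morans_i(n):
    """Expected I under the null hypothesis of no spatial autocorrelation."""
    return -1.0 / (n - 1)


def moran_permutation_test(values, w, n_perm=999, seed=0):
    """Observed I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relabelling of values over locations
        if morans_i(shuffled, w) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    w = rook_weights(3, 3)
    clustered = [1, 0, -1, 1, 0, -1, 1, 0, -1]    # similar values side by side
    alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1]  # checkerboard pattern
    print(f"{morans_i(clustered, w):.3f}")    # 0.556
    print(f"{morans_i(alternating, w):.3f}")  # -1.000
    print(expected_morans_i(9))               # -0.125
    print(moran_permutation_test(clustered, w, n_perm=199, seed=1))
```

Values of I well above the null expectation of $-1/(n-1)$ indicate that the residuals of neighbouring units move together. This is the kind of leftover spatial structure the abstract reports detecting with the Moran test. With only nine units the permutation distribution is coarse, so a real analysis would use the actual census segments and their neighbourhood structure.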

Keywords: Classification and regression trees; Cross-validation; SAR models.

Abstract also available in Spanish; full text in Spanish.