Sample design effects on soil unit prediction with machine: randomness, uncertainty, and majority map.
Authors: CARVALHO JUNIOR, W. de; PEREIRA, N. R.; FERNANDES FILHO, E. I.; CALDERANO FILHO, B.; PINHEIRO, H. S. K.; CHAGAS, C. da S.; BHERING, S. B.; PEREIRA, V. R.; LAWALL, S.
Abstract: Notwithstanding the importance of soil surveys, advances in digital soil mapping have mainly focused on mapping soil attributes or properties rather than developing digital maps of soil units or soil classes. The purpose of this research was to develop digital soil unit maps based on primary soil data collection in areas without previously collected soil information. The study focused on covariate variability, the random effect of splitting the data into subsets, and the map outputs. We used five datasets with four models (Random Forest - RF, Gradient Boosted Machine - GBM, C5.0, and multinomial log-linear model - MLR). The covariates were grouped into five datasets, of which four were grouped by Region Of Interest per Class (ROIC) and one was not. To evaluate the random effect of splitting the dataset, we ran each model 50 times and examined the overall accuracy (OA) and kappa index, as well as the uncertainty, majority and variety maps. The OA of Dataset01 to Dataset04 was lower than that of Dataset05. However, the map outputs of RF and GBM for Dataset01 and Dataset05 had the same majority prediction. RF and GBM appear to produce consistent map outputs according to this methodology and pedologist expertise. To evaluate the uncertainty and the consistency of soil unit prediction, we used the majority map process. Random Forest, similar to GBM, presented the best results. Increasing the number of covariates did not guarantee an improvement in the OA or in the quality of the map output. Geographic position and distance rasters did not improve the map output according to expert evaluation. Because of the variance between the ROICs, when the training and validation datasets were split by ROIC, the subsets differed considerably with respect to the covariates, which explains the worse results of this approach compared with Dataset05. On the other hand, when one complete dataset not based on ROICs was considered, the variance between the training and validation subsets was lower and produced better quality metrics.
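The repeated-run evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy label arrays, and in particular the definition of uncertainty as one minus the fraction of runs agreeing with the majority class are assumptions made for illustration, since the abstract does not state how the uncertainty map was computed.

```python
import numpy as np


def summarize_runs(pred_stack, n_classes):
    """Summarize repeated class predictions per pixel.

    pred_stack: integer array of shape (n_runs, rows, cols) holding class
    labels in 0..n_classes-1, one layer per model run (hypothetical layout).
    """
    n_runs = pred_stack.shape[0]
    # For each class, count how many runs predicted it at each pixel.
    counts = np.stack([(pred_stack == c).sum(axis=0) for c in range(n_classes)])
    majority = counts.argmax(axis=0)       # most frequently predicted class
    variety = (counts > 0).sum(axis=0)     # number of distinct classes predicted
    # Assumed uncertainty measure: share of runs that disagree with the majority.
    uncertainty = 1.0 - counts.max(axis=0) / n_runs
    return majority, variety, uncertainty


def overall_accuracy_and_kappa(y_true, y_pred, n_classes):
    """Overall accuracy and Cohen's kappa from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (y_true, y_pred), 1)     # rows: observed, columns: predicted
    n = cm.sum()
    po = np.trace(cm) / n                  # observed agreement (overall accuracy)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    kappa = (po - pe) / (1.0 - pe) if pe < 1.0 else float("nan")
    return po, kappa


# Toy example: 4 runs over a 1 x 2 pixel grid with 3 possible soil units.
toy_stack = np.array([[[0, 1]], [[0, 2]], [[0, 1]], [[1, 1]]])
majority_map, variety_map, uncertainty_map = summarize_runs(toy_stack, n_classes=3)

# Toy validation labels for a single run.
oa, kappa = overall_accuracy_and_kappa(
    np.array([0, 0, 1, 1]), np.array([0, 0, 1, 0]), n_classes=2
)
```

In a workflow like the one the abstract describes, `pred_stack` would hold the 50 prediction rasters of one model and dataset, and the accuracy function would be applied to each run's validation subset.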
Publication year: 2020
Publication type: Journal article
Unit: Embrapa Solos
Keywords: Hillslope areas, Map, Digital soil mapping, Random forest, Soil, Tree learners models