LIU Zhan, ZHOU Qing, LI Ruohan, PAN Yingli
The development of big data and the internet has made non-probability samples increasingly easy to obtain. However, because the selection probabilities of non-probability samples are unknown, it is difficult to make inferences about the population from them. Probability samples, by contrast, have known inclusion probabilities and are representative of the population, but their target variables may be missing as survey costs and nonresponse rates rise year by year. How to combine the two kinds of samples to estimate the population, when the probability sample has missing target variables and the non-probability sample has complete data, is therefore worth studying. To solve this problem, a nonparametric superpopulation local polynomial model is first fitted to the non-probability sample and used to predict the missing target variables in the probability sample. A propensity score model is then established to estimate the selection probabilities of the non-probability sample, and the prediction error of the nonparametric superpopulation local polynomial model is further estimated to obtain the final population estimator. Simulation and empirical results show that, compared with the imputation estimator and the inverse propensity score weighted estimator, the proposed estimator has the smallest absolute relative bias, standard deviation and mean squared error, regardless of whether the nonparametric superpopulation local polynomial model or the propensity score model is correctly specified. In addition, the corresponding bootstrap variance estimates are small, which suggests that the proposed estimator performs well.
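As a rough illustration of the procedure described above, the following Python sketch assumes that the target quantity is a population mean, that the local polynomial model is a local linear fit with a Gaussian kernel on a single covariate, and that the propensity scores of the non-probability sample have already been estimated. The function names, the kernel, the bandwidth and the particular way the two parts are combined are illustrative assumptions, not the exact specification of the proposed estimator.

```python
import numpy as np


def local_linear_predict(x_train, y_train, x_eval, bandwidth):
    """Local linear (degree-1 local polynomial) prediction, Gaussian kernel.

    Assumption: one covariate and a fixed, user-chosen bandwidth.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for x0 in np.atleast_1d(np.asarray(x_eval, dtype=float)):
        u = x_train - x0                          # covariate centred at x0
        w = np.exp(-0.5 * (u / bandwidth) ** 2)   # kernel weights
        X = np.column_stack([np.ones_like(u), u])
        XtW = X.T * w                             # weight each observation
        beta = np.linalg.solve(XtW @ X, XtW @ y_train)
        preds.append(beta[0])                     # intercept = fit at x0
    return np.array(preds)


def combined_mean(y_np, m_np, pi_np, m_p, d_p):
    """Prediction part plus a propensity-weighted prediction-error part.

    y_np, m_np, pi_np: outcomes, model predictions and (assumed already
        estimated) propensity scores for the non-probability sample.
    m_p, d_p: model predictions and design weights (inverse inclusion
        probabilities) for the probability sample.
    """
    y_np, m_np, pi_np = (np.asarray(a, dtype=float) for a in (y_np, m_np, pi_np))
    m_p, d_p = np.asarray(m_p, dtype=float), np.asarray(d_p, dtype=float)
    # Design-weighted mean of the predicted target variable.
    prediction = np.sum(d_p * m_p) / np.sum(d_p)
    # Inverse-propensity-weighted mean of the model residuals.
    error = np.sum((y_np - m_np) / pi_np) / np.sum(1.0 / pi_np)
    return prediction + error
```

In this sketch the second term corrects the model-based prediction when the outcome model is misspecified, while the first term carries the estimate when the propensity scores are poor, which is the intuition behind the robustness to misspecification reported above.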