Masked sample design variables were included for the first time on NAMCS and NHAMCS public use data files for survey year 2000. These design variables reflected the complex multi-stage sample design of the surveys and were intended for use with software such as SUDAAN that required such data for variance estimation. Following that release, NAMCS and NHAMCS public use files for 1993-1999 were re-released with masked design variables added.
Research was conducted to compare variance estimates for NAMCS and NHAMCS public use file data obtained with different techniques, including SUDAAN’s with-replacement option, SUDAAN’s without-replacement option, generalized variance functions, and SAS PROC SURVEYMEANS. The multi-stage design variables were used to develop two new variables, CSTRATM and CPSUM, which could be used with analysis software employing an ultimate cluster design for estimating variance. The variance estimates produced with these methods were compared with standard errors obtained for in-house files (which contain non-masked design variables) using SUDAAN’s without-replacement (WOR) option, which takes into account the multiple sampling stages of the surveys.
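As an illustration of what an ultimate cluster design implies for variance estimation, the following is a minimal sketch of the textbook with-replacement estimator for a weighted total. It assumes each record carries a stratum code (playing the role of CSTRATM), a first-stage cluster code (playing the role of CPSUM), a sampling weight, and an analysis value; the function name, the tuple layout, and the sample records are hypothetical. It is not the code used by SUDAAN or SAS, and production software handles details (single-cluster strata, subpopulations, ratio estimates) that are omitted here.

```python
from collections import defaultdict
from math import sqrt


def ultimate_cluster_total_se(records):
    """Estimate a weighted total and its standard error.

    records: iterable of (stratum, psu, weight, y) tuples, where stratum
    and psu stand in for CSTRATM and CPSUM (layout is an assumption).
    Uses the with-replacement ultimate cluster formula:
        var = sum over strata h of  n_h / (n_h - 1) * sum_i (z_hi - zbar_h)^2
    where z_hi is the weighted total of y in cluster i of stratum h.
    """
    # Accumulate the weighted total of y within each (stratum, cluster).
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in records:
        cluster_totals[stratum][psu] += weight * y

    total = 0.0
    variance = 0.0
    for stratum, totals in cluster_totals.items():
        n = len(totals)
        if n < 2:
            # Between-cluster variation cannot be estimated from one cluster.
            raise ValueError(f"stratum {stratum!r} has fewer than 2 clusters")
        values = list(totals.values())
        mean = sum(values) / n
        total += sum(values)
        variance += n / (n - 1) * sum((z - mean) ** 2 for z in values)

    return total, sqrt(variance)


# Hypothetical records: (stratum, cluster, weight, value)
sample = [
    (1, 1, 2.0, 1), (1, 1, 4.0, 2),
    (1, 2, 7.0, 2),
    (2, 1, 5.0, 1),
    (2, 2, 3.0, 3),
]
estimate, se = ultimate_cluster_total_se(sample)
```

Only variation between first-stage clusters within strata enters the calculation; later sampling stages are not modeled separately, which is why a single stratum variable and a single cluster variable suffice.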
The use of the masked design variables with the three software applications yielded more accurate standard error estimates than those derived using the generalized variance functions. Standard errors obtained using full-design SUDAAN and the two ultimate cluster designs with masked survey design variables tended, on average, to slightly overstate the in-house standard errors. This tendency resulted in conservative tests of significance for the data analyzed in the study.
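To see why overstated standard errors make significance tests conservative, consider a minimal sketch of a two-sided z-test for the difference between two estimates. The function name and all numbers are hypothetical, chosen only for illustration, and the sketch assumes the two estimates are independent; none of the values come from the study.

```python
from math import sqrt


def z_statistic(est1, se1, est2, se2):
    """z statistic for the difference of two independent estimates."""
    return (est1 - est2) / sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical estimates with "in-house" standard errors.
z_in_house = z_statistic(10.0, 3.0, 4.0, 4.0)

# Same estimates with each standard error overstated by 10 percent,
# mimicking a slight upward bias from masked design variables.
z_masked = z_statistic(10.0, 3.0 * 1.1, 4.0, 4.0 * 1.1)
```

Because the standard errors appear in the denominator, inflating them shrinks the absolute value of the test statistic, so a difference is less likely to be declared significant than it would be with the in-house standard errors.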
The results support the adoption of the new CSTRATM and CPSUM variables for variance estimation in general, as they were found to yield acceptable results and can be used with a wide variety of software.