Variance estimation when working with complex survey data

Without an estimate of its precision, a point estimate is pointless

When working on my PhD thesis more than 10 years ago, I realised that not only National Statistical Institutes and Eurostat, but even the most famous poverty researchers (including the late Tony Atkinson) did not publish standard errors and confidence intervals alongside their poverty estimates. When I tried to estimate the sampling variance of these poverty figures myself, I realised that, in spite of having taken quite a few statistics courses, I had never learned how important the sample design is for estimating standard errors and confidence intervals. Even nowadays, many statistics courses cover only the general basics, rather than giving young researchers a sound and insightful foundation for analysing survey data. Given the dominance of complex samples in the social sciences, Lorena Zardo Trindade and I developed a course on estimating the sampling variance for complex survey data, which I have taught on various occasions (e.g. at the European Commission’s Joint Research Centre in Seville). The course can be accessed here, and is published under a Creative Commons licence.

The course assumes some knowledge of basic statistical concepts and is addressed to everyone working with survey data. In our experience, even researchers and students who have taken multiple statistics courses benefit from it, because we take a different approach and highlight problems that are often overlooked when working with real-world samples.

Presentation

Getting the sampling variance right is essential when studying the world around us on the basis of a sample. There are quite a few misunderstandings about how to properly estimate and evaluate the statistical precision of sample estimates. In this course, we address in particular the following three issues: (1) the determinants of the sampling variance, and in particular the importance of taking the sample design into account; (2) the analysis of subpopulations (i.e. subgroups in your sample); (3) the comparison of point estimates (e.g. changes over time, differences between subgroups or between variables, differences between microsimulation scenarios, …).
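To make point (1) concrete, here is a minimal Python sketch, assuming an equally weighted sample drawn in a single stratum, with clusters (primary sampling units) treated as selected with replacement. It compares the standard error of a mean computed as if the data came from a simple random sample with a linearisation ("ultimate cluster") estimate that accounts for the clustering. The function names and the simulated data are hypothetical and are not taken from the course materials.

```python
import math
import random
from statistics import mean, variance


def srs_se_of_mean(y):
    """SE of the sample mean under simple random sampling (ignores the design)."""
    return math.sqrt(variance(y) / len(y))


def cluster_se_of_mean(y, cluster):
    """Linearisation ('ultimate cluster') SE of the mean for an equally weighted,
    single-stratum sample, treating clusters (PSUs) as drawn with replacement."""
    n = len(y)
    ybar = mean(y)
    # Linearised variable of the mean, z_i = (y_i - ybar) / n, summed per cluster
    z = {}
    for yi, c in zip(y, cluster):
        z[c] = z.get(c, 0.0) + (yi - ybar) / n
    m = len(z)
    zbar = sum(z.values()) / m  # zero up to rounding error
    var = m / (m - 1) * sum((zc - zbar) ** 2 for zc in z.values())
    return math.sqrt(var)


# Hypothetical data: 30 clusters (e.g. neighbourhoods) of 20 respondents each,
# with a shared cluster effect that makes respondents in a cluster alike.
random.seed(1)
y, cluster = [], []
for c in range(30):
    u = random.gauss(0, 1)  # cluster effect shared by all its members
    for _ in range(20):
        y.append(10 + u + random.gauss(0, 1))
        cluster.append(c)

se_srs = srs_se_of_mean(y)
se_clu = cluster_se_of_mean(y, cluster)
print(f"mean = {mean(y):.3f}")
print(f"SE ignoring the design     = {se_srs:.4f}")
print(f"SE accounting for clusters = {se_clu:.4f}")
print(f"design effect (variance ratio) = {(se_clu / se_srs) ** 2:.2f}")
```

In this setup the clustered standard error is clearly larger than the naive one, because respondents in the same cluster carry partly redundant information. If every respondent is placed in their own cluster, the two estimates coincide. A real analysis would also have to take sampling weights and stratification into account.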

Course materials

Use of all course materials is free, provided our contribution is clearly acknowledged on the slides, including our names and a reference to this website.

Theory

  1. Introduction (ppt)
  2. The determinants of the sampling variance (ppt)
  3. The determinants of the sampling variance: simulations (ppt)
  4. Approaches to variance estimation (ppt)
  5. Subpopulation analysis and comparing point estimates (ppt) (example with independent samples in Excel; example 1 in Stata; example 2 in Stata)
  6. Epilogue (ppt)
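Comparing point estimates hinges on the covariance between them: Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b). The covariance is zero for independent samples, but usually large and positive when both estimates come from the same sample (e.g. two microsimulation scenarios run on the same households). The minimal Python sketch below assumes simple random sampling so that the design can be left aside; the function names and the "reform" data are hypothetical and not taken from the course materials.

```python
import math
import random
from statistics import mean, variance


def se_mean(x):
    """SE of a mean under simple random sampling."""
    return math.sqrt(variance(x) / len(x))


def se_diff_independent(x1, x2):
    """SE of mean(x1) - mean(x2) for independent samples: Var(a - b) = Var(a) + Var(b)."""
    return math.sqrt(se_mean(x1) ** 2 + se_mean(x2) ** 2)


def se_diff_same_sample(x1, x2):
    """SE of mean(x1) - mean(x2) when both are measured on the same units:
    Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b), obtained via per-unit differences."""
    return se_mean([a - b for a, b in zip(x1, x2)])


# Hypothetical microsimulation: the same 1,000 households under a baseline
# and under a reform scenario that adds about 30 to each income.
random.seed(2)
baseline = [random.gauss(2000, 500) for _ in range(1000)]
reform = [b + 30 + random.gauss(0, 20) for b in baseline]

diff = mean(reform) - mean(baseline)
print(f"difference in means = {diff:.2f}")
print(f"SE if (wrongly) treated as independent = {se_diff_independent(reform, baseline):.2f}")
print(f"SE using the same-sample covariance    = {se_diff_same_sample(reform, baseline):.2f}")
```

With these hypothetical numbers, the independent-samples formula gives a standard error of the same order of magnitude as the difference itself, while the same-sample formula is much smaller, because baseline and reform incomes are almost perfectly correlated. With a complex design, the same logic applies to the design-based variance of the difference.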

Exercises

Creative Commons License

The slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. They can be re-used, changed and shared for non-commercial use, as long as our original work is acknowledged and the revised work is made available under the same conditions.