PhD - Applications of Multilevel Modelling

Exploring the assumption of no correlation of explanatory variables with random effects. 

Project aims and activities

The overarching aim of my PhD project is to improve understanding of the implications of using multilevel models with social data where the random effects are correlated with explanatory variables.

In more detail

Random effects models are used in the social sciences to handle data where cases are clustered, either because of the way the data have been sampled or because we believe, conceptually, that cases are grouped, so that important differences between these groups might shape the processes affecting them.

For example, if we are interested in the relationship between parental income and school attainment, and our data come from pupils from a range of schools, we might suppose that average outcomes will vary between schools for various reasons we can't measure (or just haven't). Perhaps one school has a sporty ethos, and another is very focused on diversity and tolerance, while a third has suffered a lot of disruption because of a building which is poorly maintained.

These differences could mean that the children within any one of these schools have outcomes which differ from the overall average in a similar way. Some sort of multilevel model is needed to cope with that structure in the data.

Imagine a model which tries to describe how exam scores vary with parental income. One common approach would be to use a random intercepts model, which would allow a different 'baseline' exam score for each school, and estimate the effect of wealth on exam scores relative to that.
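
In symbols, one common way of writing such a model is sketched below. The notation is illustrative rather than taken from the project: y_ij is the exam score of pupil i in school j, x_ij is that pupil's parental income, u_j is the school-specific shift in the baseline, and e_ij is the pupil-level residual.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_{ij} \sim N(0, \sigma_e^2)
```

Here \beta_0 + u_j is the 'baseline' for school j, and \beta_1 is the effect of income relative to that baseline.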

This improves our analysis in two ways:

We can extend this specification to explore differences between schools in the relationship between wealth and attainment using a random slopes model. The graphs above show simulated (fake!) data containing relationships we could plausibly find in such a sample. In the second graph, colour coding the same data points by cluster reveals that the relationship between wealth and attainment is weaker than it first seemed. 

However, these models rely on an oft-violated underlying assumption of no correlation of explanatory variables with random effects (henceforth the 'NCRX assumption'). If this assumption is not met, the resulting estimates can be inaccurate.
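
For a random intercepts model of the form y_ij = \beta_0 + \beta_1 x_ij + u_j + e_ij (illustrative notation, not taken from the project itself), where u_j is the school-level random effect and x_ij is the explanatory variable, the assumption can be sketched as:

```latex
\mathrm{Cov}(x_{ij}, u_j) = 0
```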

To continue the example above, we can easily imagine that parental income might tend to be higher in schools where exam scores are above average, perhaps because wealthier parents have the means and motivation to move to an area where their child can attend an apparently high-achieving school. 

If this confounding factor confuses the model, we could draw the wrong conclusions about the way in which family wealth is involved with school attainment. In the graphs above, (fake) individuals with high income scores are clustered together in schools with high average attainment. The grey dashed line reveals a high correlation between wealth and exam scores, but the flatter, coloured lines show that school-specific relationships are weaker. What would we conclude about the effectiveness of individual schools? What about the school system as a whole?
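
The pattern described here can be reproduced with a small simulation. The sketch below is only an illustration under assumed values (50 schools, 40 pupils each, a true within-school slope of 0.5, and school mean income set to twice the school effect); all function names are hypothetical. It does not fit a random intercepts model. It simply compares a pooled regression that ignores schools (the analogue of the grey dashed line) with a within-school regression on school-demeaned data (the analogue of the flatter coloured lines).

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_schools=50, n_pupils=40, true_slope=0.5, seed=1):
    """Simulate pupils in schools where mean income rises with the school effect."""
    rng = random.Random(seed)
    income, score, school = [], [], []
    for j in range(n_schools):
        u_j = rng.gauss(0, 1)        # unmeasured school effect
        mean_income_j = 2.0 * u_j    # assumed violation: income depends on u_j
        for _ in range(n_pupils):
            x = mean_income_j + rng.gauss(0, 1)
            y = true_slope * x + u_j + rng.gauss(0, 1)
            income.append(x)
            score.append(y)
            school.append(j)
    return income, score, school


def demean_by_group(values, groups):
    """Subtract each group's mean from its members' values."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


income, score, school = simulate()
pooled = ols_slope(income, score)
within = ols_slope(demean_by_group(income, school),
                   demean_by_group(score, school))
print(f"pooled slope (ignores schools): {pooled:.2f}")
print(f"within-school slope:            {within:.2f}")
```

Under these assumed values the pooled slope should come out noticeably steeper than the within-school slope, which should sit close to the true value of 0.5. An estimate from a random intercepts model would be expected to fall somewhere between the two, depending on cluster sizes and variances.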

Such a violation does not always change the results, and there are corrections we can apply to address it, but we need to understand more (on an applied level) about when and how these models perform well or otherwise, and support researchers in applying them appropriately. My PhD project will address that need through three strands of activity:

Contact

Student:
Kate O'Hara, University of Stirling, k.a.ohara@stir.ac.uk, @Kate_OHara_

Supervisors:
Paul Lambert, University of Stirling, paul.lambert@stir.ac.uk
Kevin Ralston, University of Edinburgh, kev.ralston@stir.ac.uk

Resources

'Three-minute Thesis' slide, as presented at University of Stirling Festival of Research, May 2023. Image credits and references for this slide are listed here.