This is an observational, case–control study reported in accordance with the STROBE guidelines (see Additional file 1: Checklist S1).
Study sample
The UK Biobank is a UK-wide prospective cohort study, with baseline recruitment taking place between March 2006 and October 2010. At the time of this study, 227,297 participants had consented to their primary care records being made available and had confirmed primary care records in the database (Fig. 1). From this pool, 7 participants were excluded because they had a missing geographical location or were outside the age range for the NHS Health Check. A further 20,256 participants (9%) were excluded as ineligible for an NHS Health Check due to an existing diagnosis of heart, brain, liver, or kidney disease, and 66,135 participants (31%) were excluded due to a statin prescription or an existing diagnosis of diabetes or hypertension. The NHS Health Check criteria specifically exclude people with these conditions as they are considered to be already identified and managed by their NHS providers. Thus, 140,899 participants formed the study sample.
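The participant flow can be checked with simple arithmetic. The short sketch below uses only the counts reported in this paragraph; the variable names are illustrative and do not come from the study code.

```python
# Counts reported in the text; variable names are illustrative only.
with_primary_care_records = 227_297
excluded_location_or_age = 7
excluded_organ_disease = 20_256            # heart, brain, liver, or kidney disease
excluded_statin_diabetes_hypertension = 66_135

study_sample = (
    with_primary_care_records
    - excluded_location_or_age
    - excluded_organ_disease
    - excluded_statin_diabetes_hypertension
)
print(study_sample)  # 140899
```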
Case–control matching
To reduce confounding from known sources, we applied an extensive case–control matching algorithm to produce a control group that was matched one-to-one with NHS Health Check recipients, using nearest neighbour propensity score matching with a calliper width of 0.2, as per published recommendations [15]. Matching was conducted with respect to demographic features (geographical region, age, sex, ethnicity, Townsend deprivation index, education), family history of disease (stroke, heart disease, dementia), physical measures (body mass index, waist-to-hip ratio, systolic blood pressure at baseline), and health behaviours (smoking, alcohol intake frequency, physical activity, and daily vegetable intake). More than 95% of cases (NHS Health Check recipients) received a high-quality match, and non-matching participants were dropped from the analysis. A review of the excluded participants indicated that although they tended to be younger, more deprived, and more obese, the main reason for matching failure was the requirement for one-to-one matching within a geographical region. In other words, the matched control for a London health check recipient must also be from London. From this process, we derived a matched sample of 97,204 participants, with 48,602 participants in each exposure group (Fig. 1).
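The matching step can be illustrated with a minimal sketch. It assumes that propensity scores have already been estimated (for example, by logistic regression on the matching covariates), that matching is greedy and without replacement, and that the calliper of 0.2 is expressed in standard deviations of the logit of the propensity score. These details, the function names, and the toy data are assumptions made for illustration; they are not taken from the study code.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def match_one_to_one(cases, controls, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching within region, without replacement.

    cases / controls: lists of (participant_id, region, propensity_score).
    Returns (case_id, control_id) pairs; cases with no control inside the
    calliper in their own region are left unmatched and therefore dropped.
    """
    all_logits = [logit(ps) for _, _, ps in cases + controls]
    caliper = caliper_sd * statistics.pstdev(all_logits)

    available = {cid: (region, logit(ps)) for cid, region, ps in controls}
    pairs = []
    for case_id, region, ps in cases:
        case_logit = logit(ps)
        best_id, best_dist = None, None
        for cid, (c_region, c_logit) in available.items():
            if c_region != region:
                continue  # controls must come from the same region
            dist = abs(case_logit - c_logit)
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best_id, best_dist = cid, dist
        if best_id is not None:
            pairs.append((case_id, best_id))
            del available[best_id]  # each control is used at most once
    return pairs


# Hypothetical toy data: (id, region, propensity score).
cases = [("c1", "London", 0.60), ("c2", "Bristol", 0.55), ("c3", "London", 0.95)]
controls = [
    ("k1", "London", 0.58),
    ("k2", "Bristol", 0.54),
    ("k3", "London", 0.30),
    ("k4", "Edinburgh", 0.95),
]
pairs = match_one_to_one(cases, controls)
print(pairs)
```

In this toy example, case c3 stays unmatched: the only control with a similar score (k4) is in a different region, which mirrors the main reason for matching failure described above.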
The intervention: NHS Health Check
To identify participants who had received an NHS Health Check (or its Scottish equivalent), we performed a text search of primary care clinical records for the following phrases: “keep well”, “health check”, “check-up”, “healthy start”, “healthy lifestyle”, “diabetes prevent”, “well man”, “well woman”, and “well adult”. This search returned 283,536 records between April 2006 and December 2022. We then reviewed the record text to exclude other types of health checks (healthy lifestyle, diabetes management, carer health checks, and unspecified check-ups) and records that did not indicate NHS Health Check completion, for example, invitations (terms including invitation, email, letter), neutral descriptors (not appropriate, indicated, status), and health check refusal (declined, refused, did not attend). The remaining codes that were used to confirm NHS Health Checks are provided in Additional file 1: Table S1. Although the official nationwide launch took place in April 2009, some areas participated in a pilot programme in the year leading up to this date [16]. Within the UK Biobank resource, coverage for primary care records was high up to mid-2016, when the availability of records dropped sharply. Therefore, the exposure window was set between 1 January 2008 and 30 June 2016. The count of records for health check exposure by year is illustrated in Fig. 2. For modelling purposes, we considered the date of exposure to be the date of the first completed NHS Health Check.
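The two-stage record search can be sketched as a simple filter. The phrase and term lists below are copied from this paragraph, but the function names, the use of case-insensitive substring matching, and the example record texts are assumptions. The sketch does not reproduce the manual review of other health check types or the final code list in Additional file 1: Table S1.

```python
# Search phrases and non-completion terms as listed in the text.
SEARCH_PHRASES = [
    "keep well", "health check", "check-up", "healthy start",
    "healthy lifestyle", "diabetes prevent", "well man", "well woman",
    "well adult",
]
NON_COMPLETION_TERMS = [
    "invitation", "email", "letter",           # invitations
    "not appropriate", "indicated", "status",  # neutral descriptors
    "declined", "refused", "did not attend",   # refusals
]


def is_candidate(record_text):
    """Stage 1: does the record text contain any search phrase?"""
    lowered = record_text.lower()
    return any(phrase in lowered for phrase in SEARCH_PHRASES)


def indicates_completion(record_text):
    """Stage 2: candidate record with no invitation/neutral/refusal term."""
    lowered = record_text.lower()
    return is_candidate(record_text) and not any(
        term in lowered for term in NON_COMPLETION_TERMS
    )


# Hypothetical record descriptions, for illustration only.
examples = [
    "NHS Health Check completed",
    "NHS Health Check invitation letter",
    "Health check declined",
    "Blood pressure reading",
]
flags = [indicates_completion(text) for text in examples]
print(flags)  # [True, False, False, False]
```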
Ascertainment of outcomes
Existing disease at baseline was assessed via both self-report and linked records (hospital and primary care). Incident disease after baseline was ascertained via primary care linked records, hospital records, and death registry (Additional file 1: Table S2), using published code lists where available [17,18,19]. We included the following outcomes: hypertension, diabetes, hyperlipidaemia, stroke (any type), all-cause dementia, myocardial infarction, atrial fibrillation, heart failure, acute kidney injury, chronic kidney disease (stage 3, 4, or 5), fatty liver disease, alcoholic liver disease, liver cirrhosis, liver failure, cardiovascular mortality (death with a primary cause between ICD I00–I80), and all-cause mortality. Participants were followed up until the end of October 2022. All patient consent procedures were completed by the UK Biobank, and details of record linkage in the UK Biobank have been published previously [20].
Definition of covariates
Age, sex, and all other covariates were recorded at baseline. Modelling covariates were selected based on their known associations with outcome risk in previous research; these include age, sex, geographical region, Townsend deprivation score, ethnicity (White/non-White), post-secondary education (yes/no), body mass index, waist-to-hip ratio, current smoking (yes/no), systolic blood pressure, Charlson Comorbidity Index, alcohol intake frequency, physical activity, and fresh vegetable intake. Charlson Comorbidity Index was coded from existing conditions at baseline [21]. Physical activity was coded in summed metabolic equivalent task-minutes per week, based on the aggregation of physical activity fields according to published guidance [22]. To partition participants into geographical regions, we identified the coordinates of six major cities in the mainland UK (London, Bristol, Birmingham, Manchester, Newcastle upon Tyne, and Edinburgh) and then allocated each participant to their nearest city based on the rounded east and north coordinates provided at baseline. Some of the covariates had a small number of missing values (less than 2%) that were imputed together using multiple imputation with chained equations [23] (summarised in Additional file 1: Table S3).
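The allocation of participants to geographical regions can be sketched as a nearest-centre lookup. The city coordinates below are approximate, illustrative values that assume the east and north coordinates are British National Grid positions in metres. They are not the coordinates used in the study, and the straight-line (Euclidean) distance is likewise an assumption.

```python
import math

# Approximate, illustrative city-centre coordinates (easting, northing) in
# metres; NOT the values used in the study.
CITY_COORDS = {
    "London": (530_000, 180_000),
    "Bristol": (359_000, 173_000),
    "Birmingham": (407_000, 287_000),
    "Manchester": (384_000, 398_000),
    "Newcastle upon Tyne": (425_000, 564_000),
    "Edinburgh": (325_000, 673_000),
}


def nearest_city(east, north):
    """Return the city whose centre is closest in straight-line distance."""
    return min(
        CITY_COORDS,
        key=lambda city: math.hypot(
            east - CITY_COORDS[city][0], north - CITY_COORDS[city][1]
        ),
    )


print(nearest_city(390_000, 390_000))  # Manchester
```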
Statistical analysis
We used graphical illustrations and simple descriptive statistics to outline the features of UK Biobank participants with primary care data and the uptake of the NHS Health Check. To assess the covariate-adjusted differences in outcomes between health check recipients and non-recipients, we used two main modelling approaches.
Time-varying Cox regression
We applied Cox survival modelling with the intervention (NHS Health Check date) coded in a time-varying fashion following the method outlined by Therneau and colleagues [24,25,26]. In this method, follow-up begins at baseline registration, and all participants begin the follow-up period with intervention-negative status, in other words, no NHS Health Check. Then, over time, some participants receive an NHS Health Check, so at that date, they acquire intervention-positive status. Survival models were adjusted by the full complement of covariates outlined above, as recorded at baseline. This analysis makes use of the most data and allows time-based differences in exposure effect to be observed more easily, but it could still be biased due to residual confounding.
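The time-varying coding can be illustrated by splitting each participant's follow-up into counting-process intervals of the form (start, stop, exposed, event). The sketch below is a minimal illustration under the assumption that times are measured in years since baseline and that any health check occurs after baseline; the function name and row layout are hypothetical.

```python
def split_follow_up(follow_up_end, event, check_time=None):
    """Return counting-process rows (start, stop, exposed, event).

    follow_up_end: time of the outcome or censoring, in years since baseline.
    event: 1 if follow-up ended with the outcome, 0 if censored.
    check_time: time of the first completed health check, or None.
    """
    if check_time is not None and check_time <= 0:
        raise ValueError("this sketch assumes the health check follows baseline")
    if check_time is None or check_time >= follow_up_end:
        # Never exposed during follow-up: one unexposed interval.
        return [(0.0, follow_up_end, 0, event)]
    # Unexposed until the check, exposed afterwards; the event (if any)
    # belongs to the final interval.
    return [
        (0.0, check_time, 0, 0),
        (check_time, follow_up_end, 1, event),
    ]


rows = split_follow_up(follow_up_end=10.0, event=1, check_time=4.0)
print(rows)  # [(0.0, 4.0, 0, 0), (4.0, 10.0, 1, 1)]
```

Rows in this layout are the kind of input accepted by software that fits Cox models to start-stop data, such as `Surv(start, stop, event)` in the R survival package.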
Aligned-start Cox regression
In this analysis, we aligned the intervention window between cases and controls in a manner similar to Sebuødegård and colleagues [27] and then applied proportional hazards regression in the conventional sense. Here, the follow-up period is defined to begin at the date of the completed NHS Health Check, with outcome times aligned accordingly. Each control participant (without an NHS Health Check) is followed up from the intervention date of their matched case. Full follow-up lengths were compared and were not significantly different between the groups. In aligned-start analyses, ages were updated to be consistent with the intervention date, with the remainder of covariates as measured at baseline. This method provides better quality control for known confounding but involves the loss of data, and hence loss in power.
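The aligned-start design can be sketched as follows. Each control inherits the health check date of its matched case as the start of follow-up. The function name, data layout, and the choice to skip participants whose outcome falls on or before the aligned start are assumptions made for this illustration; the study end date (end of October 2022) is taken from the text.

```python
from datetime import date

STUDY_END = date(2022, 10, 31)  # end of follow-up stated in the text


def aligned_follow_up(pairs, check_dates, outcome_dates, study_end=STUDY_END):
    """Build (participant_id, exposed, follow_up_days, event) rows.

    pairs: list of (case_id, control_id) from the matching step.
    check_dates: case_id -> date of first completed health check.
    outcome_dates: participant_id -> outcome date (absent if no outcome).
    """
    rows = []
    for case_id, control_id in pairs:
        start = check_dates[case_id]  # the control shares the case's start date
        for pid, exposed in ((case_id, 1), (control_id, 0)):
            outcome = outcome_dates.get(pid)
            if outcome is not None and outcome <= start:
                continue  # assumption: not at risk at the aligned start
            event = int(outcome is not None and outcome <= study_end)
            stop = outcome if event else study_end
            rows.append((pid, exposed, (stop - start).days, event))
    return rows


# Hypothetical example: one matched pair, control has an outcome in 2015.
rows = aligned_follow_up(
    pairs=[("case1", "ctrl1")],
    check_dates={"case1": date(2012, 5, 1)},
    outcome_dates={"ctrl1": date(2015, 5, 1)},
)
print(rows)
```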
Survival models in both approaches above were adjusted by age, sex, geographical region, Townsend deprivation score, ethnicity, post-secondary education, body mass index, waist-to-hip ratio, current smoking, systolic blood pressure, Charlson Comorbidity Index, alcohol intake frequency, physical activity, and fresh vegetable intake. Aligned-start models were additionally adjusted by the length of time since registration. Multiple testing correction using a false discovery rate of 5% was applied to identify significant p-values across all models.
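The text does not name the specific false discovery rate procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure at a 5% rate; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans marking which p-values are significant under BH."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every hypothesis ranked at or below that k is declared significant.
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2]))  # [True, False, False, False]
```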
Further time adjustments
Prior research has identified that NHS Health Check receipt was associated with increased detection and diagnosis of prevalent (but unrecognised) diabetes, hypertension, and cardiovascular and kidney diseases [9]. Therefore, we conducted the survival modelling with three time-exclusion settings: (a) include all outcomes, (b) exclude outcome events in the first 12 months after NHS Health Check, and (c) exclude outcome events in the first 24 months after NHS Health Check. In epidemiological research, it is common to exclude events that occur in the first few years of follow-up when studying the relationship between an exposure and a disease outcome. This latency period is intended to reduce bias from reverse causation, which occurs when the main biological processes creating the disease outcome precede the exposure. By removing events that occur in the early years of follow-up, we hope to observe associations between the exposure and disease outcome that are not due to pre-existing disease influencing exposure status (see Rothman and colleagues, p. 218–219 [28], and examples [29, 30]). We tested the proportional hazards assumption for the effect of NHS Health Check on the outcome risk using a chi-square test and graphical display of Schoenfeld residuals. In sensitivity analysis, we re-ran the aligned-start Cox models with stratified time periods (Therneau and colleagues [24], Sect. 4). This approach has the lowest power of all, as each coefficient is calculated using only the outcomes within each time stratum.
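The three time-exclusion settings can be sketched as a filter on the analysis records. The text does not state exactly how early events were handled, so the sketch assumes one possible reading: participants whose outcome event occurs inside the exclusion window are removed from that analysis. The function name, record layout, and example data are hypothetical.

```python
# Settings (a), (b), and (c) from the text, as exclusion windows in months.
EXCLUSION_MONTHS = {"a": 0, "b": 12, "c": 24}


def apply_exclusion(records, months):
    """Drop participants whose event occurs within the first `months`.

    records: list of (participant_id, months_to_end, event), with time
    measured from the NHS Health Check date (or the aligned start date).
    Censored participants (event == 0) are kept under this reading.
    """
    return [r for r in records if not (r[2] == 1 and r[1] < months)]


# Hypothetical records for illustration.
records = [("p1", 6, 1), ("p2", 18, 1), ("p3", 60, 1), ("p4", 10, 0)]
kept = {
    name: [r[0] for r in apply_exclusion(records, months)]
    for name, months in EXCLUSION_MONTHS.items()
}
print(kept)
```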
Unmeasured confounding
Finally, we applied the e-value methodology described by VanderWeele and colleagues [31] to evaluate the potential nullifying impact of unobserved confounding variables. We calculated e-values and their lower bounds for each significant result and provided a translation of these into “years of ageing equivalent” to aid intuitive understanding of their relative strength.
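For reference, the standard e-value formula for a risk ratio RR greater than 1 is RR + sqrt(RR × (RR − 1)), with estimates below 1 inverted first. The sketch below applies this formula to a point estimate and to the confidence limit closest to the null. Treating a hazard ratio as an approximate risk ratio is an assumption that is generally reasonable only for rare outcomes; the function names are hypothetical, and the "years of ageing equivalent" translation is not reproduced here.

```python
import math


def e_value(rr):
    """E-value for a risk ratio (estimates below 1 are inverted first)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_for_ci(rr, lower, upper):
    """E-value for the confidence limit closest to the null.

    Returns 1.0 when the interval includes the null value of 1.
    """
    if lower <= 1.0 <= upper:
        return 1.0
    return e_value(lower if rr > 1.0 else upper)


print(round(e_value(2.0), 3))  # 3.414
```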