UK Biobank collection#
Last modified: 06 Mar 2026
Introduction#
Datasets will be available for selection shortly.
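UK LLC has generated derived datasets for 7 health outcomes, partially based on algorithms developed and validated by UK Biobank. These are:
asthma
chronic obstructive pulmonary disease (COPD)
dementia
myocardial infarction (MI)
motor neurone disease (MND)
Parkinson’s disease
stroke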
The datasets have been derived based on UK Biobank’s methodology, though it is not possible to replicate UK Biobank’s work exactly. The differences between the two approaches are outlined below.
UK Biobank’s approach#
UK Biobank’s full methodology and its description of the algorithmically-defined outcomes (ADOs) are available online.
In brief, the ADOs identify two key features of health outcomes:
the earliest recorded date of a given health outcome (across self-reported data, hospital admissions, and death records)
whether a clinical code for a health outcome is the primary (underlying) or secondary (contributory) cause of hospital admission or death.
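The first of these features can be sketched in a few lines of Python. This is a minimal illustration, not UK Biobank's or UK LLC's actual code: the record layout, the participant identifiers, and the function name `earliest_events` are all hypothetical, and the source labels are borrowed from the values used in UK LLC's derived datasets.

```python
from datetime import date

# Hypothetical event records for one outcome: (study_id, event_date, source).
events = [
    ("P001", date(2015, 3, 2), "hospital secondary"),
    ("P001", date(2012, 7, 19), "hospital primary"),
    ("P001", date(2020, 1, 5), "mortality underlying"),
    ("P002", date(2018, 11, 30), "mortality contributory"),
]


def earliest_events(records):
    """Return {study_id: (earliest_date, source_of_that_record)}."""
    earliest = {}
    for study_id, event_date, source in records:
        current = earliest.get(study_id)
        # Keep the record with the earliest date; on a tie the first one seen wins.
        if current is None or event_date < current[0]:
            earliest[study_id] = (event_date, source)
    return earliest


result = earliest_events(events)
```

How ties between sources on the same date are resolved is an assumption here; the real algorithms may apply a different rule.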
UK LLC’s approach#
The key difference between UK Biobank’s methodology and UK LLC’s is that UK LLC has not yet incorporated self-reported diagnoses into the derived datasets. Researchers interested in self-reported health data for individual Longitudinal Population Studies (LPS) are encouraged to use UK LLC Explore to identify variables of interest.
The different data sources used by UK Biobank and UK LLC are shown below.
| Data source | UK Biobank | UK LLC |
|---|---|---|
| Hospital admission data: | | |
| HES (England) | ✓ | ✓ |
| PEDW (Wales) 1 | ✓ | ✗ |
| SMR01 (Scotland) 1 | ✓ | ✗ |
| Mortality data | ✓ | ✓ |
| Self-reported data from LPS 2 | ✓ | ✗ |
1 UK LLC’s derived datasets will be updated to include linked hospital data from NHS Wales and NHS Scotland once these data are available.
2 UK LLC is hoping to harmonise LPS participants’ self-reported diagnoses of each of the 7 health outcomes.
Health outcomes: derived datasets#
UK Biobank’s ADOs include clinical codes from both ICD-10 and ICD-9 for hospital admission data, and from ICD-10 only for mortality data. The HES datasets in the UK LLC Trusted Research Environment (TRE) do not include ICD-9 codes, so UK LLC’s derived datasets use ICD-10 codes only. These are summarised, by health outcome, below.
UK LLC’s derived datasets are in long format and comprise just four variables:
study id (individual identifier)
date (the earliest recorded occurrence of the outcome in the HES or mortality datasets)
source (i.e. hospital primary, hospital secondary, mortality underlying, mortality contributory)
outcome (e.g. asthma, Alzheimer’s disease)
Asthma, COPD, MND: for these 3 outcomes, the algorithm treats all disease sub-types as one, so there is only one record (row) per participant in the dataset. E.g. the algorithm for COPD comprises 11 ICD-10 codes, covering both emphysema and COPD, ultimately presented as a single outcome.
Dementia, MI, Parkinson’s disease, stroke: for these outcomes, event subtypes are recorded independently. E.g. a participant with records of both a subarachnoid haemorrhage and an intracerebral haemorrhage will have these recorded separately in the dataset, plus an additional line for ‘All cause stroke’.
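As an illustration of working with the long format, the sketch below reshapes a handful of made-up rows into one entry per participant. The column names (`study_id`, `date`, `source`, `outcome`), the outcome labels, and the values are assumptions for the example only; the actual names in the derived datasets may differ.

```python
from collections import defaultdict
from datetime import date

# Made-up long-format rows: one row per participant per outcome (or subtype).
rows = [
    {"study_id": "P001", "date": date(2016, 4, 1),
     "source": "hospital primary", "outcome": "subarachnoid haemorrhage"},
    {"study_id": "P001", "date": date(2019, 9, 12),
     "source": "hospital secondary", "outcome": "intracerebral haemorrhage"},
    {"study_id": "P001", "date": date(2016, 4, 1),
     "source": "hospital primary", "outcome": "all cause stroke"},
    {"study_id": "P002", "date": date(2010, 2, 3),
     "source": "hospital primary", "outcome": "asthma"},
]


def to_wide(long_rows):
    """Group long-format rows into {study_id: {outcome: (date, source)}}."""
    wide = defaultdict(dict)
    for row in long_rows:
        wide[row["study_id"]][row["outcome"]] = (row["date"], row["source"])
    return dict(wide)


wide = to_wide(rows)
```

Here participant `P001` ends up with three entries (two stroke subtypes plus the all-cause line), while `P002` has a single asthma entry.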
ICD-10 codes for each health outcome#
Asthma#
ICD-10 codes for asthma
| Code | Description |
|---|---|
| J45 a | Asthma |
| J46.X | Status asthmaticus |
a All ICD-10 codes beginning ‘J45’ are included
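The "all codes beginning with" rule amounts to prefix matching on the ICD-10 code. A minimal sketch follows, assuming codes may arrive with or without the dot and in either case; the function names are hypothetical, and treating J46 as a prefix (so that both `J46` and `J46.X` match) is an assumption about how the filler character is stored.

```python
# Prefixes taken from the asthma table; J46 as a prefix is an assumption.
ASTHMA_PREFIXES = ("J45", "J46")


def normalise(code):
    """Strip whitespace, remove the dot, and upper-case an ICD-10 code."""
    return code.strip().replace(".", "").upper()


def is_asthma(code):
    """True if the normalised code starts with any asthma prefix."""
    return normalise(code).startswith(ASTHMA_PREFIXES)


matches = [c for c in ["J45.0", "j459", "J46.X", "J44.1", "I21.0"] if is_asthma(c)]
```

Passing a tuple to `str.startswith` checks all prefixes in one call, which keeps the rule easy to extend to the longer code lists used for other outcomes.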
Chronic obstructive pulmonary disease (COPD)#
ICD-10 codes for COPD
| Code a | Description |
|---|---|
| J43 | Emphysema |
| J44 | Other COPD |
a All ICD-10 codes beginning ‘J43’ or ‘J44’ are included
Dementia#
ICD-10 codes for dementia
UK Biobank's ADO for dementia considers Alzheimer's Disease (AD), Vascular Dementia (VD), and Frontotemporal Dementia (FTD) separately. In UK LLC's derived dementia dataset, the first recorded date and the record source are provided for each separately.

| Code | Description | AD | VD | FTD | Dementia |
|---|---|---|---|---|---|
| A81.0 | Sporadic Creutzfeldt-Jakob disease | | | | ✓ |
| F00 a | Dementia in AD | ✓ | | | ✓ |
| F01 b | Vascular dementia | | ✓ | | ✓ |
| F02 c | Dementia in diseases classified elsewhere | | | | ✓ |
| F02.0 | Dementia in Pick’s disease | | | ✓ | ✓ |
| F03 | Unspecified dementia | | | | ✓ |
| F05.1 | Delirium superimposed on dementia | | | | ✓ |
| F10.6 | Mental and behavioural disorders due to | | | | ✓ |
| G30 d | Alzheimer’s disease | ✓ | | | ✓ |
| G31.0 | Circumscribed brain atrophy | | | ✓ | ✓ |
| G31.1 | Senile degeneration of brain | | | | ✓ |
| G31.8 | Other specified degenerative diseases of nervous system | | | | ✓ |
| I67.3 | Binswanger’s disease | | | | ✓ |
Notes
a All ICD-10 codes beginning ‘F00’ are included
b All ICD-10 codes beginning ‘F01’ are included
c All ICD-10 codes beginning ‘F02’ are included
d All ICD-10 codes beginning ‘G30’ are included
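One way to picture how a single code can contribute to both a subtype and the overall dementia outcome is sketched below. It assumes that each subtype tick corresponds to the code's description (F00 and G30 to AD, F01 to VD, F02.0 and G31.0 to FTD); the outcome labels and the function name are hypothetical, and exact codes are matched as prefixes for simplicity.

```python
# Assumed subtype mapping, inferred from the code descriptions.
SUBTYPE_PREFIXES = {
    "Alzheimer's disease": ("F00", "G30"),
    "vascular dementia": ("F01",),
    "frontotemporal dementia": ("F020", "G310"),
}

# Every code in the dementia table counts towards the overall outcome.
ALL_CAUSE_PREFIXES = (
    "A810", "F00", "F01", "F02", "F03", "F051", "F106",
    "G30", "G310", "G311", "G318", "I673",
)


def dementia_outcomes(code):
    """Return the outcome labels a single ICD-10 code contributes to."""
    norm = code.strip().replace(".", "").upper()
    outcomes = [
        name
        for name, prefixes in SUBTYPE_PREFIXES.items()
        if norm.startswith(prefixes)
    ]
    if norm.startswith(ALL_CAUSE_PREFIXES):
        outcomes.append("all cause dementia")
    return outcomes


picks = dementia_outcomes("F02.0")
```

A code such as F02.0 therefore yields two labels (the subtype and the overall outcome), whereas F03 yields only the overall outcome and an unrelated code yields none.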
Myocardial infarction (MI)#
ICD-10 codes for MI
UK Biobank's ADO for MI considers ST-elevation myocardial infarction (STEMI) and Non-ST-elevation myocardial infarction (NSTEMI) separately. In UK LLC's derived MI dataset, the first recorded date and the record source are provided for each type separately.
Code |
Description |
STEMI |
NSTEMI |
MI |
|---|---|---|---|---|
I21 |
Acute MI |
✓ |
||
I21.0 |
Acute transmural MI of anterior wall |
✓ |
✓ |
|
I21.1 |
Acute transmural MI of inferior wall |
✓ |
✓ |
|
I21.2 |
Acute transmural MI of other sites |
✓ |
✓ |
|
I21.3 |
Acute transmural MI of unspecified sites |
✓ |
✓ |
|
I21.4 |
Acute subendocardial MI |
✓ |
✓ |
|
I21.9 |
Acute MI, unspecified |
✓ |
✓ |
|
I22 |
Subsequent MI |
✓ |
||
I22.0 |
Subsequent MI of anterior wall |
✓ |
✓ |
|
I22.1 |
Subsequent MI of inferior wall |
✓ |
✓ |
|
I22.8 |
Subsequent MI of other sites |
✓ |
✓ |
|
I22.9 |
Subsequent MI of unspecified site |
✓ |
✓ |
|
I23 a |
Certain current complications following acute MI |
✓ |
||
I24.1 |
Dressler syndrome |
✓ |
||
I25.2 |
Old MI |
✓ |
a All ICD-10 codes beginning ‘I23’ are included
Motor Neurone Disease (MND)#
ICD-10 code for MND
| Code | Description |
|---|---|
| G12.2 | Motor Neurone Disease |
Parkinson’s disease#
ICD-10 codes for Parkinson's disease
UK Biobank's ADO for Parkinson's disease considers Parkinson's disease (PD), Multiple System Atrophy (MSA), and Progressive Supranuclear Palsy (PSP) separately. In UK LLC's derived Parkinson's disease dataset, the first recorded date and the record source are provided for each type separately.

| Code | Description | PD | MSA | PSP | All cause |
|---|---|---|---|---|---|
| G20 | Parkinson’s disease | ✓ | | | ✓ |
| G21 a | Secondary parkinsonism | | | | ✓ |
| G22 | Parkinsonism in diseases specified elsewhere | | | | ✓ |
| G23.0 | Hallervorden-Spatz disease | | | | ✓ |
| G23.1 | Progressive Supranuclear Palsy | | | ✓ | ✓ |
| G23.2 | Multiple system atrophy, parkinsonian type [MSA-P] | | ✓ | | ✓ |
| G23.3 | Multiple system atrophy, cerebellar type [MSA-C] | | ✓ | | ✓ |
| G23.8 | Other specified degenerative diseases of basal ganglia | | | | ✓ |
| G23.9 | Degenerative diseases of basal ganglia, unspecified | | | | ✓ |
| G25.9 | Extrapyramidal and movement disorder, unspecified | | | | ✓ |
| G26 | Extrapyramidal and movement disorders in | | | | ✓ |
| G90.3 | Multi-system degeneration | | ✓ | | ✓ |
a All ICD-10 codes beginning ‘G21’ are included
Stroke#
ICD-10 codes for stroke
UK Biobank's ADO for stroke events considers ischaemic stroke (IS), intracerebral haemorrhage (IH), and subarachnoid haemorrhage (SH) separately. In UK LLC's derived stroke dataset, the first recorded date and the record source are provided for each type separately.

| Code | Description | IS | IH | SH | Stroke |
|---|---|---|---|---|---|
| I60 a | Subarachnoid haemorrhage | | | ✓ | ✓ |
| I61 b | Intracerebral haemorrhage | | ✓ | | ✓ |
| I63 c | Cerebral infarction | ✓ | | | ✓ |
| I64.X | Stroke, not specified as haemorrhage or infarction | ✓ | | | ✓ |
Notes
a All ICD-10 codes beginning ‘I60’ are included
b All ICD-10 codes beginning ‘I61’ are included
c All ICD-10 codes beginning ‘I63’ are included