UK Biobank collection#

Last modified: 06 Mar 2026

UK LLC is working towards reproducing UK Biobank's 'algorithmically-defined outcomes' (ADOs) for a range of health outcomes.

Introduction#

Datasets will be available for selection shortly.

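UK LLC has generated derived datasets for 7 health outcomes, partially based on algorithms developed and validated by UK Biobank. These are: asthma, chronic obstructive pulmonary disease (COPD), dementia, myocardial infarction (MI), motor neurone disease (MND), Parkinson’s disease, and stroke.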

The datasets have been derived based on UK Biobank’s methodology, though it is not possible to replicate UK Biobank’s work exactly. The differences between the two approaches are outlined below.

UK Biobank’s approach#

UK Biobank’s full methodology and description of the algorithmically-defined outcomes (ADOs) are available online.

In brief, the ADOs identify two key features of health outcomes:

  1. the earliest recorded date of a given health outcome (across self-reported data, hospital admissions, and death records)

  2. whether a clinical code for a health outcome is the primary (underlying) or secondary (contributory) cause of hospital admission or death.
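As a minimal sketch of the first feature, the snippet below picks the earliest record per participant and keeps the source it came from. The record layout, the `earliest_record` function and the sample values are hypothetical illustrations, not UK Biobank's or UK LLC's actual implementation.

```python
import datetime as dt

# Hypothetical records for one health outcome: (study_id, event_date, source).
records = [
    ("P001", dt.date(2015, 3, 2), "hospital secondary"),
    ("P001", dt.date(2012, 7, 19), "hospital primary"),
    ("P002", dt.date(2020, 1, 5), "mortality underlying"),
]


def earliest_record(records):
    """Return {study_id: (earliest_date, source)} for one health outcome."""
    first = {}
    for study_id, event_date, source in records:
        # Keep a record only if it is strictly earlier than the one already held.
        if study_id not in first or event_date < first[study_id][0]:
            first[study_id] = (event_date, source)
    return first


first_seen = earliest_record(records)
```

With this rule, two records on the same date resolve to whichever appears first in the input; a real pipeline would need an explicit tie-break.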

UK LLC’s approach#

The key difference between UK Biobank’s methodology and UK LLC’s is that UK LLC has not yet incorporated self-reported diagnoses into the derived datasets. Researchers interested in self-reported health data for individual Longitudinal Population Studies (LPS) are encouraged to use UK LLC Explore to identify variables of interest.

The different data sources used by UK Biobank and UK LLC are shown below.

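| Data source | UK Biobank | UK LLC |
|---|---|---|
| Hospital admission data: HES APC (England) | Yes | Yes |
| Hospital admission data: PEDW (Wales) 1 | Yes | Not yet |
| Hospital admission data: SMR01 (Scotland) 1 | Yes | Not yet |
| Death register data | Yes | Yes |
| Self-reported data from LPS 2 | Yes | Not yet |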

1 UK LLC’s derived datasets will be updated to include linked hospital data from NHS Wales and NHS Scotland once these data are available.
2 UK LLC is hoping to harmonise LPS participants’ self-reported diagnoses of each of the 7 health outcomes.

Health outcomes: derived datasets#

UK Biobank’s ADOs use both ICD-10 and ICD-9 clinical codes for hospital admission data, and ICD-10 codes only for mortality data. The HES datasets in the UK LLC Trusted Research Environment (TRE) do not include ICD-9 codes, so UK LLC’s derived datasets use ICD-10 codes only. These are summarised, by health outcome, below.

UK LLC’s derived datasets are in long format and comprise just four variables:

  • study id (individual identifier)

  • date (the earliest recorded occurrence of the outcome in the HES or mortality datasets)

  • source (i.e. hospital primary, hospital secondary, mortality underlying, mortality contributory)

  • outcome (e.g. asthma, Alzheimer’s disease)
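The four variables listed above can be pictured as one small record type. The sketch below is illustrative only: the class name `OutcomeRow`, the field names and the sample rows are assumptions, and the real variable names in the UK LLC TRE may differ.

```python
import datetime as dt
from dataclasses import dataclass

# The four source values described above.
SOURCES = (
    "hospital primary",
    "hospital secondary",
    "mortality underlying",
    "mortality contributory",
)


@dataclass(frozen=True)
class OutcomeRow:
    """One row of a long-format derived dataset (hypothetical field names)."""
    study_id: str
    date: dt.date
    source: str
    outcome: str

    def __post_init__(self):
        # Reject any source that is not one of the four documented values.
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


rows = [
    OutcomeRow("P001", dt.date(2012, 7, 19), "hospital primary", "asthma"),
    OutcomeRow("P002", dt.date(2020, 1, 5), "mortality underlying", "Alzheimer's disease"),
]
```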

Asthma, COPD, MND: for these 3 outcomes, the algorithm treats all disease sub-types as one, so there is only one record (row) per participant in the dataset. E.g. the algorithm for COPD comprises 11 ICD-10 codes, covering both emphysema and COPD, ultimately presented as a single outcome.

Dementia, MI, Parkinson’s disease, stroke: for these outcomes, event subtypes are recorded independently. E.g. a participant with records of both a subarachnoid haemorrhage and an intracerebral haemorrhage will have these recorded separately in the dataset, plus an additional row for ‘All cause stroke’.
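The subtype behaviour can be sketched as follows. This assumes that the all-cause row takes the earliest date across all of a participant's events for that outcome; that rule, the `first_rows` function and the sample events are illustrative assumptions rather than UK LLC's documented logic.

```python
import datetime as dt

# Hypothetical classified events: (study_id, outcome, event_date, source).
events = [
    ("P001", "Subarachnoid haemorrhage", dt.date(2018, 5, 1), "hospital primary"),
    ("P001", "Intracerebral haemorrhage", dt.date(2016, 2, 10), "hospital secondary"),
]


def first_rows(events, all_cause_label):
    """Return one (study_id, date, source, outcome) row per subtype, plus an all-cause row."""
    first = {}
    for study_id, outcome, event_date, source in events:
        # Each event counts towards its own subtype and towards the all-cause outcome.
        for label in (outcome, all_cause_label):
            key = (study_id, label)
            if key not in first or event_date < first[key][0]:
                first[key] = (event_date, source)
    return [
        (study_id, event_date, source, label)
        for (study_id, label), (event_date, source) in sorted(first.items())
    ]


rows = first_rows(events, "All cause stroke")
```

An event that belongs to no subtype could be passed with the all-cause label as its outcome, in which case it contributes to the all-cause row only.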

ICD-10 codes for each health outcome#

Asthma#

ICD-10 codes for asthma

| Code | Description |
|---|---|
| J45 a | Asthma |
| J46.X | Status asthmaticus |

a All ICD-10 codes beginning ‘J45’ are included
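A rule such as "all codes beginning ‘J45’" amounts to a prefix match. The sketch below shows one way to apply the asthma code list, assuming codes may arrive with or without a dot and that a trailing ‘X’ is a filler for a missing fourth character. The function names and these storage conventions are assumptions, so check them against the data before relying on them.

```python
def normalise(code: str) -> str:
    """Upper-case a code and drop the dot, so 'J45.9' and 'j459' compare equal."""
    cleaned = code.replace(".", "").strip().upper()
    # Assumption: a trailing 'X' on a 4-character code is a filler, e.g. 'J46X' -> 'J46'.
    if len(cleaned) == 4 and cleaned.endswith("X"):
        return cleaned[:-1]
    return cleaned


def is_asthma(code: str) -> bool:
    """True for any code beginning 'J45', or for J46 (status asthmaticus)."""
    cleaned = normalise(code)
    return cleaned.startswith("J45") or cleaned == "J46"
```

For example, `is_asthma("J45.9")` and `is_asthma("J46.X")` are both true under these assumptions, while `is_asthma("J44.0")` is false.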

Chronic obstructive pulmonary disease (COPD)#

ICD-10 codes for COPD

| Code a | Description |
|---|---|
| J43 | Emphysema |
| J44 | Other COPD |

a All ICD-10 codes beginning ‘J43’ or ‘J44’ are included

Dementia#

ICD-10 codes for dementia

UK Biobank's ADO for dementia considers Alzheimer's Disease (AD), Vascular Dementia (VD), and Frontotemporal Dementia (FTD) separately. In UK LLC's derived dementia dataset, the first recorded date and the record source are provided for each subtype separately, alongside Dementia (all causes).

| Code | Description |
|---|---|
| A81.0 | Sporadic Creutzfeldt-Jakob disease |
| F00 a | Dementia in AD |
| F01 b | Vascular dementia |
| F02 c | Dementia in diseases classified elsewhere |
| F02.0 | Dementia in Pick’s disease |
| F03 | Unspecified dementia |
| F05.1 | Delirium superimposed on dementia |
| F10.6 | Mental and behavioural disorders due to use of alcohol - amnesic syndrome |
| G30 d | Alzheimer’s disease |
| G31.0 | Circumscribed brain atrophy |
| G31.1 | Senile degeneration of brain |
| G31.8 | Other specified degenerative diseases of nervous system |
| I67.3 | Binswanger’s disease |

Notes
a All ICD-10 codes beginning ‘F00’ are included
b All ICD-10 codes beginning ‘F01’ are included
c All ICD-10 codes beginning ‘F02’ are included
d All ICD-10 codes beginning ‘G30’ are included

Myocardial infarction (MI)#

ICD-10 codes for MI

UK Biobank's ADO for MI considers ST-elevation myocardial infarction (STEMI) and non-ST-elevation myocardial infarction (NSTEMI) separately. In UK LLC's derived MI dataset, the first recorded date and the record source are provided for each type separately, alongside MI overall.

| Code | Description |
|---|---|
| I21 | Acute MI |
| I21.0 | Acute transmural MI of anterior wall |
| I21.1 | Acute transmural MI of inferior wall |
| I21.2 | Acute transmural MI of other sites |
| I21.3 | Acute transmural MI of unspecified site |
| I21.4 | Acute subendocardial MI |
| I21.9 | Acute MI, unspecified |
| I22 | Subsequent MI |
| I22.0 | Subsequent MI of anterior wall |
| I22.1 | Subsequent MI of inferior wall |
| I22.8 | Subsequent MI of other sites |
| I22.9 | Subsequent MI of unspecified site |
| I23 a | Certain current complications following acute MI |
| I24.1 | Dressler syndrome |
| I25.2 | Old MI |

a All ICD-10 codes beginning ‘I23’ are included

Motor Neurone Disease (MND)#

ICD-10 code for MND

| Code | Description |
|---|---|
| G12.2 | Motor Neurone Disease |

Parkinson’s disease#

ICD-10 codes for Parkinson's disease

UK Biobank's ADO for Parkinson's disease considers Parkinson's disease (PD), Multiple System Atrophy (MSA), and Progressive Supranuclear Palsy (PSP) separately. In UK LLC's derived Parkinson's disease dataset, the first recorded date and the record source are provided for each type separately, alongside All cause Parkinsonism.

| Code | Description |
|---|---|
| G20 | Parkinson’s disease |
| G21 a | Secondary parkinsonism |
| G22 | Parkinsonism in diseases classified elsewhere |
| G23.0 | Hallervorden-Spatz disease |
| G23.1 | Progressive Supranuclear Palsy |
| G23.2 | Multiple system atrophy, parkinsonian type [MSA-P] |
| G23.3 | Multiple system atrophy, cerebellar type [MSA-C] |
| G23.8 | Other specified degenerative diseases of basal ganglia |
| G23.9 | Degenerative diseases of basal ganglia, unspecified |
| G25.9 | Extrapyramidal and movement disorder, unspecified |
| G26 | Extrapyramidal and movement disorders in diseases classified elsewhere |
| G90.3 | Multi-system degeneration |

a All ICD-10 codes beginning ‘G21’ are included

Stroke#

ICD-10 codes for stroke

UK Biobank's ADO for stroke events considers ischaemic stroke (IS), intracerebral haemorrhage (IH), and subarachnoid haemorrhage (SH) separately. In UK LLC's derived stroke dataset, the first recorded date and the record source are provided for each type separately, alongside All cause stroke.

| Code | Description |
|---|---|
| I60 a | Subarachnoid haemorrhage |
| I61 b | Intracerebral haemorrhage |
| I63 c | Cerebral infarction |
| I64.X | Stroke, not specified as haemorrhage or infarction |

Notes
a All ICD-10 codes beginning ‘I60’ are included
b All ICD-10 codes beginning ‘I61’ are included
c All ICD-10 codes beginning ‘I63’ are included
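To illustrate how a single stroke code could feed both a subtype and the overall stroke outcome, here is a minimal sketch. The code-to-subtype mapping is inferred from the code descriptions above (I60 to SH, I61 to IH, I63 to IS, I64 counted as stroke only) and is an assumption, as are the function and constant names.

```python
# Assumed mapping from 3-character ICD-10 prefix to stroke subtype abbreviation.
STROKE_SUBTYPE_PREFIXES = {
    "I60": "SH",  # subarachnoid haemorrhage
    "I61": "IH",  # intracerebral haemorrhage
    "I63": "IS",  # ischaemic stroke (cerebral infarction)
}
# Assumed to count towards the overall stroke outcome but no subtype.
STROKE_ONLY_PREFIXES = ("I64",)


def stroke_categories(code: str) -> list:
    """Return the outcome categories a code contributes to (empty if not a stroke code)."""
    prefix = code.replace(".", "").strip().upper()[:3]
    if prefix in STROKE_SUBTYPE_PREFIXES:
        return [STROKE_SUBTYPE_PREFIXES[prefix], "Stroke"]
    if prefix in STROKE_ONLY_PREFIXES:
        return ["Stroke"]
    return []
```

Under these assumptions, `stroke_categories("I60.9")` yields both `"SH"` and `"Stroke"`, while `stroke_categories("I64.X")` yields `"Stroke"` alone.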