FAQs about NHS England data

Last modified: 19 Nov 2025

Answers to researchers' questions about working with linked NHS England data.

Does UK LLC check the accuracy of health records?

No. The UK LLC Data Team can only see de-identified records in the TRE and does not amend any participant data. The team performs only the following data curation tasks:

  • Clean and deduplicate data, dataset names and structures to enable data provisioning in an efficient manner while maintaining data integrity.

  • Load and integrate variable and value labelling, where available from the NHS API and other web sources, into master metadata tables.

  • Run the automated disclosure control risk assessment and manually review all flagged risks.


What medical codes are used in the NHS England data available in the TRE?

The main clinical classifications mandated by NHS England are SNOMED CT, ICD-10 and OPCS-4. More information on codes used in Electronic Health Records (EHRs) is available here: Coded variables


For which datasets do researchers need to provide codelists?

Researchers must provide codelists for their projects if they intend to use any of the following six datasets:

The datasets use a range of clinical classifications, including:

  • ICD-9 (HES & cancer registrations)

  • ICD-10 (HES)

  • SNOMED CT (GDPPR)

  • OPCS-4 (HES)

  • ODS (cancer registrations and PCM)

  • dm+d (PCM)

  • NHS national codes (all datasets)

More information on creating a codelist is available here: Codelists


How can I quantify the effect of applying codelists to my dataset?

The file ‘NHSE patient service usage’ contains the number of appearances and the date of the most recent appearance for each participant for each available NHS data source. Comparing LPS participants’ presence in NHS data sources against the data provisioned to a project will identify which participants appear in the data source but are not included in the provisioned data.
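The comparison described above can be sketched in Python as a set difference. This is only an illustration: it assumes both files have been read into rows in a long layout (one row per participant per data source), and the column names participant_id, data_source and appearances are hypothetical placeholders, as are the toy values. Check the actual files for the real variable names and layout.

```python
def participants_not_provisioned(usage_rows, provisioned_rows, data_source,
                                 id_col="participant_id"):
    """Return IDs that appear in the usage file for `data_source`
    but are absent from the data provisioned to the project."""
    # Participants with at least one appearance in the chosen data source
    in_source = {
        row[id_col]
        for row in usage_rows
        if row["data_source"] == data_source and int(row["appearances"]) > 0
    }
    # Participants present in the provisioned (codelist-filtered) data
    provisioned = {row[id_col] for row in provisioned_rows}
    return sorted(in_source - provisioned)


# Toy rows standing in for the two files (invented values, hypothetical columns)
usage = [
    {"participant_id": "A1", "data_source": "HES", "appearances": "3"},
    {"participant_id": "A2", "data_source": "HES", "appearances": "1"},
    {"participant_id": "A3", "data_source": "GDPPR", "appearances": "5"},
]
provisioned = [{"participant_id": "A1"}]

missing = participants_not_provisioned(usage, provisioned, "HES")
print(missing)
```

The returned list holds the participants who appear in the data source but were not included in the provisioned data.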


Why are there some missing variable and value labels in some datasets?

Variable labelling is primarily sourced from an NHS metadata API, which is incomplete. Gaps in HES and MHSDS have been filled from additional data dictionary sources. As part of ongoing work, we will integrate further sources to complete the variable labelling and add value labels, and we will inform users as these are updated. The approximate current variable label completeness is:

  • HES, NPEX, COVIDSGSS: 100%

  • MHSDS: 70-90%

  • GDPPR, CVS, CVAR: 70%

  • PCM: 40%

  • DEMOGRAPHICS, CHESS, IELISA: not available.


What version of NHS England data was I provisioned?

NHS England data provisioned to projects are locked to a specific extract. The extract is identified by the extract_date variable found in the dataset, which gives the date the data were extracted at NHS England.

All projects are ‘locked’ to an NHS quarterly extract as well as to a fixed table, which controls permissions/consent. The locking is based on the time each project was first provisioned in the TRE. It prevents participant numbers from fluctuating during the course of a project (if, for example, more data or more participants are added to the TRE).

Each fixed table is logged as a quarterly ‘freeze’. The freeze number and freeze date are provided in the ‘documentation’ folder in each TRE project space.
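One way to check which extract a provisioned dataset comes from is to tabulate the distinct values of extract_date. The sketch below is illustrative only: it assumes the dataset has been read into a list of rows, and the date values and their format are invented.

```python
from collections import Counter


def extract_dates(rows, date_col="extract_date"):
    """Count rows per distinct value of the extract date variable."""
    return Counter(row[date_col] for row in rows)


# Toy rows standing in for a provisioned dataset (invented values)
rows = [
    {"extract_date": "2024-01-15"},
    {"extract_date": "2024-01-15"},
    {"extract_date": "2024-01-15"},
]

counts = extract_dates(rows)
for value, n in sorted(counts.items()):
    print(value, n)
```

Because projects are locked to a specific extract, one would expect a single distinct value per dataset; that expectation is an inference from the text above rather than a documented guarantee.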


Why are some NHS England variables excluded or encrypted?

Prior to upload to the UK LLC TRE database, NHS data are assessed for disclosure risk. During this process, variables can be excluded from the upload if they are deemed to be disclosive. Where a variable has utility in an encrypted form, it is encrypted rather than excluded and an _e suffix is added to the end of the variable name, e.g. lsoa_e. Encryption is usually applied to variables which are, or provide, proxies for location information smaller than region.
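The naming convention makes it straightforward to list the encrypted variables in a dataset. A minimal Python sketch, assuming the variable names are available as a list of strings (the example list is illustrative). A name might end in _e for unrelated reasons, so treat the result as a candidate list to check against the dataset documentation.

```python
def split_encrypted(variable_names, suffix="_e"):
    """Separate variable names carrying the encryption suffix from the rest."""
    encrypted = [name for name in variable_names if name.endswith(suffix)]
    plain = [name for name in variable_names if not name.endswith(suffix)]
    return encrypted, plain


# Illustrative variable names only
names = ["lsoa_e", "extract_date", "nic_number"]

encrypted, plain = split_encrypted(names)
print("encrypted:", encrypted)
print("not encrypted:", plain)
```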


What does nic_number refer to?

The NIC number refers to UK LLC's Data Sharing Agreement (DSA) with NHSE. Participants have different NIC numbers in the NHSE datasets depending on the legal basis for linking their data (consent or Section 251). Participants who are in more than one LPS may have duplicated NHSE records if one study uses consent as its legal basis and another uses Section 251. UK LLC is working on a robust methodology for deduplicating participants who appear in multiple cohort studies, and will publish it in Guidebook as soon as it is available.