NHSE Segmentation Dataset Reference Guide

A national data-driven approach to population segmentation has been developed to support Population Health Management (PHM), as outlined in the NHS Long Term Plan. This Reference Guide provides the background, definitions and Segmentation Dataset output delivered as part of this initiative.

 

Read more about subsegment / condition definitions.

Introduction

The Segmentation Dataset is a national data asset that enables Population Health Management at all levels of the NHS. It is the largest person-level longitudinal PHM dataset globally.

Use it for

Identifying cohorts of the population with similar needs

Benchmarking against activity, outcomes and costs

Modelling scenarios to allocate resources effectively

Evaluating and tracking the outcomes of any interventions

Person-level data for the entire GP registered population of England, more than 60 million people.

59 clinically curated condition registers, developed by a team of clinicians, data analysts, and public health experts.

Includes people who are healthy or generally well, which is crucial for Population Health Management related to primary prevention.

8 years of data up to March 2024, capturing births, deaths, movement of people between GP practices and changes in health states over time within this period.

The Segmentation Dataset has been used as the key data source for a number of landmark analyses which have led to peer reviewed publications in high impact medical journals, including Nature Medicine and The Lancet.

Supporting programmes and evaluation

Population and Person Insight (PaPI) Dashboard
Prevention & LTC (PLTC) Programmes
Darzi Review

10 Year Plan
Neighbourhood Health Guidelines 2025/26
National Diabetes Prevention Programme

Bridges to Health Population Segmentation

What?

Segmentation categorises populations according to their health and care needs, priorities, and circumstances. The ‘Bridges to Health’ (B2H) model is a fundamentally person-focused approach, with the principal goal of ‘pursuing the health of each population segment’.

Why?

To optimise health outcomes, patient experience, efficiency, and care costs, care delivery systems should respond to the needs of different population segments in different ways.

How?

Each segment and subsegment is defined clinically and translated to a data definition (sequences of clinical codes and logic) which is used to create condition registers for each subsegment.

Source: Outcomes Based Healthcare© 2017
OBH’s approach to segmentation is based on the ‘Bridges to Health’ model (Lynn et al. 2007)

Segment and Subsegment Configuration

Unique Features

Fully longitudinal and captures population dynamics

Longitudinal at person level, capturing births, deaths, moves between GP practices and changes in health states over time.

Data can be used for analysis of progression of health states to multiple long term conditions, and incidence (new diagnosis) trends.

59 clinically curated condition registers

Curated by a team of clinicians, data analysts, and public health experts. These conditions align with the global Delphi consensus for the definition of multiple long term conditions.

Definitions are based on regularly reviewed national and international standards and best practice guidelines, with over 130 reviewed to date. Clinical definitions are translated into data definitions using complex sequencing logic of ICD-10, OPCS and SNOMED codes, amongst other flags and definitions, across 10 national data sources.

Healthy or generally well population

The dataset puts individuals at the centre, with full population coverage.

Because the entire national GP registered population is included, the dataset uniquely includes people who are healthy or generally well. This is crucial for Population Health Management related to primary prevention and allows national HEALTHSPAN® to be calculated.

Refreshed regularly since 2019, covering a period of 8 years (April 2016 to March 2024), with data available on a monthly basis

Assurance and Validation

Analytical pipeline and dataset tests

152 tests are run on the data pipeline and dataset, including aggregate back testing of key metrics such as prevalence, incidence and mortality, as well as pipeline step-specific row-level scenario testing and aggregate checks of logic.

Benchmarking prevalence and incidence

Benchmarked against internal and external published sources of data (around 90 publicly available benchmark figures such as QOF, CVDPREVENT, HSE data and national registries, as well as peer-reviewed publications).

Peer review and major publications

The Segmentation Dataset supports a range of national interventions, working with the Universities of Leicester, Dublin and Imperial, as well as the Darzi Review and the 10 Year Plan.

The Dataset has been used in published studies in world class journals, including the largest ever study of multimorbidity globally.

The burden of diabetes-associated MLTCs on years of life spent and lost

Nature Medicine 2024 Aug 1:1-8.


Prevalence of MLTCs in England: A whole population study of over 60 million people

Journal of the Royal Society of Medicine. 2024 Mar;117(3):104-17.


Associations of type 1 and type 2 diabetes with COVID-19-related mortality

The Lancet Diabetes and Endocrinology. 2020 Oct; 8: 813–22


Resources on FutureNHS

Further information can be found on the Population and Person Insights (PaPI) Workspace on FutureNHS

https://future.nhs.uk/PaPI

Visit the Data section on fNHS to find all these resources

Release notes

Contains details of any changes from the previous release.


Technical Output Specification

Contains a data model diagram and table and column descriptions


Getting Started in UDAL

A user guide for analysts on how to get access to the Segmentation Dataset in UDAL


Information on dataset version, availability and the release schedule

Visit FutureNHS

Analyst Training Materials

Visit Segmentation Dataset Training on fNHS to find short training videos for analysts and users (2-3 mins each).

Data model structure 

A run through of all the tables and columns available in the data.

Interpretation and analysis

Detailed documentation useful for understanding the data, and interpreting results.

How to query the Segmentation Dataset

Demo videos showing example SQL queries in Databricks for core PHM analyses such as:

  • prevalence by highest acuity over time
  • prevalence by number of conditions, in people with diabetes
  • proportion of people with depression by deprivation decile
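As a rough illustration of the last analysis listed above, the sketch below computes the proportion of people with depression by deprivation decile from a few made-up person-level rows. The field names (`imd_decile`, `has_depression`) and the data are hypothetical and do not reflect the actual Segmentation Dataset schema. The demo videos use SQL in Databricks; Python is used here only to keep the example compact.

```python
from collections import defaultdict


def prevalence_by_group(rows, group_key, flag_key):
    """Return {group: proportion of rows in that group where the flag is true}."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[flag_key]:
            cases[group] += 1
    return {group: cases[group] / totals[group] for group in totals}


# Hypothetical person-level snapshot (one row per person at a single date).
people = [
    {"imd_decile": 1, "has_depression": True},
    {"imd_decile": 1, "has_depression": False},
    {"imd_decile": 1, "has_depression": True},
    {"imd_decile": 1, "has_depression": False},
    {"imd_decile": 10, "has_depression": False},
    {"imd_decile": 10, "has_depression": True},
    {"imd_decile": 10, "has_depression": False},
    {"imd_decile": 10, "has_depression": False},
]

depression_by_decile = prevalence_by_group(people, "imd_decile", "has_depression")
# With the made-up rows above: decile 1 -> 0.5, decile 10 -> 0.25
```

The denominator here is everyone in the group, including people with no recorded condition, which is why full population coverage matters for this kind of measure.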

How to link and query the National Segmentation Dataset with other datasets

Demo videos showing how to calculate emergency admissions by segment over time. This includes how to load and link with SUS APC data as well as how to calculate this measure using person-time.
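To make the person-time idea concrete, here is a minimal sketch of an admission rate per 1,000 person-years by segment. The spell layout, the label "Segment B" and all figures are hypothetical; the real linkage to SUS APC data is covered in the demo videos and is not reproduced here.

```python
from collections import defaultdict

DAYS_PER_YEAR = 365.25


def rate_per_1000_person_years(events, person_days):
    """Events per 1,000 person-years, given the total days at risk."""
    if person_days <= 0:
        raise ValueError("person_days must be positive")
    return 1000 * events / (person_days / DAYS_PER_YEAR)


# Hypothetical spells: (segment, days spent in segment, emergency admissions).
spells = [
    ("Healthy / Generally Well", 365.25, 0),
    ("Healthy / Generally Well", 365.25, 1),
    ("Segment B", 182.625, 1),
    ("Segment B", 182.625, 1),
]

days_at_risk = defaultdict(float)
admissions = defaultdict(int)
for segment, days, events in spells:
    days_at_risk[segment] += days
    admissions[segment] += events

rates = {
    segment: rate_per_1000_person_years(admissions[segment], days_at_risk[segment])
    for segment in days_at_risk
}
```

Using days at risk as the denominator, rather than a head count, means people who move between segments, are born or die part-way through a period contribute only the time they actually spent in each segment.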

Data Sources Used

The Data Transformation Process

Clinical Informatics Approach

The work to translate a population segmentation model into a National Segmentation Dataset has been carried out by OBH over the last 12 years and, since 2019, in partnership with NHS England.

An evidence-based approach has been used, derived from analysis of over 130 international and national best practice guidelines and standards. The definitions that form part of the segmentation analytical pipeline are developed, tested, validated and maintained on a regular basis by OBH’s clinical team.

Image showing the key features of the National Bridges to Health Segmentation Dataset.

Why is Population Segmentation Important?

Population segmentation can be used as part of the broader PHM strategy to improve care and outcomes – see examples below:

View system transformation programmes through a person-centred (and segment specific) lens – baselining, tracking and monitoring changes following interventions or service redesign for specific cohorts.

Target specific cohorts/populations, with different needs, in different ways depending on the desired outcomes.

Improve coordination of care by focusing on different population segments/subsegments – at local, regional and national level.

Improve resource utilisation efficiency (i.e. provide better care using the same overall resources for specific populations).

Stratify populations such as those who are currently healthy / generally well to identify those most at risk of developing long term conditions.

Understand drivers of demand more accurately, and forecast and plan for changes in demand.

Move away from ‘all things to all people at all times’ approaches to primary care delivery, to more nuanced, targeted, focused care around genuine need, prevention and sustainable care.

Answer complex analytical and research questions about the link between care or interventions provided and resulting outcomes.

PHM Use Case Examples

The unique features of the National Segmentation Dataset mean it can be used either as a standalone dataset or linked to activity and cost data. Specific use cases cover a wide range of core PHM functions under ‘secondary uses’ of data.

Understanding population need

Identification and prioritisation of opportunities

Complex Multimorbidity Identification

Analyse the dataset to identify individuals with multiple long term conditions (LTCs) who may not be receiving coordinated care. For example, identifying the cohort with diabetes, severe mental illness (SMI) and organ failure to understand the prevalence and distribution of complex and high intensity health needs.
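The cohort in the example above is the set of people who appear on every one of the named condition registers. A minimal sketch, assuming each register is available as a collection of pseudonymised person IDs (the register names and IDs below are made up):

```python
def cohort_with_all(registers, conditions):
    """Return the IDs that appear on every one of the named registers."""
    id_sets = [set(registers[name]) for name in conditions]
    return set.intersection(*id_sets) if id_sets else set()


# Hypothetical registers keyed by condition, holding person IDs.
registers = {
    "diabetes": [1, 2, 3, 4],
    "smi": [2, 3, 5],
    "organ_failure": [3, 2, 6],
}

complex_cohort = cohort_with_all(registers, ["diabetes", "smi", "organ_failure"])
print(sorted(complex_cohort))  # [2, 3]
```

Dividing the size of this cohort by the registered population for an area would give the prevalence of that combination of conditions.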

Health Inequalities Analysis

Utilise the health state, demographic and geographic data, including deprivation scores and ethnicity, to identify areas where certain conditions or health states have higher prevalence in deprived communities, or variations in condition management between ethnic groups.

End of Life Care Planning

Use the Segmentation Dataset’s ability to identify cohorts of people in their last 5 years of life to assess end-of-life care planning and time spent at home, and to identify gaps in palliative care referrals, alongside cost.

Prevention Opportunities

Analyse progression patterns to identify groups at risk of developing additional LTCs or those who might benefit from preventive interventions, including those currently in the Healthy / Generally Well segment.

Benchmarking activity, costs and outcomes

Benchmark nationally consistent opportunities between areas

Service Planning and Resource Allocation

Compare capitated expenditure across different segments to identify areas where service provision doesn’t match population need. This can help in benchmarking resource allocation efficiency between different regions or ICBs.

High Unplanned Care User Analysis

Benchmark emergency admission rates and A&E attendance for people with specific combinations of long term conditions across different regions to identify areas with successful community support programmes.

Early Intervention for Progressive Conditions

Compare the rates of progression from early to severe stages of conditions (e.g. organ failure, frailty) and subsequent mortality, across different areas to identify successful early intervention strategies.

Care Coordination Gaps

Benchmark the proportion of high-risk individuals receiving integrated care across different regions or ICBs to identify best practices and areas for improvement, for example by benchmarking Days Disrupted by Care in people with MLTCs.

Resource utilisation scenarios

Model costs and ROI of interventions

Multimorbidity Management

Model the potential cost savings and improved outcomes of implementing coordinated care programmes for individuals with multiple LTCs, based on successful interventions in comparable areas.

Preventive Intervention ROI

Calculate the return on investment for implementing preventive interventions for at-risk groups identified in the Healthy / Generally Well segment, comparing potential future healthcare costs with intervention costs.

End of Life Care Optimisation

Model the cost-effectiveness of expanding palliative care services based on the identified cohort in their last 5 years of life, considering both quality of life improvements and potential reductions in acute care utilisation.

Health Inequalities Reduction

Estimate the potential impact and cost of targeted interventions to reduce health inequalities, focusing on areas with higher prevalence of certain conditions in deprived communities.

Evaluation and tracking impact

Evaluate interventions, retrospectively and prospectively, using nationally consistent data

Segment Progression Tracking

Monitor changes in population distribution across segments over time to evaluate the impact of Population Health Management strategies, for example through ‘HEALTHSPAN®’: time spent in the Healthy / Generally Well segment as a proportion of overall life span.
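The proportion described above can be illustrated with a simple share-of-time calculation. This is only a sketch over a made-up monthly history for one person and is not the published HEALTHSPAN® methodology; it covers observed months only, not a whole life span, and the label "Other segment" is a placeholder.

```python
def healthy_time_share(monthly_segments, healthy_label="Healthy / Generally Well"):
    """Share of observed months spent in the healthy segment."""
    if not monthly_segments:
        raise ValueError("at least one observed month is required")
    healthy_months = sum(1 for segment in monthly_segments if segment == healthy_label)
    return healthy_months / len(monthly_segments)


# Hypothetical 12-month history: 9 months healthy, then 3 months elsewhere.
history = ["Healthy / Generally Well"] * 9 + ["Other segment"] * 3
share = healthy_time_share(history)  # 0.75
```

Averaging this share across a population at successive time points is one way a change in segment distribution could be tracked.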

Outcomes Measurement

Track condition and cohort-specific outcomes as part of care planning, using the Segmentation Dataset to establish baselines and measure improvements over time.

Service Redesign Impact Assessment

Evaluate the impact of service redesign by monitoring changes in segment-specific capitated expenditure, health activity and outcomes before and after implementation, using comparable or matched cohorts.

Health Inequality Intervention Effectiveness

Assess the effectiveness of interventions aimed at reducing health inequalities by tracking changes in condition prevalence, management and outcomes across different demographic and geographic groups over time.

Segmentation Dataset Comparison Against QOF Data

Absolute difference

This chart shows the absolute difference between condition prevalence figures from the Segmentation Dataset and QOF data.

Comparison with QOF - absolute difference.

Data

  • Segmentation Dataset v4.2 as of 31.03.2024
  • QOF data as of 31.03.2024 (latest available), except for Depression which is 31.03.2023 as prevalence is no longer reported by QOF 
  • All ages included, unless otherwise specified
  • National data

Method

The chart is expressed as an ‘absolute’ difference i.e. calculated by subtracting the Segmentation Dataset prevalence figure from the QOF prevalence figure.
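The calculation can be written out as follows. The function name and the prevalence figures are illustrative only and are not taken from the chart.

```python
def absolute_difference(qof_prevalence, segmentation_prevalence):
    """QOF prevalence minus Segmentation Dataset prevalence, in percentage points."""
    return qof_prevalence - segmentation_prevalence


# Hypothetical prevalence figures, both expressed as percentages.
difference = absolute_difference(qof_prevalence=7.5, segmentation_prevalence=7.0)
print(difference)  # 0.5
```

Under this sign convention, a positive value means QOF reports a higher prevalence than the Segmentation Dataset, and a negative value means the Segmentation Dataset reports a higher prevalence.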

Relative difference

This chart shows the relative difference between condition prevalence figures from the Segmentation Dataset and QOF data.

Comparison with QOF - relative difference.

Data

  • Segmentation Dataset v4.2 as of 31.03.2024
  • QOF data as of 31.03.2024 (latest available), except for Depression which is 31.03.2023 as prevalence is no longer reported by QOF 
  • All ages included, unless otherwise specified
  • National data

Method

The chart is expressed as a ‘relative’ difference i.e. calculated by subtracting the Segmentation Dataset prevalence figure from the QOF prevalence figure, as a proportion of the actual QOF prevalence for that condition. This allows the impact on conditions with a smaller than average or larger than average prevalence to be seen.
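A short sketch of this calculation, using made-up figures, shows why the relative view helps for low-prevalence conditions. The function name and the numbers are illustrative only.

```python
def relative_difference(qof_prevalence, segmentation_prevalence):
    """(QOF - Segmentation Dataset) as a proportion of the QOF prevalence."""
    if qof_prevalence == 0:
        raise ValueError("QOF prevalence must be non-zero")
    return (qof_prevalence - segmentation_prevalence) / qof_prevalence


# Hypothetical figures (percentages): a common and a rare condition.
common = relative_difference(qof_prevalence=8.0, segmentation_prevalence=6.0)
rare = relative_difference(qof_prevalence=0.5, segmentation_prevalence=0.375)
print(common, rare)  # 0.25 0.25
```

The absolute differences here are 2.0 and 0.125 percentage points, yet both conditions differ from QOF by the same 25%, which the absolute chart alone would not show.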

Segmentation Dataset Comparison Against Linked Data including Primary Care

Absolute difference

This chart shows the absolute difference between condition prevalence figures from the NHSE Segmentation Dataset and a local linked Segmentation Dataset that includes primary care data for a single ICB.

Comparison with local linked data - absolute difference.

Data

  • NHSE Segmentation Dataset as of 31.03.2024 for single matched ICB population
  • Local/ICB linked Segmentation Dataset as of 31.12.2023
  • Data is for all people aged 18 years and over
  • Data for a single anonymised ICB

Method

The chart is expressed as an ‘absolute’ difference i.e. calculated by subtracting the NHSE Segmentation Dataset prevalence figure from the local linked Segmentation Dataset prevalence figure.

Relative difference

This chart shows the relative difference between condition prevalence figures from the NHSE Segmentation Dataset and a local linked Segmentation Dataset that includes primary care data for a single ICB.

Comparison with local linked data - relative difference.

Data

  • NHSE Segmentation Dataset as of 31.03.2024 for single matched ICB population
  • Local/ICB linked Segmentation Dataset as of 31.12.2023
  • Data is for all people aged 18 years and over
  • Data for a single anonymised ICB

Method

The chart is expressed as a ‘relative’ difference i.e. calculated by subtracting the NHSE Segmentation Dataset prevalence figure from the local linked version prevalence figure, as a proportion of the actual prevalence from the local linked version for that condition. This allows the impact on conditions with a smaller than average or larger than average prevalence to be seen.