Introduction
The Segmentation Dataset is a national data asset that enables Population Health Management (PHM) at all levels of the NHS. It is the largest person-level longitudinal PHM dataset globally.
Use it for
Identifying cohorts of the population with similar needs
Benchmarking against activity, outcomes and costs
Modelling scenarios to allocate resources effectively
Evaluating and tracking the outcomes of any interventions
Person-level data for the entire GP registered population of England, more than 60 million people.
59 condition registers, clinically curated by a team of clinicians, data analysts, and public health experts.
Includes people who are healthy or generally well, which is crucial for Population Health Management related to primary prevention.
8 years of data to March 2024, capturing people who have died or been born, movement of populations between GP practices, and changes in health states over time within this period.
The Segmentation Dataset has been used as the key data source for a number of landmark analyses which have led to peer reviewed publications in high impact medical journals, including Nature Medicine and The Lancet.
Supporting programmes and evaluation
Population and Person Insight (PaPI) Dashboard
Prevention & LTC (PLTC) Programmes
Darzi Review
10 Year Plan
Neighbourhood Health Guidelines 2025/26
National Diabetes Prevention Programme
Bridges to Health Population Segmentation
What?
Segmentation categorises populations according to their health and care needs, priorities, and circumstances. The ‘Bridges to Health’ (B2H) model is a fundamentally person-focused approach, with the principal goal of ‘pursuing the health of each population segment’.
Why?
To optimise health outcomes, patient experience, efficiency, and care costs, care delivery systems should respond to the needs of different population segments in different ways.
How?
Each segment and subsegment is defined clinically and translated to a data definition (sequences of clinical codes and logic) which is used to create condition registers for each subsegment.
Source: Outcomes Based Healthcare© 2017
OBH’s approach to segmentation is based on the ‘Bridges to Health’ model (Lynn et al. 2007)
Segment and Subsegment Configuration
Unique Features
Fully longitudinal and captures population dynamics
Longitudinal at person level, capturing people who have died, been born, or moved between GP practices, as well as changes in health states over time.
Data can be used for analysis of progression of health states to multiple long term conditions, and incidence (new diagnosis) trends.
59 clinically curated condition registers
Curated by a team of clinicians, data analysts, and public health experts. These conditions align with the global Delphi consensus for the definition of multiple long term conditions.
Definitions are based on regularly reviewed national and international standards and best practice guidelines, with over 130 reviewed to date. Clinical definitions are translated into data definitions using complex sequencing logic of ICD-10, OPCS, and SNOMED codes, amongst other flags and definitions, across 10 national data sources.
Healthy or generally well population
The dataset puts individuals at the centre, with full population coverage.
Because the national GP registered population is included, the dataset uniquely covers people who are healthy or generally well. This is crucial for Population Health Management related to primary prevention and allows for the calculation of national HEALTHSPAN®.
Refreshed regularly since 2019, covering a period of 8 years (April 2016 to March 2024), with data available on a monthly basis
Assurance and Validation
Analytical pipeline and dataset tests
152 tests are run on the data pipeline and dataset, including aggregate back testing of key metrics such as prevalence, incidence and mortality, as well as pipeline step-specific row-level scenario testing and aggregate checks of logic.
Benchmarking prevalence and incidence
Benchmarked against internal and external published sources of data (around 90 publicly available benchmark figures such as QOF, CVDPREVENT, HSE data and national registries, as well as peer-reviewed publications).
Peer review and major publications
The Segmentation Dataset supports a range of national interventions, working with Universities of Leicester, Dublin and Imperial, as well as the Darzi Review and the 10 Year Plan.
The Dataset has been used in published studies in world-class journals, including the largest ever study of multimorbidity globally.
The burden of diabetes-associated MLTCs on years of life spent and lost
Nature Medicine 2024 Aug 1:1-8.
Prevalence of MLTCs in England: A whole population study of over 60 million people
Journal of the Royal Society of Medicine. 2024 Mar;117(3):104-17.
Associations of type 1 and type 2 diabetes with COVID-19-related mortality
The Lancet Diabetes and Endocrinology. 2020 Oct; 8: 813–22
Resources on FutureNHS
Further information can be found on the Population and Person Insights (PaPI) Workspace on FutureNHS
https://future.nhs.uk/PaPI
Visit the Data section on FutureNHS to find all these resources:
Release notes
Contains details of any changes from the previous release.
Getting Started in UDAL
A user guide for analysts on how to get access to the Segmentation Dataset in UDAL
Information on dataset version, availability and the release schedule
Analyst Training Materials
Data model structure
A run through of all the tables and columns available in the data.
Interpretation and analysis
Detailed documentation useful for understanding the data, and interpreting results.
How to query the Segmentation Dataset
Demo videos showing example SQL queries in Databricks that calculate core PHM analyses, such as:
- prevalence by highest acuity over time
- prevalence by number of conditions, in people with diabetes
- proportion of people with depression by deprivation decile
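As a rough illustration of the last analysis in the list above, the sketch below computes the proportion of people with depression by deprivation decile. It is written in Python on a small in-memory table rather than in Databricks SQL, and it is not taken from the demo videos. The row layout, field names and values are assumptions made up for this example and do not reflect the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical person-level rows: (person_id, deprivation_decile, has_depression).
# The layout and values are illustrative only, not the dataset's real columns.
people = [
    (1, 1, True),
    (2, 1, False),
    (3, 1, True),
    (4, 5, False),
    (5, 5, True),
    (6, 10, False),
    (7, 10, False),
    (8, 10, False),
]


def proportion_by_decile(rows):
    """Return {decile: share of people in that decile flagged with the condition}."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for _person_id, decile, has_condition in rows:
        totals[decile] += 1
        if has_condition:
            cases[decile] += 1
    return {decile: cases[decile] / totals[decile] for decile in sorted(totals)}


depression_by_decile = proportion_by_decile(people)
for decile, share in depression_by_decile.items():
    print(f"decile {decile}: {share:.1%}")
```

The denominator here is everyone in the decile, including people with no recorded conditions, which is only possible because the dataset covers the healthy or generally well population.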
How to link and query the National Segmentation Dataset with other datasets
Demo videos showing how to calculate emergency admissions by segment over time. This includes how to load and link with SUS APC data as well as how to calculate this measure using person-time.
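The person-time approach mentioned above can be sketched as follows, assuming one row per person per month in a segment and a separate list of emergency admissions. This is a simplified Python illustration, not the method shown in the demo videos: the row layouts, the second segment label and the figures are all hypothetical, and the link to SUS APC data is replaced by a plain lookup on person and month.

```python
from collections import defaultdict

# Hypothetical monthly membership rows: (person_id, month, segment).
# Each row represents one person-month of follow-up in that segment.
person_months = [
    (1, "2023-04", "Healthy / Generally Well"),
    (1, "2023-05", "Healthy / Generally Well"),
    (1, "2023-06", "Long Term Conditions"),
    (2, "2023-04", "Long Term Conditions"),
    (2, "2023-05", "Long Term Conditions"),
    (2, "2023-06", "Long Term Conditions"),
]

# Hypothetical emergency admissions: (person_id, month of admission).
admissions = [
    (1, "2023-06"),
    (2, "2023-04"),
    (2, "2023-06"),
]


def admissions_per_1000_person_years(person_months, admissions):
    """Attribute each admission to the segment held in the admission month,
    then divide by the person-time spent in that segment (in person-years)."""
    segment_at = {(pid, month): seg for pid, month, seg in person_months}

    months_in_segment = defaultdict(int)
    for _pid, _month, seg in person_months:
        months_in_segment[seg] += 1

    admissions_in_segment = defaultdict(int)
    for pid, month in admissions:
        seg = segment_at.get((pid, month))
        if seg is not None:  # skip admissions outside observed person-time
            admissions_in_segment[seg] += 1

    return {
        seg: 1000 * admissions_in_segment[seg] / (months / 12)
        for seg, months in months_in_segment.items()
    }


rates = admissions_per_1000_person_years(person_months, admissions)
for segment, rate in rates.items():
    print(f"{segment}: {rate:.0f} emergency admissions per 1,000 person-years")
```

Using person-time as the denominator means that someone who changes segment part-way through the period contributes follow-up to each segment only for the months they spent in it.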
Why is Population Segmentation Important?
Population segmentation can be used as part of the broader PHM strategy to improve care and outcomes – see examples below:
View system transformation programmes through a person-centred (and segment specific) lens – baselining, tracking and monitoring changes following interventions or service redesign for specific cohorts.
Target specific cohorts/populations, with different needs, in different ways depending on the desired outcomes.
Improve coordination of care by focusing on different population segments/subsegments – at local, regional and national level.
Improve resource utilisation efficiency (i.e. provide better care using the same overall resources for specific populations).
Stratify populations such as those who are currently healthy / generally well to identify those most at risk of developing long term conditions.
Understand drivers of demand more accurately, and forecast and plan for changes in demand.
Move away from ‘all things to all people at all times’ approaches to primary care delivery, to more nuanced, targeted, focused care around genuine need, prevention and sustainable care.
Answer complex analytical and research questions about the link between care or interventions provided and resulting outcomes.
PHM Use Case Examples
With its unique features, the National Segmentation Dataset can be used either as a standalone dataset or linked to activity and cost data. Specific use cases cover a wide range of core PHM functions under ‘secondary uses’ of data.
Understanding population need
Identification and prioritisation of opportunities
Complex Multimorbidity Identification
Analyse the dataset to identify individuals with multiple long term conditions (LTCs) who may not be receiving coordinated care. For example, identify the cohort with diabetes, severe mental illness (SMI) and organ failure to understand the prevalence and distribution of complex and high intensity health needs.
Health Inequalities Analysis
Utilise the health state, demographic and geographic data, including deprivation scores and ethnicity, to identify areas where certain conditions or health states have higher prevalence in deprived communities, or variations in condition management between ethnic groups.
End of Life Care Planning
Use the Segmentation Dataset’s ability to identify cohorts of people in their last 5 years of life to assess end-of-life care planning and time spent at home, and to identify gaps in palliative care referrals, alongside cost.
Prevention Opportunities
Analyse progression patterns to identify groups at risk of developing additional LTCs or those who might benefit from preventive interventions, including those currently in the Healthy / Generally Well segment.
Benchmarking activity, costs and outcomes
Benchmark nationally consistent opportunities between areas
Service Planning and Resource Allocation
Compare capitated expenditure across different segments to identify areas where service provision doesn’t match population need. This can help in benchmarking resource allocation efficiency between different regions or ICBs.
High Unplanned Care User Analysis
Benchmark emergency admission rates and A&E attendance for people with specific combinations of long term conditions across different regions to identify areas with successful community support programmes.
Early Intervention for Progressive Conditions
Compare the rates of progression from early to severe stages of conditions (e.g. organ failure, frailty) and subsequent mortality, across different areas to identify successful early intervention strategies.
Care Coordination Gaps
Benchmark the proportion of high-risk individuals receiving integrated care across different regions or ICBs to identify best practices and areas for improvement, for example by benchmarking Days Disrupted by Care in people with MLTCs.
Resource utilisation scenarios
Model costs and ROI of interventions
Multimorbidity Management
Model the potential cost savings and improved outcomes of implementing coordinated care programmes for individuals with multiple LTCs, based on successful interventions in comparable areas.
Preventive Intervention ROI
Calculate the return on investment for implementing preventive interventions for at-risk groups identified in the Healthy / Generally Well segment, comparing potential future healthcare costs with intervention costs.
End of Life Care Optimisation
Model the cost-effectiveness of expanding palliative care services based on the identified cohort in their last 5 years of life, considering both quality of life improvements and potential reductions in acute care utilisation.
Health Inequalities Reduction
Estimate the potential impact and cost of targeted interventions to reduce health inequalities, focusing on areas with higher prevalence of certain conditions in deprived communities.
Evaluation and tracking impact
Evaluate interventions implemented retrospectively and prospectively with nationally consistent data
Segment Progression Tracking
Monitor changes in population distribution across segments over time to evaluate the impact of Population Health Management strategies. For example, track time spent in the Healthy/Generally Well segment as a proportion of overall life span (‘HEALTHSPAN®’).
Outcomes Measurement
Track condition and cohort-specific outcomes as part of care planning, using the Segmentation Dataset to establish baselines and measure improvements over time.
Service Redesign Impact Assessment
Evaluate the impact of service redesign by monitoring changes in segment-specific capitated expenditure, health activity and outcomes before and after implementation, using comparable or matched cohorts.
Health Inequality Intervention Effectiveness
Assess the effectiveness of interventions aimed at reducing health inequalities by tracking changes in condition prevalence, management and outcomes across different demographic and geographic groups over time.
Segmentation Dataset Comparison Against QOF Data
Absolute difference
This chart shows the difference between condition prevalence figures from the Segmentation Dataset and QOF data.

Data
- Segmentation Dataset v4.2 as of 31.03.2024
- QOF data as of 31.03.2024 (latest available), except for Depression which is 31.03.2023 as prevalence is no longer reported by QOF
- All ages included, unless otherwise specified
- National data
Method
The chart is expressed as an ‘absolute’ difference i.e. calculated by subtracting the Segmentation Dataset prevalence figure from the QOF prevalence figure.
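A minimal sketch of this calculation is shown here, assuming both prevalence figures are expressed as percentages of the population. The function name and the input figures are invented for illustration and are not values from the chart.

```python
def absolute_difference(qof_prevalence, segmentation_prevalence):
    """QOF prevalence minus Segmentation Dataset prevalence, in percentage points."""
    return qof_prevalence - segmentation_prevalence


# Made-up prevalence figures (% of population), for illustration only.
diff = absolute_difference(qof_prevalence=7.5, segmentation_prevalence=7.0)
print(f"{diff:+.1f} percentage points")
```

Under this sign convention, a positive value means QOF reports a higher prevalence than the Segmentation Dataset, and a negative value means it reports a lower one.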
Relative difference
This chart shows the difference between condition prevalence figures from the Segmentation Dataset and QOF data.

Data
- Segmentation Dataset v4.2 as of 31.03.2024
- QOF data as of 31.03.2024 (latest available), except for Depression which is 31.03.2023 as prevalence is no longer reported by QOF
- All ages included, unless otherwise specified
- National data
Method
The chart is expressed as a ‘relative’ difference i.e. calculated by subtracting the Segmentation Dataset prevalence figure from the QOF prevalence figure, as a proportion of the actual QOF prevalence for that condition. This allows the impact on conditions with a smaller than average or larger than average prevalence to be seen.
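The sketch below illustrates this calculation and why scaling by the QOF prevalence matters: the same gap in percentage points is a much larger relative difference for a rare condition than for a common one. The function name and all figures are invented for illustration and are not values from the chart.

```python
def relative_difference(qof_prevalence, segmentation_prevalence):
    """(QOF - Segmentation Dataset) as a proportion of the QOF prevalence."""
    if qof_prevalence == 0:
        raise ValueError("QOF prevalence must be non-zero to scale the difference")
    return (qof_prevalence - segmentation_prevalence) / qof_prevalence


# Made-up figures (% of population): both conditions differ by 0.5 points.
common_condition = relative_difference(qof_prevalence=10.0, segmentation_prevalence=9.5)
rare_condition = relative_difference(qof_prevalence=1.0, segmentation_prevalence=0.5)

print(f"common condition: {common_condition:.0%}")
print(f"rare condition: {rare_condition:.0%}")
```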
Segmentation Dataset Comparison Against Linked Data including Primary Care
Absolute difference
This chart shows the difference between condition prevalence figures from the NHSE Segmentation Dataset and a local linked Segmentation Dataset that includes primary care data for a single ICB.

Data
- NHSE Segmentation Dataset as of 31.03.2024 for single matched ICB population
- Local/ICB linked Segmentation Dataset as of 31.12.2023
- Data is for all people aged 18 years and over
- Data for a single anonymised ICB
Method
The chart is expressed as an ‘absolute’ difference i.e. calculated by subtracting the NHSE Segmentation Dataset prevalence figure from the Local Linked Segmentation Dataset prevalence figure.
Relative difference
This chart shows the difference between condition prevalence figures from the NHSE Segmentation Dataset and a local linked Segmentation Dataset that includes primary care data for a single ICB.

Data
- NHSE Segmentation Dataset as of 31.03.2024 for single matched ICB population
- Local/ICB linked Segmentation Dataset as of 31.12.2023
- Data is for all people aged 18 years and over
- Data for a single anonymised ICB
Method
The chart is expressed as a ‘relative’ difference i.e. calculated by subtracting the NHSE Segmentation Dataset prevalence figure from the local linked version prevalence figure, as a proportion of the actual prevalence from the local linked version for that condition. This allows the impact on conditions with a smaller than average or larger than average prevalence to be seen.