NHSE Segmentation Dataset Reference Guide

A national data-driven approach to population segmentation has been developed to support Population Health Management (PHM) outlined in the NHS Long Term Plan. This Reference Guide provides the background, definitions and Segmentation Dataset output delivered as part of this initiative.

Further details on the subsegment (condition) definitions are available separately.

Background

The segmentation approach used is an adaptation of the internationally recognised ‘Bridges to Health’ (B2H) segmentation model: a life course model that groups people into 8 segments, ranging from the healthy / generally well population to populations in the final phases of life.

Data from the National Commissioning Data Repository (NCDR) for the entire population have been transformed into a person-centred segmentation dataset (or data model) that can be used by data analysts to derive segment-specific insights. This has been developed within NHS England’s data environment.

The dataset was originally developed in 2019 by the Data, Analysis and Intelligence Service (DAIS) in NHSE, Public Health England (PHE), Outcomes Based Healthcare (OBH) and Arden & GEM CSU.

Further information on the National Bridges to Health Segmentation Dataset can be found in the Population and Person Insights (PaPI) Workspace on FutureNHS.

Why is population segmentation important?

Population segmentation can be used as part of the broader PHM strategy to improve care and outcomes in many ways…

  • View system transformation programmes through a person-centred (segment specific) lens – baselining, tracking and monitoring changes following interventions or service redesign for specific cohorts
  • Target specific cohorts/populations, with different needs, in different ways depending on the care they need
  • Improve coordination of care by focussing on different population segments/subsegments – at local, regional and national level
  • Improve resource utilisation efficiency (i.e. provide more care using the same overall resources for specific populations)
  • Understand drivers of demand more accurately, and forecast and plan for changes in demand
  • Move away from ‘all things to all people at all times’ approaches to primary care delivery, to more nuanced, targeted, focused care around genuine need, prevention and sustainable care
  • Answer complex analytical and research questions about the link between care provided and resulting outcomes (e.g. fully configurable, ‘live’ condition registries were recently tested during the Covid-19 pandemic to rapidly configure, identify, segment and stratify vulnerable populations)

Bridges to Health population segmentation

What?

Segmentation categorises populations according to their health and care needs, priorities, and circumstances. The ‘Bridges to Health’ (B2H) model is a fundamentally person-focused approach, with the principal goal of ‘pursuing the health of each population segment’.

Why?

To optimise health outcomes, patient experience, efficiency, and care costs, care delivery systems should respond to the needs of different population segments in different ways.

How?

Each segment and sub-segment is defined clinically and translated to a data definition (clinical codes and logic) which is used to create condition registers for each subsegment.
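As a minimal sketch of what a "data definition" might look like, the example below models a subsegment as a set of clinical codes plus one extra logic rule (a minimum age), and builds a condition register from it. The class, the placeholder codes and the age rule are illustrative assumptions, not the actual OBH definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsegmentDefinition:
    """Hypothetical data definition: clinical codes plus simple extra logic."""
    name: str
    codes: frozenset      # clinical codes that qualify a person
    min_age: int = 0      # extra rule, e.g. an age threshold

    def qualifies(self, person_codes: set, age: int) -> bool:
        # Qualifies if the person has at least one listed code
        # and also meets the age rule.
        return age >= self.min_age and not self.codes.isdisjoint(person_codes)


def build_register(definition, people):
    """Return the condition register: IDs of everyone who qualifies.

    `people` maps a pseudonymised person ID to (set_of_codes, age).
    """
    return {
        pid for pid, (codes, age) in people.items()
        if definition.qualifies(codes, age)
    }


# Placeholder codes only; real definitions use curated clinical code lists.
frailty = SubsegmentDefinition(
    "Frailty (illustrative)", frozenset({"FRAIL_MOD", "FRAIL_SEV"}), min_age=65
)

people = {
    "p1": ({"FRAIL_SEV"}, 82),
    "p2": ({"FRAIL_MOD"}, 58),  # has a code but is under 65
    "p3": ({"DIAB"}, 70),       # no frailty code
}
print(sorted(build_register(frailty, people)))  # ['p1']
```

Keeping the codes and the extra logic together in one definition object means each subsegment's register can be regenerated whenever the code list is revised.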

A diagram depicting OBH’s approach to segmentation, based on the ‘Bridges to Health’ model.

Source: Outcomes Based Healthcare© 2017. OBH’s approach to segmentation is based on the ‘Bridges to Health’ model (Lynn et al. 2007).

Segmentation - A life course approach

A diagram depicting OBH’s approach to segmentation, based on the ‘Bridges to Health’ model, surrounded by information on how the model supports a life course approach to healthcare.

Segment definitions

1. Healthy / Generally well
People who are ‘healthy or generally well’, though they may have acute but self-limiting problems. The principal care processes involved relate to primary prevention, with the aim of slowing the development of a first long term condition or disability.

2. Maternal Health
Women are included in this segment during prenatal, delivery and perinatal care.

3. Acute
People with an acute illness who are likely to return to their former level of health. An acute illness is one that develops quickly, is often severe, and lasts a relatively short period of time (often less than 1 month). Acute episodes are often outcomes in their own right for people in other segments.

4. Long Term Conditions (LTCs)
People with one or more LTCs: chronic illnesses that are rarely resolved, but which can be treated to maintain stability and often to slow progression.

5. Disability
People with one or more serious disabilities, including both physical and learning disabilities.

6. Incurable cancer
People with cancer whose trajectory is a reasonably predictable decline in physical health over a period of weeks, months or, in some cases, years. Almost all people in this segment are expected to die within 12 months, and therefore often receive care from palliative care services.

7. Organ Failure
People with one or more organ system failures, or who suffer frequent serious exacerbations of chronic illness. This includes people with neurological conditions or organ failure (heart, lung, liver, kidney).

8. Frailty and Dementia
People aged 65 and over with moderate or severe frailty, or people with dementia, who are typically on a gradual course of decline.

Segment configuration

The visual below shows how each Segment in the Bridges to Health segmentation model is defined by a set of Subsegments/Conditions within the Segmentation Dataset, and how the Healthy / Generally Well segment is defined as people who do not meet the criteria of any other core Segment.

Colour-coded table defining each segment by the subsegment(s) it contains, whilst also depicting the relationship between the segments.
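The residual rule for the Healthy / Generally well segment can be sketched as follows: map a person's subsegments to core segments, and fall back to Healthy / Generally well only when none apply. The subsegment names and their segment assignments below are hypothetical placeholders, not the dataset's actual configuration.

```python
# Hypothetical subsegment -> core segment mapping (names are illustrative).
SUBSEGMENT_TO_SEGMENT = {
    "Pregnancy": "Maternal Health",
    "Diabetes": "Long Term Conditions",
    "Asthma": "Long Term Conditions",
    "Learning disability": "Disability",
    "Heart failure": "Organ Failure",
    "Dementia": "Frailty and Dementia",
}

HEALTHY = "Healthy / Generally well"


def segments_for(subsegments):
    """Map a person's subsegments to core segments.

    Anyone who meets no core segment's criteria falls into the residual
    Healthy / Generally well segment.
    """
    segments = {SUBSEGMENT_TO_SEGMENT[s] for s in subsegments}
    return segments or {HEALTHY}


print(segments_for({"Diabetes", "Asthma"}))  # {'Long Term Conditions'}
print(segments_for(set()))                   # {'Healthy / Generally well'}
```

Because Healthy / Generally well is derived rather than coded directly, it never needs its own code list; it changes automatically as the other definitions change.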

Segmentation definition development

Each subsegment has been defined using clinical codes and logic that are evidence-based, and derived from analysis of international and national best practice, guidelines and standards. OBH have spent over 10 years building and maintaining this database and codebase.

  • 130+ national and international standards and best practice guidelines reviewed
  • 1773 clinical codes included in the default segment and subsegment definitions
  • 3184 clinical codes identified and reviewed
  • 59 subsegments and conditions created

The data transformation process

National Commissioning Data Repository (NCDR) Source Data

Multiple datasets from the National Commissioning Data Repository (NCDR) are fed into the Segmentation Engine.

This is a vast amount of ‘uncleaned’ data from multiple setting-specific health and care providers, and it is typically event- and/or diagnosis-based.

Segmentation Engine

The Engine is an analytical data pipeline which ‘cleans’ the data, links the data together, and transforms it so that the output Segmentation Dataset presents data on a person-level basis, spanning different care settings.

Mapping tables translate the data into segments/subsegments, using clinical codes and logic based on OBH clinical evidence and national and international consensus.
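The clean, link and map steps can be illustrated with a toy pipeline. Everything here is an assumption for illustration: the record layout, the two sources standing in for setting-specific datasets, and the mapping table (ICD-10-style codes used purely as examples). The sketch records, per person, the earliest month each subsegment was evidenced in any setting.

```python
from collections import defaultdict

# Hypothetical event records from two care settings; each has a
# pseudonymised person ID, an activity month and a diagnosis code.
inpatient_events = [
    {"person_id": "p1", "month": "2021-04", "code": " e11 "},
    {"person_id": None, "month": "2021-05", "code": "I50"},  # unlinkable
]
emergency_events = [
    {"person_id": "p1", "month": "2020-11", "code": "E11"},
    {"person_id": "p2", "month": "2022-01", "code": "I50"},
]

# Hypothetical mapping table: clinical code -> subsegment.
CODE_TO_SUBSEGMENT = {"E11": "Diabetes", "I50": "Heart failure"}


def clean(events):
    """Drop records without a person ID and normalise code formatting."""
    for e in events:
        if e["person_id"]:
            yield {**e, "code": e["code"].strip().upper()}


def person_level(*sources):
    """Link all sources on person ID and keep, per person, the first
    month each subsegment was evidenced in any care setting."""
    onset = defaultdict(dict)
    for source in sources:
        for e in clean(source):
            sub = CODE_TO_SUBSEGMENT.get(e["code"])
            if sub is None:
                continue  # code not used by any subsegment definition
            current = onset[e["person_id"]].get(sub)
            if current is None or e["month"] < current:
                onset[e["person_id"]][sub] = e["month"]
    return dict(onset)


print(person_level(inpatient_events, emergency_events))
# {'p1': {'Diabetes': '2020-11'}, 'p2': {'Heart failure': '2022-01'}}
```

Note how p1's diabetes onset comes from the emergency record even though the inpatient record was read first: linking across settings is what turns event-based data into a person-level view.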

Segmentation Dataset

The data transformation process run by the Engine produces a single dataset structured as a Data Model, designed to be as compact as possible and quick and easy to query. The Segmentation Dataset is a set of tables that establish, for each person, which segments and subsegments they are in in any given month, as well as other demographic and geographical information relevant to the person.

Data sources used

The Segmentation Dataset is derived from a number of national operational and care planning pseudonymised patient-level data sources available in the National Commissioning Data Repository in NHS England. The following table summarises the data sources used and the number of years of data that has been longitudinally accrued.

Source datasets including time periods used to generate the National Bridges to Health Segmentation Dataset.

How the dataset works

The resulting Segmentation Dataset for local populations provides the backbone to any Population Health Management work. The Segmentation Dataset has been designed as a dimensional data model – a standard design approach for a database structure that is optimised for data analytics. This type of model is easy to understand and intuitive for analysts to use.
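To illustrate the dimensional idea, the sketch below uses a toy star schema: a person dimension holding descriptive attributes, and a fact table with one row per person, month and subsegment. The table and column names are hypothetical and do not reflect the dataset's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimPerson:
    # Dimension row: descriptive attributes of a person.
    person_id: str
    age_band: str
    region: str


@dataclass(frozen=True)
class FactPersonMonth:
    # Fact row: one person, one month, one subsegment.
    person_id: str
    month: str
    subsegment: str


dim_person = {
    p.person_id: p
    for p in [
        DimPerson("p1", "65-74", "North West"),
        DimPerson("p2", "45-54", "London"),
        DimPerson("p3", "75+", "North West"),
    ]
}

facts = [
    FactPersonMonth("p1", "2023-03", "Diabetes"),
    FactPersonMonth("p3", "2023-03", "Diabetes"),
    FactPersonMonth("p3", "2023-03", "Heart failure"),
    FactPersonMonth("p2", "2023-02", "Diabetes"),
]


def count_by(month, subsegment, attribute):
    """Join facts to the person dimension and count people per attribute value."""
    counts = {}
    for f in facts:
        if f.month == month and f.subsegment == subsegment:
            key = getattr(dim_person[f.person_id], attribute)
            counts[key] = counts.get(key, 0) + 1
    return counts


print(count_by("2023-03", "Diabetes", "region"))  # {'North West': 2}
```

Swapping `"region"` for `"age_band"` gives a different breakdown with no change to the query logic, which is the practical appeal of this kind of model for analysts.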

A dynamic, longitudinal model to analyse trends

  • For each person registered to a GP practice in a specific geography (national, regional or local), the model shows which segment and subsegments they are in, every month for a defined retrospective period, and prospectively thereafter.
  • This includes when people ‘enter’ or ‘leave’ a subsegment, including when they die.
  • This allows analysis of population movement between segments on a monthly basis, rather than simply a ‘snapshot’.
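Month-on-month movement can be sketched by comparing each person's segment in two consecutive months and counting the moves. This assumes, for simplicity, one segment per person per month and a hypothetical "Died" state for people who leave through death.

```python
from collections import Counter

# Hypothetical monthly states: person ID -> single segment in that month.
feb = {"p1": "Healthy", "p2": "Long Term Conditions", "p3": "Organ Failure"}
mar = {"p1": "Long Term Conditions", "p2": "Long Term Conditions", "p3": "Died"}


def transitions(before, after):
    """Count (from_segment, to_segment) moves for people present in both months."""
    return Counter(
        (before[p], after[p])
        for p in before.keys() & after.keys()
        if before[p] != after[p]
    )


print(sorted(transitions(feb, mar).items()))
# [(('Healthy', 'Long Term Conditions'), 1), (('Organ Failure', 'Died'), 1)]
```

Repeating this for every pair of consecutive months produces the flow counts needed for trend analysis, rather than a single snapshot.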

Condition registers at any required historical ‘snapshot’

  • For any month over the last 7 years, it is easy to extract an accurate list of who is ‘in’ each segment or subsegment – including people who are ‘currently healthy / generally well’.
  • This is similar to having a ‘live’ condition register for each condition (called ‘subsegments’ in the Bridges to Health segmentation model), which covers the entire population.
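A snapshot register can be sketched from person-level spells (entry and exit months per subsegment). The spell layout and data are hypothetical; "currently healthy" here means registered people with no active subsegment in that month.

```python
# Hypothetical spells: (person_id, subsegment, start_month, end_month).
# Months are 'YYYY-MM' strings, so plain string comparison orders them.
# end_month is None while the person is still in the subsegment.
SPELLS = [
    ("p1", "Diabetes", "2018-06", None),
    ("p2", "Diabetes", "2017-01", "2020-09"),  # left in Sept 2020 (e.g. died)
    ("p3", "Asthma", "2021-02", None),
]


def _active(start, end, month):
    """True if a spell covers `month`."""
    return start <= month and (end is None or month <= end)


def register(subsegment, month, spells=SPELLS):
    """Condition register: people in `subsegment` during `month`."""
    return {pid for pid, sub, start, end in spells
            if sub == subsegment and _active(start, end, month)}


def currently_healthy(registered, month, spells=SPELLS):
    """Registered people who are in no subsegment during `month`."""
    in_any = {pid for pid, _sub, start, end in spells if _active(start, end, month)}
    return set(registered) - in_any


print(sorted(register("Diabetes", "2019-03")))                   # ['p1', 'p2']
print(sorted(register("Diabetes", "2023-03")))                   # ['p1']
print(sorted(currently_healthy({"p1", "p3", "p4"}, "2023-03")))  # ['p4']
```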

Assigning each person to a single segment, or to multiple segments

  • The data model allows the user to assign people to a single, ‘highest acuity’ segment, or to multiple segments. These ‘states’ are typically used for different analysis purposes (e.g. outcomes measurement, activity analyses, expenditure, contracting), as appropriate.
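The two views can be sketched side by side. The guide does not state the acuity ordering used, so this sketch simply assumes the segment numbers 1 to 8 from the segment definitions, with a higher number treated as higher acuity; that ordering is a placeholder, not the dataset's rule.

```python
# ASSUMED acuity ranking: reuses segment numbers 1-8, higher = more acute.
# The real dataset may use a different ordering.
ACUITY = {
    "Healthy / Generally well": 1,
    "Maternal Health": 2,
    "Acute": 3,
    "Long Term Conditions": 4,
    "Disability": 5,
    "Incurable cancer": 6,
    "Organ Failure": 7,
    "Frailty and Dementia": 8,
}


def multiple_segments(segments):
    """Multiple-segment view: keep every segment, ordered by assumed acuity."""
    return sorted(segments, key=ACUITY.__getitem__)


def highest_acuity(segments):
    """Single-segment view: the one segment with the highest assumed acuity."""
    return max(segments, key=ACUITY.__getitem__)


person = {"Long Term Conditions", "Organ Failure"}
print(multiple_segments(person))  # ['Long Term Conditions', 'Organ Failure']
print(highest_acuity(person))     # Organ Failure
```

The single-segment view counts each person once, which suits population totals such as expenditure; the multiple-segment view keeps every state, which suits condition-specific activity or outcome analyses.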

Features of the dataset

In any month, between April 2016 and March 2023, the following features are available in the Dataset for each person.

Image showing the key features of the National Bridges to Health Segmentation Dataset.

Analytics opportunities

The Segmentation Dataset itself can be used to directly generate a number of insights, including:

  • Segment and subsegment prevalence counts and rates, on any given date, for any given geography
  • Segment and subsegment mortality counts and rates, over any given period, for any given geography
  • Stratification within each segment or subsegment, or across segments and subsegments:
    • By socio-demographic variables (e.g. age and gender distribution, and deprivation profiles)
    • By clinical variables (e.g. other co-occurring conditions or segments, multimorbidity, risk factors)
  • Progression flows from one segment to another
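Prevalence and mortality rates are simple ratios once counts are extracted. The sketch below computes crude rates per 1,000 from hypothetical figures; the mortality denominator (people in the subsegment on the snapshot date) is a simplifying assumption, and age standardisation is not shown.

```python
def rate_per_1000(count, population):
    """Crude rate per 1,000 people; returns 0.0 for an empty population."""
    return 1000 * count / population if population else 0.0


# Hypothetical figures for one geography on one snapshot date.
registered_population = 250_000
in_subsegment = 17_500   # people in the subsegment on that date
deaths_in_year = 350     # of those people, deaths over the following year

prevalence = rate_per_1000(in_subsegment, registered_population)
mortality = rate_per_1000(deaths_in_year, in_subsegment)
print(f"prevalence {prevalence:.1f} per 1,000; mortality {mortality:.1f} per 1,000")
# prevalence 70.0 per 1,000; mortality 20.0 per 1,000
```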

Further value is derived when the Segmentation Dataset is linked to other datasets, for example to understand activity, outcomes or cost by segment or subsegment. Analysts can join the Dataset at record level to whichever datasets are most relevant to their purpose, to generate insights for each use case. For example:

  • Linking to SUS APC, ECDS and outpatient activity data to understand total and per person activity levels for people in each segment, by type of activity
  • Linking to SUS APC, ECDS and outpatient activity data and/or SLAM data to understand total and per person tariff for people in each segment, by type of activity
  • Linking to primary care data (where available) to understand total and per person GP practice activity for people in each segment, by appointment type and healthcare professional
  • Linking to SUS datasets to measure outcomes specific to people in particular segments or subsegments (e.g. complications in people with diabetes, falls in people with frailty and/or dementia)
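A record-level link of this kind can be sketched as an inner join on a shared pseudonymised person ID, followed by total and per person activity for each segment and activity type. The inputs and activity labels are hypothetical stand-ins for the datasets named above.

```python
from collections import Counter

# Hypothetical segmentation output: pseudonymised person ID -> segment.
segment_of = {
    "p1": "Long Term Conditions",
    "p2": "Healthy / Generally well",
    "p3": "Long Term Conditions",
}

# Hypothetical activity records from another dataset: (person_id, activity_type).
activity = [
    ("p1", "ED attendance"),
    ("p1", "ED attendance"),
    ("p3", "Outpatient"),
    ("p9", "ED attendance"),  # no match in the segmentation output
]


def activity_by_segment(segment_of, activity):
    """Record-level inner join on person ID, then total and per person
    activity for each (segment, activity type) pair."""
    people_in_segment = Counter(segment_of.values())
    totals = Counter(
        (segment_of[pid], kind) for pid, kind in activity if pid in segment_of
    )
    return {
        (seg, kind): (n, n / people_in_segment[seg])
        for (seg, kind), n in totals.items()
    }


for key, (total, per_person) in activity_by_segment(segment_of, activity).items():
    print(key, total, per_person)
```

Per person figures divide by everyone in the segment, not just those with activity, so segments with no linked activity (here Healthy / Generally well) simply do not appear and should be read as zero.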

Comparison against QOF

These charts show the difference between condition prevalence figures from the Segmentation Dataset and QOF data.

Absolute difference

Comparison with QOF - absolute difference.

Data & method notes:

  • Segmentation Dataset v4.0 as of 31.03.23 (derived from NCDR data)
  • QOF data as of 31.03.23
  • All ages included, unless otherwise specified
  • National data
  • The chart is expressed as an ‘absolute’ difference i.e. calculated by subtracting the Segmentation Dataset prevalence figure from the QOF prevalence figure

Relative difference

Comparison with QOF - relative difference.

Data & method notes:

  • Segmentation Dataset v4.0 as of 31.03.23 (derived from NCDR data)
  • QOF data as of 31.03.23
  • All ages included, unless otherwise specified
  • National data
  • The chart is expressed as a ‘relative’ difference i.e. calculated by subtracting the Segmentation Dataset prevalence figure from the QOF prevalence figure, as a proportion of the actual QOF prevalence for that condition. This allows the impact on conditions with a smaller than average or larger than average prevalence to be seen.
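The two measures described in these notes can be written as small functions, with the reference figure (QOF here) minus the Segmentation Dataset figure as the sign convention. The prevalence values are hypothetical and chosen to show why the relative measure is useful: the same absolute gap matters far more for a less prevalent condition.

```python
def absolute_difference(reference, segmentation):
    """Reference prevalence minus Segmentation Dataset prevalence (same units)."""
    return reference - segmentation


def relative_difference(reference, segmentation):
    """Absolute difference as a proportion of the reference prevalence."""
    if reference == 0:
        raise ValueError("reference prevalence must be non-zero")
    return (reference - segmentation) / reference


# Hypothetical prevalence figures, in percent: same 0.7-point absolute gap.
for name, qof, seg in [("Common condition", 7.0, 6.3), ("Rarer condition", 1.0, 0.3)]:
    print(name, round(absolute_difference(qof, seg), 2),
          round(relative_difference(qof, seg), 2))
# Common condition 0.7 0.1
# Rarer condition 0.7 0.7
```

The same functions apply to the primary care comparison, with the local linked Segmentation Dataset as the reference and the NHSE Segmentation Dataset as the comparator.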

Comparison against linked data including Primary Care

These charts show the difference between condition prevalence figures from the NHSE Segmentation Dataset and a local linked Segmentation Dataset that includes primary care data for a single ICB.

Absolute difference

Comparison with Primary care - absolute difference.

Data & method notes:

  • NHSE Segmentation Dataset as of 31.03.23
  • Local/ICB linked Segmentation Dataset as of 31.03.23
  • Only the local linked Segmentation Dataset includes primary care data
  • Data is for all people aged 18 years and over
  • Data for a single anonymised ICB
  • The chart is expressed as an ‘absolute’ difference i.e. calculated by subtracting the NHSE Segmentation Dataset prevalence figure from the Local Linked Segmentation Dataset prevalence figure

Relative difference

Comparison with Local Linked Segmentation Dataset - relative difference.

Data & method notes:

  • NHSE Segmentation Dataset as of 31.03.23
  • Local/ICB linked Segmentation Dataset as of 31.03.23
  • Only the local linked Segmentation Dataset includes primary care data
  • Data is for all people aged 18 years and over
  • Data for a single anonymised ICB
  • The chart is expressed as a ‘relative’ difference i.e. calculated by subtracting the NHSE Segmentation Dataset prevalence figure from the local linked version prevalence figure, as a proportion of the actual prevalence from the local linked version for that condition. This allows the impact on conditions with a smaller than average or larger than average prevalence to be seen.