Exploratory Comparative Analysis of Date of Diagnosis

This exploratory analysis assesses the likely accuracy of the date of diagnosis, by condition, in the National Bridges to Health Segmentation Dataset. It compares the date of diagnosis recorded in GP data with the date recorded in hospital data. The findings can help inform and interpret the results of incidence analyses conducted on the National Segmentation Dataset.

Summary results from this analysis

This analysis compares the dates of diagnosis recorded for a range of conditions, at person level, in GP data linked to hospital data. For conditions which often present more acutely (e.g. CHD, stroke, heart failure), the date of diagnosis is either recorded first in hospital data or is very close to the date recorded in GP data. For conditions which commonly present initially in primary care (e.g. CKD, depression, hypertension), a diagnosis in GP data can often precede a diagnosis recorded in hospital data by 2 years or more.

This analysis reflects the different coding and diagnosis patterns observed in GP data, hospital data, or both. The National Segmentation Dataset draws from a much wider range of source datasets, which also include Mental Health/IAPT, Learning Disabilities, Specialised Commissioning, Maternity, and some data derived from GP data. Therefore the dates of diagnosis recorded in the National Segmentation Dataset for these conditions are likely to have significantly more comprehensive coverage, and to be closer to the first ever recorded diagnosis date, than those in hospital data alone.

At A Glance

  • Accuracy of incidence analysis depends on the accuracy of the date the condition is recorded in the data used.
  • By considering a local ICB Segmentation Dataset which is derived from primary care and secondary care data (SUS) linked at person-level, it is possible to compare the time difference between recorded diagnosis of conditions in primary care and secondary care.
  • By identifying the conditions where the diagnosis is typically recorded first in secondary care data, or within 2 years of being recorded in primary care data, it is possible to determine where incidence analysis based on the National Segmentation Dataset is likely to be accurate.
  • The National Segmentation Dataset is derived from a longer time period of SUS data (14 years compared to 6 in the local ICB data), and a number of other datasets (see list of source datasets). Therefore this analysis likely underestimates the accuracy of the ‘date of diagnosis’ in the National Segmentation Dataset.

Comparison of first ever ‘recording’ of a condition between GP data and hospital data between March 2017 and April 2018 in a local ICB

  • Green: Proportion of people who have a recording of a condition in a hospital admission or outpatient appointment, either as a first ever diagnosis or within 1 month of a first diagnosis in primary care.
  • Amber: Proportion of people who have a first ever recording of a condition in primary care with a subsequent hospital admission or outpatient appointment recording after 1 month but within 2 years of a first recording in primary care. 
  • Red: Proportion of people who have a first ever recording of a condition in primary care, with no recording in a hospital admission or outpatient appointment of that condition within 2 years.

Data Source: An existing local ICB Segmentation Dataset, derived from linked data from GP Practices and hospital admissions and outpatient appointments (SUS).
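The green, amber and red bands defined above can be sketched in code as follows. This is an illustrative sketch only, not the method used to produce the chart: the function name `classify_band` is hypothetical, and the thresholds of 31 days for "1 month" and 730 days for "2 years" are assumptions, since the analysis does not state how these intervals were measured. It also assumes that a person with a hospital recording but no primary care recording counts as a first ever diagnosis in hospital data.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed thresholds: the source only says "1 month" and "2 years".
ONE_MONTH = timedelta(days=31)
TWO_YEARS = timedelta(days=730)


def classify_band(first_gp: Optional[date],
                  first_hospital: Optional[date]) -> Optional[str]:
    """Band one person's first recordings of one condition.

    first_gp       -- first ever recording in primary care, or None
    first_hospital -- first ever recording in a hospital admission or
                      outpatient appointment, or None
    """
    if first_gp is None and first_hospital is None:
        return None  # condition never recorded for this person
    if first_gp is None:
        return "green"  # recorded in hospital data only
    if first_hospital is None:
        return "red"  # recorded in primary care only
    gap = first_hospital - first_gp  # negative when hospital came first
    if gap <= ONE_MONTH:
        return "green"  # hospital first, or within 1 month of primary care
    if gap <= TWO_YEARS:
        return "amber"  # after 1 month but within 2 years
    return "red"  # no hospital recording within 2 years
```

The sketch does not handle people whose primary care recording is too recent for a full 2 years of hospital follow-up to have been observed; a real analysis would need to decide how to treat them.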

Incidence Reliability Score in National Segmentation Dataset

The column on the right combines the green and amber bars to produce an overall ‘incidence reliability score’ i.e. where a recording of a condition occurs in hospital data first or within 2 years of the condition being recorded in primary care. This indicates the likely accuracy of the ‘date of diagnosis’ in the National Bridges to Health Segmentation Dataset.
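As a minimal sketch of the calculation described above, the score for one condition is the share of people in the green or amber bands. The function name `incidence_reliability_score` and the input format (one band label per person) are assumptions made for illustration. The denominator is assumed to be everyone in the green, amber or red bands for that condition.

```python
from collections import Counter
from typing import Iterable


def incidence_reliability_score(bands: Iterable[str]) -> float:
    """Return the proportion of people banded green or amber.

    bands -- one label per person for a single condition; labels other
             than "green", "amber" and "red" are ignored.
    """
    counts = Counter(bands)
    reliable = counts["green"] + counts["amber"]
    total = reliable + counts["red"]
    if total == 0:
        raise ValueError("no people with a green, amber or red band")
    return reliable / total


# Example: 2 of 4 people fall in the green or amber bands.
example_score = incidence_reliability_score(["green", "amber", "red", "red"])
```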

Conditions near the top of the chart have the highest reliability scores, and are typically those which present first in secondary care, such as coronary heart disease and severe interstitial lung disease. Conditions towards the bottom of the chart, such as asthma and chronic kidney disease, have a less reliable date of diagnosis. In their early stages these conditions are often diagnosed and managed in primary care, in people with fewer long-term conditions who are less likely to have hospital admissions.

Diabetes

The National Diabetes Audit (NDA) data is one of the source datasets used to derive the National Segmentation Dataset. NDA data is extracted from primary care, and therefore the ‘date of diagnosis’ for diabetes in the National Segmentation Dataset is likely to be highly accurate.

Limitations

A number of conditions available in the National Segmentation Dataset are not included in this analysis, for reasons including definitional differences, complexity and low volumes. These include: Cystic Fibrosis, Chronic Pain, Osteoporosis, Sarcoidosis, Sickle Cell Disease, Frailty, Incurable Cancer subsegments, as well as a number of subsegments in the Organ Failure segment.

Last Updated: 30th October 2023