The secret to easy retrieval of hospital data for real-world data studies

Today, healthcare is a data-driven business, with regard to both patient care and process optimization. Because the core business of hospitals is patient care, however, it’s no wonder many of them don’t focus on using or even valorising these data. But contrary to common belief, a large amount of raw data is readily available and can be used in retrospective studies or to support decision making – to improve the quality of healthcare even more.
This blog shows where to find these data, the types of available data, and the difference between structured and unstructured data.

Valorisation queries based on structured data elements will result in a much more reliable set of patients than a query based on unstructured data elements.

Different types of data in healthcare settings

Depending on the use and the kind of data, every data set has its own properties for a valorisation query. The main distinctions are those between primary and secondary data use and structured vs. unstructured data.

Primary and secondary use of data

Primary use means that hospitals use data for the purpose they were gathered for: blood pressure, heart rate and so on are recorded to assess the patient’s condition with a view to treatment. When these data are used for a purpose they weren’t originally collected for, for example retrospective studies, it is called secondary use of data. As discussed in this article on the regulations around RWD valorisation projects, whether the use is primary or secondary affects the applicable data governance and regulations.

Structured and unstructured data

Structured data are well organised: they have been entered into specific fields, often with restrictions, such as a date field that only accepts the format dd/mm/yyyy. These restrictions protect data integrity, i.e., the accuracy and consistency of data in a database. They ensure telephone numbers don’t end up in blood pressure fields.
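
As an illustration of such field restrictions, here is a minimal Python sketch. The function names and the accepted blood pressure range (40 to 300 mmHg) are assumptions made for this example, not rules taken from any particular hospital system.

```python
from datetime import datetime


def is_valid_date(value: str) -> bool:
    """Return True only if the value parses as dd/mm/yyyy."""
    try:
        datetime.strptime(value, "%d/%m/%Y")
        return True
    except ValueError:
        return False


def is_valid_systolic(value: str, low: int = 40, high: int = 300) -> bool:
    """Return True only for whole numbers inside an assumed plausible range (mmHg)."""
    if not (value.isascii() and value.isdigit()):
        return False
    return low <= int(value) <= high


print(is_valid_date("31/12/2023"))      # correct format
print(is_valid_date("2023-12-31"))      # wrong format, rejected
print(is_valid_systolic("120"))         # plausible blood pressure
print(is_valid_systolic("0470123456"))  # a telephone number, rejected
```

A check like this at data entry is what keeps a telephone number out of a blood pressure field, and it is why a later query on that field can be trusted.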

Unstructured data are also kept in a database, but usually in text fields that hold progress notes in narrative form. Queries on these fields are possible, but their results are affected by misspellings, typographical errors, and the different synonyms and abbreviations used by different authors.
For example, running a query on a text field to find diabetic patients will return a list of presumed diabetic patients, but you can never be sure, as patients ‘with diabetic symptoms’ might also be listed.

To eliminate this confusion, classification systems such as ICD-10-CM and SNOMED CT were created.
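
The difference between the two kinds of query can be sketched in a few lines of Python. The four patient records below are invented for illustration, and the helper names find_by_text and find_by_code are hypothetical. The coded query relies on the ICD-10-CM categories for diabetes mellitus (E08 to E13; E12 is not used in ICD-10-CM).

```python
# Invented example records: a free-text note plus coded diagnoses per patient.
patients = [
    {"id": 1, "note": "Known type 2 diabetes, on metformin.", "codes": ["E11.9"]},
    {"id": 2, "note": "Pt with diabetic symptoms, diabetes ruled out.", "codes": ["R73.09"]},
    {"id": 3, "note": "DM2, poorly controlled.", "codes": ["E11.65"]},
    {"id": 4, "note": "Diabtes mellitus type 1.", "codes": ["E10.9"]},
]

# ICD-10-CM three-character categories for diabetes mellitus.
DIABETES_CATEGORIES = {"E08", "E09", "E10", "E11", "E13"}


def find_by_text(records, keyword):
    """Unstructured query: case-insensitive keyword search in the note."""
    return [r["id"] for r in records if keyword.lower() in r["note"].lower()]


def find_by_code(records, categories):
    """Structured query: match on the three-character category of each code."""
    return [
        r["id"]
        for r in records
        if any(code[:3] in categories for code in r["codes"])
    ]


text_hits = find_by_text(patients, "diabet")
coded_hits = find_by_code(patients, DIABETES_CATEGORIES)

print("Text search:", text_hits)
print("Coded query:", coded_hits)
```

On this toy data the text search picks up the patient whose diabetes was ruled out and misses both the abbreviation (DM2) and the misspelling, while the coded query returns exactly the three patients with a diabetes diagnosis.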

Current situation at Belgian hospitals

Today, a massive amount of unstructured data is stored in hospitals, such as:

Discharge letters for general practitioners
Progress notes by physicians in the patient’s file
Test reports

But there’s also a lot of structured data present in:

Hospital management systems, such as ADT (admission, discharge and transfer of patients)
Management systems for invoicing and payments (Tarfac)
Pharmacy systems, covering the medication management per patient
Clinical laboratories, managing the lab results from patients
Data-collecting devices in intensive care units, emergency departments and operating theatres

More importantly, the data from progress notes made during a patient’s hospitalisation are captured and coded every six months. The resulting high-quality structured dataset – still relatively unknown – is called the Minimal Hospital Dataset (MHD – NL: MZG; FR: RHM) and contains codes for every relevant diagnosis. This Belgian coding system provides a lot of detail and granularity on pathologies, all ready to be valorised. It’s just a matter of asking the right questions.