Accelerating Clinical Health Data Harmonisation with AI: Building ‘The Mapping App’

May 30, 2025

By Peter Marsh, Data Scientist, Climate System Analysis Group (CSAG), University of Cape Town, South Africa

At the Climate System Analysis Group (CSAG), I focus on building open-source tools to support scientific collaboration. As part of the HE²AT Center, I have worked on a project to make clinical health data easier to harmonise and analyse across large, multi-country research efforts.

That work led to the development of ‘The Mapping App’, an AI-assisted tool designed to speed up and scale the harmonisation of clinical data drawn from a wide range of African studies.

The scale of the problem

Until now, clinical data has been harmonised manually, usually by a small team of experts working with just a few studies at a time. In contrast, the HE²AT Center aims to combine data from more than 100 African cohort studies, covering over 200,000 patients across 12 countries and 40 trials, into a single FAIR-aligned (findable, accessible, interoperable, reusable) database. Working at that scale required a different approach, one that could grow without losing accuracy or oversight.

Building ‘The Mapping App’

To meet that challenge, we created the Mapping App. At the tool’s core is an ontology recommendation engine that suggests the most likely matches between incoming variables and a curated set of 155 target variables. During validation, it ranked the correct variable first in 82% of cases, and within the top five 92% of the time. We also built a confidence indicator and a variable transformation engine powered by a large language model (LLM). This engine reads expert-written instructions and converts them into Python-like code. In early testing, it could automatically transform 22% of variables without manual input.
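To make the idea of ranked recommendations and "top one" versus "top five" accuracy concrete, here is a minimal sketch in Python. It is not the Mapping App's actual engine: it assumes plain string similarity from Python's standard library as a stand-in for the real matching method, and the target variable names, `rank_targets` and `top_k_accuracy` are hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical target variables for illustration only;
# the real tool works against a curated set of 155.
TARGET_VARIABLES = [
    "body_temperature_celsius",
    "systolic_blood_pressure",
    "diastolic_blood_pressure",
    "heart_rate",
    "birth_weight_grams",
    "maternal_age_years",
]


def normalise(name):
    """Lower-case a variable name and treat underscores as spaces."""
    return name.lower().replace("_", " ").strip()


def rank_targets(source_name, targets=TARGET_VARIABLES, top_k=5):
    """Return up to top_k target variables, most similar first.

    Assumption: string similarity stands in for whatever matching
    method the real recommendation engine uses.
    """
    scored = [
        (SequenceMatcher(None, normalise(source_name), normalise(t)).ratio(), t)
        for t in targets
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [target for _, target in scored[:top_k]]


def top_k_accuracy(labelled_pairs, k):
    """Share of (source, correct_target) pairs whose correct target
    appears among the first k recommendations."""
    hits = sum(
        1
        for source, correct in labelled_pairs
        if correct in rank_targets(source, top_k=k)
    )
    return hits / len(labelled_pairs)


if __name__ == "__main__":
    # An expert would review this shortlist rather than accept it blindly.
    print(rank_targets("Heart Rate", top_k=3))
```

Under this framing, the validation figures correspond to `top_k_accuracy(pairs, 1)` and `top_k_accuracy(pairs, 5)` computed over a set of expert-labelled pairs, with the expert still choosing from the shortlist.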

The Mapping App allows researchers to work more efficiently with complex datasets by combining automated tools with expert supervision. It is already helping make clinical data easier to use in climate-health studies, particularly those focused on heat-related health outcomes.

Open-source and reusable

We built the Mapping App using open-source tools like Streamlit, making testing, sharing, and adapting easier across teams. The tool is freely available on GitHub for others to explore or modify: https://github.com/csag-uct/Metadata-Harmonisation-Tool

Researchers at the University of the Witwatersrand (Wits) are now leading a new project that builds on this work, expanding the tool’s functionality to support different harmonisation needs.

Collaboration and expert guidance

Throughout the development process, we worked closely with experts in data harmonisation, including Katherine Johnston, Lyndon Zass, and Wei Kheng Teh from eLwazi. Their advice helped us define the project scope and build a tool that others across the HE²AT and DS-I Africa networks can use.

When we first proposed the idea, Katherine Johnston said, “If you can make this work, it will be a game changer.” That input helped guide the project from the very beginning.

Looking ahead

Standardising clinical data is critical for cross-study, collaborative research, especially in fields like climate and health, where linked datasets are essential. The Mapping App offers a practical way to make that work more efficient and scalable while still allowing expert input where needed. We hope others in the field will use, adapt, or expand on the tool to support their research and collaboration goals.
