By Peter Marsh, Data Scientist, Climate System Analysis Group (CSAG), University of Cape Town, South Africa
When I joined the HE²AT Center as a data scientist two years ago, working with the Climate System Analysis Group (CSAG), I was excited to apply my open-source and technical background to real-world African challenges. I didn’t expect how much I’d learn, not just about data but about collaborating across disciplines.
Our goal was to create a shared, secure computing environment that would make it easier for researchers to work with climate and health data. Historically, collaboration across institutions was hard, especially when dealing with sensitive clinical data or massive, unwieldy climate datasets. We set out to remove technical roadblocks so researchers could better study the health impacts of climate change.
A fragmented landscape
One of our biggest challenges was the lack of centralized, secure infrastructure for handling sensitive health data. Existing clinical health data holds valuable insights into how climate events, like heatwaves or rising temperatures, affect public health. But sharing and analyzing this data across institutions wasn’t straightforward, and ensuring privacy was critical.
Climate data posed a different challenge. CSAG manages over a petabyte of climate information, but its complexity and volume make it hard to use, especially for health researchers unfamiliar with climate science. Many of the links between climate and health remained out of reach simply because the data weren’t connected or easy to work with.
The HE²AT approach
To change that, we started with a secure, private cloud system built on CSAG’s high-performance computing infrastructure. At the center of this setup is a shared JupyterLab workspace, which makes it easier for teams across disciplines to collaborate in real time.
The other major hurdle was access to climate data. CSAG’s archive is large and carefully curated, but it hasn’t always been easy for health researchers to use. By building on open-source tools like Intake (data catalogues), kerchunk (lightweight reference indexes that let existing files such as NetCDF be read as if they were zarr stores), zarr (a chunked, cloud-friendly format for array data), and xarray (labelled multidimensional arrays), we’ve made this data analysis-ready and far more approachable, even for researchers without a background in climate science.
Key outcomes
This setup has made interdisciplinary collaboration much easier. Public health experts, epidemiologists, and environmental scientists can now work together using shared tools and data in one place.
Having a secure, shared environment has made it possible to collaborate across institutions in ways that weren’t feasible before.
Importantly, this system helps generate the evidence needed to build climate-resilient health systems, which is critical as climate-related health risks rise across the continent.
The infrastructure has also strengthened capacity across the HE²AT consortium. Access to advanced tools and facilities has enabled more researchers to participate meaningfully in climate-health research.
A big reason this project worked is the groundwork already laid at CSAG. Years of investment in infrastructure and careful data management gave us a strong base to build on. Just as important has been the hands-on support from colleagues like Roger Duffett, Pierre Kloppers, and Lisa Van Aardenne, who’ve helped shape the system and keep it running smoothly.
Looking ahead
We’re now considering what it takes to keep this going. The system we’ve built is solid, but it won’t run itself: it needs continued support and investment to stay useful and up to date. This model could work well in other parts of Africa, especially where similar barriers to data access exist.
This project has been satisfying not only because of the technical work, but because it has made it easier for people to do important research. Giving researchers better access to data and tools means they can start tackling the big questions around climate and health that were previously harder to approach.

