What This Means
This research describes an ongoing study protocol for building a large, freely available collection of questions and answers about sexually transmitted infections (STIs) and sexual health, tailored to populations in sub-Saharan Africa. The researchers are gathering real questions from people aged 15 and older through multiple channels (online, on paper, and at public events), and medical professionals are providing an accurate, evidence-based answer to each question. By August 2025, more than 5,620 question-and-answer pairs had been collected, following a pilot in Kigali, Rwanda, that gathered 132 questions.
The motivation for this work is that AI-powered health tools such as chatbots have the potential to improve access to health information, but they need large, culturally relevant training datasets to work well for specific populations. Such datasets are currently lacking for STI topics in sub-Saharan Africa. The collected questions are being cleaned, organized, and tagged according to established data-sharing standards, the FAIR principles (Findable, Accessible, Interoperable, Reusable), to make the dataset as useful as possible for researchers and developers building health AI tools.
This research suggests that crowdsourcing health questions directly from community members and patients, rather than relying only on clinician-generated content, could produce more relevant and representative training data for AI health tools. Because the dataset is planned for open-access release, other researchers and developers worldwide could use it to build or improve chatbots and digital tools that give people accurate, stigma-reducing information about STIs. This could in turn improve health-seeking behaviors in regions with limited access to traditional health services.