What This Means
This research describes the challenges of conducting online surveys about sexual health with teenagers and young adults. When the researchers ran social media advertisements to recruit participants aged 15-24 for a sexual health survey offering a $15 reward, they received more than 20,000 responses, but careful data checks showed that fewer than a quarter of those responses were legitimate. The study walks through the step-by-step process the researchers used to identify and remove fraudulent responses, including submissions generated by automated computer programs ('bots'), duplicate entries, and responses from people who did not actually meet the study's eligibility requirements.
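The sequential cleaning described above can be pictured as a simple filtering pipeline. The sketch below is purely illustrative: the field names (`captcha_passed`, `honeypot_filled`, `ip`, `email`, `age`) and the duplicate-detection key are assumptions for demonstration, not the study's actual criteria.

```python
# Hypothetical sketch of a multi-step response-cleaning pipeline.
# Field names and rules are illustrative, not taken from the study.

def clean_responses(responses):
    """Drop bot-flagged, duplicate, and ineligible responses, in that order."""
    # Step 1: remove responses that failed automated bot checks
    # (failed CAPTCHA, or answered a hidden 'honeypot' question).
    passed = [r for r in responses
              if r["captcha_passed"] and not r["honeypot_filled"]]

    # Step 2: remove duplicate submissions (here keyed on IP + email).
    seen, unique = set(), []
    for r in passed:
        key = (r["ip"], r["email"].lower())
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Step 3: remove respondents outside the 15-24 eligibility window.
    return [r for r in unique if 15 <= r["age"] <= 24]

sample = [
    {"captcha_passed": True, "honeypot_filled": False,
     "ip": "198.51.100.1", "email": "a@x.com", "age": 19},  # legitimate
    {"captcha_passed": True, "honeypot_filled": True,
     "ip": "198.51.100.2", "email": "b@x.com", "age": 20},  # bot trap triggered
    {"captcha_passed": True, "honeypot_filled": False,
     "ip": "198.51.100.1", "email": "A@x.com", "age": 19},  # duplicate
    {"captcha_passed": True, "honeypot_filled": False,
     "ip": "198.51.100.3", "email": "c@x.com", "age": 30},  # ineligible age
]
print(len(clean_responses(sample)))  # 1: only the first response survives
```

Ordering matters in such a pipeline: bot removal first keeps automated submissions from masking genuine duplicates in later steps.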
The verification process was extensive and involved two stages. First, the survey itself was built with several automatic detection tools, including CAPTCHA tests, fraud-scoring software, and 'honeypot' trap questions designed to catch bots. Second, participants who appeared to complete the survey legitimately were linked to a separate incentive survey, and the researchers cross-checked information like IP addresses, geographic coordinates, and personal details between the two surveys to confirm the same real person completed both. Out of 462 people who made it through to receive a reward, 17 were ultimately disqualified, leaving a final sample of 445 verified participants.
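The second-stage cross-check, matching a main-survey record against its incentive-survey counterpart, can be sketched as a pairwise comparison. Everything here is an assumption for illustration: the field names, the exact-match rule on IP address, and the 50 km distance threshold are not values reported by the study.

```python
import math

# Hypothetical cross-check between a main-survey record and an
# incentive-survey record: the same respondent ID should appear in both
# with a matching IP address and nearby geographic coordinates.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def verify_pair(main, incentive, max_km=50.0):
    """Return True if the two records plausibly come from one person."""
    return (
        main["id"] == incentive["id"]
        and main["ip"] == incentive["ip"]
        and haversine_km(main["lat"], main["lon"],
                         incentive["lat"], incentive["lon"]) <= max_km
    )

main = {"id": "p001", "ip": "203.0.113.7", "lat": 41.88, "lon": -87.63}
good = {"id": "p001", "ip": "203.0.113.7", "lat": 41.90, "lon": -87.65}
far  = {"id": "p001", "ip": "203.0.113.7", "lat": 34.05, "lon": -118.24}
print(verify_pair(main, good), verify_pair(main, far))  # True False
```

In practice a real protocol would also compare personal details between the two surveys and treat any single mismatch as grounds for manual review rather than automatic disqualification.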
This research suggests that web-based surveys offering financial incentives are highly attractive to bots and fraudulent actors, and that researchers cannot rely on any single detection method alone. The findings are particularly relevant for public health researchers trying to study sensitive topics like sexual health in young people, who may prefer the anonymity of online surveys but whose data quality is at serious risk without rigorous, multi-layered verification protocols. The paper serves as a practical tutorial for other researchers facing similar challenges in digital health data collection.