A multiarmed bandit approach for personalized dietary and exercise recommendations, built on a two-stage reward prediction model, significantly improved postprandial glucose levels in simulation and yielded a 23% average improvement in actual glucose responses in a small real-world experiment with 6 participants.
Key Findings
Results
The proposed multiarmed bandit algorithm significantly improved postprandial glucose levels compared to a randomized policy in simulation experiments.
The method uses a two-stage reward prediction model where actions are combinations of total carbohydrate intake and postprandial walking duration
The reward is defined as the reduction in postprandial glucose levels
The improvement of the online algorithm over a randomized policy in simulation validated the online planning approach for personalized recommendations
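The comparison between an online bandit and a randomized policy can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the source does not say which bandit algorithm or simulator was used, so epsilon-greedy stands in for the online algorithm, and the arm grid, the linear glucose model, the noise level, and all function names are hypothetical.

```python
import random

# Hypothetical arms: (carbohydrate grams, postprandial walking minutes).
ARMS = [(c, w) for c in (40, 60, 80) for w in (0, 15, 30)]


def simulated_reward(arm, rng):
    """Toy simulator (made-up coefficients): negative glucose rise plus noise."""
    carbs, walk = arm
    return -(0.6 * carbs - 0.8 * walk) + rng.gauss(0.0, 5.0)


def run_random_policy(rounds, rng):
    """Baseline: recommend a uniformly random arm in every round."""
    total = 0.0
    for _ in range(rounds):
        total += simulated_reward(rng.choice(ARMS), rng)
    return total / rounds


def run_epsilon_greedy(rounds, rng, epsilon=0.1):
    """Online bandit stand-in: mostly exploit the best estimate, sometimes explore."""
    counts = [0] * len(ARMS)
    # Starting at 0.0 is optimistic here because rewards are mostly negative,
    # so every arm gets tried early on.
    means = [0.0] * len(ARMS)
    total = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            i = rng.randrange(len(ARMS))  # explore
        else:
            i = max(range(len(ARMS)), key=lambda k: means[k])  # exploit
        reward = simulated_reward(ARMS[i], rng)
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental sample mean
        total += reward
    return total / rounds


if __name__ == "__main__":
    baseline = run_random_policy(2000, random.Random(0))
    bandit = run_epsilon_greedy(2000, random.Random(1))
    print(f"random policy mean reward: {baseline:.2f}")
    print(f"epsilon-greedy mean reward: {bandit:.2f}")
```

In this toy setting the bandit's mean reward should end up well above the random baseline, because it concentrates on the arm with the smallest simulated glucose rise. The numbers it prints say nothing about the study's actual results.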
Results
In a small real-world experiment with 6 participants, the personalized recommendation policy achieved a 23% average improvement in actual glucose responses compared to a randomized policy.
The real-world experiment involved 6 participants
A simplified version of the proposed method was used, in which the recommendation policy was updated only once to become a personalized policy
A 23% improvement on average in actual glucose responses was observed
The improvement was accompanied by adherence to the recommendations on carbohydrate intake and postprandial walking
Methods
The proposed method uses a two-stage prediction approach that first predicts behavioral responses to an action and subsequently predicts the postprandial glycemic response.
The action space is defined as a combination of total carbohydrate intake and postprandial walking duration
Reward prediction is implemented in two stages: predicted behavioral responses to an action, followed by postprandial glycemic response
This design directly optimizes clinical outcomes (postprandial glucose levels) rather than focusing solely on behavioral changes
The approach addresses a gap in prior reinforcement learning studies that focused on behavioral changes while overlooking clinical outcomes
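A minimal sketch of the two-stage idea follows, assuming a simple form for each stage. Stage 1 predicts the behavior that results from a recommended action, and stage 2 predicts the glycemic response from that behavior. The source does not give the model forms, so the adherence parameter, the habitual values, the linear glucose model, and every name below are hypothetical. Reward is taken as the negative predicted glucose rise, which is one possible reading of "reduction in postprandial glucose levels".

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Action:
    carbs_g: float   # recommended total carbohydrate intake (grams)
    walk_min: float  # recommended postprandial walking duration (minutes)


def predict_behavior(action, adherence):
    """Stage 1 (hypothetical): predict what the person actually does.

    adherence in [0, 1] moves behavior from made-up habitual values
    toward the recommended values.
    """
    habitual_carbs, habitual_walk = 80.0, 5.0
    carbs = habitual_carbs + adherence * (action.carbs_g - habitual_carbs)
    walk = habitual_walk + adherence * (action.walk_min - habitual_walk)
    return carbs, walk


def predict_glucose_rise(carbs, walk):
    """Stage 2 (hypothetical linear model): postprandial glucose rise in mg/dL."""
    return max(0.0, 0.6 * carbs - 0.8 * walk)


def predicted_reward(action, adherence):
    """Chain the two stages; a smaller predicted rise means a larger reward."""
    carbs, walk = predict_behavior(action, adherence)
    return -predict_glucose_rise(carbs, walk)


# Hypothetical discrete action space: carbohydrate level x walking duration.
ACTIONS = [Action(c, w) for c, w in product((40.0, 60.0, 80.0), (0.0, 15.0, 30.0))]


def best_action(adherence):
    """Pick the action with the highest predicted reward for this person."""
    return max(ACTIONS, key=lambda a: predicted_reward(a, adherence))


if __name__ == "__main__":
    print(best_action(adherence=1.0))
    print(best_action(adherence=0.5))
```

Chaining the stages means the predicted benefit of a recommendation depends on how closely the person is expected to follow it. In this sketch, an adherence of 0 makes every action look equally good, because the recommendation is predicted to change nothing.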
Background
Prior approaches to personalized behavioral recommendations through mobile apps have primarily focused on optimizing behavioral changes using reinforcement learning, overlooking clinical outcomes.
Personalized behavioral recommendations through mobile apps have proven effective in preventing serious chronic diseases such as diabetes
Recent studies have primarily focused on optimizing personalized recommendations using reinforcement learning
The main problem identified with these approaches is that they focus on behavioral changes and overlook clinical outcomes
The current study was designed to address this gap by directly optimizing postprandial glucose levels
Conclusions
Further longitudinal real-world experiments in patients with diabetes are needed to validate and generalize the findings.
The real-world experiment was small (n=6) and used a simplified version of the proposed method
Only a single update that personalized the recommendation policy was tested in the real-world setting
The authors noted preliminary effectiveness was demonstrated from both simulation and small real-world experiments
Generalizability to patients with diabetes requires further study
Hotta S, Kytö M, Koivusalo S, Heinonen S, Marttinen P. (2026). Personalized Glucose Management With AI: Pilot Study Using a Multiarmed Bandit Approach. JMIR Formative Research. https://doi.org/10.2196/70826