Personalized Glucose Management With AI: Pilot Study Using a Multiarmed Bandit Approach.

Hotta S, Kytö M, et al. • JMIR formative research • 2026

TL;DR

A multiarmed bandit approach using a two-stage reward prediction model for personalized dietary and exercise recommendations demonstrated significant improvement in postprandial glucose levels in simulation and a 23% average improvement in actual glucose responses in a small real-world experiment with 6 participants.

Key Findings

Results

The proposed multiarmed bandit algorithm significantly improved postprandial glucose levels compared to a randomized policy in simulation experiments.

The method uses a two-stage reward prediction model where actions are combinations of total carbohydrate intake and postprandial walking duration
The reward is defined as the reduction in postprandial glucose levels
The online algorithm demonstrated significant improvement over a randomized policy in simulation
The simulation experiment validated the online planning approach for personalized recommendations
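The comparison between an online bandit policy and a randomized policy can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the arm means, the noise level, and the epsilon-greedy strategy are hypothetical stand-ins chosen for brevity, not the authors' simulator or algorithm.

```python
import random

# Hypothetical true mean reward (glucose reduction, arbitrary units) per arm.
# Each arm stands for one (carbohydrate intake, walking duration) combination.
TRUE_MEANS = [0.0, 0.2, 0.4, 0.45, 0.6, 0.85, 0.9, 1.3, 1.7]
NOISE_SD = 0.3  # assumed measurement noise


def pull(arm, rng):
    """Simulate one noisy reward for the chosen arm."""
    return rng.gauss(TRUE_MEANS[arm], NOISE_SD)


def run_random(steps, rng):
    """Baseline: pick an arm uniformly at random at every step."""
    total = 0.0
    for _ in range(steps):
        total += pull(rng.randrange(len(TRUE_MEANS)), rng)
    return total / steps


def run_epsilon_greedy(steps, rng, epsilon=0.1):
    """Online policy: mostly exploit the best estimate, sometimes explore."""
    n_arms = len(TRUE_MEANS)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total = 0.0
    for t in range(steps):
        if t < n_arms:
            arm = t  # try every arm once before exploiting
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps


random_mean = run_random(500, random.Random(0))
bandit_mean = run_epsilon_greedy(500, random.Random(0))
print(f"random policy mean reward: {random_mean:.2f}")
print(f"bandit policy mean reward: {bandit_mean:.2f}")
```

In this toy setting the online policy concentrates its choices on the arms with the largest estimated reward, so its average reward ends up well above that of the randomized baseline.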

Results

In a small real-world experiment with 6 participants, the personalized recommendation policy achieved a 23% average improvement in actual glucose responses compared to a randomized policy.

The real-world experiment involved 6 participants
A simplified version of the proposed method was used, in which the recommendation policy was updated only once to become a personalized policy
A 23% improvement on average in actual glucose responses was observed
Improvement was accompanied by behavioral adherence to recommendations concerning carbohydrate intake and postprandial walking

Methods

The proposed method uses a two-stage prediction approach that first predicts behavioral responses to an action and subsequently predicts the postprandial glycemic response.

The action space is defined as a combination of total carbohydrate intake and postprandial walking duration
Reward prediction is implemented in two stages: predicted behavioral responses to an action, followed by postprandial glycemic response
This design directly optimizes clinical outcomes (postprandial glucose levels) rather than focusing solely on behavioral changes
The approach addresses a gap in prior reinforcement learning studies that focused on behavioral changes while overlooking clinical outcomes
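The two-stage structure described above can be sketched as follows. This is a minimal illustration, not the authors' model: the `Person` profile, the threshold-style behavior model, the linear glucose model, and all numeric values are hypothetical assumptions introduced only to show why predicting behavior first can change which action is best.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Action:
    carbs_g: float   # recommended total carbohydrate intake (grams)
    walk_min: float  # recommended postprandial walking duration (minutes)


@dataclass(frozen=True)
class Person:
    # Hypothetical profile used only for this sketch.
    habitual_carbs_g: float  # carbohydrate intake without any recommendation
    min_carbs_g: float       # recommendations below this are ignored
    max_walk_min: float      # walks longer than this are skipped


def predict_behavior(rec, person):
    """Stage 1: predict what the person actually does given a recommendation."""
    carbs = rec.carbs_g if rec.carbs_g >= person.min_carbs_g else person.habitual_carbs_g
    walk = rec.walk_min if rec.walk_min <= person.max_walk_min else 0.0
    return carbs, walk


def predict_glucose_reduction(carbs, walk, person, carb_sens=0.02, walk_sens=0.03):
    """Stage 2: predict the reward (glucose reduction) from the behavior.

    Assumed linear model; the sensitivities are made-up values.
    """
    return carb_sens * (person.habitual_carbs_g - carbs) + walk_sens * walk


def predicted_reward(rec, person):
    """Chain stage 1 into stage 2 to score a recommendation."""
    carbs, walk = predict_behavior(rec, person)
    return predict_glucose_reduction(carbs, walk, person)


# Action space: combinations of carbohydrate intake and walking duration.
ACTIONS = [Action(c, w) for c, w in product((40.0, 60.0, 80.0), (0.0, 15.0, 30.0))]


def best_action(person):
    """Pick the recommendation with the highest predicted reward."""
    return max(ACTIONS, key=lambda a: predicted_reward(a, person))


cautious = Person(habitual_carbs_g=80.0, min_carbs_g=60.0, max_walk_min=15.0)
flexible = Person(habitual_carbs_g=80.0, min_carbs_g=40.0, max_walk_min=30.0)
print(best_action(cautious))
print(best_action(flexible))
```

Under these assumptions, the most demanding recommendation is not the best one for the `cautious` profile, because stage 1 predicts it would be ignored. A policy that scored actions by the glucose model alone would miss this.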

Background

Prior approaches to personalized behavioral recommendations through mobile apps have primarily focused on optimizing behavioral changes using reinforcement learning, overlooking clinical outcomes.

Personalized behavioral recommendations through mobile apps have proven effective in preventing serious chronic diseases such as diabetes
Recent studies have primarily focused on optimizing personalized recommendations using reinforcement learning
The main problem identified with these approaches is that they focus on behavioral changes and overlook clinical outcomes
The current study was designed to address this gap by directly optimizing postprandial glucose levels

Conclusions

Further longitudinal real-world experiments in patients with diabetes are needed to validate and generalize the findings.

The real-world experiment was small (n=6) and used a simplified version of the proposed method
Only a single update from the initial recommendation policy to a personalized one was tested in the real-world setting
The authors noted preliminary effectiveness was demonstrated from both simulation and small real-world experiments
Generalizability to patients with diabetes requires further study

Citation

Hotta S, Kytö M, Koivusalo S, Heinonen S, Marttinen P. (2026). Personalized Glucose Management With AI: Pilot Study Using a Multiarmed Bandit Approach. JMIR formative research. https://doi.org/10.2196/70826
