AI^2 Forum April 2024

This month at AI^2 we discussed the downfall of Babylon, the company that once claimed its goal was “to put accessible, affordable health service in the hands of every person on Earth”.

Babylon provided remote GP consultations and a patient-facing AI chatbot symptom checker, all for the price of giving up your current local GP. There was one practice in London that you could visit in person if you wanted to. Because consultations were primarily remote, the majority of people signing up were busy, younger professionals, which suggests they were healthier and ‘cheaper’ to treat.

The symptom checker either gave a diagnosis (ranked from least likely to most likely condition) or triaged the patient, suggesting what to do next (e.g. stay at home, pharmacy, GP, hospital). However, it sometimes gave invalid results, in some cases as severe as diagnosing a panic attack even though the symptoms clearly pointed towards a heart attack.

There were disputes over the legitimacy of their diagnostic accuracy, none of their work was ever peer reviewed, and eventually the company went into administration.

This emphasizes why protocols should be published before the work is carried out, so that the methods and metrics can be shown to be legitimate. For those interested in finding out about appropriate metrics for their work, this paper was suggested: Metrics reloaded: recommendations for image analysis validation.
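To illustrate why the choice of metric matters for a triage tool, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the urgency levels are borrowed from the examples above, the `triage_metrics` helper is hypothetical, and the data is made up. It does not describe how Babylon evaluated its symptom checker. It only shows that a tool can look accurate overall while still missing half of the most urgent cases.

```python
# Hypothetical urgency levels, ordered from least to most urgent.
LEVELS = ["stay at home", "pharmacy", "GP", "hospital"]


def triage_metrics(reference, predicted, urgent="hospital"):
    """Return (accuracy, urgent_sensitivity) for paired triage labels.

    accuracy: fraction of cases where the tool matched the reference label.
    urgent_sensitivity: fraction of truly urgent cases the tool flagged as
    urgent, or None if there are no urgent cases in the reference.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    for label in list(reference) + list(predicted):
        if label not in LEVELS:
            raise ValueError(f"unknown triage level: {label!r}")

    pairs = list(zip(reference, predicted))
    accuracy = sum(r == p for r, p in pairs) / len(pairs)

    urgent_pairs = [(r, p) for r, p in pairs if r == urgent]
    if urgent_pairs:
        urgent_sensitivity = sum(p == urgent for _, p in urgent_pairs) / len(urgent_pairs)
    else:
        urgent_sensitivity = None
    return accuracy, urgent_sensitivity


# Made-up data: 8 mild cases and 2 urgent ones, one of which the tool misses.
reference = ["stay at home"] * 8 + ["hospital"] * 2
predicted = ["stay at home"] * 8 + ["stay at home", "hospital"]

accuracy, urgent_sensitivity = triage_metrics(reference, predicted)
print(f"accuracy: {accuracy:.2f}")
print(f"sensitivity for urgent cases: {urgent_sensitivity:.2f}")
```

On this toy data the overall accuracy is high because most cases are mild, yet one of the two urgent cases is sent home. Reporting both numbers, and fixing which ones will be reported before running the study, makes that kind of failure harder to hide.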

In the second half of the session we discussed project limitations and possible ways to word them in publications. Some of the common themes raised included:

  • Data quality: doctors don’t have time to fill in records completely.
  • Lack of labelled data.
  • Reproducibility is impossible where data sets are private or secure, so open-source data such as MIMIC is used frequently.
  • Clinical utility and capacity: for example, even if an AI tool suggests admission, a patient can’t be admitted if hospital beds are full.
  • Risk prediction doesn’t take pain into account, and numbers don’t capture quality of life, so we need to involve patients.
  • How to compare different metrics when everyone uses different methods.
  • When is good, good enough?

Blog written by: Zoe Hancox

Written on April 24, 2024
[ AI_squared  ]