Dr. Isaac Kohane, a computer scientist at Harvard and a physician, teamed up with two colleagues to test-drive GPT-4 with one main goal: to see how the newest artificial intelligence model from OpenAI performed in a medical setting.
“I’m stunned to say: better than many doctors I’ve observed,” he says in the forthcoming book, “The AI Revolution in Medicine,” co-authored by independent journalist Carey Goldberg and Microsoft vice president of research Peter Lee. (The authors say neither Microsoft nor OpenAI required any editorial oversight of the book, though Microsoft has invested billions of dollars into developing OpenAI’s technologies.)
In the book, Kohane says GPT-4, released in March 2023 to paying subscribers, answers US medical licensing exam questions correctly more than 90% of the time. It’s a much better test-taker than previous ChatGPT AI models, GPT-3 and GPT-3.5, and a better one than some licensed doctors.
GPT-4 is not just a good test-taker and fact-finder, though. It’s also a great translator. The bot can translate discharge information for a patient who speaks Portuguese and distill wonky technical jargon into something sixth graders could easily read.
As the authors explain with vivid examples, GPT-4 can also give doctors helpful suggestions about bedside manner, offering tips on how to talk to patients about their conditions in compassionate, clear language, and it can read lengthy reports or studies and summarize them in the blink of an eye. The tech can even explain its reasoning through problems in a way that requires some measure of what looks like human-style intelligence.
But if you ask GPT-4 how it does all this, it will likely tell you that all of its intelligence is still “limited to patterns in the data and does not involve true understanding or intentionality.” That’s what GPT-4 told the book’s authors when they asked if it could engage in causal reasoning. Even with such limitations, as Kohane discovered in the book, GPT-4 can mimic how doctors diagnose conditions with stunning — albeit imperfect — success.
Kohane walks through a clinical thought experiment with GPT-4 in the book, based on a real-life case involving a newborn baby he had treated several years earlier. He gave the bot a few key details about the baby gathered from a physical exam, along with some information from an ultrasound and hormone levels, and the machine correctly diagnosed a 1-in-100,000 condition called congenital adrenal hyperplasia “just as I would, with all my years of study and experience,” Kohane wrote.
The doctor was both impressed and horrified.
GPT-4 isn’t always reliable, and the book contains examples of its blunders. They range from simple clerical errors, like misstating a BMI that the bot had correctly calculated moments earlier, to math mistakes, like inaccurately “solving” a Sudoku puzzle or forgetting to square a term in an equation. The mistakes are often subtle, and the system asserts it is right, even when challenged. It’s not a stretch to imagine how a misplaced number or miscalculated weight could lead to serious errors in prescribing or diagnosis.
Like previous GPTs, GPT-4 can also “hallucinate” — the technical euphemism for when AI makes up answers or disobeys requests.
When asked about this issue by the book’s authors, GPT-4 said, “I do not intend to deceive or mislead anyone, but I sometimes make mistakes or assumptions based on incomplete or inaccurate data. I also do not have the clinical judgment or the ethical responsibility of a human doctor or nurse.”
One potential cross-check the authors suggest in the book is to start a new session with GPT-4 and have it “read over” and “verify” its work with a “fresh set of eyes.” This tactic sometimes works to reveal mistakes, though GPT-4 is somewhat reluctant to admit when it’s been wrong. Another error-catching suggestion is to command the bot to show you its work, so you can verify it, human-style.
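For readers who want to try this cross-check themselves, here is a minimal sketch of the idea using OpenAI’s Python client. The two-session structure follows the tactic described in the book; the model name, prompts, and function names are illustrative assumptions, not anything the authors published.

```python
# A sketch of the "fresh set of eyes" cross-check: generate an answer in
# one session, then ask a brand-new session (no shared history) to verify
# it. Model name, prompts, and helpers are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str) -> str:
    """First session: produce an answer, showing its work step by step."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"{question}\n\nShow your work step by step."}],
    )
    return resp.choices[0].message.content


def verify(question: str, answer: str) -> str:
    """Second, independent session: review the first answer with no shared
    conversation history -- the "fresh set of eyes"."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": ("Read over the following question and answer "
                               "and verify the work. Point out any arithmetic, "
                               "clerical, or reasoning errors.\n\n"
                               f"Question: {question}\n\nAnswer: {answer}")}],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    q = "A patient is 170 cm tall and weighs 70 kg. What is their BMI?"
    first = ask(q)
    print("First pass:\n", first)
    print("\nIndependent review:\n", verify(q, first))
```

Because the second session starts without the first one’s conversation history, it has no stake in defending the earlier answer, which is what makes the “fresh set of eyes” framing apt; asking each pass to show its work, per the authors’ other suggestion, makes discrepancies easier to spot.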
It’s clear that GPT-4 has the potential to free up precious time and resources in the clinic, allowing clinicians to be more present with patients “instead of their computer screens,” the authors write. But, they say, “We have to force ourselves to imagine a world with smarter and smarter machines, eventually perhaps surpassing human intelligence in almost every dimension. And then think very hard about how we want that world to work.”
Originally appeared on Insider: https://www.insider.com/chatgpt-passes-medical-exam-diagnoses-rare-condition-2023-4