Research provides “stepping stone” for future application of AI in vascular surgery

L-R: Quang Le and Michael Amendola

New research on ChatGPT, a generative pre-trained transformer (GPT) technology, and its success rate on the Vascular Education and Self-Assessment Program (VESAP) provides insight into the future of artificial intelligence (AI) in vascular surgery training and practice, investigators Michael Amendola (Richmond, USA) and Quang Le (Charlottesville, USA) tell Vascular News.

Le, a medical student at the University of Virginia School of Medicine and first author of the research, explained that the project began with a petition to the Society for Vascular Surgery (SVS) Self-Assessment Committee, which granted access to the fourth edition of VESAP (VESAP4) in April 2023.

Subsequently, VESAP4 materials—namely 385 non-imaging questions, separated into 10 domains of vascular surgery knowledge—were submitted to the GPT-3.5-Turbo (GPT 3.5) large language model. Two independent reviewers examined AI-generated responses for accuracy and content, and compared them to provided key answers. Application programming interface (API) requests were triplicated to evaluate consistency.
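The grading scheme described above — a majority answer across three queries checked against the key, plus a consistency check — can be sketched roughly as follows. This is a hypothetical illustration, not the study's actual code; the function name and toy data are assumptions.

```python
from collections import Counter

def grade_triplicate(responses, key):
    """Given three model answers to one question and the key answer,
    return (correct, consistent): whether the majority answer matches
    the key, and whether all three responses agree."""
    majority, _ = Counter(responses).most_common(1)[0]
    return majority == key, len(set(responses)) == 1

# Toy data: three model responses and the key answer per question
graded = [grade_triplicate(r, k) for r, k in [
    (["B", "B", "B"], "B"),   # correct, consistent
    (["A", "C", "A"], "B"),   # incorrect, inconsistent
    (["D", "D", "D"], "D"),   # correct, consistent
]]
accuracy = sum(correct for correct, _ in graded) / len(graded)
consistency = sum(agree for _, agree in graded) / len(graded)
```

Aggregating these per-question flags over all 385 questions would yield overall accuracy and consistency rates of the kind reported below.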

The research, delivered as a moderated poster presentation at this year’s Southern Association for Vascular Surgery (SAVS) annual meeting (24–27 January, Scottsdale, USA), showed that GPT 3.5 provided the correct answer to 49.4% of questions, and that 77.8% of correct responses were similar across all three queries.

Le reported that GPT 3.5 performed best in questions on radiation safety, achieving a 54.4% correct rate, while it performed worst in questions on dialysis access, answering only 39% of questions correctly.

Of the incorrectly answered questions, Le noted that the most common cause of inaccuracy was retrieval of false information or failure to retrieve important facts.

The team conducted further research, presented as a poster at the 2024 Vascular Annual Meeting (VAM; 19–22 June, Chicago, USA), which found that while GPT 3.5 had an accuracy rate of about 48%, the corresponding figure for a later iteration of the model, GPT 4, was about 63%. However, the researchers also found that consistency was limited, with GPT 3.5 only consistent 55% of the time across three query attempts. GPT 4 was consistent in 90% of answers.

“Unfortunately, we found that industry updates have had conflicting effects,” Le added. He shared that, while accuracy rates remained stable between the releases of these two models in June 2023 and November 2023, consistency rose from 55% to 65% for GPT 3.5 but fell from 90% to 79% for GPT 4 over the same period.

Amendola, professor of surgery at Virginia Commonwealth University School of Medicine and chief of the Division of Vascular Surgery at Central Virginia VA Health Care System in Richmond, Virginia, as well as senior author of the research, highlighted a key takeaway from the project: “The interesting finding in this is that [ChatGPT] was not as good as we thought it ought to be.”

Challenges

This comment laid the foundations for a wider discussion on the future of AI—an umbrella term covering a variety of different technologies, from large language models like ChatGPT to machine learning algorithms—in vascular surgery. Both Amendola and Le highlighted various challenges that need to be addressed at this early stage of development before wider implementation into training and practice can be successful.

“A lot of this technology is experimental right now,” Le pointed out, noting that its application is currently highly varied by institution and by country.

Both Le and Amendola underline regulatory issues as one limitation. “Integration of these tools—AI and more specifically large language models—continues to pose significant challenges due to the legislative consideration as well as integration into existing health systems and health record systems,” said Le, with Amendola adding that privacy and Health Insurance Portability and Accountability Act (HIPAA) concerns are “limiting the infiltration of a lot of these models at large healthcare systems.”

Perhaps the biggest drawback at present, though, according to Le, is the tendency for models such as the one used in the aforementioned research to “hallucinate.” He explained: “Our large language models sometimes make up information in a way that might be harmful when used in a clinical setting.”

Amendola also highlighted the impact of early-iteration AI on vascular training, stressing that educational institutions are grappling with how to effectively handle the use of AI among students. “Can you generate your own AI-based position paper or personal statement for an application?” he asked, highlighting a key question at the center of this conundrum.

Data are also an issue. Le pointed out that large language models are influenced by training data, which “may harbor hidden bias.” In addition, he noted that predictive machine learning modeling needs large amounts of clean data, which often are not available. “Unfortunately,” he stressed, “data capture during clinical care tends to be of low quality, for various reasons, with lots of missing or poorly documented information, which reduces the strength of such predictive modeling.”

Opportunities

Overall, however, both researchers expressed cautious optimism about the potential of AI technology in the vascular surgery landscape of the future. “While our study has been about understanding the limitation of these new large language models, it’s only a stepping stone towards the innovative application of these tools,” Le said, putting his and Amendola’s research into context.

Machine learning offers a significant opportunity to individualise care, Le noted. He added that AI in general could improve the efficiency of vascular practice and education, citing the rapid drafting of medical documentation and the distillation of historical medical data as examples. Large language models, Le continued, could build realistic clinical situation simulations for training purposes.

“We’re using this technology now in a lot of other parts of our lives, and eventually it will become part and parcel of what we see at the bedside and within our practice,” Amendola posited.

Alongside this potential, though, Amendola was keen to stress the need for safety measures. “I think there’s a lot of promise and opportunity, but there needs to be a lot of policy, and we’re going to have to put some guardrails on what exactly some of these models are able to look at.”

Commenting finally on adoption of AI at the physician level, Le emphasised that most current popular large language models are generally user-friendly, and that learning curves will shorten as time goes on. “Overall, I think as these technologies develop, the barrier to entry and the learning curve will continue to decrease,” he commented.

“It will be an aid,” Amendola said as a closing remark, citing as one of his key messages that the adoption of AI is not a question of if but when, and urging colleagues to embrace this new set of technologies. “One of the best quotes I’ve heard about AI is that AI will not replace doctors or surgeons, but the surgeon or the doctor who doesn’t use AI will be replaced.”
