When it comes to in-hospital mortality after abdominal aortic aneurysm (AAA) repair, researchers have found that “not all risk scores are created equal”. Livia E V M de Guerre (Beth Israel Deaconess Medical Center, Boston, USA; University Medical Center Utrecht, Utrecht, The Netherlands) and colleagues have identified performance variation between three commonly used risk scores in administrative and quality improvement registries, and recently reported their findings online in the Journal of Vascular Surgery (JVS).
“Accurate and contemporary prognostic risk prediction is essential to inform clinical decision-making surrounding AAA care,” the authors write. For this reason, de Guerre et al decided to validate and compare three different in-hospital mortality risk scores—Medicare, Vascular Study Group of New England (VSGNE), and Glasgow Aneurysm Score (GAS)—in one administrative and two quality improvement registries.
Writing in JVS, the authors report that the VSGNE risk score performed best in the quality improvement registries but underestimated mortality, while the Medicare risk score demonstrated better calibration in the administrative dataset after open repair. Although the VSGNE risk score appeared to perform better in the quality improvement registries, the authors note that “its overly optimistic mortality estimates and its reliance on detailed anatomic and clinical variables reduced broader applicability to other databases”.
To validate the three risk scores, the researchers detail, they included patients undergoing elective AAA repair from 2012–2015 in the National Inpatient Sample (NIS), the Vascular Quality Initiative (VQI; excluding the VSGNE region), and the National Surgical Quality Improvement Program (NSQIP) datasets.
De Guerre and colleagues identified a total of 25,461 NIS, 18,588 VQI, and 8,051 NSQIP patients who underwent elective open repair or endovascular aneurysm repair (EVAR). Regarding mortality, they report the following key finding: “Overall, the Medicare risk score was more likely to overestimate mortality in the quality improvement registries, while the VSGNE risk score underestimated mortality in all databases”.
The authors add that, after EVAR, the Medicare risk score had a higher area under the curve (AUC) in the NIS compared with the GAS (p<0.001) but not compared with the VSGNE risk score (p=0.54). In addition, they found that, in the VQI registry, the VSGNE risk score was associated with a significantly higher receiver operating characteristic (ROC) AUC than both the Medicare and GAS risk scores (both p<0.001), and that the VSGNE risk score showed improved calibration compared with the Medicare risk score across all three databases (all p<0.001).
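For context, discrimination is conventionally summarised as the area under the ROC curve (how well a score ranks patients who died above those who survived), while calibration compares predicted with observed event rates. The short Python sketch below illustrates how such metrics can be computed for a mortality risk score in general; the synthetic data, variable names and decile-based observed-versus-expected comparison are illustrative assumptions and are not drawn from the study itself.

    # Illustrative sketch only: computing discrimination (ROC AUC) and a simple
    # calibration check for a hypothetical mortality risk score. Not the study's code.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Hypothetical cohort: predicted in-hospital mortality risk (0-1) and
    # observed outcome (1 = died, 0 = survived).
    predicted_risk = rng.uniform(0.001, 0.10, size=5_000)
    observed_death = rng.binomial(1, predicted_risk)

    # Discrimination: how well the score ranks deaths above survivors.
    auc = roc_auc_score(observed_death, predicted_risk)
    print(f"ROC AUC: {auc:.3f}")

    # Calibration: observed versus expected deaths within risk deciles.
    deciles = np.quantile(predicted_risk, np.linspace(0, 1, 11))
    for lo, hi in zip(deciles[:-1], deciles[1:]):
        in_bin = (predicted_risk >= lo) & (predicted_risk <= hi)
        expected = predicted_risk[in_bin].sum()
        observed = observed_death[in_bin].sum()
        print(f"risk {lo:.3f}-{hi:.3f}: observed {observed:.0f}, expected {expected:.1f}")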
After open repair, de Guerre et al continue, the Medicare risk score showed improved calibration compared with the VSGNE risk score in the NIS (p<0.001). In the VQI registry, however, they note that the VSGNE risk score showed significantly better discrimination (p=0.008) and calibration (p<0.001) than the Medicare risk score.
The authors consider some of the reasons behind this variation. They point, for example, to differences in variable definitions and availability between the databases used to create the risk models and the datasets to which each score was subsequently applied. De Guerre and colleagues also highlight that the patient populations represented within the datasets have different distributions of risk factors.
In the discussion of their findings, the researchers stress that their results “need to be interpreted within the context of [the study’s] limitations”. For example, they recognise that broader generalisability of their findings, and recommendations towards universal applicability in clinical and research settings, is “not straightforward” because only three risk scores were included. However, they stress that the scores they examined are the most commonly used among three well-known data sources that inform the field, and thus their study “represents an important contribution”.