Table 6 Comparison of the performance of the AI algorithm across different groups

From: Evaluation of a deep learning software for automated measurements on full-leg standing radiographs

| Lengths and angles | MAE [95% CI] | MAE [95% CI] | Statistical test |
|---|---|---|---|
| | Children (N = 26) | Adults (N = 141) | |
| HKA (°) | 0.27 [0.22, 0.31] | 0.31 [0.28, 0.33] | F(1, 162) = 6.00, p = 0.02 |
| Pelvic obliquity (mm) | 0.58 [0.43, 0.72] | 0.79 [0.66, 0.91] | W = 1365, p = 0.15 |
| Top leg length (mm) | 0.87 [0.75, 0.99] | 1.07 [0.94, 1.17] | F(1, 155) = 1.51, p = 0.22 |
| Center leg length (mm) | 1.13 [0.96, 1.30] | 1.53 [1.38, 1.65] | F(1, 162) = 1.31, p = 0.25 |
| Top femoral length (mm) | 0.73 [0.62, 0.83] | 1.00 [0.89, 1.09] | F(1, 156) = 0.01, p = 0.92 |
| Center femoral length (mm) | 0.90 [0.74, 1.04] | 1.29 [1.19, 1.39] | F(1, 163) = 0.00, p = 0.99 |
| Tibial length (mm) | 1.22 [0.96, 1.47] | 1.41 [1.20, 1.58] | F(1, 164) = 3.81, p = 0.05 |
| | Hip implant (N = 19) | No implant (N = 110) | |
| HKA (°) | 0.26 [0.21, 0.30] | 0.30 [0.27, 0.32] | F(1, 127) = 4.50, p = 0.04 |
| Pelvic obliquity (mm) | NA | NA | NA |
| Top leg length (mm) | 0.74 [0.56, 0.93] | 1.01 [0.88, 1.12] | F(1, 120) = 0.014, p = 0.91 |
| Center leg length (mm) | 1.25 [0.97, 1.49] | 1.46 [1.30, 1.60] | F(1, 127) = 7.82, p = 0.006* |
| Top femoral length (mm) | 0.83 [0.43, 1.16] | 0.89 [0.80, 0.98] | F(1, 121) = 0.034, p = 0.85 |
| Center femoral length (mm) | 1.43 [1.17, 1.66] | 1.18 [1.07, 1.29] | F(1, 128) = 11.49, p < 0.001* |
| Tibial length (mm) | 1.84 [1.04, 2.50] | 1.15 [1.02, 1.29] | F(1, 129) = 1.60, p = 0.21 |
| | Knee implant (N = 35) | No implant (N = 110) | |
| HKA (°) | 0.33 [0.29, 0.36] | 0.30 [0.27, 0.32] | F(1, 142) = 0.098, p = 0.75 |
| Pelvic obliquity (mm) | 1.04 [0.69, 1.33] | 0.66 [0.57, 0.75] | W = 2173, p = 0.48 |
| Top leg length (mm) | 1.16 [0.99, 1.34] | 1.01 [0.88, 1.12] | F(1, 142) = 0.030, p = 0.86 |
| Center leg length (mm) | 1.57 [1.36, 1.78] | 1.46 [1.30, 1.60] | F(1, 142) = 0.12, p = 0.72 |
| Top femoral length (mm) | 1.16 [0.94, 1.36] | 0.89 [0.80, 0.98] | F(1, 143) = 0.68, p = 0.41 |
| Center femoral length (mm) | 1.28 [1.10, 1.45] | 1.18 [1.07, 1.29] | F(1, 143) = 0.049, p = 0.83 |
| Tibial length (mm) | 1.80 [1.48, 2.07] | 1.15 [1.02, 1.29] | F(1, 142) = 34.30, p < 0.001* |
| | Conventional radiographs (N = 121) | EOS images (N = 46) | |
| HKA (°) | 0.32 [0.30, 0.34] | 0.25 [0.22, 0.28] | F(1, 162) = 25.22, p < 0.001* |
| Pelvic obliquity (mm) | 0.80 [0.65, 0.93] | 0.58 [0.45, 0.72] | W = 1788, p = 0.32 |
| Top leg length (mm) | 1.04 [0.91, 1.16] | 1.01 [0.86, 1.15] | F(1, 155) = 5.40, p = 0.02 |
| Center leg length (mm) | 1.54 [1.40, 1.68] | 1.26 [1.10, 1.41] | F(1, 162) = 2.43, p = 0.12 |
| Top femoral length (mm) | 0.97 [0.87, 1.08] | 0.89 [0.72, 1.04] | F(1, 156) = 2.55, p = 0.11 |
| Center femoral length (mm) | 1.24 [1.14, 1.35] | 1.19 [1.05, 1.33] | F(1, 163) = 1.05, p = 0.31 |
| Tibial length (mm) | 1.44 [1.22, 1.63] | 1.22 [1.06, 1.38] | F(1, 164) = 0.67, p = 0.41 |
| | Genu varum (N = 83) | All (N = 167) | |
| HKA (°) | 0.33 [0.30, 0.36] | 0.30 [0.28, 0.32] | F(1, 162) = 14.79, p < 0.001* |
| | Genu valgum (N = 20) | All (N = 167) | |
| HKA (°) | 0.28 [0.22, 0.34] | 0.30 [0.28, 0.32] | F(1, 162) = 1.18, p = 0.28 |

  1. Performance of the AI algorithm, assessed by the mean absolute error (MAE), in pediatric versus adult patients, on images with versus without an implant (hip or knee prosthesis), on conventional radiographs versus EOS images, and, for the HKA angle, on images with versus without malalignment. Mann–Whitney U tests were used to evaluate how implant and imaging modality influenced the differences between AI-based and ground truth measurements of pelvic obliquity. Linear mixed models with patient as a random effect were used to assess the influence of implant and imaging modality on all measurements except pelvic obliquity. Additional linear mixed models with patient as a random effect were used to examine the influence of genu varum and genu valgum on the differences between AI and ground truth measurements. Counts represent the number of patients rather than the number of images.
  2. *Statistically significant result
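
To make the two simplest quantities in the table concrete, the following is a minimal Python sketch of how an MAE and a Mann–Whitney statistic could be computed for two groups. It is not the authors' code: the function names and the measurement values are hypothetical, and it assumes the test is applied to absolute AI-versus-ground-truth differences, which the footnote does not state explicitly. The statistic is computed in the pair-counting form (pairs where the first group's value is larger, with ties counted as 0.5), which is the form commonly reported as W. Confidence intervals and p-values are deliberately left out.

```python
from itertools import product
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Mean absolute difference between paired measurements."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            u += 1.0
        elif a == b:
            u += 0.5
    return u


# Hypothetical pelvic obliquity values in mm (NOT data from the study).
ai_children = [2.0, 5.5, 1.0, 3.5]
gt_children = [2.5, 5.0, 2.0, 3.0]
ai_adults = [4.0, 1.5, 6.0, 2.0, 3.0]
gt_adults = [3.0, 2.5, 7.5, 2.0, 4.0]

# Assumption: the groups are compared on absolute AI-vs-ground-truth errors.
err_children = [abs(a - g) for a, g in zip(ai_children, gt_children)]
err_adults = [abs(a - g) for a, g in zip(ai_adults, gt_adults)]

mae_children = mean_absolute_error(ai_children, gt_children)
mae_adults = mean_absolute_error(ai_adults, gt_adults)
u_children = mann_whitney_u(err_children, err_adults)

print(f"MAE children: {mae_children:.3f} mm, MAE adults: {mae_adults:.3f} mm")
print(f"U (children vs adults): {u_children}")
```

In practice a statistics library would also supply the p-value (for example `scipy.stats.mannwhitneyu`). The linear mixed models used for the other measurements need a dedicated modelling package and are not sketched here.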