
Table 5 Results of the agreement analyses

From: Evaluation of a deep learning software for automated measurements on full-leg standing radiographs

| Lengths and angles | Sample size (N) | ICC between AI and GT [95% CI] | Intra-reader reliability [95% CI] | ICC between radiologists [95% CI] |
|---|---|---|---|---|
| HKA | 331 | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] | > 0.99 [0.99, 1] |
| Pelvic obliquity | 150 | 0.97 [0.96, 0.98] | 0.98 [0.95, 0.99] | 0.99 [0.98, 0.99] |
| Top leg length | 309 | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] | > 0.99 [0.95, 1] |
| Center leg length | 331 | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] |
| Top femoral length | 312 | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] |
| Center femoral length | 334 | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] |
| Tibial length | 344 | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] | > 0.99 [> 0.99, 1] |

Agreement between the AI and the ground truth (GT) was assessed with intraclass correlation coefficients (ICC) from a two-way mixed-effects model with absolute agreement for multiple raters. Intra-reader reliability was assessed with ICC from a two-way mixed-effects model with absolute agreement for a single rater. Agreement between the two radiologists who established the ground truth was evaluated with ICC from a two-way random-effects model with absolute agreement for multiple raters. The sample size (N) is shown for each measurement. HKA, hip-knee-ankle angle; CI, confidence interval.
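To make the ICC variants in the note concrete, the following is a minimal sketch, assuming a complete subjects x raters matrix with no missing values, of how absolute-agreement ICC point estimates can be computed from two-way ANOVA mean squares (McGraw and Wong formulas). For absolute agreement, the two-way random-effects and two-way mixed-effects models yield the same point estimate; they differ only in interpretation. The function name `icc_absolute_agreement` is hypothetical, this is not the software or statistical package used in the study, and the 95% confidence intervals reported in the table are not computed here.

```python
from typing import Sequence


def icc_absolute_agreement(ratings: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Return (ICC(A,1), ICC(A,k)) for an n-subjects x k-raters matrix.

    ICC(A,1): absolute agreement, single rater (cf. intra-reader reliability).
    ICC(A,k): absolute agreement, mean of k raters (cf. AI vs GT, radiologists).
    Hypothetical helper for illustration; assumes no missing values.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular matrix with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]                  # per subject
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]  # per rater

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters (bias)
    ss_err = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater differences (msc) lower the ICC.
    icc_single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_average = (msr - mse) / (msr + (msc - mse) / n)
    return icc_single, icc_average


if __name__ == "__main__":
    # Hypothetical data: rater B reads every case exactly 1 unit higher than rater A.
    demo = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    single, average = icc_absolute_agreement(demo)
    print(f"ICC(A,1) = {single:.4f}, ICC(A,k) = {average:.4f}")
    # prints: ICC(A,1) = 0.8889, ICC(A,k) = 0.9412
```

In the hypothetical demo, the two raters rank the cases identically, so a consistency-type ICC would be 1; the constant offset between them is what pulls the absolute-agreement estimates below 1. This is why absolute agreement is the stricter choice when comparing an automated measurement with a reference standard.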