Background
Parathyroidectomy offers the only cure for primary hyperparathyroidism, but today only 50% of patients with primary hyperparathyroidism are referred for operation, in large part because the condition is widely under-recognized. The diagnosis can be especially challenging when biochemical abnormalities are mild. Machine learning is a collection of methods in which computers build predictive algorithms from labeled examples. To facilitate diagnosis, we tested the ability of machine learning to distinguish primary hyperparathyroidism from normal physiology using clinical and laboratory data.

Methods
This retrospective cohort study used a labeled training set, and test-set accuracy of each machine learning model was determined using 10-fold cross-validation. Measures of accuracy included area under the receiver operating characteristic curve, sensitivity (recall), and positive and negative predictive value. Several algorithms and ensembles of algorithms were tested using the Weka platform. Among 11,830 patients managed operatively at 3 high-volume endocrine surgery programs from March 2001 to August 2013, 6,777 underwent parathyroidectomy for confirmed primary hyperparathyroidism, and 5,053 control patients without primary hyperparathyroidism underwent thyroidectomy. Age, sex, and preoperative serum levels of calcium, phosphate, parathyroid hormone, vitamin D, and creatinine were defined as potential predictors of primary hyperparathyroidism. Mild primary hyperparathyroidism was defined as primary hyperparathyroidism with normal preoperative calcium or parathyroid hormone levels.

Results
Of the machine learning algorithms tested, Bayesian network models proved most accurate, correctly classifying 95.2% of all primary hyperparathyroidism patients (area under the receiver operating characteristic curve = 0.989).
Omitting parathyroid hormone from the model did not significantly decrease accuracy (area under the receiver operating characteristic curve = 0.985). In mild disease, however, the Bayesian network model correctly classified only 71.1% of patients with normal preoperative calcium levels and 92.1% of those with normal preoperative parathyroid hormone levels. Combining the Bayesian network with AdaBoost improved accuracy to 97.2% of all primary hyperparathyroidism patients (area under the receiver operating characteristic curve = 0.994) and to 91.9% of primary hyperparathyroidism patients with mild disease, a significant improvement over the Bayesian network alone (P < .0001).

Conclusion
Machine learning can accurately diagnose primary hyperparathyroidism without human input, even in mild disease. Incorporating this tool into electronic medical record systems may aid recognition of this under-diagnosed disorder.
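
For readers less familiar with the evaluation terms used in the Methods, the following Python sketch shows how sensitivity and positive and negative predictive value follow from a 2x2 confusion matrix, and how 10-fold cross-validation partitions a data set so that each case is held out exactly once. It is a minimal illustration only: the study used the Weka platform rather than this code, the names `ConfusionCounts` and `k_fold_indices` are hypothetical, and the counts are invented and are not the study's results. The splitter neither shuffles nor stratifies, which a real analysis would usually do.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a 2x2 confusion matrix (disease is the positive class)."""
    tp: int  # diseased, classified as diseased
    fp: int  # not diseased, classified as diseased
    tn: int  # not diseased, classified as not diseased
    fn: int  # diseased, classified as not diseased


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of diseased patients the classifier detects (recall)."""
    return c.tp / (c.tp + c.fn)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of positive calls that are truly diseased (precision)."""
    return c.tp / (c.tp + c.fp)


def negative_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of negative calls that are truly not diseased."""
    return c.tn / (c.tn + c.fn)


def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical counts for illustration only (not the study's data).
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
```

In a cross-validated evaluation, a model is fitted on each `train` split and scored on the matching `test` split, and the per-fold confusion counts are pooled before the measures above are computed.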