Construction of disease risk scoring systems using logistic group lasso: application to porcine reproductive and respiratory syndrome survey data

H. Lin, C. Wang, P. Liu, and D.J. Holtkamp (2012). Construction of Disease Risk Scoring Systems using Logistic Group Lasso: Application to Porcine Reproductive and Respiratory Syndrome Survey Data. Journal of Applied Statistics, 40(4):736-746.

Abstract

We propose to use the group lasso algorithm for logistic regression to construct a risk scoring system for predicting disease in swine. This work is motivated by the need to develop a risk scoring system from survey data on risk factors for porcine reproductive and respiratory syndrome (PRRS), which is a major health, production and financial problem for swine producers in nearly every country. Group lasso provides an attractive solution to this research question because of its ability to achieve group variable selection and stabilize parameter estimates at the same time. We propose to choose the penalty parameter for group lasso through leave-one-out cross-validation, using the area under the receiver operating characteristic curve (AUC) as the criterion. Survey data for 896 swine breeding herd sites in the USA and Canada, completed between March 2005 and March 2009, are used to construct the risk scoring system for predicting PRRS outbreaks in swine. We show that our scoring system for PRRS significantly improves on the current scoring system, which is based on expert opinion. We also show that our proposed scoring system is superior, in terms of AUC, to one developed using a multiple logistic regression model selected on the basis of variable significance.
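The tuning procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the solver is a plain proximal gradient method swapped in for whatever algorithm the paper uses, the penalty weight sqrt(group size) is a common convention assumed here, and pooling the held-out predictions into a single AUC is one possible reading of "leave-one-out cross-validation with the AUC criterion". All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_group_lasso_logistic(X, y, groups, lam, n_iter=200):
    """Proximal gradient for logistic loss + lam * sum_g sqrt(|g|) * ||beta_g||_2."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    # Step size 1/L, where L bounds the curvature of the mean logistic loss.
    Xa = np.hstack([np.ones((n, 1)), X])
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    for _ in range(n_iter):
        resid = sigmoid(b0 + X @ beta) - y
        b0 -= step * resid.mean()            # intercept is not penalized
        z = beta - step * (X.T @ resid) / n  # gradient step on the coefficients
        for g in groups:                     # group-wise soft thresholding
            thresh = step * lam * np.sqrt(len(g))
            norm = np.linalg.norm(z[g])
            z[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[g]
        beta = z
    return b0, beta

def auc(y, score):
    """AUC as the Mann-Whitney statistic; ties count one half."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def loocv_auc(X, y, groups, lam):
    """Refit with each observation left out, then pool the held-out scores."""
    n = len(y)
    held_out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b0, beta = fit_group_lasso_logistic(X[keep], y[keep], groups, lam)
        held_out[i] = b0 + X[i] @ beta
    return auc(y, held_out)

# Synthetic stand-in for survey data: three questions with three answer levels,
# each dummy-coded into two columns (reference level dropped). Assumption: only
# the first question is related to the outcome.
rng = np.random.default_rng(0)
n = 40
levels = rng.integers(0, 3, size=(n, 3))
X = np.hstack([(levels[:, [q]] == np.array([1, 2])).astype(float) for q in range(3)])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
y = (rng.random(n) < sigmoid(-1.0 + 1.5 * X[:, 0] + 2.5 * X[:, 1])).astype(float)

# Pick the penalty with the highest leave-one-out AUC on a small illustrative grid.
lambdas = [0.01, 0.05, 0.1, 0.2]
cv_auc = {lam: loocv_auc(X, y, groups, lam) for lam in lambdas}
best_lam = max(cv_auc, key=cv_auc.get)
b0, beta = fit_group_lasso_logistic(X, y, groups, best_lam)
print("LOOCV AUC by lambda:", cv_auc)
print("selected lambda:", best_lam, "coefficients:", np.round(beta, 3))
```

Because the penalty acts on the Euclidean norm of each group, all dummy columns belonging to one survey question enter or leave the model together, which is what makes the fitted coefficients usable as per-question risk scores.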

Publication
In Journal of Applied Statistics.