Tree and spline based association analysis of gene-gene interaction models for ischemic stroke.

Research paper by Nancy R. Cook, Robert Y. L. Zee, Paul M. Ridker

Indexed on: 30 Apr '04. Published on: 30 Apr '04. Published in: Statistics in Medicine


In the biology of complex disorders, such as atherothrombosis, interactions among genetic factors may play an important role, and theoretical considerations suggest that gene-gene interactions are quite common in such diseases. We used a nested case-control sample from the Physicians' Health Study, a randomized trial assessing the effects of aspirin and beta-carotene on cardiovascular disease and cancer among 22,071 US male physicians, to examine these relationships for ischemic stroke. Data were available on 92 polymorphisms from 56 candidate genes related to inflammation, thrombosis and lipid metabolism, assessed in 319 incident cases of ischemic stroke and 2090 disease-free controls. We used classification and regression trees (CART) and multivariate adaptive regression spline (MARS) models to explore the presence of genetic interactions in these data. These models offer advantages over typical logistic regression methods in that they may uncover interactions among genes that do not exhibit strong marginal effects. Final models were selected using either the Bayes Information Criterion or cross-validation. Model fit was assessed using 10-fold cross-validation of the entire selection process. Both the CART and two-way MARS-logit models identified an interaction between two polymorphisms linked to inflammation, the P-selectin (val640leu) and interleukin-4 (C582T) genes. Internal validation of these models, however, suggested that effects of these polymorphisms are additive. Although further external validation of these models is necessary, these methods may be valuable in exploring and identifying potential gene-gene as well as gene-environment interactions in association studies.
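
The point that an interaction can exist without a marginal effect may be easier to see with numbers. The following is a minimal sketch in Python, not the authors' analysis: the counts are hypothetical and chosen as an extreme case, and the names `counts`, `odds_ratio` and `pooled` are assumptions introduced only for illustration. It compares the marginal odds ratio of each of two variants with the odds ratio of one variant inside each stratum of the other.

```python
# Hypothetical counts, NOT data from the study.
# Key: (carrier of variant A, carrier of variant B) -> (cases, controls)
counts = {
    (0, 0): (10, 90),
    (0, 1): (30, 70),
    (1, 0): (30, 70),
    (1, 1): (10, 90),
}


def odds_ratio(exposed, unexposed):
    """Odds ratio from two (cases, controls) pairs."""
    (a, b), (c, d) = exposed, unexposed
    return (a * d) / (b * c)


def pooled(keep):
    """Sum (cases, controls) over the genotype cells for which keep(cell) is true."""
    cases = sum(n[0] for cell, n in counts.items() if keep(cell))
    controls = sum(n[1] for cell, n in counts.items() if keep(cell))
    return cases, controls


# Marginal effect of each variant, ignoring the other one.
marginal_a = odds_ratio(pooled(lambda g: g[0] == 1), pooled(lambda g: g[0] == 0))
marginal_b = odds_ratio(pooled(lambda g: g[1] == 1), pooled(lambda g: g[1] == 0))

# Effect of variant A within each stratum of variant B.
a_given_b0 = odds_ratio(counts[(1, 0)], counts[(0, 0)])
a_given_b1 = odds_ratio(counts[(1, 1)], counts[(0, 1)])

print("marginal OR, variant A:", marginal_a)
print("marginal OR, variant B:", marginal_b)
print("OR of A among non-carriers of B:", round(a_given_b0, 3))
print("OR of A among carriers of B:", round(a_given_b1, 3))
```

In this constructed example both marginal odds ratios equal 1, while the stratum-specific odds ratios for variant A point in opposite directions. A main-effects logistic model would see nothing here, whereas a model that conditions on one variant before evaluating the other could pick up the pattern. Real data rarely show such a clean cancellation, which is one reason the abstract stresses cross-validating the entire selection process.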