Global vs. local models for cross-project defect prediction

Research paper by Steffen Herbold, Alexander Trautsch, Jens Grabowski

Indexed on: 11 Nov '16
Published on: 24 Oct '16
Published in: Empirical Software Engineering



Abstract

Although researchers invested significant effort, the performance of defect prediction in a cross-project setting, i.e., with data that does not come from the same project, is still unsatisfactory. A recent proposal for the improvement of defect prediction is using local models. With local models, the available data is first clustered into homogeneous regions and afterwards separate classifiers are trained for each homogeneous region. Since the main problem of cross-project defect prediction is data heterogeneity, the idea of local models is promising.

Therefore, we perform a conceptual replication of the previous studies on local models with a focus on cross-project defect prediction. In a large case study, we evaluate the performance of local models and investigate their advantages and drawbacks for cross-project predictions. To this aim, we also compare the performance with a global model and a transfer learning technique designed for cross-project defect predictions. Our findings show that local models make only a minor difference in comparison to global models and transfer learning for cross-project defect prediction. While these results are negative, they provide valuable knowledge about the limitations of local models and increase the validity of previously gained research results.
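
The local-model idea described in the abstract (cluster the training data into homogeneous regions, then train one classifier per region) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `LocalModel`, `kmeans`, and `fit_centroids` are hypothetical, plain k-means and a nearest-centroid classifier are simple stand-ins for whatever clustering and classification techniques the study actually evaluated, and the two software metrics and their values are invented toy data.

```python
import math
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def nearest(point, candidates):
    """Return the key of the candidate vector closest to `point`."""
    return min(candidates, key=lambda key: distance(point, candidates[key]))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centers (assumption)."""
    centers = {i: tuple(points[i]) for i in range(k)}
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ended up empty.
        centers = {i: mean(groups[i]) if groups[i] else centers[i] for i in range(k)}
    return centers


def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one centroid per class label."""
    by_label = defaultdict(list)
    for x, label in zip(X, y):
        by_label[label].append(x)
    return {label: mean(rows) for label, rows in by_label.items()}


class LocalModel:
    """Cluster the training data, then train one classifier per cluster."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.centers = kmeans(X, self.k)
        regions = defaultdict(lambda: ([], []))
        for x, label in zip(X, y):
            rows, labels = regions[nearest(x, self.centers)]
            rows.append(x)
            labels.append(label)
        self.local = {r: fit_centroids(rows, labels)
                      for r, (rows, labels) in regions.items()}
        # A global classifier is the fallback for regions without training data.
        self.fallback = fit_centroids(X, y)
        return self

    def predict(self, x):
        region = nearest(x, self.centers)
        return nearest(x, self.local.get(region, self.fallback))


# Invented toy data: (lines of code, complexity); label 1 = defective, 0 = clean.
X = [(10, 1), (12, 2), (11, 8), (13, 9),
     (100, 20), (102, 22), (101, 40), (103, 42)]
y = [0, 0, 1, 1, 0, 0, 1, 1]

model = LocalModel(k=2).fit(X, y)
predictions = [model.predict(x) for x in [(12, 7), (11, 2), (100, 38), (102, 19)]]
```

On this toy data the two clusters separate the small files from the large ones, so each region gets its own decision boundary on the complexity metric. The sketch only shows the mechanics of the approach; it says nothing about how local models perform on real cross-project data, which is the question the paper investigates.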