Investigating the use of ensemble techniques in predicting object-oriented software maintainability

  • Hadeel Abdullah Alsolai

Student thesis: Doctoral Thesis

Abstract

Context: Predicting the maintainability of classes in object-oriented systems is an important factor in software success, but it is a challenging task. Although the prior literature on object-oriented software maintainability acknowledges machine learning techniques as valuable predictors of potential change, no technique has been shown to achieve consistently high accuracy, and there is no clear indication of which techniques are most appropriate.

Objective: This thesis aims to empirically investigate whether ensemble models provide increased prediction accuracy compared with individual models, by applying them to several software maintainability datasets using different base models and analysing the impact of parameter tuning.

Method: The first part of this thesis presents a systematic review of studies on predicting the maintainability of object-oriented software systems using machine learning techniques. In the remaining parts, three empirical studies were performed to evaluate and compare different homogeneous and heterogeneous ensemble models against sets of individual models for predicting the maintainability of object-oriented systems at the class level. These models were applied to 14 datasets extracted from the maintenance of object-oriented software systems.

Results: The systematic literature review identified 56 relevant studies and indicated that the application of ensemble models is relatively rare; thus, there is a need for studies that apply these models, as well as others, to a wide variety of datasets. The results of the three empirical studies indicate that the proposed ensemble models yield improved prediction accuracy over most of the individual models. This improvement was significant only in the third empirical study, along with a few cases in the second empirical study. In most cases, nearest neighbours or support vector regression achieved the best prediction accuracy among individual models; moreover, bagging and additive regression using these models as the base model, together with random forest, outperformed the other prediction models.

Conclusion: The main finding is that ensemble models are effective for predicting software maintainability and are more accurate than some individual models; their performance may be improved by using large datasets or parameter tuning. Ensemble models also improve the performance of weaker base models.
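
As an illustration of the kind of comparison the abstract describes, the following minimal sketch contrasts a single nearest-neighbours regressor with a bagged ensemble that uses the same model as its base learner. It is not the thesis's implementation: the data are synthetic, the single input feature stands in for a class-level metric, mean absolute error is assumed as the accuracy measure, and all function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def knn_predict(train, x, k=3):
    """Individual model: average the targets of the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return mean(y for _, y in nearest)


def bagged_knn_predict(train, x, n_estimators=25, k=3, seed=0):
    """Homogeneous ensemble: bagging with nearest neighbours as the base model."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw len(train) rows with replacement.
        sample = [rng.choice(train) for _ in train]
        predictions.append(knn_predict(sample, x, k))
    # Aggregate the base models by averaging their predictions.
    return mean(predictions)


def mae(actual, predicted):
    """Mean absolute error (assumed here as the accuracy measure)."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Synthetic data (assumption): one metric value mapped to a noisy change count.
rng = random.Random(42)
data = [(float(x), 2.0 * x + rng.gauss(0, 3)) for x in range(40)]
train, test = data[::2], data[1::2]

actual = [y for _, y in test]
single = [knn_predict(train, x) for x, _ in test]
bagged = [bagged_knn_predict(train, x) for x, _ in test]

print("individual k-NN MAE:", mae(actual, single))
print("bagged k-NN MAE:   ", mae(actual, bagged))
```

Whether the ensemble wins on a given dataset depends on the data and on parameters such as `k` and `n_estimators`, which is consistent with the abstract's point that results varied across studies and that parameter tuning matters.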
Date of Award: 11 Mar 2021
Original language: English
Awarding Institution
  • University Of Strathclyde
Supervisor: Marc Roper (Supervisor) & Murray Wood (Supervisor)
