This Ph.D. thesis explores the strength and applicability of machine learning-based classifiers within the context of business analytics for data-driven decision making. The focus is on supervised binary classification on structured datasets, which are ubiquitous in relational databases across enterprises. Advanced analytics has become indispensable in today's corporate world, and this thesis demonstrates that predictive analytics is one of the major contributors to capturing business value across the financial services value chain. To test this hypothesis, different models, such as Generalized Linear Models, Random Forest, Gradient Boosting, and Artificial Neural Networks, were tested, compared, and combined to assess their predictive strength and robustness in different scenarios and use cases. The results indicate that Gradient Boosting outperforms all other classifiers tested on structured datasets. This is a major reason why the diffusion of Deep Learning within business analytics is lagging behind. In addition, the ensemble learning method stacking, which uses several base learners to create a more powerful super learner, proved to be a viable tool for consistently improving upon the accuracy of even the most powerful candidate models, including Gradient Boosting. Automated Machine Learning (AutoML) was benchmarked against manually tuned models and proved to be a valuable tool for democratizing predictive analytics for small to medium-sized corporations and for tackling the shortage of ML experts. AutoML has the potential to completely automate the predictive modeling process, but it is currently concerned mainly with model tuning and selection and ignores steps at the beginning and end of the pipeline. Finally, an ML pipeline setup is suggested that would, once automated, be able to reach human expert-level prediction accuracy for binary classification on structured datasets.
All of these models were tested and applied in different business analytics use cases, with a focus on financial services, to solve problems in credit risk management, insurance claims prediction, and marketing and sales. All use cases demonstrate improvements in prediction accuracy and hence offer direct value gains. Throughout the thesis, the advantages and constraints of using ML models in industry are considered and translated into managerial implications. General economic and business implications are also discussed to understand how the field will evolve in the future.
|Date of Award|1 Jun 2021|
|Awarding Institution|University of Strathclyde|
|Supervisor|Marc Roper & John Levine|