In the above studies on cancer survivability classification, most prediction models achieve good accuracy. However, a number of models have low sensitivity because traditional classification methods were applied to imbalanced data, especially for survivability prediction of high-mortality cancers. Therefore, the characteristics of the data should also be considered to improve the overall prediction performance. The first stage of the proposed two-stage model classifies the survivability of advanced-stage colorectal cancer patients, which is an imbalanced classification problem. Many approaches have been proposed for addressing imbalanced classification problems. The most commonly used methods include oversampling and undersampling. Oversampling achieves class balance by duplicating or creating minority class samples. Undersampling uses only a small number of majority class samples to keep a balance with the minority class samples. These methods have been found to improve the overall classification performance to some extent. An undersampling and ensembling strategy was demonstrated to be more effective for imbalanced classification: it trains several classifiers, each on undersampled majority class samples together with the minority class samples, and combines them into an ensemble. Inspired by this strategy, this paper constructs a novel tree-based ensemble method for the imbalanced classification of stage IV cancer survivability.
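The undersampling and ensembling strategy described above can be illustrated with a minimal sketch. It is not the tree-based method proposed in this paper: the base learner is a one-feature threshold classifier chosen for brevity, and the toy data, the function names and the ensemble size are all assumptions made for illustration.

```python
import random

def train_stump(samples):
    """Fit a one-feature threshold classifier that predicts 1 when x >= threshold."""
    best_threshold, best_errors = None, len(samples) + 1
    for threshold, _ in samples:
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def undersampled_ensemble(majority, minority, n_models, rng):
    """Train one stump per balanced subset: all minority samples plus an
    equally sized random draw (without replacement) from the majority class."""
    thresholds = []
    for _ in range(n_models):
        subset = rng.sample(majority, len(minority)) + minority
        thresholds.append(train_stump(subset))
    return thresholds

def predict(thresholds, x):
    """Combine the stumps by majority vote."""
    votes = sum(x >= t for t in thresholds)
    return int(2 * votes > len(thresholds))

rng = random.Random(0)
# Hypothetical toy data: label 0 is the majority class, label 1 the minority class.
majority = [(rng.gauss(0.0, 1.0), 0) for _ in range(500)]
minority = [(rng.gauss(2.0, 1.0), 1) for _ in range(25)]

ensemble = undersampled_ensemble(majority, minority, n_models=11, rng=rng)
# Sensitivity: fraction of minority (positive) samples predicted as positive.
sensitivity = sum(predict(ensemble, x) for x, _ in minority) / len(minority)
print(f"sensitivity on the minority class: {sensitivity:.2f}")
```

Because every base learner sees a different draw from the majority class, the ensemble as a whole uses far more majority samples than any single undersampled classifier does.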
2.2. Ensemble learning methods
Efforts have been made to improve existing ensemble methods and thereby the prediction performance. Combining one ensemble learning method with another method is one typical idea. Yao et al. proposed combining random forest with the multivariate adaptive regression splines algorithm to generate models for breast cancer survivability prediction, and the performance of the proposed method was shown to be slightly better than that of random forest. Conducting feature selection to reduce the size of the data before training prediction models with random forest is another way to improve the accuracy of cancer survivability prediction. In addition, optimizing the key parameters of ensemble methods is believed to help achieve high classification accuracy. Qian et al. used random forest to locate the incidence position of prostate cancer. All parameter combinations were tried, and the best parameters were chosen using cross-validation to achieve the highest accuracy, at the price of a higher computational cost.
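The exhaustive parameter search with cross-validation mentioned above can be sketched as follows. This is an illustration only: a one-dimensional k-nearest-neighbour classifier stands in for random forest, and the parameter names (`k`, `weighted`), the grid values and the toy data are assumptions rather than the settings used in the cited work.

```python
import itertools
import random

def knn_predict(train, x, k, weighted):
    """Classify x by an (optionally distance-weighted) vote of its k nearest points."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    score = 0.0
    for xi, yi in nearest:
        weight = 1.0 / (abs(xi - x) + 1e-9) if weighted else 1.0
        score += weight if yi == 1 else -weight
    return int(score > 0)

def cv_accuracy(data, n_folds, **params):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [s for i, s in enumerate(data) if i % n_folds != fold]
        correct += sum(knn_predict(train, x, **params) == y for x, y in test)
    return correct / len(data)

def grid_search(data, grid, n_folds=5):
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(data, n_folds, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

rng = random.Random(1)
# Hypothetical toy data: two overlapping one-dimensional classes.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(60)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(60)]
rng.shuffle(data)

grid = {"k": [1, 3, 5, 7], "weighted": [False, True]}
best_params, best_score = grid_search(data, grid)
print(best_params, round(best_score, 3))
```

The number of models to be trained equals the product of the grid sizes times the number of folds (here 4 x 2 x 5 = 40), which is the source of the higher computational cost.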
It is widely recognized that a good ensemble needs a large number of diverse and accurate base learners. One popular approach to obtaining different base learners is to manipulate the features of the dataset. For example, Ho proposed a random subspace method for constructing a decision-tree-based ensemble classifier, in which randomly selected subsets of features are used as the input to promote diversity. Such methods can generate abundant base learners; however, they have difficulty determining the trade-off between diversity and accuracy that is actually needed. Zhou et al. proposed the concept of selective ensemble, which refers to an ensemble constructed from a selected subset of the generated base learners; the selection process is also called ensemble pruning. Selective ensembles can improve the prediction performance by removing base learners that do not contribute to reducing the generalization error, which also saves computational resources while achieving similar or better accuracy than the ensemble before selection.
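The two ideas in this paragraph, generating base learners on random feature subsets and then keeping only a selected subset of them, can be sketched together. The sketch below is not the algorithm of Ho or of Zhou et al.: it assumes a nearest-centroid base learner instead of a decision tree, a simple greedy forward selection on a held-out validation split as the pruning rule, and hypothetical toy data in which only the first two of six features are informative.

```python
import random

def fit_centroids(rows, labels, features):
    """Nearest-centroid base learner restricted to the given feature subset."""
    centroids = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [sum(r[f] for r in members) / len(members) for f in features]
    return features, centroids

def predict_one(model, row):
    features, centroids = model
    def dist(label):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(sorted(centroids), key=dist)

def vote(models, row):
    """Majority vote; ties go to the smallest label."""
    preds = [predict_one(m, row) for m in models]
    return max(sorted(set(preds)), key=preds.count)

def accuracy(models, rows, labels):
    return sum(vote(models, r) == y for r, y in zip(rows, labels)) / len(rows)

def random_subspace(rows, labels, n_models, subspace_size, rng):
    """Train each base learner on its own random subset of the features."""
    n_features = len(rows[0])
    return [fit_centroids(rows, labels, rng.sample(range(n_features), subspace_size))
            for _ in range(n_models)]

def prune(models, rows, labels):
    """Greedy forward selection: keep adding the learner that most improves
    validation accuracy, and stop as soon as no learner improves it."""
    selected, best = [], -1.0
    remaining = list(models)
    while remaining:
        scores = [accuracy(selected + [m], rows, labels) for m in remaining]
        top = max(scores)
        if top <= best:
            break
        best = top
        selected.append(remaining.pop(scores.index(top)))
    return selected

rng = random.Random(0)

def make_row(label):
    # Hypothetical data: two informative features followed by four noise features.
    informative = [rng.gauss(1.5 * label, 1.0) for _ in range(2)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
    return informative + noise

labels = [i % 2 for i in range(200)]
rows = [make_row(y) for y in labels]
train_rows, train_labels = rows[:120], labels[:120]
val_rows, val_labels = rows[120:], labels[120:]

pool = random_subspace(train_rows, train_labels, n_models=15, subspace_size=2, rng=rng)
selected = prune(pool, val_rows, val_labels)
print(len(pool), "learners generated,", len(selected), "kept after pruning")
```

Learners trained only on noise features tend to be left out by the selection step, which is the sense in which pruning removes members that do not help reduce the error.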
A number of selective ensemble classification methods have been proposed. Opitz designed an algorithm called genetic ensemble feature selection (GEFS), which employed a genetic algorithm (GA) to find a set of feature subsets for training a set of diverse and accurate neural networks from which an ensemble is constructed. Each feature subset was evaluated in terms of both the accuracy and the diversity of the neural networks, where diversity was defined as the average difference between the predictions of the component classifier and those of the ensemble. Qiang et al. used a clustering algorithm to prune the ensemble: the ensemble members are divided into multiple clusters according to their similarity, and the most accurate member of each cluster is selected. Elghazel et al. ranked the original members of the ensemble based on a predefined criterion and selected a subset of the members from the ordered list in sequence. Single-objective evolutionary algorithms were used to find accurate and diverse ensemble members by aggregating accuracy and diversity into a scalar fitness function [1,29]. Gu and Jin used a multi-objective evolutionary algorithm to generate ensembles that trade off between accuracy and diversity. Although it has been well recognized that selective ensembles can perform better than traditional ensemble methods for classification, not much research on selective ensembles for regression problems has been performed. It should be noted that most existing selective ensemble methods were developed for classification and cannot be directly used for regression problems.
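The diversity measure and the scalar aggregation of accuracy and diversity described above can be made concrete with a small sketch. It assumes binary 0/1 predictions, a majority-vote ensemble and a weighted-sum fitness; the weight value and the example predictions are hypothetical, and the cited methods may define these quantities differently in detail.

```python
def ensemble_vote(predictions):
    """Majority vote per sample; predictions[m][i] is member m's 0/1 output on sample i."""
    n_models = len(predictions)
    return [int(2 * sum(column) > n_models) for column in zip(*predictions)]

def diversity(member, ensemble_output):
    """Fraction of samples on which a member disagrees with the ensemble output."""
    return sum(p != e for p, e in zip(member, ensemble_output)) / len(member)

def accuracy(member, labels):
    return sum(p == y for p, y in zip(member, labels)) / len(labels)

def fitness(member, ensemble_output, labels, weight=0.5):
    """Scalar fitness: accuracy plus weight times diversity (weight is an assumption)."""
    return accuracy(member, labels) + weight * diversity(member, ensemble_output)

# Hypothetical predictions of three ensemble members on six samples.
labels = [1, 0, 1, 1, 0, 0]
predictions = [
    [1, 0, 1, 1, 0, 0],  # agrees with every label
    [1, 0, 1, 0, 0, 1],  # wrong on samples 3 and 5
    [0, 0, 1, 1, 1, 0],  # wrong on samples 0 and 4
]

ensemble_output = ensemble_vote(predictions)
for index, member in enumerate(predictions):
    print(index, accuracy(member, labels), diversity(member, ensemble_output),
          fitness(member, ensemble_output, labels))
```

In this example the two imperfect members err on different samples, so the majority vote recovers every label; this is why diversity is rewarded alongside accuracy. A multi-objective approach would instead keep accuracy and diversity as separate objectives rather than fixing a weight in advance.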