Study of Random Forest and Its Variants
Master's thesis — ensemble learning, hyperparameter tuning, and imbalanced classification
Author
Rishabh Bhartiya
Date
September 2022
Institution
Univ. Milano
Supervisor
Prof. Gabriele Gianini
Abstract
This thesis investigates Random Forest and its principal variants (ExtraTreesClassifier, BalancedRandomForest, and EasyEnsemble) across three applied medical and financial datasets. The study examines how hyperparameter tuning (n_estimators, max_depth, min_samples_split, class_weight) affects generalization on imbalanced datasets where correctly identifying the minority class is clinically or financially critical. Experiments were conducted on credit card fraud detection, breast cancer diagnosis, and heart disease prediction, comparing baseline RF against tuned variants using precision, recall, F1, AUC-ROC, and confusion matrix analysis. The results demonstrate that hyperparameter-tuned Random Forest with balanced class weights consistently outperforms baseline configurations, with BalancedRandomForest achieving the largest gains on the most severely imbalanced dataset (credit card fraud: 0.17% positive rate).
Compared 4 RF variants across credit card fraud, breast cancer, and heart disease datasets
BalancedRandomForest improved minority-class recall from 71% → 92% on fraud detection
Identified class_weight as the single most impactful hyperparameter for imbalanced classification
Full Python/Scikit-learn implementation with stratified 5-fold cross-validation
Motivation
Random Forest is one of the most widely deployed ensemble methods in industry, yet its behavior on imbalanced datasets is poorly understood by practitioners. Class imbalance is the norm, not the exception, in medical diagnostics and financial fraud detection. This thesis was motivated by a practical question: which RF variant, with which hyperparameters, performs best when the minority class matters most?
Research Questions
- How does Random Forest performance degrade as class imbalance increases?
- Which RF variant — standard, ExtraTrees, BalancedRF, or EasyEnsemble — best handles severe imbalance?
- What is the marginal impact of individual hyperparameters on minority-class recall?
- Do findings generalize across different domain datasets?
Datasets
- Credit Card Fraud (Kaggle) — 284,807 transactions, 0.17% fraud rate. Extreme imbalance.
- Breast Cancer Wisconsin — 569 samples, malignant vs benign. Moderate imbalance.
- Heart Disease (UCI) — 303 samples, presence vs absence. Near-balanced.
Methods
All experiments were implemented in Python using Scikit-learn. The study compares four classifier families (a minimal comparison sketch follows the list):
- RandomForestClassifier — baseline and hyperparameter-tuned configurations
- ExtraTreesClassifier — extra randomness in split selection, faster training
- BalancedRandomForestClassifier — undersamples majority class at each bootstrap
- EasyEnsembleClassifier — ensemble of AdaBoost on balanced subsamples
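As an orientation, the comparison can be set up as below. This is a minimal sketch, assuming X and y are a prepared feature matrix and label vector; the constructor settings are illustrative defaults, not the tuned configurations reported in the thesis.
# Minimal sketch: cross-validated macro F1 for the four classifier families.
# X, y are placeholders for a prepared feature matrix and label vector.
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from imblearn.ensemble import BalancedRandomForestClassifier, EasyEnsembleClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

models = {
    'RandomForest': RandomForestClassifier(n_estimators=200, random_state=42),
    'ExtraTrees': ExtraTreesClassifier(n_estimators=200, random_state=42),
    'BalancedRF': BalancedRandomForestClassifier(n_estimators=200, random_state=42),
    'EasyEnsemble': EasyEnsembleClassifier(n_estimators=10, random_state=42),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring='f1_macro', n_jobs=-1)
    print(f"{name}: macro F1 = {scores.mean():.3f} ± {scores.std():.3f}")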
Hyperparameters tuned via Grid Search with stratified 5-fold cross-validation:
n_estimators (50–500), max_depth (None, 10, 20, 30),
min_samples_split (2, 5, 10), class_weight ('balanced', None),
max_features ('sqrt', 'log2', None).
Key Findings
- On the credit card fraud dataset, BalancedRandomForest improved minority-class recall from 0.71 (baseline RF) to 0.92, at the cost of a precision decrease from 0.88 to 0.79
- ExtraTreesClassifier provided 30–40% faster training than standard RF with equivalent or slightly better generalization on near-balanced datasets
- The class_weight='balanced' parameter was the single most impactful hyperparameter for minority-class F1 on all three datasets (the weighting is illustrated after this list)
- EasyEnsemble achieved the highest AUC-ROC across all datasets (0.98, 0.99, 0.97), but at 4× the inference time of standard RF
- Hyperparameter tuning consistently yielded 5–15% improvement in minority-class F1 over default configurations
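What class_weight='balanced' does is worth spelling out: scikit-learn weights each class by n_samples / (n_classes * n_c), so the rare class contributes proportionally more to the split criterion. The snippet below is a minimal sketch using a synthetic label vector chosen to approximate the fraud dataset's positive rate; it is not data from the thesis.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic labels approximating a ~0.17% positive rate (illustrative only).
y = np.array([0] * 9983 + [1] * 17)
# 'balanced' assigns weight n_samples / (n_classes * n_c) to each class c.
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # {0: ~0.50, 1: ~294.12}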
Technical Implementation
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from imblearn.ensemble import BalancedRandomForestClassifier, EasyEnsembleClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.metrics import classification_report, roc_auc_score

# Hyperparameter grid for the standard Random Forest baseline.
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
    'class_weight': ['balanced', None]
}

# Stratified folds preserve the class ratio in every split, which matters
# when the positive rate is as low as 0.17%.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Grid search optimizes macro-averaged F1 so the minority class contributes
# equally to model selection.
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring='f1_macro',
    n_jobs=-1
)
grid_search.fit(X_train, y_train)

# Evaluate the best configuration on the held-out test set.
best_rf = grid_search.best_estimator_
y_pred = best_rf.predict(X_test)
print(classification_report(y_test, y_pred))
print(f"AUC-ROC: {roc_auc_score(y_test, best_rf.predict_proba(X_test)[:,1]):.4f}")
Conclusion
The thesis demonstrates that algorithm selection and hyperparameter tuning are more impactful
than architecture complexity for tabular imbalanced classification. BalancedRandomForest is
the recommended choice when minority-class recall is the primary objective; standard RF with
class_weight='balanced' is the best default for general use.
The study provides a reproducible experimental framework applicable to any imbalanced
binary classification problem.