Comparison of machine learning algorithms and oversampling techniques for urinary toxicity prediction after prostate cancer radiotherapy

Title: Comparison of machine learning algorithms and oversampling techniques for urinary toxicity prediction after prostate cancer radiotherapy
Publication type: Conference Paper
Year of publication: 2019
Authors: Mylona E, Lebreton C, Fontaine P, Supiot S, Magne N, Crehange G, de Crevoisier R, Acosta O
Conference name: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE)
Publisher: IEEE; IEEE Comp Soc
Conference location: 345 E 47TH ST, NEW YORK, NY 10017 USA
ISBN: 978-1-7281-4617-1
Keywords: Imbalanced data, Machine learning, Prostate cancer radiotherapy, radiotherapy, Urinary toxicity
Abstract

Prostate cancer radiotherapy unavoidably involves the irradiation not only of the target volume, but also of healthy organs-at-risk, neighboring the prostate, likely causing adverse, toxicity-related side-effects. Specifically, in the case of urinary toxicity, these side effects might be associated with a variety of dosimetric, clinical and genetic factors, making its prediction particularly challenging. Given the inconsistency of available data concerning radiation-induced toxicity, it is crucial to develop robust models with superior predictive performance in order to perform tailored treatments. Machine Learning techniques emerge as appealing in this context, nevertheless without any consensus on the best algorithms to be used. This work proposes a comparison of several machine-learning strategies together with different minority class oversampling techniques for prediction of urinary toxicity following prostate cancer radiotherapy using dosimetric and clinical data. The performance of these classifiers was evaluated on the original dataset and using four different synthetic oversampling techniques. The area under the ROC curve (AUC) and the F-measure were employed to evaluate their performance. Results suggest that, regardless of the technique, oversampling always increases the prediction performance of the models (p=0.004). Overall, oversampling with Synthetic Minority Oversampling Technique (SMOTE) followed by Edited Nearest Neighbour algorithm (ENN) together with Regularized Discriminant Analysis (RDA) classifier provide the best performance (AUC=0.71).
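The oversampling step named in the abstract (SMOTE followed by ENN) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the helper names (`nearest`, `smote`, `edited_nearest_neighbours`), the two-feature toy data, the choice of `k`, and the majority-vote editing rule are all assumptions made for illustration, and the sketch assumes a binary problem with at least two minority samples.

```python
import math
import random
from collections import Counter


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx."""
    order = sorted(
        (j for j in range(len(points)) if j != idx),
        key=lambda j: math.dist(points[idx], points[j]),
    )
    return order[:k]


def smote(X, y, minority, k=3, rng=None):
    """Add synthetic minority samples until the two classes have equal size."""
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    # Binary assumption: every sample that is not minority is majority.
    n_new = (len(y) - len(minority_pts)) - len(minority_pts)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(nearest(i, minority_pts, k))
        gap = rng.random()  # position along the segment between the two samples
        synthetic = [a + gap * (b - a)
                     for a, b in zip(minority_pts[i], minority_pts[j])]
        X_out.append(synthetic)
        y_out.append(minority)
    return X_out, y_out


def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label differs from the majority vote of their k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data invented for illustration: 8 samples of class 0, 3 of class 1.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [0.2, 0.4], [0.4, 0.1], [0.5, 0.3], [0.3, 0.5],
     [1.0, 1.1], [1.2, 0.9], [0.9, 1.2]]
y = [0] * 8 + [1] * 3

X_bal, y_bal = smote(X, y, minority=1, k=2, rng=random.Random(42))
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal, k=3)
print(Counter(y), Counter(y_bal), Counter(y_clean))
```

In practice, resampling of this kind is usually applied to the training folds only, so that synthetic samples do not leak into the data used to compute the AUC or F-measure. A maintained library implementation would normally be preferred over a hand-written one.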

DOI: 10.1109/BIBE.2019.00180