Cycle-consistent adversarial networks improve the generalizability of the radiomics model in grading meningiomas upon external validation

Patient population

The Yonsei University Institutional Review Board approved this retrospective study and waived the requirement for informed patient consent. All methods were performed in accordance with the relevant guidelines and regulations. We identified 297 patients who were pathologically confirmed to have meningioma and underwent baseline conventional MRI between February 2008 and September 2018 in the institutional dataset. Patients with 1) missing MRI sequences or inadequate image quality (n = 17), 2) a history of surgery (n = 15), 3) a history of tumor embolization or gamma knife surgery prior to MRI examination (n = 5), and 4) an image-processing error (n = 2) were excluded. A total of 257 patients (low-grade, 162; high-grade, 95) were enrolled in the institutional cohort.

Identical inclusion and exclusion criteria were applied to identify 62 patients (low-grade, 47; high-grade, 15) from Ewha Mokdong University Hospital between January 2016 and December 2018 for external model validation. The patient flowchart is shown in Fig. S1.

Pathological diagnosis

Pathological diagnosis was made by neuropathologists according to the WHO criteria19. Criteria for atypical meningioma (WHO grade 2) included 4 to 19 mitoses per 10 high-power fields, the presence of brain invasion, or the presence of at least three of the following characteristics: sheet-like growth, hypercellularity, necrosis, large prominent nucleoli, and small cells. Criteria for anaplastic meningioma (WHO grade 3) included frank anaplasia (histology resembling carcinoma, sarcoma, or melanoma) or a high mitotic count (20 or more mitoses per 10 high-power fields)19.

MRI protocol

In the institutional training dataset, patients were scanned on 3.0 Tesla MRI units (Achieva or Ingenia; Philips Medical Systems). Imaging protocols included T2-weighted imaging (T2) and contrast-enhanced T1-weighted imaging (T1C). T1C images were acquired after administration of 0.1 mL/kg of gadolinium-based contrast product (Gadovist; Bayer).

In the external validation set, patients were scanned on 1.5 or 3.0 Tesla MRI units (Avanto; Siemens or Achieva; Philips Medical Systems), and imaging included T2 and T1C sequences. T1C images were acquired after administration of 0.1 mL/kg of gadolinium-based contrast agent (Dotarem; Guerbet or Gadovist; Bayer). Acquisition parameters for T2 and T1C varied substantially among the different MRI units of the institutional and external validation sets, reflecting the heterogeneity of meningioma imaging data in clinical practice (Supplementary Table 1).

Image pre-processing and radiomic feature extraction

Image resampling to 1-mm isovoxels, correction of low-frequency intensity non-uniformity with the N4 bias field algorithm, and co-registration of T2 images to T1C images were performed using Advanced Normalization Tools (ANTs)20. After skull stripping with Multi-cONtrast brain STRipping (MONSTR)21, signal intensities were normalized by z-scoring. An affine registration was performed to transform the brain images to the MNI152 template22.
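The z-scoring step above can be illustrated with a minimal NumPy sketch (the actual pipeline used ANTs and MONSTR; the function name and toy volume here are illustrative only). Each voxel inside the brain mask is mapped to zero mean and unit variance:

```python
import numpy as np

def zscore_normalize(volume, brain_mask):
    """Z-score normalize voxel intensities inside the brain mask.

    volume: 3-D array of MRI signal intensities
    brain_mask: boolean array of the same shape (True inside the brain)
    """
    voxels = volume[brain_mask]
    mu, sigma = voxels.mean(), voxels.std()
    normalized = volume.astype(float).copy()
    normalized[brain_mask] = (voxels - mu) / sigma
    return normalized

# toy example: a random "volume" with an all-true mask
rng = np.random.default_rng(0)
vol = rng.normal(100.0, 15.0, size=(8, 8, 8))
mask = np.ones_like(vol, dtype=bool)
norm = zscore_normalize(vol, mask)
```

Restricting the statistics to the brain mask matters: including background voxels would bias the mean and standard deviation and make intensities incomparable across scanners.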

A neuroradiologist (with 9 years of experience) who was blinded to clinical information semi-automatically segmented the entire tumor (including cystic or necrotic changes) on the T1C images using 3D Slicer software (v.4.13.0; www.slicer.org) with edge- and threshold-based algorithms. Another neuroradiologist (with 16 years of experience) reassessed and confirmed the segmented lesions.

The radiomic features were extracted with a Python-based module (PyRadiomics, version 2.0)23 with a bin size of 32. They included (1) 14 shape features, (2) 18 first-order features, and (3) 75 second-order texture features (from the gray-level co-occurrence matrix, gray-level run-length matrix, gray-level size-zone matrix, gray-level dependence matrix, and neighboring gray-tone difference matrix) (Supplementary Material S1 and Supplementary Table 2). The features adhere to the standards set by the Image Biomarker Standardization Initiative24. A total of 214 radiomic features (107 × 2 sequences) were extracted.
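Before the second-order texture matrices are built, voxel intensities are discretized into bins; a minimal sketch of fixed-bin-width discretization (an assumption about how the bin size of 32 was applied, with an illustrative function name and toy ROI) is:

```python
import numpy as np

def discretize_fixed_bin_width(intensities, bin_width=32):
    """Discretize ROI intensities into bins of a fixed width,
    as is done before computing second-order texture features.
    Bin indices are 1-based."""
    shifted = intensities - intensities.min()
    return np.floor(shifted / bin_width).astype(int) + 1

# toy region of interest
roi = np.array([0.0, 10.0, 31.9, 32.0, 64.0, 95.5])
print(discretize_fixed_bin_width(roi))  # -> [1 1 1 2 3 3]
```

A coarser binning (larger bin width) yields smaller, denser texture matrices and features that are less sensitive to noise, at the cost of intensity resolution.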

Construction of radiomic models

The scheme for building the radiomics model and for setting up an application system based on CycleGAN is shown in Fig. 1a. Radiomic features were normalized with min-max scaling. Because the number of radiomic features exceeded the number of patients, mutual information was applied to select significant features. The baseline radiomic classifiers were constructed using extreme gradient boosting with tenfold cross-validation in the training set. The synthetic minority oversampling technique was applied to oversample the minority class25. To improve predictive performance and avoid overfitting, Bayesian optimization, which searches the hyperparameter space for the optimal hyperparameter combination, was applied. The area under the curve (AUC), precision, sensitivity, specificity, and F1 score (definitions presented in Supplementary Material S2) were obtained. Feature selection and machine learning were performed in Python 3 with the Scikit-Learn library (version 0.24.2).
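The modeling steps above can be sketched as a scikit-learn pipeline. This is a simplified illustration on synthetic data, not the authors' code: it substitutes `GradientBoostingClassifier` for XGBoost, fixes `k=20` selected features arbitrarily, and omits SMOTE and Bayesian hyperparameter optimization.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# synthetic stand-in for 214 radiomic features of 257 patients
X, y = make_classification(n_samples=257, n_features=214,
                           n_informative=10, random_state=42)

pipe = Pipeline([
    ("scale", MinMaxScaler()),                             # min-max normalization
    ("select", SelectKBest(mutual_info_classif, k=20)),    # mutual-information feature selection
    ("clf", GradientBoostingClassifier(random_state=42)),  # stand-in for extreme gradient boosting
])

aucs = cross_val_score(pipe, X, y, cv=10, scoring="roc_auc")  # tenfold cross-validation
print(f"mean cross-validated AUC: {aucs.mean():.3f}")
```

Keeping scaling and feature selection inside the pipeline ensures they are refit on each training fold, so no information from the held-out fold leaks into feature selection.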

Figure 1

(a) Global CycleGAN pipeline and radiomics for meningioma grading. (b) General network architecture of CycleGAN. CycleGAN = Cycle-Consistent Adversarial Networks, T1C = post-contrast T1-weighted image, T2 = T2-weighted image.

CycleGAN application

Figure 1b shows the general network architecture of CycleGAN. A generative adversarial network (GAN) consists of two neural networks, a generator and a discriminator, which serve distinct purposes. CycleGAN uses two sets of GANs for style transfer to train unsupervised image-translation models16. The unpaired institutional training and external validation datasets were used to train CycleGAN's discriminators and generators.

To be fed into CycleGAN16, the brain MRIs were converted to two-dimensional images along the axial, sagittal, and coronal planes. Because image size varied between institutions and individuals, images were resized to 99 × 117 × 95 voxels after registration to the MNI152 template and to 116 × 116 pixels before being input into CycleGAN.

In the first set of GANs, the first generator (G1) of CycleGAN converts images from the external validation dataset to the domain of the institutional training dataset, while the first discriminator (D1) checks whether the images computed by G1 are real or fake (generated). Through this process, the synthetic images from G1 improve with the feedback of their respective discriminators. In the second set of GANs, the second generator (G2) transfers the synthetic image generated by the first generator (G1) back to an image of the original external validation dataset, while the second discriminator (D2) checks whether the images computed by G2 are real or fake (generated). Through this process, the trained CycleGAN model transferred the style of the external validation images to that of the training set. The cycle-consistency loss, which is the difference between the generated output and the input image, was calculated and used to update the generator models at each training iteration16. The L2 loss, which is known to speed up the training process and to generate crisp, realistic images in GANs26,27, was used to estimate the cycle-consistency loss. Inference results were randomly sampled and checked for plausibility by a neuroradiologist (with 9 years of experience). Images from the external validation set after CycleGAN were submitted to assess the performance of the radiomics model against the original external validation dataset. Because both the original external validation set and the images from the external validation set after CycleGAN were independent of the radiomics modeling in the training process, there was no potential data leakage28. Details of the CycleGAN architecture are shown in Supplementary Table 3.
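The L2 cycle-consistency loss described above can be sketched numerically. Here the two "generators" are toy invertible functions standing in for the trained G1 and G2 networks (all names are illustrative, not the authors' implementation):

```python
import numpy as np

def cycle_consistency_l2(x, g1, g2):
    """L2 cycle-consistency loss: map x to the other domain with g1,
    map it back with g2, and compare the reconstruction with the input."""
    reconstructed = g2(g1(x))
    return float(np.mean((x - reconstructed) ** 2))

# toy "generators": an affine style transfer and its exact inverse
def g1(img):                      # external -> training style (toy)
    return 2.0 * img + 1.0

def g2(img):                      # training -> external style (toy)
    return (img - 1.0) / 2.0

img = np.random.default_rng(1).random((116, 116))  # 116 x 116, as input to CycleGAN
loss = cycle_consistency_l2(img, g1, g2)
print(loss)  # near zero, because the toy generators are exact inverses
```

For real generators the reconstruction is imperfect, and this loss term pushes G1 and G2 toward mutually inverse mappings, which is what prevents the style transfer from discarding anatomical content.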

Evaluation of the effect of CycleGAN: Fréchet Inception Distance and t-Distributed Stochastic Neighbor Embedding

The Fréchet inception distance (FID) was calculated to measure the similarity between two sets of image data and thereby quantitatively assess the quality of the generated data (Supplementary Material S3)29. The FID is an extension of the Inception Score30 and compares the distribution of generated images with the distribution of the real images that were used to train the generator. The FID has been shown to be consistent with human judgment and more robust to noise than the Inception Score29. Three FID scores, namely "training vs. original external validation", "original external validation vs. transferred external validation", and "training vs. transferred external validation", were calculated. To visualize the effect of CycleGAN on the extracted radiomic features, the high-dimensional feature space was projected onto a two-dimensional manifold and visualized using t-distributed stochastic neighbor embedding (t-SNE)31.
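The Fréchet distance underlying the FID can be sketched in NumPy. In practice the FID is computed on Inception-v3 activations; this simplified sketch takes precomputed feature vectors and models each set as a multivariate Gaussian (function names and toy data are illustrative):

```python
import numpy as np

def _matrix_sqrt(m):
    """Square root of a matrix with positive real eigenvalues
    (the product of two positive-definite covariances qualifies)."""
    vals, vecs = np.linalg.eig(m)
    return np.real(vecs @ np.diag(np.sqrt(vals)) @ np.linalg.inv(vecs))

def frechet_distance(feats_a, feats_b):
    """Frechet distance between two feature sets (rows = samples):
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 * sqrt(C_a @ C_b))."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    diff = mu_a - mu_b
    covmean = _matrix_sqrt(cov_a @ cov_b)
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(500, 4))
b = rng.normal(0.5, 1.0, size=(500, 4))
print(frechet_distance(a, a))  # near zero: identical distributions
print(frechet_distance(a, b))  # larger: the means differ
```

A successful style transfer should therefore shrink the "training vs. transferred external validation" FID relative to the "training vs. original external validation" FID.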
