We present a method to improve the accuracy of renal tumor segmentation in CT images by considering the structure of kidneys. We apply deep learning based on the following protocol. First, kidney regions are extracted from the CT images. Second, eight adjacent slices along the axial direction are extracted as a patch. Third, the center of gravity of the kidney region in the patch is aligned to the center of the patch in the sagittal and coronal directions. Fourth, we apply data augmentation with scaling and rotation around the center, which preserves the basic slice structure of kidneys. Finally, these patches are fed to a 3D U-Net for training. Compared with the conventional 3D U-Net, the proposed method improves the DICE score from 0.507 to 0.604.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
The number of patients with renal cancer is increasing every year and this trend is expected to continue in the coming years. Since renal tumors are often discovered accidentally during tests for other diseases, there is a possibility that some doctors may miss a lesion.
Recently, automatic organ and tumor segmentation using machine learning has been actively studied, and numerous competitions have been held in this research area. Almost all researchers in this field use deep learning to tackle the problem.
In 2012, the deep learning-based method proposed by Krizhevsky et al. greatly outperformed conventional methods in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Since then, deep learning has dominated the field of image recognition and has also been widely used in medical imaging.
2D U-Net and 3D U-Net, which are variations of convolutional neural networks (CNNs) designed for segmentation tasks, are often used to label organs or tumors in medical imaging. Since CT images have a 3D structure, 3D U-Net is used most often, as was also the case in the renal tumor detection competition KiTS19.
Since 3D U-Net requires large memory capacity because of its 3D structure, it is often the case that the image data size fed to the network is quite limited, which leads to decisions based only on local information, causing false positives in implausible regions. Moreover, 3D U-Net has more parameters than 2D U-Net, which causes unstable learning when the distribution of classes is unbalanced.
Several studies have tried to address these drawbacks. One solution for the former drawback is to combine global and local networks to remove implausible noise. In brain tumor segmentation, Kamnitsas et al. fed high- and low-resolution data to dedicated CNNs and combined their outputs to address the problem of locality. Another solution is 3D U-JAPA-Net, which uses a probability atlas of each organ location and applies 3D U-Net in the plausible location of each organ. It also applies fine-tuning by transfer learning to improve DICE scores. To address the latter drawback, AH-Net learns information between slices efficiently with few parameters by transferring 2D network features to a 3D network.
In hepatic tumor segmentation, Tang et al. improved segmentation accuracy with E2Net, in which the edge regions and the remaining regions of the liver and hepatic tumors are learned in parallel. Wang et al. proposed dual-level selective transfer learning, in which CT images of two phases are used to obtain better segmentation results.
In the segmentation of renal tumors, some participants in KiTS19 tried to improve accuracy by modifying the network to use residual layers [12,13], but no dramatic change in results was observed. Li et al. proposed a memory-efficient automatic kidney and tumor segmentation method that applies depthwise separable convolution and a non-local context guided mechanism. Ruan et al. proposed Mt-UcGAN, which unifies quantification of tumor indices and uncertainty estimation with the segmentation of renal tumors.
In this study, we present a new learning method using a 3D U-Net, in which errors in implausible regions are suppressed by taking into account the global structure of kidneys. We propose the use of a wide and thin patch with the center of gravity of the kidney region aligned to the center of the patch. In the conventional 3D U-Net, the patch shape is close to a cube so as to capture the overall 3D structure on average, whereas the proposed system adopts a flat cuboid patch that covers the whole structure in the axial (transverse) plane, which tends to have rotational symmetry when the kidney region is located in the center of each patch. Our proposed pre-processing enables natural data expansion with plausible kidney slice structures, so that data augmentation by rotation and scaling facilitates effective training. The proposed data augmentation is much simpler than the elaborate methods for CT imaging [16–18].
2.1 Conventional 3D U-Net learning scheme
The goal of this paper is to improve the segmentation accuracy of renal tumors by using a 3D U-Net with data augmentation that takes into account the kidney slice structures. To measure the effectiveness of the proposed method, we compare it with the standard 3D U-Net training method under the same conditions.
First, we describe the conventional 3D U-Net used in this study as the experimental control (Fig. 1). In the contracting path, two convolutional layers with a kernel size of 3×3×3 are adopted in each step. In each convolution, the input is zero-padded at the edges so that the input and output sizes are equal. Then, each convolution is followed by batch normalization (BN), activation by the rectified linear unit (ReLU), and max pooling with a 2×2×2 kernel and a stride of 2×2×2 for feature extraction and down-sampling. In addition, the number of channels is doubled in the second convolution in each step. Every step in the expansive path consists of an up-sampling of the feature map followed by a 2×2×2 up-convolution that doubles the resolution, a concatenation with the feature map from the contracting path, and two 3×3×3 convolutions, each followed by BN and ReLU. In the final layer, a convolution with kernel size 1×1×1 is applied to map each 64-component feature vector to the desired number of classes, where a sigmoid function is used for classification instead of BN and ReLU.
In the conventional 3D U-Net learning scheme, we set the input size to 32×48×48 (axial, coronal, sagittal) and the output size to be the same as the input. We train this model to predict renal tumor regions. Training data are generated by raster-scanning regions of the above size from the CT images and ground truth labels. Note that only the portions containing renal tumors are used for training. Since the portions including tumors are limited, each patch is generated so that it has a 50% overlap with neighboring patches in each direction. During testing, only the portions containing kidney labels obtained as described in subsection 2.3 are predicted.
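The raster scan with 50% overlap can be illustrated with a short sketch. The function names `patch_starts` and `raster_scan` are hypothetical, and the handling of the volume border (adding a last patch flush with the edge) is an assumption, since the text does not say how borders are treated.

```python
def patch_starts(length, patch, overlap=0.5):
    """Start indices of patches of size `patch` along an axis of size `length`."""
    if length < patch:
        raise ValueError("axis is shorter than the patch")
    stride = max(1, int(round(patch * (1.0 - overlap))))
    starts = list(range(0, length - patch + 1, stride))
    # Assumption: add one last patch flush with the border so no voxel is skipped.
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def raster_scan(shape, patch_shape=(32, 48, 48), overlap=0.5):
    """All (axial, coronal, sagittal) patch origins for a volume of `shape`."""
    axes = [patch_starts(n, p, overlap) for n, p in zip(shape, patch_shape)]
    return [(z, y, x) for z in axes[0] for y in axes[1] for x in axes[2]]


origins = raster_scan((64, 96, 96))
print(len(origins), origins[:3])
```

In training, only the origins whose patch contains tumor voxels would be kept, as stated above; that filtering step is omitted here.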
2.2 Pre-processing of images to generate thin patches
In this subsection, we describe the data generation method proposed in this paper. To make it easier for the model to learn the structure of kidneys, input images are pre-processed as follows.
The procedure of patch generation is shown in Fig. 2. First, we locate the kidney region and define its bounding box. Second, the bounding box is placed at the center of a larger box. Third, eight neighboring axial slices are extracted from this larger box (Non-Aligned Thin Patch: NAT-Patch). Fourth, the image is shifted within the axial plane so that the center of gravity of the kidney region in the eight slices is located at the center of the patch (Aligned Thin Patch: AT-Patch). Here, the center of gravity of the kidney region is calculated by averaging the coordinates of the voxels labeled as kidney in the ground truth. (Note that the ground truth is used only in the training process.)
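The alignment step can be sketched with NumPy as follows. This is a simplified illustration, not the authors' implementation: it crops a 96×96 window directly around the center of gravity instead of going through the bounding box and larger box, and the zero padding at the image border, the rounding of the centroid, and the function names are all assumptions.

```python
import numpy as np


def kidney_centroid(mask):
    """Mean (row, column) index of kidney voxels in a (slices, H, W) mask."""
    _, ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("no kidney voxels in this slice stack")
    return float(ys.mean()), float(xs.mean())


def aligned_thin_patch(volume, mask, size=96):
    """Crop a (slices, size, size) patch centred on the kidney centroid."""
    cy, cx = kidney_centroid(mask)
    half = size // 2
    # Assumption: pad with zeros so the crop never leaves the array.
    padded = np.pad(volume, ((0, 0), (half, half), (half, half)), mode="constant")
    # In padded coordinates the centroid sits at (cy + half, cx + half),
    # so a window starting at (cy, cx) places it at index `half`.
    y0, x0 = int(round(cy)), int(round(cx))
    return padded[:, y0:y0 + size, x0:x0 + size]


mask = np.zeros((8, 120, 120), dtype=bool)
mask[:, 20:31, 80:91] = True            # toy "kidney" far from the image centre
patch = aligned_thin_patch(mask.astype(float), mask)
print(patch.shape, kidney_centroid(patch > 0))
```

The same call with the label volume in place of `volume` would crop the ground truth consistently with the image.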
The size of a patch is 8×96×96. Each patch has an 87.5% overlap with its neighboring patch along the axial direction (the stride is one slice). Note that in the conventional 3D U-Net, patches have to be moved along the axial, sagittal, and coronal directions, which causes a data explosion when the stride is too small. We set the output size to be the same as the input size. When testing, we find the center of gravity using the kidney label obtained as described in subsection 2.3 instead of the ground truth label. For comparison, we also experiment with the case where the image is not shifted to the center (NAT-Patch).
One notable feature of kidney slices is that they tend to have rotational symmetry geometrically. When data augmentation is applied by rotating and scaling the slice image, natural training data are generated if the center of gravity is located at the center of the image, which is set as the axis of rotation.
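The 87.5% figure follows directly from the patch depth and stride: with eight slices per patch and a stride of one slice, consecutive patches share seven of eight slices. A minimal sketch (the function names are hypothetical):

```python
def thin_patch_slices(num_slices, depth=8, stride=1):
    """(start, stop) axial ranges of thin patches in a stack of `num_slices`."""
    return [(z, z + depth) for z in range(0, num_slices - depth + 1, stride)]


def overlap_fraction(depth=8, stride=1):
    """Fraction of slices shared by two consecutive patches."""
    return (depth - stride) / depth


print(overlap_fraction())            # 7 / 8 of the slices are shared
print(thin_patch_slices(10))         # three patches fit in ten slices
```

Because the patch already spans the whole axial plane, the scan is one-dimensional, so the number of patches grows only linearly with the number of slices.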
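A rotation and scaling about the patch center can be sketched as below. The nearest-neighbour resampling, the zero fill outside the image, the rotation sign convention, and the function names are assumptions made for illustration; the paper does not specify the interpolation. The sampling ranges (0 to 360 degrees, 0.95 to 1.05) are those reported in the experiments section.

```python
import numpy as np


def rotate_scale_patch(patch, angle_deg, scale=1.0):
    """Rotate and scale every axial slice of a (slices, H, W) patch about its centre."""
    _, h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Inverse mapping: for each output pixel, look up the source pixel.
    dy, dx = (yy - cy) / scale, (xx - cx) / scale
    src_y = np.rint(cos_t * dy + sin_t * dx + cy).astype(int)
    src_x = np.rint(-sin_t * dy + cos_t * dx + cx).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(patch)          # pixels mapped from outside stay zero
    out[:, inside] = patch[:, src_y[inside], src_x[inside]]
    return out


def random_augment(patch, rng):
    """Draw a random angle and scale from the ranges reported in the paper."""
    angle = rng.uniform(0.0, 360.0)
    scale = rng.uniform(0.95, 1.05)
    return rotate_scale_patch(patch, angle, scale)


rng = np.random.default_rng(0)
demo = np.arange(2 * 6 * 6).reshape(2, 6, 6)
print(random_augment(demo, rng).shape)
```

The same angle and scale would have to be applied to the image patch and its label patch so that they stay aligned.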
2.3 Generating kidney labels with 3D U-Net
In order to segment renal tumors, kidney labels are needed in the beginning. In this subsection, we describe the method we use to generate them with 3D U-Net. We use datasets of two kinds, one of which comprises CT images, kidney labels, and renal tumor labels (kidney dataset). The other comprises CT images and liver labels (liver dataset). The information of liver labels is used to remove noise in the segmentation of kidneys.
The structure of the 3D U-Net used to segment livers and kidneys is shown in Fig. 3. The basic architecture is the same as that of the network shown in Fig. 1, so we describe only the differences. In the convolutional layers, input images are not padded. As a result, the input size is 116×132×132 and the output size is as small as 28×44×44. This model is adopted because the kidneys and the liver are much larger than renal tumors. The training data are generated by raster-scanning regions of the above size from the CT images and ground truth labels.
Training proceeds as shown in Fig. 4. First, a kidney segmentation model and a liver segmentation model are trained with the kidney dataset and the liver dataset, respectively. Second, kidney and liver labels are predicted for the kidney dataset by the networks trained above. Then, the predicted liver region is removed from the predicted kidney label. Finally, the two largest connected regions are extracted to eliminate speckles.
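The relation between the input and output sizes can be checked with a few lines of arithmetic. The sketch assumes three 2×2×2 pooling steps with two unpadded 3×3×3 convolutions per level; the depth is not stated in the text and is chosen here because it reproduces the reported sizes.

```python
def unpadded_unet_output(size, depth=3, convs_per_level=2, kernel=3):
    """Output length along one axis of a U-Net built from unpadded convolutions."""
    shrink = convs_per_level * (kernel - 1)   # each valid 3-wide conv removes 2 voxels
    for _ in range(depth):                    # contracting path
        size -= shrink
        if size % 2:
            raise ValueError("odd size before 2x pooling")
        size //= 2
    size -= shrink                            # bottom level
    for _ in range(depth):                    # expansive path: 2x up-conv, then convs
        size = size * 2 - shrink
    return size


print([unpadded_unet_output(n) for n in (116, 132, 132)])
```

Under these assumptions, 116 maps to 28 and 132 maps to 44, matching the 28×44×44 output quoted above.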
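The last two steps (removing the predicted liver region and keeping the two largest regions) can be sketched as follows. The 6-connectivity and the function names are assumptions; the paper does not state which connectivity is used. A library routine such as `scipy.ndimage.label` could replace the hand-written labelling.

```python
from collections import deque

import numpy as np

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def label_components(mask):
    """6-connected component labelling of a 3D boolean mask (breadth-first search)."""
    labels = np.zeros(mask.shape, dtype=int)
    sizes = {}
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue                      # voxel already belongs to a component
        current += 1
        labels[seed] = current
        queue, count = deque([seed]), 0
        while queue:
            z, y, x = queue.popleft()
            count += 1
            for dz, dy, dx in NEIGHBOURS:
                n = (z + dz, y + dy, x + dx)
                in_bounds = all(0 <= c < s for c, s in zip(n, mask.shape))
                if in_bounds and mask[n] and not labels[n]:
                    labels[n] = current
                    queue.append(n)
        sizes[current] = count
    return labels, sizes


def clean_kidney_label(kidney_pred, liver_pred, keep=2):
    """Remove predicted liver voxels, then keep the `keep` largest components."""
    mask = np.logical_and(kidney_pred, np.logical_not(liver_pred))
    labels, sizes = label_components(mask)
    largest = sorted(sizes, key=sizes.get, reverse=True)[:keep]
    return np.isin(labels, largest)


kidney = np.zeros((4, 10, 10), dtype=bool)
kidney[:, 1:4, 1:4] = True                # first kidney
kidney[:, 6:9, 6:9] = True                # second kidney
kidney[0, 0, 9] = True                    # isolated speckle
print(clean_kidney_label(kidney, np.zeros_like(kidney)).sum())
```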
3. Experiments and results
We compare accuracies given by the methods described in the previous section. We apply the conventional 3D U-Net and the 3D U-Net with NAT-Patch and AT-Patch to renal tumor detection.
In this study, we use two private datasets provided by the University of Tsukuba Hospital. One dataset consists of CT images of 105 patients with renal tumors, where the kidneys and renal tumors are labeled by radiologists (kidney dataset).
The other consists of CT images of 108 patients with hepatic tumors, where the liver and hepatic tumors are also labeled by experts (liver dataset). For pre-processing, all images were resampled to 3.0×0.78×0.78 mm. The CT intensity values were scaled to [0, 1] using a [−100, 400] HU window. The resolution of each CT slice image ranged from 333×333 to 634×634 pixels. The number of slices ranged from 48 to 341.
In renal tumor segmentation, data augmentation using rotation ranging from 0 to 360 degrees and scaling from 0.95 to 1.05 times was applied randomly to each patch in the 3D U-Net with NAT-Patch and AT-Patch to increase the number of training samples. In the conventional 3D U-Net, random flipping in each direction and scaling from 0.95 to 1.05 times were applied for data augmentation. All components of the model were implemented with the PyTorch framework. The PC we used was equipped with an Intel Core i7-8700K CPU, 32 GB of main memory, and an NVIDIA Quadro GV100 with 32 GB of video memory. The experimental settings were as follows: epoch = 300; learning rate = 0.0015; loss function = DICE Loss + Binary Cross-Entropy; batch size = 16; optimizer = Adam. We applied three-fold cross-validation in all the experiments to evaluate the performance of each method. We applied neither post-processing nor masks to reduce false positives.
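The intensity scaling can be written in two lines. This sketch assumes that values outside the [−100, 400] HU window are clipped before the linear mapping, which is the usual reading of a window but is not spelled out in the text; `window_hu` is a hypothetical name.

```python
import numpy as np


def window_hu(volume_hu, lo=-100.0, hi=400.0):
    """Clip CT values to the [lo, hi] HU window and map them linearly to [0, 1]."""
    clipped = np.clip(np.asarray(volume_hu, dtype=float), lo, hi)
    return (clipped - lo) / (hi - lo)


print(window_hu([-1000, -100, 150, 400, 2000]))
```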
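The combined loss can be illustrated in NumPy. The soft DICE formulation, the smoothing constant `eps`, and the equal weighting of the two terms are assumptions; the paper only states that the loss is the sum of DICE loss and binary cross-entropy.

```python
import numpy as np


def dice_bce_loss(prob, target, eps=1e-6):
    """Soft DICE loss plus mean binary cross-entropy for sigmoid outputs."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(prob * target)
    dice_loss = 1.0 - (2.0 * intersection + eps) / (np.sum(prob) + np.sum(target) + eps)
    bce = -np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    return dice_loss + bce


truth = np.array([1.0, 0.0, 1.0, 0.0])
print(dice_bce_loss(truth, truth))                 # close to zero for a perfect prediction
print(dice_bce_loss(np.full(4, 0.5), truth))       # an uninformative prediction costs more
```

The DICE term is insensitive to the large number of background voxels, while the cross-entropy term gives a per-voxel gradient, which is a common motivation for combining the two on imbalanced segmentation tasks.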
In kidney and liver segmentation, the experimental conditions were almost the same as above. The only differences were the absence of data augmentation and minor differences in parameters (epoch = 50; learning rate = 0.001; batch size = 3). The average DICE score for kidney segmentation was 0.922 for the kidney dataset. The liver segmentation DICE score was 0.961 for the liver dataset. After correction of the kidney labels with the liver labels, the average DICE score for kidneys improved to 0.926.
The results of these experiments are shown in Figs. 5, 6 and 7, where the average of DICE per case (N = 105) and the standard deviation are included. Also shown are the results of the paired t-tests between the results of each experiment. Figure 5 shows the results for all data. Figure 6 shows the results for data with small tumors (N = 75). Data with small tumors is defined as S < 0.2, where
The results for data with large tumors which meet S ≥ 0.2 are shown in Fig. 7 (N = 30).
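The per-case evaluation can be sketched in plain Python: a DICE score for each case, followed by a paired t statistic on the per-case scores of two methods. The convention that two empty masks score 1.0 and the function names are assumptions. The p-value lookup is omitted; a routine such as `scipy.stats.ttest_rel` could be used for it.

```python
import math


def dice_score(pred, truth):
    """DICE coefficient 2|A∩B| / (|A| + |B|) for two flattened binary masks."""
    pred = [bool(v) for v in pred]
    truth = [bool(v) for v in truth]
    denom = sum(pred) + sum(truth)
    if denom == 0:
        return 1.0                       # assumption: two empty masks agree perfectly
    overlap = sum(p and t for p, t in zip(pred, truth))
    return 2.0 * overlap / denom


def paired_t_statistic(scores_a, scores_b):
    """t statistic and degrees of freedom of a paired t-test on per-case scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired cases")
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("differences have zero variance")
    return mean / math.sqrt(var / n), n - 1


print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
print(paired_t_statistic([0.6, 0.7, 0.8], [0.5, 0.5, 0.5]))
```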
Figures 8 and 9 show examples of the predicted labels obtained in the experiment. In Fig. 8, the result obtained by the conventional 3D U-Net includes large prediction errors, whereas those of the 3D U-Net with NAT-Patch and AT-Patch do not. Figure 9 shows that the 3D U-Net with the proposed pre-processing (AT-Patch) identifies the tumor with higher accuracy.
As Fig. 5 shows, the proposed method gives significantly better results than the conventional method. As explained above, the augmented data given by rotation can be natural expansions of data with plausible images when the center of gravity is aligned in the center of image, because the axial slices of kidneys tend to have rotational symmetry.
As Figs. 6 and 7 show, the conventional and proposed methods give almost the same results for data with large tumors, while the proposed method outperforms the conventional method for data with small tumors. Since the basic structure of kidneys is preserved when the renal tumor is small, data augmentation taking into account the rotational symmetry of kidney structure works well as expected. Higher accuracy in detecting small tumors, which is a more difficult task for humans, is practically useful.
As Fig. 8 shows, the conventional 3D U-Net tends to produce false positives in various regions of kidneys, which results from the fact that only local data are fed to the 3D U-Net because of limited memory capacity. Since the entire kidney region in each slice is fed to the 3D U-Net, this kind of error can be reduced when we use the 3D U-Net with NAT-Patch and AT-Patch.
We have proposed a new deep learning method to improve the accuracy of renal tumor segmentation in CT images by taking into account the rotational symmetry of kidney slices. A 3D U-Net with a thin and wide patch is used to include the whole kidney in each slice. We have applied pre-processing to align the center of gravity of kidney regions to the image center, and data augmentation is performed by scaling and rotating the image around the center. Compared with the conventional 3D U-Net, the proposed method improves the DICE score from 0.507 to 0.604, which is statistically significant. For data with small tumors, the proposed method outperforms the conventional method (from 0.445 to 0.569), which is also statistically significant.
Core Research for Evolutional Science and Technology (JPMJCR18A2); Japan Society for the Promotion of Science (17H00750).
The authors declare no conflicts of interest.
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
1. A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems 25, 1097–1105 (2012).
2. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” Int. J. Comput. Vis. 115(3), 211–252 (2015).
3. G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. Sánchez, “A Survey on Deep Learning in Medical Image Analysis,” Med. Image Anal. 42, 60–88 (2017).
4. O Ronneberger, P Fischer, and T Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” MICCAI 2015, LNCS 9351, pp. 234–241, (2015).
5. Ö. Çiçek, A Abdulkadir, S S Lienkamp, T Brox, and O Ronneberger, “3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation,” MICCAI 2016, LNCS 9901, pp. 424, (2016).
6. N Heller, N Sathianathen, A Kalapara, E Walczak, K Moore, H Kaluzniak, J Rosenberg, P Blake, Z Rengel, M Oestreich, J Dean, M Tradewell, A Shah, R Tejpaul, Z Edgerton, M Peterson, S Raza, S Regmi, N Papanikolopoulos, and C Weight, “The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical Outcomes,” arXiv:1904.00445, (2019).
7. K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, “Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation,” Med. Image Anal. 36, 61–78 (2017).
8. H Kakeya, T Okada, and Y Oshiro, “3D U-JAPA-Net: Mixture of convolutional networks for abdominal multi-organ CT segmentation,” MICCAI 2018, LNCS 11073, pp. 426–433, (2018).
9. S. Liu, D. Xu, S. K. Zhou, T. Mertelmeier, J. Wicklein, A. Jerebko, S. Grbic, O. Pauly, W. Cai, and D. Comaniciu, “3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11071, 851–858 (2018).
10. A Myronenko and A Hatamizadeh, “3D Kidneys and Kidney Tumor Semantic Segmentation using Boundary-Aware Networks,” ArXiv:abs/1909.06684, (2019).
11. Y Tang, Y Tang, Y Zhu, J Xiao, and R M Summers, “E2Net: An Edge Enhanced Network for Accurate Liver and Tumor Segmentation on CT Scans,” MICCAI 2020, LNCS 12264, pp. 512–522, (2020).
12. W Wang, Q Song, J Zhou, R Feng, T Chen, W Ge, D Z Chen, S K Zhou, W Wang, and J Wu, “Dual-Level Selective Transfer Learning for Intrahepatic Cholangiocarcinoma Segmentation in Non-enhanced Abdominal CT,” MICCAI 2020, LNCS 12261, pp. 64–73, (2020).
13. G Santini, N Moreau, and M Rubeaux, “Kidney tumor segmentation using an ensembling multi-stage deep learning approach,” A contribution to the KiTS19 challenge, arXiv: abs/1909.00735, (2019).
14. V. Sandfort, K. Yan, P. J. Pickhardt, and R. M. Summers, “Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks,” Sci. Rep. 9(1), 16884 (2019).
15. Z Li, J Pan, H Wu, Z Wen, and J Qin, “Memory-Efficient Automatic Kidney and Tumor Segmentation Based on Non-local Context Guided 3D U-Net,” MICCAI 2020, LNCS 12264, pp. 197–206, (2020).
16. Y Ruan, D Li, H Marshall, T Miao, T Cossetto, I Chan, O Daher, F Accorsi, A Goela, and S Li, “Mt-UcGAN: Multi-task Uncertainty-Constrained GAN for Joint Segmentation, Quantification and Uncertainty Estimation of Renal Tumors on CT,” MICCAI 2020, LNCS 12264, pp. 439–449, (2020).
17. G. Wang, W. Li, M. Aertsen, J. Deprest, S. Ourselin, and T. Vercauteren, “Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks,” Neurocomputing 338, 34–45 (2019).
18. A Krishna, K Bartake, C Niu, G Wang, Y Lai, X Jia, and K Mueller, “Image Synthesis for Data Augmentation in Medical CT using Deep Reinforcement Learning,” arXiv:2103.10493, (2021).
19. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” ICML 37, 448–456 (2015).
20. V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” ICML 807–814 (2010).
21. A Paszke, S Gross, F Massa, A Lerer, J Bradbury, G Chanan, T Killeen, Z Lin, N Gimelshein, L Antiga, A Desmaison, A Köpf, E Yang, Z DeVito, M Raison, A Tejani, S Chilamkurthy, B Steiner, L Fang, J Bai, and S Chintala, “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” arXiv:1912.01703, (2019).
22. D P Kingma and J L Ba, “Adam: A method for stochastic optimization,” ICLR 2015, Vol 13, (2015).