Automated Breast Ultrasound Lesions Detection using Convolutional Neural Networks

Breast lesion detection using ultrasound imaging is considered an important step of Computer-Aided Diagnosis systems. Over the past decade, researchers have demonstrated the possibility of automating the initial lesion detection. However, the lack of a common dataset impedes research when comparing the performance of such algorithms. This paper proposes the use of deep learning approaches for breast ultrasound lesion detection and investigates three different methods: a Patch-based LeNet, a U-Net, and a transfer learning approach with a pretrained FCN-AlexNet. Their performance is compared against four state-of-the-art lesion detection algorithms (i.e. Radial Gradient Index, Multifractal Filtering, Rule-based Region Ranking and Deformable Part Models). In addition, this paper compares and contrasts two conventional ultrasound image datasets acquired from two different ultrasound systems. Dataset A comprises 306 (60 malignant and 246 benign) images and Dataset B comprises 163 (53 malignant and 110 benign) images. To overcome the lack of public datasets in this domain, Dataset B will be made available for research purposes. The results demonstrate an overall improvement by the deep learning approaches when assessed on both datasets in terms of True Positive Fraction, False Positives per image, and F-measure.


I. INTRODUCTION
Breast cancer is one of the leading causes of death for women worldwide and it is expected that more than 8% of women will develop breast cancer during their lifetime [1]. The most commonly used and effective technique for breast cancer detection is digital mammography (DM) [2]. However, there are some limitations to DM imaging in dense breasts, where lesions have a similar attenuation to the dense tissue and can therefore be hidden by the surrounding tissue. Currently, an important alternative to DM is ultrasound (US) imaging, which is used as a complementary method for breast cancer detection due to its versatility, safety and high sensitivity [3]. However, US imaging depends more on the radiologist than other commonly used techniques such as mammography. Interpreting US images requires experienced and well-trained radiologists due to the complexity of the images and the presence of speckle noise. Thus, Computer-Aided Diagnosis (CAD) could help radiologists in the US-based detection of breast cancer, minimizing the effect of the operator-dependent nature of US imaging. Different studies have investigated the influence of CAD on diagnostics [4], [5] and showed that CAD is an important tool to improve the diagnostic sensitivity and specificity. The first challenge in any CAD system is the ability to locate the lesion. This process should be automated, with high sensitivity and specificity, to help the radiologist make a diagnosis efficiently.
The lack of a public standard dataset in breast US research has limited the fair evaluation of the performance of algorithms. The quality of breast US images is highly dependent on the acquisition process, and the large variability between different US systems influences the results obtained by algorithms. The appearance, location and size of the lesions also affect the results.
In this paper, we review four popular lesion detection methods [6]-[9]. We propose the use of deep learning approaches for breast ultrasound lesion detection and investigate three different methods: a Patch-based LeNet, a U-Net, and a transfer learning approach with a pretrained FCN-AlexNet. We then compare the performance of the deep learning approaches with the state-of-the-art algorithms on two breast ultrasound datasets (Dataset A and Dataset B), and we make Dataset B available for research purposes. To date, we are the first to conduct this comprehensive comparison on two common datasets and to propose the use of deep learning approaches for breast US lesion detection.

II. RELATED WORK
This section describes four state-of-the-art methodologies for lesion detection in breast US imaging. Two of the selected methodologies, Radial Gradient Index (RGI) Filtering [6] and Multifractal Filtering [7], are among the most cited works in this area. This study also includes two recent approaches, Rule-based Region Ranking [8] and Deformable Part Models [9].

A. Radial Gradient Index (RGI) Filtering
Drukker et al. [6] developed a lesion detection and classification method as a two-stage process. The first stage was the detection of lesion candidates using an RGI Filtering technique. The second stage was the classification of those candidates, segmenting them by maximising an average radial gradient (ARD) index for regions grown from the detected points and classifying them with a Bayesian neural network as false positives or potential lesions. Here we focus on the performance evaluation of the initial lesion detection stage, thus only the location of lesion candidates is evaluated.
Lesion candidates were identified using a filtering technique based on the calculation of the RGI of contours throughout the image [10]. For a given point (x, y) in the image, lesion-like shapes were obtained by multiplying the image with a 2D isotropic Gaussian function centred at (x, y) to construct a constrained image. Contours of the lesion candidates for a given point were obtained by grey-level thresholding the constrained image. All possible lesion contours within a specified size range were determined, and the RGI value was calculated for each contour as a measure of the likelihood that a given contour represents a lesion.
The RGI of a candidate contour is given by

$$\mathrm{RGI}(x, y) = \frac{\sum_{(x', y') \in C_i} \mathbf{g}(x', y') \cdot \hat{\mathbf{r}}(x', y')}{\sum_{(x', y') \in C_i} \lvert \mathbf{g}(x', y') \rvert}$$

where $C_i$ is the i-th possible lesion contour, $\mathbf{g}(x', y')$ is the maximum grey-value gradient vector of length $\lvert \mathbf{g}(x', y') \rvert$ and $\hat{\mathbf{r}}(x', y')$ is the unit radial vector pointing from $(x, y)$ to $(x', y')$. By definition, due to normalization, RGI values lie between 1 (gradients pointing radially outward) and −1 (gradients pointing radially inward). For a given image point (x, y), the contour with the maximum absolute RGI value was selected, and this value was assigned to the (x, y) coordinate in the RGI-filtered image. The RGI-filtered image was subsequently thresholded to determine lesion candidates. The threshold was varied iteratively until either at least one region of interest was detected, indicating a lesion candidate, or the minimum specified RGI threshold value was reached.
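As an illustration of the RGI measure described above, the following minimal Python sketch computes the index for a single contour: it projects each gradient vector onto the unit radial direction and normalises by the summed gradient magnitudes. The function name, the array layout and the toy circular contour are assumptions made for this example and are not taken from [6] or [10].

```python
import numpy as np

def radial_gradient_index(contour_xy, gradients, centre):
    """RGI of one candidate contour around `centre` (illustrative sketch).

    contour_xy : (N, 2) array of contour points (x', y')
    gradients  : (N, 2) array of grey-value gradient vectors at those points
    centre     : (x, y) point for which the contour was generated
    """
    contour_xy = np.asarray(contour_xy, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    # Unit radial vectors pointing from the centre to each contour point.
    radial = contour_xy - np.asarray(centre, dtype=float)
    radial /= np.linalg.norm(radial, axis=1, keepdims=True)
    # Sum of gradient components along the radial direction ...
    projected = np.sum(gradients * radial, axis=1).sum()
    # ... normalised by the total gradient magnitude on the contour.
    total_magnitude = np.linalg.norm(gradients, axis=1).sum()
    return projected / total_magnitude

# Toy contour (hypothetical data): a circle of radius 5 around the origin.
angles = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)
circle = 5.0 * np.column_stack([np.cos(angles), np.sin(angles)])
rgi_outward = radial_gradient_index(circle, 2.0 * circle, (0.0, 0.0))
rgi_inward = radial_gradient_index(circle, -2.0 * circle, (0.0, 0.0))
```

With purely outward gradients the index reaches its upper bound, and reversing the gradients flips its sign, which matches the stated range of the measure.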

B. Multifractal Filtering
The main contribution of the Multifractal Filtering technique lies in the application of multifractal analysis to breast US. In 2008, Yap et al. [7] presented a novel initial lesion detection method based on a set of image processing operations. To ensure the homogeneity of the US images, histogram equalisation was first implemented. Then the speckle noise was reduced using a hybrid filtering approach [11]. Hybrid filtering combines the strength of nonlinear diffusion filtering [12], which produces edge-sensitive speckle reduction, with a subsequent linear filtering step (Gaussian blur) [13], which smooths the edges and eliminates over-segmentation. Subsequent to hybrid filtering, multifractals [14] were used to further enhance the partially processed images. Multifractal analysis refers to the analysis of an image using multiple fractals (i.e. not just one as in fractal analysis). The generalized formulation for multifractal dimensions (D) of order q can be represented as

$$D_q = \frac{1}{q - 1} \lim_{\epsilon \to 0} \frac{\log \sum_i \mu_i^q}{\log \epsilon}, \qquad q \in \mathbb{R},\ q \neq 1$$

where $\epsilon$ is the linear size of the cells, q is the order for cell size $\epsilon$ and $\mu$ is the measure defined as the probability of the greyscale level in the images, where all the grey levels fall in the range (0-1). Multifractal analysis enables improved separability of tumour regions from normal regions.
After pre-processing, images were segmented using a grey-value thresholding segmentation method [15]. This thresholding segmentation often leads to the identification of multiple regions of interest, of which generally only one or two would be of diagnostic importance. To identify these important regions, a rule-based Region of Interest (ROI) selection was used, with the size and location of the region as discriminative criteria. Based on the knowledge provided by expert radiologists [16], most of the lesions are located in the upper part of the images. Hence, a reference point $(x_r, y_r)$ was chosen, defined relative to the top of the image. The candidate region that was closest to the $(x_r, y_r)$ location and that satisfied the size-related criterion was selected as the final detected lesion.
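To make the generalized dimension more concrete, the sketch below evaluates a finite-size estimate of $D_q$ for a single cell size instead of taking the limit. It is only an illustration: the measure is assumed here to be the share of total image intensity falling in each cell, and the cell size is expressed relative to the image size; neither choice is confirmed by [7], and the function name is hypothetical.

```python
import numpy as np

def generalized_dimension(image, cell, q):
    """Finite-size estimate of D_q for one cell size (requires q != 1)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Crop so that the image is covered by a whole number of cells.
    h, w = h - h % cell, w - w % cell
    blocks = img[:h, :w].reshape(h // cell, cell, w // cell, cell)
    mass = blocks.sum(axis=(1, 3)).ravel()
    # Assumed measure: fraction of total intensity in each cell.
    mu = mass / mass.sum()
    mu = mu[mu > 0]                      # empty cells contribute nothing
    # Assumed normalisation: linear cell size relative to the image size.
    eps = cell / np.sqrt(h * w)
    return np.log(np.sum(mu ** q)) / ((q - 1.0) * np.log(eps))

# A uniform 12 x 12 image fills the plane evenly, so D_q should equal 2.
uniform = np.ones((12, 12))
d_uniform = generalized_dimension(uniform, cell=3, q=-1)
```

For a uniform image every cell carries the same measure, so the estimate equals the topological dimension of the plane; textured regions deviate from this value, which is the property exploited to separate tumour from normal tissue.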

C. Rule-Based Region Ranking (RBRR)
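Shan et al. [8] proposed a lesion detection methodology that considered both texture and spatial features. They first used speckle reducing anisotropic diffusion (SRAD) [17]. The SRAD method processes the image iteratively with adaptable weighted filters to reduce noise and preserve edges. The diffusion coefficient was determined by

$$c(q) = \frac{1}{1 + \left[q^2(x, y; t) - q_0^2(t)\right] / \left[q_0^2(t)\left(1 + q_0^2(t)\right)\right]}$$

where $q(x, y; t)$ is the instantaneous coefficient of variation, which depends on the gradient $\nabla I$ and the Laplacian $\nabla^2 I$ and is determined by

$$q(x, y; t) = \sqrt{\frac{\frac{1}{2}\left(\frac{\lvert \nabla I \rvert}{I}\right)^2 - \frac{1}{16}\left(\frac{\nabla^2 I}{I}\right)^2}{\left[1 + \frac{1}{4}\left(\frac{\nabla^2 I}{I}\right)\right]^2}}$$

The initialisation $q_0(t)$ is given by

$$q_0(t) = \frac{\sqrt{\mathrm{var}[z(t)]}}{\overline{z(t)}}$$

where t is the iteration time, $z(t)$ is the most homogeneous area in the image at iteration t, $\mathrm{var}[z(t)]$ is its variance and $\overline{z(t)}$ is its mean.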
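The behaviour of the SRAD diffusion coefficient can be sketched in a few lines of Python. The code assumes the standard SRAD formulation, in which $q_0$ is the ratio of standard deviation to mean in a homogeneous region and $c(q) = 1 / (1 + (q^2 - q_0^2) / (q_0^2 (1 + q_0^2)))$; the function names are hypothetical and the full iterative diffusion update is not shown.

```python
import numpy as np

def srad_q0(homogeneous_region):
    """Speckle scale q0: standard deviation over mean of a homogeneous area."""
    z = np.asarray(homogeneous_region, dtype=float)
    return np.sqrt(z.var()) / z.mean()

def srad_diffusion_coefficient(q, q0):
    """Diffusion coefficient c(q) of the assumed standard SRAD formulation."""
    q = np.asarray(q, dtype=float)
    return 1.0 / (1.0 + (q ** 2 - q0 ** 2) / (q0 ** 2 * (1.0 + q0 ** 2)))

q0 = 0.5
c_homogeneous = srad_diffusion_coefficient(q0, q0)   # q equal to q0
c_edge = srad_diffusion_coefficient(1.0, q0)         # q well above q0
```

Where the local coefficient of variation matches the speckle scale, the coefficient is 1 and the filter smooths freely; at edges, where q is much larger than $q_0$, the coefficient drops and diffusion is suppressed, which is how the method preserves boundaries.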
Once the image was de-speckled, an iterative threshold selection algorithm was applied to segment the image. First, all local minima of the image histogram were calculated and the de-speckled image was binarised using the smallest local minimum as the threshold value. Then, if the ratio of the number of foreground pixels to the number of background pixels was less than 0.1, the next local minimum value was set as the threshold. The process continued iteratively until the ratio was larger than 0.1. This value was chosen experimentally in the original paper [8]. Subsequently, morphological operations (dilation and erosion) were performed to remove noisy regions. If none of the regions intersected with the image centre region (a window about half the size of the entire image and located at the image centre), the threshold became the next local minimum and the process was repeated. Once some region intersected with the central window, regions connected with the boundary that did not intersect with the central window were removed. The remaining lesion region candidates were ranked using the scoring formula

$$S_n = \frac{\sqrt{\mathrm{Area}_n}}{\mathrm{dis}(C_n, C_0) \cdot \mathrm{var}(C_n)}, \qquad n = 1, \ldots, k$$

where k is the number of candidate regions, $\mathrm{Area}_n$ is the number of pixels in the region, $C_n$ is the centre of the region, $C_0$ is the centre of the image, $\mathrm{dis}(a, b)$ is the Euclidean distance between points a and b, and $\mathrm{var}(C_n)$ is the variance of a small circular region centred at $C_n$. Finally, the seed point was placed at the centre of the region with the highest score. Thus, $((x_{min} + x_{max})/2, (y_{min} + y_{max})/2)$ was considered as the seed point, where $[x_{min}, y_{min}, x_{max}, y_{max}]$ defined the minimum rectangle that contained the lesion.
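The threshold selection loop and the seed point computation described above can be sketched as follows. This is a simplified illustration: the candidate thresholds (the histogram local minima) are passed in directly, the foreground is assumed to be the pixels at or below the threshold (dark lesion candidates), and the morphological clean-up and central-window checks are omitted. Function names and the toy data are hypothetical.

```python
import numpy as np

def select_threshold(image, candidate_thresholds, min_ratio=0.1):
    """Return the first candidate threshold whose foreground/background
    pixel ratio is at least `min_ratio`, or None if none qualifies."""
    img = np.asarray(image)
    for threshold in sorted(candidate_thresholds):
        # Assumption: lesion candidates are darker than the threshold.
        foreground = np.count_nonzero(img <= threshold)
        background = img.size - foreground
        if background == 0 or foreground / background >= min_ratio:
            return threshold
    return None

def seed_point(bounding_box):
    """Centre of the minimum rectangle [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = bounding_box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

# Toy data: 5 very dark pixels, 10 dark pixels, 85 bright pixels.
toy = np.array([10] * 5 + [50] * 10 + [200] * 85)
chosen = select_threshold(toy, candidate_thresholds=[20, 60])
seed = seed_point((10, 20, 30, 60))
```

In the toy example the first candidate leaves too few foreground pixels (5 against 95), so the loop moves on to the next local minimum, mirroring the iterative rule in the text.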

D. Deformable Part Models (DPM)
The DPM proposed by Felzenszwalb et al. [18] is an effective object detection method from the recent literature. The work of Pons et al. [9] demonstrated the feasibility of adapting this methodology to detect lesions in breast US images and obtained accurate results. The DPM method modelled the appearance of objects based on a histogram of oriented gradients (HOG) in terms of a low-resolution root filter template, which defined the detection window, along with a set of higher-resolution part filter templates that captured finer details. Each part defined a set of possible placements for a part relative to the root filter and a deformation cost for each placement.
The system used a scanning window approach that searched a model over a HOG pyramid [19] to detect objects at different scales. The image was divided into a dense grid where the histogram of gradient orientations was computed in each cell and normalised with respect to the gradient energy in the neighbourhood surrounding it. The HOG pyramid was defined by computing the HOG features of each level of an image pyramid. Hence, features at the top level captured coarse gradients as opposed to the finer gradients found at lower levels.
Both root and part filters were rectangular templates F of size $w \times h$ specifying weights for subwindows of a HOG pyramid. In this case, H is a HOG pyramid and $p = (x, y, l)$ a location in the l-th level of that pyramid. The vector obtained by concatenating the HOG features in the $w \times h$ subwindow of H at p was defined as $\phi(H, p)$ and the score of F on this detection window was $F \cdot \phi(H, p)$.
The model for an object with n parts was defined by a root filter $F_0$ and a set of parts $P_i = (F_i, v_i, d_i)$, where $F_i$ was a filter for the i-th part, $v_i$ was a two-dimensional vector specifying possible locations relative to the root, and $d_i$ was a four-dimensional vector specifying coefficients of a quadratic function that defines a deformation cost for each possible placement of the part.
The placement of the model was given by $z = (p_0, \ldots, p_n)$ where $p_i = (x_i, y_i, l_i)$ specifies the level and the position of the i-th filter. Note that the location of the root filter was defined when i = 0. The final score of a detection was the score of the root filter plus the score of the best location of the parts, placed at twice the resolution in the pyramid, minus a deformation cost that penalises undesired placements of the parts:

$$\mathrm{score}(p_0, \ldots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i)$$

where $(dx_i, dy_i) = (x_i, y_i) - (2(x_0, y_0) + v_i)$ gives the displacement of the i-th part relative to the root location and $\phi_d(dx, dy) = (dx, dy, dx^2, dy^2)$ are the deformation features, weighted by the coefficients $d_i$.
The method took advantage of the additional information provided by the part filters. However, these part filters did not need to be labelled (they were considered as latent values). The method described a discriminative training with partially labelled data called a latent Support Vector Machine, which was an iterative training process that alternated between fixing latent values for positive examples and optimizing the latent SVM function (see Felzenszwalb et al. [18] for details).
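The scoring of one model placement can be sketched in Python as below. The sketch assumes the formulation of Felzenszwalb et al. [18], in which the displacement of a part is measured against twice the root position plus the anchor $v_i$, and the deformation features are $(dx, dy, dx^2, dy^2)$. HOG feature extraction and the search over placements are not shown, and all names and toy values are hypothetical.

```python
import numpy as np

def filter_score(filter_weights, features):
    """Score of a filter on a window: dot product F . phi(H, p)."""
    return float(np.dot(np.ravel(filter_weights), np.ravel(features)))

def deformation_features(part_xy, root_xy, anchor):
    """(dx, dy, dx^2, dy^2) for a part placed at twice the root resolution."""
    expected = 2.0 * np.asarray(root_xy, dtype=float) + np.asarray(anchor, dtype=float)
    dx, dy = np.asarray(part_xy, dtype=float) - expected
    return np.array([dx, dy, dx ** 2, dy ** 2])

def placement_score(root_filter, root_features, root_xy, parts):
    """Root score plus part scores minus deformation costs.

    `parts` is a list of dicts with keys: filter, features, d, xy, anchor.
    """
    score = filter_score(root_filter, root_features)
    for part in parts:
        score += filter_score(part["filter"], part["features"])
        phi_d = deformation_features(part["xy"], root_xy, part["anchor"])
        score -= float(np.dot(part["d"], phi_d))
    return score

# Toy placement: one part displaced by one cell along x from its anchor.
part = {"filter": [1.0, 0.0], "features": [3.0, 5.0],
        "d": [0.0, 0.0, 1.0, 1.0], "xy": (6, 7), "anchor": (1, 1)}
total = placement_score(np.ones(4), [1.0, 2.0, 3.0, 4.0], (2, 3), [part])
```

In the toy example the root contributes 10, the part filter 3, and the one-cell displacement costs 1 under the quadratic penalty, giving a total of 12.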

E. Deep Learning for Breast Imaging
Overall, the state-of-the-art methods are not robust, particularly the image processing based approaches, which rely on rule-based steps and specific assumptions. Without needing such strong assumptions, deep learning approaches have shown superior accuracy in object detection, which suggests that they could also improve the state of the art of lesion detection in breast ultrasound. Deep learning in medical imaging is mostly represented by convolutional networks. Based on how they are trained, they can be broadly categorized as follows:
1) Patch-based CNNs approach. This approach trains the convolutional neural networks (CNNs) with image patches and uses a sliding window for testing [20], [21]. However, feeding each patch to the network is time-consuming and the patch overlap produces substantial redundancy [22].
2) Fully convolutional approach. To avoid computational redundancy, Long et al. [23] proposed a fully convolutional approach that increases efficiency by training on whole images. It produces a segmentation by pixelwise prediction, rather than the single probability distribution per image produced in a classification task. An example of a modified version of this approach is U-Net [22].
3) Transfer learning approach. Another approach that has been widely used recently in biomedical research is the transfer learning approach [24], [25]. This method uses a model pre-trained on non-medical images to overcome the limitation of data deficiency in medical imaging research.
In breast imaging, the majority of the existing publications focus on using CNNs for mammography. Dhungel et al. [26] implemented deep learning for segmentation of masses; Mordang et al. [27] proposed the use of CNNs in microcalcification detection; and more recently, Ahn et al. [28] proposed the use of CNNs in breast density estimation. In breast ultrasound imaging, Huynh et al. [24] proposed the use of a transfer learning approach for ultrasound breast image classification. This is the only such work in breast ultrasound, but it does not cover lesion detection. In this paper, we propose the use of deep learning approaches for automated breast ultrasound lesion detection. To show the benefits of deep learning approaches, we compare their performance with the four aforementioned (Section II A-D) state-of-the-art lesion detection algorithms.

III. DATASETS

A. Overview
This study made use of two different datasets of US images. The datasets were obtained from US systems with different specifications and at different times. They are referred to as Dataset A and Dataset B.
Dataset A was collected in 2001 from a professional didactic media file for breast imaging specialists [16]. The images were obtained with B&K Medical Panther 2002 and B&K Medical Hawk 2102 US systems with an 8-12 MHz linear array transducer. The dataset consists of 306 images from different cases with a mean image size of 377 × 396 pixels. Each image contained one or more lesions. Of these images, 60 presented malignant masses and 246 presented benign lesions. Of the malignant images, 27 were diagnosed as invasive ductal carcinomas, 4 were ductal carcinomas in situ, 6 were malignant phyllodes tumours and 23 were other unspecified malignant lesions. Of the benign images, 74 were complex cysts, 89 were simple cysts, 55 were fibroadenomas and 28 were other benign lesions. To obtain Dataset A, the user needs to purchase the didactic media file from Prapavesis et al. [16].
Dataset B was collected in 2012 from the UDIAT Diagnostic Centre of the Parc Taulí Corporation, Sabadell (Spain) with a Siemens ACUSON Sequoia C512 system and a 17L5 HD linear array transducer (8.5 MHz). The dataset consists of 163 images from different women with a mean image size of 760 × 570 pixels, where each of the images presented one or more lesions. Of the 163 lesion images, 53 were images with cancerous masses and 110 with benign lesions. Of the malignant images, 40 were invasive ductal carcinomas, 4 were ductal carcinomas in situ, 2 were invasive lobular carcinomas and 7 were other unspecified malignant lesions. Of the benign images, 65 were unspecified cysts, 39 were fibroadenomas and 6 were other types of benign lesions. Note that in both datasets the lesions were delineated by experienced radiologists. Dataset B and the respective delineation of the breast lesions will be available online (goo.gl/SJmoti) for research purposes.

B. Comparison
Fig. 1 displays three images from each of the two datasets to represent the differences in three aspects: speckle noise, image quality and lesion appearance. In terms of speckle noise, images from Dataset A show a significant presence of this artefact, but it is less obvious in images from Dataset B, where the speckle noise was partly reduced by the US acquisition system. The image quality also differs between the datasets due to the different resolutions. Note that the resolution of the more recent US device used to produce Dataset B is better than that of the older US device (Dataset A). Consequently, defined structures (such as ribs, pectoral muscle or parenchymal tissue) are more visible in Dataset B. The lesion appearance also differs between the datasets. In Dataset B the appearance of tissue is better defined than in Dataset A, as illustrated in Fig. 1(b), where even the inner structures of the fibroadenoma lesion are visible.
To further evaluate the datasets, we compare the lesion size, the ratio between the area of the lesion and the area of the image, and the distance between the image centre and the lesion centroid. Fig. 2 shows the box plots for these comparisons, where differences between the datasets are noticeable: the average size of the lesions in Dataset A is smaller than in Dataset B (Fig. 2(a)) but the ratio between lesion pixels and total image pixels is higher (Fig. 2(b)). Regarding the spatial distribution of the lesions in the image, lesions in Dataset A are more centred than in Dataset B (Fig. 2(c)). However, none of these differences are significant. Furthermore, other characteristics such as the quality of the image may affect the performance of the lesion detection results.
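The three per-image quantities compared above can be derived from a binary lesion mask, as in the following minimal sketch. The mask representation, the pixel-centre convention for the image centre and the function name are assumptions made for illustration; the paper does not describe how the measurements were implemented.

```python
import numpy as np

def lesion_statistics(mask):
    """Lesion area (pixels), lesion-to-image area ratio and distance
    between the lesion centroid and the image centre."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())
    ratio = area / mask.size
    rows, cols = np.nonzero(mask)
    centroid_x, centroid_y = cols.mean(), rows.mean()
    # Image centre in pixel-centre coordinates (assumed convention).
    centre_x = (mask.shape[1] - 1) / 2
    centre_y = (mask.shape[0] - 1) / 2
    distance = float(np.hypot(centroid_x - centre_x, centroid_y - centre_y))
    return area, ratio, distance

# Toy mask: a 2 x 2 lesion in the top-left corner of a 10 x 10 image.
toy_mask = np.zeros((10, 10), dtype=bool)
toy_mask[:2, :2] = True
area, ratio, distance = lesion_statistics(toy_mask)
```

Collecting these values for every delineated image yields the distributions that a box plot comparison between datasets would summarise.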

IV. METHODOLOGY

A. Convolutional Neural Networks
Deep learning is a representation learning method [29] that automatically discovers features suited for a particular task from the raw data. The feature extractors are task-specific, in that they are not fixed to a set of specific rules each time [30]. Each network contains multiple layers that lead to hierarchical features used in the learning process [29], [31]. Convolutional Neural Networks (CNNs) [32] have become an important technique in image analysis, particularly in the detection or recognition of faces [33], text [31], human bodies [34] and biological images [35]. However, they have not been used in breast ultrasound lesion detection. For these reasons, we study the performance of CNNs in breast ultrasound lesion detection.
CNNs consist of convolutional layers and pooling layers [32], where the role of the former is to extract local features with a set of learnable filters and the role of the latter is to merge neighbouring patterns, reducing the spatial size of the previous representation and adding spatial invariance to translation [29]. CNNs are hierarchical neural networks and their accuracy depends on the design of the layers and the training methods [36].
Some popular CNNs available in the Caffe framework [30] are LeNet [31], AlexNet [37] and GoogLeNet [38]. We investigated the use of three types of deep learning for breast lesion detection: a patch-based approach using LeNet [31], U-Net [22] and a transfer learning approach using Fully Convolutional Networks [23].
1) Patch-based LeNet: As the ultrasound breast images in the datasets are grayscale and the size of the breast lesions is relatively small, LeNet [31] was chosen as a suitable architecture to solve the two-class classification problem. The training and validation images are input as patches from areas of the images containing abnormal breast lesions and normal tissue. These input patches are sized at 28 × 28, which corresponds to the input size of LeNet. The LeNet architecture is simple and was originally created for digit classification [31]. Breast lesions contain similar gradients that can be exposed through CNNs. The overall architecture can be seen in Fig. 3, with the inputs consisting of image patches of breast lesions and normal tissue. The inputs are fed into the first convolution layer and max pooling layer; this pair is repeated once and followed by two fully connected layers. The final layer has 2 output neurons, which are the activations generated for the two classes: lesion and non-lesion. The final part of the CNN outputs class probabilities, which are used to measure how close the final fully connected outputs are to the ground truth labels of the training and validation data. The loss was calculated using multinomial logistic loss with a softmax classifier.
The output of our network is a prediction of whether the patch is a lesion or healthy breast tissue. It is formed by two fully connected layers with the softmax function defined as

$$f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}}$$

where $f_j$ is the j-th element of the vector of class scores f and z is a vector of arbitrary real-valued scores that are squashed to a vector of values between zero and one that sum to one. The loss function is defined so that having good predictions during training is equivalent to having a small loss. A Rectified Linear Unit (ReLU) layer is included at the first fully connected layer. This element-wise operation is calculated in-place in the Caffe framework [30], which saves some memory. It is defined as

$$f(x) = \max(0, x)$$

where the function f thresholds the activations at zero. Using a sliding window of 28 × 28 pixels with a stride of 1 on the test images, the predicted lesion patches were segmented. Unconnected regions with an area of less than 10 pixels, a threshold chosen through empirical experimentation, were removed from the segmented images to reduce False Positives (FPs). The centre points of the segmented regions were recorded as seed points.
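The two element-wise functions used in the classifier head, together with the multinomial logistic loss mentioned earlier, can be written in a few lines of NumPy. This is a generic sketch of the standard definitions (softmax as normalised exponentials, ReLU as max(0, x)), not the Caffe implementation used in the experiments; the subtraction of the maximum score is a common numerical-stability step added here as an assumption.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: thresholds activations at zero."""
    return np.maximum(0.0, np.asarray(x, dtype=float))

def softmax(z):
    """Squash real-valued scores into probabilities that sum to one."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max()          # stability shift; does not change the result
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

def multinomial_logistic_loss(scores, label):
    """Negative log-probability assigned to the ground-truth class."""
    return float(-np.log(softmax(scores)[label]))

# Two-class example: scores for (lesion, non-lesion) from the last layer.
probabilities = softmax([2.0, 0.0])
undecided_loss = multinomial_logistic_loss([0.0, 0.0], label=0)
```

When both classes receive equal scores the network is undecided and the loss equals log 2; the loss shrinks towards zero as the score of the correct class grows relative to the other.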
2) U-Net: U-Net is a modified and extended version of a fully convolutional network [22], which can overcome the need for large-scale datasets in biomedical imaging research. It is an encoder-decoder based CNN with skip connections. Ronneberger et al. [22] proposed U-Net to enable the use of data augmentation, including non-rigid deformations, to make full use of the available annotated sample images when training the model. These aspects suggest that U-Net could provide satisfactory results with the size of the datasets currently available.
3) Transfer Learning: Transfer Learning is a procedure where a CNN is trained to learn features for a broad domain, after which the classification function is changed to optimize the network to learn features of a more specific domain. Under this setting, the features and the network parameters are transferred from the broad domain to the specific one. Our proposed transfer learning approach is based on fully convolutional networks (FCN-AlexNet) [23] for semantic segmentation. FCN-AlexNet is a fully convolutional version of the original AlexNet classification model with a few adjustments of the network layers for segmentation [23]. The original network was used for the classification of 1000 object classes on the ImageNet dataset [37].

B. Performance Metric
Lesion detection is an initial stage of CAD which, most of the time, uses the detected lesion location as a seed point to subsequently initialise a segmentation algorithm. Most of the breast US lesion detection methodologies in the literature evaluate their algorithms using the seed point detection as a criterion. In current practice, a radiologist annotated a rectangular ROI with four crosses. Based on these four extreme points (top, bottom, left and right), we generated a bounding box as illustrated in Fig. 5. A detection is considered a True Positive (TP) if the detection point (centre of the segmented region) is placed within the bounding box of an expert radiologist. Otherwise, it is considered a False Positive (FP).
In this paper, we compare the performance of lesion detection techniques in breast US research by using True Positive Fraction (TPF) and False Positives per image (FPs/image) [6]-[8]:

$$\mathrm{TPF} = \frac{\text{number of TPs}}{\text{number of actual lesions}}$$

$$\mathrm{FPs/image} = \frac{\text{number of FPs}}{\text{number of images}}$$

TPF measures the sensitivity of the method. Some of the algorithms are capable of detecting multiple lesions while some are only capable of detecting a single lesion. The TPF allows a fair measurement as it relates the total number of detected lesions to the total number of actual lesions. Thus, if a method can detect only one lesion in an image with multiple lesions, the TPF of this methodology will be lower than that of a method capable of detecting multiple lesions. In addition to TPF and FPs/image, the F-measure (the weighted harmonic mean of recall and precision) [39] is computed as:

$$\text{F-measure} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$$

where FN is the number of actual lesions that were not detected.
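The evaluation criteria can be summarised in a short Python sketch: a detection counts as a TP when its seed point lies inside the radiologist's bounding box, and the three metrics follow from the TP and FP counts. The F-measure is taken here as the balanced harmonic mean of precision and recall, 2TP / (2TP + FP + FN), which is an assumption about the weighting; the function names and the counts in the example are hypothetical and are not results from this study.

```python
def is_true_positive(point, box):
    """True if the detection point lies inside the bounding box
    given as (x_min, y_min, x_max, y_max)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max

def detection_metrics(n_tp, n_fp, n_lesions, n_images):
    """Return (TPF, FPs/image, F-measure) from raw counts."""
    tpf = n_tp / n_lesions
    fps_per_image = n_fp / n_images
    n_fn = n_lesions - n_tp                      # missed lesions
    f_measure = 2 * n_tp / (2 * n_tp + n_fp + n_fn)
    return tpf, fps_per_image, f_measure

# Hypothetical counts: 10 images with one lesion each, 8 hits, 2 false alarms.
inside = is_true_positive((25, 40), (10, 20, 30, 60))
tpf, fps_per_image, f_measure = detection_metrics(8, 2, 10, 10)
```

Because TPF is normalised by the number of actual lesions, not the number of images, a single-detection method is penalised on images that contain several lesions.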

C. Implementation
It is worth mentioning that the implementations of DPM [9] and Multifractal Filtering [7] were provided by the original authors, while RGI Filtering [6] and RBRR [8] were re-implemented following the descriptions in their respective papers. To obtain the best performance for the state-of-the-art methods on the datasets, we set the following parameters. For Rule-based Region Ranking, since most of the lesions in [8] appear in the top region of the image, the central window was initialised in the centre-top part of the image. In addition, the iteration time t was set to 50 in the speckle reducing anisotropic diffusion (SRAD) process. In Multifractal Filtering [7], the order was specified as q = −1 for the cell size $\epsilon$ = 3.
The DPM approach [9] was trained with a mixture model of 3 components and 8 parts for each root filter. These parameters were chosen in a previous study [40], where different configurations of DPM parameters were assessed in order to obtain the best results on breast US images. Given the number of available images, we configured the training and testing processes as a 10-fold cross-validation. This methodology vastly increases the computational cost of the training stage but allows a more accurate assessment of the methods.
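A 10-fold cross-validation split of image indices can be sketched as follows. The shuffling, the fixed random seed and the absence of stratification by lesion type are assumptions of this example, since the exact fold assignment used in the experiments is not described; the function name is hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)       # assumed: random shuffling
    folds = np.array_split(order, k)         # fold sizes differ by at most 1
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Example with the number of images in Dataset B.
splits = list(k_fold_indices(163, k=10))
fold_sizes = sorted(len(test_idx) for _, test_idx in splits)
```

Every image is used for testing exactly once and for training in the remaining nine folds, which explains both the higher training cost and the more reliable performance estimate noted above.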
The proposed Patch-based CNNs approach for this study uses the LeNet architecture [31]. The breast ultrasound images are in grayscale and are split into 28 × 28 patches. The network is trained using Root Mean Square Propagation (RMSProp) with a learning rate of 0.01 for 60 epochs, with a dropout rate of 0.33. The experiment is run using 10-fold cross-validation.
For the U-Net implementation, the training data includes the original ultrasound breast images and the ground truth training labels, as shown in Fig. 4. We assessed the performance of the model using 10-fold cross-validation. The network is trained using the Adam optimizer [41], with a learning rate of 0.0001 for 300 epochs.
The training data for the proposed transfer learning approach was likewise the breast ultrasound images and the ground truth training labels (as illustrated in Fig. 4). We used the Caffe [30] framework to implement FCN-AlexNet and evaluated the model using 10-fold cross-validation. We trained the model using stochastic gradient descent with a learning rate of 0.001 for 60 epochs, with a dropout rate of 33%. The number of epochs was kept at 60, as in [42], because our empirical experiments showed that convergence had already occurred by that point.

V. RESULTS AND DISCUSSION
Fig. 5 shows the results of breast lesion detection. Row 1 presents an image from Dataset A, with a well-defined lesion boundary and an appearance distinct from the normal tissue (in intensity values and texture). This is the best case scenario, where all the detection methods identified the lesion correctly. Row 2 presents a case from Dataset B where the lesion's appearance is close to that of the normal tissue and the lesion is located close to the top of the image. In this case, only DPM and the CNNs detected the lesion correctly. The methodologies that depend on the lesion location failed to detect the lesion. Row 3 depicts a case from Dataset A where there is a complex shadow in the image. None of the state-of-the-art methods were able to detect the lesion, whereas the proposed CNNs were. Finally, Row 4 shows a case where none of the methods were able to detect the lesion due to the small lesion size.
Quantitative results are presented in Table I.These are provided in terms of True Positive Fraction (TPF), False Positives per image (FPs/image) and F-measure.When training and testing on a single dataset, the Transfer Learning FCN-AlexNet out-performed other methods for lesion detection, with TPF of 0.98, FPs/image of 0.16 and F-measure of 0.91 for Dataset A; and TPF of 0.92, FPs/image of 0.17 and F-measure of 0.89 for Dataset B. It is observed that the performance of U-Net is lower than Patch-based LeNet.DPM achieved good results in TPF, with 0.80 for Dataset A and 0.79 for Dataset B and with a comparable F-measure to CNNs.Deep learning approaches and DPM achieved low FPs/image.The Multifractal Filtering [7] and RBRR [8] obtained good results for the images in Dataset A, with TPF of 0.76 and 0.75 respectively, but not for the images in Dataset B (with TPF of 0.59 and 0.60, respectively).The average FPs/image for Multifractal Filtering is lower than the RBRR.Finally, the RGI Filtering [6] showed a good performance in terms of TPF in both datasets (0.76 and 0.72) but with a high FPs/image and poor F-measure.
Methods based on image processing (RGI Filtering [6], Multifractal Filtering [7] and Rule-based Region Ranking [8]) were inconsistent and obtained poor results when dealing with images acquired from two different US systems.One explanation is that most of the approaches take the characteristics of their datasets into consideration, such as the lesion location, the influence of the speckle noise or the appearance of the lesions.These characteristics may differ in another dataset, which reduce the accuracy of the algorithms.Dataset B was acquired from a modern US system, which introduces new challenges for the existing techniques in lesion detection.These US systems acquire high-resolution images which may include other structures such as ribs, pectoral muscle or the air in the lungs making the lesion detection more difficult.Dataset A was obtained from an older US system.The nature of the images is normally of a lower resolution and with a higher noise level.For a better visualisation, the radiologist tends to place the suspected lesion at the centre of the image.Nowadays, with high quality US systems this is no longer necessary due to the fact that one image can capture larger regions of the breast.Hence, methodologies that assume that the lesion is centred in the image fail in more cases when using the modern US systems.
The techniques with the best results in breast lesion detection are the machine learning and deep learning approaches, with the Transfer Learning FCN-AlexNet performing best overall. This is because these approaches adopt a training process, which allows the method to build a specific model for each dataset. The training stage acts as an adaptation process for different datasets, so these approaches are not as dataset-dependent as the other methodologies. However, they have some drawbacks. The main drawback is the training process, which is time-consuming and requires a representative set of normal images. The acquisition of such images during an ultrasound examination is not common practice in clinical environments.
To investigate the robustness of deep learning approaches on different datasets, we conducted an experiment combining the two datasets (A+B), giving a total of 356 benign lesions and 113 malignant lesions. Using the same settings as outlined in the methodology, the results are shown in the final three rows of Table II, labelled (A+B). Overall, Transfer Learning FCN-AlexNet performed best for Dataset A, with a slight improvement to a TPF of 0.99, FPs/image of 0.16 (unchanged) and F-measure of 0.92. For Dataset B, the best TPF was 0.93, achieved by Transfer Learning FCN-AlexNet, but the best overall result was obtained by Patch-based LeNet, with FPs/image of 0.09 and F-measure of 0.91. These results indicate that the supervised deep learning approaches are data-driven and that their performance improves with more training data. Many deep learning applications require large amounts of representative training and testing data to achieve high accuracy [43].
We also explored training on one dataset and testing on the other. When training on Dataset B and testing on Dataset A using U-Net, the result dropped to a TPF of 0.83, FPs/image of 0.08 and F-measure of 0.87. When training on Dataset A and testing on Dataset B, the result was a TPF of 0.70, FPs/image of 0.66 and F-measure of 0.59. This experiment shows that training on a dataset different from the test set is not ideal, and that combining the datasets provides better training for the framework.

VI. CONCLUSION
This paper investigated the use of three deep learning approaches (Patch-based LeNet, U-Net and Transfer Learning FCN-AlexNet) and presented a comprehensive evaluation of the most representative methodologies for breast ultrasound lesion detection. The performances were evaluated on two datasets in terms of TPF, FPs/image and F-measure.
Amongst the methodologies discussed in this paper, the Transfer Learning FCN-AlexNet achieved the best results for Dataset A, and the proposed Patch-based LeNet obtained the best results for Dataset B in terms of FPs/image and F-measure. DPM and the deep learning methods can adapt to the specific characteristics of a dataset, since they are machine-learning based and a specific model is constructed for each dataset. However, the limitation of such methods is that they require a training process and negative images. For further research, we suggest that deep learning approaches could be adapted to other medical imaging techniques such as three-dimensional ultrasound or elastography.
Lesion detection is the initial step of a CAD system. Hence, future work will focus on increasing the accuracy by adding more training data, extending our work to breast ultrasound lesion segmentation and classification, and evaluating the performance of the complete CAD framework.

Index Terms: Breast cancer, convolutional neural networks, lesion detection, transfer learning, ultrasound imaging.

Fig. 1. Examples of images in Dataset A (first row) and Dataset B (second row): (a) cysts, (b) fibroadenoma lesions and (c) invasive ductal carcinoma.

Fig. 2. Dataset feature comparison. Box plots comparing (a) the lesion size, (b) the ratio between the area of the lesion and the area of the image and (c) the distance from the image centre to the lesion centroid.

Fig. 3. The overall LeNet architecture. The numbers at the convolution and pooling layers indicate kernel size, stride (in brackets) and the total number of neurons present at each layer.

Fig. 4. An example input image for training the network, with the training label used for U-Net and FCN-AlexNet.

Fig. 5. Example cases from Datasets A and B illustrating the performance of the lesion detection algorithms. The rectangle indicates the ground truth and the crosses are the detected abnormalities. The first row (image from Dataset A) shows an easy case where all methods detected the lesion. The second row (image from Dataset B) illustrates a case where the lesion is located close to the top of the image and only DPM, Patch-based LeNet and U-Net detected the lesion. The third row (image from Dataset A) shows an image with complex shadow where only the proposed deep learning approaches detected the lesion. The fourth row (image from Dataset B) shows an image with a very small lesion region where none of the methods detected the lesion, and only the FCN-AlexNet produced no false positives.

TABLE I. COMPARISON OF PERFORMANCE FOR DIFFERENT METHODS WHEN TRAINING AND TESTING ON SINGLE DATASET

TABLE II. COMPARISON OF THE PERFORMANCE OF THE PROPOSED DEEP LEARNING APPROACHES ON THE COMBINED DATASET