Image Processing for Automatic Cell Nucleus Segmentation Using Superpixel and Clustering Methods on Histopathological Images

The number of cancer cases and cancer-related deaths continues to rise. Early detection of the malignant region is critical for successful treatment. Computer-assisted programmes make early detection of diseased cells possible, and the detected cells are then diagnosed by experienced pathologists. In this research, cell nuclei were detected automatically in high-resolution histopathological images using global segmentation methods (k-Means and Fuzzy C-Means) and superpixel segmentation methods (SLIC, Quick Shift, Felzenszwalb, Watershed and ERS). On these high-quality histopathology images, the k-Means and FCM algorithms performed better than the other techniques studied. In terms of precision, the Quick Shift and SLIC approaches produced superior results. The k-Means and FCM algorithms performed best on the F-measure (F-M), whereas the Quick Shift and SLIC methods achieved better true negative ratios.


Introduction
Cancer is one of the leading causes of death globally, and it affects people of all ages and backgrounds. In Iraq, thousands of men and women are diagnosed with cancer each year, although the country has a comparatively low cancer incidence. Women get breast cancer at a rate of one in four, compared with one in ten men [1]. Awareness of cancer, which makes early detection and treatment possible, grows as countries develop. CT, MR and ultrasound can now detect suspicious lesions, and high-resolution histopathology images also aid early cancer detection. Pathology is vital in these assessments. The term pathology comes from the Greek pathos (suffering, disease) and logos (study). Pathology studies disease mechanisms, organ morphology and the cellular changes that occur in diseased organs, and pathologists examine cells under a microscope to diagnose disease. A pathologist takes a biopsy of the suspect tissue; after fixation and tissue processing, the tissue is examined under the microscope, and the diagnosis is made using the pathologist's expertise and, where needed, consultation with colleagues. Sharp imaging has made the work of skilled pathologists easier, and digital image processing has become as useful as the microscope itself. Only a few people can use a microscope at once, which prevents distant pathologists from inspecting the tissue [2].
With digitized histopathological images, by contrast, hundreds of pathologists can examine the same tissue simultaneously [5]. These images are produced by magnifying and digitizing human tissue, and they are medical images with complex structures and rich information.
Expert pathologists examine the cells of tissues and organs to diagnose disease, and their expertise shapes the assessment of the tissue or organ, the patient's risk and the diagnosis. Because pathologists have to examine cells one by one, the work is difficult and tiring. Image processing software can automate such tedious tasks: with computer-aided diagnosis (CAD) systems [3], this work can be carried out in software. A CAD system analyses the digitized image on the computer and provides decision support for the diagnosis.
Histopathology is the branch of science that examines, under the microscope, the lesions that occur in the normal structure of organs, tissues and cells [4]. As computer technology has developed, its processing speed and the time it saves have made it increasingly important in this field. Digital cameras are used to transfer microscope images to digital media, and images obtained in this way are called histopathological images.

Segmentation of Histopathological Images:
Effects of Color Spaces on Cell Segmentation: A colour space is a model used to express colours mathematically. Colour spaces are designed to represent the colours found in nature; in them, different colours are produced by mixing three main components in given proportions [6][7].
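The RGB, XYZ, LAB and HSV spaces described next are related by fixed conversions. The sketch below converts a single pixel, assuming sRGB input scaled to [0, 1], the standard sRGB-to-XYZ matrix and the D65 reference white; the paper does not state which RGB standard or white point it used, and the function names are illustrative. HSV comes from Python's standard `colorsys` module.

```python
import colorsys

# Assumed: sRGB primaries and D65 reference white (not stated in the paper).
D65_WHITE = (0.95047, 1.0, 1.08883)

def srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def rgb_to_xyz(r, g, b):
    """sRGB -> CIE XYZ: linearize, then multiply by a 3x3 coefficient matrix."""
    r, g, b = (srgb_to_linear(v) for v in (r, g, b))
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return x, y, z

def rgb_to_lab(r, g, b):
    """sRGB -> CIE 1976 L*a*b* (L: lightness, a: green-red, b: blue-yellow)."""
    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29
    x, y, z = rgb_to_xyz(r, g, b)
    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), D65_WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

pixel = (0.8, 0.4, 0.6)  # an illustrative pinkish RGB value
print("HSV:", colorsys.rgb_to_hsv(*pixel))
print("Lab:", rgb_to_lab(*pixel))
```

Converting to LAB before clustering means that Euclidean distances between pixels roughly track perceived colour differences, which is the property the text cites for LAB.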
RGB (Red, Green, Blue) colour space: This colour space is widely used to express colours in devices such as scanners and monitors. Mixing the three primary colours in a given ratio produces different colours; mixing all three at 100% gives white, and mixing them at 0% gives black [7].
HSV (Hue, Saturation, Value) colour space: The HSV colour space consists of the Hue, Saturation and Value components. Hue is the quality that distinguishes one colour from another according to its type or tone, Saturation indicates how pure (vivid) the colour is, and Value indicates whether the colour is light or dark.
XYZ colour space: The CIE defined the XYZ colour space in 1931. It covers all colours visible to the human eye and forms the basis of the later CIE colour spaces. XYZ values are obtained from RGB values by multiplying them by a 3×3 coefficient matrix. The X, Y and Z components are tristimulus values, modelled on the combined responses of the eye's colour receptors.
LAB colour space: The CIE created the LAB (CIELAB) colour space in 1976 to express colours more easily. Its components are L (lightness), a (the green-red axis) and b (the blue-yellow axis). Its most important feature is that it is approximately uniform in human perception: equal distances in the space correspond roughly to equal perceived colour differences. For this reason, the LAB colour space is widely used in histopathological image analysis studies.
Image Processing: Image processing is the conversion of a physical image into digital form so that it can be processed. The input is an image such as a photograph or a video frame, and the output is the processed image or a specified portion of it [6]. Image processing systems typically treat images as two-dimensional (2D) signals and process them with standard signal processing methods. After the digital image is obtained, the next step is preprocessing, in which the acquired image is made clearer so that it can be passed to the next stage easily and without errors.
Some of these stages are:
- making the image clearer,
- filtering impurities (noise) in the image,
- eliminating or minimizing structural defects in the image.
In medical image processing, such techniques are applied to images obtained from devices such as CT, MR or the microscope in order to extract a region of interest.
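As an illustration of the "filtering impurities" stage, the sketch below applies a 3×3 median filter to a small grayscale image stored as a list of lists. The paper does not say which preprocessing filters it used, so both the choice of a median filter and the name `median_filter_3x3` are assumptions; a median filter is shown because it removes isolated noise pixels while keeping edges sharp.

```python
def median_filter_3x3(img):
    """Return a copy of a 2D grayscale image with each pixel replaced by the
    median of its 3x3 neighbourhood (the window is clipped at the borders)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]  # upper median keeps integer values
    return out

noisy = [[0, 0, 0],
         [0, 255, 0],   # isolated "salt" pixel
         [0, 0, 0]]
print(median_filter_3x3(noisy))  # the isolated bright pixel is removed
```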

Review of Literature
Mittal et al. (2018) [9] proposed a multi-step nucleus segmentation procedure comprising image enhancement, nucleus region extraction, nucleus centroid marking, nucleus area enhancement and separation of clustered nuclei. The proposed system also introduced a new cumulative performance measure.
Albayrak et al. (2018) [8] compared various machine learning and deep learning models on the CRC colon cancer dataset. Because traditional machine learning algorithms, unlike convolutional neural networks, do not take two-dimensional input, they used local binary pattern features. The authors conclude that the k-nearest neighbour and random forest (RF) algorithms produce very good results, while the CNN performs better without requiring feature extraction.
Saturi et al. (2020) [10] used deep learning methods to predict side effects during treatment in breast cancer patients over the age of 65. Their results can help doctors decide whether to treat breast cancer patients over 65 with chemotherapy; the model achieved an accuracy of 74 per cent.
Cuadros Linares et al. (2020) [11] addressed information security in websites and applications. They compared the datasets they obtained with the original-size datasets using k-nearest neighbour, support vector machine and extreme learning machine models, evaluating metrics such as the accuracy of the best-performing models on the test data, sensitivity and false alarm rate. They claim that extreme learning machines can easily be integrated into online intrusion detection systems and used as a backup method.

Materials and Method
The data for this investigation were provided by the Beck Laboratory at Harvard University. The collection contains high-resolution histopathological images of kidney cancer from TCGA (The Cancer Genome Atlas). TCGA is a large-scale project supported by the US National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It includes about ten thousand comprehensive molecular studies across 25 common cancer types and also collects whole-slide images from its participants within the scope of the project. By examining the molecular, morphological and clinical features of cancer together, TCGA has become the main data source for projects in computational pathology [12]. The histopathological images are 400×400-pixel sections cropped from the slides. A total of 810 renal cell carcinoma image sections were obtained, 81 from each slide. Cellular structures in the images were marked by expert pathologists, and 64 histopathological images were used. The segmentation methods were evaluated by comparing their outputs with the images tagged by the pathologists. Three histopathological images from the Harvard University Beck Laboratory renal cell carcinoma data set, together with their reference images, are given in Figure.
Clustering groups similar objects. Hard and fuzzy (soft) clustering are the two basic clustering approaches. Hard clustering assigns each data point to exactly one cluster, so when an image is clustered, every pixel belongs to a single cluster; overlapping structures, noise and low contrast reduce the effectiveness of this approach [11]. Soft clustering has been used successfully in image segmentation, because fuzzy clustering algorithms are more resilient to uncertainty. The fuzzy c-means method is the most widely used fuzzy clustering method [13].

k-Means Method:
The k-Means method (kMM) divides N data points into c clusters by minimizing the sum of squared distances between the data points and their cluster centres. It was proposed by MacQueen in 1967 and is one of the simplest unsupervised algorithms. After c cluster centres are first chosen at random, it alternates between two stages: each data point is assigned to its nearest cluster centre, and each centre is then recomputed as the mean of the points assigned to it. Distances are usually computed with the Euclidean or the Manhattan metric. The process is repeated until the cluster centres stop changing, that is, until the clustering error reaches a (local) minimum [14].
The steps performed in the k-Means algorithm are:
- determine the number of clusters (c);
- determine the threshold value (ε) for the stopping condition;
- choose the initial cluster centres at random;
- calculate the Euclidean distance from each data point to each cluster centre;
- assign each data point to its closest cluster centre, and recompute each centre as the mean of its assigned points;
- if the stopping condition is not met, repeat from the distance calculation; otherwise, terminate the clustering.
In the k-Means clustering algorithm, the computation time increases as the number of data points and clusters increases [14].
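The k-Means steps can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the study: points are tuples (for example, the LAB colour values of pixels), the initial centres are drawn at random with a fixed seed, Euclidean distance is used, and the stopping condition is that no centre moves by more than ε. The function name `kmeans` and its parameters are illustrative.

```python
import math
import random

def kmeans(points, c, eps=1e-6, max_iter=100, seed=0):
    """Cluster points (tuples) into c clusters; return (centres, labels)."""
    rng = random.Random(seed)
    centres = rng.sample(points, c)          # random initial cluster centres
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign each point to the closest centre (Euclidean distance).
        labels = [min(range(c), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Recompute each centre as the mean of its assigned points.
        new_centres = []
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep the old centre of an empty cluster
        # Stopping condition: no centre moved by more than eps.
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < eps:
            break
    return centres, labels

pixels = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]  # toy 1D intensities
centres, labels = kmeans(pixels, c=2)
print(centres, labels)
```

On a real image, `points` would hold one tuple per pixel, and the cluster whose centre matches nucleus staining would be kept as the nucleus mask.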
Equation 1 gives the criterion used by the fast global k-means algorithm to place a new cluster centre (Likas et al., 2003):
$$b_n = \sum_{j=1}^{N} \max\left(d_{k-1}^{j} - \lVert x_n - x_j \rVert^{2},\ 0\right) \qquad (1)$$
where $d_{k-1}^{j}$ is the squared distance between $x_j$ and the closest of the $k-1$ cluster centres obtained so far. The value $b_n$ measures the guaranteed reduction in clustering error obtained by placing a new cluster centre at $x_n$, so the point with the largest $b_n$ is chosen as the new centre.
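The candidate score $b_n = \sum_j \max(d_{k-1}^j - \lVert x_n - x_j \rVert^2, 0)$ of the fast global k-means algorithm (Likas et al., 2003) can be computed directly. The sketch is illustrative only: the paper does not say whether it used global k-means initialization, and `candidate_scores` is a hypothetical helper name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def candidate_scores(points, centres):
    """b_n for every point x_n, given the k-1 centres found so far."""
    # d[j]: squared distance from x_j to its closest existing centre (d_{k-1}^j)
    d = [min(sq_dist(p, c) for c in centres) for p in points]
    return [
        sum(max(dj - sq_dist(xn, xj), 0.0) for dj, xj in zip(d, points))
        for xn in points
    ]

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
scores = candidate_scores(points, centres=[(0.5,)])
best = scores.index(max(scores))  # the point chosen as the next cluster centre
print(scores, points[best])
```

With one centre at 0.5, the points in the far group score highest, so the new centre is placed where it removes the most error.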

Fuzzy C-Means Method (FCM)
Dunn proposed the FCM algorithm in 1972, and Bezdek developed it further. Granath has reported several variants of the FCM algorithm [15]. FCM is used in many image processing algorithms and is widely used because its soft memberships cope well with uncertain, overlapping data. The Fuzzy C-Means algorithm can assign each data point to more than one cluster through a membership degree that takes values between 0 and 1. The algorithm first creates a membership matrix $U^{(0)}$ with random values. Its parameters are the number of clusters (c), the threshold value (ε) for the stopping condition and the fuzzy parameter m (m > 1). In the Fuzzy C-Means method, the cluster centres are calculated as in equation 3:
$$v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \qquad (3)$$
where $u_{ij}$ is the membership value of pixel $x_i$ in cluster $j$. The membership matrix $U^{(k+1)}$ is then calculated as in equation 4:
$$u_{ij} = \frac{1}{\sum_{l=1}^{c} \left( \dfrac{\lVert x_i - v_j \rVert}{\lVert x_i - v_l \rVert} \right)^{2/(m-1)}} \qquad (4)$$
The process continues until $\lVert U^{(k+1)} - U^{(k)} \rVert < \varepsilon$. When this condition is satisfied, the algorithm returns the cluster centres and the membership matrix.
The steps performed in the Fuzzy C-Means algorithm are:
1. The number of clusters (c) and the fuzzy parameter (m) are determined.
2. The threshold value (ε) for the stopping condition is determined, and the membership matrix is initialized randomly.
3. The cluster centres are calculated with equation 3.
4. The distances between the cluster centres and the pixels are calculated.
5. The membership matrix is updated with equation 4.
6. If the difference between the previous and the updated membership matrix is larger than ε, the algorithm repeats from step 3; otherwise, it stops.
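The FCM steps can be sketched as follows, with the centre update $v_j = \sum_i u_{ij}^m x_i / \sum_i u_{ij}^m$ and the membership update $u_{ij} = 1 / \sum_l (\lVert x_i - v_j \rVert / \lVert x_i - v_l \rVert)^{2/(m-1)}$. This is an illustrative plain-Python version, not the study's implementation; it assumes Euclidean distance, m = 2 by default, a seeded random initial membership matrix, and a stopping rule on the largest change in any single membership value. The function name `fuzzy_c_means` is illustrative.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Return (centres, U) where U[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 2: random initial membership matrix; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(max_iter):
        # Step 3: centres as membership-weighted means (equation 3).
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centres.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / total
                                 for d in range(dim)))
        # Steps 4-5: distances, then membership update (equation 4).
        new_u = []
        for p in points:
            dist = [math.dist(p, v) for v in centres]
            if 0.0 in dist:  # point sits exactly on a centre: crisp membership
                k = dist.index(0.0)
                new_u.append([1.0 if j == k else 0.0 for j in range(c)])
            else:
                new_u.append([1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0))
                                        for l in range(c))
                              for j in range(c)])
        # Step 6: stop when no membership value changes by more than eps.
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    return centres, u

pts = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
centres, U = fuzzy_c_means(pts, c=2)
print([round(v[0], 2) for v in centres])
print([round(max(row), 3) for row in U])  # strongest membership of each point
```

Taking the cluster with the largest membership for each pixel turns the fuzzy result into a hard segmentation when a binary nucleus mask is needed.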

Superpixel Method for Segmentation of Histopathological Images
Detecting cancerous cells and structures in histopathological images requires accurate segmentation, because experts can detect and diagnose disease reliably only when these structures are segmented correctly.
This study evaluated how successfully superpixel methods segment high-resolution histopathological images [16].
In high-resolution image segmentation, each pixel is given a semantic label. Pixels are grouped according to properties such as colour, texture and intensity, and these groups are called superpixels. In histopathological images, superpixels form meaningful regions, so processing a few thousand superpixels instead of every pixel (160,000 pixels in a 400×400 image) avoids complex processing and reduces processing time.

Simple Linear Iterative Clustering (SLIC) Method
SLIC, proposed by Achanta et al., is a superpixel segmentation algorithm that uses k-means to cluster pixels based on their colour and coordinate information [17]. Using SLIC superpixels as a preprocessing step provides computational speed, memory savings and ease of use.
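The SLIC-SegNet algorithm has two parts: an encoder and a decoder. The encoder performs classification with the VGG16 network, and the decoder upsamples the result to the size of the input image. For each pixel, the SLIC segmentation method computes the colour distance given in equation 6:
$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2} \qquad (6)$$
where $(l, a, b)$ are the LAB colour components, $i$ denotes the cluster centre and $j$ the pixel being clustered.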

Equation 7 gives the spatial distance computed from the pixel coordinates:
$$d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} \qquad (7)$$
Here $i$ denotes the pixel taken as the cluster centre and $j$ the pixel to be clustered.
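Equation 8 combines the two distances into the measure D used to assign each pixel to a centre:
$$D = \sqrt{\left(\frac{d_c}{N_c}\right)^2 + \left(\frac{d_s}{N_s}\right)^2} \qquad (8)$$
where $N_c$ and $N_s$ are the maximum colour and spatial distances within a cluster.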
The maximum expected spatial distance within a given cluster should correspond to the sampling interval, $N_s = S = \sqrt{N/K}$, where N is the number of pixels and K the desired number of superpixels. SLIC superpixels correspond to clusters in the labxy colour-image plane. Determining the maximum colour distance $N_c$ is not easy, because colour distances vary significantly from image to image and from cluster to cluster. This problem can be avoided by fixing $N_c$ to a constant m.
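Substituting these values into equation 8 gives equation 9:
$$D = \sqrt{d_c^2 + \left(\frac{d_s}{S}\right)^2 m^2} \qquad (9)$$
The value m in equation 9 also determines the relative importance of colour similarity and spatial proximity: larger values of m give spatial proximity more weight and produce more compact superpixels [17].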
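The SLIC distance reduces to a few lines of code. The sketch below computes $D = \sqrt{d_c^2 + (d_s/S)^2 m^2}$ between a pixel and a cluster centre, each given as an (l, a, b, x, y) tuple, and the grid interval $S = \sqrt{N/K}$ for a 400×400 image with K = 3000 superpixels (one reading of the 3000 groups mentioned in the results). It is illustrative only: the function names are hypothetical, and libraries such as scikit-image expose m as a `compactness` argument.

```python
import math

def grid_interval(num_pixels, num_superpixels):
    """S = sqrt(N / K): spacing of the initial cluster-centre grid."""
    return math.sqrt(num_pixels / num_superpixels)

def slic_distance(pixel, centre, S, m):
    """SLIC distance D between (l, a, b, x, y) tuples."""
    l1, a1, b1, x1, y1 = pixel
    l2, a2, b2, x2, y2 = centre
    d_c = math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)  # colour distance
    d_s = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)                  # spatial distance
    return math.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)

S = grid_interval(400 * 400, 3000)  # about 7.3 pixels between initial centres
centre = (50.0, 0.0, 0.0, 0.0, 0.0)
near_but_different = (60.0, 0.0, 0.0, 2.0, 0.0)
far_but_similar = (51.0, 0.0, 0.0, 7.0, 0.0)
for m in (1, 10, 40):
    # Larger m lets spatial proximity dominate the colour difference.
    print(m, slic_distance(near_but_different, centre, S, m),
          slic_distance(far_but_similar, centre, S, m))
```

With m = 1 the similarly coloured but distant pixel is closer to the centre; with m = 40 the nearby pixel wins, which is why raising the compactness value produces more regular superpixels.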

Quick Shift Segmentation Method
Quick shift is a well-known superpixel segmentation method. It is a mode-seeking algorithm that identifies the modes of the density of a set of data points, and it can be applied to data from any field. By linking the data points to one another, it separates the modes and, with them, the different data clusters [18]. The superpixels it produces are controlled by the quick shift parameters. Neighbours are considered within a spatial distance using a Gaussian kernel. A tree is then formed by connecting each image pixel to its nearest neighbour with a higher density value, where the density is estimated as in equation 10:
$$P(x_i) = \frac{1}{N} \sum_{j=1}^{N} k\big(D(x_i, x_j)\big) \qquad (10)$$
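Here N is the number of data points and k(·) is a kernel function with a Gaussian window. $D(x_i, x_j)$ is the distance between $x_i$ and $x_j$; using the Euclidean distance in the combined spatial and colour space, it is calculated with equation 11:
$$D(x_i, x_j) = \sqrt{\lambda^2 \lVert I(x_i) - I(x_j) \rVert^2 + \lVert x_i - x_j \rVert^2} \qquad (11)$$
where $I(x)$ is the colour of pixel $x$.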
λ is the weighting parameter in the equation; the smaller it is, the more important spatial proximity becomes [17][18].
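A toy version of quick shift shows how the density estimate and the weighted distance work together. The sketch below is not the implementation used in the study: it estimates each pixel's density with a Gaussian kernel of width `sigma`, links every pixel to its nearest neighbour of higher density if that neighbour lies within `max_dist`, and labels each pixel by the root (mode) of its tree. Pixel features are built as (x, λ·intensity) for a single image row, so λ (`ratio` here) plays the weighting role described above; all names and parameter values are illustrative assumptions. Practical implementations, such as scikit-image's `quickshift`, limit the neighbour search to a local window instead of comparing every pair of pixels.

```python
import math

def quickshift(features, sigma=1.0, max_dist=3.0):
    """Return, for each feature vector, the index of the mode (tree root) it belongs to."""
    n = len(features)
    # Parzen density estimate with a Gaussian kernel.
    density = [
        sum(math.exp(-math.dist(fi, fj) ** 2 / (2 * sigma ** 2)) for fj in features) / n
        for fi in features
    ]
    parent = list(range(n))
    for i in range(n):
        best, best_d = i, math.inf
        for j in range(n):
            d = math.dist(features[i], features[j])
            if density[j] > density[i] and d < best_d:
                best, best_d = j, d
        if best_d <= max_dist:  # otherwise pixel i stays a root, i.e. a mode
            parent[i] = best
    def root(i):
        while parent[i] != i:   # links always point to higher density, so no cycles
            i = parent[i]
        return i
    return [root(i) for i in range(n)]

row = [10, 10, 10, 200, 200, 200]   # one image row: dark region, bright region
ratio = 0.1                          # lambda: weight of colour relative to position
features = [(x, ratio * v) for x, v in enumerate(row)]
print(quickshift(features))
```

Each distinct root index is one superpixel; here the dark and bright runs each collapse onto their central pixel.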

Felzenszwalb Segmentation Method (FSM)
Felzenszwalb is a graph-based local variation algorithm. Because the algorithm has no compactness constraint, it typically produces irregularly shaped regions [19]. The algorithm is fast and runs in nearly linear time in the number of graph edges. It can ignore detail in high-variability regions while retaining detail in low-variability regions. The input of the algorithm is a graph G = (V, E) with n vertices and m edges.
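Its output is a segmentation of V into components $S = (C_1, C_2, \dots, C_r)$. The internal difference of a component C is the maximum edge weight in the minimum spanning tree of the component, and the difference between two components $C_1, C_2 \subseteq V$ is the minimum weight of an edge connecting them:
$$\mathrm{Int}(C) = \max_{e \in \mathrm{MST}(C, E)} w(e), \qquad \mathrm{Dif}(C_1, C_2) = \min_{v_i \in C_1,\ v_j \in C_2,\ (v_i, v_j) \in E} w(v_i, v_j)$$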
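To show how Int and Dif drive the merging, here is a minimal sketch of the Felzenszwalb-Huttenlocher procedure on a one-dimensional signal, where each sample is a vertex and neighbouring samples are joined by an edge weighted by their intensity difference. It uses the merge rule from the original algorithm, which the text does not spell out: two components are merged when $\mathrm{Dif}(C_1, C_2) \le \min(\mathrm{Int}(C_1) + k/|C_1|,\ \mathrm{Int}(C_2) + k/|C_2|)$, where k is the scale parameter. The 1D graph and the function name are simplifying assumptions.

```python
def felzenszwalb_1d(signal, k):
    """Graph-based segmentation of a 1D signal; returns a component id per sample."""
    n = len(signal)
    # Edges between neighbouring samples, sorted by weight (intensity difference).
    edges = sorted((abs(signal[i + 1] - signal[i]), i, i + 1) for i in range(n - 1))
    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n   # Int(C): largest MST edge inside each component

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Edges arrive in increasing weight order, so w is Dif(C1, C2).
        m_int = min(internal[ra] + k / size[ra], internal[rb] + k / size[rb])
        if w <= m_int:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = w   # the merging edge is the new largest MST edge
    return [find(i) for i in range(n)]

signal = [10, 11, 10, 12, 80, 81, 80]
print(felzenszwalb_1d(signal, k=5))     # small k: two segments
print(felzenszwalb_1d(signal, k=1000))  # large k: one segment
```

The k/|C| term lets small components merge easily, so a larger scale parameter yields fewer, larger segments, matching the role of the scale parameter in the results.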

Watershed Segmentation Method
Watershed superpixels computed on the image gradient adhere well to object boundaries, and the marker-based variant allows control over the number and spatial arrangement of the resulting superpixels (Machairas et al., 2014). The algorithm requires a grayscale gradient image to express the borders. It interprets the image as a topographic relief in which bright pixels are hills and dark pixels are valleys; flooding this landscape from its minima creates one basin per image segment, and neighbouring segments meet along the watershed lines on the ridges. Following Lalitha et al. (2016) [20], a grayscale image is represented as a function $f \in C(D)$ on a connected domain D that has only isolated critical points. The distance between points p and q is then the topographic distance:
$$T_f(p, q) = \inf_{\gamma} \int_{\gamma} \lVert \nabla f(\gamma(s)) \rVert \, ds$$
where the infimum is taken over all paths γ in D from p to q.
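A marker-controlled watershed can be sketched with a priority queue: starting from labelled marker pixels, the lowest unlabelled neighbour on the gradient relief is always flooded next, so each basin grows outward from its marker until it meets another basin at a ridge. This is a simplified priority-flood variant written for illustration; it draws no separate watershed lines (ridge pixels go to whichever basin reaches them first) and omits the compactness term that the study also tuned. The function and argument names are assumptions, not the study's code.

```python
import heapq

def watershed(relief, markers):
    """Flood a 2D relief (e.g. gradient magnitudes) from labelled markers.

    markers uses 0 for unlabelled pixels and positive integers for marker labels.
    Returns a label image of the same shape.
    """
    h, w = len(relief), len(relief[0])
    labels = [row[:] for row in markers]
    heap, order = [], 0   # order breaks ties between equal relief values (FIFO)
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                heapq.heappush(heap, (relief[y][x], order, y, x))
                order += 1
    while heap:
        _, _, y, x = heapq.heappop(heap)   # lowest pixel on the current flood front
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0:
                labels[ny][nx] = labels[y][x]      # basin grows into the neighbour
                heapq.heappush(heap, (relief[ny][nx], order, ny, nx))
                order += 1
    return labels

relief = [[0, 1, 5, 1, 0] for _ in range(3)]   # two valleys separated by a ridge
markers = [[1, 0, 0, 0, 2],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
for row in watershed(relief, markers):
    print(row)
```

Adding more markers creates more basins, which is consistent with the observation in the results that performance depends on the number of markers.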

ERS Segmentation Method
Entropy rate superpixel (ERS) segmentation uses a graph-based approach to find compact and homogeneous superpixels, unlike well-known clustering-based methods such as SLIC. The ERS objective function has two terms: the entropy rate term favours compact, homogeneous clusters, and the balancing term favours clusters of similar size [21]. In the ERS algorithm, the number of clusters (k) and the edge weights ($w_{ij}$) are adjusted, and the superpixel segmentation is obtained by selecting a subset A of the graph's edges [14]. Entropy H measures the uncertainty of a random variable. For a discrete random variable X with probability mass function $p_X(x) = P(X = x)$, the entropy is calculated as:
$$H(X) = -\sum_{x} p_X(x) \log p_X(x)$$
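The entropy $H(X) = -\sum_x p_X(x) \log p_X(x)$ and its use in the ERS balancing term can be illustrated in a few lines. In Liu et al.'s ERS formulation, the balancing term is built from the entropy of the cluster-size distribution $p_Z(i) = |S_i|/|V|$, which is largest when all superpixels have the same size; this sketch computes only that entropy (in nats), not the full ERS objective, and the function names are illustrative.

```python
import math

def entropy(probs):
    """H(X) = -sum p(x) log p(x), in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def size_entropy(cluster_sizes):
    """Entropy of the cluster-size distribution p_Z(i) = |S_i| / |V|."""
    total = sum(cluster_sizes)
    return entropy([s / total for s in cluster_sizes])

print(size_entropy([50, 50]))   # balanced: log 2, about 0.693
print(size_entropy([90, 10]))   # unbalanced: lower entropy
```

Because balanced partitions score higher, maximizing this term pushes ERS towards superpixels of similar size.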

Metrics Used to Evaluate Segmentation Performance
Several measures were used in the experiments to evaluate segmentation performance: the True Positive Ratio, True Negative Ratio, Precision, Overlap Ratio, False Positive Ratio and False Negative Ratio, together with the F-measure (F-M) [13]. The True Positive Ratio (TPR), given in equation 2.1, is the ratio of pixels correctly labelled as nucleus to all pixels that belong to the nucleus:
$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad (2.1)$$
where TP is the number of nucleus pixels labelled as nucleus and FN the number of nucleus pixels labelled as background.
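The True Negative Ratio (TNR) is the ratio of pixels that do not belong to the nucleus and are labelled as negative to all non-nucleus pixels:
$$\mathrm{TNR} = \frac{TN}{TN + FP}$$
where TN is the number of non-nucleus pixels labelled as background and FP the number of non-nucleus pixels labelled as nucleus.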
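The evaluation measures can be computed from a predicted mask and the pathologists' reference mask. The sketch below assumes flattened binary masks (1 = nucleus pixel) and defines the F-measure as the harmonic mean of precision and TPR and the overlap ratio as the Jaccard index TP/(TP+FP+FN); the text does not give these two formulas explicitly, so treat them as assumptions. It also assumes both classes occur in the masks, so no denominator is zero.

```python
def segmentation_metrics(pred, ref):
    """Pixel-wise measures for flattened binary masks (1 = nucleus, 0 = background)."""
    pairs = list(zip(pred, ref))
    tp = sum(1 for p, r in pairs if p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tpr = tp / (tp + fn)              # true positive ratio
    precision = tp / (tp + fp)
    return {
        "TPR": tpr,
        "TNR": tn / (tn + fp),        # true negative ratio
        "Precision": precision,
        "F-M": 2 * precision * tpr / (precision + tpr),  # assumed: harmonic mean
        "Overlap": tp / (tp + fp + fn),                   # assumed: Jaccard index
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }

predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
print(segmentation_metrics(predicted, reference))
```

If the reference masks omit some nuclei, correctly detected but unmarked nuclei are counted as false positives, lowering precision and the F-measure.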

Segmentation Results
In the study, 64 high-resolution histopathological images of renal cell carcinoma obtained from the Harvard University Beck Laboratory were segmented using seven different segmentation methods, and the resulting segmentations were evaluated against the reference images with the performance criteria. Some preprocessing steps were applied to the histopathological images before segmentation. After preprocessing, the global methods k-Means and Fuzzy C-Means and the superpixel methods SLIC, Quick Shift, Felzenszwalb, Watershed and ERS were applied [22]. Histopathological images contain structures such as cytoplasm, cell membranes, adipose tissue and nuclei. For this reason, several numbers of clusters, such as 3, 4 and 5, were tried in the global k-Means and Fuzzy C-Means methods.

Segmentation with the k-Means method
The k-Means segmentation method was run with 3, 4, 5 and 6 clusters. When the number of clusters was 2, 3 or 4, the nuclear structure could be segmented as nuclei, but it could not when the number of clusters was 4. Five clusters were found to give the best segmentation.

Segmentation with the Fuzzy C-Means method
The Fuzzy C-Means segmentation method was run with 3, 4, 5 and 6 clusters. With 2 or 3 clusters, cell membranes were perceived as nuclei, and with 5 clusters, the nuclear structure could not be segmented as nuclei. Four clusters were found to give the best segmentation.

Segmentation with the SLIC method
The SLIC segmentation method gives better results when the image is divided into 3000 superpixels. Results improve as the number of superpixels and the compactness value increase, but performance remains constant beyond a certain value. The parameters used and their results are shown in Figure 3.

Segmentation with the Quick Shift method
The quick shift method identifies the modes in a set of data points. It has two main parameters, and the performance obtained with different values of these two parameters is given in Figure 3.

Segmentation with the Felzenszwalb method
The actual size and number of segments produced by the Felzenszwalb algorithm can vary greatly depending on local contrast. The algorithm has a single scale parameter that affects segment size. The performance obtained with the parameters used is given in Figure 3.5.

Segmentation with the Watershed method
Two parameters, markers and compactness, were used in the Watershed segmentation algorithm. Performance increases as the number of markers increases but remains constant beyond a certain value. In addition to the methods commonly used in the literature, this study also applied the Felzenszwalb and Watershed methods. The K-Means and Fuzzy C-Means segmentation methods achieved better true positive rates.

Conclusions and Recommendations
Conclusions: Automatic detection of unhealthy cellular structures in high-resolution histopathological images is of great importance for cancer detection. This study evaluated the performance of segmentation methods for automatic cell detection on 64 high-resolution histopathological images, using k-Means, Fuzzy C-Means and the superpixel segmentation algorithms SLIC, Watershed, Quick Shift, Felzenszwalb and ERS, and assessed the results with several performance measures. .1 shows that seven segmentation methods were employed in the study. According to this study, the k-Means and FCM algorithms perform better on high-resolution histopathological images. In terms of precision, the Quick Shift and SLIC methods performed better. The k-Means and FCM algorithms give the best F-M results, while the Quick Shift and SLIC methods do better on the true negative ratio (TNR). Because the pathologists did not mark every cell in the reference images, the measured performance of the methods decreases. The performance of the segmentation algorithms is shown in Figure 4.1.

Recommendations:
Cancer is one of the most common diseases today. Segmentation methods are steadily increasing the chance of detecting cancerous cells early. Using segmentation methods can improve diagnostic accuracy and help determine the appropriate treatment. This study is expected to lead to further research on the diagnosis and treatment of diseases with segmentation methods. By improving the methods and techniques used, diseased cells can be detected more successfully.