Kernel-based spectral color image segmentation


Abstract

In this work, we propose a new algorithm for spectral color image segmentation based on the use of a kernel matrix. A cost function for spectral kernel clustering is introduced to measure the correlation between clusters. An efficient multiscale method is presented for accelerating spectral color image segmentation. The multiscale strategy uses the lattice geometry of images to construct an image pyramid whose hierarchy provides a framework for rapidly estimating eigenvectors of normalized kernel matrices. To prevent boundaries from deteriorating, the image size on the top level of the pyramid is generally required to be around 75×75, at which point the eigenvectors of normalized kernel matrices are approximately solved by the Nyström method. Within this hierarchical structure, the coarse solution is progressively propagated to finer levels and is refined using subspace iteration. In addition, to make full use of the abundant color information contained in spectral color images, we propose using spectrum extension to incorporate the geometric features of spectra into similarity measures. Experimental results show that the proposed method performs well in spectral color image segmentation while also speeding up the approximation of the eigenvectors of normalized kernel matrices.

© 2008 Optical Society of America

1. INTRODUCTION

Spectral color image segmentation has drawn considerable attention [1, 2, 3, 4, 5, 6] in recent years due to its potential applications in forest assessment, mineral exploration, medical imaging, and so on. The main advantage of spectral color images, which comprise more than 20 spectral bands, is the large amount of color information involved, which dramatically improves the ability to detect individual materials or separate areas with visually different colors. However, almost all of the existing methods were developed for special purposes or have limitations in their practical applications. Moreover, most of these methods view each pixel in a spectral color image as a multidimensional vector and do not consider characteristics of spectra such as geometric features. To address these problems, a kernel-based approach to spectral color image segmentation is proposed in this paper.

In spectral spaces, each color is represented by a spectral curve; thus, each pixel value in a spectral color image generated by a spectral imaging sensor corresponds to a vector whose entries describe the energy in particular wavelength bands. Although the spectral curves of an image are distinct, homogeneous regions are, in general, characterized by similar spectra. In other words, the wealth of spectral information available in color spectra provides critical cues for distinguishing colors and regions. To make full use of the abundant color information contained in spectral color images, we take into consideration the geometric features as well as the magnitude of spectra in calculating the similarity between pixels. In this work we propose using the slope and curvature of spectral curves to describe geometric features efficiently. Incorporating slope and curvature into similarity measures via spectrum extension improves the quality of image segmentation.

In this paper we present a spectral kernel clustering method for multiple clusters and apply it to spectral color image segmentation. (Please note that the word “spectral” appears twice in this sentence and has completely different meanings in each case: The former is a mathematical concept and represents the eigenvalues of a matrix; the latter is a physical concept and is used to describe the distribution of energy emitted by a radiant source.) Spectral methods for image segmentation are based on eigendecomposition of an affinity matrix. Weiss stated in [7] that the affinity matrix contains information about segmentation from visual inspection and pointed out that, by using the top eigenvectors of such a matrix simultaneously, one could extract a representation that leads trivially to a discrete segmentation. In [8], Ng et al. proposed a spectral clustering approach and gave conditions under which eigenvectors could be used to compute a reasonable cluster. Yu and Shi have provided in [9] a discretization method to find group indicators from obtained eigenvectors. Cristianini et al. [10] generalized spectral methods from affinity matrices to kernel matrices and proposed two cost functions to measure the quality of the bipartitioning of a data set.

Our approach draws inspiration from [10] and extends the kernel-based clustering method for two clusters to an arbitrary number of clusters. The key to spectral kernel clustering is to extract a feature space spanned by several leading eigenvectors of a normalized kernel matrix. Due to the huge amount of data in spectral color images, constructing and solving a kernel matrix become difficult, even impossible, on common computers. To save computational and storage costs, we introduce the multiscale strategy combined with the Nyström method [11]. This strategy first downsamples spectral color images and builds a hierarchical structure, such as a pyramid. Eigenvectors are computed on the top scale of the pyramid using the Nyström method and then are propagated to finer scales, based on the similarity between neighboring pixels.

In our experiments, which are described in more detail later in this paper, we used four kernel functions, namely, linear, polynomial, Gaussian, and ANOVA (analysis of variance) [12], and employed confusion matrices to evaluate the segmentation performance of the proposed method. We also compared our method with several state-of-the-art methods. Experiments showed that the combination of the multiscale and Nyström methods can prevent boundaries between homogeneous regions from deteriorating during downsampling. Thus, our approach has been shown to be effective in providing good segmentation while speeding up the estimation of eigenvectors of large matrices. In addition, the proposed method for spectral color image segmentation is independent of any illuminant and is sensitive to changes in color arising from hue or chroma.

The remainder of this paper is organized as follows: In Section 2, spectral color image preprocessing is presented. Section 3 describes spectral kernel clustering. The multiscale strategy is discussed in detail in Section 4. In Section 5, we evaluate and analyze the segmentation performance of the proposed method and compare our method with several state-of-the-art methods. This paper ends with conclusions in Section 6.

2. SPECTRAL COLOR IMAGE PREPROCESSING

As shown in Fig. 1, the proposed approach to spectral color image segmentation can be divided into three stages: preprocessing, multiscale strategy, and discretization. Given a spectral color image, we first implement image preprocessing through smoothing, normalizing, and spectrum extension. In spectrum extension, we concatenate the magnitude, slope, and curvature of spectra to form a fused space in which the similarity between color spectra is computed. Then our multiscale strategy is used to rapidly solve for the eigenvectors of the normalized kernel matrix. The image is downsampled in the spatial domain to produce an image pyramid. On the top scale of the pyramid, the eigenvectors are approximated with the Nyström method and are subsequently propagated to the other scales of the pyramid. The procedure of propagating from coarse to fine scales is called scale extension. Finally, the segmentation result is obtained by discretizing the eigenvectors on the original scale. The discretization method we adopt, proposed by Yu and Shi [9], uses an alternating optimization procedure.

In the next sections we discuss the stages of our approach in detail, beginning with spectral color image preprocessing. The notation we use is presented in Table 1 .

2A. Spectral Color Images

A ρ-band spectral color image is a two-dimensional grid (μ×ν) of pixels, with a ρ-dimensional vector for each pixel, where μ and ν denote the height and width, respectively, of an image. Therefore, a spectral color image can be viewed as a set of spectra Θ composed of ι (ι=μν) data with ρ spectral bands. One of the important characteristics of spectral color images is that they contain a large number of spectral channels in the optical wavelength range. The number of channels can vary from tens to hundreds. Compared to color images in the RGB space, spectral color images have a clear advantage in terms of describing color objects.

As an example, consider a spectral color image called “colorchecker.” This image was obtained by the Color Group of the University of Joensuu [13] and was used for precise color balance when shooting color film. The image has 81 spectral bands covering a spectral region from 380 nm to 780 nm in 5 nm steps. Each spectral band is a component image, and all the component images together constitute the spectral color image. For the purposes of studying spectral colors, a colorful region of 141×291 pixels, representing an array of 18 printed color squares, has been extracted from the original colorchecker image. The resulting colorchecker spectral color image is displayed in panel (a) of Fig. 2, and the RGB image reproduced from this spectral color image is shown in panel (b). From the colorchecker image, we selected four points, 1, 2, 3, and 4. Points 1 and 2 were from a homogeneous region (i.e., a region with similar colors); points 3 and 4 were located in another homogeneous region. Figure 2c shows that the spectral curves of points 1 and 2 are quite similar and readily distinguishable from those of points 3 and 4. This reveals that the geometric shape of spectra provides a significant hint that clusters exist.

2B. Smoothing and Normalizing

To remove noise arising during spectral color image acquisition, it is necessary to do smoothing before segmentation. In our work, smoothing has two aspects: smoothing of spectral curves via a cubic spline curve and reducing spots or small regions while preserving edges in each component image via median filtering.

It is known that spectral curves incorporate three types of information: hue, lightness, and chroma. To cluster different colors correctly, normalization is necessary. Normalization reduces the effect of illumination, which does not provide useful information for color clustering. The first step of normalizing spectra is centering, done by subtracting the mean of the spectral data from each spectral value. Subsequently, the norm of the spectral data is made equal to 1 in the Euclidean sense by dividing each spectral value by the square root of the sum of squares of the spectral data.

For instance, given a set of spectra Θ composed of ι data with ρ spectral channels,

\[
\Theta = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_i \\ \vdots \\ \theta_\iota \end{pmatrix}
= \begin{pmatrix}
\theta_{11} & \cdots & \theta_{1j} & \cdots & \theta_{1\rho} \\
\vdots & & \vdots & & \vdots \\
\theta_{i1} & \cdots & \theta_{ij} & \cdots & \theta_{i\rho} \\
\vdots & & \vdots & & \vdots \\
\theta_{\iota 1} & \cdots & \theta_{\iota j} & \cdots & \theta_{\iota\rho}
\end{pmatrix},
\]
the normalization is done first by
\[
\theta'_{ij} = \theta_{ij} - \frac{1}{\rho}\sum_{k=1}^{\rho}\theta_{ik},
\]
then
\[
\theta''_{ij} = \frac{\theta'_{ij}}{\sqrt{\sum_{k=1}^{\rho}\bigl(\theta'_{ik}\bigr)^{2}}}.
\]
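For concreteness, the following Python sketch (ours, not the authors' implementation; the spline-knot spacing, filter size, and array layout are illustrative assumptions) shows one way to carry out the smoothing described above together with the centering and unit-norm normalization of Eqs. (2) and (3).

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.ndimage import median_filter

def preprocess(cube):
    """cube: (mu, nu, rho) spectral color image."""
    mu, nu, rho = cube.shape
    bands = np.arange(rho)

    # Smooth each spectral curve: fit a cubic spline through subsampled knots
    # and resample it on the full band grid (one simple smoothing choice).
    knots = bands[::4]
    flat = cube.reshape(-1, rho)
    smoothed = np.array([CubicSpline(knots, s[knots])(bands) for s in flat])
    cube = smoothed.reshape(mu, nu, rho)

    # Median-filter every component (band) image to suppress small spots
    # while preserving edges.
    for j in range(rho):
        cube[:, :, j] = median_filter(cube[:, :, j], size=3)

    # Eq. (2): subtract the mean of each spectrum; Eq. (3): divide by its L2 norm.
    theta = cube.reshape(-1, rho)
    theta = theta - theta.mean(axis=1, keepdims=True)
    theta = theta / np.linalg.norm(theta, axis=1, keepdims=True)
    return theta  # (iota, rho) matrix of centered, unit-norm spectra
```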

In Fig. 3, we present an example that shows the smoothing and normalizing of spectra. The four color spectra in Fig. 3a were taken from the Munsell data set, which is used to evaluate the perceptual qualities of color spaces. The Munsell system of color notation (refer to the online version for color) identifies colors in terms of three attributes: hue, value, and chroma. In this example, the notations of the black and yellow curves were 2.5R 6/2 and 2.5R 9/2, respectively, while those of the blue and cyan curves were 2.5G 6/2 and 2.5G 9/2. It is clear that the black curve was much closer to the yellow curve than to the blue curve after normalizing [Fig. 3c], but the opposite was true before normalizing [Fig. 3a]. When clustering these four colors, we usually group the black curve with the yellow curve rather than the blue curve, since the black and yellow curves have the same hue (2.5R) and chroma (2). So we can conclude that the normalization of spectral data is helpful for color clustering.

2C. Geometric Description

In this work, we describe the geometric shape of color spectra by the slope Φ and curvature Ψ of spectral curves, which can be computed with finite-difference methods. Since curve smoothing is done at the preprocessing stage, we can efficiently use finite-difference methods to approximate the slope and curvature of spectral curves. In particular, given a color spectrum θ_i, we compute its slope ϕ_i and curvature ψ_i at position j as follows:

\[
\phi_{ij} = \frac{\theta_{i,j+\epsilon} - \theta_{i,j-\epsilon}}{2\epsilon},
\]
\[
\psi_{ij} = \frac{\theta_{i,j+\epsilon} + \theta_{i,j-\epsilon} - 2\theta_{ij}}{\epsilon^{2}},
\]
where ϵ represents the step length and is generally set to 1 for simplicity. Since formulas (4) and (5) cannot be evaluated at the beginning and end of spectral curves, the obtained data ϕ_i in the slope space Φ and ψ_i in the curvature space Ψ are vectors of length ρ−2. To combine slope and curvature with magnitude, we first normalize ϕ_i and ψ_i in the same way as θ_i. Figure 4 shows the slope and curvature of the spectra of the colorchecker image at a wavelength of 625 nm. Both the slope and curvature of color spectra clearly contain significant information regarding clustering.
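A minimal sketch of formulas (4) and (5) with step ε = 1 follows (ours, not the paper's code; the normalization of ϕ and ψ simply mirrors Eqs. (2) and (3) as stated above).

```python
import numpy as np

def slope_curvature(theta, eps=1):
    """theta: (iota, rho) matrix of smoothed spectra; returns slope and curvature,
    each of length rho - 2*eps per spectrum, normalized like the magnitudes."""
    phi = (theta[:, 2 * eps:] - theta[:, :-2 * eps]) / (2.0 * eps)      # formula (4)
    psi = (theta[:, 2 * eps:] + theta[:, :-2 * eps]
           - 2.0 * theta[:, eps:-eps]) / float(eps ** 2)                # formula (5)
    for g in (phi, psi):                       # center and L2-normalize in place
        g -= g.mean(axis=1, keepdims=True)
        g /= np.linalg.norm(g, axis=1, keepdims=True)
    return phi, psi
```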

We again use the spectra shown in Fig. 3b as an example to illustrate the significance of geometric shape in spectral data clustering. If only the magnitudes of the spectra are taken into account, the blue and black curves are quite close in the Euclidean sense, although they look completely different. However, if only the slope information of the spectral curves is considered, the blue curve is much closer to the cyan curve than to the black curve. In addition, experimental results have shown the usefulness of slope and curvature in improving the performance of spectral color image segmentation.

2D. Spectrum Extension

For the analysis of spectral color images, we need to incorporate not only the magnitude but also the geometric features of spectra into the similarity measure, because the geometric shape of spectral curves also contains significant cues about clustering. A natural way of combining the magnitude, slope, and curvature of spectra is to concatenate them directly after weighting. As a result, we get a fused space Δ={δ_i}_{i=1}^{ι}. This combination procedure is called spectrum extension. We use weight coefficients α, β, and γ, with 0 ≤ α, β, γ ≤ 1 and α+β+γ=1, as impact factors of these three attributes to adjust their contribution to the similarity measure. That is, given a color spectrum θ_i, we get a point δ_i in the fused space Δ after spectrum extension:

\[
\delta_i = (\alpha\theta_i,\ \beta\phi_i,\ \gamma\psi_i)
= (\alpha\theta_{i1},\ldots,\alpha\theta_{ij},\ldots,\alpha\theta_{i\rho},\ \beta\phi_{i1},\ldots,\beta\phi_{ij},\ldots,\beta\phi_{i,\rho-2},\ \gamma\psi_{i1},\ldots,\gamma\psi_{ij},\ldots,\gamma\psi_{i,\rho-2}).
\]
The dimensionality of the fused space Δ is 3ρ−4, the sum of the dimensionalities of the magnitude, slope, and curvature spaces. The similarity k_{ij} between a pair of color spectra is computed in the fused space Δ and is therefore written as k(δ_i, δ_j). Because α+β+γ=1, we actually only need to preset two parameters. In this work, α and β were used as free parameters. In Subsection 5B, we give some recommendations on the choice of these two parameters.
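A short sketch of the spectrum extension of Eq. (6) is given below (ours; the default weights are simply the values used later in our experiments, and the function name is illustrative).

```python
import numpy as np

def spectrum_extension(theta, phi, psi, alpha=0.6, beta=0.3):
    """Concatenate weighted magnitude, slope, and curvature into the fused space."""
    gamma = 1.0 - alpha - beta
    assert 0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0 and gamma >= 0.0
    # delta_i = (alpha*theta_i, beta*phi_i, gamma*psi_i): dimensionality 3*rho - 4
    return np.hstack([alpha * theta, beta * phi, gamma * psi])
```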

3. SPECTRAL KERNEL CLUSTERING

The proposed spectral kernel clustering method draws inspiration from [10] and extends the kernel-based clustering method for two clusters to an arbitrary number η of clusters. Since an ι×ι kernel matrix K can represent the similarity between ι elements (data points or pixels), we can use K to predict whether two elements are in the same cluster. Let E denote the set of all elements to be grouped and S=(S_t)_{t=1}^{η} denote η disjoint clusters such that ∪_t S_t = E, where t represents the index of a cluster. Spectral kernel clustering can be treated as the problem of minimizing the intercluster similarity by virtue of K, wherein the cost function with respect to partition S is defined as

\[
C(S) = \sum_{t=1}^{\eta}\ \sum_{i\in S_t,\, j\in E\setminus S_t} k_{ij},
\]
where k_{ij} denotes the entry of K at the intersection of the ith row and jth column.

Let Y ∈ R^{ι×η} be the indicator matrix, each column of which, y_t ∈ {0,1}^ι, has nonzero components exactly at the points that belong to the tth cluster. Since knowledge of Y is equivalent to knowledge of S, we will use these two formulations interchangeably when referring to partitions. Then we have

\[
C(Y) = C(S) = \sum_{t=1}^{\eta}\ \sum_{i\in S_t,\, j\in E\setminus S_t} k_{ij}
= \sum_{t=1}^{\eta}\Bigl(\sum_{i\in S_t,\, j\in E} k_{ij} - \sum_{i,j\in S_t} k_{ij}\Bigr)
= \sum_{i,j\in E} k_{ij} - \sum_{t=1}^{\eta}\sum_{i,j\in S_t} k_{ij}
= \operatorname{tr}\, Y^{T} D Y - \operatorname{tr}\, Y^{T} K Y,
\]
where D = diag(d_1, d_2, …, d_ι) with d_i = Σ_{j=1}^{ι} k_{ij}, and tr denotes the trace of a matrix. The classical relaxation of minimizing cost function (8) leads to an eigendecomposition problem. Let Z = D^{1/2} Y and impose the constraint Z^T Z = I_η, where I_η denotes the η×η identity matrix. Then the cost function C(Y) can be rewritten as
\[
C(Y) = \operatorname{tr}\, Y^{T} D Y - \operatorname{tr}\, Y^{T} K Y
= \operatorname{tr}\, Z^{T} Z - \operatorname{tr}\, Z^{T} D^{-1/2} K D^{-1/2} Z
= \operatorname{tr}\, I_\eta - \operatorname{tr}\, Z^{T} D^{-1/2} K D^{-1/2} Z
= \eta - \operatorname{tr}\, Z^{T} D^{-1/2} K D^{-1/2} Z.
\]
Let
\[
P = D^{-1/2} K D^{-1/2},
\]
which is essentially the normalization of the kernel matrix K. Then minimizing cost function (7) is equivalent to maximizing tr(Z^T P Z). Relaxing Z into the continuous domain turns the discrete problem into a tractable continuous optimization problem whose solutions involve the η largest eigenvectors V of P [14].

To find the indicator matrix, we need to discretize the continuous solution Y = D^{-1/2} Z, whose rows are those of Z scaled by the inverse square roots of the corresponding diagonal entries of D (recall that D^{1/2} is diagonal). Here we adopt the discretization method that Yu and Shi proposed in [9]. In what follows we focus on rapidly approximating the eigenvectors of P when applying the spectral clustering method to spectral color image segmentation.
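To make the clustering step concrete, the following small-scale Python sketch (ours; it assumes the fused features fit in memory and uses the degree-2 polynomial kernel of Subsection 5A as an example) forms P, extracts its η largest eigenvectors, and returns the continuous solution that would then be discretized as in [9].

```python
import numpy as np

def spectral_kernel_clustering(delta, eta):
    """delta: (iota, 3*rho-4) fused features; eta: number of clusters."""
    K = (delta @ delta.T + 1.0) ** 2              # degree-2 polynomial kernel matrix
    d = K.sum(axis=1)                             # row sums d_i
    d_inv_sqrt = 1.0 / np.sqrt(d)
    P = K * np.outer(d_inv_sqrt, d_inv_sqrt)      # P = D^{-1/2} K D^{-1/2}
    vals, vecs = np.linalg.eigh(P)                # eigenvalues in ascending order
    Z = vecs[:, -eta:]                            # eta largest eigenvectors (relaxed Z)
    return d_inv_sqrt[:, None] * Z                # continuous Y = D^{-1/2} Z
```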

4. MULTISCALE STRATEGY FOR SPECTRAL COLOR IMAGE SEGMENTATION

Spectral clustering methods are extremely attractive in that they are based on eigendecomposition algorithms whose stability is well understood. Unfortunately, the eigendecomposition of a full matrix presents a serious computational problem. Since P grows as the square of the number of pixels in an image, it quickly becomes infeasible even to fit P in memory, let alone compute its leading eigenvectors. Works addressing this problem have recently appeared and can be divided into two categories: one is based on the Nyström method [11, 15, 16]; the other builds on the hierarchical structure of the image pyramid [17, 18, 19]. Both have individual drawbacks: (1) the Nyström method is still restricted by image size (e.g., when the image resolution is 300×300, it is almost impossible to compute eigenvectors using the method presented in [11]); (2) on the coarse levels of the multiscale strategy, the contours or shapes of homogeneous regions often become blurry.

To overcome these drawbacks, we combine these two classes of methods in spectral color image segmentation and adopt the Nyström method only on the top scale of the pyramid. Empirically, the contours in an image are well preserved when the image is downsampled until the spatial resolution is around 75×75. In this case, therefore, the number of scales h can be determined automatically from the expression below:

\[
h = \max\!\left(\left\lceil \log_2 \frac{\max(\mu,\nu)}{75} \right\rceil,\ 1\right),
\]
where ⌈⋅⌉ denotes rounding up to the nearest integer (the ceiling function).

In our multiscale strategy, we first downsample the input spectral color image in the spatial domain and construct an image pyramid. The downsampling procedure is quite simple. One round of downsampling consists of two steps: we first take a pixel every two pixels along the horizontal direction of the image; then we do the same along the vertical direction of the image obtained after horizontal sampling. After sampling in both the horizontal and vertical directions, we obtain an image on a coarser scale. If the downsampling procedure is repeated h−1 times, h−1 coarser scales are produced. Linking these scales and the original scale together generates an h-level image pyramid.
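The scale count of Eq. (11) and the pyramid construction can be sketched as follows (ours; variable names are illustrative, and cube is the μ×ν×ρ image array).

```python
import numpy as np

def num_scales(mu, nu, top=75):
    """Eq. (11): h = max(ceil(log2(max(mu, nu) / 75)), 1)."""
    return max(int(np.ceil(np.log2(max(mu, nu) / top))), 1)

def build_pyramid(cube):
    """Repeat the two-step factor-2 subsampling h - 1 times."""
    h = num_scales(cube.shape[0], cube.shape[1])
    pyramid = [cube]
    for _ in range(h - 1):
        cube = cube[:, ::2, :]      # keep every second pixel horizontally
        cube = cube[::2, :, :]      # then every second pixel vertically
        pyramid.append(cube)
    return pyramid                  # pyramid[0]: original scale, pyramid[-1]: top
```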

4A. Nyström Method

At the top of the pyramid, the normalized kernel matrix P of Eq. (10) is dense, so the Nyström method proposed in [11] is used to approximately solve for its eigenvectors V. Let matrix P be partitioned into four blocks by randomly choosing n rows and columns:

\[
P = \begin{bmatrix} A & B \\ B^{T} & C \end{bmatrix},
\]
where A is an n×n symmetric matrix, B is of size n×(ι−n), and C is an (ι−n)×(ι−n) symmetric matrix. According to the Nyström method [11], C can be approximated by B^T A^{-1} B; therefore, it is possible to approximate P as
\[
\hat{P} = \begin{bmatrix} A & B \\ B^{T} & B^{T} A^{-1} B \end{bmatrix}.
\]
The quality of the approximation of P can be estimated via the norm of C − B^T A^{-1} B. As discussed in [11], selecting about 1% of all data to form matrix A, i.e., n ≈ 0.01ι, is generally enough to reconstruct the original normalized kernel matrix P well.

Let G = A + A^{-1/2} B B^T A^{-1/2} and diagonalize it as G = U_G Λ_G U_G^T. Defining the matrix V as

\[
V = \begin{bmatrix} A \\ B^{T} \end{bmatrix} A^{-1/2} U_G \Lambda_G^{-1/2},
\]
we can conclude that P̂ is diagonalized by V and Λ_G, i.e., P̂ = V Λ_G V^T with V^T V = I.

In the procedure described above, computing G and V presents a challenge: the computational complexity of these two steps is O(ιn² + nι²), which remains fairly prohibitive when the image size exceeds 300×300.
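A sketch of this one-shot Nyström construction, culminating in Eq. (14), is given below (ours, not the code of [11]; the small jitter guarding against a nearly singular A and the sampled-points-first row ordering are our choices).

```python
import numpy as np

def nystrom_top_eigenvectors(A, B, eta, jitter=1e-12):
    """A: n x n sampled block of P; B: n x (iota-n) cross block.
    Returns the eta leading approximate eigenvectors of P (sampled rows first)."""
    # A^{-1/2} via the eigendecomposition of the symmetric block A
    a_vals, a_vecs = np.linalg.eigh(A)
    A_isqrt = a_vecs @ np.diag(1.0 / np.sqrt(np.maximum(a_vals, jitter))) @ a_vecs.T

    G = A + A_isqrt @ (B @ B.T) @ A_isqrt         # G = A + A^{-1/2} B B^T A^{-1/2}
    g_vals, U_G = np.linalg.eigh(G)               # G = U_G Lambda_G U_G^T

    # V = [A; B^T] A^{-1/2} U_G Lambda_G^{-1/2}, so that P_hat = V Lambda_G V^T
    stacked = np.vstack([A, B.T])                 # iota x n
    V = stacked @ A_isqrt @ U_G @ np.diag(1.0 / np.sqrt(np.maximum(g_vals, jitter)))
    return V[:, -eta:], g_vals[-eta:]             # eta largest approximate eigenpairs
```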

4B. Scale Extension

Given a set of eigenvectors on a coarse scale, we need to estimate their counterparts V on the next finer scale. This procedure of propagating eigenvectors from coarser to finer scales is called scale extension. For example, in Fig. 5, the pixels in the coarse level, represented by blue circles, were sampled from the fine level consisting of all circles. We have already obtained the values of the eigenvectors corresponding to the blue points in the coarse level and want to propagate these values to the unknown points. The strategy is to use the similarity between points in a neighborhood to estimate the eigenvectors. More specifically, the given value v_p at pixel p can be spread to its neighborhood according to the formula below:

\[
v(p, p_i) = v_p \times w_i \times \frac{k(p, p_i)}{k(p, p)},
\]
where k(·,·) represents the similarity between two pixels and the weight w_i is 1/2 for points in the 4-neighborhood of p and 1/4 for the remaining points of its 8-neighborhood. The value v_{p_i} at pixel p_i is then determined by the values of its neighboring points Neib(p_i) that appear in the coarse level:
\[
v_{p_i} = \sum_{I\in \mathrm{Neib}(p_i)} v(I, p_i).
\]
For example, the value v_{p_1} corresponding to pixel p_1 is
\[
v_{p_1} = \sum_{I\in\{p,r\}} v(I, p_1)
= \tfrac{1}{2}\, v_p\, \frac{k(p, p_1)}{k(p, p)} + \tfrac{1}{2}\, v_r\, \frac{k(r, p_1)}{k(r, r)},
\]
and for pixel p_5 the value is
\[
v_{p_5} = \sum_{I\in\{p,q,r,s\}} v(I, p_5)
= \tfrac{1}{4}\, v_p\, \frac{k(p, p_5)}{k(p, p)}
+ \tfrac{1}{4}\, v_r\, \frac{k(r, p_5)}{k(r, r)}
+ \tfrac{1}{4}\, v_q\, \frac{k(q, p_5)}{k(q, q)}
+ \tfrac{1}{4}\, v_s\, \frac{k(s, p_5)}{k(s, s)}.
\]

After propagation, approximate eigenvectors V^0 on the fine level are obtained. However, they are only rough estimates and need to be refined using the subspace iteration technique. Let the columns v_1^0, v_2^0, …, v_η^0 of V^0 be η starting vectors. Applying power iteration to all of these vectors simultaneously yields the following algorithm (called subspace iteration [20]):

V^0 = [v_1^0, v_2^0, …, v_η^0],
FOR i = 1, 2, …, t,
    V^i = P V^{i−1},
END,
where t denotes the number of iterations. The only requirement for the convergence of subspace iteration is that the initial estimate have a nonzero component in each direction of the target subspace. This obviously holds in our strategy, since the orthogonal eigenvectors from the coarse level have nonzero components in each direction. Moreover, because V^0 already approximates the eigenvectors of P, subspace iteration converges very quickly in our approach (usually t = 20 is enough for convergence).
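The scale extension above and the refinement of Eq. (19) can be sketched as follows (ours; the neighbor bookkeeping is schematic, sim(p, q) stands for the kernel similarity k between fine-level pixels, and the per-step QR re-orthonormalization is the standard stabilization of subspace iteration [20] rather than part of Eq. (19) itself).

```python
import numpy as np

def scale_extend(V_coarse, sampled_idx, contribs, sim, iota_fine):
    """sampled_idx[c]: fine-level index of coarse pixel c;
    contribs[i]: list of (c, w) pairs for an unsampled fine pixel i,
    with w = 1/2 for 4-neighbors and 1/4 for diagonal neighbors."""
    V0 = np.zeros((iota_fine, V_coarse.shape[1]))
    V0[sampled_idx] = V_coarse                    # known values at sampled pixels
    for i, pairs in contribs.items():
        for c, w in pairs:                        # spread v_p with weight w * k/k
            p = sampled_idx[c]
            V0[i] += w * (sim(p, i) / sim(p, p)) * V_coarse[c]
    return V0

def subspace_iteration(P, V0, t=20):
    """Refine V0 with V <- P V, re-orthonormalizing at each step."""
    V = V0
    for _ in range(t):
        V, _ = np.linalg.qr(P @ V)
    return V
```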

4C. Segmentation Algorithm

Given a spectral color image of size μ×ν×ρ, the number of segments η, and weight coefficients α and β:

1. Implement spectral color image preprocessing by smoothing, normalizing [Eqs. (2, 3)], and extending spectra [Eq. (6)], generating the fused space Δ by using α and β.

2. Compute the number of scales h using Eq. (11) and downsample the image in the spatial domain to form an image pyramid with h scales.

3. Construct and normalize the kernel matrix K_i using Eq. (10) in Δ. Here we use K_i (i = 1, …, h) to denote the similarity matrix on the ith scale, and likewise for D_i, P_i, and V_i.

4. Compute the first η largest eigenvectors Vi of matrix Pi using the Nyström method [Eq. (14)] on the top of the pyramid (i.e., i=h).

5. Extend current eigenvectors Vi to Vi1 on the (i1)th scale according to the similarity between neighboring pixels and refine Vi1 with Pi1 using subspace iteration [Eq. (19)] until eigenvectors V1 on the original scale are obtained.

6. Discretize D^{-1/2} V_1 by using the discretization method proposed in [9] to get the segmentation indicator matrix Y.

The schematic illustration of the multiscale strategy for spectral color image segmentation is shown in Fig. 6 . It is worth noting that matrices K and P, except on the top level where P is approximated by P̂, are quite sparse, since only the similarity between pixels in the 8-neighborhood needs to be worked out.
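For the sparse levels, a direct (if unoptimized) way to assemble K and P is to visit each pixel's 8-neighborhood only, as in the sketch below (ours; k is any of the kernel functions of Subsection 5A, and delta holds the fused features in row-major order over the μ×ν grid).

```python
import numpy as np
from scipy.sparse import coo_matrix, diags

def sparse_normalized_kernel(delta, mu, nu, k):
    rows, cols, vals = [], [], []
    for x in range(mu):
        for y in range(nu):
            i = x * nu + y
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    xx, yy = x + dx, y + dy
                    if 0 <= xx < mu and 0 <= yy < nu:
                        j = xx * nu + yy
                        rows.append(i)
                        cols.append(j)
                        vals.append(k(delta[i], delta[j]))
    K = coo_matrix((vals, (rows, cols)), shape=(mu * nu, mu * nu)).tocsr()
    d_inv_sqrt = 1.0 / np.sqrt(np.asarray(K.sum(axis=1)).ravel())
    D_is = diags(d_inv_sqrt)
    return D_is @ K @ D_is        # sparse P = D^{-1/2} K D^{-1/2}
```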

5. EXPERIMENTS

This section presents the results of experiments conducted to evaluate and analyze the performance of the proposed method for spectral color image segmentation. The effects of kernel functions and parameter choice are first presented, followed by an analysis of the correctness of our method and a comparison with several state-of-the-art methods. Finally, we investigate the ability of the proposed method to produce reliable segmentations on spectral color images with a variety of properties.

All the spectral color images we use are available in [13]. We employ two methods to visualize segmentation results: the first displays different segments by overlaying transparent pseudocolors on the corresponding color image (as in Figs. 7, 8), where each segment is represented by one color; the second highlights boundaries between segments by superimposing them on the color image reproduced from the input spectral color image (as in Fig. 9, below).

5A. Effect of Kernel Functions

Since the colorchecker spectral color image, with up to 19 potential segments (Fig. 2), contains abundant color information, we use it as a sample to study the effect of different kernels and parameters on segmentation performance.

In this case, four kernels—linear, polynomial, Gaussian, and ANOVA (see [12] for details)—are tested. The first three are respectively defined over the fused space Δ as follows:

\[
k(\delta_i, \delta_j) = \langle \delta_i, \delta_j \rangle,
\]
\[
k(\delta_i, \delta_j) = (\langle \delta_i, \delta_j \rangle + 1)^{d},
\]
\[
k(\delta_i, \delta_j) = \exp\!\left(-\frac{\|\delta_i - \delta_j\|^{2}}{\sigma^{2}}\right),
\]
where δ_i represents a point in the fused space Δ, and the degree d and parameter σ are positive real numbers. The ANOVA kernel of degree d is computed by the following recursion:
\[
k_0^{m}(\delta_i, \delta_j) = 1, \quad \text{if } m \ge 0,
\]
\[
k_t^{m}(\delta_i, \delta_j) = 0, \quad \text{if } m < t,
\]
\[
k_t^{m}(\delta_i, \delta_j) = (\delta_{im}\,\delta_{jm})\, k_{t-1}^{m-1}(\delta_i, \delta_j) + k_t^{m-1}(\delta_i, \delta_j),
\]
where δ_{im} denotes the mth entry of vector δ_i, 1 ≤ m ≤ ζ (the dimensionality of the input space), and 0 ≤ t ≤ d. Finally, k(δ_i, δ_j) = k_d^ζ(δ_i, δ_j). For simplicity, we set the degree d and parameter σ to 2 and 0.35, respectively, in the experiments that follow.
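For reference, the four kernels can be sketched as follows (ours; the ANOVA version is the plain dynamic program implied by the recursion above, and the Gaussian uses the 1/σ² scaling as reconstructed above).

```python
import numpy as np

def linear_k(x, y):
    return float(np.dot(x, y))

def poly_k(x, y, d=2):
    return float((np.dot(x, y) + 1.0) ** d)

def gauss_k(x, y, sigma=0.35):
    return float(np.exp(-np.sum((x - y) ** 2) / sigma ** 2))

def anova_k(x, y, d=2):
    """Evaluate k_d^zeta(x, y) via k_t^m = x_m*y_m*k_{t-1}^{m-1} + k_t^{m-1}."""
    zeta = len(x)
    table = np.zeros((d + 1, zeta + 1))   # table[t, m] = k_t^m(x, y)
    table[0, :] = 1.0                     # k_0^m = 1 for m >= 0
    for t in range(1, d + 1):             # k_t^m = 0 for m < t (already zero)
        for m in range(t, zeta + 1):
            table[t, m] = x[m - 1] * y[m - 1] * table[t - 1, m - 1] + table[t, m - 1]
    return float(table[d, zeta])
```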

The image size of colorchecker was 141×291×81, so we downsampled it once and generated a two-scale pyramid. The number of segments η was set to 5, and only the spectral magnitude was used here (i.e., parameters α=1 and β=0). The results in Figs. 7a, 7b, 7c, 7d show that all kernels performed similarly well, except for the linear kernel, for which some boundaries of the color boxes were erroneous. The likely reason is that some nonlinear features of the distribution of pixels in spectral spaces cannot be correctly discovered and represented by the linear kernel, with the result that boundaries become ambiguous. Nonlinear kernel functions, by contrast, provide a powerful and principled way of discovering nonlinear relations between color spectra.

It is well known that the Gaussian kernel is essentially a polynomial kernel of infinite degree [12]. The polynomial kernel can use only the monomials of degree up to d. The ANOVA kernel, however, allows more freedom in specifying the set of monomials and is, in some sense, an extension of the polynomial kernel. Such intrinsic consistency among the ANOVA, polynomial, and Gaussian kernels naturally results in similar performance in clustering or segmentation. Ideally, we would select a kernel function and decide on its parameters based on prior knowledge. Unfortunately, it is not always possible to make the correct choice of kernel in advance.

5B. Effect of Parameter Choice

In our algorithm, we have to input three parameters: η, α, and β. We again used the colorchecker image to study how segmentation performance varies with these three parameters.

The ideal number of segments in this image differs from observer to observer, depending on human vision. Panels (e), (f), (g), and (h) of Fig. 7 display segmentation results containing 2, 4, 10, and 12 segments, respectively, where the polynomial kernel was adopted with α=1 and β=0. It is clear that the proposed method works well in each case. We also note, however, that when the number of segments grows beyond some threshold value, the segmentation tends to become slightly poorer. The four largest eigenvectors of matrix P used to generate the segmentation in panel (f) are presented in panels (i)–(l), all of which contain obvious grouping cues.

To test the effect of the weight coefficients, we fix the kernel function (polynomial) and the number of segments (η=12) and simply vary the weight coefficients α and β. Recall that α and β act as impact factors of the three attributes (i.e., magnitude, slope, and curvature) and adjust the contribution of each attribute to the similarity measure. From the segmentation results shown in the last row of Fig. 7, we can conclude that integrating the three attributes [e.g., panels (n) and (o)] usually works better than using any one of them separately [e.g., panels (m) and (p)]. When choosing parameters α and β, one should follow the expression below (called the decision rule): α ≥ max(β, 1−α−β). This rule is motivated by the empirical consistency of results showing that the magnitude plays the leading role in spectral color image segmentation, while the slope and curvature generally serve as complementary information. In the following experiments, α and β are generally set to 0.6 and 0.3, respectively, unless a specific assignment is given.

5C. Correctness

We define correctness as the ability of the proposed method to produce segmentations that agree with the ground truth [21]. In other words, correctness is the ability to correctly identify color or structure in a spectral color image under a given illumination condition. It is measured by the confusion matrix.
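As a concrete description of this measure, a row-normalized confusion matrix can be computed as in the sketch below (ours; it assumes that the predicted segment labels have already been matched to the ground-truth categories, e.g., by majority overlap).

```python
import numpy as np

def confusion_matrix_percent(gt, pred, n_classes):
    """gt, pred: 1-D integer label arrays of equal length, labels in [0, n_classes)."""
    cm = np.zeros((n_classes, n_classes))
    for g, p in zip(gt, pred):
        cm[g, p] += 1                      # rows: ground truth, columns: prediction
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```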

Two spectral color images, named “hand” and “toy0,” were used to test the correctness of the proposed method. Their corresponding color images in the RGB space are shown in panel (a) of Fig. 8. These two images have been manually segmented in terms of color, and the ground truth is presented in panel (b). The “hand” image consists of three segments (clusters) in the ground truth: hand, black background (bbg), and white background (wbg). The “toy0” image is divided into four segments in the ground truth: dark blue (db), light blue (lb), green (g), and white (w). Since the image sizes of “hand” and “toy0” are 160×148×81 and 448×508×31, respectively, the number of scales h is 2 and 3, respectively, in our method. When our method was applied to these two images, the segmentation results, displayed in panel (c), were very good: almost all regions were correctly identified, and very few pixels were misclassified. For the “hand” spectral color image, the confusion matrix is presented in Table 2; it shows that over 96% of pixels were correctly identified in each category. For the “toy0” spectral color image, the confusion matrix is presented in Table 3; it shows that over 97% of pixels were correctly identified in each category, excluding the w category, in which only 80.27% of pixels were correctly identified.

If too many scales were used, i.e., if the image were overscaled, the segmentation performance would degrade, as expected; this is shown in panel (d), where h was 3 and 4 for “hand” and “toy0,” respectively. This demonstrates that the combination of the multiscale and Nyström methods is effective not only in speeding up the estimation of eigenvectors of large matrices but also in providing good segmentation by preventing the boundaries between homogeneous regions from becoming ambiguous during downsampling.

5D. Comparison

We first compare our approach to spectral color image segmentation with some representative segmentation techniques based on clustering or discretization. Then comparisons are made with the segmentation that is executed in the RGB space. We have tried several different parameter settings in each method and show only the best result in comparison.

The k-means segmentation approach [22] segments the image by clustering in the spectral space, wherein k=3 in the “hand” case and k=4 in the “toy0” case. The result [Fig. 8e] was severely affected by illumination.

PCA+k-means executes k-means clustering in a low-dimensional space obtained with principal component analysis (PCA). Unlike the proposed method, which is based on nonlinear kernels, PCA is capable of discovering only linear features embedded in color spectra. Consequently, it does not improve the segmentation performance much, as shown in Fig. 8f.

Assuming that an image is generated by a Gaussian mixture model (GMM), we can compute the class posterior probabilities of the GMM and obtain the image segmentation by discretizing the probabilities. The segmentation results for images “hand” and “toy0” are displayed in Fig. 8g. Likewise, this method is sensitive to illumination conditions. In the case of “toy0,” the lb background region was divided into two parts because of the shadow cast by the g region, and the w region could not be correctly detected.

To study the advantage of spectral color images over common color images in color representation, we also applied the color image segmentation approach that Cour et al. proposed in [17] (the CBS method, for short) to the corresponding color images [Fig. 8a] in the RGB space. The CBS method is based on the normalized cut graph-partitioning framework for image segmentation and handles large images through a multiscale strategy. The related code is available in [23]. The segmentation result is shown in Fig. 8h. It is evident that spectral color images provide better segmentation [Fig. 8c] because they contain more color information.

5E. Stability with Respect to Image Choice

By stability we mean the ability of the proposed method to produce reliable segmentations on spectral color images with a variety of properties, which is similar to the definition in [21]. More examples of spectral color image segmentation are provided to test this ability in this subsection.

The tested spectral color images are introduced in Table 4. They vary greatly in terms of content, image size, and spectral resolution: some have clear highlights (e.g., pentest, braltest, and toy1) or shadows (e.g., braltest, scene, and toy2), and the others either have abundant color information (e.g., toy2, toy3, and toy4) or depict natural scenery (e.g., scene) or the human body (e.g., younggirl and jussi). In these experiments, the ANOVA kernel was used, and parameters α and β were fixed at 0.6 and 0.3, respectively; η was input beforehand as in Table 4, and h was estimated automatically during segmentation. The segmentation results are shown in Fig. 9. Despite the variation among these images, the proposed method works well under all circumstances. This further strengthens our arguments: (1) the proposed approach to spectral color image segmentation is quite stable and is robust to illumination changes, from highlight to shadow, and (2) it is sensitive even to slight changes in color arising from hue or chroma.

Because explicit eigendecomposition is done only on the top level, the proposed algorithm is efficient. Moreover, it lends itself to massively parallel processing, especially at the finer (i.e., most expensive) scales. Our current implementation takes less than 40 s to complete all the computations for an 806×650×31 spectral color image using Matlab on a Pentium 4, 3.0 GHz processor. Further optimization is still possible on a parallel computer.

6. CONCLUSIONS

In this work, we put forth a spectral kernel clustering method for multiple clusters and applied it to spectral color image segmentation. We propose combining the multiscale and Nyström methods in spectral color image segmentation, which accelerates the computation of the eigenvectors of a normalized kernel matrix. We also propose using the slope and curvature of color spectra to incorporate the geometric features of spectra into similarity measures, which makes full use of the abundant color information contained in spectral color images.

We conducted experiments in which we provided an analysis regarding four kernel functions, evaluated the segmentation performance of the proposed method, and compared our method with other state-of-the-art methods. The experimental results showed that the proposed approach to spectral color image segmentation performs very well in distinguishing different colors in terms of hue and chroma.

ACKNOWLEDGMENTS

The authors thank Wenbin Chen for his useful discussions and Justus Randolph for his timely proofreading.

Table 1. Notation Used in This Paper

Table 2. Confusion Matrix (Percentage) for the “Hand” Image

Table 3. Confusion Matrix (Percentage) for the “Toy0” Image

Table 4. Tested Spectral Color Images

Fig. 1 Three-stage flowchart of spectral color image segmentation: preprocessing, multiscale strategy, and discretization.

Fig. 2 Spectra contained in a spectral color image: (a) spectral color image, colorchecker, (b) corresponding color image in the RGB space, (c) spectra corresponding to marked points in panel (b).

Fig. 3 Smoothing and normalizing of color spectra: (a) original spectra from the Munsell data set, (b) spectra after smoothing via cubic spline curve, (c) spectra after normalizing, (d) spectra after smoothing and normalizing.

Fig. 4 Slope and curvature of the spectra of the colorchecker image at a wavelength of 625 nm.

Fig. 5 Scale extension. The weight values are different for points in the 4-neighborhood and 8-neighborhood of the given point.

Fig. 6 Multiscale segmentation of spectral color images. (Left) Downsampling input spectral color images. (Right) Upper three images represent the approximate eigenvectors, and the bottom one is the segmentation result.

Fig. 7 Segmentation performance analysis. Transparent pseudocolors are laid over the color image to mark segmentation results. (a)–(d) Results using four kernels, linear, polynomial, Gaussian, and ANOVA; (e)–(h) segmentation results using the polynomial kernel when varying the number of segments; (i)–(l), top four largest eigenvectors of matrix P, which are used to generate the segmentation shown in (f), (m)–(p), results when varying weight coefficients, α and β.

Fig. 8 Method comparison: (a) corresponding RGB images; (b) ground truth; (c) segmentation using our method based on appropriate scales; (d) still our method, but using too many scales; (e) directly applying k-means to spectral data; (f) applying k-means after reducing the dimensionality of spectral data with PCA; (g) applying GMM to spectral data; (h) applying the CBS method to the color images (b) in the RGB space.

Fig. 9 Segmentation of spectral color images with a variety of properties. For the ease of visualization, we put the boundaries between different segments on the corresponding RGB images.

1. P. Paclík, R. P. W. Duin, G. M. P. van Kempen, and R. Kohlus, “Segmentation of multispectral images using the combined classifier approach,” Image Vis. Comput. 21, 473–482 (2003). [CrossRef]  

2. J. L. Crespo, R. J. Duro, and F. L. Pena, “Gaussian synapse ANNs in multi- and hyperspectral image data analysis,” IEEE Trans. Instrum. Meas. 52, 724–732 (2003). [CrossRef]  

3. S. K. Pal and P. Mitra, “Multispectral image segmentation using the rough-set-initialized EM algorithm,” IEEE Trans. Geosci. Remote Sens. 40, 2495–2501 (2002). [CrossRef]  

4. G. Mercier, S. Derrode, and M. Lennon, “Hyperspectral image segmentation with Markov chain model,” in IEEE International Proceedings of the Geoscience and Remote Sensing Symposium (IGARSS’03) (IEEE, 2003), Vol. 122, pp. 21–25.

5. A. Mohammad-Djafari, N. Bali, and A. Mohammadpour, “Hierarchical Markovian models for hyperspectral image segmentation,” in International Workshop on Intelligent Computing in Pattern Analysis/Systems, IWICPAS (Springer, 2006), pp. 416–424.

6. H. Kwon and N. M. Nasrabadi, “Kernel matched subspace detectors for hyperspectral target detection,” IEEE Trans. Pattern Anal. Mach. Intell. 28, 178–194 (2006). [CrossRef]   [PubMed]  

7. Y. Weiss, “Segmentation using eigenvectors: a unifying view,” in IEEE International Conference on Computer Vision (ICCV’99) (IEEE, 1999), pp. 975–982. [CrossRef]  

8. A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: analysis and an algorithm,” in Proceedings of the Neural Information Processing Systems, NIPS’01 (MIT, 2001), Vol. 14, pp. 849–856.

9. S. X. Yu and J. Shi, “Multiclass spectral clustering,” in IEEE International Conference on Computer Vision (ICCV’03) (IEEE, 2003), pp. 313–319. [CrossRef]

10. N. Cristianini, J. Shawe-Taylor, and J. S. Kandola, “Spectral kernel methods for clustering,” in Proceedings of the Neural Information Processing Systems, NIPS’01 (MIT, 2001), pp. 649–655.

11. C. Fowlkes, S. Belongie, F. R. K. Chung, and J. Malik, “Spectral grouping using the Nyström method,” IEEE Trans. Pattern Anal. Mach. Intell. 26, 214–225 (2004). [CrossRef]   [PubMed]  

12. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis (Cambridge U. Press, 2004). [CrossRef]  

13. University of Joensuu Color Group, “Spectral database,” http://spectral.joensuu.fi/.

14. F. R. Bach and M. I. Jordan, “Learning spectral clustering,” in Proceedings of the Neural Information Processing Systems, NIPS’03 (MIT, 2003), http://books.nips.cc/papers/files/nips16/NIPS2003_AA39.pdf.

15. D. Achlioptas, F. McSherry, and B. Schölkopf, “Sampling techniques for kernel methods,” in Proceedings of the Neural Information Processing Systems, NIPS’01, (MIT, 2001), pp. 335–342.

16. C. K. I. Williams and M. Seeger, “Using the Nyström method to speed up kernel machines,” in Proceedings of the Neural Information Processing Systems, NIPS’00 (MIT, 2000), pp. 682–688.

17. T. Cour, F. Bénézit, and J. Shi, “Spectral segmentation with multiscale graph decomposition,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005 (IEEE, 2005), pp. 1124–1131.

18. E. Sharon, M. Galun, D. Sharon, R. Basri, and A. Brandt, “Hierarchy and adaptivity in segmenting visual scenes,” Nature 442, 810–813 (2006). [CrossRef]   [PubMed]  

19. D. Tolliver, R. T. Collins, and S. Baker, “Multilevel spectral partitioning for efficient image segmentation and tracking,” in Seventh IEEE Workshop on Applications of Computer Vision (WACV/MOTION’05) (IEEE, 2005), Vol. 1, pp. 414–420. [CrossRef]

20. B. N. Parlett, The Symmetric Eigenvalue Problem (Prentice Hall, 1998). [CrossRef]  

21. R. Unnikrishnan, C. Pantofaru, and M. Hebert, “Toward objective evaluation of image segmentation algorithms,” IEEE Trans. Pattern Anal. Mach. Intell. 29, 929–944 (2007). [CrossRef]   [PubMed]  

22. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification (Wiley, 2001).

23. T. Cour, F. Benezit, and J. Shi, “Multiscale NCut Image Segmentation Toolbar,” http://www.seas.upenn.edu/~timothee/software/ncut_multiscale/ncut_multiscale.html.
