Memory colours and colour quality evaluation of conventional and solid-state lamps

Kevin A. G. Smet; Wouter R. Ryckaert; Michael R. Pointer; Geert Deconinck; Peter Hanselaer

doi:10.1364/OE.18.026229

1. Introduction

The impact of the spectrum of a light source on the colour appearance of objects has been investigated for many decades. Several metrics have been proposed to evaluate and quantify the ability of a light source to render the colours of objects. Some metrics, like the CIE colour rendering index R_a [1,2], can be considered as a colour difference index, offering an objective measure of the colour rendering properties of a light source. Others, like Judd’s flattery index [3] and Thornton’s colour preference index [4], have focussed on the more subjective aspect of lighting colour quality.

An objective measure describes the shift in colour appearance with respect to an ‘optimum’ reference illuminant (a Planckian radiator or phase of daylight) and is required for many professional applications such as colour reproduction, printing and quality control. This measure may not, however, be the best answer to the needs of lighting designers, architects and those in the shop and retail sector. End users and consumers are usually more interested in the perceived colour quality of the lighting, i.e. how appealing objects look. This situation is perfectly illustrated by Neodymium incandescent lamps, which have a CIE R_a value of approximately 77 but “which are popularly sold for twice the price of normal incandescent lamps having a perfect score (R_a ≈ 100)” [5,6]. Furthermore, over the past few years, it has become increasingly clear that the CIE colour rendering index correlates poorly with the visual appreciation of the light from many so-called ‘white’ LEDs [7–11].

Objective measures like the CIE Colour Rendering Index have in common the adoption of a reference illuminant. A Planckian radiator or a phase of daylight has been the obvious choice. The maximum value of the index is assigned to the reference illuminant, and every deviation of the colours rendered by the test source from the colours rendered by the reference illuminant is penalized. This kind of objective metric does not allow a meaningful evaluation of a light source with respect to end-user expectations. Within CIE Technical Committee TC1-69 “Colour Rendition of White Light Sources”, the evaluation of the colour quality of a light source is being investigated. It has become clear that, in addition to an objective metric, a metric that describes more subjective characteristics such as attractiveness and preference would be useful.

Evaluating the colour quality of light sources based on a comparison with a reference illuminant is difficult. Either one has to know which deviations should or should not be penalized, or one should know which illuminant can be considered perfect. In fact, knowledge of what objects ideally look like is required. Once this is known, an illuminant as a reference becomes unnecessary, because reference can then be made directly to the chromaticity of the ideal object.

Colours often seem “wrong” when they are not what we expect or want them to be. Thus, it is reasonable to state that people are, consciously or subconsciously, judging the colour appearance of objects against the colours they mentally associate with those objects, either in memory or in preference. The importance of familiar objects and the colours associated with them from memory, i.e. memory colours [12], for colour perception in the natural world was first introduced by Hering, who stated that we view the world through the spectacles of memory [13]. Memory colours and preferred colours have been studied in the past by many researchers [12,14–25] and have long been of interest in many different areas of colour research, such as colour reproduction [15,17,21,22,26] and colour rendering [3,4,25,27–30], because of their possible use as a reference to assess colour appearance, and hence the quality of a colour image or the colour quality of the lighting. Several decades have passed since Judd’s flattery index [3] and Thornton’s colour preference index [4], both based on the preference data obtained by Sanders [16] and Newhall [14].

There are several advantages to evaluating the colour quality of a light source by reference to memory colours. First and foremost, such evaluations are based directly on visual assessments of the colour appearance of real objects, which means that the correlation between the visual appreciation of the end user and the metric predictions should be high. Secondly, as such evaluations require no reference illuminant, they do not suffer from any of the associated difficulties. A potential drawback might be that memory colours could differ from culture to culture. However, whether a different set of memory colours results in a different overall colour quality assessment remains an open question and will be the subject of future research.

In this paper, a memory colour based colour quality metric is defined. The colour quality of a light source is evaluated by comparison between the colour appearance of a set of familiar objects illuminated by the test source and the memory colours of the same objects as obtained by the authors in a previous study [25]. The performance of the metric is tested and validated by visual experiments conducted in two laboratories.

2. Method

2.1. Similarity distributions of familiar real objects

For the derivation of a colour quality metric based on memory colours, the colour appearance of a set of nine familiar real objects, with colours distributed around the hue circle, was investigated in a series of visual experiments by Smet et al. [25]. The nine familiar objects were: a green apple, a banana, an orange, lavender, a Smurf®, strawberry yoghurt, a sliced cucumber, a cauliflower and Caucasian skin. The objects were presented to a group of over 30 observers under approximately one hundred different illumination colours by placing them in a specially constructed LED illumination box. Inside the box, the objects were placed in front of a self-luminous back panel of approximately 5600 K, which served as a constant adaptation white. Any clues to the colour of the illumination were masked, creating the illusion that the objects themselves changed colour.

The observers were asked to rate, on a 5-point scale, the similarity of the perceived object colour to their idea of what the object looks like. The pooled colour appearance ratings were modelled in the uniform IPT colour space [31] by a modified bivariate Gaussian distribution R, as described by Eq. (1) [32].

\begin{array}{l} R(P,T) = a_{1} + a_{2} \cdot S(P,T); \\ S(P,T) = \exp\left(-\frac{1}{2} d^{2}(P,T)\right); \\ d^{2}(P,T) = (X - X_{c})^{T} \cdot \Sigma^{-1} \cdot (X - X_{c}); \\ X = \begin{pmatrix} P \\ T \end{pmatrix}; \quad X_{c} = \begin{pmatrix} a_{3} \\ a_{4} \end{pmatrix}; \quad \Sigma^{-1} = \begin{bmatrix} a_{5} & a_{7} \\ a_{7} & a_{6} \end{bmatrix} \end{array}

Before determining the IPT chromaticities, the corresponding object tristimulus values under illuminant D65, which is the IPT white point, were calculated using the CAT02 [33] chromatic adaptation transform. The model requires seven parameters: a₁ and a₂ to scale the ratings, and a₃ to a₇ to describe the similarity distribution S(P,T). The distribution centre X_c, representing the most likely location of the memory colour, is located at (a₃, a₄), while the shape and orientation of the distribution are determined by the inverse of the covariance matrix Σ, described by the parameters a₅-a₇. The parameters a₃-a₇ that describe the similarity distributions S_i(P,T) in IPT colour space are given in Table 1. An example of a rating distribution is shown in Fig. 1a. The centres and the 1d-contours of the distributions in IPT colour space are shown in Fig. 1b.

Table 1. Similarity distribution parameters for each of the ten familiar objects. Parameters a₃-a₄ represent the distribution centre or memory colour in IPT colour space, while a₅-a₇ describe the shape, size and orientation of the similarity distributions


Fig. 1 (a) The rating distribution R(P,T) and the mean observer rating (dots) for a “green apple” in IPT colour space. (b) The memory colours and the 1d-contours of the similarity distributions in IPT space of the set of familiar objects.


The set of familiar objects investigated in the original paper by Smet et al. [25] has been extended by one more object, a sphere coated with grey Munsell N4 paint having a nearly flat spectral reflectance. This “neutral grey” colour was added, as it was thought to be important to take into account the neutrality of the light source colour.

Although the authors have found that the current set of 10 familiar objects was sufficiently accurate to predict the colour quality of three different sets of light sources from three different visual experiments, extending the set of familiar objects could further increase the metric’s robustness and accuracy.

2.2. A colour quality metric based on memory colours

The assessment of the colour quality of a light source can be based on the degree of similarity between the colour appearance of a set of familiar objects illuminated by the test source and the memory colours of the same objects.

First, for the entire set of familiar objects, the object chromaticity X_i = (P_i, T_i) under the test light source is calculated in IPT colour space [31] using the spectral reflectance of the objects and the CIE standard 10° observer. Because the white point of IPT colour space is illuminant D65, all tristimulus values under the test source should be transformed to their corresponding colours under D65 using the CAT02 chromatic adaptation transform prior to the calculation of the IPT coordinates.
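The first step can be sketched in Python as below. This is a minimal sketch, not the authors' implementation: it assumes complete adaptation (D = 1) in CAT02, tristimulus values scaled so that the white of each illuminant has Y = 1, and a 2° D65 white point as the default, which should be swapped for the D65 white of the 10° observer when 10° tristimulus values are used. The function names are hypothetical; the matrices are the published CAT02 and IPT matrices.

```python
import numpy as np

# CAT02 sharpened cone-response matrix
M_CAT02 = np.array([[0.7328, 0.4296, -0.1624],
                    [-0.7036, 1.6975, 0.0061],
                    [0.0030, 0.0136, 0.9834]])
# IPT: D65-relative XYZ -> LMS, and non-linear L'M'S' -> IPT
M_XYZ_TO_LMS = np.array([[0.4002, 0.7075, -0.0807],
                         [-0.2280, 1.1500, 0.0612],
                         [0.0000, 0.0000, 0.9184]])
M_LMS_TO_IPT = np.array([[0.4000, 0.4000, 0.2000],
                         [4.4550, -4.8510, 0.3960],
                         [0.8056, 0.3572, -1.1628]])
# D65 white, 2 degree observer, Y = 1 (assumed default; see lead-in)
XYZ_D65 = np.array([0.95047, 1.0, 1.08883])


def cat02_to_d65(xyz, xyz_white_test, xyz_white_d65=XYZ_D65):
    """Corresponding colour under D65, von Kries scaling in CAT02 space (D = 1 assumed)."""
    lms = M_CAT02 @ np.asarray(xyz, float)
    gains = (M_CAT02 @ xyz_white_d65) / (M_CAT02 @ np.asarray(xyz_white_test, float))
    return np.linalg.solve(M_CAT02, gains * lms)


def xyz_to_ipt(xyz_d65):
    """IPT coordinates (I, P, T) of a D65-relative tristimulus vector."""
    lms = M_XYZ_TO_LMS @ np.asarray(xyz_d65, float)
    lms_p = np.sign(lms) * np.abs(lms) ** 0.43  # signed power non-linearity
    return M_LMS_TO_IPT @ lms_p


def object_chromaticity(xyz, xyz_white_test):
    """X_i = (P_i, T_i) of an object viewed under the test source."""
    _, p, t = xyz_to_ipt(cat02_to_d65(xyz, xyz_white_test))
    return p, t
```

As a sanity check, adapting the white of the test source itself returns the D65 white, whose IPT coordinates are close to (1, 0, 0).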

Secondly, the function values of the corresponding similarity distributions S_i(X_i) are calculated with the object chromaticities X_i as input, resulting in a set of ten S_i values describing the degree of similarity with each object’s memory colour:

S_{i}(X_{i}) = \exp\left(-\frac{1}{2} (X_{i} - X_{c,i})^{T} \begin{bmatrix} a_{i,5} & a_{i,7} \\ a_{i,7} & a_{i,6} \end{bmatrix} (X_{i} - X_{c,i})\right), \quad X_{c,i} = \begin{pmatrix} a_{i,3} \\ a_{i,4} \end{pmatrix} \quad (i = 1, \dots, 10)

The individual values of S_i are in the range of zero to one.

Finally, the general degree of memory colour similarity S_a is obtained by taking the geometric mean of the ten individual S_i values:

S_{a} = \sqrt[n]{\prod_{i = 1}^{n} S_{i}}

A geometric mean was chosen because it is less susceptible to outliers and more suitable for values that are exponential in nature, such as the function values of the similarity distributions. The S_a value ranges from zero to one.
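The second and third steps amount to evaluating a two-dimensional Gaussian-shaped function and a geometric mean. The sketch below assumes that the Table 1 parameters of each object are supplied as a tuple (a₃, a₄, a₅, a₆, a₇); the function names and the example parameter values are hypothetical placeholders, not the values of Table 1.

```python
import math
from typing import Iterable, Tuple

# Per-object parameters (a3, a4, a5, a6, a7): a3, a4 locate the memory colour
# in the P-T plane, a5..a7 are the elements of the inverse covariance matrix.
Params = Tuple[float, float, float, float, float]


def similarity(pt: Tuple[float, float], params: Params) -> float:
    """S_i: value of the similarity distribution at X_i = (P_i, T_i)."""
    a3, a4, a5, a6, a7 = params
    dp, dt = pt[0] - a3, pt[1] - a4
    # d^2 = (X - Xc)^T [[a5, a7], [a7, a6]] (X - Xc), written out for the 2x2 case
    d2 = a5 * dp * dp + 2.0 * a7 * dp * dt + a6 * dt * dt
    return math.exp(-0.5 * d2)


def general_similarity(s_values: Iterable[float]) -> float:
    """S_a: geometric mean of the individual S_i values."""
    s = list(s_values)
    if any(v == 0.0 for v in s):
        return 0.0  # a single zero drives the geometric mean to zero
    return math.exp(sum(math.log(v) for v in s) / len(s))


# Placeholder parameters, NOT Table 1 values: centre at (0, 0), isotropic spread.
demo = (0.0, 0.0, 100.0, 100.0, 0.0)
s1 = similarity((0.1, 0.0), demo)  # d^2 = 100 * 0.1**2 = 1, so s1 = exp(-0.5)
```

The geometric mean is computed through logarithms, which is mathematically equivalent to taking the n-th root of the product of the S_i values.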

In order to keep the hue-specific information, the individual S_i values can be represented in a spider web format [34], as shown in Fig. 2 for illuminant D65 (blue) and illuminant A (red). To keep the spider web diagram clear, the S_i values of objects with similar hues, as determined in IPT colour space, have been averaged using the geometric mean: Red (S-sy); Orange (S-or, S-ha); Yellow (S-ba, S-cf); Green (S-ap, S-sc); Blue (S-sm, S-n4); Purple (S-la).
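The hue grouping can be sketched as follows, assuming the individual S_i values are available in a dictionary keyed by the object codes used above and are all strictly positive (the logarithm is undefined at zero). The dictionary layout and the function name are illustrative only.

```python
import math
from typing import Dict, List

# Hue groups of the spider web diagram, using the object codes listed in the text.
HUE_GROUPS: Dict[str, List[str]] = {
    "Red": ["sy"],
    "Orange": ["or", "ha"],
    "Yellow": ["ba", "cf"],
    "Green": ["ap", "sc"],
    "Blue": ["sm", "n4"],
    "Purple": ["la"],
}


def spider_web_values(s_by_object: Dict[str, float]) -> Dict[str, float]:
    """Geometric mean of the S_i values within each hue group (all S_i > 0)."""
    result = {}
    for hue, codes in HUE_GROUPS.items():
        logs = [math.log(s_by_object[c]) for c in codes]
        result[hue] = math.exp(sum(logs) / len(logs))
    return result
```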

Fig. 2 Example of a spider web representation of a colour quality metric based on memory colours for illuminant D65 (blue) and Illuminant A (red).


The spider web diagrams in Fig. 2 clearly indicate that neither illuminant D65 nor illuminant A is a perfect illuminant when reference is made to memory colours, as a perfect illuminant would have all S_i values equal to one. This is an important conclusion with respect to colour rendering systems that attribute optimum values to these illuminants. Such a diagram also allows an easy hue-specific comparison between different light sources. From the diagrams in Fig. 2 it is immediately clear that illuminant D65 has a much better rendering ability in the blue region, while illuminant A is better in the orange region.

3. Validation

The performance of the general degree of similarity S_a as a colour quality descriptor was assessed by calculating the Pearson and Spearman correlation coefficients between the metric S_a values and the results of validation experiments carried out at the Light and Lighting Laboratory in Ghent. The metric was also validated using the visual appreciation results from an external study by Jost-Boissard et al. [35–37] at the University of Lyon. For comparison, the results for the CIE CRI [2] and the NIST CQS_a [38] are also presented.

3.1. Light and Lighting Laboratory validation experiments

3.1.1. Visual experiments

At the Light and Lighting Laboratory, a visual experiment was conducted with a mixture of modern and classic light sources and with a set of objects spanning the entire hue circle. Six light sources, all having nearly equal correlated colour temperature (CCT = 2744 K ± 3%), were selected: a halogen lamp (H), a fluorescent lamp type F4 (F4), a Neodymium incandescent lamp (Nd), a Philips Fortimo LED module with a green filter (FG), an RGB LED lamp (RGB) and an LED cluster (LC) that was optimized to obtain a high S_a value. The set was composed of both conventional and modern sources because a general colour quality metric should not depend on the type of light source. The spectral radiances of the sources were measured on a CERAM® ceramic white tile placed at the object location, using a Bentham radiometric measuring head coupled to an Oriel Instruments 74055 MS260i spectrograph and an iDUS DV420A-BU2 CCD detector. A spectral bandwidth of 2.5 nm allowed accurate determination of CIE 10° observer tristimulus values and associated coordinates [39]. The spectra are shown in Fig. 3. The correlated colour temperature, the illuminance, the general degree of similarity S_a, the classical CIE General Colour Rendering Index R_a and the NIST Colour Quality Scale CQS_a are summarized in Table 2. The illumination level at the object location was 248 lux ± 2%.

Fig. 3 The spectral radiance of the six light sources used in the paired comparison experiment: F4 (black), FG (magenta), Nd (cyan), LC (blue), H (green) and RGB (red).


Table 2. CCT, illuminance E, metric values S_a, R_a and CQS_a of the sources used in the experiment. Ranking is also included


The quality of the six light sources was investigated in terms of five descriptors: “preference”, “fidelity”, “vividness”, “naturalness” and “attractiveness”. These descriptors were selected for several reasons. First, attractiveness and naturalness were selected because they were also used in the visual experiments by Jost-Boissard et al. [37]. In another paper by Jost-Boissard et al., these two descriptors were used interchangeably with preference and fidelity [35]. Secondly, fidelity is also a term that is thought to describe colour quality as measured by a colour difference based metric, like the CIE CRI. Thirdly, previous colour quality metrics based on memory colours, e.g. Judd’s Flattery Index and Thornton’s Colour Preference Index, attempted to estimate the colour quality of light sources in terms of preference. Finally, vividness was added as an extra term, because it was assumed to be closely related to colourfulness and saturation.

A group of 92 observers participated in the experiment; all had normal colour vision as tested with the 24-plate edition of the Ishihara test. The observers had no experience with colour experiments, and it was left to each individual to decide what was meant by each descriptor, as this might shed some light on the differences in meaning of these concepts to a lay person. It might also give insight into the nature of colour quality assessment.

The experimental setup was a double-booth paired-comparison experiment in which the observers were shown, in succession, a set of objects illuminated by two different light sources. The objects were placed inside a viewing booth painted a neutral Munsell N7 grey, with dimensions of 60 cm (width) by 45 cm (depth) by 80 cm (height). The set of objects comprised: an apple, a lemon, a Coca Cola® can, a Smurf® figurine, a jar of peas and carrots, a bottle of orange juice, a yellow bath duck, a Bounty® candy bar, a roll of Mentos®, a box of Leo® cookies, a bag of Milka® Easter eggs, a jar of red cabbage, a picture of the ocean and foliage, and an issue of a colour journal. The set of objects illuminated by light source FG is shown in Fig. 4.

Fig. 4 The set of objects illuminated by light source “FG” as presented to the observer.


Observers were given written instructions asking them to form a general picture of the colour appearance of the first set of objects illuminated by a light source. Afterwards, they were asked to rate the general appearance of the second, identical object set, illuminated by a different light source, with reference to the first set, on a 7-point scale (−3 to +3) in terms of each of the quality descriptors listed above. To allow the observers to adapt chromatically to the applied light source, they had to look at the presented object set for at least 20 seconds before they were allowed to move on to the other light source.

3.1.2. Statistical analysis

Paired comparison data are often analysed using Thurstone’s method (case V). However, such an analysis offers only a rather coarse scaling [40]. The method developed by Scheffé [41], which also makes use of the nominal ratings given by the observers, provides a more sensitive scaling. In a Scheffé paired comparison experiment, an observer compares a pair of items by taking one as the reference and rating the other on a 7- or 9-point scale. Because one item of a pair is taken as the reference, the number of possible pairs is doubled in a Scheffé setup: pair (1,2) is treated as different from pair (2,1). In an experiment with n = 6 items, the total number of pairs is therefore 2·(n(n−1)/2) = 30. Preferably, each pair should be rated by at least two different observers, which would require a minimum of 60 observers if each observer rated only one pair. Fortunately, this requirement can be relaxed by assigning multiple pairs to an observer in such a way that each observer comes across each item only once: an observer having rated pair (1,2) can still rate pairs formed from items 3, 4, 5 and 6. Each observer then rates n/2 = 3 pairs, so the minimum number of observers necessary for a Scheffé paired comparison analysis of 6 items becomes 2·(2·(n(n−1)/2)/(n/2)) = 20.
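The counting argument above can be checked with a short sketch. The round-robin (circle-method) schedule is only one possible way of assigning pairs so that each observer sees every item exactly once; the text does not state how the pairs were actually distributed, and all names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def ordered_pair_count(n: int) -> int:
    """(i, j) and (j, i) are distinct in a Scheffé design: 2 * n(n-1)/2."""
    return n * (n - 1)


def min_observers(n: int, ratings_per_pair: int = 2) -> int:
    """Each observer rates n/2 disjoint pairs, i.e. sees every item once."""
    if n % 2:
        raise ValueError("n must be even to split the items into disjoint pairs")
    return ratings_per_pair * ordered_pair_count(n) // (n // 2)


def round_robin(n: int) -> List[List[Pair]]:
    """Circle method: n - 1 rounds, each splitting the n items into n/2 pairs."""
    items = list(range(n))
    rounds = []
    for _ in range(n - 1):
        rounds.append([(items[k], items[n - 1 - k]) for k in range(n // 2)])
        items = [items[0], items[-1]] + items[1:-1]  # keep item 0 fixed, rotate the rest
    return rounds


def observer_assignments(n: int, ratings_per_pair: int = 2) -> List[List[Pair]]:
    """One list of pairs per observer; each ordered pair occurs ratings_per_pair times."""
    rounds = round_robin(n)
    reversed_rounds = [[(j, i) for (i, j) in r] for r in rounds]
    return (rounds + reversed_rounds) * ratings_per_pair
```

For n = 6, the five rounds and their reversed versions give ten observer sets covering all 30 ordered pairs once; repeating them gives the 20 observers needed for two ratings per pair.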

The Scheffé scale values α_i can be obtained from the paired comparison data as follows. Let x_ijk be the score given to the pair comparison (i,j) by the kth observer. A probability matrix P_S can then be constructed with elements

p_{S,ij} = \frac{1}{2m} \left( \sum_{k=1}^{m} x_{ijk} - \sum_{k=1}^{m} x_{jik} \right) \quad (i, j = 1, \dots, n;\ k = 1, \dots, m)

where m is the number of observers. The Scheffé scale values α_i are then obtained as the row averages of the probability matrix P_S:

\alpha_{i} = \frac{1}{n} \sum_{j=1}^{n} p_{S,ij} \quad (i = 1, \dots, n)
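The two equations above translate directly into array operations. In this sketch the scores are held in a hypothetical array x[i, j, k], with zeros for pairs that observer k did not rate, and a positive score is taken to mean that item i was judged better than item j; that sign convention is an assumption, since it is not spelled out here.

```python
import numpy as np


def scheffe_scale_values(x: np.ndarray) -> np.ndarray:
    """Scheffé scale values alpha_i from a score array x[i, j, k] = x_ijk.

    x_ijk is the score of observer k for the ordered pair (i, j); pairs an
    observer did not rate are stored as 0. Assumed sign convention: a
    positive score means item i was judged better than item j.
    """
    m = x.shape[2]                       # number of observers
    totals = x.sum(axis=2)               # sum_k x_ijk for every ordered pair
    p_s = (totals - totals.T) / (2 * m)  # p_S,ij, antisymmetric: p_S,ji = -p_S,ij
    return p_s.mean(axis=1)              # alpha_i = (1/n) * sum_j p_S,ij
```

Because P_S is antisymmetric, the resulting α_i always sum to zero, which is consistent with their interpretation as an interval scale.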

The Scheffé interval scale values of the six light sources, and their visual ranking, are summarized in Table 3 for all the quality descriptors. However, Scheffé scale values α_i can only be assumed to be valid when the hypothesis of subtractivity holds. This hypothesis states that there exist parameters α₁, α₂, …, α_n characterizing the n items such that the average preference π_ij for i over j is the difference of the corresponding parameters, i.e. π_ij = α_i − α_j [41].

Table 3. Scheffé scalings and rankings of the six light sources for the various quality descriptors. The average root-mean-square-differences (RMSD) between the individual observer ratings and the mean observer ratings are also given


The hypothesis was statistically tested for each quality descriptor using an F-test as described by Scheffé [41]. It was found to hold for all quality descriptors except naturalness, which makes the usefulness and interpretation of the α_i values for naturalness unclear [41]. For each quality descriptor, inter-observer variability was assessed by calculating the average of the root-mean-square differences between each observer’s scores and the mean of all observers. The average root-mean-square differences (RMSD) were then rescaled from a 7-point scale to a 0-100 scale and are shown in Table 3. The values were found to be between 15% and 20% of the total rating scale, indicating reasonable to good agreement between the different observers.
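A sketch of this inter-observer variability measure follows, under two assumptions not fixed by the text: the ratings for one descriptor are arranged as an observers × pairs array with NaN for pairs an observer did not see, and the rescaling to 0-100 divides by the width of the −3 to +3 scale. The function name is hypothetical.

```python
import numpy as np


def average_rmsd(scores: np.ndarray, scale_min: float = -3.0, scale_max: float = 3.0) -> float:
    """Mean RMS difference between each observer and the mean observer.

    scores: (observers, pairs) array for one descriptor, NaN where an observer
    did not rate a pair. The result is expressed on a 0-100 scale by dividing
    by the width of the rating scale (assumed rescaling).
    """
    mean_observer = np.nanmean(scores, axis=0)   # mean rating of each pair
    sq_diff = (scores - mean_observer) ** 2
    rmsd = np.sqrt(np.nanmean(sq_diff, axis=1))  # one RMSD value per observer
    return float(100.0 * rmsd.mean() / (scale_max - scale_min))
```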

When comparing the different quality descriptors, it is clear from Table 3 that preference and attractiveness result in the same light source rank order and very similar Scheffé scalings, suggesting that observers prefer what they find most attractive. The Scheffé scalings for fidelity were found to be more similar to those for naturalness, even though they did not result in the same rank order. In fact, the rank order for fidelity was the same as that for preference and attractiveness (despite their slightly different Scheffé scalings). This discrepancy between fidelity and naturalness was surprising, because at the outset of the experiment they were assumed to be synonymous. When questioned, observers reported that they interpreted the term fidelity as the realness of the object colours, judged on what they remembered the objects to look like. The different rank order for naturalness might be explained by the possibly unreliable nature of the Scheffé scalings for naturalness, given the violation of the hypothesis of subtractivity. Nevertheless, the results suggest that the terms fidelity and naturalness were not identical in the observers’ minds, because the observers apparently had no difficulty in judging fidelity, while they did in the case of naturalness. As for vividness, it is clear from the results that it was not judged as expected, i.e. as saturation or chroma. Although the RGB LED source had the highest chroma-enhancing effect on the objects, many observers did not select it as the most vivid source because of the large hue shift (towards the red) of almost all of the objects. Other, less chroma-enhancing sources, like the Nd, FG and LC sources, which introduced a much subtler hue shift, were found to give the objects a more vivid appearance.
The object appearance under these sources was also preferred more and found more attractive and more real-looking, suggesting that enhancing the chroma of an object has only a limited appeal and that the associated hue shifts should be carefully considered.

From the rankings in Table 3 it is also clear that the halogen light source is not considered to be the optimum light source. Depending on the quality descriptor considered, one to three sources scored higher than the halogen light source. The Neodymium incandescent source was always among them, thereby confirming the statements by Davis [5] and Ohno [6] that a Neodymium incandescent source is appreciated more than a normal incandescent source.

The interdependence of the various colour quality descriptors was also investigated by a maximum likelihood common factor analysis with orthogonal (varimax) rotation [42]. Loading the five descriptors onto two factors explained 95.3% of the variance. The results of the factor analysis are shown in Fig. 5. The appropriateness of the factor analysis was assessed with the Kaiser-Meyer-Olkin measure (KMO) and Bartlett’s test of sphericity [43]. The KMO value was 0.783, indicating that the degree of common variance among the five descriptors was “middling” bordering on “meritorious” and that factor analysis was appropriate for the given data set. Bartlett’s test further supported the appropriateness of factor analysis (H₀: the original correlation matrix is an identity matrix; χ² = 205.25, df = 10, p < 0.001).
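For reference, the overall KMO measure and Bartlett’s statistic can be computed from the correlation matrix of the descriptor ratings as sketched below. The data layout (one row per observation, one column per descriptor) is an assumption, because the text does not specify which observations entered the factor analysis. The p-value would then follow from the χ² distribution with df degrees of freedom, for instance via scipy.stats.chi2.sf.

```python
import numpy as np


def kmo(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure; data has one column per descriptor."""
    r = np.corrcoef(data, rowvar=False)
    r_inv = np.linalg.inv(r)
    d = np.sqrt(np.diag(r_inv))
    partial = -r_inv / np.outer(d, d)      # partial correlations from the inverse
    off = ~np.eye(r.shape[0], dtype=bool)  # use off-diagonal elements only
    r2 = np.sum(r[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + q2))


def bartlett_sphericity(data: np.ndarray):
    """Bartlett's chi-square statistic and degrees of freedom (H0: R = identity)."""
    n_obs, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(r))
    return float(chi2), p * (p - 1) // 2
```

With five descriptors, the degrees of freedom are 5·4/2 = 10, as reported above.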

Fig. 5 Factor analysis of the five colour quality descriptors investigated.


The factor loadings of the five descriptors onto the two factors were: preference (0.80,0.57); fidelity (0.92,0.25); vividness (0.15,0.77); naturalness (0.96, 0.18) and attractiveness (0.72,0.67). Based on the factor loadings three groups could be distinguished (Fig. 5): vividness, attractiveness/preference and fidelity/naturalness.

The results of the factor analysis confirmed the similarity between attractiveness and preference. They also suggested that fidelity was indeed more similar to naturalness than to any of the other descriptors, despite the differences in ranking. Nevertheless, despite this suggested similarity, the two descriptors were not completely identical in the observers’ minds, as indicated by the difference in difficulty of rating colour quality in terms of fidelity and naturalness (cf. the violation of the hypothesis of subtractivity). It should also be noted that the unreliable nature of the scaled values for naturalness might have had an influence. The closeness of fidelity/naturalness and vividness to the first and second factors, respectively, also suggests that they might approximately be regarded as two basis descriptors spanning a two-dimensional colour quality space. Preference/attractiveness could therefore be considered as a combination of fidelity/naturalness and vividness.

3.1.3. Correlation with the general degree of memory colour similarity S_a

In Fig. 6, the S_a values have been plotted together with the Scheffé scale values for attractiveness to illustrate the correspondence between the degree of memory colour similarity and the visual appreciation. To make a visual comparison easier, the Scheffé scale values were linearly rescaled from their 0-100 range to the range occupied by the metric values. For comparison, the correspondence with the classical CRI R_a and the NIST CQS_a is also shown. The S_a values clearly corresponded better with the visual results than the R_a values. The high ranking of the Nd lamp was also correctly predicted by S_a.
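The linear rescaling used for the plots is a simple affine mapping. The sketch assumes that the end points 0 and 100 of the scale range are mapped onto the minimum and maximum of the metric values; the function name is hypothetical.

```python
from typing import List, Sequence


def rescale_to_metric_range(scale_values: Sequence[float], metric_values: Sequence[float],
                            src_min: float = 0.0, src_max: float = 100.0) -> List[float]:
    """Linearly map the interval [src_min, src_max] onto [min(metric), max(metric)]."""
    lo, hi = min(metric_values), max(metric_values)
    factor = (hi - lo) / (src_max - src_min)
    return [lo + (v - src_min) * factor for v in scale_values]
```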

Fig. 6 Correspondence of the metric predictions and the visual appreciation of the attractiveness of the lighting quality of the six light sources. The Scheffé scale values have been linearly rescaled to the range occupied by the metric values.


The performance, in terms of Pearson correlation, of the S_a, R_a and CQS_a metrics for the various quality descriptors is summarized in Table 4. From Table 4 it is clear that the S_a values correlated very well with the visual appreciation in terms of preference and attractiveness. The correlation of the S_a values with fidelity and naturalness was lower, but could still be considered satisfactory. The correlation for vividness was rather low. For all quality descriptors, the S_a values were found to correlate as well as or better than the R_a and CQS_a values with the results obtained from the psychophysical experiment. This is not surprising, considering that the CIE CRI is in fact an objective metric based on colour difference. The NIST CQS_a, a colour difference metric that does not penalize chroma enhancement, showed an increased correlation, relative to the CIE CRI, for vividness, preference and attractiveness. No differences were found between the NIST CQS_a and the CIE CRI for fidelity and naturalness.

Table 4. Pearson correlation coefficients between the metric values and the various quality descriptors


The Spearman correlation coefficients are summarized in Table 5. Spearman correlation assesses the ability of a metric to rank the light sources in the correct order. A metric with a high Pearson correlation but without the ability to rank the light sources adequately (low Spearman correlation) is of no practical use. The high Spearman correlation coefficients of the S_a values with the Scheffé scale values for preference, attractiveness and fidelity suggest that a metric based on memory colours is very capable of predicting the correct rank order of the light sources. The correlation for naturalness was rather low, but this might be because the Scheffé scale values failed to characterize the observer ratings for naturalness adequately (cf. the violation of the hypothesis of subtractivity). On the other hand, the low Spearman correlation of the CIE CRI (and NIST CQS_a) indicates only a limited ability to rank the light sources correctly despite a moderate Pearson correlation, confirming the reported failure of the CIE CRI to correlate well with the visual appreciation of (some) narrow-band light sources [7–11].
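Both coefficients can be sketched with NumPy: Pearson’s r is taken from np.corrcoef, and Spearman’s ρ is Pearson’s r computed on ranks, with tied values given their average rank (the usual convention, assumed here). The function names are illustrative.

```python
import numpy as np


def pearson(x, y) -> float:
    """Pearson product-moment correlation coefficient."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def average_ranks(values) -> np.ndarray:
    """Ranks 1..n, with tied values sharing their average rank."""
    v = np.asarray(values, float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="mergesort")] = np.arange(1, len(v) + 1)
    for u in np.unique(v):
        tied = v == u
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A monotonic but non-linear relation between metric and visual scale gives a Spearman coefficient of one even when the Pearson coefficient is below one, which is why the two are reported separately.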

Table 5. Spearman correlation coefficients between the metric values and the various quality descriptors


The method of Meng, Rosenthal and Rubin [44] for comparing correlated correlation coefficients (correlated because the same Scheffé values were used with each metric) was used to test the significance of the previous findings. The p-values for the null hypothesis that the colour quality metric based on memory colours performs as well as the CIE CRI or NIST CQS_a are summarized in Table 6. For preference, fidelity and attractiveness, the metric was found to be significantly better (p < 0.1) than the CIE CRI and NIST CQS_a at predicting the visual rank order of the light sources. No differences were found between the CIE CRI and the NIST CQS_a.
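The test of Meng, Rosenthal and Rubin compares two correlations r₁ and r₂ that share one variable (here the visual scale values), given the correlation r_x between the two predictors (here two metrics) and the number of cases N. A minimal sketch of the standard formula is shown below; the function name is hypothetical, and the inputs must satisfy |r| < 1 and r_x < 1.

```python
import math


def meng_rosenthal_rubin(r1: float, r2: float, rx: float, n: int):
    """Z statistic and two-tailed p-value for H0: r1 == r2.

    r1, r2: correlations of two predictors with the same criterion,
    rx: correlation between the two predictors, n: number of cases.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z-transforms
    rbar2 = (r1 ** 2 + r2 ** 2) / 2.0        # mean squared correlation
    f = min(1.0, (1.0 - rx) / (2.0 * (1.0 - rbar2)))
    h = (1.0 - f * rbar2) / (1.0 - rbar2)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - rx) * h))
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p
```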

Table 6. Significance of the difference in predictive performance of the S_a, R_a and CQS_a values using Meng, Rosenthal and Rubin’s method for comparing correlated correlation coefficients. Values shown are the two-tailed p-values for the null hypothesis that the two compared correlation values are equal


3.2. Visual experiments by Jost-Boissard et al

3.2.1. Experiments

To confirm the results of the validation experiments held at the Light and Lighting Laboratory in Ghent, the performance of the S_a metric was investigated using the visual appreciation results obtained in an independent study by Jost-Boissard et al. [35,36].

In a first series of experiments [35], nine light sources with a correlated colour temperature of approximately 3000 K were compared: a halogen source, a fluorescent source and 7 different LED clusters. The quality of the lighting was assessed based on the colour appearance of a set of objects which consisted of a tomato, a red apple, an orange, a banana, a lemon, an endive, a leek, a green apple and a courgette. All objects were displayed simultaneously and the level of illumination at the object location was 230 lux ± 3%. A group of 45 observers (21 female, 24 male) participated in the first experiment. None of the observers showed any signs of colour deficiency as tested by the Farnsworth D15 test. The observers were shown all possible light source combinations in a random order using a double viewing booth setup. For each lighting combination observers had to assess which light source had the best lighting quality (a forced choice paired comparison method).

Jost-Boissard et al. used a Thurstone analysis (case V) [40] to establish an interval scale for the attractiveness of the lighting of each of the nine sources.

In a Thurstone analysis, the paired comparison data are summarized in a probability matrix P(i>j), the matrix elements being the proportion of times that item i was selected above item j. Thurstone case V scalings are obtained by transforming the P matrix of probabilities to a Z matrix of unit normal deviates or z-scores:

P_{T}(i > j) = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{pmatrix} \to Z = \begin{pmatrix} z_{11} & z_{12} & \cdots & z_{1n} \\ z_{21} & z_{22} & \cdots & z_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{nn} \end{pmatrix}

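Finally, the z-scores in each row are combined over the other items in the experiment to give the scale value s_i of item i (summing, as below, rather than averaging only rescales this interval scale by a constant factor):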

s_{i} = \sum_{j = 1, j \neq i}^{n} z_{ij}, \quad i = 1, \dots, n
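The case V scaling above can be sketched with the Python standard library, using `statistics.NormalDist.inv_cdf` as Φ^{-1}. This is a minimal sketch under stated assumptions: the input is a hypothetical matrix of win counts, every pair is assumed to have been compared at least once, and proportions of exactly 0 or 1 (which would give infinite z-scores) are clipped to a hypothetical `clip` value. The clipping rule is not part of the analysis described by Jost-Boissard et al.

```python
from statistics import NormalDist


def thurstone_case_v(wins, clip=0.01):
    """Thurstone case V scale values s_i from a paired-comparison count matrix.

    wins[i][j] is the number of times item i was chosen over item j; every pair
    is assumed to have been compared at least once. Proportions of exactly 0 or 1
    are clipped to [clip, 1 - clip] (an assumption) to keep z-scores finite.
    """
    n = len(wins)
    unit_normal = NormalDist(0.0, 1.0)
    scores = []
    for i in range(n):
        s_i = 0.0
        for j in range(n):
            if j == i:
                continue  # the sum runs over j != i
            p_ij = wins[i][j] / (wins[i][j] + wins[j][i])  # proportion i > j
            p_ij = min(max(p_ij, clip), 1.0 - clip)
            s_i += unit_normal.inv_cdf(p_ij)  # z_ij = Phi^-1(p_ij)
        scores.append(s_i)
    return scores


# Hypothetical counts for three sources judged by 45 observers.
wins = [[0, 30, 40],
        [15, 0, 25],
        [5, 20, 0]]
print(thurstone_case_v(wins))
```

Because z_ji = −z_ij, the scale values sum to zero; only their differences carry meaning, which is why the results can later be rescaled linearly without loss of information.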

In a second series, similar experiments were performed for a set of eight 4000 K light sources: a fluorescent source and seven LED clusters similar to those used in the 3000 K experiments, but set to the higher correlated colour temperature of 4000 K. The illuminance at the object location was 210 lux ± 3%. A group of 36 observers (17 female, 19 male) with normal colour vision participated in the experiment. Thurstone analysis (case V) [40] was again used to establish an interval scale for the attractiveness of the lighting of each of the eight sources.

3.2.2. Correlation with the general degree of memory colour similarity S_a

The performance of the general degree of similarity S_a as a colour quality descriptor was again assessed by calculating the Pearson and Spearman correlation coefficients between the S_a values and the Thurstone scale values. All results are shown in Table 7. Table 7 shows that, although a colour quality evaluation based on memory colours performs better than the CIE CRI and the NIST CQS, the overall correlation of the S_a values with the visual results is only moderate and considerably lower than that obtained in the Light and Lighting Laboratory experiments. This could be explained by the absence of cyan, blue and magenta objects in the visual experiments by Jost-Boissard et al., whereas this region of the hue circle is taken into account in the calculation of the S_a values (and of the R_a and CQS_a values). To investigate the impact of this effect, the S_a, R_a and CQS_a values were recalculated using only the test samples/objects in the green-yellow-red region of the hue circle: S_a values were recalculated with the blue and purple samples (Smurf® and lavender) omitted; R_a values were recalculated using special colour rendering indices 1–4, 9–11 and 13–14; and CQS_a values were calculated using special indices 7–15. The results for the modified metrics are shown in Table 8 and Fig. 7. For comparison, the Thurstone scale values, linearly rescaled from their 0–100 range to the range occupied by the metric values, are also plotted.

Compared with the results obtained using all samples and objects, a substantial increase in correlation was indeed observed for the S_a values, while the R_a and CQS_a values showed only a very slight increase. The results illustrated in Fig. 7 confirm the previous finding that the S_a values correlate very well with the visual appreciation of light sources.
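The correlation analysis and the linear rescaling used for Fig. 7 can be sketched with the Python standard library alone. This is a minimal sketch: tied values receive average ranks (a common convention, assumed here), and `rescale` maps the smallest and largest Thurstone values onto the smallest and largest metric values, which is one reading of "linearly rescaled to the range occupied by the metric values". Because the visual scale values are not listed in this section, the example correlates two metric columns from the six-source table (F4 to RGB), purely to exercise the functions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rescale(values, new_min, new_max):
    """Linearly map values so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


# S_a and R_a for F4, FG, Nd, LC, H and RGB, copied from the light-source table.
s_a = [0.6672, 0.7787, 0.7841, 0.7899, 0.7662, 0.6548]
r_a = [52.8, 80.6, 73.7, 81.0, 99.6, 31.9]
print(round(spearman(s_a, r_a), 3))  # 0.6: the two metrics rank the sources differently
```

In the analysis described above, `x` would hold the metric values of the sources and `y` their Thurstone scale values.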

Table 7. Pearson and Spearman correlation coefficients of S_a, R_a and CQS_a values with the visual appreciation results obtained by Jost-Boissard et al.


Table 8. Pearson and Spearman correlation coefficients of the recalculated S_a, R_a and CQS_a values (using only the samples/objects in the green-yellow-red region of the hue circle) with the visual appreciation results obtained by Jost-Boissard et al.


Fig. 7. Correspondence between the metric predictions and the attractiveness of the lighting for the 3000 K and 4000 K sources. The Thurstone scale values have been linearly rescaled to the range occupied by the metric values.


The significance of the differences in correlation between the S_a, R_a and CQS_a values was checked using the method of Meng, Rosenthal and Rubin [44]. The results (p-values for H₀: “no difference”) are shown in Table 9. The p-values for the Pearson and Spearman correlations confirm that a colour quality metric based on memory colours performs statistically significantly (p < 0.05) better than the CIE CRI and the NIST CQS for both the 3000 K and the 4000 K light sources. The NIST CQS_a was found to perform significantly (p < 0.1) better than the CIE CRI in all cases except for the ranking of the 4000 K light sources. The p-values obtained are much smaller than those found for the validation experiments held at the Light and Lighting Laboratory. This is partly due to the number of light sources used (nine or eight versus six): a correlation is more significant when it is based on a larger number of samples. Another possible reason is that the Jost-Boissard data set consisted almost exclusively of LEDs, for which the S_a and R_a values resulted in radically different rank orders, whereas the addition of classic sources in the experiments conducted by the authors increased the correlation for the CIE CRI, making a statistically significant difference more difficult to establish. The choice of a mixture of light source technologies was, however, intentional, because it made it possible to verify whether the S_a metric is also able to rank a mixture of classic and modern light sources. A good correlation for one type of light source does not necessarily imply a good correlation for another type. A clear example is the CIE CRI, which is assumed to work well for classic sources but is known to fail for some LED sources. Another example of a metric that was found to work well for only one type of light source is a colour rendering index based on the feeling of contrast [45].
This metric essentially compares the gamut area of four test samples under the test source with that under D65, which makes it highly biased towards chroma-enhancing light sources such as those used in the experiments by Jost-Boissard et al. (max. Spearman ρ = 0.92). However, for a mixture of conventional and modern light sources (cf. the Light and Lighting Laboratory experiments), the metric was found to correlate poorly with the visual results (max. Spearman ρ = 0.09).
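The significance test of Meng, Rosenthal and Rubin [44] used for Tables 6 and 9 can be sketched as follows. This is a minimal Python rendering of their Z statistic as we understand it: the two correlations are Fisher z-transformed, and the difference is corrected for the correlation r12 between the two competing metrics. The function name and the example correlations are hypothetical (they are not the values behind Table 9), and whether the authors applied exactly this form to the Spearman coefficients is an assumption. The loop only illustrates the point made above that the same difference in correlation becomes more significant as the number of light sources grows.

```python
from math import atanh, sqrt
from statistics import NormalDist


def meng_rosenthal_rubin(r1, r2, r12, n):
    """Compare two correlated correlations that share the variable y.

    r1, r2 : correlations of y (e.g. visual scale values) with metrics x1, x2
    r12    : correlation between the two metrics x1 and x2
    n      : number of observations (here, light sources)
    Returns (Z, two-tailed p) for H0: the two population correlations are equal.
    Requires |r1| < 1, |r2| < 1 and r12 < 1.
    """
    mean_r_sq = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r12) / (2 * (1 - mean_r_sq)), 1.0)  # f is capped at 1
    h = (1 - f * mean_r_sq) / (1 - mean_r_sq)
    z = (atanh(r1) - atanh(r2)) * sqrt((n - 3) / (2 * (1 - r12) * h))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
    return z, p


# Hypothetical correlations: identical r values, increasing number of sources.
for n in (6, 8, 9):
    z, p = meng_rosenthal_rubin(0.9, 0.5, 0.4, n)
    print(n, round(z, 2), round(p, 3))
```

Because the test relies on a normal approximation of Fisher-transformed correlations, p-values based on only six to nine light sources are best read as indicative.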

Table 9. Significance of the difference in predictive performance of the S_a, R_a and CQS_a values using Meng, Rosenthal and Rubin's method for comparing correlated correlation coefficients. Values shown are the two-tailed p-values for the null hypothesis that the two compared correlation values are equal.


4. Conclusions

In this paper, a metric based on memory colours for evaluating the colour quality of a light source was described. The metric evaluates colour quality by comparing the colour appearance of a set of ten familiar objects illuminated by the test source with the memory colours of the same objects, as obtained by the authors in a previous study. The performance of the metric was assessed by calculating the Pearson and Spearman correlation coefficients between the metric predictions and the visual appreciation of the colour quality of the lighting in three psychophysical experiments. Two of the three experiments were performed independently of the experiment conducted by the authors.

In the series of visual experiments performed by the authors, observers were asked to rate the colour quality of a light source in terms of preference, fidelity, vividness, naturalness and attractiveness. The metric was found to correlate very well with preference, attractiveness and fidelity, and moderately well with vividness and naturalness. Surprisingly, the observers’ notion of naturalness was not identical to fidelity. Indeed, naturalness appears to be rather difficult to assess, as indicated by its rather low statistical reliability.

Factor analysis revealed that the five descriptors could be summarized in three groups: vividness, preference/attractiveness and fidelity/naturalness. The factor loadings of the preference/attractiveness group suggested that these descriptors were a combination of the vividness and the fidelity/naturalness descriptors.

A high correlation between the metric values and the visual appreciation of the colour quality of the sets of light sources investigated in two other psychophysical experiments confirmed the previous results.

Comparison of the visual appreciation results of the two independent studies with the metric values revealed the importance of the choice of object set: the visual experiment and the metric should evaluate the same regions of the hue circle.

In both the visual appreciation experiments and the memory colour experiments by Smet et al. [25], observers had to rely on their memory to judge the colour appearance of the presented object(s). The high correlation between the memory colour based colour quality metric and the results of the visual appreciation experiments is therefore a strong indication that memory colours do indeed play a vital role in the way people judge the colour appearance of the world around them, supporting Hering’s idea of the “spectacles of memory” [13].

In summary, evaluating the colour quality of a light source by reference to memory colours was found to correlate well with the visual appreciation of the lighting quality for all three sets of light sources, and was significantly (p < 0.1) better than the CIE CRI and the NIST CQS_a at correctly ranking the light sources. The strong and consistent ability of the metric to predict attractiveness and preference suggests that it could be a worthwhile complement to the current CIE CRI (and NIST CQS_a) for evaluating the colour quality of a light source.

Acknowledgements

The authors would like to thank Sophie Jost-Boissard for kindly sharing the results of her visual experiments in order to help validate a memory colour based colour quality metric.

References and links

1. CIE, “Method of Measuring and Specifying Colour Rendering Properties of Light Sources,” CIE 13.2–1974 (CIE, Vienna, Austria, 1974).

2. CIE, “Method of Measuring and Specifying Colour Rendering Properties of Light Sources,” CIE 13.3–1995 (CIE, Vienna, Austria, 1995).

3. D. B. Judd, “A flattery index for artificial illuminants,” Illum. Eng. 62, 593–598 (1967).

4. W. A. Thornton, “A validation of the color preference index,” Illum. Eng. 62, 191–194 (1972).

5. W. Davis and Y. Ohno, “Approaches to color rendering measurement,” J. Mod. Opt. 56(13), 1412–1419 (2009).

6. Y. Ohno and W. Davis, “Color Quality and Spectra,” in Photonics Spectra (2008).

7. P. Bodrogi, P. Csuti, P. Hotváth, and J. Schanda, “Why does the CIE Colour Rendering Index fail for White RGB LED Light Sources?” in CIE Expert Symposium on LED Light Sources: Physical Measurement and Visual and Photobiological Assessment (Tokyo, Japan, 2004).

8. W. Davis and Y. Ohno, “Toward an Improved Color Rendering Metric,” in SPIE 2005 (2005), pp. 59411G–59418.

9. F. Szabó, J. Schanda, P. Bodrogi, and E. Radkov, “A Comparative Study of New Solid State Light Sources,” in CIE Session 2007 (2007).

10. N. Narendran and L. Deng, “Color Rendering Properties of LED Light Sources,” in Solid State Lighting II: Proceedings of SPIE (2002).

11. T. Tarczali, P. Bodrogi, and J. Schanda, “Colour Rendering Properties of LED Sources,” in CIE 2nd LED Measurement Symposium (Gaithersburg, 2001).

12. C. J. Bartleson, “Memory colors of familiar objects,” J. Opt. Soc. Am. 50(1), 73–77 (1960).

13. M. D. Fairchild, Color Appearance Models (John Wiley & Sons, Chichester, 2005), p. 158.

14. S. M. Newhall, R. W. Burnham, and J. R. Clark, “Comparison of Successive with Simultaneous Color Matching,” J. Opt. Soc. Am. 47(1), 43–54 (1957).

15. C. J. Bartleson, “Color in memory in relation to photographic reproduction,” Photogr. Sci. Eng. 5, 327–331 (1961).

16. C. L. Sanders, “Colour preferences for natural objects,” Illum. Eng. 54, 452–456 (1959).

17. P. Siple and R. M. Springer, “Memory and preference for the colors of objects,” Percept. Psychophys. 34(4), 363–370 (1983).

18. J. Pérez-Carpinell, M. D. de Fez, R. Baldoví, and J. C. Soriano, “Familiar objects and memory color,” Color Res. Appl. 23(6), 416–427 (1998).

19. S. N. Yendrikhovskij, F. J. J. Blommaert, and H. de Ridder, “Representation of memory prototype for an object color,” Color Res. Appl. 24(6), 393–410 (1999).

20. P. Bodrogi and T. Tarczali, “Corresponding colours: the effect of colour memory,” in Colour Image Science CIS'2000 Conference (Derby, UK, 2000).

21. P. Bodrogi and T. Tarczali, “Colour memory for various sky, skin, and plant colours: Effect of the image context,” Color Res. Appl. 26(4), 278–289 (2001).

22. P. Bodrogi and T. Tarczali, “Investigation of Colour Memory,” in Colour Image Science: Exploiting Digital Media, L. W. MacDonald and M. R. Luo, eds. (John Wiley & Sons Limited, Chichester, 2002), pp. 23–48.

23. A. C. Hurlbert and Y. Ling, “If it's a banana, it must be yellow: The role of memory colors in color constancy,” J. Vis. 5(8), 787 (2005).

24. M. Olkkonen, T. Hansen, and K. R. Gegenfurtner, “Color appearance of familiar objects: effects of object shape, texture, and illumination changes,” J. Vis. 8(5), 13–16 (2008).

25. K. Smet, W. R. Ryckaert, M. R. Pointer, G. Deconinck, and P. Hanselaer, “Colour Appearance Rating of Familiar Real Objects,” Color Res. Appl., doi:10.1002/col.20620 (2010).

26. S. N. Yendrikhovskij, F. J. J. Blommaert, and H. de Ridder, “Color reproduction and the naturalness constraint,” Color Res. Appl. 24(1), 52–67 (1999).

27. C. L. Sanders, “Assessment of color rendition under an illuminant using color tolerances for natural objects,” Illum. Eng. 54, 640–646 (1959).

28. K. Smet, S. Jost-Boissard, W. R. Ryckaert, G. Deconinck, and P. Hanselaer, “Validation of a colour rendering index based on memory colours,” in CIE Lighting Quality & Energy Efficiency (CIE, Vienna, 2010), pp. 136–142.

29. K. Smet, W. R. Ryckaert, G. Deconinck, and P. Hanselaer, “A colour rendering metric based on memory colours (MCRI),” in CREATE 2010 (CREATE, Gjövik, Norway, 2010), pp. 354–356.

30. K. Smet, W. R. Ryckaert, S. Forment, G. Deconinck, and P. Hanselaer, “Colour rendering: an object based approach,” in CIE Light and Lighting Conference with Special Emphasis on LEDs and Solid State Lighting (Budapest, Hungary, 2009).

31. F. Ebner and M. D. Fairchild, “Development and testing of a color space (IPT) with improved hue uniformity,” in IS&T 6th Color Imaging Conference (Scottsdale, Arizona, USA, 1998), pp. 8–13.

32. NIST/SEMATECH, “e-Handbook of Statistical Methods, The Multivariate Normal Distribution,” http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc542.htm (2010).

33. N. Moroney, M. D. Fairchild, R. W. G. Hunt, C. Li, M. R. Luo, and T. Newman, “The CIECAM02 color appearance model,” in IS&T/SID Tenth Color Imaging Conference (2002), p. 23.

34. P. van der Burgt and J. van Kemenade, “About color rendition of light sources: The balance between simplicity and accuracy,” Color Res. Appl. 35, 85–93 (2010).

35. S. Jost-Boissard, M. Fontoynont, and J. Blanc-Gonnet, “Colour Rendering of LED Sources: Visual Experiment on Difference, Fidelity and Preference,” in CIE Light and Lighting Conference with Special Emphasis on LEDs and Solid State Lighting (Budapest, 2009).

36. S. Jost-Boissard, M. Fontoynont, and J. Blanc-Gonnet, (personal communication, 2009).

37. S. Jost-Boissard, M. Fontoynont, and J. Blanc-Gonnet, “Perceived lighting quality of LED sources for the presentation of fruit and vegetables,” J. Mod. Opt. 56(13), 1420 (2009).

38. W. Davis and Y. Ohno, “Color quality scale,” Opt. Eng. 49(3), 033602 (2010).

39. Y. Ohno, “Spectral Colour Measurement,” in Colorimetry: Understanding the CIE System, J. Schanda, ed. (CIE Central Bureau, Vienna, Austria, 2007), pp. 5.1–5.30.

40. L. L. Thurstone, “A law of comparative judgment,” Psychol. Rev. 101(2), 266–270 (1994).

41. H. Scheffé, “An analysis of variance for paired comparisons,” J. Am. Stat. Assoc. 47(259), 381–400 (1952).

42. A. T. Basilevsky, Statistical Factor Analysis and Related Methods: Theory and Applications (Wiley-Interscience, Chichester, 1994).

43. A. Field, Discovering Statistics Using SPSS (SAGE Publications Ltd, London, UK, 2009).

44. X. L. Meng, R. Rosenthal, and D. B. Rubin, “Comparing correlated correlation coefficients,” Psychol. Bull. 111(1), 172–175 (1992).

45. K. Hashimoto, T. Yano, M. Shimizu, and Y. Nayatani, “New method for specifying color-rendering properties of light sources based on feeling of contrast,” Color Res. Appl. 32(5), 361–371 (2007).

Object name	Symbol	a₃	a₄	a₅	a₆	a₇
Green apple	S-ap	−0.0907	0.3906	46.0900	18.9538	−0.2833
Ripe banana	S-ba	0.1553	0.3676	66.0833	23.3494	−0.8224
Orange	S-or	0.3085	0.4683	40.8171	20.6933	−6.9833
Dried lavender	S-la	0.0191	−0.1258	296.9222	127.4322	−52.0363
Smurf®	S-sm	−0.1203	−0.2083	112.8298	64.3715	−45.8178
Strawberry yoghurt	S-sy	0.1725	0.0162	66.1341	71.8095	−26.2745
Sliced cucumber	S-sc	−0.0329	0.2245	107.4140	36.7800	3.0505
Cauliflower	S-cf	0.0390	0.0523	233.1061	60.9832	−40.6899
Caucasian skin (hand)	S-ha	0.1481	0.1312	102.3266	81.1393	−21.9418
Munsell N4 grey sphere	S-n4	−0.0190	−0.0164	843.3289	342.1807	−261.1875

Light source	CCT (K)	E (lx)	S_a	S_a rank	CIE CRI R_a	R_a rank	NIST CQS_a	CQS_a rank
F4	2657	239	0.6672	5	52.8	5	53.9	5
FG	2878	255	0.7787	3	80.6	3	87.2	3
Nd	2743	246	0.7841	2	73.7	4	87.0	4
LC	2798	246	0.7899	1	81.0	2	89.0	2
H	2640	249	0.7662	4	99.6	1	97.2	1
RGB	2747	251	0.6548	6	31.9	6	50.5	6

Scheffé	Preference	Fidelity	Vividness	Naturalness	Attractiveness
S_a	0.95**	0.84**	0.57	0.87**	0.96**
R_a	0.78*	0.78*	0.13	0.88**	0.74*
CQS_a	0.87*	0.78*	0.40	0.86**	0.86**

Scheffé	Preference	Fidelity	Vividness	Naturalness	Attractiveness
S_a	0.94**	0.94**	0.77	0.60	0.94**
R_a	0.49	0.49	0.14	0.60	0.49
CQS_a	0.49	0.49	0.14	0.60	0.49

	Preference	Fidelity	Vividness	Naturalness	Attractiveness
H₀: “S_a = R_a”	0.067	0.067	0.137	1.000	0.067
H₀: “S_a = CQS_a”	0.067	0.067	0.137	1.000	0.067
H₀: “R _a = CQS_a”	1.000	1.000	1.000	1.000	1.000
