Optica Publishing Group

Towards an optimum colour preference metric for white light sources: a comprehensive investigation based on empirical data

Open Access

Abstract

Colour preference is a critical dimension for describing the colour quality of lighting and numerous metrics have been proposed. However, due to the variation amongst psychophysical studies, consensus has not been reached on the best approach to quantify colour preference. In this study, 25 typical colour quality metrics were comprehensively tested based on 39 groups of psychophysical data from 19 published visual studies. The experimental results showed that two combined metrics: the arithmetic mean of the gamut area index (GAI) and colour rendering index (CRI) and the colour quality index (CQI), a combination of the correlated colour temperature (CCT) and memory colour rendering index (MCRI), exhibit the best performance. Qp in the colour quality scale (CQS) and MCRI also performed well in visual experiments of constant CCT but failed when CCT varied, which highlights the dependence of certain metrics on contextual lighting conditions. In addition, it was found that some weighted combinations of an absolute gamut-based metric and a colour fidelity metric exhibited superior performance in colour preference prediction. Consistent with such a result, a novel metric named MCPI (colour preference index based on meta-analysis) was proposed by fitting the large psychophysical dataset, and this achieved a significantly higher weighted average correlation coefficient between metric predictions and subjective preference ratings.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

The colour preference of lighting is a much-explored topic in the field of lighting quality evaluation [1–3]. Research in this area aims to investigate under which light source an illuminated object looks more appealing in terms of colour [4–7] and to explore the factors that influence that visual attribute [8–11]. Most importantly, it has been recognized that an effective metric needs to be established that correlates adequately with the loosely-defined term ‘visual appreciation’ (i.e. preference, attractiveness and pleasantness) as experienced by human observers [12–16].

During recent years, researchers associated with the Commission Internationale de l'Eclairage (CIE) have developed and tested numerous metrics for quantifying colour preference, especially within the framework of CIE TC 1–69 (Colour rendition by white light sources) and TC 1–91 (New Methods for Evaluating the Colour Quality of White-Light Sources). Despite this great effort, however, no official CIE recommendation for a specific colour preference metric has been forthcoming. To further investigate this topic, the CIE has recently established a research forum (RF-03, Matters Related to Colour Rendition), which is closely associated with the second top-priority topic ‘Colour Quality of Light Sources Related to Perception and Preference’ in the CIE Research Strategy [17]. Thus, it is expected that the investigation of colour preference assessment will remain a popular topic for some time.

Psychophysical experiments are the most reliable way to investigate colour preference and test associated objective metrics. The reason for adopting this approach is that no matter how well-founded a metric seems to be in theory, it will not be regarded as a reliable measure without visual validation. On the other hand, it is precisely due to the variation and uncertainty in psychophysical studies that development of a canonical metric has proved difficult to achieve.

According to the current literature, the colour preference of lighting is correlated with several contextual factors, including Correlated Colour Temperature (CCT) [18], illuminance [11], lighting application [9], cultural background [10], gender [8], familiarity [5] and colour features [19]. It is not possible for a single study to consider all of these variables simultaneously, so the most common approach is to control (or simply set aside) most of the factors while discussing only a single target variable. Due to the diversity of possible experimental configurations, it is common, and unavoidable, that psychophysical studies with different experimental settings lead to contradictory results [20]. Such a situation highlights that the findings and conclusions of these studies are inseparable from their experimental preconditions. In other words, it is easy to derive a metric by fitting the visual data from a single experiment with a defined setup, but it is difficult to ensure that the proposed metric has strong validity for other conditions.

To attempt to solve this problem, and promote statistical robustness, a sensible protocol is to draw universal conclusions from a large collection of psychophysical data. Theoretically, such an approach is more conducive to estimating the true strength of association amongst several related experiments and to deriving more robust conclusions. Using certain statistical techniques, for example meta-analysis, the sampling error or within-study variance can also be simultaneously corrected [21]. For instance, by a meta-analysis of the correlation coefficients between metric predictions and subjective ratings of colour preference and naturalness, Smet et al. evaluated the performance of 13 typical colour rendition metrics with regard to the psychophysical data of 7 visual studies [1]. Then, based on a larger psychophysical dataset (15 visual studies of colour preference and 12 studies of naturalness), these authors further validated the superiority of their proposed Memory Colour Rendering Index (MCRI) [22]. Inspired by those studies, in 2017, Liu et al. proposed a Gamut Volume Index (GVI) for quantifying the colour preference of lighting, which was based on analysis of visual data from 8 studies [12].

The theoretical advantages of the above-mentioned studies over a single visual experiment are obvious. It must be pointed out, however, that for these analyses with large datasets, uneven and improper sampling might also lead to unexpected bias in the results. For instance, Smet et al. [1] mainly focused on lighting scenarios of approximately similar CCTs while ignoring scenarios with different CCTs; thus, the deficiency of the MCRI [12] under multiple-CCT conditions was not demonstrated. The GVI metric considered lighting conditions with multiple CCTs [12], but due to improper interpretation of the psychophysical results obtained in light booth experiments, some visual data were overweighted: those data were obtained from visual tests conducted in light booths in which multiple objects were consecutively observed under the same group of light sources, and the average preference ratings of the observers were found to be highly consistent in rank order, regardless of the object viewed. As revealed in our latest work, such consistency was due to the underlying cognitive mechanism during consecutive visual judgement in the booth, which was caused by the inherent stimulus configuration in that kind of experiment [3].

It is also worth noting that, although these studies have employed research data from multiple experiments, the size of the dataset remains a limiting factor. Therefore, further analysis based on a larger psychophysical dataset with more reasonable and comprehensive sampling is still needed.

The purpose of this work was to interpret and compare the performance of existing, state-of-the-art, colour quality metrics for colour preference prediction. In addition, an updated and superior metric based on a suitably robust visual dataset is desired. To be specific, this study describes a comprehensive comparison of 25 colour quality metrics that can be used for the prediction of colour preference, with 39 groups of psychophysical data, from 19 published visual studies (see Supplement 1, Table S1 for detailed information). The size of the dataset for this work is considerably larger than that of Smet et al. (7 groups of data from 6 studies [1] and 21 groups of data from 15 studies [22], respectively) and Liu et al. (32 lighting scenarios from 8 studies [12]), and the number of test metrics is also larger, including many newly-proposed measures from the recent literature. Notably, when developing the GVI metric [12], one lighting scenario was defined as a visual test with a certain combination of test object and test light source. This protocol, however, has been proved by our latest work to be inappropriate [3]. Therefore, in this work, we averaged all the data for different test objects and different observers with regard to the same group of light sources and then defined the average values as one group of visual data. The rationale for this is demonstrated below. Note that, if the previous protocol (i.e. a lighting scenario) were followed to define a dataset, the current dataset would contain, in total, 86 lighting scenarios. It is also worth noting that the large majority of psychophysical data collected in this work were published in the last 3 years, and they have thus not been systematically examined in former studies [1,12,22].

In the following sections, the colorimetric rationale and corresponding classifications of the 25 tested metrics will be introduced in Section 2, followed by a short introduction and explanation of the visual data collected from 19 published studies in Section 3. In Section 4, the performance of the metrics for each visual study will be assessed by the use of the Pearson correlation coefficient between metric predictions and subjective colour preference ratings, and the general performance of a metric will be represented by a weighted average correlation coefficient based on individual correlation coefficients of all the studies. Every effort has been made to obtain a comprehensive interpretation of the results and to avoid bias. The possible impact of certain factors on metric performance, such as CCT, chromaticity, lighting environment, illuminance and time course of chromatic adaptation, was also considered in varying degrees and will be discussed as appropriate. In addition, an updated metric based on the large psychophysical dataset will be presented. Finally, in Section 5, conclusions will be drawn, together with insights for future work. It is to be hoped that the results of the current research will provide an effective validation of previous knowledge and an essential supplement to current knowledge.

2. Colour quality metrics

Twenty-five colour quality metrics were tested in this study. They include the CIE Colour Rendering Index (CRI: Ra) [23], CRI-CAM02UCS [24], Colour Quality Scale (CQS: general scale Qa, colour fidelity scale Qf, gamut area scale Qg in version 9.0.3 and colour preference scale Qp in version 7.4) [13], CRI2012 [25], colour fidelity score Rf (also known as the CIE Rf [26]) and colour gamut score Rg in IES TM-30-18 [14], Color Preference Index (CPI) [27], Mean Chroma Shift in CQS (ΔC*), used in [2,28], Gamut Area Index (GAI) [29], Colour Discrimination Index (CDI) [30], Cone Surface Area (CSA) [31], Feeling of Contrast Index (FCI) [32], Gamut Volume Index (GVI) [12], Memory Colour Rendering Index (MCRI) [4], Colour Quality Index (CQI [2] and CQI’ [28]), mean of GAI and CRI (GAI-CRI) [18], Sneutral [33], White Sensation (WS) [34], Percent of Tint [35], Daylight Spectrum Index (DSI) [36] and Duv [37]. For detailed information (e.g. colour rendition intent and calculation protocol) and deeper understanding of these metrics, the reader is referred to the cited articles as well as to the review by Houser et al. [38].

Strictly speaking, Duv, a measure of the distance of the test source chromaticity coordinates from the point on the Planckian locus with the same CCT, in the CIE u’v’ chromaticity diagram, is not a measure of the colour quality of lighting. According to the current literature, however, such a metric has been shown to correlate well with perceived colour preference [39–43] and thus, in this study, the performance of this metric is included. Similarly, many of the metrics listed above were not specifically proposed for the evaluation of colour preference, but in current research these metrics have usually been included in comparative analyses [1,2,12,18,28,44–46]. Thus, they are included here.

Based on the calculation protocol, rather than the colour rendition intent, the 25 colour quality metrics have been divided into seven subgroups, as described below. The reason for this division is that metrics that share a similar calculation protocol commonly exhibit similar correlation with subjective colour preference ratings, whereas their original colour rendition intent might be different [12]. For instance, both GVI [12] and CDI [30] are based on the colour gamut but the former was initially proposed for colour preference while the latter for colour discrimination.

2.1 Colour difference based metrics

The colour-difference based metrics are relative measures that require a defined reference. These metrics refer to the colour difference (or chroma shift) between each sample in a set of colours under a test light source and a reference illuminant of the same CCT, and they were commonly developed for quantifying colour fidelity. In this work, nine colour-difference based metrics were included: CRI [23], CRI-CAM02UCS [24], Qa, Qf, Qp in CQS [13], CRI2012 [25], Rf in IES TM-30-18 [14], CPI [27] and ΔC* in CQS, used in [2,28]. Some of these metrics (e.g. CRI-CAM02UCS, Qf, CRI2012 and Rf) are updated versions of the CIE CRI, based on revised models and new experimental work. In the calculation of other metrics (e.g. Qa, Qp, CPI and ΔC*), chroma enhancement and colour shift were considered. It is important to note that, because of the necessity of defining a reference illuminant, only light sources with the same CCT can be compared using these metrics.
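As a minimal sketch of how a colour-difference based metric is assembled, consider the CIE CRI, whose special indices take the form R_i = 100 − 4.6·ΔE_i and whose general index is their mean. The snippet below illustrates only this final averaging step and assumes the colour differences against the reference illuminant of the same CCT have already been computed in the metric's working colour space (the full CRI additionally involves chromatic adaptation and the U*V*W* space):

```python
import numpy as np

def general_index(delta_e, scale=4.6):
    """General index as the mean of special indices R_i = 100 - scale * dE_i.

    delta_e: colour differences of the test samples between the test
    source and a reference illuminant of the same CCT (assumed
    pre-computed in the metric's working colour space).
    """
    r_i = 100.0 - scale * np.asarray(delta_e, dtype=float)
    return float(r_i.mean())

# A source rendering the samples with small colour shifts scores high
print(round(general_index([2.0, 3.0, 1.0, 4.0]), 1))  # 88.5
```

The dependence on a reference illuminant of the same CCT is visible here: the ΔE values, and hence the score, are only defined relative to that reference, which is why such metrics cannot compare sources of different CCTs.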

2.2 Relative gamut-based metrics

The relative gamut-based metrics (Qg [13] and Rg [14]) use the gamut area of a set of colour samples illuminated by the test light source. The values of these relative metrics are normalized by reference to the gamut area of the same colour samples viewed under a reference illuminant of the same CCT. Thus, similar to the colour-difference based metrics, values of these metrics for light sources with different CCTs cannot be compared because the normalization method is different.

2.3 Absolute gamut-based metrics

A major difference between absolute gamut-based metrics (GAI [29], CDI [30], CSA [31], GVI [12] and FCI [32]) and relative gamut-based metrics is that the absolute measures do not rely on a reference light source of the same CCT. Thus, values of these absolute metrics for light sources with different CCTs are comparable. To be specific, GAI, CDI and FCI use constant reference illuminants while CSA and GVI are not reliant on any reference illuminant.
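The gamut-area idea shared by these metrics can be sketched with the shoelace formula: the chromaticities of the colour samples under the test source, taken in hue order, span a polygon whose area serves as the gamut score. The function below is an illustrative sketch, not any specific metric's official implementation:

```python
import numpy as np

def gamut_area(coords):
    """Shoelace area of the polygon spanned by sample chromaticities.

    coords: (n, 2) array of colour coordinates of the samples under
    the light source, ordered by hue angle.
    """
    x, y = coords[:, 0], coords[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# Toy check: a unit square spans an area of 1
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
print(gamut_area(square))  # 1.0
```

An absolute metric uses such an area (or a volume analogue, for GVI) directly, whereas a relative metric such as Rg divides it by the corresponding area under the reference illuminant of the same CCT, which is what makes relative values incomparable across CCTs.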

2.4 Memory based metric

The MCRI metric quantifies the colour rendition of a light source by comparison between the rendered colours of real familiar objects (e.g., fruits, flowers, skin, neutral grey) and the associated memory colours of the observer [4], and it is described by empirically derived similarity functions. The validity of the MCRI in colour preference prediction has been demonstrated by Smet et al. [1,22].

2.5 Combined metrics

Recently, researchers have agreed that it is insufficient to describe the colour quality of lighting with only one metric and thus, there is a growing trend towards the use of combined metrics. Three combined metrics were tested in the present study: the GAI-CRI, CQI and CQI’. In 2008, Rea et al. first raised the idea of characterizing colour quality with both the GAI and the CRI, and proposed optimum value ranges of those metrics for the overall performance of discrimination, naturalness and vividness [15]. Later, Smet [1], Jost-Boissard [47] and Liu [12] used the arithmetic mean of CRI and GAI to quantify colour preference, although the implied equal weighting of the two metrics was selected arbitrarily. The Colour Quality Indices (CQI and CQI’) were developed by Khanh et al. [2,28], where CQI is a linear combination of CCT and MCRI and CQI’ is a linear combination of CCT, MCRI and ΔC*.

2.6 Whiteness metrics

The whiteness of lighting refers to the colour of a light that does not provide any hue sensation (i.e. neither red nor green, nor yellow nor blue). Three metrics, Sneutral [33], White Sensation (WS) [34] and Percent of tint [35] were tested in this study. In earlier work, based on visual experiments and empirical data, it was found that Sneutral and WS correlated well with the judgement of subjective whiteness, and the whiteness of lighting was found to be closely associated with colour preference, as long as the test light sources were of obviously different whiteness values (i.e. different chromaticities) [45,46].

2.7 Other metrics

In addition to the metrics described above, a further two measures were included in this study. The first is Duv, which quantifies the distance of the u’v’ chromaticity of the test light source from the Planckian locus at the same CCT. The second is the Daylight Spectrum Index (DSI) [36], a metric that assesses the affinity of light sources with daylighting by comparison of the spectral power distributions.

3. Psychophysical data

For this work, 39 sets of data from 19 published psychophysical studies of colour preference were collected [5,6,8,18,43–45,47–58]. These data contain the Spectral Power Distributions (SPDs) of the test light sources in each study, together with the corresponding average colour preference ratings given by the observers. A brief description of these visual studies can be found in Supplement 1, Table S1, together with the methods used to derive the data for analysis in this study.

Note that, for the experiments where multiple objects were consecutively observed under a defined group of light sources in a light booth, one group of visual data here refers to the average values of colour preference ratings of all the experimental objects and observers for each test light source. This approach avoids overweighting the results of studies which had multiple objects. As demonstrated in our recent work [3], the average colour preference ratings of observers for test light sources with different objects (i.e. lighting scenarios) were consistent in terms of rank order due to the cognitive mechanism during consecutive visual judgement in light booths.

Specifically, amongst the collected published data, 9 groups of visual experiments from 7 studies [5,6,8,43,44,47,52] have such a feature, and they include 56 lighting scenarios in total. For instance, in the work of Jost-Boissard et al. [44], two groups of experiments were carried out (light sources with CCT values of 3050 K and 3950 K, respectively) with four lighting scenarios (fruits and vegetables with different colours) in each group. In that work, the average preference ratings of observers for the test light sources within each group were consistent in rank order [3], which indicated that the impact of the lighting on the observer was much stronger than that of the objects. Furthermore, it has been found that, in such work, the Pearson correlation coefficients between the average scores of scenarios in each group, and the scores of individual scenarios, were very high (r > 0.95). Similarly, for all the 56 lighting scenarios involved in this study, only 4 scenarios exhibited moderately high correlations (0.68 < r < 0.87, with a mean of 0.77) between average and individual scores, while for the other 52 scenarios, the correlations were exclusively very strong (r > 0.92). Therefore, it seems safe to conclude that the average scores of lighting scenarios are not only representative of the colour quality of the test light sources, but also effective and essential in avoiding data bias. For instance, in recent work, 19 different objects were evaluated under the same group of light sources and the results were very similar [5,6]. If those data were treated as 19 separate lighting scenarios, they would be overemphasized.
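The data-reduction step described above can be sketched as follows, with hypothetical ratings: each row holds the mean preference ratings for one lighting scenario (one object) across the shared group of light sources, the rows are averaged into a single group of visual data, and the consistency check is the Pearson correlation between each scenario and that average:

```python
import numpy as np

# Hypothetical mean ratings: rows = lighting scenarios (objects),
# columns = the shared group of test light sources
ratings = np.array([[7.0, 5.0, 3.0, 6.0],
                    [6.5, 4.8, 3.2, 5.9],
                    [7.2, 5.1, 2.9, 6.1]])

# Average over objects (and implicitly observers): one group of visual data
group_scores = ratings.mean(axis=0)

# Rank-order consistency: correlation of each scenario with the average
r_each = [np.corrcoef(row, group_scores)[0, 1] for row in ratings]
print(all(r > 0.9 for r in r_each))  # True for these consistent ratings
```

When the per-scenario correlations are this strong, the averaged row is a faithful summary, which is the justification given in the text for collapsing multiple scenarios into one group of data.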

As shown in Table S1 in Supplement 1, the collected visual data were divided into two groups according to the CCTs of the test light sources. In addition, the data were also divided according to the viewing environment (i.e. viewing in rooms as opposed to viewing in light booths) of each test. Other factors also have the potential to impact the subjective results of the observers, including the time course of chromatic adaptation, consistency of chromaticities, illuminance level, colour rendition properties, etc. Detailed analysis and discussion will be provided in the following section.

4. Results and discussion

4.1 Overall analysis

The performance of the 25 test metrics applied to each group of visual data is illustrated in Fig. 1. The abscissa refers to the metrics, which are grouped by their calculation protocol, as described in Section 2 above. The ordinate denotes the psychophysical studies, grouped by CCT type (constant CCT or multiple CCTs) and viewing environment (light booth or room). The values of the Pearson correlation coefficients are denoted by colour, with red for strong and positive correlation, blue for strong and negative correlation and green for no correlation. Note that in Wei et al. (2014) [49] only two light sources were tested and those sources had identical CRI-CAM02UCS, Qf and GAI-CRI values. Mathematically, the Pearson correlation coefficients between the values of these three metrics and the preference scores could not be calculated. Thus, the corresponding correlations are denoted in black in Fig. 1.

Fig. 1. Pearson correlation coefficients between predictions of 25 test metrics and average observer colour preference ratings in each group of visual experiment. C and M respectively denote constant and multiple CCTs. B and R respectively denote light booth and room.


It is clear from Fig. 1 that no metric performed well for all the visual experiments, and that performance varied with CCT, test environment and colorimetric rationale (i.e. calculation protocol). These results highlight the need to evaluate the performance of the metrics based on multiple visual results, since good performance for a single condition does not guarantee acceptable accuracy for other conditions.

Table 1 summarizes the overall performance of each test metric. In this table, $\bar{r}$ denotes the weighted average Pearson correlation coefficient, a measure that represents the overall correlation based on the combination of individual correlations between metric predictions and subjective preference scores. This measure was calculated by the equation proposed by Hunter and Schmidt [21], which provides an underestimated, and hence cautious, prediction for data with medium to large correlation coefficients [59].

$$\bar{r} = \sum\nolimits_{i = 1}^K {{N_i}{r_i}} /\sum\nolimits_{i = 1}^K {{N_i}}$$

In Eq. (1), ${r_i}$ denotes the individual correlation coefficient for a given metric in a given visual experiment, ${N_i}$ is the weighting factor for each group of psychophysical data, and K is the number of groups of visual data in this study (K = 39). Note that, in the former work of Smet et al. [1] and Liu et al. [12], only the number of observers was used to define ${N_i}$. In this work, the number of light sources has also been considered, since an increase in the number of test sources could influence the validity of the results. In fact, it is noticeable that in the current literature there is a growing trend to use a large number of light sources [50,51,55–58]. Thus, in Table 1, the results for different weighting schemes are shown, with different values of ${N_i}$ (i.e. ${N_i}$ equal to a constant, equal to the number of observers, or equal to the product of the number of observers and the number of light sources).
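Eq. (1) is a straightforward weighted mean; the sketch below uses hypothetical study weights (here N_i = observers × light sources, following the third weighting scheme):

```python
import numpy as np

def weighted_mean_r(r, n):
    """Eq. (1): r_bar = sum(N_i * r_i) / sum(N_i)."""
    r, n = np.asarray(r, dtype=float), np.asarray(n, dtype=float)
    return float(np.sum(n * r) / np.sum(n))

# Hypothetical individual correlations and weights for three studies
r_i = [0.80, 0.55, 0.70]
N_i = [20 * 5, 30 * 8, 15 * 10]  # observers * light sources per study
print(round(weighted_mean_r(r_i, N_i), 3))  # 0.647
```

Because the weights enter linearly, a large study with many observers and sources dominates the average, which is precisely why the choice of weighting scheme is reported separately in Table 1.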


Table 1. Weighted average Pearson correlation coefficients between metric predictions and the preference scores of individual studies

The p-value in Table 1 denotes the statistical significance of ${H_0}:\bar{r} = 0$ (no correlation) calculated according to the procedure described by Smet et al. [3]. For a given metric, the difference amongst the individual correlation coefficients in the 39 groups of visual studies was quantified by the variance of the population correlation $\sigma _\rho ^2$, which was computed by subtracting the sampling error variance $\sigma _e^2$ from the variance of the sample correlation $\sigma _r^2$, following the procedure of Hunter and Schmidt [21]. The symbols in Eqs. (2)–(4) have the same meaning as those in Eq. (1), and $\bar{N}$ denotes the average value of ${N_i}$.

$$\sigma _\rho ^2 = \sigma _r^2 - \sigma _e^2$$
$$\sigma _r^2 = \sum\nolimits_{i = 1}^K {[{N_i}{{({r_i} - \bar{r})}^2}]} /\sum\nolimits_{i = 1}^K {{N_i}}$$
$$\sigma _e^2 = {(1 - {\bar{r}^2})^2}/(\bar{N} - 1)$$
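Eqs. (2)–(4) can be computed directly from the same arrays used for Eq. (1); the sketch below uses hypothetical inputs, with the symbols as defined above:

```python
import numpy as np

def population_variance(r, n):
    """Eqs. (2)-(4): sigma_rho^2 = sigma_r^2 - sigma_e^2 (Hunter & Schmidt)."""
    r, n = np.asarray(r, dtype=float), np.asarray(n, dtype=float)
    r_bar = np.sum(n * r) / np.sum(n)                      # Eq. (1)
    var_r = np.sum(n * (r - r_bar) ** 2) / np.sum(n)       # Eq. (3)
    var_e = (1.0 - r_bar ** 2) ** 2 / (np.mean(n) - 1.0)   # Eq. (4)
    return var_r - var_e                                   # Eq. (2)

# Hypothetical data: a small residual variance indicates that the
# individual correlations are stable once sampling error is removed
print(population_variance([0.80, 0.55, 0.70], [100, 240, 150]))
```

The subtraction of $\sigma _e^2$ is the correction step mentioned in the Introduction: part of the spread in the observed correlations is attributable to finite sample sizes rather than to genuine disagreement between studies.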

Note that a large and positive value of $\bar{r}$ and a small value of p ($\alpha$ = 0.05/25 = 0.002 after correcting for multiple comparisons) indicate a strong, significant and positive correlation, and a small value of $\sigma _\rho ^2$ indicates a stable and robust correlation amongst multiple cases.

As shown in Table 1, regardless of the weighting scheme (i.e. the definition of ${N_i}$ in Eq. (1)), the overall performance of GAI-CRI and CQI is always superior, which highlights the advantage of using combined metrics. The weighted average correlation coefficient of CQI is the optimum measure if the individual correlation coefficients are weighted by the number of observers, while for the other two weighting schemes, GAI-CRI exhibits the highest overall correlation. As stated earlier, the number of test sources in each case must also be taken into consideration. Thus, in the following, only the results that are weighted by the product of the number of observers and the number of light sources are used. Note that, although the weighting scheme has an impact on the results, its influence is small, as shown in Table 1. Meanwhile, weighted correlation coefficients of approximately 0.7 do not indicate very strong correlation. Considering the large variations in the experimental settings among the various psychophysical studies, however, as well as the statistical significance and robustness shown in Table 1, the overall correlation coefficients can be regarded as acceptable.

Apart from GAI-CRI and CQI, the weighted average correlation coefficients of Qp and MCRI are also acceptable, with $\bar{r}$ values equal to approximately 0.6. Generally, the weighted average correlation coefficient of Qp is higher while the robustness (stability of individual correlations, quantified by $\sigma _\rho ^2$) of MCRI is stronger, as shown in Table 1. It can be seen from Fig. 1 that both Qp and MCRI performed poorly in psychophysical studies with light sources of multiple CCTs. Such a problem is particularly severe for Qp, and that is why its robustness is weaker than that of MCRI.

The relatively good performance of GAI-CRI, CQI, Qp and MCRI is strengthened by the method for comparing correlation coefficients proposed by Meng et al. [60]. The null hypothesis of such an analysis is that the two correlation values being compared are equal; by calculating the two-tailed p-values, users can accordingly accept (p > 0.05) or reject (p < 0.05) the null hypothesis, and thus identify the metric with the stronger correlation. Specifically, in this study the method of Meng et al. was used to determine whether the correlations between the preference scores and the predictions of GAI-CRI, CQI, Qp and MCRI were statistically significantly different from the correlations between the preference scores and other metric predictions. It was found that the performance of GAI-CRI is significantly superior to that of the other metrics, with the only exception that its advantage over CQI is not significant. Except for GAI-CRI, the correlations of CQI are significantly higher than those of any other metric. Similarly, the performance of Qp and MCRI is also significantly better than the other metrics except for GAI-CRI and CQI (see Table S2 of Supplement 1 for detailed results).
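A sketch of the Meng, Rosenthal and Rubin test as commonly formulated (this is the standard published formula, not code from the study itself): the two Fisher-transformed correlations are compared with a z statistic that accounts for r_12, the correlation between the two metrics' own predictions.

```python
import math

def meng_test(r_y1, r_y2, r_12, n):
    """Compare two dependent correlations r(y, x1) and r(y, x2) that
    share variable y; r_12 is the correlation between x1 and x2
    (Meng, Rosenthal & Rubin, 1992)."""
    z1, z2 = math.atanh(r_y1), math.atanh(r_y2)  # Fisher z-transforms
    r2_bar = (r_y1 ** 2 + r_y2 ** 2) / 2.0
    f = min((1.0 - r_12) / (2.0 * (1.0 - r2_bar)), 1.0)
    h = (1.0 - f * r2_bar) / (1.0 - r2_bar)
    z = (z1 - z2) * math.sqrt((n - 3.0) / (2.0 * (1.0 - r_12) * h))
    # Two-tailed p-value from the standard normal distribution
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

# Hypothetical example: is r = 0.70 significantly higher than r = 0.50,
# given that the two metrics' predictions correlate at 0.60 (n = 50)?
z, p = meng_test(0.70, 0.50, r_12=0.60, n=50)
print(round(z, 2), p < 0.05)  # positive z favours the first metric
```

Because the preference scores are shared between both correlations, an ordinary independent-samples comparison would be invalid here; the r_12 term is what makes the dependent comparison legitimate.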

Several important conclusions can be derived from the above results. First, many researchers agree that a combination of a gamut-based metric and a fidelity-based metric should provide a good description of the colour quality of a light source [16,38]. This opinion is strengthened by the good performance of GAI-CRI in predicting colour preference. Also, MCRI is an effective metric for quantifying colour preference, whose validity has been well demonstrated by its proposers [1,22] and confirmed by many of the visual experiments involved in this study, as illustrated in Fig. 1. As reported in our earlier work, however [12], that measure did not work well for visual experiments that used sets of lamps with different values of CCT. In this work, based on the enlarged and updated psychophysical dataset, the advantages and disadvantages of MCRI are further explored. As noted earlier [12], the unsatisfactory performance of MCRI under multiple-CCT conditions might be ascribed to insufficient consideration being given to chromatic adaptation during the visual experiments. Based on the comparison between MCRI and CQI (i.e. a linear, weighted combination of MCRI and CCT) in Fig. 1, it seems that a simple weighted correction for CCT could effectively solve this problem. It is clear from Fig. 1 that Qp in CQS exhibits good performance for visual experiments with lamps of constant CCT but does not perform well when the CCT is a variable. These results should be attributed to its rewarding of chroma increases and its reliance on reference light sources, respectively. On one hand, Qp is essentially a colour fidelity metric with consideration given to chroma enhancement and thus, in a sense, it could be regarded as another form of “combined” metric. The success of Qp in constant-CCT conditions further consolidates the idea that colour fidelity and colour gamut co-determine colour quality.
On the other hand, unlike colour fidelity, colour preference is not necessarily associated with a reference source of the same CCT, as argued by some researchers [1,27,61]. Thus, the reliance of Qp on a reference light source actually limits its applicability and, as stated earlier, if sources have different CCTs, their Qp values are incomparable due to the inconsistency of the reference sources and that is the reason why Qp does not work in multiple-CCT conditions.

Apart from the four metrics described above, no other measures exhibit acceptable performance judged by the values of the weighted average correlation coefficient shown in Table 1. For certain specific cases, however, the performance of some of the other metrics is acceptably good. For example, the relative gamut-based metrics (Rg and Qg) performed well for visual tests carried out in a light booth with light sources of constant CCT, and the absolute gamut-based metrics (GAI, CDI, CSA) performed well for all the tests conducted in a light booth. This result corroborates the idea that colour preference is closely related to chroma (gamut) enhancement [12,13,47,49]. In addition, it also raises a new question as to why, for the experiments conducted in rooms as opposed to light booths, the correlation between predictions based on gamut-based metrics and preference ratings is not high. This issue will be discussed in the next section.

In addition, in some instances the variation in the values of a test metric significantly impacts on its effectiveness. For instance, in our previous work, we demonstrated that the whiteness of lighting (i.e. Sneutral and WS) correlated well with colour preference as long as the test light sources were of very different whiteness values (i.e. chromaticities) [45,46]. This result is also shown by the data in Fig. 1. For cases with different chromaticities [8,18,43,45,48,52,53], there are high correlations between Sneutral and WS values and subjective preference scores, while for the cases with metameric lighting (light sources with almost identical chromaticities) [50,55–58], the correlations were very weak. This finding highlights again the deficiency of conclusions drawn from a single-case study, since in a given experiment the influence of a certain metric might be intentionally or unintentionally masked or overemphasized by the specific experimental setup. As for the performance of Duv, similar results could also be found. It is widely reported that negative Duv values are preferred by observers [39–43] so in Fig. 1, bluer colours are to be expected for that metric.

Except for Qp, the colour-difference based metrics (CRI, CRI-CAM02UCS, Qa, Qf, CRI2012, Rf, CPI and ΔC*) did not perform well; this can be attributed to their original colour rendition intent (to quantify colour fidelity rather than colour preference) as well as to their reliance upon reference illuminants.

4.2 Detailed analysis

To further investigate how experimental setup influences the performance of the test metrics, the psychophysical dataset was divided into four groups, according to the CCT type and the viewing environment of each individual study, and then analyzed separately. It should be noted that some confounding variables were not strictly controlled, so it is not always wise to simply ascribe differences in metric performance to CCT/environment combinations. This grouping method was used only because it provided more detailed information on metric performance and thus facilitated further investigation and comprehensive analysis.

Figure 2 shows that the relative performance of most metrics varies with the experimental setup. For each kind of experiment, the optimal weighted average Pearson correlation coefficient appears to converge on a certain limit. For example, weighted $\bar{r}$ values are no larger than 0.91 for light-booth experiments with multiple CCTs, and no greater than 0.84 for constant-CCT experiments in a booth. We believe that, aside from the precision of the metrics, such limits are co-determined by the inconsistency of the visual data from the different experiments, as well as the uncertainty associated with psychophysical tests.
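
The weighted average Pearson correlation coefficient of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration only: the `groups` below are invented placeholder data, not the actual experimental dataset, and the field names are hypothetical.

```python
import numpy as np

def weighted_mean_r(groups):
    """Weighted average Pearson correlation across study groups:
    r_bar = sum(N_i * r_i) / sum(N_i), where N_i is the number of
    observers times the number of light sources in group i."""
    rs, weights = [], []
    for g in groups:
        # Per-group Pearson correlation between metric predictions
        # and mean preference scores.
        r = np.corrcoef(g["metric"], g["score"])[0, 1]
        rs.append(r)
        weights.append(g["n_observers"] * g["n_sources"])
    return float(np.average(rs, weights=weights))

# Illustrative placeholder groups (not the actual experimental data).
groups = [
    {"metric": [70, 80, 90], "score": [4.1, 4.5, 4.8],
     "n_observers": 20, "n_sources": 3},
    {"metric": [60, 75, 85], "score": [3.0, 4.2, 3.9],
     "n_observers": 15, "n_sources": 3},
]
print(round(weighted_mean_r(groups), 3))
```

Weighting each group's correlation by $N_i$ prevents small studies from dominating the meta-analytic average.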


Fig. 2. Performance of 25 test metrics under four different experimental conditions, quantified by weighted average Pearson correlation coefficients between predictions using the metrics and average experimental preference scores.


4.2.1 Constant CCT vs multiple CCTs

Among the 17 groups of visual data acquired in light booths, 11 groups used light sources with constant values of CCT and the other six groups adopted multiple-CCT light sources. As shown in Fig. 2, comparison between the performance of the various metrics under these two conditions (blue bars vs red bars) provides interesting results. First, it is clear that the relative metrics (from CRI to Rg on the abscissa) did not perform well when the value of CCT varied. As noted earlier, this poor performance can mainly be attributed to their dependence on a reference light with the same CCT as the test source. Due to the difference in colour rendition intent, the performance of most of the colour-difference based metrics remains poor even under constant-CCT conditions. In contrast, metrics that consider chroma enhancement (Qp, ΔC*, Rg, Qg) exhibit good performance under constant-CCT conditions, which strengthens the idea that colour preference correlates with colour gamut.

The superior performance of GAI, CDI and CSA under both conditions not only consolidates this opinion but also demonstrates the advantage of absolute gamut metrics when evaluating the colour preference of lighting. In brief, colour fidelity is a relative concept with regard to a “reference”, while colour preference should be independent of any reference other than colour memory [4]. It is noteworthy that not every absolute gamut-based metric shows acceptable performance: as shown above, FCI and GVI did not perform well. This failure might be ascribed to the improper selection of colour samples used in calculating these metrics. Furthermore, consistent with the above discussion, MCRI performed well when the test light sources were of constant CCT but badly when the CCT differed, which is possibly due to an inaccurate chromatic adaptation transform embedded in its algorithm. Again, this issue could be addressed by CQI, with a simple CCT correction.

In addition, the performance of GAI-CRI was good under both conditions. Sneutral and WS only performed well when the CCT differed, which can be explained by the previous statements regarding the variation in the values of a metric. Similarly, Duv did not perform well because many psychophysical tests used light sources of constant Duv values [50,55–58], so the impact of this measure was weakened.

Finally, the accuracy of DSI in predicting colour preference is acceptable for multiple-CCT conditions. A possible explanation is that this measure indicates the proximity of the test SPD to the SPD of CIE illuminant D65. Therefore, light sources with higher CCTs, which are reported to be preferred by observers [5,6,8,18,48,52], generally correspond to higher DSI values.

The differences between the two lighting conditions noted above should be mainly attributed to the distinction between the experimental light sources (especially CCT types), since the variations among viewing environments were minimized by the use of neutral light booths.

4.2.2 Viewing environment and chromaticity consistency

Of the 31 groups of data with light sources of constant CCT, 11 groups of experiments were conducted in light booths and the other 20 groups in rooms. As illustrated in Fig. 2 (red bars), many metrics performed well (weighted $\bar{r}$ values range from 0.7 to 0.8) in light-booth experiments, including Qp, ΔC*, Rg, Qg, GAI, CDI, CSA, MCRI, GAI-CRI, CQI and CQI’. In particular, several gamut-based metrics (i.e. Rg, Qg, GAI, CDI, CSA) exhibited good performance, which confirms the finding that chroma enhancement promotes colour preference. For the experiments conducted in rooms, however (solid line with dots), only the performance of Qp, CRI2012, MCRI, GAI-CRI and CQI can be regarded as acceptable (weighted $\bar{r}$ > 0.6). Thus, the most noticeable distinction between the two lighting conditions is that gamut-based metrics do not work well in room experiments. This result can also be seen in Fig. 1.

Two confounding factors lead to this result. First, the viewing environment influences visual perception. When observing the illuminated object(s) in a light booth, it was easy for observers to concentrate on the colour of the objects in such a simple and stable environment. When the psychophysical tests were carried out in rooms, however, many contextual factors besides colour inevitably had an impact on the visual colour perception of the observers: for example, the direction and distribution of the lighting, the specific scene (kitchen, living room, dining room, etc.) and the overall ambience. It is therefore possible that in room experiments the positive effect of chroma enhancement on preference judgements is weakened. Second, the chromaticity consistency of the light sources used in room experiments might be an influence. Of the 11 groups of experiments conducted in a light booth, only two groups (Wei et al. (2017) [54] and Esposito et al. [55]) used sources of constant chromaticity, while all 20 groups of tests in rooms adopted metameric sources. The advantage of such sources is that they remove the impact of chromaticity and shorten the time required for chromatic adaptation when the light source is changed. It should be noted, however, that to generate light sources with consistent chromaticity but different colour rendition properties, researchers usually have to make a compromise between colour fidelity and colour gamut. Thus, in the psychophysical studies carried out in rooms, many of the light sources had poor colour fidelity and a distorted colour gamut. According to past studies, excessively saturated colours also impair preference [12,14,47,50,51,54]. This fact could well explain the failure of the gamut-based metrics, the improved standing of the colour fidelity metrics, and the success of the GAI-CRI metric in such cases.

Based on this discussion, further experiments are needed to verify that the favourable impact of chroma enhancement on colour preference rests on the precondition of high colour fidelity. This issue should be investigated through the accumulation of supplementary data (e.g. visual data obtained in light-booth experiments with light sources of consistent chromaticity, and data acquired in a room with light sources of high colour fidelity and different chromaticities).

4.2.3 Study with multiple CCTs in the room

As indicated by the dashed line (with triangles) in Fig. 2, only two groups of data derived in a room with multiple CCTs were collected, both from the work of Khanh et al. [51]. Compared with the other experiments, these two sets of data used light sources with more complicated colorimetric settings and thus posed a more severe test of the metrics. Specifically, the 36 sources combined four different CCT values with nine different over-saturation levels at each CCT. The degree of over-saturation was quantified by values of ΔC*, which also implied different levels of colour fidelity and colour gamut. The 36 light sources were tested together in a “fixed random” order, and observers had to make judgements under the influence of multiple variables. Due to the compounding effect of the viewing environment and the diversified colorimetric parameters, as well as the limited amount of visual data, it is not sensible to compare the results of this kind of experiment with others and attribute the difference to a specific factor. The results, however, can also be interpreted by the above theory in terms of colour fidelity and colour gamut. As illustrated by the dashed line in Fig. 2, only MCRI, GAI-CRI and CQI exhibited a reasonable performance in this case, while the performance of all the colour fidelity metrics and gamut-based metrics was poor. To further explain this finding, the performance of four typical metrics is shown in Fig. 3.


Fig. 3. Correlations between subjective preference scores and values of ΔC*, GAI, CRI and MCRI in Experiment 1 of Khanh et al. [51].


From Fig. 3(a) it can be seen that many of the light sources used in this study suffered from over-saturation, i.e. the preference scores begin to decrease as the values of ΔC* exceed four. Similarly, Fig. 3(b) shows that the over-saturation issue is present for all values of CCT. Figures 3(a) and (b) also show that the positive effect of CCT on preference scores is still present amongst light sources with similar degrees of over-saturation, which could explain why the absolute gamut-based metrics outperformed the relative gamut-based metrics in this experimental situation. The colour fidelity metrics, however, also show positive correlations since, as shown in Fig. 3(c), light sources with extremely low colour fidelity (i.e. an excessively large colour gamut) were not preferred. In other words, an adequate level of colour fidelity, which avoids over-saturation, turned out to be a crucial factor for colour preference. Finally, as shown in Fig. 3(d), in spite of the confounding impact of CCT and degree of over-saturation, MCRI performed moderately well, with the largest MCRI values corresponding to the optimal preference ratings. Thus, the superiority of MCRI can also be explained by the “gamut-fidelity” theory: from past studies, memory colours are considered to be more saturated and preferred [62–64], and too much saturation combined with poor colour fidelity thus contradicts colour memory.

Based on the above analysis, it can be concluded that the performance of colour quality metrics is a direct function of the experimental settings, notably the colorimetric properties (CCT, chromaticity, fidelity and gamut properties, etc.) of the test sources. Two existing metrics, GAI-CRI and CQI, were found to perform well in spite of the large variations among the experiments represented in the psychophysical dataset. Furthermore, there is strong evidence that combining a colour fidelity metric with an absolute gamut-based metric can lead to accurate prediction of colour preference.

4.3 Colour preference index based on meta-analysis

The results discussed above have demonstrated the superiority of combined metrics, especially those that include both colour gamut and colour fidelity. Similar results have been reported in the literature [1,15,16], although often based on limited psychophysical datasets [2,28] or merely on theoretical assumption (e.g. the weights of GAI-CRI were determined arbitrarily [1,12,47]).

This study, in contrast, establishes a combined metric based on a large-scale psychophysical dataset. The protocol is to define a metric, the MCPI (Colour Preference Index based on Meta-analysis), as a linear combination of two metrics chosen from the 25 test metrics. The proposed format is shown below, where M1 and M2 are values of the colour quality metrics and c1 and c2 are weights. A linear combination is adopted because, although nonlinear combinations might give better performance, the linear form is simple and straightforward and prevents potential over-fitting.

$$MCPI = c_1 M_1 + c_2 M_2$$
$$c_1 \in [0,1],\quad c_2 \in [0,1],\quad c_1 + c_2 = 1$$

The optimum values of c1 and c2 in Eq. (5) were determined by fitting the visual dataset to the calculated metric values, with the criterion of maximizing the weighted average Pearson correlation coefficient defined in Eq. (1), with ${N_i}$ equal to the product of the number of observers and the number of light sources. After testing all pair-wise linear combinations of the 25 metrics, with the possible values of c1 and c2 sampled at a step of 0.01, the optimal MCPI metric was determined, as given in Eq. (7). In total, $C_{25}^2$ × 101 = 32,825 weighting schemes were tested, and the weighted average Pearson correlation coefficient increased from 0.73 (GAI-CRI) to 0.81.
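
The grid-search protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original fitting code: the three metric columns, the single group and the group weight below are invented toy data.

```python
import numpy as np
from itertools import combinations

def fit_mcpi_weights(metric_values, scores, group_weights):
    """Exhaustive search over metric pairs and weights c1 (step 0.01,
    c2 = 1 - c1) maximizing the weighted average Pearson correlation
    between the combined metric and the preference scores.
    metric_values: dict mapping metric name -> list of per-group arrays
    scores: list of per-group mean preference-score arrays
    group_weights: list of N_i values (observers x light sources)"""
    best_r, best_pair, best_c1 = -np.inf, None, None
    for m1, m2 in combinations(metric_values, 2):
        for c1 in np.round(np.linspace(0.0, 1.0, 101), 2):
            rs = []
            for v1, v2, y in zip(metric_values[m1], metric_values[m2], scores):
                combo = c1 * np.asarray(v1, float) + (1 - c1) * np.asarray(v2, float)
                rs.append(np.corrcoef(combo, y)[0, 1])   # per-group r_i
            r_bar = np.average(rs, weights=group_weights)
            if r_bar > best_r:
                best_r, best_pair, best_c1 = r_bar, (m1, m2), c1
    return best_r, best_pair, best_c1

# Toy illustration: three hypothetical metric columns and one group.
metric_values = {
    "Qa":  [[80, 85, 90]],
    "CDI": [[60, 75, 70]],
    "GAI": [[55, 65, 60]],
}
scores = [[3.5, 4.6, 4.2]]
r_bar, pair, c1 = fit_mcpi_weights(metric_values, scores, [60])
```

With 25 candidate metrics this search stays cheap (a few hundred pairs times 101 weight steps), and the exhaustive grid avoids any local-optimum issues that a gradient-based fit might have.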

$$MCPI = 0.62{Q_a} + 0.38CDI$$

The advantage of the proposed MCPI metric was further validated using the comparison of correlated correlation coefficients described by Meng et al. [59]. According to this analysis, the performance of the MCPI is significantly better (p < 0.00001) than that of any of the other metrics involved in this study. To provide more detail, the performance of the MCPI and the four best metrics discussed earlier (i.e. GAI-CRI, CQI, MCRI and Qp) is compared in Fig. 4, where it can be seen that the proposed MCPI outperformed the other four measures, especially for the cases that used constant CCT in rooms and the cases that used multiple values of CCT.
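
As a rough illustration of this comparison method, the test for two dependent correlations that share a criterion variable can be sketched as below. The formula follows Meng, Rosenthal and Rubin (1992); the input numbers are purely illustrative and are not the statistics actually computed in this study.

```python
import math

def meng_z_test(r1, r2, rx, n):
    """Meng-Rosenthal-Rubin (1992) test comparing two dependent
    correlations r1 = r(y, m1) and r2 = r(y, m2) that share the
    criterion y, where rx = r(m1, m2) and n is the number of
    paired observations. Returns (z, two-tailed p)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)            # Fisher transforms
    rbar2 = (r1 ** 2 + r2 ** 2) / 2.0
    f = min((1.0 - rx) / (2.0 * (1.0 - rbar2)), 1.0)   # f is capped at 1
    h = (1.0 - f * rbar2) / (1.0 - rbar2)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - rx) * h))
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-tailed p-value
    return z, p

# Illustrative numbers only: r1 = 0.81 for one metric, r2 = 0.73 for
# another, predictor intercorrelation 0.8, n = 1000 paired observations.
z, p = meng_z_test(0.81, 0.73, 0.8, 1000)
```

The test accounts for the fact that both metrics are correlated with the same preference scores, which a naive independent-samples comparison of two r values would ignore.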


Fig. 4. Pearson correlation coefficient for the comparison between MCPI and the four best other metrics, GAI-CRI, CQI, MCRI and Qp for all datasets.


In addition, the best linear combination involving the CRI, the de facto standard, was calculated: a combination of 0.51CRI + 0.49CDI shows the best performance, with a weighted average Pearson r value of 0.76. Furthermore, if both CRI and GAI are used, the best weighting scheme is 0.42CRI + 0.58GAI, with a weighted average Pearson r value also equal to 0.76. Note that this weighting differs from the simple average of GAI and CRI values used previously [1,12,47].

It should be noted that the proposed MCPI is again a combination of an absolute gamut-based metric (CDI) and a fidelity-based metric (Qa), which further consolidates this concept. An investigation of the top 100 combinations of metrics according to Eqs. (5) and (6) found that 85 are composed of an absolute gamut metric and a fidelity metric; moreover, all of the 49 combined metrics with the highest weighted average correlations fall into this category. We therefore emphasize again that a balance of colour gamut and colour fidelity promotes colour preference.

5. Conclusions

This work is based on 39 groups of visual colour preference data acquired from 19 published studies, and the performance of 25 colour quality metrics was calculated for each group. The results indicate that, for most of the test metrics, performance was greatly affected by the experimental settings of the individual studies, especially by the colorimetric properties of the test light sources. This suggests that single-case studies might be limited in their ability to provide generally applicable results, and highlights the advantage of the meta-analysis approach. According to the results of this study, combined metrics, especially those that combine an absolute colour gamut-based metric with a colour fidelity-based metric, are more suitable for assessing the colour preference of lighting. Amongst the current metrics, GAI-CRI and CQI exhibited the most robust performance. Qp and MCRI were also found to perform well in visual experiments where the light sources were of similar CCT, but when CCT was a variable their performance was poor. A novel metric named MCPI is proposed, derived by fitting the large psychophysical dataset and optimizing the parameters of a linear equation. The increased size of the visual dataset, together with the more comprehensive analysis, underpins this result.

As in many similar analyses [1,12,46,47], the experimental variables amongst the different cases were not strictly controlled. In the above discussion, the influence of CCT, chromaticity, viewing environment and colour rendition properties (e.g. colour gamut and colour fidelity) was analyzed at different levels. The impact of further variables, for example illuminance level and the time course of chromatic adaptation, has not been discussed. Strict control of all variables has theoretical advantages in revealing potential “causal relationships” in this field of research. It must be recognized, however, that conclusions drawn from such rigorous studies under specifically-defined conditions might not be applicable to other cases. For example, by strictly controlling the illuminance level in all experiments, this parameter could essentially be eliminated from the discussion, but it would then be necessary to conduct all the experiments at a number of fixed illuminance levels to make the resulting metric truly universal.

Thus, in this study the available data (see Supplement 1, Table S1) have been taken at face value and used to derive a linear combination of two metrics that has wide application. Despite this apparent limitation, the psychophysical studies collected for this study cover a wide range of experimental parameters, leading to the recommendation that the MCPI (Colour Preference Index based on Meta-analysis) be considered for further investigation and potential application.

Finally, it should be noted that, although a very large psychophysical dataset has been assembled as part of this work, the size of the dataset is still a limiting factor when taking into consideration the large number of variations amongst different scenes and their associated light sources. It should also be noted that, although the proposed MCPI outperformed other metrics in explaining the visual data, strictly speaking such accuracy can only be regarded as fitting accuracy. Thus, in future work, carefully designed and appropriately focused visual experiments are needed to further validate our results.

Funding

Young Talent Project of Wuhan City of China (2016070204010111); National Natural Science Foundation of China (61505149).

Acknowledgments

The authors would like to thank Peter Bodrogi, Sophie Jost-Boissard, Minchen Wei, Michael Royer, Fuzheng Zhang and Qing Wang for sharing their research data.

Disclosures

The authors declare no conflicts of interest.

Supplemental document

See Supplement 1 for supporting content.

References

1. K. Smet, W. R. Ryckaert, M. R. Pointer, G. Deconinck, and P. Hanselaer, “Correlation between color quality metric predictions and visual appreciation of light sources,” Opt. Express 19(9), 8151–8166 (2011). [CrossRef]  

2. T. Q. Khanh, P. Bodrogi, Q. Vinh, and D. Stojanovic, “Colour preference, naturalness, vividness and colour quality metrics, Part 1: Experiments in a room,” Lighting Res. Technol. 49(6), 697–713 (2017). [CrossRef]  

3. W. Chen, Z. Huang, Q. Liu, M. R. Pointer, Y. Liu, and H. Gong, “Evaluating the color preference of lighting: the light booth matters,” Opt. Express 28(10), 14874–14883 (2020). [CrossRef]  

4. K. Smet, W. R. Ryckaert, M. R. Pointer, G. Deconinck, and P. Hanselaer, “Memory colours and colour quality evaluation of conventional and solid-state lamps,” Opt. Express 18(25), 26229–26244 (2010). [CrossRef]  

5. Z. Huang, Q. Liu, S. Westland, M. R. Pointer, M. R. Luo, and K. Xiao, “Light dominates colour preference when correlated colour temperature differs,” Lighting Res. Technol. 50(7), 995–1012 (2018). [CrossRef]  

6. Z. Huang, Q. Liu, Y. Liu, M. R. Pointer, M. R. Luo, Q. Wang, and B. Wu, “Best lighting for jeans, Part 1: Optimizing colour preference and colour discrimination with multiple correlated colour temperatures,” Lighting Res. Technol. 51(8), 1208–1223 (2019). [CrossRef]  

7. J. He, Y. Lin, T. Yano, H. Noguchi, S. Yamaguchi, and Y. Matsubayashi, “Preference for appearance of Chinese complexion under different lighting,” Lighting Res. Technol. 49(2), 228–242 (2017). [CrossRef]  

8. Z. Huang, Q. Liu, Y. Liu, M. R. Pointer, A. Liu, P. Bodrogi, and T. Q. Khanh, “Gender difference in colour preference of lighting: a pilot study,” Light Eng. 28(04-2020), 111–122 (2020). [CrossRef]  

9. Y. Lin, M. Wei, K. Smet, A. Tsukitani, P. Bodrogi, and T. Q. Khanh, “Colour preference varies with lighting application,” Lighting Res. Technol. 49(3), 316–328 (2017). [CrossRef]  

10. A. Liu, A. Tuzikas, A. Zukauskas, R. Vaicekauskas, P. I. Vitta, and M. Shur, “Cultural preferences to color quality of illumination of different artwork objects revealed by a color rendition engine,” IEEE Photonics J. 5(4), 6801010 (2013). [CrossRef]  

11. Q. Zhai, M. R. Luo, and X. Liu, “The impact of illuminance and colour temperature on viewing fine art paintings under LED lighting,” Lighting Res. Technol. 47(7), 795–809 (2015). [CrossRef]  

12. Q. Liu, Z. Huang, K. Xiao, M. R. Pointer, S. Westland, and M. R. Luo, “Gamut Volume Index: a color preference metric based on meta-analysis and optimized colour samples,” Opt. Express 25(14), 16378–16391 (2017). [CrossRef]  

13. W. Davis and Y. Ohno, “Color quality scale,” Opt. Eng. 49(3), 033602 (2010). [CrossRef]  

14. A. David, P. T. Fini, K. W. Houser, Y. Ohno, M. Royer, K. Smet, M. Wei, and L. Whitehead, “Development of the IES method for evaluating the color rendition of light sources,” Opt. Express 23(12), 15888–15906 (2015). [CrossRef]  

15. M. S. Rea and J. P. Freyssinier-Nova, “Color rendering: A tale of two metrics,” Color Res. Appl. 33(3), 192–202 (2008). [CrossRef]  

16. R. Dangol, M. Islam, M. H. LiSc, P. Bhusal, M. Puolakka, and L. Halonen, “Subjective preferences and colour quality metrics of LED light sources,” Lighting Res. Technol. 45(6), 666–688 (2013). [CrossRef]  

17. International Commission on Illumination. http://cie.co.at/research-strategy.

18. Z. Huang, Q. Liu, M. R. Pointer, W. Chen, Y. Liu, and Y. Wang, “Color quality evaluation of Chinese bronzeware in typical museum lighting,” J. Opt. Soc. Am. A 37(4), A170–A180 (2020). [CrossRef]  

19. R. Lasauskaite Schüpbach, M. Reisinger, and B. Schrader, “Influence of lighting conditions on the appearance of typical interior materials,” Color Res. Appl. 40(1), 50–61 (2015). [CrossRef]  

20. Q. Liu, Z. Huang, B. Wu, Y. Liu, H. Lin, and W. Wang, “Evaluating Colour Quality of Lighting: Why Meta-analysis Is Needed,” in 15th China International Forum on Solid State Lighting (2018), pp. 1–4.

21. F. L. Schmidt and J. E. Hunter, Methods of meta-analysis: Correcting error and bias in research findings (Sage publications, 2014).

22. K. Smet and P. Hanselaer, “Impact of cross-regional differences on color rendition evaluation of white light sources,” Opt. Express 23(23), 30216–30226 (2015). [CrossRef]  

23. D. Nickerson and C. W. Jerome, “Color rendering of light sources: CIE method of specification and its application,” Illum. Eng. 12(1-2), 7 (2016). [CrossRef]  

24. M. R. Luo, “The quality of light sources,” Color. Technol. 127(2), 75–87 (2011). [CrossRef]  

25. K. Smet, J. Schanda, L. Whitehead, and M. R. Luo, “CRI2013: A proposal for updating the CIE colour rendering index,” Lighting Res. Technol. 45(6), 689–709 (2013). [CrossRef]  

26. H. Yaguchi, A. David, T. Fuchida, K. Hashimoto, G. Heidel, W. Jordan, S. Jost-Boissard, S. Kobayashi, T. Kotani, M. R. Luo, Y. Mizokami, Y. Ohno, P. Pardo, K. Richter, K. Smet, K. Teunissen, A. Tsukitani, M. Wei, L. Whitehead, and T. Yano, “CIE 224:2017: CIE 2017 Colour Fidelity Index for accurate scientific use,” Color Res. Appl. 42(5), 590 (2017). [CrossRef]  

27. W. Thornton, “A validation of the color-preference index,” J. Illum. Eng. Soc. 4(1), 48–52 (1974). [CrossRef]  

28. T. Q. Khanh, P. Bodrogi, Q. Vinh, and D. Stojanovic, “Colour preference, naturalness, vividness and colour quality metrics, Part 2: Experiments in a viewing booth and analysis of the combined dataset,” Lighting Res. Technol. 49(6), 714–726 (2017). [CrossRef]  

29. J. P. Freyssinier and M. Rea, “A two-metric proposal to specify the color-rendering properties of light sources for retail lighting,” Proc. SPIE 7784, 77840V (2010). [CrossRef]  

30. W. A. Thornton, “Color-discrimination index,” J. Opt. Soc. Am. 62(2), 191–194 (1972). [CrossRef]  

31. S. Fotios and G. Levermore, “Perception of electric light sources of different colour properties,” Lighting Res. Technol. 29(3), 161–171 (1997). [CrossRef]  

32. K. Hashimoto, T. Yano, M. Shimizu, and Y. Nayatani, “New method for specifying color-rendering properties of light sources based on feeling of contrast,” Color Res. Appl. 32(5), 361–371 (2007). [CrossRef]  

33. A. Kevin, D. Geert, and H. Peter, “Chromaticity of unique white in object mode,” Opt. Express 22(21), 25830–25841 (2014). [CrossRef]  

34. Q. Wang, H. Xu, and J. Cai, “Chromaticity of white sensation for LED lighting,” Chin. Opt. Lett. 13(7), 073301 (2015). [CrossRef]  

35. M. Rea and J. Freyssinier, “White lighting: A provisional model for predicting perceived tint in “white” illumination,” Color Res. Appl. 39(5), 466–479 (2014). [CrossRef]  

36. I. Acosta, J. León, and P. Bustamante, “Daylight spectrum Index: A new metric to assess the affinity of light sources with daylighting,” Energies 11(10), 2545 (2018). [CrossRef]  

37. Y. Ohno, “Practical Use and Calculation of CCT and Duv,” Leukos 10(1), 47–55 (2014). [CrossRef]  

38. K. W. Houser, M. Wei, A. David, M. R. Krames, and X. S. Shen, “Review of measures for light-source color rendition and considerations for a two-measure system for characterizing color rendition,” Opt. Express 21(8), 10393–10411 (2013). [CrossRef]  

39. E. E. Dikel, G. J. Burns, J. A. Veitch, S. Mancini, and G. R. Newsham, “Preferred chromaticity of color-tunable LED lighting,” Leukos 10(2), 101–115 (2014). [CrossRef]  

40. Y. Ohno and M. Fein, “Vision experiment on acceptable and preferred white light chromaticity for lighting,” in Proceedings of the CIE 2014 Lighting Quality and Energy Efficiency (2014), pp. 192–199.

41. Y. Ohno and S. Oh, “Vision experiment II on white light chromaticity for lighting,” in Proceedings of the CIE 2016 Lighting Quality and Energy Efficiency (2016), pp. 175–184.

42. X. Feng, W. Xu, Q. Han, and S. Zhang, “LED light with enhanced color saturation and improved white light perception,” Opt. Express 24(1), 573–585 (2016). [CrossRef]  

43. Y. Liu, Q. Liu, Z. Huang, M. R. Pointer, L. Rao, and Z. Hou, “Optimising colour preference and colour discrimination for jeans under 5500 K light sources with different Duv values,” Optik 208, 163916 (2020). [CrossRef]  

44. S. Jost-Boissard, M. Fontoynont, and J. Blanc-Gonnet, “Perceived lighting quality of LED sources for the presentation of fruit and vegetables,” J. Mod. Opt. 56(13), 1420–1432 (2009). [CrossRef]  

45. Z. Huang, Q. Liu, M. R. Pointer, M. R. Luo, B. Wu, and A. Liu, “White lighting and colour preference, part A: correlation analysis and metrics validation based on four groups of psychophysical studies,” Lighting Res. Technol. 52(1), 5–22 (2020). [CrossRef]  

46. Z. Huang, Q. Liu, M. R. Luo, M. R. Pointer, B. Wu, and A. Liu, “The whiteness of lighting and colour preference, Part 2: A meta-analysis of psychophysical data,” Lighting Res. Technol. 52(1), 23–35 (2020). [CrossRef]  

47. S. Jost-Boissard, P. Avouac, and M. Fontoynont, “Assessing the colour quality of LED sources: Naturalness, attractiveness, colourfulness and colour difference,” Lighting Res. Technol. 47(7), 769–794 (2015). [CrossRef]  

48. Q. Wang, H. Xu, F. Zhang, and Z. Wang, “Influence of color temperature on comfort and preference for LED indoor lighting,” Optik 129, 21–29 (2017). [CrossRef]  

49. M. Wei, K. W. Houser, G. R. Allen, and W. W. Beers, “Color Preference under LEDs with Diminished Yellow Emission,” Leukos 10(3), 119–131 (2014). [CrossRef]  

50. M. Royer, A. Wilkerson, M. Wei, K. Houser, and R. Davis, “Human perceptions of colour rendition vary with average fidelity, average gamut, and gamut shape,” Lighting Res. Technol. 49(8), 966–991 (2017). [CrossRef]  

51. T. Q. Khanh, P. Bodrogi, Q. T. Vinh, X. Guo, and T. T. Anh, “Colour preference, naturalness, vividness and colour quality metrics, Part 4: Experiments with still life arrangements at different correlated colour temperatures,” Lighting Res. Technol. 50(6), 862–879 (2018). [CrossRef]  

52. W. Chen, Z. Huang, L. Rao, Z. Hou, and Q. Liu, “Research on Colour Visual Preference of Light Source for Black and White Objects,” LNEE 600, 43–50 (2020). [CrossRef]  

53. Z. Huang, Q. Liu, M. R. Luo, M. R. Pointer, Y. Liu, Y. Wang, and X. Wu, “Whiteness and preference perception of white light sources at 5500 K with positive and negative Duv values,” Optik, submitted for publication.

54. M. Wei and K. W. Houser, “Systematic Changes in Gamut Size Affect Color Preference,” Leukos 13(1), 23–32 (2017). [CrossRef]  

55. T. Esposito and K. Houser, “Models of colour quality over a wide range of spectral power distributions,” Lighting Res. Technol. 51(3), 331–352 (2019). [CrossRef]  

56. F. Zhang, H. Xu, and H. Feng, “Toward a unified model for predicting color quality of light sources,” Appl. Opt. 56(29), 8186–8195 (2017). [CrossRef]  

57. M. Royer, A. Wilkerson, and M. Wei, “Human perceptions of colour rendition at different chromaticities,” Lighting Res. Technol. 50(7), 965–994 (2018). [CrossRef]  

58. M. Royer, M. Wei, A. Wilkerson, and S. Safranek, “Experimental validation of color rendition specification criteria based on ANSI/IES TM-30-18,” Lighting Res. Technol. 52(3), 323–349 (2020). [CrossRef]  

59. A. P. Field, Discovering Statistics Using SPSS (SAGE Publications, 2009).

60. X.-L. Meng, R. Rosenthal, and D. Rubin, “Comparing Correlated Correlation Coefficients,” Psychol. Bull. 111(1), 172–175 (1992). [CrossRef]

61. D. B. Judd, “A Flattery Index for Artificial Illuminants,” Illum. Eng. 62, 593–598 (1967).

62. C. Bartleson, “Memory Colors of Familiar Objects,” J. Opt. Soc. Am. 50(1), 73–77 (1960). [CrossRef]  

63. J. Pérez-Carpinell, D. de Fez, R. Baldov, and J. Soriano, “Familiar objects and memory color,” Color Res. Appl. 23(6), 416–427 (1998). [CrossRef]  

64. K. Smet, W. Ryckaert, M. R. Pointer, G. Deconinck, and P. Hanselaer, “Colour Appearance Rating of Familiar Real Objects,” Color Res. Appl. 36(3), 192–200 (2011). [CrossRef]  

Supplementary Material (1)

Supplement 1: Description of collected psychophysical data and detailed results of correlation comparison



Figures (4)

Fig. 1. Pearson correlation coefficients between predictions of 25 test metrics and average observer colour preference ratings in each group of visual experiments. C and M respectively denote constant and multiple CCTs; B and R respectively denote light booth and room.

Tables (1)


Table 1. Weighted average Pearson correlation coefficients between metric predictions and the preference scores of individual studies

Equations (7)


$$\bar{r} = \sum_{i = 1}^{K} N_i r_i \Big/ \sum_{i = 1}^{K} N_i$$
$$\sigma_\rho^2 = \sigma_r^2 - \sigma_e^2$$
$$\sigma_r^2 = \sum_{i = 1}^{K} \left[ N_i (r_i - \bar{r})^2 \right] \Big/ \sum_{i = 1}^{K} N_i$$
$$\sigma_e^2 = (1 - \bar{r}^2)^2 / (\bar{N} - 1)$$
$$MCPI = c_1 M_1 + c_2 M_2$$
$$c_1 \in [0,1],\quad c_2 \in [0,1],\quad c_1 + c_2 = 1$$
$$MCPI = 0.62\,Q_a + 0.38\,CDI$$