Optica Publishing Group

Optical characterization of the psychophysical surface gloss space in the presence of surface haze

Open Access

Abstract

A unanimous framework for surface gloss perception does not exist yet, although an orthogonal two-dimensional gloss space (distinctness-of-image and contrast gloss) has already been proposed. Haze, a rarely studied attribute of glossiness, is considered in this study and introduced into this existing space. A set of samples with surface haziness is psychophysically and optically characterized. The perceived glossiness correlates with the visual dimension of contrast gloss, and with metrics based on “contrast” – between the specular highlight and sample background – obtained from measurements with different optical instruments. The suitability and advantages of the image-based gloss meter (iGM), introduced earlier by the authors [J Coat Technol Res. 19, 1567 (2022)], are thereby illustrated in comparison to the conventional gloss meter standards.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The visual appearance of products and surface finishes tremendously influences our product judgements, quality impressions, and purchase decisions. As a result, manufacturers from various industries have a high interest in predicting and optically characterizing how people will perceive their end products. In 2006, the Commission Internationale de l’Eclairage (CIE) introduced colour, gloss, texture and translucency as the four basic attributes for the evaluation of visual appearance, of which colour and gloss have been receiving the most attention [1]. Three-dimensional colour spaces with attributes of hue, lightness and chroma are widely known. A similar space for gloss is however not available yet, nor is there a consensus on its dimensions [2,3]. The authors of this study presented five attributes for the optical characterization of surface gloss, based on the initial framework introduced by Hunter [4]: specular gloss (perceived brilliance of the specular reflected highlights), DOI (Distinctness Of the reflected Image), haze (semi-specular reflection adjacent to reflected images), contrast or contrast gloss (contrast between specular reflecting areas and adjacent areas) and surface-uniformity gloss – consisting of gloss unevenness (gloss non-uniformities across a surface), surface texture (visible periodic structures on a surface) and orange peel (surface irregularity resembling the skin of an orange) [5]. These attributes are however largely interdependent, and many studies have revealed that two to four attributes are typically sufficient to describe the perceived glossiness in controlled circumstances [2,6]. In 1987, Billmeyer et al. presented a first step towards a psychophysical space for gloss by determining two basic dimensions for the glossiness of painted surfaces, namely DOI (cf. Hunter) for high gloss surfaces and contrast gloss – described as the relative perceived brightness difference between areas with specular reflection and areas relatively far from the specular direction – for low gloss surfaces [7]. In 2001, Ferwerda et al. obtained a gloss space for rendered achromatic spheres with the same two independent dimensions [3,8]. Furthermore, they concluded that contrast gloss was a general gloss dimension which – besides matte surfaces – also influenced the perceived glossiness of mid and high gloss surfaces. Di Cicco et al. also obtained a two-dimensional gloss model using metrics for the blurriness (inverse of sharpness, cf. DOI) and contrast (cf. contrast gloss) for the perceived glossiness of 17th century paintings [9]. Leloup et al. derived a model for contrast gloss (the Contrast Gloss CG formula of Eq. (1), expressed in Contrast Gloss Units CGU) as the difference between the (compressed) highlight luminance ${L_i}$ and the (compressed) background luminance ${L_b}$. They further indicated that CG actually described the perceived brightness of the reflected highlight with luminance ${L_i}$, presented against the background with luminance ${L_b}$ [10]. The formula was initially developed for high gloss (glass) samples having a good perceptual DOI, with back sides painted in different achromatic colours, but their subsequent study revealed that Eq. (1) could generally describe the perceived glossiness of samples with a similar perceptual DOI, as long as the reflected highlight is distinguishable [11].

$$CG = 28\,L_i^{1/3} - 21\,L_b^{1/3}\;\;[CGU] \tag{1}$$
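As an illustration of Eq. (1), the formula can be written as a small function. This is only a sketch: the function name `contrast_gloss` and the input check are assumptions introduced here, and the luminances are assumed to be given in cd/m², the unit used for luminance elsewhere in the text.

```python
def contrast_gloss(highlight_luminance, background_luminance):
    """Contrast gloss CG of Eq. (1), in Contrast Gloss Units (CGU).

    Both luminances are assumed to be in cd/m^2 (an assumption of this sketch).
    """
    if highlight_luminance < 0 or background_luminance < 0:
        raise ValueError("luminances must be non-negative")
    # Cube-root compression of both luminances, then a weighted difference.
    compressed_highlight = highlight_luminance ** (1.0 / 3.0)
    compressed_background = background_luminance ** (1.0 / 3.0)
    return 28.0 * compressed_highlight - 21.0 * compressed_background


if __name__ == "__main__":
    # Hypothetical luminance values, chosen only to show the call.
    print(contrast_gloss(1000.0, 8.0))
```

For a fixed highlight luminance, a brighter background lowers CG, which matches the description of CG as the brightness of the highlight presented against its background.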

Many alternative dimensions and models have been presented in various gloss studies, such as the inclusion of “coverage” (the size of the highlight) and “depth” (perceived depth due to binocular disparity) by Marlow et al. [12]. Wills et al., for example, suggested a two-dimensional space consisting of diffuseness – the perceived brightness of diffusely reflecting areas – and contrast gloss for computer-generated pictures of (blob-shaped) objects [13]. Toscani and colleagues expanded the study of Wills et al. and arrived at three perceptual dimensions for achromatic surface reflection: “lightness” (diffuse reflections), “gloss” (high-contrast, sharp specular highlights) and “metallicity” (broader specular reflections) [14]. Their results suggest that observers could tell apart diffuse and specular reflections, and that “gloss” would be perceived almost independently of “lightness” (weak negative correlation). This contradicts the contrast gloss model.

Here, we concentrate on the first mentioned two-dimensional gloss space (DOI and contrast gloss) and further consider the haze attribute. Reflection haze is applicable to many products, from an unwanted effect to intentional surface haziness. Some practical examples of hazy surfaces are rough base materials with a clear varnish (e.g. metallic vehicle paints), mirroring surfaces with a layer of dirt or grease on top, and partially polished materials [15]. Haze, also referred to as bloom, has various definitions. In the American Society for Testing and Materials (ASTM) Standard E284, haze is described as: (1) “scattering of light at the glossy surface of a specimen responsible for the apparent reduction of contrast of objects viewed by reflection at the surface”; (2) “cloudy appearance attributable to light scattering”; or (3) “percent of reflected light scattered by a specimen having a glossy surface so that its direction deviates more than a specified angle from the direction of specular reflection” [16]. Similar definitions are available from CIE [17]. The first two definitions describe perceptual haziness, while the third definition explains the standardized instrumental evaluation method for surface haze. Reflection haze is mentioned as an important gloss attribute in various studies (e.g. Hunter [4], Billmeyer et al. [7], Wills et al. [13], Ged et al. [18]), but the number of psychophysical studies considering haziness is very limited. Vangorp et al. studied haze and provided evidence that it is a distinct perceptual dimension of surface gloss. They introduced “halo energy” as a new metric for the perceived haziness. It is calculated from the Bidirectional Reflectance Distribution Function (BRDF), which describes the full reflection behavior of a surface, as the remaining energy surrounding a simple BRDF model fitted to the central BRDF peak [15].

The main goal of the current work is to study the gloss space and its optical evaluation for hazy samples. Therefore, a set of glass samples with various levels of haziness, of which the back side is painted black or white, is selected and studied. Following the ASTM definitions of haze, the authors believe that a good perceptual DOI is required before one can consider the haze attribute. This approach is supported by Vangorp et al., who represented haze as a bloom or halo around sharp reflections [15]. Accordingly, all the selected experiment samples have a clear and sharp reflected light source image. A psychophysical experiment based on the method of paired comparison is conducted and psychometric scales are developed for the overall glossiness and its two main perceptual attributes, defined as sharpness (cf. DOI) and brightness (cf. contrast gloss) of the reflected highlight. The interdependency of the obtained attribute scales is investigated in order to evaluate the suitability of the aforementioned gloss space for describing hazy samples. Furthermore, the scales are correlated to optical measurements with various dedicated measurement instruments: an Imaging Luminance Measurement Device (ILMD) – TechnoTeam LMK – for the measurement of CG of Eq. (1); a commercial gloss meter – Rhopoint Instruments IQ [19] – for the evaluation of standardized gloss metrics; and the image-based gloss meter (iGM) [5] – developed by the authors of this study – for the evaluation of their novel gloss metrics. In the end, a framework for the measurement of the gloss space dimensions with the iGM is proposed, and discussed in relation to the existing standardized metrics for gloss evaluation.

2. Methods

2.1 Psychophysical scaling experiment

2.1.1 Experiment setup

The experiment setup consists of the light booth presented in Fig. 1. It contains a long rectangular box with four Philips T8 light sources (1.20 m long, a luminous flux of 3350 lm per light source and a correlated color temperature of 4000 K), and has a 1000 mm long rectangular exit aperture with a width of 5 mm. The internal walls of the box are painted in a matte white finish and a diffuser plate is placed inside the box against the aperture to promote good luminance uniformity. The outside of the box is painted black for maximum contrast between the aperture and its surround. A light baffle blocks the direct visibility of the light source for the observer. The average aperture luminance is about 6600 cd·m⁻², while the average sample illuminance (measured in the direction perpendicular to the sample surface) is approximately 35 lux. The illumination and viewing angles are both 60° with respect to the sample normal direction. The distances between the light source, the sample surface, and the observer chin rest are each approximately 50 cm. For accommodation at 50 cm (on the sample surface) and 100 cm (on the light source), the human eye Depth-of-Field (DoF) is approximately 40 to 70 cm and 70 to 190 cm, respectively. Accordingly, the sample surface and the light source image were not simultaneously in focus, and eye accommodation of the participants was required to collect cues on both focus levels [20]. The experiment samples are mounted in a cartridge with eight sample slots, allowing four comparisons to be made per cartridge. This cartridge is inserted and slides between a base plate and a black top plate with two rectangular windows (of 33 mm x 100 mm), which allows the comparison of one sample pair at a time. A keyboard with five response buttons is available to the observers. During the experiment, the observers are seated and can freely move their head in the neighborhood of a chin rest, which allows them to view the movement of the specular light source image relative to the surface. Observers start each observation with their chin on the chin rest.

Fig. 1. Schematic and picture of the experiment setup. (1. Light source box with diffuser; 2. Light baffle; 3. Presented pair of experiment samples; 4. Observer; 5. Response keyboard)

Fig. 2. Picture of the ten experiment samples imaged on the experiment setup (at the chin rest) of Fig. 1. A high perceptual DOI is observable for each sample.

2.1.2 Samples

Ten experiment samples were selected from a set of custom prepared glass samples (Pilkington Optiwhite and standard float glass), etched and partially polished to different levels of surface glossiness and haziness. The haziness of the sample set, displayed in Fig. 2, ranges from severe haziness (e.g. samples 1-4) to untreated glass without haze (samples 9-10). All samples have a high perceptual DOI, meaning that the reflected rectangular light source image is not deformed and its edges are clearly visible and considerably sharp (see Fig. 2). The back side of the first nine glass samples is painted black, which eliminates the back reflection of light rays transmitted through the bulk material. Sample 10 is however painted white, in order to obtain two untreated samples (samples 9 and 10) without haze and with a highly different contrast. Leloup et al. used such untreated samples to develop the Contrast Gloss formula [10], so their glossiness should be accurately described by Eq. (1). The specular gloss of the samples was measured with a specular gloss meter at 60° and ranges from 65 Gloss Units (GU) to 100 GU.

2.1.3 Experiment procedure

Various psychophysical scaling methods for physical samples are available, such as ranking, paired comparison or rating [21]. Ranking experiments require observers to rank a set of samples according to a criterion, and deliver an ordinal scale – the rank order of the samples. Paired comparison experiments require observers to compare one sample pair at a time, resulting in a more elaborate experiment design and a practical limit to the total number of samples. Yet an interval scale can be obtained, which also incorporates the relative distances between the samples from the probabilities of choosing one sample over the others. In more elaborate forms, such as the Maximum Likelihood Difference Scaling (MLDS) procedure, triads or quadruples are compared, where differences between three or four samples, respectively, are judged in each scene [22]. The experiment design is however considerably complex for physical samples. In rating experiments, observers assign a rating to a presented sample. The method requires many repetitions for the development of a consistent scale.

In this work, interval scales are created for a set of samples using the paired comparison procedure. As such, both the experiment design and the observer tasks remain relatively simple, and various statistics and scale processing methods are available for this method: intra- and inter-observer consistency, scale subtractivity (see further), and standard error (SE) estimates of the scale points. The experiment design used by Leloup et al. was selected [11]. This method combines a two-alternative forced-choice (2AFC) paired comparison – where observers have to choose between two alternatives based on a certain criterion – with the method of Scheffé [23] – which requires observers to rate the magnitude of the difference between the two alternatives. For each displayed pair of stimuli, observers are instructed to evaluate the attributes by indicating whether “the left stimulus is much more”, “the left stimulus is more”, “both stimuli are equal”, “the right stimulus is more”, or “the right stimulus is much more”, using the buttons as displayed in Fig. 1. When indicating that both stimuli are equal, the observer is instructed to still point out one of the two. In this way, 2AFC data is available for each pair of stimuli.

There are two experiment sessions. First, observers are asked to judge the “glossiness”, while the second session comprises a sequential evaluation of the attributes “sharpness” and “brightness”. Observers are not aware of the task of the second session or any notions of brightness or sharpness when performing the first part. This approach aims to avoid confusing or influencing the observer with possible cues for glossiness judgement. No definition of glossiness is given. Instead, the observers’ understanding of glossiness is evaluated with a short ranking task for a set of seven unlabeled Natural Color System (NCS) gloss scale samples (black samples with 60° specular gloss between 2 GU and 95 GU) in a light booth with a linear specular light source. The two attributes judged during the second session were defined according to the two orthogonal dimensions in the gloss space proposed earlier. Sharpness refers to the perceived sharpness or distinctness of the edges of the reflected light source image, akin to DOI. Brightness is defined as the perceived brightness of the reflected rectangular light source, which relates to the CG formula of Leloup et al. [11]. To avoid any bias or misinterpretations, observers were specifically told to compare the sharpness of the edges of the reflected light source image for sharpness, and the brightness of the region inside the rectangular light source image for brightness. Before each session, observers performed a short training in order to get familiar with the experiment procedure and experiment samples. During the sessions, observers judged all 45 possible combinations (sample pairs) of the ten experiment samples in an optimal presentation order designed by Ross [24]. This design ensures that all stimuli are compared to one another (all 45 combinations), that every sample appears as often on the left as on the right side during the experiment (although perfect balancing is not possible for ten samples), and that the time gap between two consecutive presentations of the same sample is maximized. Furthermore, at least three sample pairs lie between pairs involving the same stimulus. Consequently, the experiment could be conducted in groups of four pairs consisting of eight different samples (thus full sample cartridges). In the second session, the observers first judged the sharpness for four pairs (one cartridge), and subsequently the brightness of those pairs. This approach reduces the sample handling effort, while keeping three judgements between the sharpness and brightness observations of identical pairs. Each session ends with a repetition of seven randomly selected sample pairs, used to check the observer repeatability (see next section), influences of the presentation side (left versus right position) and observer fatigue. Between sessions or observers, the sequence of sample pairs and/or their sides (left and right position of each pair) is reversed. The experiment process and data collection are guided by custom algorithms coded in MATLAB 2021.

The total experiment duration is approximately one hour per observer. Both experiment sessions were performed by 16 observers, of which two (authors) were familiar with glossiness and with the background of the experiment. Most participants were undergraduate students, PhD researchers or university staff. Ten participants originated from Europe (six from Belgium), and six participants had a non-European nationality. There were five female participants. All observers had normal or corrected-to-normal visual acuity. Each observer signed an informed consent document prior to their participation, which contained information on the research goals, the experiment activities and the participant’s rights.

2.1.4 Experiment data analysis

All statistics, scale evaluations and modelling are performed in MATLAB 2021. The average observer repeatability is first determined from the seven repeated sample pairs of each observer. It is calculated as the probability that observers make the same sample choice in the repeated pairs as in their corresponding answers during the experiment. Next, the intra- and inter-observer consistency values are determined – from the 2AFC data – via the methods documented by Kendall et al. [25]. The inter-observer consistency is determined with the coefficient of agreement u of Eq. (2), where $\Sigma $ is the sum of the number of agreements between all possible pairs of observers, m the number of observers, and n the number of samples. The corresponding statistical test, with a null hypothesis of randomly judging observers, uses the test value ${\chi ^2}$ from Eq. (3), which follows a Chi-square distribution with $\nu $ degrees of freedom. The inter-observer consistency is considered acceptable if the p-value of this test is smaller than 0.05 (less than a 5% chance that randomly judging observers would reach or exceed the observed number of agreements). The intra-observer consistency $\zeta $ for an observer is calculated using Eq. (4), with d the number of circular triads in the experiment responses and n the number of samples (see [25] for more information). A circular triad occurs when the paired comparison responses result in a circular sample ranking, for example ranking (1;2;3;1). The distribution of the probability that a randomly answering observer exceeds a number of circular triads was deduced by Alway et al. for ten samples [26]. Kendall et al. recommend setting the maximum permitted number of circular triads to a 5% probability that randomly answering observers would not exceed this limit, which corresponds to 22 circular triads and a minimum coefficient of consistency of 0.45. An observer with many circular triads might have misunderstood the task, changed their interpretation of the task, or suffered from regular wandering of attention. A generally low observer consistency would however indicate that sample ranking on a linear scale, or consistent sample differentiation, is impossible.

$$u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1 \tag{2}$$
$$\chi^2 = \frac{4}{m - 2}\left( \Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m - 3}{m - 2} \right), \qquad \nu = \binom{n}{2}\frac{m(m - 1)}{(m - 2)^2} \tag{3}$$
$$\zeta = 1 - \frac{24d}{n^3 - 4n} \tag{4}$$
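The consistency statistics of Eqs. (2) to (4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code. It assumes each observer's 2AFC data is stored as an n × n matrix `pref` with `pref[i][j] = 1` when sample i was chosen over sample j; that data layout and all function names are assumptions made here. The branch for an odd number of samples comes from Kendall's general definition and is not stated in the text, which only gives the even-n form of Eq. (4).

```python
from itertools import combinations
from math import comb


def agreement_statistics(preferences):
    """Coefficient of agreement u (Eq. 2) with chi-square and nu (Eq. 3).

    preferences: list of m matrices (one per observer, m > 2), where
    pref[i][j] == 1 if that observer chose sample i over sample j.
    """
    m = len(preferences)
    n = len(preferences[0])
    sigma = 0  # number of agreeing observer pairs, summed over all choices
    for i in range(n):
        for j in range(n):
            if i != j:
                votes = sum(obs[i][j] for obs in preferences)
                sigma += comb(votes, 2)
    pairs = comb(m, 2) * comb(n, 2)
    u = 2 * sigma / pairs - 1
    chi2 = 4 / (m - 2) * (sigma - 0.5 * pairs * (m - 3) / (m - 2))
    nu = comb(n, 2) * m * (m - 1) / (m - 2) ** 2
    return u, chi2, nu


def count_circular_triads(pref):
    """Count triples of samples in which every sample wins exactly once."""
    d = 0
    for a, b, c in combinations(range(len(pref)), 3):
        wins = (pref[a][b] + pref[a][c],
                pref[b][a] + pref[b][c],
                pref[c][a] + pref[c][b])
        if wins == (1, 1, 1):
            d += 1
    return d


def coefficient_of_consistency(d, n):
    """Eq. (4) for even n; the odd-n denominator n^3 - n is an addition."""
    denominator = n ** 3 - 4 * n if n % 2 == 0 else n ** 3 - n
    return 1 - 24 * d / denominator


if __name__ == "__main__":
    # The limit quoted in the text: 22 circular triads for ten samples.
    print(round(coefficient_of_consistency(22, 10), 2))
```

Counting triads directly over all triples keeps the sketch easy to check by hand; for ten samples this is only 120 triples per observer.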

The psychometric scales are derived using multiple processing methods, in order to evaluate the variability of the scales when using the observer ratings or 2AFC data: a method proposed by Scheffé [23], a Generalized Linear Model (GLM) method proposed by Rajae-Joordens et al. [27], and an MLDS method included in the MATLAB “Palamedes” toolbox from Kingdom et al. [21].

Scheffé presented an experiment and processing method that combines paired comparison with an additional rating of the difference between sample pairs. The collected data of the 45 sample comparisons of each observer is first processed into the following ratings: (2) left much more, (1) left more, (0) equal, (-1) right more, (-2) right much more. The estimated scale values are then derived from these scores. From the error sum of squares, Scheffé derives a yardstick Y0.05, which represents a standard error of the sample estimates (see Eqs. (4.12) and (4.25) from [23]). Two samples are considered significantly different at a 95% confidence level if their scale estimates differ by at least the value of Y0.05. Scheffé further documents some additional statistical tests. The first one verifies the presence of significant order effects in the data (a difference between sample presentation at the left or right side). Next, subtractivity can be investigated, which indicates whether the obtained scale estimates are a good representation of the perceived differences between samples. The authors refer to the work of Scheffé [23] for details on the described methods.
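The conversion from button responses to ratings, and a simplified scale estimate built from them, could look as follows. This sketch is not Scheffé's full analysis: it averages the signed ratings per sample, assumes each pair is judged at least once, and omits the yardstick, the order-effect test and the subtractivity test. The shortened response labels, the function name and the normalisation by the number of samples are assumptions introduced for illustration.

```python
# Ratings for the five responses, as listed in the text (labels shortened).
RATING = {
    "left much more": 2,
    "left more": 1,
    "equal": 0,
    "right more": -1,
    "right much more": -2,
}


def scheffe_like_scale(judgements, n_samples):
    """Simplified scale estimate from rated paired comparisons.

    judgements: iterable of (left_index, right_index, response_label).
    Returns one value per sample; higher means "more" of the attribute.
    """
    total = [[0.0] * n_samples for _ in range(n_samples)]
    count = [[0] * n_samples for _ in range(n_samples)]
    for left, right, label in judgements:
        rating = RATING[label]
        # A positive rating favours the left sample, a negative one the right.
        total[left][right] += rating
        total[right][left] -= rating
        count[left][right] += 1
        count[right][left] += 1
    scale = []
    for i in range(n_samples):
        mean_prefs = [total[i][j] / count[i][j]
                      for j in range(n_samples) if count[i][j]]
        # Average preference of sample i over all samples (assumed divisor).
        scale.append(sum(mean_prefs) / n_samples)
    return scale


if __name__ == "__main__":
    # Hypothetical judgements for three samples.
    data = [(0, 1, "left much more"), (1, 2, "left more"),
            (2, 0, "right much more")]
    print(scheffe_like_scale(data, 3))
```

Because every rating is added to one sample and subtracted from the other, the resulting scale values sum to zero when every pair is judged equally often.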

The method from Rajae-Joordens et al. is based on the well-known method from Thurstone [28]. In essence, Thurstone assumes a normal distribution of the observer probabilities in a pair of samples to obtain the difference in scale estimates for those samples. A final scale is obtained by minimizing the error over all pairs. Rajae-Joordens et al. implement this method in a GLM format; all details can be found in [27]. Accordingly, a regression matrix with 90 equations – one for each possible sample pair (45 pairs a vs. b, and 45 pairs b vs. a) – is created in our study that assigns the difference between the (unknown) scale estimates of each pair to the calculated sample difference from Thurstone. The method also accounts for the number of observations by assigning weights – the total number of judgements for that pair – to each equation. Pairs with perfect agreement between all observers (probabilities of 0% and 100%, the so-called Hauck-Donner effect) are adjusted with the correction for extreme fractions proposed by Rajae-Joordens et al. The model’s solution, via the “fitglm” function in MATLAB, comprises the scale estimates with their standard error values.
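The Thurstone idea underlying this method can be illustrated with the classical unweighted Case V solution. This is not the weighted GLM of Rajae-Joordens et al. and it returns no standard errors. It assumes every pair was judged at least once. The `clip` parameter is a simple stand-in for the correction for extreme fractions, whose actual form is not given in the text; the function name and data layout are likewise assumptions.

```python
from statistics import NormalDist


def thurstone_case_v(wins, clip=0.02):
    """Unweighted Thurstone Case V scale from a matrix of choice counts.

    wins[i][j]: number of judgements in which sample i was chosen over j.
    clip: proportions are limited to [clip, 1 - clip] so that unanimous
          pairs do not give infinite z-scores (placeholder correction).
    """
    n = len(wins)
    inverse_normal = NormalDist().inv_cdf
    scale = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            trials = wins[i][j] + wins[j][i]
            proportion = wins[i][j] / trials
            proportion = min(max(proportion, clip), 1.0 - clip)
            # z-score of the choice proportion estimates s_i - s_j.
            z_sum += inverse_normal(proportion)
        # Least-squares solution for a complete design: row mean of z,
        # counting the diagonal as zero.
        scale.append(z_sum / n)
    return scale


if __name__ == "__main__":
    # Hypothetical counts for three samples judged by 16 observers.
    counts = [[0, 12, 14], [4, 0, 12], [2, 4, 0]]
    print(thurstone_case_v(counts))
```

The scale is defined up to an additive constant; this solution centres it at zero.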

Finally, the observers’ 2AFC data is fed to the MLDS processing function – “PAL_MLDS_Fit” – from the MATLAB toolbox Palamedes (Kingdom et al. [21]). The applied default algorithm (by Maloney and Yang) creates the scales via a similar GLM optimization method. The corresponding standard errors are estimated in a subsequent Bootstrap analysis – “PAL_MLDS_Bootstrap”.

2.2 Optical evaluation instruments and methods

2.2.1 Imaging luminance measurement device

Luminance images of all experiment samples are captured with an ILMD (type TechnoTeam LMK 5 with 12 mm effective focal length lens), placed at the observer eye position in the experiment setup. By way of example, the luminance image of sample 4 is presented in Fig. 3. Both the sample surface and the reflected light source image are approximately within the DoF of the ILMD. The luminance of the reflected light source image Li and the background sample luminance Lb are extracted from the images in order to evaluate the contrast gloss via Eq. (1). Leloup et al. mentioned that in the case of samples with a perceptual DOI without any deformations in the reflected light source image, Li and Lb can be averaged over the reflected light source region (highlight), and over (arbitrary) regions above or below the specular highlight, respectively. For samples with a reduced DOI and/or deformed image however, Li was quantified as the average luminance of all image pixels above a threshold of 80% of the maximum luminance, while the measurement of Lb was performed at lower reflection angles (closer to the normal direction) in order to remove the influence of the scattered light in the proximity of the reflected highlight. This approach produced CG values that showed good correspondence with visual gloss judgements for sample groups with a similar perceptual DOI [11]. All samples in the current study have a perceptual DOI without deformations. Accordingly, Li is averaged over the highlight. Due to the haziness of the samples, Lb will largely depend on the selected measurement zone. 
In order to study the influence of the measurement zone for Lb on the correlation between CG and the perceived glossiness, Lb is determined in rectangular areas centered around 0.6°, 1.5° and 2.4° off-specular angle on both sides of the highlight (denoted as “zone 1”, “zone 2” and “zone 3”, resp.), and is further also measured at the 0° angle (perpendicular to the surface) where there is no influence of the sample haziness (cf. Leloup et al.). The evaluation is performed at three adjacent horizontal positions on the sample (left, middle and right), and the mean and standard deviation of the CG over these positions are determined for later analysis. The data is provided in Table S1 (Supplement 1). By way of example, the areas for evaluation of Lb at the three specified off-specular angles (zone 1 to 3), and measurement positions (left, middle and right) for Li and Lb are also indicated in Fig. 3.
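The region-based evaluation described above can be sketched as follows, with the luminance image held as a list of pixel rows. The box format `(top, bottom, left, right)`, the function names and the use of the sample standard deviation are assumptions of this sketch; the text does not say which standard deviation estimator was used. Averaging Lb over the two rectangles of a zone assumes those rectangles have equal size.

```python
from statistics import mean, stdev


def region_mean(image, box):
    """Mean luminance inside box = (top, bottom, left, right), end-exclusive."""
    top, bottom, left, right = box
    return mean(v for row in image[top:bottom] for v in row[left:right])


def contrast_gloss(li, lb):
    """Contrast gloss of Eq. (1) in CGU."""
    return 28.0 * li ** (1.0 / 3.0) - 21.0 * lb ** (1.0 / 3.0)


def cg_over_positions(image, highlight_boxes, background_boxes):
    """Mean and standard deviation of CG over the measurement positions.

    highlight_boxes: one box per position (e.g. left, middle, right).
    background_boxes: per position, a list of boxes on both sides of the
                      highlight that together define one zone for Lb.
    """
    values = []
    for hl_box, bg_boxes in zip(highlight_boxes, background_boxes):
        li = region_mean(image, hl_box)
        lb = mean(region_mean(image, box) for box in bg_boxes)
        values.append(contrast_gloss(li, lb))
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical 6 x 3 luminance image: a bright band on a dark background.
    img = [[8.0] * 3, [8.0] * 3, [1000.0] * 3,
           [1000.0] * 3, [8.0] * 3, [8.0] * 3]
    hl = [(2, 4, c, c + 1) for c in range(3)]
    bg = [[(0, 1, c, c + 1), (5, 6, c, c + 1)] for c in range(3)]
    print(cg_over_positions(img, hl, bg))
```

Repeating the call with the background boxes of zone 1, zone 2, zone 3 and the 0° measurement would give one CG value per choice of Lb, which is the comparison the text sets up.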

Fig. 3. The recorded luminance image of experiment sample 4 with its corresponding evaluation regions for Li (rectangle 1-3) and Lb (zone 1: rectangle 4-6 and 13-15; zone 2: rectangle 7-9 and 16-18; zone 3: rectangle 10-12 and 19-21)

2.2.2 ASTM gloss metrics with the IQ gloss meter (Rhopoint Instruments)

The IQ gloss meter from Rhopoint Instruments is a commercially available instrument that incorporates the same capability as a standard three-angle specular gloss meter (specular gloss measurement at 20°, 60° and 85° incident and reflection angles, further denoted as SGU 20, SGU 60 and SGU 85, respectively) [29,30]. In addition, it contains a linear CMOS sensor at the 20° reflection angle for the evaluation of IQ DOI and IQ haze according to measurement principles defined in ASTM D5767 (version of 2004) and ASTM E430 [31,32]. ASTM DOI is defined by Eq. (5) using the ratio between the average reflected luminous flux in a small off-specular interval ($F_{os,0.3^\circ}$, at 0.3° on both sides of the specular direction) and the average luminous flux in an interval centered around the specular direction, $F_{s,20^\circ}$. ASTM Haze is defined by Eq. (6) as the integrated off-specular luminous flux ($F_{os,2^\circ,sample}$, in regions centered at a 2° angle on both sides of the specular direction) divided by the total reflected luminous flux of a reference sample, $F_{s,20^\circ,ref}$ (a black glass reference sample with a specular gloss of 100 GU).

$$ASTM\;DOI = \left( 1 - \frac{F_{os,0.3^\circ}}{F_{s,20^\circ}} \right) \times 100\;\;[\%] \tag{5}$$
$$ASTM\;Haze = \frac{F_{os,2^\circ,sample}}{F_{s,20^\circ,ref}} \times 100\;\;[\%] \tag{6}$$
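Eqs. (5) and (6) reduce to two flux ratios, sketched here as plain functions. The function and parameter names are assumptions, and the flux values are assumed to be already extracted from the sensor signal; how the instrument integrates them is not reproduced.

```python
def astm_doi(flux_off_specular_03, flux_specular):
    """ASTM DOI of Eq. (5) in percent.

    flux_off_specular_03: average flux at 0.3 deg from the specular direction.
    flux_specular: average flux in the interval around the specular direction.
    """
    return (1.0 - flux_off_specular_03 / flux_specular) * 100.0


def astm_haze(flux_off_specular_2_sample, flux_specular_reference):
    """ASTM Haze of Eq. (6) in percent.

    flux_off_specular_2_sample: integrated flux at 2 deg off-specular.
    flux_specular_reference: total reflected flux of the gloss reference.
    """
    return flux_off_specular_2_sample / flux_specular_reference * 100.0


if __name__ == "__main__":
    # Hypothetical flux readings in arbitrary sensor units.
    print(astm_doi(5.0, 100.0), astm_haze(2.0, 100.0))
```

Note that DOI is normalised by the sample's own specular flux, whereas haze is normalised by the reference sample, so the two metrics are not on a common scale.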

Finally, the relative peak reflectance Rspec is also measured with the linear sensor as the ratio of the peak signals of the evaluated sample and the reference sample. The IQ metrics and their denotations are summarized in Table 1. For all evaluations with the gloss meters in this study, the mean and standard deviation values are determined over six measurement positions of each experiment sample – two positions towards the top, two in the center and two towards the bottom of the shown sample surface – and further used for correlation analysis. The results are provided in Table S2 (Supplement 1) and visualized in Fig. 4.

Fig. 4. Measurement results of (a) the specular gloss and (b) DOI and haze for the experiment sample set (averaged over six evaluation positions), performed with the Rhopoint IQ instrument (all metrics from Table 1). The error bars represent the standard deviation (-σ / +σ) of the measured quantity over the six positions on each sample.

Table 1. Metrics available in the IQ and iGM gloss meters for the gloss attributes specular gloss, DOI, haze and contrast. Novel metrics, introduced in the iGM in former work by the authors, are underlined

2.2.3 Gloss characterization with the iGM

The iGM (image-based gloss meter) was developed by the authors in earlier work [5]. It applies a 5MP 1/2.5” colour CMOS sensor in a 60° parallel-beam specular gloss meter geometry. The device can determine the standard 60° specular gloss. Furthermore, multiple image-based metrics were introduced for the characterization of surface gloss attributes (a summary of the metrics applied in this study is provided in Table 1) and are briefly summarized here. For more detailed information on the iGM, the authors refer interested readers to Refs. [5] and [33]. The iGM specular luminance is measured as the mean signal over a number of highlight pixels within a rectangular highlight mask of the same size as the source aperture image (see Fig. 9(a) from [5]). It is proportional to the luminance of the reflected light source, and can thus be interpreted as an additional quantity related to specular gloss. The steepness of the edge of the reflected light source image (iGM slope sharpness) was proposed as an appropriate metric for the perceptual DOI. It is evaluated as the steepest slope of a normalized “signal line” averaged over the reflected source aperture edges in the measurement picture. Haze (iGM haze) is measured as the integrated luminous flux in arbitrarily chosen off-specular regions on both sides of the specular direction, divided by the total reflected luminous flux from a gloss reference sample. This is similar to the ASTM E430 standard method, yet at a reflection angle of 60° instead of 20°. In addition, a Michelson contrast-based haze metric (iGM MC haze) was introduced. As specified in Eq. (7), it is derived from the Michelson contrast of the luminous flux on both sides of the edges (Nin at the inside and Nout at the outside) of the reflected source aperture image and expressed in haze units (HU). 
A region of 40 pixels at each side of the edges is however ignored to avoid a possible correlation between the iGM MC haze and the iGM slope sharpness (the latter is calculated from the image data in that ignored region). Finally, the instrument contains an additional aspecular light source perpendicular to the surface (0° angle) for contrast evaluation. A first contrast metric (iGM contrast, expressed in Contrast Units CU) was developed from the CIE definition for “psychometric contrast” [34], and is evaluated as the ratio of the iGM specular luminance (with active specular light source) to the mean luminous flux of a central region on the sensor when only the aspecular light source illuminates the sample. A subsequent study on the iGM provided proof that this metric was directly proportional to the psychometric contrast of Eq. (8) with Lhl and Lbg being the luminance of the highlight and background (at large angular distance from the specular direction), resp. It further documented an iGM evaluation method for the CG of Eq.(1) (iGM CG) via luminance measurements with an ILMD [33]. The measurement values obtained for each metric with the iGM (mean values with standard deviation over six positions) are provided in Table S3 (Supplement 1).

$$\textrm{iGM MC haze} = \left( 1 - \frac{N_{in} - N_{out}}{N_{in} + N_{out}} \right) \times 100 \;\; [\textrm{HU}]$$
$$\textrm{iGM contrast} \sim \textrm{Psychometric contrast} = \frac{L_{hl} - L_{bg}}{L_{bg}} \;\; [\textrm{CU}]$$
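
As an illustration of the two expressions above, the following minimal Python sketch evaluates the Michelson-contrast-based haze and the psychometric contrast from scalar inputs. The function and argument names are hypothetical, and the numbers are illustrative values rather than measurement data from this study; the sketch does not reproduce the iGM image processing (region selection, the ignored 40-pixel band) that yields these inputs.

```python
def igm_mc_haze(n_in: float, n_out: float) -> float:
    """Michelson-contrast-based haze in haze units (HU).

    n_in / n_out: luminous flux just inside / outside the edges of the
    reflected source aperture image (hypothetical argument names).
    """
    total = n_in + n_out
    if total <= 0:
        raise ValueError("total flux must be positive")
    michelson = (n_in - n_out) / total
    return (1.0 - michelson) * 100.0


def psychometric_contrast(l_hl: float, l_bg: float) -> float:
    """Psychometric contrast from highlight and background luminance."""
    if l_bg <= 0:
        raise ValueError("background luminance must be positive")
    return (l_hl - l_bg) / l_bg


# Illustrative numbers only, not measurement data from the study.
print(igm_mc_haze(n_in=3.0, n_out=1.0))
print(psychometric_contrast(l_hl=500.0, l_bg=5.0))
```

With this definition, an edge with no flux on the outside (Nout = 0) gives 0 HU, while equal flux on both sides of the edge gives 100 HU.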

2.3 Correlation methods

The correlations between the psychometric scales and instrumental evaluations are analyzed using the Curve Fitting Tool app (version 3.6) in MATLAB. Scatterplots are created for all combinations of scales and/or measurement metrics, and inspected to select an appropriate fitting function for each combination. Each fit is determined in the MATLAB app (“LinearLeastSquares” and “NonlinearLeastSquares” methods), together with its coefficient of determination (R2) and root mean square error (RMSE). Only monotonic fitting functions with at most three fitting coefficients are used, to avoid overfitting. Partial correlations are analyzed with SPSS. They quantify the variance shared uniquely by two variables after the variance explained by other (control) variables has been accounted for. A (two-tailed) statistical test indicates the significance of this shared variance [35].
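
For readers without access to MATLAB, the linear least-squares case can be sketched in plain Python as below. This is not the authors' script: the function names are hypothetical, and the RMSE is computed with a degrees-of-freedom correction (number of points minus number of fitted coefficients), which is an assumption about how the reported RMSE values are defined.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def goodness_of_fit(x, y, a, b, n_coeffs=2):
    """Return (R2, RMSE) for the line y = a*x + b.

    RMSE uses n - n_coeffs residual degrees of freedom (assumed convention).
    """
    mean_y = sum(y) / len(y)
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    rmse = math.sqrt(sse / (len(x) - n_coeffs))
    return r2, rmse


# Illustrative data only (a metric value versus a normalized scale value).
metric = [0.0, 1.0, 2.0, 3.0, 4.0]
scale = [0.05, 0.22, 0.41, 0.58, 0.80]
a, b = linear_fit(metric, scale)
print(a, b, goodness_of_fit(metric, scale, a, b))
```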

3. Results

3.1 Psychophysical experiment

In a first step, the observer repeatability is calculated. The observers are overall very stable in their answers with an average repeatability of 94%, 82%, and 80% for glossiness, brightness and sharpness judgements of repeated pairs, resp. The inter-observer consistency (u: 0.76 for glossiness and 0.77 for brightness and sharpness) leads to a rejection (p < 0.001) of the hypothesis of randomly responding observers (Eq. (3)). Concerning the intra-observer consistency, the limit of 22 circular triads is exceeded for the glossiness judgements of a single observer, whose data are discarded. The aspect of glossiness was probably not clear to this observer, or changed during the experiment session. Overall, a low average number of 1.2, 2.1 and 2.9 circular triads (average coefficient of consistency $\zeta $ of 0.97, 0.95 and 0.93) is obtained for glossiness, brightness and sharpness, resp. The high repeatability and consistency for each attribute indicate that the experiment tasks were clear and executable for the other observers.
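
The link between the number of circular triads and the coefficient of consistency can be sketched as follows, using Kendall's classical expressions for a complete paired-comparison experiment. The function names and the preference-matrix layout are assumptions made for this illustration; the authors' own processing may differ in detail.

```python
def circular_triads(wins):
    """Number of circular triads d in a complete paired comparison.

    wins[i][j] is 1 if item i was preferred over item j, else 0
    (hypothetical layout; the diagonal is ignored and should be 0).
    """
    n = len(wins)
    scores = [sum(row) for row in wins]  # number of wins per item
    return n * (n - 1) * (2 * n - 1) / 12 - sum(s * s for s in scores) / 2


def coefficient_of_consistency(d, n):
    """Kendall's coefficient of consistency for n items and d circular triads."""
    if n % 2 == 0:
        return 1 - 24 * d / (n ** 3 - 4 * n)
    return 1 - 24 * d / (n ** 3 - n)


# Three items judged circularly (A > B, B > C, C > A) form one circular triad.
circular = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
print(circular_triads(circular))

# Ten samples with the average triad counts quoted in the text.
for d in (1.2, 2.1, 2.9):
    print(round(coefficient_of_consistency(d, 10), 2))
```

For ten samples, average counts of 1.2, 2.1 and 2.9 circular triads map onto coefficients of about 0.97, 0.95 and 0.93 under these expressions, which agrees with the values quoted above.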

The statistical results (listed in Table 2) indicate yard stick values which are much smaller than the scale range of each attribute (which equals unity). Consequently, significantly different sample pairs must exist, namely all pairs whose scale estimates differ by more than the yard stick value. Further results in Table 2 indicate a rejection of significant order effects, but also a rejection of the hypothesis of subtractivity. Upon rejection of the subtractivity, Scheffé suggests that differences between resulting scale estimates might not always be a good representation of the perceived difference between the samples. The authors of the current study argue that the rejection can be due to large supra-threshold differences between some consecutive experiment samples. Sample 1 is for example almost never selected in comparisons while samples 9 and 10 are almost always selected. Indeed, the hypothesis of subtractivity is not rejected in an analysis neglecting these three samples, which would indicate that subtractivity holds for at least part of the scale. Furthermore, Scheffé states that even upon rejection of the subtractivity, the obtained scale is still valuable as the best available estimate of the perception.

Table 2. Statistical evaluations of the experiment data

The psychometric scales with their yard stick or SE, according to the aforementioned three processing methods, were derived and are displayed in Fig. S1 (Supplement 1). The data was normalized (to scales from zero to one) by applying a linear rescaling. A good agreement in scale estimates and standard errors between the different processing methods is observed. The minimum coefficient of determination (R2) of 0.97 (for linear fit) occurs between the Scheffé and MLDS scales for glossiness, which indicates a good correlation despite the highly different processing of both methods. The GLM and MLDS optimization methods construct scales relative to assigned references (sample 10 for the GLM, samples 1 and 10 for the MLDS method). As a result, no SE information is available for these samples.

The GLM scales, displayed in Fig. 5, were selected for further analysis in this study, as this processing method has already been successfully applied in similar studies [36,37]. A summary of the GLM scale data is provided in Table S4 (Supplement 1). An analysis of the correlation between the scales is provided in Fig. 6 and indicates an excellent linear correlation between glossiness and brightness, with a high R2 of 0.99. The correlation between glossiness and sharpness is less pronounced, yet still significant with an R2 of 0.89. A partial correlation analysis in SPSS reveals that glossiness still shares a highly significant amount (p < 0.001) of 94% of remaining variance with brightness after accounting for its variance with sharpness (control variable). Sharpness on the other hand shares a non-significant (p = 0.25) amount of only 19% variance with glossiness after accounting for brightness. In addition, the significance of differences in strength between the correlations is determined with a t-statistic from Chen et al. (see [35] section 8.6.2). The correlation of glossiness with brightness is significantly stronger than with sharpness (${t_{difference}} = 6.04$, p > 0.999), confirming the former results.
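
The first-order partial correlation used in this kind of analysis can be sketched in plain Python as below. The sketch relies on the textbook expressions for a single control variable and the associated t-statistic; the function names are hypothetical, and it is not a reproduction of the SPSS procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_t_statistic(r_partial, n, n_controls=1):
    """t-statistic and degrees of freedom for testing a partial correlation."""
    df = n - 2 - n_controls
    return r_partial * math.sqrt(df / (1 - r_partial ** 2)), df


# If the 19% shared variance is read as a squared partial correlation
# obtained from ten samples:
t, df = partial_t_statistic(math.sqrt(0.19), n=10)
print(round(t, 2), df)
```

Under that reading, the statistic is about 1.28 on 7 degrees of freedom, well below the two-tailed 5% critical value of roughly 2.36, which is in line with the non-significant result reported for sharpness.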

Fig. 5. Normalized psychometric scales (versus the sample number) for glossiness, brightness and sharpness derived via GLM optimization. The error bars of each sample estimate represent the SE (-SE / +SE). There is no SE information available for sample 10, as this sample is selected as the scale reference in the GLM optimization.

Fig. 6. The correlations between the experimentally obtained GLM scales for glossiness and its attributes (a) brightness and (b) sharpness are displayed, together with linear fits and corresponding R2 and RMSE values. The error bars represent the SE (-SE / +SE).

3.2 Correlations with ILMD

The ILMD measurements reveal a good linear correlation between the CG of Eq. (1) and the observed glossiness and brightness (R2 value listed in Table 3). CG with Lb at zone 2 – displayed in Fig. 7 – results in the best regression result with the glossiness, although the differences with CG using other zones are minor. However, partial correlations between glossiness and CG_1 (where Lb is the largest due to the haziness), while accounting for CG_0 (Lb at 0°, no haziness), reveal that there is (just) statistically significant remaining variance (p = 0.03). As such, an influence of the off-specular haze signal is observed in the correlation to glossiness. A similar observation is valid for brightness (p = 0.002).

Fig. 7. The perceived glossiness plotted against CG with Lb measured in zone 2. A linear fit with R2 and RMSE statistics is also displayed. The vertical and horizontal error bars indicate the standard error of the perceptual scale and the standard deviation of the CG over the three measurement positions on the luminance image (-σ / +σ), resp.

3.3 Correlations with gloss meters

The R2 values for glossiness, brightness and sharpness with the IQ and iGM metrics (see Table 1) are presented in Table 4. As the glossiness and brightness scales were closely correlated, we discuss them together in the subsequent analysis. Linear correlations are appropriate for the majority of metrics, but the correlations with IQ Rspec, iGM specular luminance, and iGM slope sharpness are logarithmic or power function fits ($a \cdot \log_{10}(x) + b$ or $a \cdot x^{1/3} + b$). The 1/3-exponent power function is used because it fits the data and because this exponent has already been encountered in psychophysical assessments of brightness or contrast through luminance measurements [10].
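
Both non-linear fitting functions are linear in their coefficients a and b once the metric is transformed, so they can be fitted by ordinary least squares on the transformed values. The sketch below illustrates this in plain Python; the function names are hypothetical and the data are synthetic, and the authors' MATLAB fits may have been set up differently.

```python
import math


def fit_line(u, y):
    """Ordinary least-squares fit of y = a*u + b; returns (a, b)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = suy / suu
    return a, mean_y - a * mean_u


def fit_log10(x, y):
    """Fit y = a*log10(x) + b by regressing y on log10(x); x must be positive."""
    return fit_line([math.log10(xi) for xi in x], y)


def fit_cube_root(x, y):
    """Fit y = a*x**(1/3) + b by regressing y on the cube root of x (x >= 0)."""
    return fit_line([xi ** (1.0 / 3.0) for xi in x], y)


# Synthetic data generated from y = 2*log10(x) + 1.
print(fit_log10([1.0, 10.0, 100.0], [1.0, 3.0, 5.0]))
# Synthetic data generated from y = 0.5*x**(1/3).
print(fit_cube_root([0.0, 1.0, 8.0, 27.0], [0.0, 0.5, 1.0, 1.5]))
```

The same transformations can be applied to a metric before it enters a partial correlation analysis, so that a linear relation with the perceptual scale can be assumed.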

The results in Table 4 suggest that all related metrics (besides iGM contrast) have a considerable correlation with the glossiness (and brightness) and sharpness, with minimum R2 values of 0.75 and 0.55, resp. There are however substantial differences in R2 values between the metrics. Of the well-known specular gloss metrics (SGU 20/60/85), IQ SGU 85 correlates best with glossiness (R2 of 0.94). This seems rather counterintuitive, because observers judged the samples at an angle of 60°. Furthermore, the standards for specular gloss advise the 20° and 60° evaluation angles (SGU 20 and SGU 60) for glossy (and hazy) samples [30], but SGU 20 and SGU 60 show an inferior correlation with the perceived glossiness, with ranking inconsistencies (Kendall rank correlation coefficient of 0.82 and 0.78, resp.). Even though the specular gloss might correlate, it most probably cannot explain the root cause of the perception of high gloss samples, as it integrates the reflected luminous flux in a large region around the specular direction while observers have access to more refined cues of the reflected image and haziness. The IQ Rspec and iGM specular luminance metrics show a good correlation with the glossiness scale. This can be expected because the metrics are related to the luminance of the reflected light source Li (ILMD) and thus to CG_noLb from Table 3. The IQ haze and iGM haze, which solely take into account the amount of scattered flux in an off-specular region (ASTM E430), also perform well in describing the sample glossiness. The best correlations for glossiness are obtained with IQ DOI and iGM MC haze (R2 of 0.97, see Fig. 8(a) and Fig. 8(b)). Both metrics are calculated using both the off-specular luminance - in a region close to the specular direction - and the specular luminance, and have an excellent linear intercorrelation (R2 of 0.99). The untreated samples 9 and 10 (labeled on Fig. 8) are however not distinguishable by a specular gloss meter, as it is only slightly influenced by the underlying surface colour layer (white versus black) and merely measures front surface reflection with a collimated light beam. The authors solved this shortcoming in earlier work by introducing the iGM contrast and iGM CG metrics via an aspecular light source [5,33]. Figure 9(a) and Fig. 9(b) display their respective relation with glossiness. A huge difference in iGM contrast between samples 9 and 10 can be observed, because the iGM contrast is (approximately) proportional to the inverse of the luminance Lbg at 0°, which differs by almost two orders of magnitude between the black and white samples. The iGM CG metric differentiates well between samples 9 and 10 and correlates well with the perceived glossiness (see Fig. 9(b)). This metric was developed to measure CG_0 (Table 3) on the iGM [33], which explains the correlation.
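
The ranking inconsistencies mentioned for SGU 20 and SGU 60 are expressed through a Kendall rank correlation coefficient. A minimal sketch of the simplest variant (tau-a, which ignores tie corrections) is given below; which variant the authors applied is not stated, so this choice, the function name and the example data are assumptions made for illustration.

```python
def kendall_tau_a(x, y):
    """Kendall rank correlation (tau-a): (concordant - discordant) / all pairs."""
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Illustrative ranks only: one swapped neighbouring pair among four samples.
perceived = [1, 2, 3, 4]
measured = [1, 3, 2, 4]
print(kendall_tau_a(perceived, measured))
```

With ten samples there are 45 sample pairs, so under this variant coefficients of 0.82 and 0.78 would correspond to four and five pairs, resp., that are ranked in opposite order by the metric and by the observers.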

Table 3. The coefficient of determination (R2) values are listed for linear correlations of the psychometric scales glossiness and brightness with contrast gloss (measured with the ILMD)

Fig. 8. Graphs of the perceived glossiness versus (a) IQ DOI, and versus (b) iGM MC haze. A linear fit with R2 and RMSE values is added to the plots. The vertical and horizontal error bars indicate the standard error of the perceptual scale and the standard deviation over the six measurement positions (-σ / +σ). The untreated samples (sample 9 and 10) are labeled on the graphs.

Fig. 9. Graphs of the perceived glossiness versus (a) iGM contrast, and (b) iGM CG. A linear fit with R2 and RMSE values is added to the second graph. The vertical and horizontal error bars indicate the standard error of the perceptual scale and the standard deviation over the six measurement positions (-σ / +σ).

In order to reveal their relative importance, a partial correlation analysis is performed on the presented metrics. Therefore, the reported non-linear relations between metrics and specific attributes were first linearized (via the transformations indicated in Table 4). When accounting for IQ Rspec or IQ haze, glossiness and IQ DOI share a significant remaining variance (p = 0.01). IQ Rspec or IQ haze however do not share significant variance with glossiness when accounting for IQ DOI (p = 0.78, p = 0.60, resp.). Similarly, the remaining variance between glossiness and iGM MC haze is significant when accounting for iGM specular luminance or iGM haze (p = 0.03 or p < 0.001), while the remaining variance between glossiness and iGM specular luminance or iGM haze is not significant when accounting for iGM MC haze (p = 0.63 or p = 0.33). According to these results, a significantly higher amount of variance is explained by taking into account a ratio (IQ DOI) or Michelson contrast (iGM MC haze) between the specular and off-specular region, opposed to solely considering the specular region (IQ Rspec or iGM specular luminance) or the off-specular region (IQ haze or iGM haze).

Table 4. The coefficient of determination (R2) values are listed for the correlation of the psychometric scales for glossiness, brightness and sharpness with the measurement metrics of the IQ and the iGM gloss meters. Most correlations are linear ($a \cdot x + b$); exceptions are indicated with (*) for a logarithmic function fit $a \cdot \log_{10}(x) + b$, or (**) for a power function fit $a \cdot x^{1/3} + b$. The fit with the largest R2 value is indicated in bold.

Concerning correlations with sharpness, only the IQ DOI, iGM MC haze or iGM slope sharpness are considered. It is not likely that any other metric (based on an integrated flux (SGU 20/60/85), the specular or off-specular regions, or the contrast between the specular and 0° region (iGM CG)) could reliably characterize the perceived sharpness. The highest performing metrics for the perceived sharpness are the IQ DOI (R2 of 0.90) and the iGM slope sharpness (R2 of 0.99), obtained from measurements with the IQ and the iGM, resp. The graphs and fit evaluations in Fig. 10 however suggest that the iGM instrument outperforms the IQ gloss meter in describing the perceived sharpness. A partial correlation analysis confirms this assumption: sharpness shares a highly significant amount of variance with iGM slope sharpness when accounting for IQ DOI (p < 0.001) or iGM MC haze (p < 0.001). The remaining variance between sharpness and IQ DOI or iGM MC haze is however on the edge of significance when accounting for iGM slope sharpness (p = 0.07 or p = 0.05).

Fig. 10. Graphs of the perceived sharpness versus (a) IQ DOI with its linear fit and (b) iGM slope sharpness with a power function fit. The vertical and horizontal error bars indicate the standard error of the perceptual scale and the standard deviation over the six measurement positions (-σ / +σ).

4. Discussion

The correlation results between the psychometric scales for glossiness, brightness and sharpness (Fig. 6) suggest that observers relied on similar information to judge the attributes glossiness and brightness. Leloup et al. hypothesized that observers may use the most salient available cue in their judgment of surface gloss [11]. In this sense, observers might have used the brightness as the primary cue to distinguish between the samples, which is probably easier than judging the sharpness (the edges are clear and sharp for each sample). Furthermore, a highly significant linear correlation exists between glossiness or brightness and contrast gloss CG (Eq. (1)) evaluated with an ILMD. This is consistent with the work of Leloup et al., who reported agreement between CG and the observed brightness for samples without haze [10]. Hermans et al. also obtained a linear correlation between CG and their brightness model CAM18sl (the perceived brightness of a central luminous region viewed against a uniform background [38]). In addition, measurements with the ILMD, IQ and iGM instruments reveal that the best performing metrics (CG_2, IQ DOI and iGM MC haze) are all based on a type of “contrast” between the specular highlight and its near off-specular region (a difference in compressed luminance, a signal ratio and a Michelson contrast, resp.). In general the obtained results thus suggest that the glossiness of the set of hazy samples correlates well with the perceived brightness of the highlight and contrast-based metrics. This agrees with the ASTM definitions for reflection haziness (see Introduction), which mention “apparent reduction of contrast” and “cloudy appearance”, but simultaneously contradicts the current measurement standard for haze (cf. IQ haze or iGM haze) that solely measures the off-specular reflected flux.

Our results indicate that the iGM slope sharpness correlates significantly better to the perceived sharpness than the IQ DOI. This should not be surprising, as the iGM slope sharpness is characterized as the steepest slope of the reflected image, while the IQ DOI is based on a ratio of the signal at each side of the edge without considering the actual steepness of the edge in between those regions. The IQ DOI – which measures according to the ASTM D5767 (2004) principles – in turn correlates very well with the perceived glossiness of the hazy samples. This suggests that the two dimensions of the gloss space - DOI (sharpness) and contrast gloss (brightness) – are not represented by separate measurement metrics for our hazy samples. Based on the current results the authors would suggest the iGM slope sharpness as a metric for DOI, and the IQ DOI or iGM MC haze as a metric for the brightness of hazy samples. A recent update of the ASTM D5767 standard (2018) seems to address our concern and now favors an Image Clarity (IC) measurement method for DOI based on a sliding optical pattern mask with gratings of five different sizes [39]. It remains to be further investigated to what extent the IC or the iGM slope sharpness measurements relate to the perceived DOI of different materials.

A high intercorrelation is observed between the brightness and the sharpness attributes, which is due to the nature of the etched and partially polished samples: less etched or polished surfaces lead to brighter and sharper highlights. Only a few samples (e.g. samples 2 and 7, see Fig. 5) deviate substantially from this trend. Furthermore, samples with a more severe treatment typically have a lower specular and higher off-specular signal. This introduces a good linear correlation between the specular (e.g. Li) and off-specular (e.g. Lb) signal for the black samples of this study (R2 between Li and Lb equals 0.93, 0.96, and 0.88 for Lb at zone 1, 2, and 3, resp.). The sample illuminance during the experiment was also rather small (500 lux is for example recommended for visual assessment [40]), which limits the visual difference between the black and white samples. In addition, our study examines only semi to high-gloss samples with a considerably sharp perceptual DOI, made from glass with a similar type of treatment (if treated). Altogether, this explains why a relatively good correlation was maintained for most examined metrics in Table 3 or Table 4, even those solely focusing on the specular or off-specular region or an integrated value covering both. As such, the available data only permit us to indicate a limited (yet significant) improvement of the contrast-based metrics and the iGM slope sharpness in explaining the observers’ judgements of glossiness (brightness) and sharpness, resp. Further validation of our findings is thus necessary on larger sets of (semi to high-gloss) hazy materials, which should include different materials with different lightness or colour, in various illumination conditions.

This study does not consider samples with a reduced or deformed perceptual DOI, nor matte to semi-gloss samples. For such samples, one could say that the appropriateness of an integrated specular gloss metric (SGU 20/60/85) increases for samples with a more scattered and spread reflection behavior. Leloup et al. further indicated (on painted paper and glass samples) that their CG could still explain the perceived glossiness for samples with a similar shape or deformation of the reflected image (similar perceptual DOI), as long as there is still an observable highlight region. They however measured the background luminance Lb further away from the specular direction (close to the surface normal direction, cf. CG_0 in Table 3) [11]. The iGM CG measures in a similar way, and may be suited in this case. In previous work, the authors illustrated the capability of the iGM slope sharpness to evaluate the DOI of glass and paper samples from matte to high-gloss. In fact, the metric was able to dissociate the glass and paper samples due to a high difference in iGM slope sharpness, while the IQ DOI was not. In the case of matte samples, the reflected light is mostly scattered and no reflected highlight is distinguishable anymore. The SGU 85 is advised in this case. Obein et al. also mentioned that observers seem to use the sample lightness as a cue for glossiness in this situation [6], while Toscani et al. obtained lightness as an independent attribute of surface reflection [14].

5. Conclusion

This work documents the glossiness perception and its optical evaluations for a set of ten glass samples with considerable differences in haziness. Psychometric scales were developed for the glossiness - together with its two attributes brightness (cf. contrast gloss) and sharpness (cf. DOI) - using visual data collected from 16 observers according to the method of paired comparison. An excellent linearity was observed between the scales for glossiness and brightness, suggesting that observers used similar cues to judge both attributes.

The psychometric scales were correlated to optical evaluations with an ILMD (contrast gloss CG model of Eq. (1)), the Rhopoint IQ commercial gloss meter, and an image-based gloss meter iGM designed by the authors in previous work. The CG model showed a good linear correlation with the perceived glossiness and brightness. Furthermore, measurement metrics based on a “contrast” between the specular and its close off-specular region correlated best with brightness. The iGM sharpness metric had the most accurate correlation with the sharpness. A strong intercorrelation between many metrics was however noticed, mainly due to limited variability in the sample set. This complicates conclusions on any superiority of metrics. Our findings are further limited to the used samples and future research is necessary to validate them on larger and more diverse sample sets.

In summary, this study provides insights into the perception and optical evaluation of the glossiness of hazy samples, while considering the opportunities of image-based gloss evaluation.

Funding

Rhopoint Instruments Ltd. (Collaboration partner).

Acknowledgments

The authors would like to thank the company Rhopoint Instruments Ltd. for providing numerous experiment samples, and for their generous support in the development and manufacture of the iGM gloss meter prototype. This research is part of a PhD project in collaboration with, and partly funded by, the company Rhopoint Instruments Ltd. (St. Leonards-On-Sea, UK).

Disclosures

Rhopoint Instruments Ltd. (F)

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. M. Pointer, P. Hanselaer, P. Bodrogi, E. Burini, J. Campos, C. Andrew, and S. Cheung, “CIE 175:2006 A Framework for the Measurement of Visual Appearance,” (2006).

2. F. B. Leloup, G. Obein, M. R. Pointer, and P. Hanselaer, “Toward the soft metrology of surface gloss: A review,” Color Res. Appl. 39, 559–570 (2014).

3. F. Pellacini, J. Ferwerda, and D. Greenberg, “Toward a psychophysically-based light reflection model for image synthesis,” in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ‘00 (ACM Press/Addison-Wesley Publishing Co, 2000), pp. 55–64.

4. R. S. Hunter, “Methods of determining gloss,” J. Res. Natl. Bur. Stand. (1934) 18, (1937).

5. S. Beuckels, J. Audenaert, P. Hanselaer, and F. B. Leloup, “Development of an image-based measurement instrument for gloss characterization,” JCT Res. 19(5), 1567–1582 (2022).

6. G. Obein, K. Knoblauch, and F. Viénot, “Difference scaling of gloss: Nonlinearity, binocularity, and constancy,” J. Vis. 4(9), 4–720 (2004).

7. F. W. Billmeyer and F. X. D. O’Donnell, “Visual gloss scaling and multidimensional scaling analysis of painted specimens,” Color Res. Appl. 12(6), 315–326 (1987).

8. J. A. Ferwerda, F. Pellacini, and D. P. Greenberg, “Psychophysically based model of surface gloss perception,” Hum. Vis. Electron. Imaging VI 4299, 291–301 (2001).

9. F. Di Cicco, M. W. A. A. Wijntjes, and S. C. Pont, “Understanding gloss perception through the lens of art: Combining perception, image analysis, and painting recipes of 17th century painted grapes,” J. Vis. 19(3), 7 (2019).

10. F. B. Leloup, M. R. Pointer, P. Dutré, and P. Hanselaer, “Luminance-based specular gloss characterization,” J. Opt. Soc. Am. A 28(6), 1322 (2011).

11. F. B. Leloup, M. R. Pointer, P. Dutré, and P. Hanselaer, “Overall gloss evaluation in the presence of multiple cues to surface glossiness,” J. Opt. Soc. Am. A 29(6), 1105 (2012).

12. P. J. Marlow, J. Kim, and B. L. Anderson, “The perception and misperception of specular surface reflectance,” Curr. Biol. 22(20), 1909–1913 (2012).

13. J. Wills, S. Agarwal, D. Kriegman, and S. Belongie, “Toward a perceptual space for gloss,” ACM Trans. Graph. 28, (2009).

14. M. Toscani, D. Guarnera, G. C. Guarnera, J. Y. Hardeberg, and K. R. Gegenfurtner, “Three Perceptual Dimensions for Specular and Diffuse Reflection,” ACM Trans. Appl. Percept. 17, (2020).

15. P. Vangorp, P. Barla, and R. W. Fleming, “The perception of hazy gloss,” J. Vis. 17(5), 19 (2017).

16. ASTM, ASTM E284-17: Standard Terminology of Appearance (American Society for Testing and Materials, 2017).

17. CIE, E-ILV: CIE S017:2020 International Lighting Vocabulary (2nd Edition) (2020).

18. G. Ged, G. Obein, Z. Silvestri, J. Le Rohellec, and F. Viénot, “Recognizing real materials from their glossy appearance,” J. Vis. 10(9), 18 (2010).

19. Rhopoint Instruments, “IQ,” https://www.rhopointinstruments.com/product/rhopoint-iq-goniophotometer-20-60-85/.

20. P. Bernal-Molina, R. Montés-Micó, R. Legras, and N. López-Gil, “Depth-of-field of the accommodating eye,” Optom. Vis. Sci. 91(10), 1208–1214 (2014).

21. F. A. A. Kingdom and N. Prins, Psychophysics: A Practical Introduction (Second Edition) (Elsevier, 2016).

22. L. T. Maloney and J. N. Yang, “Maximum likelihood difference scaling,” J. Vis. 3(8), 5–585 (2003).

23. H. Scheffé, “An Analysis of Variance for Paired Comparisons,” J. Am. Stat. Assoc. 47, 381–400 (1952).

24. R. T. Ross, “Discussion: Optimal orders in the method of paired comparisons,” J. Exp. Psychol. 25(4), 414–424 (1939).

25. A. M. G. Kendall and B. B. Smith, “On the Method of Paired Comparisons,” Biometrika 31(3-4), 324–345 (1940).

26. G. Alway, “The Distribution of the Number of Circular Triads in Paired Comparisons,” Biometrika 49(1-2), 265–269 (1962).

27. R. Rajae-Joordens and J. Engel, “Paired comparisons in visual perception studies using small sample sizes,” Displays 26(1), 1–7 (2005).

28. L. L. Thurstone, “A law of comparative judgment,” Psychol. Rev. 34(4), 273–286 (1927).

29. ASTM, ASTM D523 - 14 (2018): Standard Test Method for Specular Gloss (American Society for Testing and Materials, 2018).

30. ISO, ISO Standard 2813 (2014): Paints and Varnishes — Determination of Gloss Value at 20°, 60° and 85° (International Organization for Standardization, 2014).

31. ASTM, ASTM D5767 - 95 (2004): Standard Test Methods for Instrumental Measurement of Distinctness-of-Image Gloss of Coating Surfaces (American Society for Testing and Materials, 2004).

32. ASTM, ASTM E430 - 19 (2019): Standard Test Methods for Measurement of Gloss of High-Gloss Surfaces by Abridged Goniophotometry (American Society for Testing and Materials, 2019).

33. S. Beuckels, J. Audenaert, F. Leloup, and P. Hanselaer, “Contrast gloss evaluation by use of a camera-based gloss meter,” in Proceedings of SPIE (Unconventional Optical Imaging III) (SPIE Photonics Europe, 2022), Vol. 1213610, p. 8.

34. A. Korn, H. R. Blackwell, H. Brettel, A. P. Ginsburg, B. Inditsky, and I. Overington, “CIE 095:1992 Technical report: Contrast and visibility,” (1992).

35. A. Field, Discovering Statistics Using IBM SPSS Statistics (SAGE Publications, 2018).

36. G. H. Scheir, M. Donners, L. M. Geerdinck, M. C. J. M. Vissenberg, P. Hanselaer, and W. R. Ryckaert, “A psychophysical model for visual discomfort based on receptive fields,” Light. Res. Technol. 50(2), 205–217 (2018).

37. Y. Cui, W. Song, and X. Li, “A perceived sharpness metric based on the luminance slope and overshoot of the motion-induced edge-blur profile,” J. Soc. Inf. Disp. 20(12), 680–691 (2012).

38. S. Hermans, K. A. G. Smet, and P. Hanselaer, “Exploring the applicability of the CAM18sl brightness prediction,” Opt. Express 27(10), 14423 (2019).

39. ASTM, ASTM D5767 - 18 (2018): Standard Test Methods for Instrumental Measurement of Distinctness-of-Image Gloss of Coating Surfaces (2018).

40. NBN, “NBN EN 12464-1: Light and lighting - Lighting of work places - Part 1: Indoor work places,” (2021).

Figures (10)

Fig. 1. Schematic and picture of the experiment setup (1: light source box with diffuser; 2: light baffle; 3: presented pair of experiment samples; 4: observer; 5: response keyboard).
Fig. 2. Picture of the ten experiment samples imaged on the experiment setup (at the chin rest) of Fig. 1. A high perceptual DOI is observable for each sample.
Fig. 3. The recorded luminance image of experiment sample 4 with its corresponding evaluation regions for Li (rectangles 1-3) and Lb (zone 1: rectangles 4-6 and 13-15; zone 2: rectangles 7-9 and 16-18; zone 3: rectangles 10-12 and 19-21).
Fig. 4. Measurement results of (a) the specular gloss and (b) DOI and haze for the experiment sample set (averaged over six evaluation positions), performed with the Rhopoint IQ instrument (all metrics from Table 1). The error bars represent the standard deviation (-σ / +σ) of the measured quantity over the six positions on each sample.
Fig. 5. Normalized psychometric scales (versus the sample number) for glossiness, brightness and sharpness derived via GLM optimization. The error bars of each sample estimate represent the SE (-SE / +SE). No SE information is available for sample 10, as this sample is selected as the scale reference in the GLM optimization.
Fig. 6. Correlations between the experimentally obtained GLM scales for glossiness and its attributes (a) brightness and (b) sharpness, together with linear fits and the corresponding R² and RMSE values. The error bars represent the SE (-SE / +SE).
Fig. 7. The perceived glossiness plotted against CG with Lb measured in zone 2. A linear fit with R² and RMSE statistics is also displayed. The vertical and horizontal error bars indicate the standard error of the perceptual scale and the standard deviation of the CG over the three measurement positions on the luminance image (-σ / +σ), respectively.
Fig. 8. Graphs of the perceived glossiness versus (a) IQ DOI and (b) iGM MC haze. A linear fit with R² and RMSE values is added to the plots. The vertical and horizontal error bars indicate the standard error of the perceptual scale and the standard deviation over the six measurement positions (-σ / +σ), respectively. The untreated samples (samples 9 and 10) are labeled on the graphs.
Fig. 9. Graphs of the perceived glossiness versus (a) iGM contrast and (b) iGM CG. A linear fit with R² and RMSE values is added to the second graph. The vertical and horizontal error bars indicate the standard error of the perceptual scale and the standard deviation over the six measurement positions (-σ / +σ), respectively.
Fig. 10. Graphs of the perceived sharpness versus (a) IQ DOI with its linear fit and (b) iGM slope sharpness with a power function fit. The vertical and horizontal error bars indicate the standard error of the perceptual scale and the standard deviation over the six measurement positions (-σ / +σ), respectively.

Tables (4)

Table 1. Metrics available in the IQ and iGM gloss meters for the gloss attributes specular gloss, DOI, haze and contrast. Novel metrics, introduced in the iGM in former work by the authors, are underlined.

Table 2. Statistical evaluations of the experiment data.

Table 3. The coefficient of determination (R²) values are listed for linear correlations of the psychometric scales glossiness and brightness with contrast gloss (measured with the ILMD).

Table 4. The coefficient of determination (R²) values are listed for the correlation of the psychometric scales for glossiness, brightness and sharpness with the measurement metrics of the IQ and the iGM gloss meters. Most correlations are linear ($a \cdot x + b$); exceptions are indicated with (*) for a logarithmic function fit, $a \cdot \log_{10}(x) + b$, or (**) for a power function fit, $a \cdot x^{1/3} + b$. The fit with the largest R² value is indicated in bold.

Equations (8)


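$$CG = 28\,L_i^{1/3} - 21\,L_b^{1/3} \quad [\mathrm{CGU}]$$
$$u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1$$
$$\chi^2 = \frac{4}{m-2}\left(\Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m-3}{m-2}\right), \qquad \nu = \binom{n}{2}\,\frac{m(m-1)}{(m-2)^2}$$
$$\zeta = 1 - \frac{24\,d}{n^3 - 4n}$$
$$\mathrm{ASTM\ DOI} = \left(1 - \frac{F_{os,0.3}}{F_{s,20}}\right) \times 100 \quad [\%]$$
$$\mathrm{ASTM\ Haze} = \left(\frac{F_{os,2,\mathrm{sample}}}{F_{s,20,\mathrm{ref}}}\right) \times 100 \quad [\%]$$
$$\mathrm{iGM\ MC\ haze} = \left(1 - \frac{N_{in} - N_{out}}{N_{in} + N_{out}}\right) \times 100 \quad [\mathrm{HU}]$$
$$\mathrm{iGM\ contrast} = \mathrm{Psychometric\ contrast} = \frac{L_{hl} - L_{bg}}{L_{bg}} \quad [\mathrm{CU}]$$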
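
As an illustration only, several of the equations listed above can be evaluated with a few lines of Python. This is a minimal sketch, not the authors' implementation. It assumes the contrast gloss takes the form CG = 28·Li^(1/3) − 21·Lb^(1/3), and that the paired-comparison statistics follow Kendall's standard formulation: m is the number of observers, n the number of samples, Σ the sum over all ordered sample pairs of the number of observer pairs agreeing on that preference, and d the number of circular triads (with n even). All function and argument names are hypothetical.

```python
from math import comb


def contrast_gloss(l_i: float, l_b: float) -> float:
    """Contrast gloss [CGU] from highlight (l_i) and background (l_b) luminance."""
    return 28.0 * l_i ** (1 / 3) - 21.0 * l_b ** (1 / 3)


def psychometric_contrast(l_hl: float, l_bg: float) -> float:
    """iGM contrast [CU]: (L_hl - L_bg) / L_bg."""
    return (l_hl - l_bg) / l_bg


def mc_haze(n_in: float, n_out: float) -> float:
    """iGM MC haze [HU]: 100 * (1 - Michelson contrast of N_in and N_out)."""
    return (1.0 - (n_in - n_out) / (n_in + n_out)) * 100.0


def agreement_u(sigma: float, m: int, n: int) -> float:
    """Coefficient of agreement u (assumed: m observers, n samples)."""
    return 2.0 * sigma / (comb(m, 2) * comb(n, 2)) - 1.0


def agreement_chi2(sigma: float, m: int, n: int) -> tuple[float, float]:
    """Chi-square statistic and degrees of freedom nu for testing u (m > 2)."""
    pairs = comb(n, 2)
    chi2 = 4.0 / (m - 2) * (sigma - 0.5 * pairs * comb(m, 2) * (m - 3) / (m - 2))
    nu = pairs * m * (m - 1) / (m - 2) ** 2
    return chi2, nu


def consistency_zeta(d: int, n: int) -> float:
    """Coefficient of consistency for an even number of samples n."""
    return 1.0 - 24.0 * d / (n ** 3 - 4 * n)


if __name__ == "__main__":
    # Made-up luminance values, for illustration only.
    print(contrast_gloss(l_i=8.0, l_b=1.0))
    print(psychometric_contrast(l_hl=30.0, l_bg=10.0))
    print(mc_haze(n_in=3.0, n_out=1.0))
    # Perfect agreement among 5 observers on 10 samples gives u = 1.
    print(agreement_u(comb(10, 2) * comb(5, 2), m=5, n=10))
```

With unanimous observers, Σ reaches its maximum of C(n,2)·C(m,2), so u evaluates to 1. A single observer with no circular triads (d = 0) gives ζ = 1.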
© Copyright 2024 | Optica Publishing Group. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies.