Quality checks

Throughout the different processing steps described in the previous section, a number of quality checks are applied. These quality checks are described below. When a given quality check fails, there are two possible outcomes.

When the quality check is critical for having useful data, failure of the quality check results in halting of the processing, and an anomaly is raised and stored in the anomaly database. See Anomalies (processing errors) for which quality checks halt the processing and what anomaly is raised.

On the other hand, for some quality checks failure does not necessarily mean the entire sequence cannot be used. In some cases only part of the data might be affected (e.g. a single series in the sequence), and in other cases the quality check is only a warning that the data should be used with caution. In each of these cases, a quality flag is added to the data, and the processing is continued. In some cases, it is still useful to also raise an anomaly and store it in the anomaly database for future reference. The anomalies are detailed in Anomalies (processing errors).

The different quality checks are detailed below; the triggered flags are described in Flags description and the raised anomalies in Anomalies (processing errors).

L0A: Read raw data

While reading in the data, quality checks verify that the metadata.txt file is appropriate and that all required raw data files exist. If these checks fail, an anomaly is raised (see Anomalies (processing errors)) and the processing halts. A further quality check verifies that the file with meteorological information exists. If it does not, an anomaly is added to the SQL database, but the processing is continued. In addition, for traceability, if the latitude and/or longitude are unknown (i.e. not included in the metadata.txt file), they are taken from the processor configuration file and the ‘lon_default’ and/or ‘lat_default’ flags are triggered. Next, the pointing accuracy of the pan/tilt is verified. If the requested pan or tilt angle differs from the effective pan or tilt angle by more than 3°, the ‘bad_pointing’ flag is raised for the given scan.
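As a minimal sketch of the pointing check described above (not the processor's own code), the comparison can be written as follows. The function names are hypothetical, and wrapping the angular difference around 360° is an assumption that the text does not state.

```python
def angle_mismatch(requested_deg, effective_deg, tolerance_deg=3.0):
    """Return True if the two angles differ by more than the tolerance."""
    # Assumption: wrap the difference into [-180, 180) so that
    # 359 deg and 1 deg count as 2 deg apart.
    diff = (requested_deg - effective_deg + 180.0) % 360.0 - 180.0
    return abs(diff) > tolerance_deg


def scan_has_bad_pointing(pan_req, pan_eff, tilt_req, tilt_eff, tolerance_deg=3.0):
    """True if either pan or tilt misses its requested value (hypothetical helper)."""
    return (angle_mismatch(pan_req, pan_eff, tolerance_deg)
            or angle_mismatch(tilt_req, tilt_eff, tolerance_deg))
```

A scan for which `scan_has_bad_pointing` returns True would receive the ‘bad_pointing’ flag.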

L1A: Check raw data prior to calibrating

Before calibrating each of the individual scans in the L0 data, a number of quality checks are applied. If the spectrally integrated signal of a scan differs from the mean by more than 3 times the standard deviation or by more than 25% (whichever is larger), it is masked and will not be used when averaging the series. This process is repeated until convergence and is applied to the measured (ir)radiances and to the darks. The L0 data is also checked for saturation (digital number \(DN \geq 64{,}000\)) and for discontinuities (missing values or \(\Delta DN > 10^4\)). A flag is also added to the L1 data if any of the dark scans have been masked by the above processes. Scans not satisfying the quality checks are flagged, but no data are removed at this stage.
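The iterative outlier masking and the saturation and discontinuity checks can be sketched as below. This is an illustration under stated assumptions rather than the processor's implementation: the function names are hypothetical, the population standard deviation is used, already-masked scans stay masked, and the discontinuity step is taken between adjacent values.

```python
import statistics


def mask_outlier_scans(integrated, k_sigma=3.0, rel_threshold=0.25, max_iter=100):
    """Return a list of booleans, True where a scan is masked as an outlier.

    integrated: spectrally integrated signal, one value per scan.
    """
    masked = [False] * len(integrated)
    for _ in range(max_iter):
        kept = [v for v, m in zip(integrated, masked) if not m]
        if len(kept) < 2:
            break
        mean = statistics.fmean(kept)
        std = statistics.pstdev(kept)  # assumption: population std
        # "Whichever is larger": 3 sigma or 25% of the mean.
        limit = max(k_sigma * std, rel_threshold * abs(mean))
        new_masked = [m or abs(v - mean) > limit
                      for v, m in zip(integrated, masked)]
        if new_masked == masked:  # converged: no new scans masked
            break
        masked = new_masked
    return masked


def is_saturated(dn, limit=64000):
    """True if any digital number reaches the saturation limit."""
    return any(v >= limit for v in dn)


def has_discontinuity(dn, max_step=1e4):
    """True if a value is missing (None/NaN) or adjacent values jump too much."""
    if any(v is None or v != v for v in dn):  # v != v is True only for NaN
        return True
    return any(abs(b - a) > max_step for a, b in zip(dn, dn[1:]))
```

With only a handful of scans a single outlier can never exceed three standard deviations of the same sample, which is one reason a relative threshold alongside the sigma criterion is useful to think about when choosing the number of scans per series.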

L0B: Average valid scans

When averaging, only scans that passed the L1A quality checks are used. A few quality checks verify that the number of scans being averaged is sufficient. By default, the threshold number of scans is three. If there are fewer than three scans for one of the dark, radiance or irradiance series, no reliable uncertainty can be calculated, and the series is flagged. If fewer than half of the radiance or irradiance scans of a series pass the L1A checks, the series is flagged, as this likely indicates that something has gone wrong.
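The two scan-count checks can be illustrated with a short sketch. The ‘not_enough_..._scans’ flag names follow those used elsewhere in this section; the function name and the ‘fewer_than_half_valid’ flag name are placeholders introduced here for illustration only.

```python
def scan_count_flags(n_total, n_valid, series_type, min_scans=3):
    """Return the flags triggered for one series.

    series_type: "dark", "rad" or "irr".
    n_total: scans acquired; n_valid: scans that passed the earlier checks.
    """
    flags = []
    if n_valid < min_scans:
        flags.append(f"not_enough_{series_type}_scans")
    # The "fewer than half" check only applies to radiance and irradiance.
    if series_type in ("rad", "irr") and n_valid < 0.5 * n_total:
        flags.append("fewer_than_half_valid")  # placeholder flag name
    return flags
```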

L1B: Check calibrated data is fit for purpose

After calibrating the L0B file, we check that all the measurements required to form a standard sequence are included and have not been flagged by the previous ‘not_enough_dark_scans’, ‘not_enough_rad_scans’ or ‘not_enough_irr_scans’ flags. If any series are missing or flagged, the ‘series_missing’ flag is added to all the series in the sequence. If there are no valid radiance or irradiance measurements, the processing is halted.

Next, quality checks on the irradiance measurements are applied. First, their viewing angles are checked (which must be 180°, with a tolerance of 2°, as irradiance measurements have to be pointing up). Next, the irradiance is compared to a simulated clear-sky model. This clear-sky model is generated using the libRadtran radiative transfer software package (Emde et al., 2016), assuming its mid-latitude summer standard atmosphere, its standard desert surface (for land sites) and its standard ocean surface (for water sites). Note that the choice of surface makes little difference, as it affects the downwelling irradiance used in the clear-sky model only through second-order effects. The surface is assumed to be at sea level and the TSIS solar irradiance model is used (Coddington et al., 2021). Given that the downwelling irradiance measures the full hemisphere, the only relevant angle is the solar zenith angle. A clear-sky model is calculated for solar zenith angles of 0°, 10°, 20°, 40°, 60°, 70° and 80°. These irradiance data are provided at 0.1 nm resolution to the HYPERNETS_PROCESSOR.

When performing the clear-sky quality check, the clear-sky irradiance data are band-integrated to the HYPSTAR® bands (which vary slightly from instrument to instrument), as defined by the calibration data, using the matheo tool. The measured HYPERNETS irradiances are then scaled (assuming a cosine response) to match the nearest solar zenith angle among the provided clear-sky models. Figure 11 shows an example of the clear-sky checks applied to the irradiance. The clear-sky models are not always very close to the measurements, as a mid-latitude summer atmosphere at sea level was used rather than a more realistic site-specific model. Therefore, this quality check only fails if the irradiance differs from the clear-sky model by more than 50% for more than 10% of the wavelength bands. Overcast conditions consistently trigger this quality flag. Next, a quality check verifies that the irradiance has not changed by more than 10% (after correcting for differences in solar zenith angle) between the measurements at the start and end of the sequence. At this stage, the resulting irradiance series are flagged and the L1B file is produced. However, if this ‘variable_irradiance’ check is triggered, the processing will be halted at the L1C stage.
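The clear-sky comparison and the irradiance variability check can be sketched as follows. This is a simplified illustration, not the processor's code: all function names are hypothetical, the model spectra are assumed to be already band-integrated (the matheo step is not shown), differences are taken relative to the model, and the variability check is shown for a single scalar irradiance value per measurement.

```python
import math

MODEL_SZAS = (0, 10, 20, 40, 60, 70, 80)  # degrees, as listed in the text


def nearest_model_sza(sza_deg, model_szas=MODEL_SZAS):
    """Pick the modelled solar zenith angle closest to the measured one."""
    return min(model_szas, key=lambda m: abs(m - sza_deg))


def scale_to_sza(irradiance, sza_from_deg, sza_to_deg):
    """Rescale irradiance between zenith angles assuming a cosine response."""
    factor = (math.cos(math.radians(sza_to_deg))
              / math.cos(math.radians(sza_from_deg)))
    return [e * factor for e in irradiance]


def fails_clear_sky(measured, sza_deg, models,
                    max_rel_diff=0.5, max_band_fraction=0.1):
    """models: dict mapping model SZA (deg) to band-integrated irradiance."""
    sza_model = nearest_model_sza(sza_deg, tuple(models))
    scaled = scale_to_sza(measured, sza_deg, sza_model)
    reference = models[sza_model]
    # Count bands deviating by more than 50% from the model.
    n_bad = sum(abs(m - r) > max_rel_diff * r
                for m, r in zip(scaled, reference))
    return n_bad > max_band_fraction * len(reference)


def irradiance_is_variable(e_start, sza_start_deg, e_end, sza_end_deg,
                           max_change=0.1):
    """Compare start and end irradiance after normalising by cos(SZA)."""
    n_start = e_start / math.cos(math.radians(sza_start_deg))
    n_end = e_end / math.cos(math.radians(sza_end_deg))
    return abs(n_end - n_start) > max_change * n_start
```

For example, a measurement at a solar zenith angle of 33° would be scaled to, and compared with, the 40° model.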

There are also some quality checks on the uncertainties. These verify that there are no negative uncertainties, and flag the data when fewer than half of the spectral channels have random uncertainties on radiance and irradiance below 100%. The latter indicates corrupted or dark data (e.g., measurements at night fail this check).
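A possible reading of the uncertainty checks is sketched below. It assumes that the check fails when fewer than half of the spectral channels have a random uncertainty below 100%; the function name and the tuple it returns are illustrative only.

```python
def uncertainty_check(u_random_percent, max_percent=100.0, min_fraction=0.5):
    """Return (has_negative, mostly_too_large) for one spectrum.

    u_random_percent: relative random uncertainty per channel, in percent.
    """
    has_negative = any(u < 0 for u in u_random_percent)
    n_below = sum(u < max_percent for u in u_random_percent)
    # Fails when fewer than half of the channels are below 100%.
    mostly_too_large = n_below < min_fraction * len(u_random_percent)
    return has_negative, mostly_too_large
```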

Land Network quality checks

For the LANDHYPERNET network, there is an additional check that there is no strong discontinuity (larger than 25%) between the VNIR and SWIR parts of the spectrum for both radiances and irradiances.
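The text does not specify how the VNIR/SWIR discontinuity is measured. One simple possibility, shown purely as an assumption, is to compare the mean of the last few VNIR channels with the mean of the first few SWIR channels. The function name and the choice of edge channels are hypothetical.

```python
def vnir_swir_discontinuity(vnir_edge, swir_edge, max_rel_step=0.25):
    """True if the step between the detector edges exceeds 25%.

    vnir_edge: values of the last VNIR channels (assumed selection).
    swir_edge: values of the first SWIR channels (assumed selection).
    """
    v = sum(vnir_edge) / len(vnir_edge)
    s = sum(swir_edge) / len(swir_edge)
    return abs(s - v) > max_rel_step * abs(v)
```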

L1C: Check if all required data for L1C processing is valid

Before interpolating the irradiances, a number of checks verify that the data is valid. If the ‘variable_irradiance’ flag was raised in previous levels, we cannot perform reliable interpolation and the processing is halted. Next, the processing is halted if there are no valid series for either radiance or irradiance (checking the ‘not_enough_dark_scans’, ‘not_enough_irr_scans’, ‘not_enough_rad_scans’ and ‘vza_irradiance’ flags). When all irradiance series have the ‘no_clear_sky_irradiance’ flag, the processing is continued, as overcast products might still be useful to some users (available on request). A flag is added to all series to indicate this is a sequence without clear-sky irradiance. No L1D/L2B data will be produced (and thus this data will not be provided publicly). When only one irradiance series is available (due to ‘vza_irradiance’ or missing measurements), the processing is continued, and the same irradiance is used for every radiance series (instead of temporally interpolating), with a correction for the changing solar zenith angle throughout the sequence. A flag is added to the entire sequence to indicate that only one irradiance has been used.
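The decision logic in this paragraph can be summarised in a short sketch. The per-series flag names are those quoted in the text; the function name, the return convention and the two sequence-level flag names (‘no_clear_sky_sequence’, ‘single_irradiance_used’) are placeholders chosen for illustration.

```python
INVALID_IRR = {"not_enough_dark_scans", "not_enough_irr_scans", "vza_irradiance"}
INVALID_RAD = {"not_enough_dark_scans", "not_enough_rad_scans"}


def l1c_precheck(irr_flags, rad_flags):
    """Decide whether L1C processing can proceed.

    irr_flags, rad_flags: one set of flag names per irradiance/radiance series.
    Returns (proceed, sequence_flags).
    """
    # A variable irradiance makes temporal interpolation unreliable.
    if any("variable_irradiance" in flags for flags in irr_flags):
        return False, set()

    valid_irr = [f for f in irr_flags if not (f & INVALID_IRR)]
    valid_rad = [f for f in rad_flags if not (f & INVALID_RAD)]
    if not valid_irr or not valid_rad:
        return False, set()

    sequence_flags = set()
    # Assumption: only the otherwise valid irradiance series are considered.
    if all("no_clear_sky_irradiance" in f for f in valid_irr):
        sequence_flags.add("no_clear_sky_sequence")   # placeholder name
    if len(valid_irr) == 1:
        sequence_flags.add("single_irradiance_used")  # placeholder name
    return True, sequence_flags
```

In this sketch a sequence that proceeds with the ‘no_clear_sky_sequence’ placeholder flag would be processed but not passed on to L1D/L2B production.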

Water Network quality checks

For the WATERHYPERNET network, there are a number of additional quality checks. First, similarly to the ‘variable_irradiance’ flag, the processor checks whether the downwelling sky radiance, Ld, at 550 nm remains constant over the entire sequence (i.e., the coefficient of variation of Ld(550) is below 10%). If Ld varies significantly between the start and the end of the sequence, the downwelling sky radiance cannot be temporally interpolated to the timestamps of the Lu scans, and the processing is therefore halted. Note, however, that the 10% threshold may be revised following further research. Next, an anomaly (i.e., ‘l’) is raised and the processing is halted if the upwelling and downwelling radiance pair does not have a similar pointing azimuth angle (within 1°), or if the viewing geometry does not satisfy θv for Ld equal to 180° − θv for Lu (within 1°).
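The two checks in this paragraph can be sketched as below. The function names are hypothetical, and the use of the sample standard deviation for the coefficient of variation and the wrap-around of the azimuth difference are assumptions not stated in the text.

```python
import statistics


def ld_is_variable(ld_550, max_cv=0.10):
    """True if the coefficient of variation of Ld(550) reaches the threshold."""
    mean = statistics.fmean(ld_550)
    cv = statistics.stdev(ld_550) / mean  # assumption: sample std
    return cv >= max_cv


def geometry_matches(vaa_lu, vza_lu, vaa_ld, vza_ld, tol_deg=1.0):
    """Check that an Lu/Ld pair shares azimuth and has mirrored zenith angles."""
    # Azimuth difference wrapped into [-180, 180) (assumption).
    d_az = (vaa_lu - vaa_ld + 180.0) % 360.0 - 180.0
    azimuth_ok = abs(d_az) <= tol_deg
    # The viewing zenith angle of Ld should equal 180 deg minus that of Lu.
    zenith_ok = abs(vza_ld - (180.0 - vza_lu)) <= tol_deg
    return azimuth_ok and zenith_ok
```

For instance, an Lu measurement at a viewing zenith angle of 40° is expected to be paired with an Ld measurement at 140° and the same azimuth.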

The processor also checks the temporal variability within each series. Scans for Ed, Lu and Ld at 550 nm should not differ from their neighbouring scans by more than a certain threshold (the default is 25%). Note that these flags are not expected to be raised, as scans with high temporal variability should already have been removed by previous flags, i.e., the ‘outliers’ or ‘L0_discontinuity’ flags. However, these flags are kept to ensure consistency with other common water network processing (Ruddick et al., 2016; Vansteenwegen et al., 2019).

The number of scans per series is important for assessing the uncertainties. Hence, if the number of scans for Ed, Lu and Ld that are not flagged by ‘bad_pointing’, ‘outliers’, ‘L0_threshold’ or ‘L0_discontinuity’ is below a given threshold, an anomaly is raised and the processing is halted. The current default value is three, which is a compromise between shortening the duration of the sequences and ensuring enough repeated measurements.

If the viewing geometry of the upwelling and downwelling radiance measurements is outside the viewing geometry range of the selected LUT for the ‘rhof_option’, the flag ‘rhof_default’ is raised. Similarly, a ‘def_wind_flag’ is used to trace spectra processed with a default wind speed value.

Finally, the flag ‘simil_fail’ is raised if the quality check on the NIR similarity spectrum suggested by Ruddick et al. (2016) fails. Note that this flag should only be considered for water types satisfying the NIR similarity spectrum theory (i.e., clear to moderately turbid waters).

L2A: Calculate reflectance

Currently, no further quality checks are applied. For the WATERHYPERNET network, water radiance and reflectance are averaged only over the Lu scans that are not flagged for temporal variability (‘temp_variability_irr’ and ‘temp_variability_rad’) or with ‘rhof_default’.

L2B: Site-specific quality checks

The site-specific quality checks range from angular masks, i.e., viewing geometries that are expected to be affected by shadows or by part of the installation (such as a mast) in the field of view, to quality checks that are very specific to the surface for a given site (e.g., ensuring vegetation is measured for the Wytham Woods UK (WWUK) site, or checking for abnormally high reflectance values over clear or low-turbidity waters). Such site-specific checks often use thresholds (determined from analysis of the first months or year of data) on the reflectance (or on ratios of reflectances, e.g., epsilon for water sites or NDVI for vegetated sites) at specific wavelengths. Additionally, the site owners can provide specific date-time ranges to mask, e.g., because something went slightly wrong during the deployment of the instrument (e.g., alignment). Another important quality check compares the surface reflectances to a time series of similar measurements (matching viewing geometry and time of day) at the same site, to identify outliers so that they can be investigated. If these outliers are found to come from invalid data, further quality checks can be added to remove such cases. The resulting site-specific masks are applied on a sequence-by-sequence basis to both the L2A data (resulting in the L2B dataset) and the L1B data (resulting in the L1D dataset). The same mask is applied to both L2A and L1B so that they remain consistent with each other.
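As an illustration of how a site-specific mask might combine an owner-supplied date-time range with a reflectance-ratio threshold, consider the sketch below for a vegetated site. The function names, the NDVI threshold argument and the example dates are hypothetical; actual thresholds are determined per site from the data.

```python
from datetime import datetime


def ndvi(red, nir):
    """Normalised difference vegetation index from red and NIR reflectance."""
    return (nir - red) / (nir + red)


def sequence_is_masked(seq_time, red, nir, masked_ranges, min_ndvi):
    """True if the sequence should be masked for this (vegetated) site.

    masked_ranges: list of (start, end) datetimes supplied by the site owner.
    min_ndvi: site-specific threshold (placeholder, chosen per site).
    """
    in_masked_range = any(start <= seq_time <= end
                          for start, end in masked_ranges)
    return in_masked_range or ndvi(red, nir) < min_ndvi
```

Because the decision is made once per sequence, the same boolean can be applied to both the L2A and the L1B products, which keeps the resulting L2B and L1D datasets consistent.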