EXTRACTION OF CLOUD HEIGHTS FROM SENTINEL-2 MULTISPECTRAL IMAGES

Investigation of the focal plane assembly of the Sentinel-2 satellites show slight delays in the acquisition time of different bands on different CCD lines of about 0.5 to 1 second. This effect was already exploited in the detection of moving objects in very high resolution imagery as from WorldView-2 or -3 and also already for Sentinel-2 imagery. In our study we use the four 10-m-bands 2, 3, 4 and 8 (blue, green, red and near infrared) of Sentinel-2. In the level 1C processing each spectral band gets orthorectified separately on the same digital elevation model. So on the one hand moving objects on the ground experience a shift between the spectral bands. On the other hand objects not on the ground also show a slight shift between the spectral bands depending on the height of the object above ground. In this work we use this second effect. Analysis of cloudy Sentinel-2 scenes show small shifts of only one to two pixels depending on the height of the clouds above ground. So a new method based on algorithms for deriving dense digital elevation models from stereo imagery was developed to derive the cloud heights in Sentinel-2 images from the parallax from the 10-m-bands. After detailed description of the developed method it is applied to different cloudy Sentinel-2 images and the results are cross-checked using the shadows of the clouds together with the position of the sun at acquisition time.


INTRODUCTION
Actual high and very-high resolution Satellites like Sentinel-2 or WorldView-2/-3 acquire images scanning the ground line by line while traveling along their orbit. Such a push-broomscanner consists of different CCD lines for each panchromatic and multispectral band acquiring one line at one time. In the processing of the raw imagery the individual multispectral and panchromatic bands get coregistered using a common digital elevation model (DEM). So moving objects on ground show a shift of the objects between the different bands as shown in fig. 1, left. This effect was already exploited in the detection of moving objects in very high resolution imagery as from WorldView-2 or -3 as shown for example in (Krauß et al., 2013) or (Kääb, 2011). For Sentinel-2 the time shift was used for detection of aircrafts and ships by (Heiselberg, 2019) or for trucks in (Fisser, 2020). On the other hand objects not sticked to the ground also show a slight shift between the spectral bands depending on the height of the object above ground as shown in fig. 1, right. In the study we present in this publication we use this second effect.
The principle of the spectral shift due to non-ground objects is depicted in fig. 2. For exploiting the spectral shift of moving or non-ground objects we need to know the details of the focal plane assembly (FPA) of the sensor. Fig. 3 shows the CCD lines in the FPA of Sentinel-2 (ESA, 2021). The FPA for the visual/near-infrared bands (VNIR) of the multispectral sensor (MSI) is build as a staggered array of 12 CCD elements covering the whole 290km-swath of Sentinel-2. But the order of the CCD lines is inverted in neighbouring CCD elements. The blue bands of the odd/even sensors are nearest with a time lag of 2.32 s.
Images acquired by push-broom-scanners with focal plane assemblies like those shown in fig. 3 cause small time gaps between image bands clearly separated in the focal plane assembly -e.g. about one second between the blue and red CCD lines -as listed in table 1 from (ESA, 2021).  For our study we use only the bands with the highest resolution of 10 m: blue, green and red, but not the NIR band (B08, nead infrared) since the spectral differences are mostly too large of the NIR band to the others (see fig. 9).
using the orbital speed vo with orbit period to and orbit length lo calculated from the mean earth radius RE and the orbit height ho vo = lo to = 2 · π · (RE + ho) to = 7.429 km s A shift of one pixel (10 m) between the blue and the red band corresponds to a height above ground of 1053 m using ∆t = 1.005 s and eq. 1. For normal clouds we will thus expect a shift of about one to three pixels. However cirrus clouds may form in heights of 5000 to 13700 m, so shifts of about five to 14 pixel may occur. Fig. 4 shows a stereo image of an area 8.7×6.5 km 2 containing different types of clouds at different heights. The 3D effect of the different heights can only be observed if the image is viewed using appropriate red-cyan-glasses. After the detailled description of the developed method in section Method the method is applied with different parameter settings to different Sentinel-2 images in section Experiments and Discussion. The results are also discussed and cross-checked using casted shadows of the clouds and knowledge of the position of the sun at acqusition time in Evaluation. Finally a resumé of the work is given in the last section Conclusion and Outlook.

METHOD Background
The proposed method is based on small shifts between spectral bands of Sentinel-2 scenes for non-ground objects as described in section Introduction and shown in fig. 2. If no objects above ground exist there should be no shifts between the bands since they were all coregistered and orthorectified to the same ground DEM. The shifts occur in scanning direction of the satellite. For Sentinel-2 there are only Level-1C and Level-2A data available. Since both are orthorectified the knowledge of the scanning direction and also which pixel belongs to which of the staggered CCD arrays (cf. fig. 3) is lost in the data. In the proposed workflow we have to determine the scanning direction, calculate the shifts and create a digital elevation model (DEM) of the clouds (or any other non-ground objects).
Since a delivered Sentinel-2 scene has normally a size of about 10980 × 10980 pixels the scene is split to tiles of size 1000 × 1000 pixels and the processing is done on each tile separately. The processing of a tile needs approximately three minutes on a Dell Latitude E7270 laptop under Ubuntu 16.04.

Workflow
The proposed workflow for deriving cloud heights from parallaxes between different spectral bands of the VNIR Sentinel-2 MSI instrument is shown in fig. 5.
The core of the processing is the derivation of disparities between spectral bands using the well known Semi-Global-Matching (SGM) algorithm (Hirschmüller, 2005) extended and optimized for satellite imagery (dAngelo and Reinartz, 2011). SGM needs the input stereo image pair in epipolar direction with disparities occuring only in one direction -horizontal or vertical. In our case we selected the vertical direction since the Sentinel-2 scenes are acquired roughly from north to south. After determining the scanning direction the input tile is rotated to vertical epipolar direction. Next for a statistically more reliable result for each of the band combinations (2,3), (2,4) and (3,4) (blue-green ∆t = 0.527 s, blue-red ∆t = 1.005 s and green-red ∆t = 0.478 s according to table 1) the shifts or disparities between the spectral bands are calculated using SGM. In the next step these disparities are converted to heights above ground. Finally the three independently derived DEMs are merged and rotated back to the tile geometry. Each step is described in detail in the following subsections.

Rotating to scan-direction
For calculating the disparities epipolar images are needed. But as stated in subsection Background the available Sentinel-2 images of processing level 1C or 2A are already rectified and have no more any information on the scanning direction. Luckily Sentinel-2 always scans nadir and in orbit-direction. So knowing the direction of the orbit at the acquisition position we can calculate the scan-angle. Fig. 6 shows the Sentinel-2-orbit and the definition of the inclination angle α. Figure 6. Orbit of Sentinel-2 on day-side from north to south, descending node with Inclination angle α = −98.62 • From the inclination angle α at the equator the maximum polar latitude of the orbit can be calculated as At the equator the scan-direction is simply α. At the maximum polar latitude in the north and south, λmax and −λmax, the orbit is straight west or the scan-angle is 180 • . So following (Heiselberg, 2019) we can calculate the scan-angle φ as function of the inclination angle α and the center latitude λ of the scene as Since we want to rotate the scene in vertical epipolar direction the rotation angle of the scene is 90 • − φ in clockwise direction. For Frankfurt am Main (50.11 • north) φ is 103.515 • and thus the images will be rotated by 13.515 • to the left. For rotating the tile of size wt ×ht pixels a larger tile with additional borders in each direction will be cut from the original image and rotated around the center of the tile.

Creating dense stereo disparity images
Since the expected shifts of normal clouds with heights of about 1000 m above ground are only about one pixel for the maximum time distance of ∆t = 1.005 s (band combination blue-red) and only about half a pixel for the other band combinations (bluegreen and green-red) the resulting disparity images will be very noisy. To overcome this we create disparity images for all possible band combinations and try to merge them to a statistically more stable result.

Converting disparities to heights above ground DEM
Depending if a pixel belongs to an odd or even CCD array the stereo direction is inverted. In one case the red band is acquired 1.005 s before the blue band, in the other it is acquired 1.005 s after the blue band. When using the red and blue bands as stereo pairs for the DEM generation the disparities get inverted or the height of a cloud is in the first case above ground, in the second case below ground. If we assume heights below ground can not occur we can simply take the absolute value of our disparities.
The disparity maps from the SGM processing are floating point images with values d from -20.0 to 20.0 pixels or no-data values for areas where no disparities can be calculated or the forwardbackward-test failed. These disparity images are converted to "heights above ground" h using the time lag ∆t between the spectral bands, the image resolution ρ = 10 m/px and eqs. 1 and 2 as

Merging band-stereo DEMs
In this processing step the three resulting DEMs for the three band combinations get merged by averaging all non-no-data values. All pixels with DEM-values exceeding 0.25σ with σ as the standard deviation of the DEM-values of this pixel in the three input DEMs will be set to zero height as well as all remaining no-data values. Finally a small morphological opening with an element size of (5,5) and a Gaussian filter of σ = 1.0 will be applied for smoothing the results.

Rotating back to tile
The final step is rotating the resulting DEM back by 90 • − φ in counterclockwise direction and cutting the added borders b to fit the result on the input tile.

EXPERIMENTS AND DISCUSSION
For the experiments we selected two scenes over Germany and one over Austria as listed in table 3.  Fch -Estimating processing parameters from a cloudy area north of Frankfurt The first test case "Fch" uses an area of 3 × 3 km 2 north of Frankfurt am Main containing some scattered clouds as shown in fig. 7. show much more noise than (2,4) with a ∆t of 1.005 s. The pre-scaling of the bands before applying the SGM algorithm on the other hand has no real effect. So we can stick to only using scale 1 for further processing. An other effect may be the selection of the cost-function in the SGM algorithm. Beside testing absolute gray differences and many more the best remaining cost functions were MI (mutual information) and Census transform. Operating experience of processing DEMs from normal very high resolution satellite stereo imagery shows also best results for a weighted sum of MI and Census as cost function. Fig. 10 shows the resulting DEMs using different weights of the MI (mutual information) and Census cost functions. As can be seen in our case the simple usage of Census as cost function for the SGM algorithm shows the best results. Beneath this also the processing using MI needs about 22 s while using Census only needs 4 s.
Ffm -Can we derive building heights? Fig. 11 shows a small section of the epipolar image containing the city center of Frankfurt am Main with skyscrapers together with the final merged cloud-height DEM (heights 0-500 m from black to white). Figure 11. City center of Frankfurt am Main, 1500 × 1200 m 2 , left: multispectral epipolar image, right: merged heights above ground Fig. 12 shows the three single DEMs of this area before merging. The noise in the DEMs can clearly be seen. Looking on the noisy DEMs of fig. 12 and comparing the mutispectral image and the resulting merged DEM in fig. 11 visually we can clearly say, that building heights can not be extracted using the described method.

Mch -Clouds in different heights
The test area Mch contains clouds in different heights as shown already in fig. 4. Fig. 13 shows the multispectral image and fig. 14 three DEMs of the clouds calculated using a bicubic scaling of the bands before SGM with factors one, two and four (heights 0-10000 m from black to white). As can clearly be seen, the thin high clouds at heights of about 6000 to 7000 m above ground vanish in higher scalings.

Inb -Clouds and Snow over mountains
The last test case is a very mountainous area around Innsbruck in Austria. Fig. 15 shows the epipolar scene together with the Figure 14. Derived cloud heights of test area Mch, 5 × 5 km 2 , scales 1, 2 and 4 resulting cloud heights above ground (heights 0-3000 m from black to white). The cloud heights seem to go down and up again following the depicted profile line from top left to bottom right. Figure 15. Test area Inb at Innsbruck, 10 × 10 km 2 , the green line represents the position of the profile line used in fig. 16 Fig . 16 shows the profiles of the resulting cloud DEM with heights above ground in green, the SRTM DEM in red and the sum of both -the absolute height of the clouds -in blue. It can clearly be seen, that the derived cloud-heights are really heights above ground since adding the ground-DEM gives a nearly constant cloud-height at 2000 to 2500 m with no clouds at the tops of higher mountains.  fig. 17. hc is the cloud height above ground which was derived using our described method. d is the measured distance of a cloud-point with the corresponding shadowpoint. Hc and Hs are the heights of the surface model on which the Sentinel-2 scene was orthorectified. sz is the sun zenith angle for the acquisition time. Since we have no reliable azimuth and incidence angles of the looking direction of the satellite we assume nadir looking (which could add a shift the cloud position in the ortho image).
For the validation we measure d and take h from our results and Hc and Hs from the fitting SRTM-C-1-arcsec surface model as listed in table 5 for test case Fch. Fig. 18 shows the Fch ortho image together with our derived cloud heights above ground and the positions of the measurements.   From the measurement of d in the ortho image and the knowledge of the sun zenith angle sz we can calculate the height of the cloud above the position of the shadow edge hs tan sz = d hs Using the difference of the ground heights we get the cloud height above ground hc as hc + Hc = hs + Hs thus hc = d tan sz + Hs − Hc (8) Calculating the differences of the measured hc (from d, cf. eq. 8) versus the automatically derived h (see table 5 So the results of the described method seem to be in very good correlation with the manual evaluation. Other errors influencing the result of the proposed algorithm are: • Terrain height: Since we use the orbit height above ground as ho in eq. 1 we get a maximum error in ho of about 8 km or 1 % which results in a hc also 1 % lower or typical 10 to 20 m -neglectable with general errors in height of about 150 m • Scan direction: using a wrong scan direction φ in eq. 6 gives an error of cos(∆φ) for the disparity d, for a maxium error of about 10 • this means also derived heights hc are about 1.5 % too small -also neglectable • Orbit height: if the orbit height is not always 786 km above the ellipsoid the influence is similar to the one of the terrain height above • Movement of clouds: for average not too cloudy scenes at most a strong breeze (Beaufort 6, 25 knots or 13 m/s) can be assumed. This means an additional 1.5 pixel disparity in any direction. So the derived cloud heights may vary by about ±1500 m. But those shifts originating by the movement of clouds also influence the position of the shadows and in this case also the borders of the cloudshadows should show colored borders or additional disparities. In our examples this effect cannot be observed. So the wind speeds at acqusition time were significantly below 10 m/s. This possible error is not neglectable -crosschecking the shadow borders and areas should be done if proper scenes can be found. Fig. 19 shows positive (green) and negative (red) heights derived from the ortho-image not using the absolute value of d in eq. 6. The five CCD elements of the staggered array can clearly be seen. The full swath of Sentinel-2 is 290 km covered by 12 CCDs so the 109.8 km of scene B only cover 4.5 CCD elements. Figure 19. Test image B, lowest 500 rows, 10980 columns, top: ortho image, bottom: cloud-heights, green: positive heights above ground, red: negative heights above ground

CONCLUSION AND OUTLOOK
In this paper we presented a new method for deriving cloud heights from spectral shifts in the visual 10-m-bands of the Sentinel-2 MSI instrument. Typical clouds show a shift of about one pixel between the blue and red band but only about half a pixel between the blue and green or green and red band. Combining derived disparity maps from all three band combinations give a relative consistent and good result. Cross-checking the derived heights with manual measurements of the cloud shadows show an error of about 150 m at cloud heights of 1500 m above ground.
The resulting digital elevation models of the clouds cover mostly a larger area than clouds are visible in the ortho imagery.
The method can often also derive heights for very thin, nearly invisible clouds. So also high cirrus clouds can be detected, but since they are mostly very thin their DEM is very perforated and only small parts can be seen in the DEM. Due to the very small shifts between the spectral bands the method is also prone to generate noise in areas not covered by clouds.
In future work the cloud height may be fused with spectral cloud masks and a better filtering may be implemented to get rid of noise in non-cloud regions. Also a point to be solved in subsequent works is the detection of moving clouds. This can be done by detecting the same additional spectral shifts in cloud shadow areas. Since already a moderate wind speed of 10 m/s means one more pixel shift between the blue and red band the heights may be wrong by ±1000 m.