Streamflow as water level: impacts on model calibration¶

In this notebook I will show that a river's streamflow doesn't necessarily have to be known as a discharge. Even if the input information of a hydrological model is expressed e.g. in mm/day, and so is its simulated streamflow, one can use water level (or stage, expressed e.g. in meters) as the streamflow information in order to calibrate the model. For that, a mapping of the water level to an equivalent discharge has to be done. Since satellites can measure water level, this technique is particularly useful in the context of spatial hydrology, which is becoming increasingly important.

import pickle
from tqdm import tqdm
import pandas as pd
from pandas import DataFrame
from matplotlib import pyplot as plt
import numpy as np
import random
from multiprocessing import Pool
from waterlevel import gr4j, nse

plt.style.use('ggplot')
%matplotlib notebook

Let's load some precipitation, potential evapotranspiration, and discharge data for a particular basin, measured in mm/day (40 years of data):

peq = pd.read_pickle('peq.pkl')

It is the same data as in this notebook, so I won't go over the calibration of the GR4J model again. However in that notebook, the calibration was done using the Nelder-Mead simplex algorithm to find the maximum of the NSE. This method works well with the GR4J model, because it has few parameters, and it is not very likely to fall into local extrema.

But when using stage instead of discharge, we add yet another level of uncertainty, which comes from the fact that we have no knowledge of the rating curve. Indeed, hydrological models simulate discharge, not stage, so in order to calibrate a model, we need somehow to convert the observed stage of a river to a corresponding discharge to be able to compare it to the simulated discharge. With so much uncertainty, the simplex algorithm is not a good choice for the model calibration any more. Too many local extrema appear in the model's parameter space, which prevents the algorithm to converge to the best solution. A more robust calibration method has to be used.

But first, we will artificially convert the discharge data into stage. For that we use an arbitrary transformation, the point being that we will consider this transformation as unknown in the calibration process.

peq['h'] = (peq['q'] ** 0.5) * 0.1 + 1
ax = peq.loc['1985':'1987', ['h']].plot(figsize=(12, 5))
ax.set_ylabel('Stage (in e.g. meters)')

<matplotlib.text.Text at 0x7f96718ab4a8>

Now that we only know the basin outlet river stage time series, we must find a way to convert it to a discharge time series. We could incorporate prior knowledge about this transformation, i.e. have a model for the rating curve, and add this model's parameters to the hydrological calibration process, but there is a simpler way. Instead of forcing the hydrological model to simulate discharge time series that match a reference discharge time series (which comes from a rating curve model that we also have to calibrate), let's do the opposite and make the reference stage time series match each discharge time series that we simulate during the calibration of the hydrological model. In the end, we don't care which one conforms to the other, all we want is that they match in some way. If the reference stage can be mapped to the simulated discharge, it is an indication that we found a suitable hydrological model. A good candidate for such a transformation is probability distribution mapping, which is also easy to compute:

def dist_map(dst, src):
    return np.sort(src)[np.searchsorted(np.sort(dst), dst)]

This method has the advantage of not incorporating any prior knowledge in the stage-to-discharge transformation, since all the discharge information is included in the simulated discharge. Yet the stage-to-discharge transformation is constrained enough, mainly because the probability distribution mapping ensures that a higher stage leads to a higher discharge.

Ideally, we want to find the same model parameters as the ones we found by calibrating the model when we had the reference discharge time series at our disposal. But as stated earlier, the simplex algorithm cannot be used anymore. And even when it was possible to use it, we ended up with no information about the likeliness of the parameters that were found. On the opposite, the following method uses a Monte Carlo approach in order not only to find the best parameters, but also to have an insight about their likeliness.

We basically sample the model parameters, simulate a discharge, and compute the NSE of the simulated discharge. However, this is the NSE of the whole parameter set, which conveys an information about the ability of the model (and all of its parameters as a whole) to represent the behavior of the basin. What we want is the NSE of each parameter individually (independently of the other ones), which you may call the "marginal NSE". In order to compute the marginal NSE for a parameter's given value, we fix this parameter to this value and sample all the other ones. If we sampled enough, the highest NSE we got is the marginal NSE for this particular parameter at this particular value (*). We can then repeat this operation for other parameter values, and other parameters. The following implementation is a bit smarter since we don't walk through the parameter values but instead keep sampling all the parameters altogether and update the marginal NSE for each parameter as we go:

(*) You can see it like this: if we find a lower NSE, it means that given the value of the parameter for which we are trying to find the marginal NSE (say X1), the other parameter values (X2, X3, X4) are not the ones which lead to the best model performance, so this NSE has no meaning. If we sample enough, we will eventually find the ones which give the model's best performance given X1, and the corresponding NSE is the highest one for this particular value of X1, no matter the values of X2, X3 and X4. When we do that operation for a range of X1 values, we get the marginal NSE of X1 in that range, and the maximum is the most likely. When we do the same for X2, X3 and X4, we get the optimal model parameters along with their associated likeliness.

def init_cal(x_ranges, x_bin_nb=100):
    df_cal = DataFrame()
    for x_i in range(len(x_ranges)):
        param = 'x' + str(x_i + 1)
        df_cal[param + '_range'] = np.linspace(x_ranges[x_i][0], x_ranges[x_i][1], x_bin_nb + 1)
        df_cal[param + '_score'] = -np.inf
    return df_cal

def mc_cal(peq, df_cal, show_progress, dist_map=None, x_sample_per_bin=10000):
    x_bin_nb = len(df_cal) - 1
    x = np.empty(4)
    xi = np.empty(4, dtype=np.uint32)
    iter_nb = x_bin_nb * x_sample_per_bin
    if show_progress:
        loop = tqdm(range(iter_nb))
    else:
        loop = range(iter_nb)
    for i in loop:
        for x_i in range(4):
            this_range = 'x' + str(x_i + 1) + '_range'
            xi[x_i] = random.randint(0, x_bin_nb - 1)
            x[x_i] = random.uniform(df_cal.loc[xi[x_i], this_range], df_cal.loc[xi[x_i] + 1, this_range])
        q_mod = gr4j(x)
        q_sim = q_mod.run([peq['p'].values, peq['e'].values])[0]
        if dist_map is None:
            q_obs = peq['q']
        else:
            q_obs = dist_map(peq['h'].values, q_sim)
        warmup = 365 * 2
        score = nse(q_obs[warmup:], q_sim[warmup:])
        for x_i in range(4):
            this_score = 'x' + str(x_i + 1) + '_score'
            df_cal.loc[xi[x_i], this_score] = np.max([score, df_cal.loc[xi[x_i], this_score]])
    return df_cal

def calibration(x_ranges, peq, dist_map=None, proc_nb=1):
    df_cal = init_cal(x_ranges)
    if proc_nb > 1:
        pool = Pool(processes=proc_nb)
        result = pool.starmap(mc_cal, [(peq, df_cal.copy(), i == 0, dist_map) for i in range(proc_nb)])
        pool.close()
        pool.join()
        for x_i in range(4):
            this_score = 'x' + str(x_i + 1) + '_score'
            for df in result:
                df_cal.loc[:, this_score] = np.where(df.loc[:, this_score] > df_cal.loc[:, this_score], df.loc[:, this_score], df_cal.loc[:, this_score])

    else:
        df_cal = mc_cal(peq, df_cal, True, dist_map)
    return df_cal

def plot_cal(df1, df2, x_ranges):
    x_bin_nb = len(df1)
    plt.close('all')
    plt.figure(figsize=(12, 10))
    x_inferred = [[], []]
    for i in range(4):
        a = 4 * 100 + 11 + i
        plt.subplot(a)
        this_range = 'x' + str(i + 1) + '_range'
        this_score = 'x' + str(i + 1) + '_score'
        delta = abs(df1.loc[0, this_range] - df1.loc[1, this_range]) / 2
        for j, df in enumerate([df1, df2]):
            c = ['blue', 'green'][j]
            label = ['Using discharge', 'Using stage'][j]
            x = df.loc[:x_bin_nb - 2, this_range] + delta
            y = df.loc[:x_bin_nb - 2, this_score]
            xmax = df[this_range].iloc[y.idxmax()] + delta
            ymax = df[this_score].iloc[y.idxmax()]
            plt.plot(x, y, c=c, label=label)
            plt.scatter(xmax, ymax, c=c)
            plt.xlim(x_ranges[i])
            plt.ylim(0, 1.1)
            x_inferred[j].append(xmax)
        plt.legend(loc='lower left')
        plt.ylabel('NSE')
        plt.title('X' + str(i + 1))
    plt.tight_layout()
    plt.show()
    return x_inferred

Let's first compute the marginal NSE of the parameters (X1, X2, X3, X4) when we have discharge information, and then when we only have stage information, and plot the results. In order to know where to sample the parameter values, we provide a range for each of them (the sampling will be uniform inside each range).

x_ranges = [[1, 10000], [-1, 1], [1, 1000], [1, 20]]
try:
    df1 = pd.read_pickle('df1.pkl')
    df2 = pd.read_pickle('df2.pkl')
except:
    df1 = calibration(x_ranges, peq, proc_nb=4)
    df2 = calibration(x_ranges, peq, dist_map=dist_map, proc_nb=4)
    df1.to_pickle('df1.pkl')
    df2.to_pickle('df2.pkl')

x = plot_cal(df1, df2, x_ranges)

As we can see, the results are quite good. The inferred parameters (marked as dots at the NSE maxima) when we only have stage information are very close to the ones found when we have discharge information. However, this depends on the parameters:

X1: the marginal NSE is less sensitive when using stage than when using discharge, yet the maxima are almost aligned.
X2: when using both stage or discharge, the marginal NSE is very flat and doesn't allow to determine a clear maximum (even more so when using stage). This parameter may not have such a big influence on the model's performance.
X3: the marginal NSE is less sensitive when using stage than when using discharge, yet the maxima are almost aligned.
X4: the marginal NSE when using stage is almost as sensitive as it is when using discharge, and the maxima are almost aligned.

It has to be said that the Monte Carlo approach used for the calibration method is much more CPU-intensive than the simplex algorithm. But one advantage of the former is that it is parallelizable, which is not the case for the latter. A good algorithm should be able to take advantage of the increasing number of cores at our disposal in modern CPUs (not to mention GPUs and FPGAs). And as was already said, the Monte Carlo based algorithm gives us a visibility on the likeliness of the inferred parameters. If the simplex algorithm tries to walk blindly inside a labyrinth, the Monte Carlo algorithm sheds light on the whole place.

Let's conclude with a side product of calibration using stage information: inferring the rating curve. Indeed, we have been doing so implicitly each time we were comparing the simulated discharge to the reference stage time series. We just need to plot the sorted simulated discharge values as a function of the sorted reference stage values. In the following figure, showing the inferred rating curve and the true rating curve (which was arbitrarily chosen), we can see that the match is quite good.

def plot_rating_curve(peq):
    plt.figure()
    plt.plot(np.sort(peq['h']), np.sort(peq['q']), label='True')
    plt.plot(np.sort(peq['h']), np.sort(peq['q_sim']), label='Inferred')
    plt.legend(loc='upper left')
    plt.xlabel('River stage')
    plt.ylabel('Discharge (mm/day)')
    plt.title('Rating curve')
    plt.show()

peq['q_sim'] = gr4j(x[1]).run([peq['p'].values, peq['e'].values])[0]
plot_rating_curve(peq)

The difference we see in the rating curves are likely caused by the hydrological model errors. To verify that this is the case, let's replace the reference discharge with a simulated one:

x = [334.6954615, -0.64015554, 55.56161722, 3.4764825]
peq['q'] = gr4j(x).run([peq['p'].values, peq['e'].values])[0]
peq['h'] = (peq['q'] ** 0.5) * 0.1 + 1

try:
    df3 = pd.read_pickle('df3.pkl')
    df4 = pd.read_pickle('df4.pkl')
except:
    df3 = calibration(x_ranges, peq, proc_nb=4)
    df4 = calibration(x_ranges, peq, dist_map=dist_map, proc_nb=4)
    df3.to_pickle('df3.pkl')
    df4.to_pickle('df4.pkl')

x = plot_cal(df3, df4, x_ranges)

Now that the model is perfect (it is able to simulate exactly the reference discharge), the parameters are inferred when we reach NSE=1 with both discharge information and stage information.

peq['q_sim'] = gr4j(x[1]).run([peq['p'].values, peq['e'].values])[0]
plot_rating_curve(peq)

We can see that it is a perfect match for the lower range of stage/discharge values. The differences for the higher range are probably due to the fact that these values are less frequent than the lower ones.