
    Stochastic modelling of low flows: a literature review

    The continuing growth of the world population and the rising standard of living in parts of the planet place ever greater pressure on the quantitative and qualitative demand for water resources, calling for more adequate management. In order to assess the reliability of a water resources system and to determine how it should be operated during a low-flow period, a modelling tool is useful. We present here a synthesis of the modelling work carried out within the stochastic approach. We first clarify the difference between a drought and a low-flow period, two terms that are often confused in the literature, and then present some indicators of these events. The stochastic approach can be subdivided into two categories: frequency analysis and stochastic processes. Most frequency analysis studies aim to compute critical low-flow discharges xT corresponding to a given return period T, such that P(X<xT)=1/T. The stochastic process approach consists in modelling deficit events or the variables of interest without directly using streamflow models. Flow frequency analysis does not account for durations and relies on overly simplistic stationarity assumptions. Run analysis yields duration distributions only for very simple flow processes. The advantage of the point process approach over run analysis is that it allows the study of complex, dependent and non-stationary processes. Moreover, alternating point processes allow the modelling of durations and the synthetic generation of the occurrence times of surplus and deficit sequences. In this article we review low-flow modelling studies based on frequency analysis, run theory and point processes. We have not included studies that derive low-flow distributions from physical models, nor regional studies. The increasing pressure on water resources requires better management of water-deficit situations, be they unusual droughts or yearly recurring low flows. It is therefore important to model the occurrence of these deficit events in order to quantify the related risks. Many approaches exist for the modeling of low-flow/drought events. We present here a literature review of the stochastic methods. We start by clarifying the difference between low flows and droughts, two terms which are often used interchangeably. We then present some low-flow and drought indicators. The stochastic approach may be divided into two categories: frequency analysis and stochastic processes. Most frequency analysis studies aim to assign to a flow value X a cumulative frequency, either directly using empirical distribution functions, or by fitting a theoretical distribution. This allows the computation of a critical flow xT corresponding to a return period T, such that P(X<xT)=1/T. These studies mostly use the annual minima of daily flows, where the hydrological data are assumed independent and identically distributed. It is also common to analyze Qm, the annual minimum of the m-consecutive-day average flow, m being generally 7, 10, 30, 60, 90, or 120 days, and to adopt as critical flow the m-day average having a return period of T years.
The distributions which are used include the Normal, Weibull, Gumbel, Gamma, Log-Normal (2), Log-Pearson (3), Generalized Extreme Value, Pearson type 3, and Pearson type 5 distributions (GUMBEL, 1954; MATALAS, 1963; BERNIER, 1964; JOSEPH, 1970; CONDIE and NIX, 1975; HOANG, 1978; TASKER, 1987; RAYNAL-VILLASENOR and DOURIET, 1987; NATHAN and MCMAHON, 1990; ADAMCZYK, 1992). The approach using stochastic processes for low flows may be direct (analytical) or indirect (experimental) (YEVJEVICH et al., 1983). The indirect approach (not described in this literature review) consists of obtaining flow models, generating synthetic flows and then empirically studying certain drought variables obtained from the synthetic data. The direct approach models deficit events and related variables without explicitly modeling flows. The stochastic processes are of two types and differ in the way randomness is introduced in the model: (1) state modeling, in which the process is modeled as a probabilistic transition between various states (Markov processes, for example); the states of the process {Xt} are obtained from the hydrological observations {Yt} using thresholds, the number of states of {Xt} is finite, and run series analysis may be used to study the properties of the drought parameters; or (2) event modeling, in which the concept of random occurrence of an event is introduced, an event being a transition between surplus and deficit and vice versa; in this approach, stochastic point processes are appropriate, and a deficit event is considered a rare event characterized by its occurrence time. We review the low-flow studies based on frequency analysis, run series analysis and point processes. However, we do not include physically-based models nor regional analysis studies. Run series analysis is applied to processes derived from flows and thresholds. A two-state process is obtained and Markov processes are often applied. The variables of interest are the duration of a deficit, defined by the run length of the series below the threshold (RL); the severity, corresponding to the deficit volume over a negative run of length n (RSn); and the intensity In, defined by the ratio RSn/RL (SALDARRIAGA and YEVJEVICH, 1970; SEN, 1977; MILLAN and YEVJEVICH, 1971; MILLAN, 1972; SEN, 1980a; SEN, 1980b; SEN, 1980c; GÜVEN, 1983; MOYÉ et al., 1988; SEN, 1990). It is usually assumed that the flow process is either independent or autoregressive of order 1 and that it is stationary, except in SEN (1980b). Point processes are based on the notion of the occurrence of an event. They are defined by the occurrence time tj of an event ej. We present a classification of some of the pertinent processes and their relation to each other. These include the homogeneous and non-homogeneous Poisson processes, the renewal process, the doubly stochastic process and the self-exciting process. These processes are well suited for obtaining models of deficit durations (NORTH, 1981; LEE et al., 1986; ZELENHASIC and SALVAI, 1987; CHANG, 1989; MADSEN and ROSBJERG, 1995; ABI-ZEID, 1997). The advantage of this approach is its ability to take into account nonstationarity, where alternating surplus-deficit point processes are defined from daily flow data.
ABI-ZEID (1997) proposed a physically-based alternating non-homogeneous Poisson process that takes into account precipitation and temperature, and defined low-flow risk indices computed from these developed models. In conclusion, we note that frequency analysis does not properly take into account duration aspects and relies on simplifying stationarity hypotheses. Run series analysis provides duration distributions only for simple flow processes. The advantage of point processes is that they can model complex, dependent and non-stationary processes. Furthermore, alternating point processes can be used to model deficit durations and to generate synthetic data such as occurrences of deficit and surplus events. We argue that the duration of low flows is an important issue which has not received much attention.
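As a minimal illustration of the frequency-analysis step described above, the sketch below fits a two-parameter lognormal distribution (one of the candidate laws cited in the review) to hypothetical annual minima of 7-day average flows and computes the T-year low-flow quantile xT defined by P(X<xT)=1/T. The record, the distribution choice and the helper function are assumptions for illustration only.

```python
# Sketch: low-flow frequency analysis on annual 7-day minima (hypothetical data).
import numpy as np
from scipy import stats

daily_flows = np.random.default_rng(42).gamma(shape=2.0, scale=5.0, size=365 * 30)  # fake 30-year record, m3/s
years = daily_flows.reshape(30, 365)

def q7_annual_minima(flows_by_year, m=7):
    # Annual minimum of the m-day moving-average flow (here Q7).
    kernel = np.ones(m) / m
    return np.array([np.convolve(y, kernel, mode="valid").min() for y in flows_by_year])

q7 = q7_annual_minima(years)

# Fit a 2-parameter lognormal (location fixed at zero).
shape, loc, scale = stats.lognorm.fit(q7, floc=0.0)

# T-year low-flow quantile: P(X < x_T) = 1/T, i.e. the 1/T quantile of the fitted law.
T = 10.0
x_T = stats.lognorm.ppf(1.0 / T, shape, loc=loc, scale=scale)
print(f"Estimated 10-year 7-day low flow: {x_T:.2f} m3/s")
```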

    A general methodology for comparing regional flood estimation models

    The flood discharge QT with return period T at a site is generally estimated by fitting a statistical distribution to the annual maximum discharge data of that site. However, estimation at a site where few or no hydrological data are available must be carried out by regional methods, which use the information existing at sites that are hydrologically similar to the target site. This procedure is carried out in two steps: (a) determination of the hydrologically similar sites; (b) regional estimation. For a given delineation (step a), we propose three methodological approaches for comparing the different regional estimation methods. These approaches are described in detail in this paper; they are bootstrap simulation, regression analysis or empirical Bayes, and hierarchical Bayesian analysis. Estimation of design flows with a given return period is a common problem in hydrologic practice. At sites where data have been recorded during a number of years, such an estimation can be accomplished by fitting a statistical distribution to the series of annual maximum floods and then computing the (1 - 1/T)-quantile of the estimated distribution. However, frequently there are no, or only few, data available at the site of interest, and flood estimation must then be based on regional information. In general, regional flood frequency analysis involves two major steps: determination of a set of gauging stations that are assumed to contain information pertinent to the site of interest, referred to as delineation of homogeneous regions; and estimation of the design flood at the target site based on information from the sites of the homogeneous region. The merits of regional flood frequency analysis, at ungauged sites as well as at sites where some local information is available, are increasingly being acknowledged, and many research papers have addressed the issue. New methods for delineating regions and for estimating floods based on regional information have been proposed in the last decade, but scientists tend to focus on the development of new techniques rather than on testing existing ones. The aim of this paper is to suggest methodologies for comparing different regional estimation alternatives. The concept of homogeneous regions has been employed for a long time in hydrology, but a rigorous definition of it has never been given. Usually, the homogeneity concerns dimensionless statistical characteristics of hydrological variables such as the coefficient of variation (Cv) and the coefficient of skewness (Cs) of annual flood series. A homogeneous region can then be thought of as a collection of stations with flood series whose statistical properties, except for scale, are not significantly different from the regional mean values. Tests based on L-moments are at present widely applied for validating the homogeneity of a given region. Early approaches to regional flood frequency analysis were based on geographical regions, but recent tendencies are to delineate homogeneous regions from the similarity of basins in the space of catchment characteristics which are related to hydrologic characteristics. Cluster analysis can be used to group similar sites, but has the disadvantage that a site in the vicinity of a cluster border may be closer to sites in other clusters than to those of its own group.
Burn (1990a, b) has recently suggested a method where each site has its own homogeneous region (or region of influence), in which it is located at the centre of gravity. Once a homogeneous region has been delineated, a regional estimation method must be selected. The index flood method, proposed by Dalrymple (1960), and the direct regression method are among the most commonly used procedures. Cunnane (1988) provides an overview of several other methods. The general performance of a regional estimation method depends on the amount of regional information (hydrological as well as physiographical and climatic), and on the size and homogeneity of the region considered relevant to the target site. Being strongly data-dependent, comparisons of regional models will be valid on a local scale only. Hence, one cannot expect to reach a general conclusion regarding the relative performance of different models, although some insight may be gained from case studies. Here, we present methodologies for comparing regional flood frequency procedures (combinations of homogeneous regions and estimation methods) for ungauged sites. Hydrological, physiographical and climatic data are assumed to be available at a large number of sites, because a comparison of regional models must be based on real data. The premises of these methodologies are that at each gauged site in the collection of stations considered, one can obtain an unbiased at-site estimate of a given flood quantile, and that the variance of this estimate is known. Regional estimators, obtained by ignoring the hydrological data at the target site, are then compared to the at-site estimate. Three different methodologies are considered in this study. (A) Bootstrap simulation of hydrologic data: in order to preserve the spatial correlation of hydrologic data (which may have an important impact on regional flood frequency procedures), we suggest performing bootstrap simulation of vectors rather than scalar values. Each vector corresponds to a year for which data are available at one or more sites in the considered selection of stations; the elements of the vectors are the different sites. For a given generated data scenario, an at-site estimate and a regional estimate can be calculated at each site considered. As a performance index for a given regional model, one can use, for example, the average (over sites and bootstrap scenarios) relative deviation of the regional estimator from the at-site estimator. (B) Regression analysis: the key idea in this methodology is to perform a regression analysis with a regional estimator as an explanatory variable and the unknown quantile, estimated by the at-site method, as the dependent variable. It is reasonable to assume a linear relation between the true quantiles and the regional estimators. The estimated regression coefficients express the systematic error, or bias, of a given regional procedure, and the model error, estimated for instance by the method of moments, is a measure of its variance. It is preferable that the bias and the variance be as small as possible, suggesting that these quantities be used to rank different regional procedures. (C) Hierarchical Bayes analysis: the regression method employed in (B) can also be regarded as the result of an empirical Bayes analysis in which point estimates of the regression coefficients and model error are obtained.
For several reasons, it may be advantageous to proceed with a complete Bayesian analysis in which bias and model error are considered as uncertain quantities, described by a non-informative prior distribution. Combination of the prior distribution and the likelihood function yields, through Bayes' theorem, the posterior distribution of bias and model error. In order to compare different regional models, one can then calculate, for example, the mean or the mode of this distribution and use these values as performance indices, or one can compute the posterior loss.
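To make methodology (A) concrete, here is a minimal sketch, under stated assumptions, of year-wise bootstrap resampling that preserves cross-site correlation: whole years (vectors of simultaneous observations at the sites) are resampled with replacement, and the relative deviation between a placeholder regional estimator and the at-site estimator is averaged over sites and bootstrap scenarios. The estimators are simple stand-ins, not the procedures compared in the paper.

```python
# Sketch: year-wise bootstrap for comparing a regional estimator with at-site estimates.
import numpy as np

rng = np.random.default_rng(0)
n_years, n_sites = 40, 12
annual_maxima = rng.gumbel(loc=100, scale=30, size=(n_years, n_sites))  # hypothetical data

def at_site_q100(x):
    # Placeholder at-site estimator: Gumbel 100-year quantile from moment estimates.
    beta = np.sqrt(6) * x.std(ddof=1) / np.pi
    mu = x.mean() - 0.5772 * beta
    return mu - beta * np.log(-np.log(1 - 1 / 100))

def regional_q100(data, target):
    # Placeholder index-flood-like estimator: growth factor from the other sites.
    others = np.delete(data, target, axis=1)
    growth = np.mean([at_site_q100(others[:, j]) / others[:, j].mean() for j in range(others.shape[1])])
    return growth * data[:, target].mean()  # in practice the index flood would come from basin characteristics

scores = []
for _ in range(200):  # bootstrap scenarios: resample whole years to keep spatial correlation
    sample = annual_maxima[rng.integers(0, n_years, n_years), :]
    for s in range(n_sites):
        q_site = at_site_q100(sample[:, s])
        q_reg = regional_q100(sample, s)
        scores.append(abs(q_reg - q_site) / q_site)

print(f"Mean relative deviation of the regional estimator: {np.mean(scores):.3f}")
```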

    Nonparametric estimation of flood quantiles by the kernel method

    Determining the flood discharge for a given return period requires estimating the distribution of annual floods. The use of nonparametric distributions, as an alternative to statistical laws, is examined in this work. The main challenge in kernel estimation lies in computing the parameter that determines the degree of smoothing of the nonparametric density. We compared several methods and retained the plug-in method and the least-squares cross-validation method as the most promising. Several interesting conclusions were drawn from this study. Among others, for the estimation of flood quantiles, it appears preferable to consider estimators based directly on the distribution function rather than on the density function. A comparison of the plug-in method with the fitting of three statistical distributions led to the conclusion that the kernel method represents an interesting alternative to traditional parametric methods. Traditional flood frequency analysis involves the fitting of a statistical distribution to observed annual peak flows. The choice of statistical distribution is crucial, since it can have a significant impact on design flow estimates. Unfortunately, it is often difficult to determine in an objective way which distribution is the most appropriate. To avoid the inherent arbitrariness associated with the choice of distribution in parametric frequency analysis, one can employ a method based on nonparametric density estimation. Although potentially subject to a larger standard error of quantile estimates, the use of nonparametric densities eliminates the need for selecting a particular distribution and the potential bias associated with a wrong choice. The kernel method is a conceptually simple approach, similar in nature to a smoothed histogram. The critical parameter in kernel estimation is the smoothing parameter that determines the degree of smoothing. Methods for estimating the smoothing parameter have already been compared in a number of statistical papers. The novelty of our work is the particular emphasis on quantile estimation, in particular the estimation of quantiles outside the range of observed data. The flood estimation problem is unique in this sense and has been the motivating factor for this study. Seven methods for estimating the smoothing parameter are compared in the paper. All methods are based on some goodness-of-fit measure. More specifically, we considered the least-squares cross-validation method, the maximum likelihood cross-validation method, Adamowski's (1985) method, a plug-in method developed by Altman and Leger (1995) and modified by the authors (Faucher et al., 2001), Breiman's goodness-of-fit criterion method (Breiman, 1977), the variable-kernel maximum likelihood method, and the variable-kernel least-squares cross-validation method. The estimation methods can be classified according to whether they are based on fixed or variable kernels, and whether they are based on the goodness-of-fit of the density function or of the cumulative distribution function. The quality of the different estimation methods was explored in a Monte Carlo study. One hundred (100) samples of sizes 10, 20, 50, and 100 were simulated from an LP3 distribution. The nonparametric estimation methods were then applied to each of the simulated samples, and quantiles with return periods of 10, 20, 50, 100, 200, and 1000 years were estimated.
Bias and root mean square error of the quantile estimates were the key figures used to compare methods. The results of the study can be summarized as follows. 1. Comparison of kernels: the literature reports that the kernel choice is relatively unimportant compared to the choice of the smoothing parameter. To determine whether this assertion also holds in the case of the estimation of large quantiles outside the range of data, we compared six different kernel candidates. We found no major differences between the biweight, the Normal, the Epanechnikov, and the EV1 kernels. However, the rectangular and the Cauchy kernels should be avoided. 2. Comparison of sample sizes: the quality of estimates, whether parametric or nonparametric, deteriorates as sample size decreases. To examine the degree of sensitivity to sample size, we compared estimates of the 200-year event obtained by assuming a GEV distribution and a nonparametric density estimated by maximum likelihood cross-validation. The main conclusion is that the root mean square error for the parametric model (GEV) is more sensitive to sample size than for the nonparametric model. 3. Comparison of estimators of the smoothing parameter: among the methods considered in the study, the plug-in method, developed by Altman and Leger (1995) and modified by the authors (Faucher et al., 2001), turned out to perform the best, along with the least-squares cross-validation method, which had a similar performance. Adamowski's method had to be excluded, because it consistently failed to converge. The methods based on variable kernels generally did not perform as well as the fixed-kernel methods. 4. Comparison of density-based and cumulative distribution-based methods: the only cumulative distribution-based method considered in the comparison study was the plug-in method. Adamowski's method is also based on the cumulative distribution function, but was rejected for the reasons mentioned above. Although the plug-in method did well in the comparison, it is not clear whether this can be attributed to the fact that it is based on estimation of the cumulative distribution function. However, one could hypothesize that when the objective is to estimate quantiles, a method that emphasizes the cumulative distribution function rather than the density should have certain advantages. 5. Comparison of parametric and nonparametric methods: nonparametric methods were compared with conventional parametric methods. The LP3, the 2-parameter lognormal, and the GEV distributions were used to fit the simulated samples. It was found that the nonparametric methods perform quite similarly to the parametric methods. This is a significant result, because the data were generated from an LP3 distribution, so one would intuitively expect the LP3 model to be superior, which, however, was not the case. In actual applications, flood distributions are often irregular, and in such cases nonparametric methods would likely be superior to parametric methods.
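The following sketch illustrates the kind of estimator favoured above: a Gaussian-kernel estimate of the cumulative distribution function of annual maxima, inverted numerically to obtain a flood quantile. The bandwidth is a simple rule-of-thumb value standing in for the plug-in or cross-validation choices compared in the paper, and the data are hypothetical.

```python
# Sketch: kernel estimate of the flood CDF and a quantile read off by root-finding.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(1)
annual_max = rng.gumbel(loc=200, scale=60, size=50)  # hypothetical annual peak flows

# Rule-of-thumb bandwidth (placeholder for the plug-in / cross-validation estimates).
h = 1.06 * annual_max.std(ddof=1) * len(annual_max) ** (-1 / 5)

def kernel_cdf(x, data=annual_max, h=h):
    # Smoothed empirical CDF: average of Gaussian kernel CDFs centred on the observations.
    return norm.cdf((x - data) / h).mean()

def kernel_quantile(p):
    # Invert the smoothed CDF on a wide bracket around the data.
    lo, hi = annual_max.min() - 10 * h, annual_max.max() + 10 * h
    return brentq(lambda x: kernel_cdf(x) - p, lo, hi)

T = 100
print(f"Nonparametric {T}-year flood estimate: {kernel_quantile(1 - 1 / T):.1f}")
```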

    A fuzzy approach for determining the region of influence of a hydrometric station

    The notion of partial membership of a hydrometric station in a hydrologic region is modelled by a membership function obtained by applying the concepts of fuzzy analysis. Hydrometric stations are represented in planes whose axes are hydrologic and/or physiographic attributes, and hydrologic regions are treated as fuzzy subsets. A clustering method based on coherence (Iphigénie) establishes equivalence classes for the fuzzy relation "there is no incoherence between the elements of the same class": these equivalence classes represent the fuzzy regions. In this case the membership function is crisp. By contrast, the second method, of the fuzzy mobile-centres type (ISODATA), assigns a station a degree of membership in a fuzzy region within the interval [0,1], reflecting the degree to which the station belongs to a given group (the number of groups being chosen heuristically beforehand). For the case treated here (the Tunisian hydrometric network, annual maximum flood discharges), it turns out, however, that the fuzzy character of the stations is not very pronounced. On the basis of the clusters obtained with Iphigénie and the fuzzy regions obtained with ISODATA, a regional estimation of the maximum flood discharge with a 100-year return period is carried out. It is then compared with the regional estimate obtained by the region of influence method, as well as with the estimate using only at-site data, under the hypothesis that the parent populations are a two-parameter Gamma distribution and a three-parameter Pareto distribution. The concept of partial membership of a hydrometric station in a hydrologic region is modeled using fuzzy set theory. Hydrometric stations are represented in spaces of hydrologic attributes (coefficient of variation: CV, coefficient of skewness: CS, and their counterparts based on L-moments: L-CV and L-CS) and/or physiographic attributes (watershed surface: S, specific discharge: Qs = Qmean/S, and a shape index: Ic). Two fuzzy clustering methods are considered. First, a clustering method by coherence (Iphigénie) is considered. It is based on the principle of transitivity: if two pairs of stations (A,B) and (B,C) are known to be "close" to one another, then it is incoherent to state that A is "far" from C. Using a Euclidean distance, all pairs of stations are sorted from the closest pairs to the farthest. Then, the pairs of stations starting and ending this list are removed and classified respectively as "close" and "far". The process is continued until an incoherence is detected. Clusters of stations are then determined from the graph of "close" stations. A disadvantage of Iphigénie is that crisp (non-fuzzy) membership functions are obtained. A second method of clustering is considered (ISODATA), which consists of minimizing the fuzziness of clusters as measured by an objective function, and which can assign any degree of membership between 0 and 1 to a station to reflect its partial membership in a hydrologic region. It is a generalization of the classical method of mobile centers, in which crisp clusters minimizing entropy are obtained.
When using Iphigénie, the number of clusters is determined automatically by the method, but for ISODATA it must be determined beforehand. An application of both clustering methods to the Tunisian hydrometric network (which consists of 39 stations, see Figure 1) is considered, with the objective of obtaining regional estimates of the flood frequency curves. Four planes are considered, P1: (Qs,CV), P2: (CS,CV), P3: (L-CS,L-CV), and P4: (S,Ic), based on a correlation study of the available variables (Table 1). Figures 2, 3a, 4 and 5 show the clusters obtained using Iphigénie for planes P1 through P4. Estimates of skewness (CS) being quite biased and variable for small sample sizes, it was decided to determine the influence of sample size on the clusters obtained for P2. Figure 3b shows the clusters obtained when the network is restricted to the 20 stations for which at least 20 observations of the annual maximum flood are available. Fewer clusters are obtained than in Figure 3a, but it can be observed that the structure is the same: the additional clusters appearing in Figure 3a may be obtained by breaking up certain large clusters of Figure 3b. In Figure 3c, the sample size of each of the 39 stations of the network is plotted in the plane (CS,CV), to see if extreme estimated values of CS and CV were caused by small samples. This does not seem to be the case, since many of the most extreme points correspond to long series. ISODATA was also applied to the network. Based on entropy criteria (Table 2, Figures 6a and 6b), the number of clusters for ISODATA was set to 4. It turns out that the groups obtained using ISODATA are not very fuzzy. The fuzzy groups determined by ISODATA are generally conditioned by only one variable: for the hydrologic spaces (P2 and P3) it is skewness (CS and L-CS), and for the physiographic spaces (P1 and P4) it is surface (Qs and S). This is shown in Figures 7a-7d, which respectively show the fuzzy clusters obtained for planes P1-P4; only the iso-membership lines of level 0.9 were plotted to facilitate the analysis. Regionalization of the 100-year return period flood is performed based on the homogeneous groups obtained (using an index-flood method), and compared to the well-known region of influence (ROI) approach, both under the hypothesis of a 2-parameter Gamma distribution and of a 3-parameter Pareto distribution. For the ROI approach, the threshold corresponding to the size of the ROI of a station is taken to be the distance at which an incoherence first appeared when applying Iphigénie. The correlation of the regional estimate with the local estimate for space P1 is 0.91 for Iphigénie and 0.85 for both ISODATA and the ROI approach. The relative bias of regional estimates of the 100-year flood based on P1 is plotted in Figure 9 (Gamma distribution) and Figure 10 (Pareto distribution). The three methods considered give similar results for a Gamma distribution, but Iphigénie estimates are less biased when a Pareto distribution is used. Thus Iphigénie appears superior, in this case, to ISODATA and ROI. Values of bias and standard error for all four planes are given for Iphigénie in Table 3. Application of an index-flood regionalization approach at ungauged sites requires the estimation of the mean flow (also called the flood index) from physiographic attributes. A regression study shows that the best explanatory variables are the watershed surface S, the shape index Ic and the average slope of the river.
In Figure 8, the observed flood index is plotted against the flood index obtained by regression; the correlation coefficient is 0.93. Iphigénie and ISODATA could also be used in conjunction with other regionalization methods. For example, when using the ROI approach, it is necessary to determine, somewhat arbitrarily, the ROI threshold. It has been shown that this threshold is a byproduct of the use of Iphigénie. ISODATA is most useful for pattern identification when the data are very fuzzy, unlike the example considered in this paper. But even in the case of the Tunisian network, its application gives indications as to which variables (skewness and surface) are most useful for clustering.
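As an illustration of the ISODATA-type step described above, the following sketch runs a generic fuzzy c-means clustering of stations in a (CS, CV) attribute plane and prints each station's membership degrees. The data, the number of clusters and the fuzziness exponent are illustrative assumptions, not the values used in the study.

```python
# Sketch: fuzzy c-means clustering of stations in a (CS, CV) attribute plane.
import numpy as np

rng = np.random.default_rng(2)
stations = np.vstack([rng.normal([0.4, 0.3], 0.05, (20, 2)),   # hypothetical (CS, CV) pairs
                      rng.normal([1.2, 0.6], 0.08, (19, 2))])
c, m, n_iter = 2, 2.0, 100                                      # clusters, fuzziness exponent, iterations

# Random initial membership matrix U (each row sums to 1).
U = rng.dirichlet(np.ones(c), size=len(stations))
for _ in range(n_iter):
    centers = (U ** m).T @ stations / (U ** m).sum(axis=0)[:, None]          # weighted cluster centres
    dist = np.linalg.norm(stations[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    U = 1.0 / (dist ** (2 / (m - 1)) * (1.0 / dist ** (2 / (m - 1))).sum(axis=1, keepdims=True))

for i in range(3):  # membership degrees of the first few stations in each fuzzy region
    print(f"station {i}: memberships {np.round(U[i], 2)}")
```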

    A literature review of streamflow forecasting methods

    In the field of streamflow forecasting, a wide variety of methods are available: stochastic and conceptual models, but also more novel approaches such as artificial neural networks, fuzzy rule-based models, the k-nearest neighbor method, fuzzy regression and regression splines. After a detailed review of these methods and of their recent applications, we propose a classification that highlights the differences, but also the similarities, between these approaches. They are then compared for the distinct problems of short-, medium- and long-term forecasting. Our recommendations also vary with the level of prior information. For example, when long, stationary time series are available, we recommend the nonparametric k-nearest neighbor method for short- and medium-term forecasts. Conversely, for longer-term forecasting from a limited number of observations, we suggest using a conceptual model coupled with a meteorological model based on the historical record. Although the emphasis is on the streamflow forecasting problem, much of this review, mainly the part dealing with empirical models, is also relevant to the forecasting of other variables. A large number of models are available for streamflow forecasting. In this paper we classify and compare nine types of models for short-, medium- and long-term flow forecasting, according to six criteria: 1. validity of underlying hypotheses, 2. difficulties encountered when building and calibrating the model, 3. difficulties in computing the forecasts, 4. uncertainty modeling, 5. information required by each type of model, and 6. parameter updating. We first distinguish between empirical and conceptual models, the difference being that conceptual models correspond to simplified representations of the watershed, while empirical models only try to capture the structural relationships between inputs to the watershed and outputs, such as streamflow. Amongst empirical models, we distinguish between stochastic models, i.e. models based on the theory of probability, and non-stochastic models. Three types of stochastic models are presented: statistical regression models, Box-Jenkins models, and the nonparametric k-nearest neighbor method. Statistical linear regression is only applicable for long-term forecasting (monthly flows, for example), since it requires independent and identically distributed observations. It is a simple method of forecasting, and its hypotheses can be validated a posteriori if sufficient data are available. Box-Jenkins models include linear autoregressive models (AR), linear moving-average models (MA), linear autoregressive moving-average models (ARMA), periodic ARMA models (PARMA) and ARMA models with auxiliary inputs (ARMAX). They are better suited to weekly or daily flow forecasting, since they allow for the explicit modeling of time dependence. Efficient methods are available for designing the model and updating the parameters as more data become available. For both statistical linear regression and Box-Jenkins models, the inputs must be uncorrelated and linearly related to the output. Furthermore, the process must be stationary. When it is suspected that the inputs are correlated or have a nonlinear effect on the output, the k-nearest neighbor method may be considered.
This data-based nonparametric approach simply consists in looking, among past observations of the process, for the k events which are most similar to the present situation. A forecast is then built from the flows which were observed for these k events. Obviously, this approach requires a large database and a stationary process. Furthermore, the time required to calibrate the model and compute the forecasts increases rapidly with the size of the database. A clear advantage of stochastic models is that forecast uncertainty may be quantified by constructing a confidence interval. Three types of non-stochastic empirical models are also discussed: artificial neural networks (ANN), fuzzy linear regression and multivariate adaptive regression splines (MARS). ANNs were originally designed as simple conceptual models of the brain. However, for forecasting purposes, these models can be thought of simply as a subset of nonlinear empirical models. In fact, the ANN model most commonly used in forecasting, the multi-layer feed-forward network, corresponds to a nonlinear autoregressive model (NAR). To capture the moving-average components of a time series, it is necessary to use recurrent architectures. ANNs are difficult to design and calibrate, and the computation of forecasts is also complex. Fuzzy linear regression makes it possible to extract linear relationships from small data sets, with fewer hypotheses than statistical linear regression. It does not require the observations to be uncorrelated, nor does it require the error variance to be homogeneous. However, the model is very sensitive to outliers. Furthermore, a posteriori validation of the hypothesis of linearity is not possible for small data sets. MARS models are based on the hypothesis that time series are chaotic instead of stochastic. The main advantage of the method is its ability to model non-stationary processes. The approach is nonparametric, and therefore requires a large data set. Amongst conceptual models, we distinguish between physical models, hydraulic machines, and fuzzy rule-based systems. Most conceptual hydrologic models are hydraulic machines, in which the watershed is considered to behave like a network of reservoirs. Physical modeling of a watershed would imply using fundamental physical equations at a small scale, such as the law of conservation of mass. Given the complexity of a watershed, this can be done in practice only for water routing. Consequently, only short-term flow forecasts can be obtained from a physical model, since the effects of precipitation, infiltration and evaporation must be negligible. Fuzzy rule-based systems make it possible to model the water cycle using fuzzy IF-THEN rules, such as IF it rains a lot in a short period of time, THEN there will be a large flow increase following the concentration time. Each fuzzy quantifier is modeled using a fuzzy number to take into account the uncertainty surrounding it. When sufficient data are available, the fuzzy quantifiers can be constructed from the data. In general, conceptual models require more effort to develop than empirical models. However, for exceptional events, conceptual models can often provide more realistic forecasts, since empirical models are not well suited for extrapolation. A fruitful approach is to combine conceptual and empirical models.
One way of doing this, called extended streamflow prediction (ESP), is to combine a stochastic model for generating meteorological scenarios with a conceptual model of the watershed. Based on this review of flow forecasting models, we recommend for short-term forecasting (hourly and daily flows) the use of the k-nearest neighbor method, Box-Jenkins models, water routing models or hydraulic machines. For medium-term forecasting (weekly flows, for example), we recommend the k-nearest neighbor method and Box-Jenkins models, as well as fuzzy rule-based and ESP models. For long-term forecasting (monthly flows), we recommend statistical and fuzzy regression, Box-Jenkins, MARS and ESP models. It is important to choose a type of model which is appropriate for the problem at hand and for which the available information is sufficient. Since each type of model has its own advantages, it can be more efficient to combine different approaches when forecasting streamflow.
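The k-nearest neighbor forecast recommended above for short- and medium-term horizons can be sketched as follows: find the k past situations (vectors of recent flows) most similar to the present one and average the flows that followed them. The synthetic series, the feature window and k are assumptions for illustration.

```python
# Sketch: k-nearest neighbour one-step-ahead streamflow forecast.
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(2000)
flows = 50 + 20 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 5, t.size)  # synthetic daily flows

def knn_forecast(history, lag=3, k=10):
    # Feature vector = the last `lag` flows; candidates = all past windows of the same length.
    target = history[-lag:]
    windows = np.lib.stride_tricks.sliding_window_view(history[:-1], lag)  # each window is followed by a known flow
    next_flows = history[lag:]                                             # flow observed right after each window
    dist = np.linalg.norm(windows - target, axis=1)
    nearest = np.argsort(dist)[:k]
    return next_flows[nearest].mean()  # forecast = average outcome of the k most similar past situations

print(f"Forecast for tomorrow: {knn_forecast(flows):.1f} (last observed: {flows[-1]:.1f})")
```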

    Use of historical information in hydrological frequency analysis

    Using historical information in a frequency analysis makes better use of the information actually available and should therefore improve the estimation of quantiles with large return periods. By historical information we mean here information about large floods that occurred before the beginning of the measurement period (the so-called systematic gauging period) of lake and river levels and discharges. It is generally observed that the use of historical information reduces the impact of outliers in the systematic records and reduces the standard deviation of the estimates. In this article we present the statistical methods that allow the modelling of historical information. Use of information about historical floods, i.e. extreme floods that occurred prior to systematic gauging, can often substantially improve the precision of flood quantile estimates. Such information can be retrieved from archives, newspapers, interviews with local residents, or by use of paleohydrologic and dendrohydrologic traces. Various statistical techniques for incorporating historical information into frequency analyses are discussed in this review paper. The basic hypothesis in the statistical modeling of historical information is that a certain perception water level exists and that, during a given historical period preceding the period of gauging, all exceedances of this level have been recorded, be it in newspapers, in people's memory, or through traces in the catchment such as sediment deposits or traces on trees. No information is available on floods that did not exceed the perception threshold. It is further assumed that a period of systematic gauging is available. Figure 1 illustrates this situation. The U.S. Water Resources Council (1982) recommended the use of the method of adjusted moments for fitting the log Pearson type III distribution. A weighting factor is applied to the data below the threshold observed during the gauged period to account for the missing data below the threshold in the historical period. Several studies have pointed out that the method of adjusted moments is inefficient. Maximum likelihood estimators based on partially censored data have been shown to be much more efficient and to provide a practical framework for incorporating imprecise and categorical data. Unfortunately, for some of the most common 3-parameter distributions used in hydrology, the maximum likelihood method poses numerical problems. Recently, some authors have proposed the use of the method of expected moments, a variant of the method of adjusted moments which gives less weight to observations below the threshold. According to preliminary studies, estimators based on expected moments are almost as efficient as maximum likelihood estimators, but have the advantage of avoiding the numerical problems related to the maximization of likelihood functions. Several studies have emphasized the potential gain in estimation accuracy from the use of historical information. Because historical floods are by definition large, their introduction in a flood frequency analysis can have a major impact on estimates of rare floods. This is particularly true when 3-parameter distributions are considered. Moreover, the use of historical information is a means to increase the representativeness of an outlier in the systematic data.
For example, an extreme outlier will not get the same weight in the analysis if one can state with certainty that it is the largest flood in, say, 200 years, and not only the largest flood in, say, 20 years of systematic gauging. Historical data are generally imprecise, and their inaccuracy should be properly accounted for in the analysis. However, even with substantial uncertainty in the data, the use of historical information is a viable means to improve estimates of rare floods.
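A minimal sketch of the censored-data likelihood idea discussed above, under simplifying assumptions: annual maxima follow a Gumbel distribution, the systematic record is fully observed, and during a historical period of known length only exceedances of the perception threshold were recorded. All numbers and the distribution choice are illustrative, not taken from any cited study.

```python
# Sketch: maximum likelihood with historical floods above a perception threshold (Gumbel model).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(4)
systematic = rng.gumbel(300, 80, 40)          # 40 years of gauged annual maxima (hypothetical)
threshold = 600.0                              # perception threshold
hist_years = 160                               # length of the historical period
hist_floods = np.array([640.0, 710.0, 655.0])  # recorded exceedances during that period

def neg_log_lik(params):
    mu, beta = params
    if beta <= 0:
        return np.inf
    ll = stats.gumbel_r.logpdf(systematic, mu, beta).sum()            # gauged record: exact values
    ll += stats.gumbel_r.logpdf(hist_floods, mu, beta).sum()          # historical exceedances: exact values
    ll += (hist_years - len(hist_floods)) * stats.gumbel_r.logcdf(threshold, mu, beta)  # censored years
    return -ll

res = optimize.minimize(neg_log_lik, x0=[systematic.mean(), systematic.std(ddof=1)], method="Nelder-Mead")
mu_hat, beta_hat = res.x
q100 = stats.gumbel_r.ppf(1 - 1 / 100, mu_hat, beta_hat)
print(f"100-year flood estimated with historical information: {q100:.0f}")
```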

    A review of point processes and a synthesis of statistical tests for choosing a type of process

    In this research we are interested in modelling a series of events using the theory of temporal point processes. A point process is defined as a stochastic process for which each realization constitutes a collection of points. A large number of works deal with these processes; however, few studies in the literature are concerned with the analysis of series of events. We identify two categories of event series: a series of a single type of event and a series of several types of events. The objective of this work is to highlight the various statistical tests applied to series of one or several types of events and to propose a classification of these tests. We first present a literature review of temporal point processes, together with a classification of these models. We then identify the statistical tests for series of a single type of event and examine their applicability to series of two or more types of events. The statistical tests identified are divided into four classes: graphical analysis, tests applied to the homogeneous and non-homogeneous Poisson processes, tests applied to the homogeneous renewal process, and tests for discriminating between two point processes. This work was carried out with a view to a later application within the framework of risk analysis. The results of this research show that the literature contains only tests for series of a single type of event, and that they are generally valid for the following point processes: the homogeneous Poisson process and the homogeneous renewal process. The application of these tests to series of two or more types of events is possible when the events are defined only by their number and their occurrence times, i.e. when the duration of each event is not taken into consideration. The design and management of hydraulic structures require a good knowledge of the characteristics of extreme hydrologic events, such as floods and droughts, that may occur at the site of interest. Occurrences of such events may be modelled as temporal point processes. This modelling approach allows the derivation of various performance indices related to the design and operation of this infrastructure, as well as to the quantification and management of the associated risks. In this paper, we present statistical tests that may be applied for the modelling of a series of events by temporal point processes. A point process is defined as a stochastic process for which each realisation constitutes a series of points. Although a large body of literature has dealt with temporal point processes, very few studies have focused on the analysis of a series of events. In the present paper we identify two types of series of events: the first represents a series of only one type of event, and the second represents a series of several types of events. The main objective of this research is to comprehensively review the statistical tests applied to series of one or several types of events and to propose a classification of these tests. This comprehensive review of statistical tests applied to point processes is carried out with the ultimate objective of applying these tests to real case studies within the framework of risk analysis.
For example, an extended low-flow event constitutes a risk that may place a water resources system in a state of failure. Thus, it is important to identify and quantify this risk in order to ensure the optimal management of water resources. The modelling of the observed series of events by point processes can provide statistical results such as the distribution of the number of events or the shape of the intensity function. These results are useful in a risk analysis framework, which includes two steps: risk evaluation and risk management. In the first part of the paper, a review and classification of the various temporal point processes are presented. These include the homogeneous and nonhomogeneous Poisson processes, the Negative Binomial process, the cluster point processes (such as the Neyman-Scott and the Bartlett-Lewis processes), the doubly stochastic Poisson processes, the self-exciting point processes, the homogeneous and nonhomogeneous renewal processes and the semi-Markov processes. We also illustrate the various links and relationships that exist between these point processes. This classification is elaborated by considering the homogeneous Poisson process as the starting point; the simplicity and the wide use of this process in the statistical and hydrological literature justify this choice. In the second part of the paper, statistical tests for a series of one type of event are identified. A series of events may be characterised by the number of events, by the occurrence times of the events or by the duration of each event. These characteristics are considered as random variables that must be represented by suitable statistical distributions. A series of events may also be characterised by the intensity function, which represents the instantaneous average rate of occurrence of an event. Clearly, the choice of the statistical distribution to model the number of events in a series, or of the intensity function, depends on the nature of the observed data. For example, a stationary series of events may be represented by a constant intensity function. Thus, it is necessary to conduct an analysis of the observed series of events, such as graphical analysis and statistical testing, in order to select and validate the hypotheses underlying the point process model. The hypotheses that may be verified include trend, homogeneity, periodicity, independence of the intervals between events, and the adequacy of a given distribution for the number of events and for the time intervals separating events. In the third part, the applicability of the tests identified in the second part to the case of a series of two or more types of events is examined. In this part, our goal is to analyse the global point process (or pooled output) obtained by the superposition of the p subsidiary point processes. The decomposition of the global process into p point processes necessitates an identification of each type of event, characterised generally by the number of occurrences and by the intervals between successive events of the same type. We also examine the applicability of the statistical tests identified in the second part to the case where the global point process is characterised by the duration of each type of event. We investigate more specifically the case of two subsidiary point processes (p=2) where the two event types alternate in time (an alternating point process).
Finally, the statistical tests identified in the second part are classified into four categories: tests based on graphical analysis; tests applied to the homogeneous and nonhomogeneous Poisson processes; tests applied to the homogeneous renewal process; and tests of discrimination between two specific processes. These tests of discrimination include the selection between the Poisson process and the renewal process, between the Poisson process and the Binomial point process, and finally among these three point processes: the Cox process, the Neyman-Scott process and the renewal process. The results of this research indicate that, in the past, mostly tests for a series of one type of event were presented in the literature. These tests are only valid for the following point processes: a homogeneous Poisson process or a homogeneous renewal process. The application of these tests to a series of two or several types of events is possible as long as these events are only described by their number and time of occurrence, i.e. the duration of each event cannot be taken into consideration. Otherwise, these tests are applicable to the alternating point process, which is characterised only by the number and the duration of the two types of events.
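To illustrate the kinds of tests classified above, here is a sketch of two standard checks of the homogeneous Poisson hypothesis applied to a hypothetical series of occurrence times: the Laplace test for a trend in the rate of occurrence, and a Kolmogorov-Smirnov test of exponentially distributed inter-event times (approximate, since the rate is estimated from the data). These are generic textbook tests used only as examples of the categories discussed.

```python
# Sketch: simple tests of the homogeneous Poisson hypothesis for a series of event times.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T = 100.0                                       # observation window (e.g. years)
event_times = np.sort(rng.uniform(0, T, 35))    # hypothetical occurrence times of one event type

# Laplace test: under a homogeneous Poisson process the centred mean occurrence time is ~ N(0, 1).
n = len(event_times)
u = (event_times.mean() - T / 2) / (T * np.sqrt(1 / (12 * n)))
p_trend = 2 * (1 - stats.norm.cdf(abs(u)))
print(f"Laplace trend test: U = {u:.2f}, p-value = {p_trend:.3f}")

# Inter-event times should be exponential; check with a Kolmogorov-Smirnov test (rate estimated from the data).
gaps = np.diff(event_times)
ks = stats.kstest(gaps, "expon", args=(0, gaps.mean()))
print(f"KS test of exponential gaps: p-value = {ks.pvalue:.3f}")
```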

    A synthesis of regional flood estimation models used in France and Quebec

    Numerous regional methods have been developed to improve the estimation of the distribution of flood discharges at sites where little or no information is available. This article presents a synthesis of hydrological models used in France and in Quebec (Canada), on the occasion of a seminar on "regional estimation methods in hydrology" held in Lyon in May 1997. The French models are strongly tied to a technique for extrapolating the flood distribution, the Gradex method, which rests on the joint probabilistic use of streamflow and rainfall series. This explains the two main strands of regional studies practised in France: work related to the regionalization of rainfall and work related to the regionalization of discharges. The Quebec models generally involve two steps: the definition and determination of hydrologically homogeneous regions, followed by regional estimation, i.e. the transfer, within a given region, of information from gauged sites to an ungauged or partially gauged site for which sufficient information is not available. After an overview of the methods used in the two countries, a discussion brings out the main characteristics and complementarities of the different approaches and highlights the value of developing closer collaboration to better account for the particularities and complementarities of the methods developed on each side. One of the avenues mentioned consists of combining regional rainfall information (the French approach) and streamflow information (the Quebec approach). Design flood estimates at ungauged sites or at gauged sites with short records can be obtained through regionalization techniques. Various methods have been employed in different parts of the world for the regional analysis of extreme hydrological events. These regionalization approaches make different assumptions and hypotheses concerning the hydrological phenomena being modeled, rely on various types of continuous and non-continuous data, and often fall under completely different theories. A research seminar dealing with "regional estimation methods in hydrology" took place in Lyon in May 1997 and brought together various researchers and practitioners, mainly from France and the Province of Quebec (Canada). The present paper is based on the conferences and discussions that took place during this seminar and aims to review, classify, comparatively evaluate, and potentially propose improvements to the most prominent regionalization techniques utilized in France and Quebec.
The specific objectives of this paper are: to review the main regional hydrologic models that have been proposed and commonly used during the last three decades; to classify the literature into different groups according to the origin of the method, its specific objective, and the technique it adopts; to present a comprehensive evaluation of the characteristics of the methods, and to point out the hypotheses, data requirements, strengths and weaknesses of each particular one; and to investigate and identify potential improvements to the reviewed methods, by combining and extending the various approaches and integrating their particular strengths. Regionalization approaches adopted in France include the Gradex method, a simplified rainfall-runoff approach that provides estimates of flood magnitudes of given probabilities and is based on rainfall data, which often cover longer periods and are more reliable than flow data (Guillot and Duband, 1967; CFGB, 1994). It is based on the hypotheses that beyond a given rainfall threshold (known as the pivot point) all water is transformed into runoff, and that a rainfall event of a given duration generates runoff for the same length of time. These hypotheses are equivalent to assuming that, beyond the pivot point, the rainfall-runoff relationship is linear and that the precipitation and runoff probability curves are parallel on a Gumbel plot. In Quebec (and generally in North America), regional flood frequency analysis usually involves two steps: delineation of homogeneous regions, and regional estimation. In the first step, the focus is on identifying and regrouping sites which seem sufficiently homogeneous, or sufficiently similar to the target ungauged site, to provide a basis for information transfer. The second step of the analysis consists in inferring flood information (such as quantiles) at the target site using data from the stations identified in the first step of the analysis. Two types of "homogeneous" regions can be proposed: fixed-set regions (geographically contiguous or non-contiguous) and neighborhood-type regions. The second type includes the methods of canonical correlation analysis and of the regions of influence. Regional estimation can be accomplished using one of two main approaches: index flood or quantile regression methods. The results of this work indicate that the philosophies of regionalization and the methods utilized in France and Quebec are complementary to each other and are based on different needs and outlooks. While the approaches followed in France are characterized by strong conceptual and geographic aspects, with an emphasis on the utilization of information related to other phenomena (such as precipitation), the approaches adopted in Quebec rely on the strength of their statistical and stochastic components and usually condense the spatial and temporal information into a realistic functional form. This dissimilarity in the approaches followed on either side may originate from the distinct topographic and climatic characteristics of each region (France and Quebec) and from the differences in basin sizes and hydrometeorologic network densities.
The conclusions of the seminar point to the large potential for improvement of regional estimation methods that may result from an enhanced exchange between scientists from both sides: indeed, there is much to gain from learning about the dissimilarities between the various approaches, comparing their performances, and devising new methods that combine their individual strengths. Hence, the Gradex method, for example, could benefit from an increased utilization of regional flood information, while the flood regionalization methods utilized in Quebec could gain much from a formalized use of rainfall information and from the integration of an improved modeling of physical hydrologic phenomena. This should enhance the efficiency of regional estimation methods and their ability to handle various practical conditions. It is hoped that this research will contribute towards closing the gap between the French and Quebec literature, and more generally between the European and North American hydrological schools of thought, by synthesizing the large literature that is available, by providing the necessary cross-evaluation of regional flood analysis models, and by providing comprehensive propositions for improved approaches to regional hydrologic modeling.
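As a small illustration of the Gradex idea summarized above, the sketch below extrapolates a flood frequency curve beyond a pivot return period by assuming it runs parallel, on a Gumbel plot, to the rainfall frequency curve. It is a schematic version that works in runoff-depth units and ignores the rainfall-discharge unit conversions and duration choices of a real application; all numbers are placeholders.

```python
# Sketch: schematic Gradex-style extrapolation of a flood frequency curve beyond a pivot return period.
import numpy as np

def gumbel_reduced_variate(T):
    # u_T = -ln(-ln(1 - 1/T)), the x-axis of a Gumbel plot.
    return -np.log(-np.log(1 - 1 / T))

# Hypothetical ingredients (placeholder values, all expressed in consistent runoff-depth units, mm):
gradex = 18.0    # Gumbel scale ("gradex") of extreme rainfall depths for the critical duration
T_pivot = 10.0   # pivot return period beyond which additional rainfall is assumed to run off entirely
q_pivot = 95.0   # runoff depth estimated from the discharge record at the pivot return period

def gradex_extrapolation(T):
    # Beyond the pivot, the runoff curve is assumed parallel to the rainfall curve on a Gumbel plot.
    return q_pivot + gradex * (gumbel_reduced_variate(T) - gumbel_reduced_variate(T_pivot))

for T in (100, 1000):
    print(f"T = {T:5d} years: runoff depth ~ {gradex_extrapolation(T):.0f} mm")
```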

    Regionalization of precipitation: a literature review of recent developments

    The estimation of extreme precipitation intensities is a rapidly expanding research topic. We present here a synthesis of research on the regional analysis of precipitation. The main steps of regional analysis reviewed are the methods for delineating homogeneous regions, the selection of regional distribution functions and the fitting of their parameters. Much of the work on the regional analysis of precipitation is inspired by the approach developed for flood regionalization, and index-flood type methods have been used by several authors. The homogeneous regions established may be contiguous or non-contiguous. Multivariate analysis has been used to determine several homogeneous regions in Canada. The suitability of sites within a homogeneous region has often been validated by an application of L-moments, although other homogeneity tests have also been used. The generalized extreme value (GEV) distribution is the one most often used in the regional analysis of precipitation. Other work has dealt with the two-component extreme value (TCEV) distribution, as well as with applications of partial duration series. Little work has addressed intensity-duration relationships in a regional context, or the seasonal variation of regional parameters. Finally, research has begun on the application of the concepts of scale invariance and scaling laws; this work is considered promising. Research on the estimation of extreme precipitation events is currently expanding. This field of research is of great importance in hydraulic engineering, not only for the design of dams and dikes, but also for municipal engineering designs. In many cases, local data are scarce. In this context, regionalization methods are very useful tools. This paper summarizes the most recent work on the regionalization of precipitation. Steps normally included in any regionalization work are the delineation of homogeneous regions, the selection of a regional probability distribution function and the fitting of its parameters. Methods to determine homogeneous regions are reviewed first. A great deal of work on precipitation was inspired by methods developed for regional flow analysis, especially the index flood approach. Homogeneous regions can be contiguous, but in many cases they are not. The region of influence approach, commonly used in hydrological studies, has not often been applied to precipitation data. Homogeneous regions can be established using multivariate statistical approaches such as Principal Component Analysis or Factorial Analysis. These approaches have been used in a number of regions in Canada. Sites within a homogeneous region may be tested for their appropriateness by calculating local statistics such as the coefficient of variation, the coefficient of skewness and the kurtosis, and by comparing these statistics to the regional statistics. Another common approach is the use of L-moments. L-moments are linear combinations of order statistics and hence are not as sensitive to outliers as conventional moments. Other homogeneity tests have also been used. They include a Chi-squared test on all regional quantiles associated with a given non-exceedance probability, and a Smirnov test used to validate the inclusion of a station in the homogeneous region. Secondly, we review the distributions and fitting methods used in the regionalization of precipitation.
The most popular distribution function used is the Generalized Extreme Value (GEV) distribution. This distribution has been recommended for precipitation frequency analysis in the United Kingdom. For regional analysis, the GEV is preferred to the Gumbel distribution, which is often used for site-specific frequency analysis of precipitation extremes. L-moments are also often used to calculate the parameters of the GEV distribution. Some applications of the Two-Component Extreme Value (TCEV) distribution also exist. The TCEV has mostly been used to alleviate concerns over some of the theoretical and practical restrictions of the GEV. Applications of the Partial Duration Series or Peak-Over-Threshold (POT) approach are also described. In the POT approach, events with a magnitude exceeding a certain threshold are considered in the analysis, and the occurrence of such exceedances is modelled as a Poisson process. One of the drawbacks of this method is that it is sometimes necessary to select a relatively high threshold in order to comply with the assumption that observations are independent and identically distributed (i.i.d.). The use of a re-parameterised Generalized Pareto distribution has also been suggested by some researchers. Research on depth-duration relations at a regional scale is also discussed, and empirical approaches used in Canada and elsewhere are described. In most cases, the method consists of establishing a non-linear relationship between a quantile associated with a given duration and return period and a reference quantile, such as the 1-hour rainfall with a 10-year return period. Depth-duration relationships cannot be applied uniformly across Canada for events with durations exceeding two hours. Seasonal variability studies in regionalization are relatively scarce, but are required because of the obvious seasonality of precipitation. In many cases, seasonal regimes may call for different regionalization approaches for the wet and the dry seasons. Some research has focused on the use of periodic functions to model regional parameters. Another approach consists of converting the occurrence date of a given event into an angular measure and developing seasonal indices based on this angular measure. Other promising avenues of research include the scaling approach. The debate over the possibility of scale invariance for precipitation is ongoing. Simple scaling has been studied on a number of precipitation data sets, but the intermittency common in precipitation regimes and the presence of numerous zero values in the series do not readily allow a proper application of this approach. Recent research has shown that multiple scaling is likely a more promising avenue.
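Since GEV distributions fitted by L-moments come up repeatedly above, here is a generic sketch of that fit on hypothetical annual maximum rainfall, using sample probability-weighted moments and Hosking's rational approximation for the shape parameter. It illustrates the technique only and is not the procedure of any specific study cited.

```python
# Sketch: GEV fitted by L-moments (Hosking's approximation) to hypothetical annual maximum rainfall.
import numpy as np
from math import gamma, log

rng = np.random.default_rng(6)
annual_max_rain = np.sort(rng.gumbel(45.0, 12.0, 60))   # hypothetical series, mm (sorted ascending)
n = len(annual_max_rain)
j = np.arange(1, n + 1)

# Sample probability-weighted moments and L-moments.
b0 = annual_max_rain.mean()
b1 = np.sum((j - 1) / (n - 1) * annual_max_rain) / n
b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * annual_max_rain) / n
l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
t3 = l3 / l2                                             # L-skewness

# GEV parameters from L-moments (Hosking's rational approximation for the shape k).
c = 2 / (3 + t3) - log(2) / log(3)
k = 7.8590 * c + 2.9554 * c ** 2
alpha = l2 * k / ((1 - 2 ** (-k)) * gamma(1 + k))
xi = l1 - alpha * (1 - gamma(1 + k)) / k

def gev_quantile(T):
    # Quantile of the GEV in Hosking's parameterization for non-exceedance probability 1 - 1/T.
    return xi + alpha / k * (1 - (-np.log(1 - 1 / T)) ** k)

print(f"Estimated 100-year rainfall: {gev_quantile(100):.1f} mm")
```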

    A colloquium on pluridisciplinarity in environmental problems: some lessons and directions for the future

    An experiment was carried out in Rennes (Brittany, western France) in June 2000 to estimate atrazine and alachlor volatilisation fluxes after application over a maize crop. Fluxes were calculated using the classical aerodynamic micrometeorological method from vertical gradients of pesticide concentrations, temperature and wind speed. Volatilisation fluxes showed a diurnal pattern, of the order of a few ng/m2/s for atrazine and of the order of a few tens of ng/m2/s for alachlor, leading to cumulated losses of approximately 0.1% of the theoretical application dose for atrazine and several times that in the case of alachlor.
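The flux calculation mentioned above can be sketched, under strongly simplifying assumptions (neutral atmospheric stability, no stability corrections from the temperature profile), with the classical flux-gradient form in which the flux is proportional to the product of the wind-speed and concentration differences between two measurement heights. The numbers are purely illustrative, not measurements from the experiment.

```python
# Sketch: aerodynamic (flux-gradient) estimate of a volatilisation flux, neutral stability assumed.
import math

KARMAN = 0.4  # von Karman constant

def aerodynamic_flux(u1, u2, c1, c2, z1, z2):
    """Upward flux (units of c times m/s) from wind speed and concentration measured at two heights.

    Stability corrections are deliberately omitted; real applications adjust the gradients
    with Monin-Obukhov similarity functions derived from the temperature profile.
    """
    log_ratio = math.log(z2 / z1)
    u_star = KARMAN * (u2 - u1) / log_ratio          # friction velocity from the wind gradient
    return -u_star * KARMAN * (c2 - c1) / log_ratio  # flux-gradient relation

# Illustrative numbers only: concentrations in ng/m3 at 0.5 m and 1.5 m above the crop, wind in m/s.
flux = aerodynamic_flux(u1=1.8, u2=2.4, c1=120.0, c2=80.0, z1=0.5, z2=1.5)
print(f"Estimated volatilisation flux: {flux:.1f} ng/m2/s")
```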