Functional Data Analysis in Groundwater Modeling

XVI International Conference on Computational Methods in Water Resources (CMWR-XVI) Ingeniørhuset

Authors: Bruno Mendes (University of California Santa Cruz), David Draper (University of California Santa Cruz)
Presenter: Bruno Mendes (University of California Santa Cruz)
Date: 2006-06-18     Track: Special Sessions     Session: Multi-Disciplinary Approaches To Reactive Transport Simulation In Aquifer Systems

In groundwater contamination studies, uncertainty is a constant presence. In previous work we classified the sources of uncertainty encountered in such studies [items 43 and 55] and proposed a framework for addressing them that involves four hierarchical layers of uncertainty:

* Scenario (there may be uncertainty about relevant inputs to the physical process under study),
* Structure (conditional on scenario, the precise mathematical form of the best model to capture all relevant physical processes (advection, diffusion, ...) may not be known),
* Parametric (conditional on scenario and structure, the model will typically have one or more unknown physical constants that need to be estimated from data), and
* Predictive (conditional on scenario, structure, and parameters, the model predictions may still not agree perfectly with the observed data).

We have been working on all of these types of uncertainty; the present study focuses on scenario uncertainty. The set of scenarios we used was developed by Prado, Eguilior and Saltelli [Level E/G test-case specifications (GESAMAC project). CIEMAT, Madrid, 1998]; it consists of different sets of hydrogeological assumptions about what can go wrong if a deep underground storage chamber for nuclear waste material is breached:

* Reference (Ref) Scenario (from the PSACOIN Level E Intercomparison (NEA PSAG User's Group 1989)),
* Fast Path (FP) Scenario (a fast pathway to the geosphere),
* Additional Geosphere (AG) Scenario (an additional geosphere layer),
* Glacial Advance (GA) Scenario (related to the AG scenario but arising from an advancing rather than retreating glacier),
* Human Disposal Errors (HDE) Scenario (corresponding to deficiencies in the construction of the repository and/or in waste disposal operations, leading to premature failure of the near-field barriers), and
* Environmentally Induced Changes (EIC) Scenario (arising from human activities or geological events that are indirectly responsible for modifying the conditions of the disposal system).

Statistical models are often applied to data with a single outcome variable, and we have performed such studies in this same context before [items 43 and 55], taking the maximum radiologic dose as the outcome. The deterministic model used in this study, however, produces more informative output: among other things, it gives contaminant concentration at a series of time points for a fixed point in space, which we take to be in the biosphere. These data points approximate a continuous function of dose versus time. In this paper we describe statistical methods that are useful when the outcome of interest is an entire function rather than a single numerical summary of it. Functional Principal Component Analysis is performed on the curves to find their main modes of variability, and ANOVA-like calculations are made to identify the effects of alternative scenarios for the physical state of the groundwater system.
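As a rough illustration of the two analyses just described, the sketch below applies a discretized functional PCA (via the singular value decomposition) and an ANOVA-like scenario decomposition to a set of curves sampled on a common, uniform time grid. It is not the authors' implementation: the function names `fpca` and `scenario_effects`, the Gaussian-shaped synthetic curves, and the use of only two scenario labels are all assumptions made for the example.

```python
import numpy as np

def fpca(curves, n_components=2):
    """Discretized functional PCA; each row is one curve on a shared uniform grid."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are the discretized principal component functions (orthonormal).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per component
    components = vt[:n_components]
    scores = centered @ components.T         # coordinates of each curve on the components
    return mean_curve, components, scores, explained[:n_components]

def scenario_effects(curves, labels):
    """ANOVA-like effect functions: scenario mean curve minus grand mean curve."""
    grand_mean = curves.mean(axis=0)
    return {str(lab): curves[labels == lab].mean(axis=0) - grand_mean
            for lab in np.unique(labels)}

# Synthetic, purely illustrative "dose versus time" curves (hypothetical data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
labels = np.repeat(np.array(["Ref", "FP"]), 20)           # 20 curves per scenario
peak_time = np.where(labels == "Ref", 0.6, 0.4)[:, None]  # earlier peak under "FP"
height = rng.lognormal(mean=0.0, sigma=0.3, size=(40, 1))
curves = height * np.exp(-((t - peak_time) / 0.1) ** 2)

mean_curve, components, scores, explained = fpca(curves)
effects = scenario_effects(curves, labels)
```

Each entry of `effects` is itself a curve over time, so a scenario's influence can be inspected at every time point rather than only at the dose maximum. On a non-uniform grid the inner products would need quadrature weights, which this sketch omits.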
We also performed functional linear regression relating the program's input parameters to the whole dose curve. These techniques yield important new insights into the uncertainties to be expected from computer simulations in this field: scenario effects can account for as much as a 40-fold increase in the uncertainty of predicted doses. We also find indications that one should expect higher uncertainty in the portions of the curve before its maximum than in those after it.
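One common way to relate scalar input parameters to a whole output curve is function-on-scalar regression, in which each curve is modeled as y_i(t) = b_0(t) + sum_j x_ij b_j(t) + e_i(t) and the coefficient functions b_j(t) are estimated by least squares at every grid point. The sketch below assumes this formulation; the abstract does not say which estimator the authors used, and the name `function_on_scalar_regression`, the two hypothetical input parameters, and the synthetic data are illustrative only.

```python
import numpy as np

def function_on_scalar_regression(X, Y):
    """Pointwise least squares for y_i(t) = b_0(t) + sum_j x_ij * b_j(t) + e_i(t).

    X: (n, p) scalar inputs; Y: (n, T) curves on a common time grid.
    Returns a (p + 1, T) array: row 0 is the intercept function,
    row j (j >= 1) is the coefficient function of input j.
    """
    design = np.column_stack([np.ones(X.shape[0]), X])
    # lstsq solves all T grid-point regressions at once (one column of Y each).
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef

# Synthetic check with known coefficient functions (hypothetical data).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 40)
X = rng.normal(size=(200, 2))                            # two made-up input parameters
beta_true = np.vstack([np.sin(np.pi * t), t, 1.0 - t])   # intercept and two effect functions
noise = 0.01 * rng.normal(size=(200, 40))
Y = np.column_stack([np.ones(200), X]) @ beta_true + noise

beta_hat = function_on_scalar_regression(X, Y)
```

Plotting a row of `beta_hat` against `t` shows where along the curve a given input matters most. In practice the coefficient functions are often smoothed or expanded in a basis, which this pointwise version does not do.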