2 Overview


2.1 What is CSIEM?

CSIEM is a spatially resolved 3-dimensional (3D) model able to simulate the environmental conditions within Cockburn Sound and adjacent areas within Perth Waters, spanning between Quinns Rocks and Mandurah. The depth gradient spans from the shallow littoral waters to approximately 20 metres at the centre of the Sound. Combined with ocean dynamics, catchment inputs and variable climate forcing, this creates a spectrum of water quality conditions and habitat distributions. To capture the spatio-temporal dynamics of the system, CSIEM has been designed to bring together a system of environmental models able to sequentially “downscale” from global to regional to local, and “vertically integrate” between physical, chemical and biological processes (Figure 2.1).

At its core, the model platform consists of the hydrodynamic model TUFLOW-FV and the water quality model AED. Different model domain resolutions can be selected and/or time-periods of interest simulated. Different levels of model complexity are also configurable by the user (e.g., by enabling various biogeochemical or ecological components or modules), with common input files and output analysis workflows.

The model platform has been set up to meet a variety of stakeholder needs and use-cases by providing a common approach to setup, configuration and visualisation; for example, a long-term low-resolution habitat restoration simulation can be run alongside a high-resolution dredging scenario, with simulations sharing common model settings, parameters, and parameterisations. The intent is to provide decision-makers with a single trusted tool that can be used to answer a diversity of questions, spanning from day-to-day management to long-term policy settings (e.g., restoration and climate change adaptation strategies).

In general, the models are configured to resolve hydrodynamics and flushing, water temperature and salinity, and numerous water quality attributes including oxygen, nutrients (organic and inorganic), planktonic organisms, seagrass and benthic communities, and suspended sediment. CSIEM also links a large volume of data from historical surveys, real-time data streams and recent experimental work. Complex integration workflows are implemented to process data for model inputs (e.g., boundary conditions) or model assessment (e.g., calibration or validation).

2.2 Approach to model-data organisation


The data requirements for the modelling are diverse, spanning hydrological, meteorological, water and sediment quality (long-term monitoring, data from intensive campaigns and in situ sondes), plus ecological survey data. This creates an integration challenge for model setup, parameterisation and assessment (calibration and validation) (Figure 2.1).

Figure 2.1: The model-data ecosystem and conceptual approach to model-data integration, accommodating data diversity and varied model requirements.

As the model and data are always evolving, we aim to support a process of Continuous Integration - Continuous Deployment (CI/CD), a set of software practices used to deliver changes to code (model files and data products) frequently and safely. To enable the ongoing use and development of the CSIEM in this context, we have developed a model-data integration framework that stakeholders can use to co-ordinate the reference datasets needed for model development, and to standardise the data integration and modelling workflows. The sections below describe how the CSIEM is organised, the tools and approaches used for model provenance and managing data streams, and model versioning.

2.3 CSIEM repository structure and organisation

The CSIEM repository ecosystem has been designed both to facilitate the sharing of data and models across various agencies and researchers, and to provide a formalised structure to store, catalogue and process complex and unique datasets. Compartmentalised data structures have been implemented to allow for tracking and version control of data and models as they are utilised and upgraded throughout the project. Given the complexities and scope of the requirements, four interconnected repositories have been created to house and share the accumulated project outputs:

  • csiem-science: a repository for documenting the CSIEM workflows, model-data structure and integration, and results (the current CSIEM bookdown online manual);
  • csiem-data: a platform for storing and sharing environmental data for this project (see Chapter 3);
  • csiem-model: a platform for storing and sharing model files and configurations;
  • csiem-marvl: a collection of scripts for visualising, assessing and reporting the data and model performance.

GitHub has been chosen to house the repositories based on its mature version control systems and cross-platform program support, which help all stakeholders access the data and models (https://github.com/). In addition, there is a wide variety of documentation online to assist users new to GitHub. Information on how to clone a publicly available GitHub repository is available in the GitHub documentation.
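As a minimal sketch of what cloning the four repositories listed above might look like, the following Python snippet assembles the corresponding `git clone` commands. The organisation name `EXAMPLE-ORG` is a placeholder and an assumption, since the account hosting the repositories is not stated in this section; the function name is likewise illustrative only.

```python
# Hypothetical sketch: build `git clone` commands for the four CSIEM repositories.
# GITHUB_ORG is a placeholder; the real organisation/account name is not given here.
GITHUB_ORG = "EXAMPLE-ORG"

# Repository names as listed in this section.
REPOSITORIES = ["csiem-science", "csiem-data", "csiem-model", "csiem-marvl"]


def clone_command(repo: str, org: str = GITHUB_ORG) -> list[str]:
    """Return the argument list for cloning one repository over HTTPS."""
    return ["git", "clone", f"https://github.com/{org}/{repo}.git"]


if __name__ == "__main__":
    for repo in REPOSITORIES:
        # Print rather than execute, so the sketch has no side effects.
        print(" ".join(clone_command(repo)))
```

Each printed command can be run in a terminal once the placeholder has been replaced with the actual organisation name; restricted repositories would additionally require access to be granted.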

Data cataloguing via the “CSIEM Data Catalogue” (described in Section 3.3) has been designed with cross-agency usage in mind. Integration with the WAMSI Theme 10 Data Catalogue is essential to allow for the tracking of data changes and upgrades throughout the project, and to be used for modelling purposes. The “Point of Truth”, “WAMSI Catalogue Classification” and “Status Notes” categories have been included in the CSIEM Data Catalogue to track a dataset’s evolution throughout the project.
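To illustrate how the three tracking categories could sit alongside a dataset record, the sketch below models a single catalogue entry in Python. Only the three category names come from the text above; the class name, the `dataset` field, the use of a list for status notes and all example values are assumptions made for illustration, and do not reflect the actual catalogue layout described in Section 3.3.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """Hypothetical record for one dataset in a catalogue."""

    dataset: str                # assumed identifier for the dataset
    point_of_truth: str         # "Point of Truth" category
    wamsi_classification: str   # "WAMSI Catalogue Classification" category
    status_notes: list[str] = field(default_factory=list)  # "Status Notes" history

    def add_status(self, note: str) -> None:
        """Append a note so the dataset's evolution stays traceable."""
        self.status_notes.append(note)


if __name__ == "__main__":
    # All values below are made up for the example.
    entry = CatalogueEntry(
        dataset="example-dataset",
        point_of_truth="example-agency",
        wamsi_classification="example-class",
    )
    entry.add_status("raw data received")
    entry.add_status("converted to standardised format")
    print(entry)
```

Keeping the status notes as an append-only list is one simple way to preserve the history of a dataset rather than overwriting its latest state.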

2.3.1 Online repository access

The data and model files are stored in the GitHub CSIEM repository. Core model files and data-sets are distributed publicly, whereas aspects of model setup, modules or data-sets that are commercial-in-confidence are restricted and shared with end-users on a case-by-case basis.

2.3.2 High level model-data integration workflows

The raw data collected from previous works and current WAMSI research project outcomes are stored in the csiem-data folder and classified by their sources. These raw data are then post-processed using standard scripts into ‘standardised’ data formats (stored in the data-warehouse folder) that can be used for model configuration and boundary/initial conditions (stored in the csiem-model folder), and for model/data visualisation, model calibration/validation, and reporting using the csiem-marvl analysis library. The data-model workflow is shown in Figure 2.2.

Figure 2.2: CSIEM conceptual diagram showing the flow of data through the system.
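The raw-to-standardised step of this workflow can be sketched as follows. This is an illustrative example only, not the project's processing scripts: the source names (`agency_a`, `agency_b`), the raw column names and the common schema (`time`, `site`, `value`, `variable`, `units`, `source`) are all assumptions chosen to show the idea of mapping differently formatted raw files onto one shared format.

```python
import csv
import io

# Hypothetical mapping from source-specific column names to a common schema.
COLUMN_MAPS = {
    "agency_a": {"Date": "time", "Site": "site", "Temp_C": "value"},
    "agency_b": {"timestamp": "time", "station": "site", "temperature": "value"},
}


def standardise(raw_csv: str, source: str, variable: str, units: str) -> list[dict]:
    """Convert raw CSV text from one source into rows with a common schema."""
    mapping = COLUMN_MAPS[source]
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Rename the source-specific columns to the shared names.
        row = {target: record[column] for column, target in mapping.items()}
        row["value"] = float(row["value"])
        # Attach metadata so rows from different sources can be combined.
        row.update(variable=variable, units=units, source=source)
        rows.append(row)
    return rows


if __name__ == "__main__":
    raw_a = "Date,Site,Temp_C\n2020-01-01,S1,21.5\n"
    raw_b = "timestamp,station,temperature\n2020-01-01,B7,20.9\n"
    warehouse = standardise(raw_a, "agency_a", "temperature", "degC")
    warehouse += standardise(raw_b, "agency_b", "temperature", "degC")
    for row in warehouse:
        print(row)
```

Once every source is expressed in the same schema, downstream steps such as building boundary-condition files or comparing model output with observations can be written once rather than per source.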


2.4 SEAF-CS: cloud-based shared modelling platform

The Shared Environmental Analytics Facility (SEAF) is a cloud-based, multi-zonal analytics platform developed to support integrated environmental assessment for Cockburn Sound. SEAF has been co-designed and implemented as a joint initiative between WAMSI, WABSI, Microsoft, PwC Australia, IXUP, Sentient Hubs and the Pawsey Supercomputing Research Centre. The Cockburn Sound-specific implementation (SEAF-CS) brings together the diverse environmental datasets and models described in this manual into a shared facility for use and dissemination, providing a central trusted source and repository of data, modelling and analytics.

The platform architecture draws on five main cloud solutions to create a unified shared modelling capability whilst maintaining a high level of security (Figure 2.3):

  • GitHub: version-controlled data and model repositories (Chapters 2 and 3);
  • Pawsey Acacia S3 Storage: cloud object storage for the data lake, data warehouse and model boundary condition files;
  • SEAF Landing Zones: the underlying Azure-based cloud network platform providing secure, isolated analytics environments;
  • Databricks: cloud-based data engineering for running MARVL analytics notebooks (Chapter 4); and
  • Sentient Hubs: cloud modelling web services for automated model execution, chainage and archival (Chapter 6).

Each proponent on the platform is allocated a separate, secure Landing Zone — an entirely self-contained, networked and governed region. The innovation in the SEAF platform is that landing zones can be instantiated from code (“Infrastructure-as-Code”), based on an open-source project developed by Microsoft, allowing auto-generation of zones from simple configuration settings. Landing zones are categorised as Private, Collaboration, Encrypted or Shared, depending on security and data-sharing requirements. This architecture supports cumulative environmental assessment by enabling multiple proponents to work with shared model tools and data without compromising data security.
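The idea of generating zones from simple configuration settings can be sketched in a few lines of Python. This is a conceptual illustration only and does not represent the SEAF tooling or the underlying Microsoft project: the configuration keys (`name`, `type`), the class names and the example proponent names are assumptions; only the four zone categories come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class ZoneType(Enum):
    """The four landing zone categories named in the text."""

    PRIVATE = "Private"
    COLLABORATION = "Collaboration"
    ENCRYPTED = "Encrypted"
    SHARED = "Shared"


@dataclass(frozen=True)
class LandingZone:
    """Hypothetical description of one zone to be instantiated."""

    name: str
    zone_type: ZoneType


def zones_from_config(config: list[dict]) -> list[LandingZone]:
    """Build zone descriptions from configuration entries.

    Raises ValueError if an entry names an unknown zone type.
    """
    return [LandingZone(item["name"], ZoneType(item["type"])) for item in config]


if __name__ == "__main__":
    # Example configuration; the names are made up.
    config = [
        {"name": "proponent-a", "type": "Private"},
        {"name": "joint-study", "type": "Collaboration"},
    ]
    for zone in zones_from_config(config):
        print(zone.name, zone.zone_type.value)
```

Validating the zone type at load time means a misconfigured entry fails before any resources would be created.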

Figure 2.3: The SEAF-CS shared platform. (a) Schematic of the Cockburn Sound Landing Zone, showing the various cloud resources for analytics and modelling, and (b) architecture and data flow diagram showing the different landing zone types.

CSIEM data products, model configurations and analytics workflows have been integrated within the SEAF-CS platform. GitHub serves as the repository for model configurations, analytics scripts, and data governance documentation. The Pawsey Acacia S3 storage hosts the data lake and data warehouse, as well as model boundary condition files. Model execution is managed through the Sentient Hubs interface, which automates simulation bundling, license management, and post-processing via “Project Tapestry”. Over the duration of the WWMSP, SEAF-CS has served to host a range of different proponents including research groups, consultants and regulators involved in Cockburn Sound environmental assessment.