2 Overview
2.1 What is CSIEM?
CSIEM is a spatially resolved 3-dimensional (3D) model able to simulate the environmental conditions within Cockburn Sound and adjacent areas within Perth Waters, spanning between Quinns Rocks and Mandurah. The depth gradient spans from the shallow littoral waters to approximately 20 metres at the centre of the Sound, and in the context of the interactions with ocean dynamics, catchment inputs, and variable climate forcing, this creates a spectrum of water quality conditions and habitat distributions. To capture the spatio-temporal dynamics of the system, CSIEM has been designed to bring together a system of environmental models able to sequentially “downscale” from global to regional to local, and “vertically integrate” between physical, chemical and biological processes (Figure 2.1).
At its core, the model platform consists of the hydrodynamic model TUFLOW-FV and the water quality model AED. Different model domain resolutions can be selected and/or time-periods of interest simulated. Different levels of model complexity are also configurable by the user (e.g., by enabling various biogeochemical or ecological components or modules), with common input files and output analysis workflows.
The model platform has been set up to serve a variety of stakeholder needs and use-cases by providing a common approach to setup, configuration and visualisation; for example, a long-term low-resolution habitat restoration simulation can be run alongside a high-resolution dredging scenario, with simulations sharing common model settings, parameters, and parameterisations. The intent is to provide decision-makers with a single trusted tool that can be used to answer a diversity of questions that span from day-to-day management to long-term policy settings (e.g., restoration and climate change adaptation strategies).
In general, the models are configured to resolve hydrodynamics and flushing, water temperature and salinity, and numerous water quality attributes including oxygen, nutrients (organic and inorganic), planktonic organisms, seagrass and benthic communities, and suspended sediment. CSIEM also links a large volume of data from historical surveys, real-time data streams and recent experimental work. Complex integration workflows are implemented to process data for model inputs (e.g., boundary conditions) or model assessment (e.g., calibration or validation).
2.2 Approach to model-data organisation
The data requirements for the modelling are diverse, spanning hydrological, meteorological, water and sediment quality (long-term monitoring, data from intensive campaigns and in situ sondes), plus ecological survey data. This creates an integration challenge for model setup, parameterisation and assessment (calibration and validation) (Figure 2.1).

Figure 2.1: The model-data ecosystem and conceptual approach to model-data integration, accommodating data diversity and varied model requirements.
As the model and data are always evolving, we aim to support a process of Continuous Integration - Continuous Deployment (CI/CD), a set of software practices used to deliver changes to code (here, model files and data products) frequently and safely. To enable the ongoing use and development of the CSIEM in this context, we have developed a model-data integration framework that stakeholders can use to co-ordinate the reference datasets needed for model development, and to standardise the data integration and modelling workflows. The sections below describe how the CSIEM is organised, the tools and approaches used for model provenance and managing data streams, and model versioning.
2.3 CSIEM repository structure and organisation
The CSIEM repository ecosystem has been designed in such a way as to both facilitate the sharing of data and models across various agencies and researchers, and provide a formalised structure to store, catalogue and process complex and unique datasets. Compartmentalised data structures have been implemented to allow for tracking and version control of data and models as they are utilised and upgraded throughout the project. Given the complexities and scope of the requirements, four interconnected repositories have been created to house and share the accumulated project outputs:
- csiem-science: a repository for documenting the CSIEM workflows, model-data structure and integration, and results (the current CSIEM bookdown online manual);
- csiem-data: a platform for storing and sharing environmental data for this project (see Chapter 3);
- csiem-model: a platform for storing and sharing model files and configurations;
- csiem-marvl: a collection of scripts for visualising, assessing and reporting the data and model performance.
GitHub has been chosen to house the repositories based on its mature version control system and cross-platform support, which helps all stakeholders access the data and models (https://github.com/). In addition, there is a wide variety of documentation online to assist users new to GitHub. Information on how to clone a publicly available GitHub repository can be found in the GitHub documentation.
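As a minimal sketch of how the four repositories could be fetched side by side, the Python below builds and runs one `git clone` command per repository. The repository names are taken from the list above; the organisation URL is a placeholder assumption (it is not given in this chapter), and the helper names `clone_commands` and `clone_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Repository names come from the list above. The organisation URL is a
# placeholder assumption and must be replaced with the real location.
GITHUB_ORG_URL = "https://github.com/<organisation>"
REPOSITORIES = ["csiem-science", "csiem-data", "csiem-model", "csiem-marvl"]


def clone_commands(org_url: str, target_dir: Path) -> list:
    """Build one `git clone` command per repository without running it."""
    return [
        ["git", "clone", f"{org_url}/{name}.git", str(target_dir / name)]
        for name in REPOSITORIES
    ]


def clone_all(org_url: str, target_dir: Path) -> None:
    """Clone every repository into its own sub-folder of target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    for command in clone_commands(org_url, target_dir):
        # check=True raises CalledProcessError if git reports a failure.
        subprocess.run(command, check=True)
```

Calling `clone_all(GITHUB_ORG_URL, Path("csiem"))` with a real organisation URL would place the four repositories in sibling folders, which keeps relative paths between data, model and analysis scripts predictable. Restricted repositories would additionally require GitHub credentials with the appropriate access.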
Data cataloguing via the “CSIEM Data Catalogue” (described in Section 3.3) has been designed with cross-agency usage in mind. Integration with the WAMSI Theme 10 Data Catalogue is essential both for tracking data changes and upgrades throughout the project and for making the catalogued data usable for modelling purposes. The “Point of Truth”, “WAMSI Catalogue Classification” and “Status Notes” categories have been included in the CSIEM Data Catalogue to track a dataset’s evolution throughout the project.
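To illustrate how the three tracking categories might sit together in a single catalogue record, the sketch below models one entry as a Python dataclass. Only the three category names come from the text; the `dataset_name` field, the dated-note structure, and the method names are assumptions made for illustration and do not describe the actual catalogue schema (see Section 3.3).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class CatalogueEntry:
    dataset_name: str          # hypothetical identifier, not from the source
    point_of_truth: str        # "Point of Truth" category
    wamsi_classification: str  # "WAMSI Catalogue Classification" category
    # "Status Notes" category, stored here as (date, note) pairs (assumption).
    status_notes: list = field(default_factory=list)

    def add_status_note(self, when: date, note: str) -> None:
        """Append a dated note describing a change to the dataset."""
        self.status_notes.append((when, note))

    def latest_status(self) -> Optional[str]:
        """Return the most recently dated note, or None if there are none."""
        if not self.status_notes:
            return None
        return max(self.status_notes, key=lambda item: item[0])[1]
```

Keeping the notes as an append-only, dated list means the history of a dataset is preserved rather than overwritten, which matches the stated aim of tracking a dataset's evolution through the project.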
2.3.1 Online repository access
The data and model files are stored in the GitHub CSIEM repository. Core model files and datasets are distributed publicly; other aspects of the model setup, and datasets or modules that are commercial-in-confidence, are restricted and shared with end-users on a case-by-case basis.
2.3.2 High level model-data integration workflows
The raw data collected from previous work and from current WAMSI research projects are stored in the csiem-data folder and classified by source. These raw data are then post-processed using standard scripts that convert them into ‘standardised’ data formats (stored in the data-warehouse folder), which can be used for model configuration and boundary/initial conditions (stored in the csiem-model folder), and for model/data visualisation, model calibration/validation, and reporting using the csiem-marvl analysis library. The data-model workflow is shown in Figure 2.2.

Figure 2.2: CSIEM conceptual diagram showing the flow of data through the system
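The raw-to-standardised conversion step in this workflow can be sketched as follows. This is a minimal illustration only: the raw column names, the common variable names in `COLUMN_MAP`, and the function names are hypothetical, and the actual standard scripts and data-warehouse format may differ.

```python
import csv
import io

# Hypothetical mapping from one source's raw column names to common names.
COLUMN_MAP = {
    "Date Time": "time",
    "Temp (degC)": "temperature",
    "Sal": "salinity",
}


def standardise(raw_csv: str, column_map: dict) -> list:
    """Rename mapped columns, trim whitespace, and drop unmapped columns."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [
        {new: row[old].strip() for old, new in column_map.items()}
        for row in reader
    ]


def to_csv(rows: list) -> str:
    """Serialise standardised rows to CSV text with a single header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a layout like the one described, a per-source mapping of this kind would live alongside the raw data, so that every source is reduced to the same column names before it is used for boundary conditions or for comparison against model output.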
2.4 Cockburn SEAF - cloud-based integrated modelling platform
The Shared Environmental Analytics Facility (SEAF) is a cloud-based, multi-zonal analytics prototype being developed by WAMSI, in partnership with Microsoft, PwC Australia, Sentient Hubs and the Pawsey Supercomputing Research Centre, to support the Cockburn Sound environmental impact assessment process. At the core of the prototype is the development of an open, standardised, secure and robust platform to support modelling and data services between diverse agencies and companies. Built on Microsoft’s Azure platform, and utilising the modelling expertise of Sentient Hubs, the SEAF prototype is being designed to facilitate complex modelling projects whilst maintaining a high degree of data security, transparency and repeatability.
The SEAF is based around Microsoft’s Analytics Landing Zone (ALZ), which can be described as an “Analytics Platform in a Box”. A single zone allows an agency to carry out highly complex workflows whilst maintaining a very high degree of security and transparency. Metadata logging at each phase of a project allows for the reproducibility of results, and the standardisation inherent in each ALZ means results can be reproduced outside of a single agency’s ALZ. Data is shared between zones via automated processes currently under development by Microsoft, which will facilitate the creation of dynamic assessments between different agencies without compromising data security.

Figure 2.3: Data flow through a single Analytics Landing Zone
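The idea of logging metadata at each phase so that results can be reproduced can be illustrated with a small sketch. This is not the ALZ logging mechanism, which is not described here; it is a generic approach, assumed for illustration, in which each phase records a SHA-256 fingerprint of its inputs together with its settings. The function names and record fields are hypothetical.

```python
import hashlib
import json


def phase_record(phase: str, inputs: dict, settings: dict) -> dict:
    """Describe one project phase by its input fingerprints and settings.

    `inputs` maps an input name to its raw bytes.
    """
    return {
        "phase": phase,
        "input_sha256": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(inputs.items())
        },
        "settings": settings,
    }


def same_provenance(first: dict, second: dict) -> bool:
    """True when two records describe identical inputs and settings."""
    return json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)
```

Two runs whose records compare equal under `same_provenance` started from byte-identical inputs and the same settings, so any difference in their results points to the software environment rather than the data.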
The CSIEM Environmental Information Management (EIM) system has been designed to integrate seamlessly with the SEAF prototype once it becomes operational. The data governance documentation has been designed to be easily transformed to work with Microsoft Purview, and the base format of the data inside the data-warehouse allows for quick ingestion of the data into Azure Synapse and of the header information into Microsoft Purview.
Both the SEAF prototype and the CSIEM EIM use GitHub as their main repository for configuration information (and, in the case of CSIEM, for data), and the user credentials used in one system will be available for use in the other.