In today’s blog, I want to share my recent experience with big data and its application in coastal engineering. JBPacific is working on a range of coastal and flood projects throughout Australia and the Indo-Pacific region. Whilst each project has a different focus, e.g. hazard mapping vs options analysis, they all require the processing of a growing amount of data. This step is often missing from the stereotypical picture of an engineer, who gets straight into the design phase of a project, but the reality is that these designs cannot happen without the help of big datasets.
This month I have been working on a coastal hazard assessment in Cape York, Australia, where we are estimating the erosion from storms within the Torres Strait. With little wave buoy data available, we needed to turn to long-term simulations to understand the coastal climate. We selected the ERA5 reanalysis database, produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), which combines historical observations, modelling and data assimilation. But the entire dataset is huge! Hourly wind/wave estimates are available from 1979 to present at any point on Earth on a 0.5 by 0.5 degree grid.
Unfortunately Excel just isn’t suitable for this type of data analysis, so we have been using Python to review historic conditions, create wave roses, consider seasonal shifts and estimate extremes. Python is a free general-purpose programming language that is relatively simple to use after a bit of practice. For any first-time users I would recommend the Anaconda distribution and the pandas and NumPy libraries. These allow us to create plots like the following image, which shows the seasonal changes in wind and wave conditions in the Torres Strait. Another benefit of developing Python scripts is that they cut down on repetitive work. For example, the scripts we create to display wind/wave roses are reused across several projects.
Figure 1: Wave roses showing the difference between dry season and wet season, due to trade winds, based on ERA5 data.
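For readers curious about what sits behind a rose like this, here is a minimal sketch of the binning step using pandas and NumPy. It is not our project script. The column names `hs` (significant wave height, m) and `mwd` (mean wave direction, degrees), the May to October dry season split, the 16 direction sectors and the wave height classes are all assumptions for illustration, and the input records are random placeholders rather than ERA5 values.

```python
import numpy as np
import pandas as pd


def rose_table(df, n_sectors=16, hs_edges=(0.0, 0.5, 1.0, 1.5, np.inf)):
    """Percentage of records in each direction sector and wave height class.

    Expects columns 'hs' (m) and 'mwd' (degrees); both names are assumed.
    """
    width = 360.0 / n_sectors
    # Shift by half a sector so that sector 0 is centred on north, then wrap.
    sector = (((df["mwd"] + width / 2.0) % 360.0) // width).astype(int)
    sector = sector.rename("sector")
    # Left-closed classes: [0, 0.5), [0.5, 1.0), ...
    hs_class = pd.cut(df["hs"], bins=list(hs_edges), right=False)
    counts = pd.crosstab(sector, hs_class)
    return 100.0 * counts / len(df)


# Placeholder hourly records for one year (random numbers, NOT ERA5 data).
rng = np.random.default_rng(0)
times = pd.date_range("2000-01-01", periods=366 * 24, freq=pd.Timedelta(hours=1))
waves = pd.DataFrame(
    {
        "hs": rng.gamma(2.0, 0.4, len(times)),
        "mwd": rng.uniform(0.0, 360.0, len(times)),
    },
    index=times,
)

# Assumed seasonal split: May to October is 'dry', the rest is 'wet'.
months = waves.index.month
waves["season"] = np.where((months >= 5) & (months <= 10), "dry", "wet")

# One frequency table per season, ready to be drawn as a rose.
roses = {name: rose_table(group) for name, group in waves.groupby("season")}
```

Each table holds the percentage of time that waves arrive from a given sector within a given height class, which is the quantity a wave rose displays. Because the binning lives in one function, the same script can be pointed at a different site or dataset with no changes beyond the input file.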
Another big(ish) dataset used in this analysis is the Southern Hemisphere Tropical Cyclone Data Portal, which we analyse using Python tools such as pandas, NumPy, seaborn and scikit-learn. Here we can review statistical relationships between parameters such as central pressure, radius to maximum winds and cyclone direction. The plot below shows a hexbin distribution of east coast cyclone events, comparing central pressure (Pc) with the radius of the outermost closed isobar (ROCI). The result can be an estimate of design cyclone conditions, which can be used in conjunction with open-source packages such as Delft3D to estimate the nearshore effects.
Figure 2: Analysis of cyclone statistical relationships - Pc and ROCI, based on raw cyclone data.
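As a rough illustration of how a relationship like this can be put into numbers, the sketch below computes a Pearson correlation and a least-squares line between two cyclone parameters. The column names `pc` (hPa) and `roci` (km) and the helper `linear_relationship` are hypothetical, and the records are random placeholders, not values from the cyclone portal, so the numbers it prints carry no physical meaning.

```python
import numpy as np
import pandas as pd


def linear_relationship(df, x="pc", y="roci"):
    """Pearson correlation and least-squares line y = slope * x + intercept.

    Rows with a missing value in either column are dropped first.
    """
    clean = df[[x, y]].dropna()
    r = clean[x].corr(clean[y])
    slope, intercept = np.polyfit(clean[x].to_numpy(), clean[y].to_numpy(), deg=1)
    return {"n": len(clean), "r": r, "slope": slope, "intercept": intercept}


# Random placeholder records (NOT real cyclone data); the trend is arbitrary.
rng = np.random.default_rng(1)
pc = rng.uniform(920.0, 1000.0, 200)
cyclones = pd.DataFrame(
    {"pc": pc, "roci": 2.0 * (pc - 900.0) + rng.normal(0.0, 30.0, 200)}
)

summary = linear_relationship(cyclones)
print(summary)
```

With real records in the same layout, a hexbin view of the two columns can be drawn with seaborn's `jointplot(data=cyclones, x="pc", y="roci", kind="hex")`.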
I have been in the engineering field for nine months now and am still surprised to see the increasing use of datasets throughout our work. We are starting to see tools like Python used more than our CAD software packages – perhaps identifying the need to update our views on what a traditional ‘engineer’ looks like.