Browsing MH370

Last week Geoscience Australia released a vast bathymetric survey dataset from phase one of an intensive search for the missing aircraft, flight MH370. Read the full story here.

I’ve been ruminating on the idea of treating bathymetric datasets in the same way I handle LiDAR surveys – as massive point clouds. So this dataset presented an opportunity to try some things out.

I used the Python library Siphon to extract data from NCI’s THREDDS catalogue – ending up with roughly 100gb of ASCII files on my working machinery. It was easy to see what these files contained – but they’re no good for my use case as lines of text. I had in mind dumping them all into a postgres-pointcloud database – but then, I got excited by the idea of visualising it all.

So I did.

The first step was to clean up the data. I needed to convert very nice descriptive headers into something that described the data in a less verbose way.

sed -i took care of that task. It also handled removing leading 0’s from two number longitudes. I still had ASCII data, but now I can do something with it!

Enter the Point Data Abstraction Library (PDAL). My new ASCII headers describe PDAL dimensions. My cleaned numbers left no doubt about what an extra 0 means. With a quick pipeline I turned all my data into LAS files, reprojected from lat/long to metres in EPSG:3577 – GDA94 / Australian Albers. I used this because it was the only cartesian projection which could feasibly swallow the whole region without any weirdness (for example writing out things that are in UTM44 as if they were in UTM48).

But wait – what? Why LAS?

…because it can be read by PotreeConverter. You can likely see where I’m headed now. After some wrangling to get PotreeConverter working on CentOS, I burned a bunch of RAM and CPU time to generate octree indexes of all 4ish billion bathymetry points. With some scaling fun and experimenting with just how much data I could throw in at once, I eventually rendered out and glued together visualisation. A screenshot is below. The interactive visualisation is no longer live, I’m working on a new hosting site and a better version – and will update this post when done!

It’s not perfect by any stretch, but you can browse the entire dataset in 3D in your (Chrome or Firefox) web browser. It is a Potree 1.5 viewer, and doesn’t like Safari  (yet). Given the number of individual indexes being loaded, it also sometimes overwhelms the capacity of a browser to open connection sockets. Ideally I’d have a nice basemap and some more context in there as well, but as a proof-of-concept it isn’t bad!

The whole thing is built with a mix of pretty simple freely available tools – from linux built-ins to cutting-edge webGL.

Why did I do all this, and could it be better?

I’m grateful to my employer for the time I get to spend on side projects like this, but there is a real purpose to it. When data get massive, it makes sense to shift big processing off of desktops and onto dedicated high performance compute hardware. Point cloud data are on the cusp of being workable – data as a service in the raster domain has been around for a while.

Point clouds have some special considerations, the primary one being lack of data topology. Creating visualisations like this demonstrates one way of organising data, and makes light of the traditionally difficult task of investigating entire surveys. It also makes us ask hard questions about how to store the data on disk, and how to generate products from the original datasets on demand – without storing derivatives.

For this project, it would have been great to skip the LAS creation part and render straight from an underlying binary data source to the octree used for visualisation. And then, run an on-demand product delivery (rasterisation/gridding, analytical products) from the same data store. In it’s current form this is not possible. As-is, the data are designed for users to make their own copies and then do stuff – which is limited by the size of local compute, or the size of your public cloud account budget.

What next?

Prepare for testing with a point cloud data delivery service I’m working on. Tune in to FOSS4G (August, Boston) to hear about that.

In the meantime, play with it yourself! You can obtain the data shown here for free – it is at: I used the data in ‘bathymetry processed -> clean point clouds’ (direct link). The data are also held on AWS (see the Geoscience Australia data description) if that’s easier for you. Tinker with it, have look at the viewer, see what you can come up with!

Oh, and let me know if you spot anything unusual. WebGL lets all kinds of things happen in the ocean deeps


Thanks to Geoscience Australia for releasing the dataset to the public! And thanks to the National Computational Infrastructure (NCI) for my time and the hardware used to develop this technology demonstration.

The MH370 dataset is hosted on NCI hardware. However – I used the same methods for data access anyone in the general public would (my download stats were a big blip for a while..)

EGU 2017

I’m going to Vienna next week for EGU17 – which is awesome, and I’m extraordinarily excited under my ‘I’m a badass science ninja’ veneer! I’ll be listening a lot, and talking a little bit about work I’m doing at the National Computational Infrastructure on some ideas for data services around massive, dense point data – standing in front of the poster pictured above.

Hope to see you there!

ps – I promise, I’ll write about sea ice real soon. I need to – for Pint of Science (Australia)  next month…

Drifting sea ice and 3D photogrammetry

3D photogrammetry has been a hobby horse for ages, and I’ve been really excited to watch it grow from an experimental idea [1] to a full-blown industrial tool. It took a really short time from research to production for this stuff. Agisoft Photoscan turned up in 2009 or 2010, and we all went nuts! It is cheap, super effective, and cross-platform. And then along came a bunch of others.

Back to the topic – for my PhD research I was tinkering with the method for a long time, since I had a lot of airborne imagery to play with. I started by handrolling Bundler + PMVS, and then my University acquired a Photoscan Pro license – which made my life a lot simpler!

My question at the time was: how can we apply this to sea ice? or can we at all?

The answer is yes! Some early thoughts and experiments are outlined here, and below are  some results from my doctoral thesis, using imagery captured on a 2012 research voyage (SIPEX II). Firstly, a scene overview because it looks great:

Next, stacking up elevations with in situ measurements from drill holes from a 100m drill hole line on the ice. The constant offset is a result of less-than-great heighting in the local survey – I focussed heavily on getting horizontal measurements right, at the expense of height. Lesson learned for next time!

And finally, checking that we’re looking good in 3D, using a distributed set of drill holes to validate the heights we get from photogrammetric modelling. All looks good except site 7 – which is likely a transcription error.

How did we manage all this? In 2012 I deployed a robotic total station and a farm of GPS receivers on drifting ice, and used them to make up a lagrangian reference frame (fancy word for ‘reference frame which moves with the ice’) – so we can measure everything in cartesian (XYZ) coordinates relative to the ice floe, as well as using displacement and rotation observations to translate world-coordinates to the local frame and vice-versa. Here’s a snapshot:

I don’t know if this will ever make it to publication outside my thesis – I think the method should be applied to bigger science questions rather than just saying ‘the method works and we can publish because nobody put Antarctica in the title yet’ – because we know that from other works already (see [2] for just one example).

So what science questions would I ask? Here’s a shortlist:

  • can we use this method to extract ridge shapes and orientations in detail?
  • can we differentiate between a snow dune and a ridge using image + topographic characteristics?

These are hard to answer with lower-density LiDAR – and are really important for improving models of snow depth on sea ice (eg [3]).

For most effective deployment, this work really needs to be done alongside a raft of in situ observations – previous experience with big aircraft shows that it is really hard to accurately reference moving things from a ship. That’s a story for beers 🙂



[2] Nolan, M., Larsen, C., and Sturm, M.: Mapping snow depth from manned aircraft on landscape scales at centimeter resolution using structure-from-motion photogrammetry, The Cryosphere, 9, 1445-1463, doi:10.5194/tc-9-1445-2015, 2015

[3] Steer, A., et al., Estimating small-scale snow depth and ice thickness from total freeboard for East Antarctic sea ice. Deep-Sea Res. II (2016),

Data sources


Dr Jan Lieser (University of Tasmania) instigated the project which collected the imagery used here, let me propose all kinds of wild ideas for it, and was instrumental in getting my PhD done. Dr Christopher Watson (University of Tasmania) provided invaluable advice on surveying data collection, played a massive part in my education on geodesy and surveying, and also was instrumental in getting my PhD done. Dr Petra Heil and Dr Robert Massom (Australian Antarctic Division) trusted me to run logistics, operate a brand new (never done before in the AAD program) surveying operation and collect the right data on a multi-million dollar investment.  The AAD engineering team got all the instruments talking to each other and battled aircraft certification engineers to get it all in the air. Helicopter Resources provided safe and reliable air transport for instruments and operators; the management and ship’s crew aboard RSV Aurora Australis kept everyone safe, relatively happy, and didn’t get too grumpy when I pushed the operational boundaries too far on the ice; and Walch Optics (Hobart) worked hard to make sure the total station exercise went smoothly.






The LiDAR uncertainty budget part III: data requirements

Part I of this series described the LiDAR georeferencing process (for one, 2D, single return scanner). Part II showed how point uncertainties are computed and why they are useful.

What hasn’t been covered so far is what data you need to do this stuff! Here is your shopping list:

Scanner data

  • Raw LiDAR ranges
  • Scanner mirror angles
  • Waveforms with times if you are going to decide range-to-returns for yourself
  • Intensity
  • anything else (RGB, extra bands, etc)

Essentially you need the full data package from the LiDAR including any corrections for mirror wobble or other sensor oddness – and a way to decode it. You also need engineering  drawings which show the LiDAR instrument’s reference point.

Aircraft trajectory data

  • GPS/GNSS based aircraft trajectory (positions in space)
  • Attitude data  (relationship between the aircraft’s orientation and some reference frame) normally collected by a ‘strapdown navigator’ – which is an inertial motion sensor (often called IMU), and often integrated with GNSS.

Hopefully you don’t have to do the work combining these yourself, but you need a high-temporal-resolution data stream of position (XYZ) and attitude (heading pitch roll, or omega phi kappa, or a quaternion describing rotation of the airframe relative to the earth). This needs to be as accurate as possible (post-processed dual-frequency GPS with preferably tightly-coupled processing of IMU and GPS observations).

Navigation instrument and aircraft data

  • Engineering drawings which show the navigation instrument’s reference point
  • The lever arm between the navigation instrument reference point and the LIDAR reference point, in IMU-frame-coordinates
  • The rotation matrix between the IMU and the LIDAR coordinate systems
  • If necessary, the rotation matrix describing the difference between the aircraft’s body frame and the IMU’s internal coordinate system (quite often, this gets simplified by assuming that the IMU frame and the aircraft frame are equivalent – but many operators still account for this – which is awesome!)
  • Any boresight misalignment information you can get (tiny angles representing the difference between how we think instruments were mounted and how they actually were mounted)

Of course, you could also just push your contractors to deliver point-by-point uncertainty or some per-point quality factor as part of the end product.

This series might have one more post, on how to deal with these data – or more accurately a tutorial on how not to. And then I run out of expertise – brain = dumped, and over to people with newer, more up to date experience.

The LiDAR uncertainty budget II: computing uncertainties

In part 1, we looked at one way that a LIDAR point is created. Just to recap, we have 14 parameters (for the 2D scanner used in this example) each with their own uncertainty. Now, we work out how to determine the geolocation uncertainty of our points.

First, let’s talk about what those uncertainties are. An obvious target is GPS positioning uncertainty, and this is the source that comes to mind first. Using a dual-frequency GPS gives the advantage of sub-decimetre positioning, after some post-processing is done. Merging those positions with observations from the IMU constrains those positions even further.

For a quick mental picture of how this works, consider that the navigation unit is collecting a GPS fix every half a second. The accelerometers which measure pitch, roll and heading take a sample every 1/250th of a second. They also keep track of how far they think they’ve moved since the last GPS fix – plus the rate of motion is tracked, putting a limit on how far it is possible to move between each GPS fix. The navigation unit compares the two data sources. If the GPS fix is wildly unexpected, a correction is added to GPS positions and we carry on.

So we get pretty accurate positions (~5cm).

But is that the uncertainty of our point positions? Nope. There’s more. The laser instrument comes with specifications about how accurate laser ranging is, and how accurate it’s angular encoder is.

Even more, the navigation device has specifications about how accurate it’s accelerometers are, and all of these uncertainties contribute! How?

Variance-covariance propagation to the rescue

Glennie [1] and Schaer [2] used Variance-covariance propagation to estimate the uncertainty in geolocation of LiDAR points. This sounds wildly complex, but at it’s root is a simple idea:

uncert(thing1 + thing2) = \sqrt{uncert(thing1)^2 + uncert(thing2)^2}

See that? we figured out the uncertainty of a thing made from adding thing1 and thing2, by knowing what the uncertainties of things1 and thing2 are.

Going from simple addition to a few matrix multiplications was a quantum leap, handled neatly by symbolic math toolboxes – but only after I nearly quit my PhD trying to write out the uncertainty equation by hand.

Here is the uncertainty equation for LiDAR points, whereU is the uncertainty:

U = FC_uF^t

What!! Is that all? I’m joking, right?

Nearly. F is a 14-element vector containing partial derivatives of each the parameters that go into the LiDAR georeferencing equation (the so-called Jacobians). C is a 14 x 14 matrix on which the uncertainties associated with each element occupy the diagonal.

Writing this set of equations out by hand was understandably a nightmare, but more clever folks than myself have achieved it – and while I certainly learned a lot about linear algebra in this process, I took advice from an old astronomer and used a symbolic maths toolbox to derive the Jacobians.

…which meant that the work actually got done, and I could move on with research!

Now my brain is fully exploded – what do uncertainties look like?

Glennie [1] and Schaer [2] both report that at some point, angular motion uncertainty overtakes GPS position uncertainty as the primary source of doubt about where a point is. Fortunately I found the same thing. Given the inherent noise of the system I was using, this occurred pretty quickly. In the Kalman filter which integrates GPS and IMU observations, a jittery IMU is assigned a higher uncertainty at each epoch. This makes sense, but also means that angular uncertainties need to be minimised (for example, by flying instruments in a very quiet aircraft or an electric-powered UAS)

I made the following map years ago to check that I was getting georeferencing right, and also getting uncertainty estimates working properly. 

It could be prettier, but you see how the components all behave – across-track and ‘up’ uncertainties are dominated by the angular component not far off nadir. Along-track uncertainties are more consistent across-track, because the angular measurement components (aircraft pitch and a bit of yaw) are less variable.

The sample below shows LiDAR point elevation uncertainties (relative to an ITRF08 ellipsoid) during level flight over sea ice. At instrument nadir, height uncertainty is more or less equivalent to instrument positioning uncertainty. Increasing uncertainties toward swath edges are a function of angular measurement uncertainty.

But normally LiDAR surveys come with an averaged accuracy level. Why bother?

i. why I bothered:

In a commercial survey the accuracy figure is determined mostly by comparing LiDAR points with ground control data – and the more ground control there is, the better (you have more comparison points and can make a stronger estimate of the survey accuracy).

Over sea ice this is impossible. I was also using these points as input for an empirical model which attempts to estimate sea-ice thickness. As such, I needed to also propagate uncertainties from my LiDAR points through the model to sea-ice thickness estimates.

In other words, I didn’t want to guess what the uncertainties in my thickness estimates were. As far as plausible, I need to know what the input uncertainty is for each thickness estimate is – so every single point. It’s nonsensical to suggest that every observation and therefore every thickness estimate comes with the same level of uncertainty.

Here is another map from my thesis. It’s pretty dense, but shows the process in action:

The top panel is LIDAR ‘height above sea level’ for sea ice. Orange points are ‘sea level reference’ markers, and the grey patch highlights an intensive survey plot. The second panel is uncertainty associated with each height measurment. In panel  three we see modelled sea ice thickness (I’ll write another post about that later), and the final panel shows the uncertainty associated with each thickness estimate. Thickness uncertainties are greater than LIDAR height uncertainties because we’re also accounting for uncertainties in each of the other model parameters (LIDAR elevations are just one). So, when I get to publishing sea-ice thickness estimates, I can put really well made error bars around them!

ii. why you, as a commercial surveyor or an agency contracting a LiDAR survey or a LIDAR end user should bother:

The computation of uncertainties is straightforward and quick once the initial figuring of the Jacobians is done – and these only need to be recomputed when you change your instrument configuration. HeliMap (Switzerland) do it on-the-fly and compute a survey quality metric (see Schaer et al, 2007 [2]) which allows them to repeat any ‘out of tolerance’ areas before they land. Getting an aircraft in the air is hard, and keeping it there for an extra half an hour is easy – so this capability is really useful in terms of minimising costs to both contracting agencies and surveyors. This analysis in conjunction with my earlier post on ‘where exactly to measure in LiDAR heights‘ shows you where you can assign greater confidence to a set of LiDAR points.

It’s also a great way to answer some questions – for example are surveyors flying over GCP’s at nadir, and therefore failing to meet accuracy requirements off-nadir? (this is paranoid ‘all the world is evil’ me speaking – I’m sure surveyors are aware of this stuff and collect off-nadir ground control matches as well). Are critical parts of a survey being captured off-nadir, when it would be really useful to get the best possible data over them? (this has implications for flight planning). As a surveyor, this type of thinking will give you fewer ‘go back and repeat’ jobs – and as a contracting agency, you might spend a bit more on modified flight plans, but not a lot more to get actually really great data instead of signing off and then getting grief from end users.

As an end user of LiDAR products, If you’re looking for data quality thresholds – for example ‘points with noise < N m over a flat surface’, this type of analysis will help you out. I’ve also talked to a number of end users who wonder about noisy flight overlaps, and why some data don’t appear to be well-behaved. Again, having some quality metric around each point will help an end-user determine which data are useful, and which should be left alone.


I certainly stand on the shoulders of giants here, and still incur brain-melting when I try to come at these concepts from first-principles (linear algebra is still hard for me!). However, the idea of being able to estimate in an a-priori quality metric is, in my mind, really useful.

I don’t have much contact with commercial vendors, so I can only say ‘this is a really great idea, do some maths and make your LiDAR life easer!’.

I implemented this work in MATLAB, and you can find it here:

With some good fortune it will be fully re-implemented in Python sometime this year. Feel free to clone the repository and go for it. Community efforts rock!

And again, refer to these works. I’ll see if I can find any more recent updates, and would really appreciate hearing about any recent work on this stuff:

[1] Glennie, C. (2007). Rigorous 3D error analysis of kinematic scanning LIDAR systems. Journal of Applied Geodesy, 1, 147–157. (accessed 19 January 2017)

[2] Schaer, P., Skaloud, J., Landtwing, S., & Legat, K. (2007). Accuracy estimation for laser point cloud including scanning geometry. In Mobile Mapping Symposium 2007, Padova. (accessed 19 January 2017)

The LiDAR uncertainty budget I: georeferencing points

This is part 1 of 2, explaining how uncertainties in LiDAR point geolocation can be estimated for one type of scanning system. We know LiDAR observations of elevation/range are not exact (see this post), but a critical question of much interest to LiDAR users is ‘how exact are the measurements I have’?

As an end-used of LiDAR data I get a bunch of metadata that is provided by surveyors. One of the key things I look for are the accuracy estimates. Usually these come as some uncertainty in East, North and Up measurements, in metres, relative to the spatial reference system the point measurements are expressed in. What I don’t get is any information about how these figures are arrived at, or if they apply equally to every point. It’s a pretty crude measure.

As a LiDAR maker, I was concerned with the uncertainty of each single point – particularly height – because I use these data to feed a model for estimating sea ice thickness. I also need to feed in an uncertainty – so that I can put some boundaries around how good my sea ice thickness estimate is. However, there was no way of doing so in an off the shelf software package – so I implemented the LiDAR geoferencing equations and a variance-covariance propagation method for them in MATLAB  and used these. This was a choice of convenience at the time, and I’m now slowly porting my code to Python, so that you don’t need a license to make LiDAR points and figure out their geolocation uncertainties.

My work was based on two pieces of fundamental research: Craig Glennie’s work on rigorous propagation of uncertainties in 3D [1], and Phillip Schaer’s implementation of the same equations [2]. Assuming that we have a 2D scanner, the LiDAR georeferencing equation is:

\begin{bmatrix}  x \\  y \\  z \\  \end{bmatrix}^m=  \begin{bmatrix}  X\\  Y\\  Z\\  \end{bmatrix} + R^b_m  \left[  R^b_s  \rho  \left(  \begin{gathered}  sin\Theta\\  0\\  cos\Theta  \end{gathered}  \right)  +  \begin{bmatrix}  a_x\\  a_y\\  a_z\\  \end{bmatrix}^b  \right]

The first term on the right is the GPS position of the vehicle carrying the LiDAR:

\begin{bmatrix}  X\\  Y\\  Z\\  \end{bmatrix}

The next term is made up of a few things. Here we have points in LiDAR scanner coordinates:

\rho\left(  \begin{gathered}  sin\Theta\\  0\\  cos\Theta  \end{gathered}  \right)

…which means ‘range from scanner to target’ (\rho ) multiplied by sin\theta to give an X coordinate and cos\theta to give a Z coordinate of the point measured.

Note that there is no Y coordinate! This is a 2D scanner, observing an X axis (across track) and a Z axis (from ground to scanner). The Y coordinate is provided by the forward motion of a vehicle, in this case a helicopter.

For a 3D scanner, or a canner with an elliptical scan pattern, there will be additional terms describing where a point lies in the LiDAR frame. Whatever term is used at this point, the product is the position of a reflection-causing object in the LiDAR instrument coordinate system which is rotated to the coordinate system of the vehicle’s navigation device using the matrix:


The point observed also has a lever arm offset added (the distance in 3 axes between the navigation device’s reference point and the LiDAR’s reference point), so we pretend we’re putting our navigation device exactly on the LiDAR instrument reference point:

\begin{bmatrix}  a_x\\  a_y\\  a_z\\  \end{bmatrix}^b

This mess of terms is finally rotated to a mapping frame using euler angles in three axes (essentially heading, pitch, roll) recorded by the navigation device:


…and added to the GPS coordinates of the vehicle (which are really the GPS coordinates of the navigation system’s reference point).

There are bunch of terms there – 14 separate parameters which go into producing a LiDAR point, and that’s neglecting beam divergence, and only showing single returns. Sounds crazy – but the computation is actually pretty efficient.

Here’s a cute diagram of the scanner system I was using – made from a 3D laser scan and some engineering drawings. How’s that? Using a 3D scanner to measure a 2D scanner. Even better, the scan was done on the helipad of a ship in the East Antarctic pack ice zone!

You can see there the relationships I’ve described above. The red box is our navigation device – a dual-GPS, three-axis-IMU strapdown navigator, which provides us with the relationship between the aircraft body and the world. The green cylinder is the LiDAR, which provides us ranges and angles in its own coordinate system. The offset between them is the lever arm, and the orientation difference between the axes of the two instruments is the boresight matrix.

Now consider that each of those parameters from each of those instruments and the relationships between them have some uncertainty associated with them, which contributes to the overall uncertainty about the geolocation of a given liDAR point.

Mind warped yet? Mine too. We’re all exhausted from numbers now, so part 2 will examine how we take all of that stuff and determine, for every point, a geolocation uncertainty.

Feel free to ask questions, suggest corrections, or suggest better ways to clarify some of the points here.

There’s some code implementing this equation here: – it’s Apache 2.0 licensed so feel free to fork the code and make pull requests to get it all working, robust and community-driven!

Meanwhile, read these excellent resources:

[1] Glennie, C. (2007). Rigorous 3D error analysis of kinematic scanning LIDAR systems. Journal of Applied Geodesy, 1, 147–157. (accessed 19 January 2017)

[2] Schaer, P., Skaloud, J., Landtwing, S., & Legat, K. (2007). Accuracy estimation for laser point cloud including scanning geometry. In Mobile Mapping Symposium 2007, Padova. (accessed 19 January 2017)

LiDAR thoughts – where to measure, exactly?

LiDAR is a pretty common tool for geospatial stuff. It means ‘Light Detection and Ranging’. For the most part involved shining a laser beam at something then measuring how long a reflection takes to come back. Since we approximate the speed of light, we can use the round trip time to estimate the distance between the light source and ‘something’ with a great degree of accuracy. Modern instruments perform many other types of magic – building histograms of individual photons returned, comparing emitted and returned wave pulses, and even doing this with many different parts of the EM spectrum.

Take a google around about LiDAR basics – there are many resources which already exist to explain all this, for example

What I want to write about here is a characteristic of the returned point data. A decision that needs to be made using LiDAR is:

Where should I measure a surface?

…but what – wait? Why is this even a question? Isn’t LiDAR just accurate to some figure?

Sort of. A few years ago I faced a big question after finding that the LiDAR I was working on was pretty noisy. I made a model to show how noisy the LiDAR should be, and needed some data to verify the model. So we hung the LiDAR instrument in a lab and measured a concrete floor for a few hours.

Here’s a pretty old plot of what we saw:

What’s going on here? In the top panel, I’ve spread our scanlines along an artificial trajectory heading due north (something like N = np.arange(0,10,0.01)), with the Easting a vector of zeroes and height a vector of 3’s – and then made a swath map. I drew in lines showing where scan angle == 75, 90, and 115 are.

In the second panel (B), there’s a single scanline show across-track. This was kind of a surprise – although we should have expected it. What we see is that the range observation from the LiDAR is behaving as specified – accurate to about 0.02 m (from the instrument specifications). What we didn’t realise was that accuracy is angle-dependent, since moving away from instrument nadir the impact of angular measurement uncertainties becomes greater than the ranging uncertainty of the instrument.

Panels C and D show this clearly – near instrument nadir, ranging is very good! Near swath edges we approach the published instrument specification.

This left us with the question asked earlier:

When we want to figure out a height reference from this instrument, where do we look?

If we use the lowest points, we measure too low. Using the highest points, we measure too high. In the end I fitted a surface to the points I wanted to use for a height reference – like the fit line in panel B – and used that. Here is panel B again, with some annotations to help it make sense.

You can see straight away there are bears in these woods – what do we do with points which fall below this plane? Throw them away? Figure out some cunning way to use them?

In my case, for a number of reasons, I had to throw them away, since I levelled all my points using a fitted surface, and ended up with negative elevations in my dataset. Since I was driving an empirical model based on these points, negative input values are pretty much useless. This is pretty crude. A cleverer, more data-preserving method will hopefully reveal itself sometime!

I haven’t used many commercial LiDAR products, but one I did use a reasonable amount was TerraSolid. It worked on similar principles, using aggregates of points and fitted lines/planes to do things which required good accuracy  – like boresight misalignment correction.

No doubt instruments have improved since the one I worked with. However, it’s still important to know that a published accuracy for a LiDAR survey is kind of a mean (like point density)  – some points will have a greater chance of measuring something close to where it really is than others. And that points near instrument nadir are more likely to be less noisy and more accurate.

That’s it for now – another post on estimating accuracy for every single LiDAR point in a survey will come soon(ish).

The figure shown here comes from an internal technical report I wrote in 2012. Data collection was undertaken with the assistance of Dr Jan L. Lieser and Kym Newbery, at the Australian Antarctic Division. Please contact me if you would like to obtain a copy of the report – which pretty much explains this plot and a simulation I did to try and replicate/explain the LiDAR measurement noise.

You can call me Doctor now.

Glorious bearded philosopher-king is clearly more appropriate, but Doctor will do.

After six years of chasing a crazy goal and burrowing down far too many rabbit holes in search of answers to engineering problems, I got a letter from the Dean of graduate research back in late October. My PhD is done!

So what exactly did I do? The ten minute recap is this:

1. I assessed the utility of a class of empirical models for estimating snow depth on sea ice using altimetry (snow height). The models were derived from as many in situ (as in holes drilled in the ice) measurements as I could find, and I discovered that they are great in a broad sense (say hundreds of metres), but don’t quite get the picture right at high resolutions (as in metres). This is expected, and suspected – but nobody actually did the work to say how. So this was published (and of course, is imperfect – there is much more to say on the matter).

2. For a LiDAR altimetry platform I spent a lot of time tracking down noise sources, and ended up implementing a method to estimate the 3D uncertainty of every single point. This was hard! I also got quite good at staring at GPS post-processing output, and became quite insanely jealous anytime anyone showed me results from fixed-wing aircraft.

3. Now having in hand some ideas about estimating snow depth and uncertainties, I used another empirical model to estimate sea ice thickness using snow depths estimated from sea ice elevation (see 1), and propagating uncertainty from LiDAR points through to get an idea of uncertainty in my thickness estimates (see 2). Because of spatial magic I did with a robotic total station on SIPEX-II (see the blog title pic – that’s me with my friendly Leica Viva), I could also coregister some under-ice observations of sea ice draft and use them to come up with parameters to use in the sea-ice thickness from altimetry model at the scale of a single ice floe. For completeness, I did the same with a very high resolution (10cm) model I made from 3D photogrammetry on the same site. I then used this ‘validated’ parameter set to estimate sea-ice thickness for some larger regions.

Overall, the project changed direction three our four times, reshaping as we learned more – and really taking shape after some new methods for sea ice observations were applied in 2012.

What I discovered was that it is actually pretty feasible to do sea ice thickness mapping from a ship-deployed aircraft in the pack ice zone. This is important – because it means regions usually very difficult to access from a land-based runway can be examined.

I also showed that observations of sea ice so far may be underestimating sea ice thickness in certain ice regimes – and also likely to be overestimating sea ice thickness in others. This has pretty important implications for modelling the fresh water flux and other stuff (habitat availability) on the southern ocean – so more work is already underway to try and gather more data. Encouragingly, I showed that drill holes are actually quite accurate – and showed how some new approaches to the practice of in situ sampling might markedly increase the power of these observations.

The possibilities of a robotic total station and a mad keen field sampling nutter on an ice floe – endless! (we did some awesome work with a hacky sack and the total station, hopefully coming to light soon).

…and like all PhD theses, the last part is a vast introspection on where the soft underbelly of the project lies, and what could/should be done better next time.

Needless to say, I’m really relieved to be done. I go wear a floppy hat and collect my bit of paper next month.

What next? Subject the ice thickness work to peer review! Aiming for publication in 2017.

FOSS4G 2016 wrap-up


I’ve recently returned from FOSS4G 2016 in Bonn, Germany. It was my first OSgeo conference, and it was quite amazing.

As a little background the OSgeo foundation supports the development of open source ‘geo software’, from desktop packages like QGIS to behind-the-scenes libraries (Proj.4) and even geospatial metadata catalogues (eg Geonetwork). My involvement? Over the years I’ve used parts of the OSgeo stack heavily. Increasingly. This also holds for the National Computational Infrastructure, so it made sense to submit an abstract and try to go.

My talk was going to be about storing and querying point cloud data from HDF files. Unfortunately I didn’t finish my experiments in time, and I ended up presenting a high level view of NCI, our point data storage and analysis needs, and some attempts at using Postgres-pointcloud and SPDlib (a HDF based point and waveform data storage system). This worked well with the session – until that time, most of the point data discussions at FOSS4G revolved around airborne or mobile LiDAR. It was nice to feel a bit like we had some unique problems to solve, and I feel grateful to be able to stand among the giants of OSgeo. In the same session, for example, were the developers of the PDAL and Entwine libraries – technical wizardry right there! I just try to keep up, and apply this stuff to jobs I need to do.

The main takeaway from the conference, however, is not the fact that I gave a talk. It was the amount of insight gained from the conference itself, and just being there. It was incredibly difficult to choose which sessions and workshops to attend, there were so many that were potentially useful for my life. I left feeling like I’d stepped into a seriously strong community of people who care about making great tools, and giving them away.

I chaired a session on web processing services, which was a new experience and really informative – especially Nis Hempelmann presenting on birdhouse services. This has seen immediate application in my work at NCI, so in the spirit of FOSS4G I’m running birdhouse, bugging the developers and have made my first ever commit on the project (a documentation typo – but you get the picture. Get involved).

Some other highlights were:

Ivan Sanchez’s lightning talk presenting what3fucks to the world. This actually has a serious point about how to divide the world into smaller pieces, and also how open software and data can be immediately applied, whereas closed-source options are not necessarily going to live as lively a life. If you build something and share it, it gets used.

Pretty much all of the keynote talks. Tomas Zerweck from Munich RE spoke about FOSS in risk management (which is all of us, science nerdlings, we are the boss of risk managers – without our field observation data, there is nothing). Oddly, every title slide showed Australia, and given our ignoble position as top consumers and chief backwards-pedalers of the world, I wondered if we are seen as a risk to be managed. Probably! Another standout keynote was Peter Kusterer, from IBM Germany, who described how the FOSS community got behind an emergency response to refugees arriving in Germany. Take note also Australia: Wir shaffen das!


In the regular programme – so much to choose from! I went to a discussion on open standards and open software, and was heartened to hear that small developers and big organisations with standards voting rights are happy working partners. I saw an excellent talk by Arnulf Christi on the relative permanence of data compared to software. Mind the data – words to live by, because there’s no point collecting all this information if nobody can use it in 3, 10, 50 years time. My brain was completely exploded by Olivier Courtin, on using postGIS in a real advanced way. Indeed. And I saw as many geoserver-related talks as I could, because it’s my job. But really, there was so much to choose from, and I will gradually catch all the videos of talks I would have liked to see. A colleague of mine just showed me another mind-blowing talk on geotiff.js and plotly that I would never have thought to go to. Here’s the programme:

…and you can download or watch all of the talks from the conference here:

So conference main work things done, here is a photo from the conference dinner – on a boat on the Rhine, with an accordion and cello. Later, after some beers and discussions of life and point clouds, I was dragged onto the dance floor and busted out my finest moves.


…and here a couple of photos from the world conference centre, an amazing venue.



The theme of the conference was ‘building bridges’ – and it certainly met those aims. There was a vast cross section of the spatial data community from insurance companies doing their due diligence about whether to switch to FOSS software stacks, to excited developers presenting brand new ideas. I was able to meet and speak to a number of people I’d only ever heard and read about – making that personal connection which always helps when we need to develop ideas into reality later.

I’m really grateful that I was able to go, and I’m already looking ahead to 2017, and what ideas I can convert into reality so that I’ve got something to bring to the community.

Finally, a side effect of the conference was catching up with an old friend I have not seen in 14 years!  We have surprisingly parallel lives – so what did we do? climbed rocks, of course!img_20160828_155707701

A model of a model of a world



My little people Joe and Oli came up with this amazing landscape, which they called Meerkat world. The meerkats live in a house (a square structure constructed from baskets just left of centre), below a volcano which sticks up out of the snow. A lake drains out to a waterfall, which then flows to a beach off which a pirate ship is anchored. Clearly, there are sharks near the ship!

On the left of the meerkat house is a maze, which prevents tigers and lions from getting to the meerkat house and wreaking havoc. I particularly enjoy the multi-material aspect of their work – the variety of objects used to create Meerkat world is astounding!

An ordinary photo is no justice for this work, so I modelled it instead. This took 89 photos, which were used to generated 15 000 000 points and just over 1 000 000 polygons. Now to find a way to let you interact with this world via or maybe even straight up three.js. Oh, and overcoming file size limitations on my web host!