In early 2018 a former colleague, Claire Trenham, asked if I was interested in proposing a chapter on ‘good data’ for a university publication she’d come across. So I said yes!
We pitched a ‘Good Data Manifesto’, and it was accepted! One and a half years on, it’s in an edited book titled ‘Good Data’, forming part of the Institute of Network Cultures Theory on Demand series. It was challenging, writing for editors in law and humanities – and ultimately a fantastic exercise in weaving the stories we wanted to tell into a perspective we’re not used to seeing, and pulling in threads from three editors with sometimes quite different viewpoints.
As geospatial scientists and data managers, we’re both intimately involved in the ideas of precision, accuracy, provenance, repeatability and useability of massive geospatial datasets – from climate models to satellite imagery to centimetre-scale 3D datasets. For example, what do 30 billion sub-metre data points look like – and how do we access them? …or, at another scale, how does one sanely reconcile distributed replication of petascale climate model outputs?
We wrote the Good Data Manifesto initially from this fairly technical viewpoint – as we collate increasingly large and invasive geospatial data from all manner of platforms, how can we manage them appropriately in terms of our human rights, and the right of (actually, our need for) our ecological support systems to exist?
FAIR data are a start…
Using the FAIR data principles as a base, we reasoned that ‘good data’ at a minimum are Findable, Accessible, Interoperable and Reusable. This leads to useful services and enables science in new ways – both fantastic outcomes.
In considering ‘good data’ we also need to consider how and why data are collected. We need to consider the purpose for their existence. Are data Ethical? Data are never purely objective observations of a world; they are always shaped by our social environment, by politics, by ourselves.
FAIR data are not always ‘good’. For example, systems which enable tracking data re-identification, or result in automated bias or erroneous debt notifications, all rely on, and likely create, data using FAIR principles. Does that make them ‘good’? An example we used was Australia’s controversial ‘robodebt’ system. The data shared between agencies may adhere to FAIR principles – the outcome is anything but fair.
FAIRER data may help!
At the time of writing the chapter, we had heard the acronym ‘FAIRER’ – adding ethical and revisable to the FAIR principles. We loved the idea! And we spent a lot of hours trying to remember where we’d seen this, and our Google-fu completely failed us, so we failed to cite the idea properly.
In 2022 I’m delighted to provide a source! See this thread from Twitter: https://twitter.com/tully_barnett/status/920516528306556931 – and a 2018 keynote talk about data by Deb Verhoeven here: https://www.vala.org.au/vala2018-proceedings/vala2018-plenary-6-verhoeven/
So, thanks for the inspiration, Deb – and my apologies for … well … being a functionally terrible researcher and failing to acknowledge your influence in print.
In the context of our chapter, and back to the original text of this post:
We reasoned that Good Data are ethical.
By this we mean ‘data are collected, held and used with respect’ – for our fellow humans and for our planet. We look at the case of very high resolution satellite imagery – are they intrinsically ethical? What should we, as humans, have a right to keep secret; and what about our natural world? Does she have rights to her secret places, which are necessarily hard for humans to access and exploit?
…on a similar line, we ask ‘what about the infrastructure required to host data?’ Where is the line between sustainable and reasonable use of resources to build and deploy infrastructure-scale compute capacity, and our need to keep our planet habitable in the long term? Data collection and storage has an environmental cost, and Good Data should consider this.
In human terms, we need to respect privacy and collect data openly, only as required and with full consent. Keßler and McKenzie’s geoprivacy manifesto walks through myriad ways in which data are collected whether the data owner agrees to it or not. These data may meet FAIR principles – could we call them ‘good’? Often, not really.
We also see that Good Data are revisable.
The world is not static. Our home planet is constantly in flux. We, as humans, change. People leave old lives and ways behind – we can never assume linearity in behaviour. As earth observers and scientists, we sometimes get things wrong. We fuck up and need to recalculate things.
A desire for larger sample sizes or statistical modelling must never outweigh a person’s right to remake themselves; we should always be able to update old ideas; and we should consider the resources required to keep an ever-growing data collection.
With all this in mind, the Good Data Manifesto makes the case for FAIRER principles as a minimum viable standard for ‘Good Data’.
What next for FAIRER data?
The idea is already out in the wild: we came across a summary in a data smart schools project – which was super flattering! We’d love to see the FAIRER data principles gain traction.
I’ve had conversations here and there around FAIRER data, and the primary objection to the principles is ‘what about data integrity if everything is revisable?’. My answer is always ‘why pretend we are immutable?’. …and perhaps it reflects inexperience – both Claire and I have been privileged enough to see a lot of data in a few different domains… and how/why it gets to where it is today.
The Good Data Manifesto was pulled together in our spare time, as a 100% volunteer effort. Given the time, funding and opportunity to publish a ‘next thing’, we’d also investigate data acquisition and usage models like OpenStreetMap, which is actually pretty close to a FAIRER data model; or similar revisable, community-based data systems (e.g. OpenAerialMap).
I encourage you to go read the chapter in full (it’s free) for the rest of the manifesto, touching also on geoprivacy; and providing a Good Data checklist to work from. The rest of the book is also intriguing and valuable – I’ve got part way through, and I’m looking forward to digesting the rest.
…and if you’re in Brisbane on 27 June 2019, get along to the launch event!
The sales pitch
Spatialised is a fully independent consulting business, currently in hibernation / very much slow-down mode. The tutorials and write-ups here are free for you to use, without ads or tracking.
If you find the content here useful to your business or research or billion dollar startup idea, you can support production of ideas and open source geo-recipes via Paypal or Patreon; or hire me to do stuff; or hire me to talk about stuff; or buy stuff from the store; or just give me a seat on your advisory board and a 1% stake.