By Nick Radcliffe
We are used to synthetic alternatives in many areas of life: synthetic leather, synthetic flight (with flight simulators), synthetic medical implants, and even the promise of synthetic meat. While many synthetic alternatives are widely perceived as inferior to the real thing (synthetic leather), others, at least in certain respects, are plainly superior – perhaps titanium knees or perfect artificial diamonds. The concept of synthetic data may be less familiar.
Where real data describes real events, people, entities and places, synthetic data describes artificial people and events – the places and other entities may be real – while retaining the same shape and structure as the corresponding real data. The big idea is that if synthetic data matches real data in key respects, we can use it instead of real data for analysis, machine learning, reporting or education, with little or no loss of fidelity but vastly increased privacy. With rampant identity theft, the surveillance economy and the advent of GDPR, with its possible fines of up to four per cent of global annual turnover, there are both ethical and business imperatives for adopting better privacy practices.
Synthetic data is not new: people have long used artificial data for testing, and simulations for scenario modelling (“what if?” analyses). What is new is the possibility of training machines to learn the patterns in real data, and then inverting the resulting AI model (in some sense, turning it inside out) to generate synthetic data.
If all goes well, the result will be data that replicates the relevant general patterns in real data, without reproducing the specific features of any real individuals or events. Naturally, verifying that a synthetic dataset accurately captures the key patterns in a real dataset is complex, and it can be even harder to prove that it properly protects privacy, by showing that no specific information about individuals has leaked through to the synthetic data. But progress is rapid, and there are ever-improving procedures that, applied diligently, can give high confidence on both counts.
As with other synthetic counterparts, there are even cases where synthetic data can be better than the real thing, for example by compensating for biases in real data or encapsulating scenarios that have not been seen or recorded. One obvious example is climate forecasting: we don’t have real data about a (modern) world with 3°C of heating relative to the 18th century, but it’s useful to model what it would be like. Similarly, we don’t have data about how an economy that has undergone decarbonisation functions, or about a credit system free from historical biases and exclusions, but both would be useful to simulate.
Synthetic data is not a silver bullet, but it is a vital tool for organisations looking to use and share information from sensitive data responsibly and safely. Britain is well placed to be in the vanguard, with leading academic research, such as Synthpop from Edinburgh, and innovative start-ups such as Hazy: this doesn’t need to be yet another technology in which the US dominates.
Nick Radcliffe is the Chief Data Scientist at Smart Data Foundry, a non-profit organisation at the University of Edinburgh that is using data to improve lives.