Openness, identity, identify, propriety, triangulation - a moment?
Not that we're quiet at the moment but something must be in the air - so, like buses, another post!
courtesy (c) MediaBuzz 2017
Steve Wilson has managed to capture in this post, on a specific case to do with health care data, arguably amongst the most sensitive of personal data, a sense that 'we' have lost or are losing (what limited) control we might have had, over our data, over who collects it, on whose behalf they collect, how they analyse and process it - in particular what other data they "join"* [see below] it to, on whose behalf that is in turn done, who that is sold or supplied to.
* join - a short-hand here for the vast array of, to many, unfathomable ways in which 0s and 1s from X data set are ingested, validated/verified, analysed, combined, associated or otherwise linked, directly or indirectly, using techniques and methods in and from data science, statistics, information science, physics, geography, operations research et al with many other data sets to derive new insights, inform new products, define new actions, reinforce existing.....you get/know the picture....(various other short-hands such as AI or machine learning (ML) or even natural language processing (NLP) have made their way into the public domain)
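For anyone who wants the simplest possible picture of a "join", here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the field names and the choice of postcode, birth year and sex as the linking fields are all assumptions, not anything taken from Steve Wilson's case. It only shows the general idea that a data set with the names stripped out can be linked back to named people when another data set shares a few ordinary-looking fields.

```python
# Hypothetical, invented records. "health" has had names removed;
# "public" is some other list that still carries names.
health = [
    {"postcode": "AB1 2CD", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "AB1 2CD", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "EF3 4GH", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]
public = [
    {"name": "Person A", "postcode": "AB1 2CD", "birth_year": 1975, "sex": "F"},
    {"name": "Person B", "postcode": "EF3 4GH", "birth_year": 1990, "sex": "M"},
]

# Assumed linking fields: none is a name, but together they can single someone out.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build the tuple of linking fields for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymised, identified):
    """Return (name, diagnosis) pairs where exactly one named record shares the key."""
    by_key = {}
    for record in identified:
        by_key.setdefault(key(record), []).append(record)

    matches = []
    for record in anonymised:
        candidates = by_key.get(key(record), [])
        if len(candidates) == 1:  # an unambiguous match is a re-identification
            matches.append((candidates[0]["name"], record["diagnosis"]))
    return matches


reidentified = link(health, public)
print(reidentified)
```

With these invented rows, one of the three "anonymous" health records is tied back to a named person. Real linkage uses far messier data and far cleverer matching, but the mechanism is the same.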
Kieron O'Hara has written on privacy (including the end of), guidance on anonymisation (for data sharers), on online obfuscation, on the semantic web and, per this post, on (de)anonymisation; among his nuggets are 'vaguing-up' and individual consent-based treatment (in the debate over geographically defining a crime event). Remembered here is the line from the 2011 Transparent government, not transparent citizens: a report on privacy and transparency for the Cabinet Office: "There are no complete legal or technical fixes to the deanonymisation problem".
We are familiar with the argument that government, parliament, legislation and regulation struggle to keep up with the rapid evolution of tools and technology, giving rise to horses long having bolted, poor governance, weak compliance et al. Deanonymisation, hacks, leaks and mash-ups are a daily occurrence, whether in the media or otherwise. It seems millennials especially are sanguine (why is less clear) about personal data, expect to receive personally targeted ads and will most benefit from triangulation of data sets, be it in health care, dating or at work, and so those with their hands, in theory, on the levers of power should be equally chill. [Those same millennials didn't get much of a say in the EU referendum so that argument works both ways].
As these Smart Cities, surveillance state and related Twitter commentary pieces suggest, the language of and about the technology and what it does/produces is instrumental/influential in our acceptance, adoption, anger or passivity. Smart = good, surveillance = bad (or less good, given certain arguments)? Either way there should be a more open debate about the surveillance society, from 2006!
And so back to openness, open data, the role and impact of open by default policy as applied to data. If deanonymisation is a given, which may seem counter-intuitive, or can only be avoided by aggregating to such temporal, geographical or other levels as to detract from meaningful utility, what can be done, going forward (remembering what horses are already out there), to provide some level of protection from the abject, infinite processing power of the cloud and those who would seek insights, information etc for prurient, research or other ends? Privacy Impact Assessments (PIAs) are one tool. How you apply a PIA, and to what element of the process, can though be perverse (or appear to be), as pointed out here. With 'lists' from hacks being publicly tangible in a headline-grabbing way that land ownership can't be, despite Anna Powell-Smith and Guy Shrubsole's best efforts, perhaps that was HMLR's focus, and anyway related data sets (e.g. Companies House) are already open, so even if that does make individuals identifiable that's ok, isn't it? After all, "the most interesting thing that will be done with 'our' data will be done by somebody else" [ref a lot of people].
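The trade-off between aggregation and utility can be made concrete with a small Python sketch. It is an illustration under assumptions, not a description of how any PIA or any publisher actually works: the records are invented, the function names are hypothetical, and "smallest group size" is used here as a rough k-anonymity-style measure of how exposed the most exposed individual is.

```python
from collections import Counter

# Hypothetical, invented rows: (full postcode, birth year) for six people.
records = [
    ("AB1 2CD", 1975), ("AB1 2CE", 1975), ("AB1 3XY", 1976),
    ("AB1 3XZ", 1978), ("EF3 4GH", 1975), ("EF3 4GJ", 1979),
]


def smallest_group(rows):
    """Size of the smallest set of people sharing identical values (1 = someone is unique)."""
    return min(Counter(rows).values())


def vague_up(rows):
    """Coarsen each row: keep only the outward postcode and round the year down to its decade."""
    return [(postcode.split()[0], year // 10 * 10) for postcode, year in rows]


k_fine = smallest_group(records)
k_coarse = smallest_group(vague_up(records))
print(k_fine, k_coarse)
```

At full precision every person in this toy set is unique; after coarsening, the smallest group holds two people. That is some protection, bought at the cost of no longer knowing the street or the year, and it says nothing about what happens once another data set is brought alongside.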
This has been a compelling narrative behind open data release - do interesting things, find new insights, create new value, build new businesses. But as the BT/InLink story suggests, despite the always-on availability of the cloud, the willingness of VC to invest and an extensive start-up infrastructure, the value seems more broadly to be accruing to those who already have. UK unicorns are in short supply but UK start-up and SME-land remains vibrant, with 40,000 employed in my own part of the digital data economy alone, mostly intermediating to drive value for end users. In theory a national data infrastructure is agnostic (net neutrality notwithstanding); but does the on-going capture of the public realm by a privatised, commercial surveillance state already heavily inter-twined with the State (see Snowden or JG Ballard) create a two (or multi-) tier data economy?
In Open data comes to market from a 2013 workshop at Policy Exchange, O'Hara is clear that "It would be absolutely wrong for a provider with state backing to invade existing markets", "Government needs to take the views of those demanding data and supplying information genuinely into account in decision making" and 'government as brand'. State benefit drives state backing or at least not getting too 'in the way'? Where to draw the line, what to release, on what terms (this is key - PIA, privacy, security, context, mosaic potential), what not to release, who to empower, what to allow, how to do that, who's going to shout loudest, who's shouting close by, where do the benefits accrue, what messaging, what evidence?
Challenges for policy makers, but 'a moment' and words to the wise for all players in the data mosaic? After all, re-identification (deanonymisation), per Steve Wilson, and subsequent analysis, association, enrichment and publication can be catastrophic.