Some thoughts on Doing Data Right

I’m playing catch up on the blog – my Drafts is a scary place to look and a lot isn’t going to see the light of day. I’m also not going to be shackled to trying to get stuff done in any sort of chronological order because then nothing will ever get published.

With that in mind, here’s a quick set of reflections from the Doing Data Right conference that I went to on 10 September. The conference was organised by the Scotsman in conjunction with a range of partners, not least of whom were my University and the Data Driven Innovation initiative.* Overall I found the event very useful in terms of raising complex questions to which there aren’t easy answers, but which we need to be grappling with if we are to use data ethically and carefully.

Inclusivity

Talat Yaqoob from Equate Scotland emphasised the need for diversity in teams working with data, and Jarmo Eskelinen highlighted the need for the data we work with to be inclusive. This raises some tough questions around the rights of individuals versus notions of collective good when it comes to data sets. A data set based on opt-in data, or one that individuals have been able to opt out of, arguably contains bias. How do we test and balance what can be seen as competing rights in all the different scenarios in which we might be using data? (Data protection best practice gives us some options here, with things like legitimate interest balancing tests, but much more will be needed.)

Disaggregation

Arguably one strategy to manage some of the privacy issues around inclusivity above would be greater use of aggregated, anonymised data. To further complicate things, Caroline Criado-Perez eloquently emphasised the dangers of having too little disaggregation in data – specifically tackling the issues around a lack of sex-disaggregated data (go read Invisible Women if you haven’t yet).

Failing to disaggregate data by gender, and consider women’s bodies when designing the world, depressingly means I’m more likely to be maimed or die in a car accident, or have a heart attack mis-diagnosed. Excellent. I’m sure there are many other ways in which we might disaggregate data which would help us better design our societies to be more inclusive. But of course disaggregation can start to mean a greater potential for re-identification in anonymised data sets if not handled very carefully.
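To make that tension a bit more concrete, here is a minimal sketch of how groups shrink as you disaggregate further. Everything in it is my own illustration rather than anything presented on the day: the records are invented, the `small_groups` helper is hypothetical, and the threshold of 3 is an arbitrary stand-in for whatever minimum group size a real disclosure-control policy would set.

```python
from collections import Counter

# Invented toy records: (sex, age_band, postcode_district).
# These are illustrative values only, not real data.
records = [
    ("F", "30-39", "EH1"),
    ("F", "30-39", "EH1"),
    ("F", "40-49", "EH1"),
    ("M", "30-39", "EH1"),
    ("M", "30-39", "EH2"),
    ("M", "30-39", "EH2"),
    ("F", "30-39", "EH2"),
    ("F", "40-49", "EH2"),
]


def small_groups(rows, columns, k):
    """Return the groups with fewer than k records when rows are split
    by the given column indexes (hypothetical helper for this sketch)."""
    counts = Counter(tuple(row[i] for i in columns) for row in rows)
    return {group: n for group, n in counts.items() if n < k}


K = 3  # assumed minimum group size; a real policy would choose its own

by_sex = small_groups(records, [0], K)
by_sex_age = small_groups(records, [0, 1], K)
by_all = small_groups(records, [0, 1, 2], K)

print(len(by_sex), len(by_sex_age), len(by_all))  # prints: 0 1 6
```

Split by sex alone, no group falls below the threshold. Add age band and one group does. Add postcode district and every group does, which is the point at which an "anonymised" row starts to look a lot like a person.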

“Access to information must be balanced with the rights to privacy and data protection. With the increasing use of big data and the demand for data disaggregation to measure the 2030 Agenda, there is a critical need to ensure the protection of these rights, as acknowledged in the call for a data revolution.” (A Human Rights-Based Approach to Data, UN OHCHR)

Diversity

Lesley McAra, Director of the Edinburgh Futures Institute (amongst other things), reminded us that we need diversity in the data that we use as well. Beyond thinking about questions of bias, we need to get more sophisticated with the kinds of data we use. Data can be quantitative, qualitative, narrative stories, or visual. We need to have the capacity and capability to work with all these different kinds of data if we want to tackle some of the biggest questions facing society.

No rabbit holes

Elizabeth Hollinger from Aggreko outlined their approach to using data and analytics within their business. She emphasised the need to identify the big questions that can be tackled most easily to ensure maximum benefit from effort spent working with data. She was emphatically clear about the need to stop analysis when the business need has been met. I’ve lost count of the number of times I’ve spoken with colleagues who want to interrogate data sets simply because they look interesting. There’s definitely a prospecting / gold rush fever mentality in this space that we have to resist. Not only is it not particularly efficient or productive in terms of resourcing, it’s where the potential for unwarranted intrusion might lie. So, no going down rabbit holes with the data analysis.

Openness

Finally, Catherine Stihler, CEO of the Open Knowledge Foundation, called out the need for openness and transparency in what we do with data, in particular more open data sets and open decision making about the use of data.

Across the whole day there was a strong focus on the ethics of collecting and using data. What became most clear to me is that taking an ethical approach involves a constant set of negotiations and considerations, balancing the rights of individuals, the risks of harm, and the potential for collective benefit. Ethical approaches will be helpful, but they must also be transparent to create trust. I think after a few weeks of gut-wrenching revelations about the MIT Media Lab and others, we are long overdue a whole bunch more ethical practice and transparency in the tech industry.


*This is all part of a much larger initiative with the catchy title of the Edinburgh and South East Scotland City Region Deal. In short, it is over £1bn of public and private money through which Edinburgh and the surrounding region will be transformed into the “Data Capital of Europe”.

I’m also not going to wax too lyrical about the irony of a whole afternoon focused on women in data science, but a morning programme that failed to have even one female data scientist on the panel session…
