To collect or not to collect: that is often not the right question

I saw this tweet a while back, and whilst it’s in the context of data collection in libraries, it speaks to something I hear in edtech conversations more generally, and was reminded of again this weekend: that we shouldn’t be collecting all this data about our students. The tweet’s statement about not collecting user data is of course absolutely true, but it’s also something I find frustrating and perhaps even a little dangerous. I fear that framing concerns in black and white terms makes them too easy to dismiss as a lack of understanding about technology more generally. I’ll try to explain my no doubt woolly thinking…

At their most basic level our learning technology systems are some code and some persistent storage, probably an RDBMS. Even if there’s no explicit stats tracking built in (and system audit trails have been a support feature for a very long time, so there probably is) there’s *always* data, and even a relatively small amount of data can have quite a bit squeezed out of it.

As an example, consider a discussion forum. You have a class cohort with access to it; some have posted, some have not. Some have replied to others’ posts. You are then holding, at the most basic level, (a) post content, (b) relationships between posts, (c) author identifiers and (d) post dates.

That’s immediately enough data to do some rudimentary social network analysis, graphing the activity on the forum and the relationships between post authors.
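As a rough sketch of how little is needed, here is one way the reply relationships could be pulled out of nothing but parent links and author identifiers. The row layout, the names and the `reply_edges` helper are all my own assumptions for illustration, not the schema of any particular VLE.

```python
from collections import Counter

# Hypothetical forum rows: (post_id, parent_id, author, posted_on, content).
# Every value here is invented for illustration.
posts = [
    (1, None, "ana", "2024-03-04", "Question about the week 2 reading"),
    (2, 1, "ben", "2024-03-04", "I think the reading argues the opposite"),
    (3, 1, "cat", "2024-03-05", "Agree with ben about the reading"),
    (4, 2, "ana", "2024-03-08", "Thanks, that helps"),
]


def reply_edges(posts):
    """Count (replier, original author) pairs using only parent links."""
    author_of = {post_id: author for post_id, _, author, _, _ in posts}
    edges = Counter()
    for _, parent_id, author, _, _ in posts:
        # Top-level posts have no parent; self-replies are ignored.
        if parent_id is not None and author_of[parent_id] != author:
            edges[(author, author_of[parent_id])] += 1
    return edges


edges = reply_edges(posts)
for (replier, original_author), count in sorted(edges.items()):
    print(f"{replier} -> {original_author}: {count}")
```

The resulting edge list is already a graph of who talks to whom, and could be handed to any graphing tool as it stands.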

You’ve also got the post content, so you can do some topic modelling analysis to identify roughly what’s being discussed.
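To be clear, what follows is not topic modelling proper (that would mean something like LDA and a dedicated library). It is a much cruder stand-in, plain word counting, but it shows that even a few lines over the post content give a rough idea of what is being talked about. The sample posts and the tiny stopword list are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical post bodies, invented for illustration.
contents = [
    "Question about the week 2 reading",
    "I think the reading argues the opposite",
    "Agree with ben about the reading",
]

# A tiny hand-picked stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "about", "i", "with", "a", "of", "and", "to"}


def top_terms(contents, n=3):
    """Return the n most frequent non-stopword terms across all posts."""
    words = []
    for text in contents:
        # Lowercase and keep alphabetic runs only, so digits are dropped.
        tokens = re.findall(r"[a-z]+", text.lower())
        words.extend(token for token in tokens if token not in STOPWORDS)
    return Counter(words).most_common(n)


print(top_terms(contents, 1))
```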

You can also start to create derived variables, such as who *isn’t* there from the larger class cohort. That works even if your VLE isn’t tracking who is visiting but not posting (which most do in the system audit log, btw).
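Working out who isn’t there needs nothing more than the class list and the author identifiers already stored with each post. A minimal sketch, with an invented cohort and invented authors:

```python
# Hypothetical class list and post authors, invented for illustration.
cohort = {"ana", "ben", "cat", "dev", "eli"}
post_authors = ["ana", "ben", "cat", "ana"]

# Anyone enrolled who never appears as an author has not posted.
absent = cohort - set(post_authors)

print(sorted(absent))
```

Nothing new was collected here; the list of non-posters is derived entirely from data the forum has to hold in order to work.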

You’ve also got date information, so you can graph how conversations unfold over time, perhaps creating more derived data by considering how close to key course dates posts are created, or perhaps even just the day of the week: are students posting regularly or in bursts? You could create a new data point for student engagement based on rough thresholds of activity.
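Here is a sketch of both ideas: posts per day of the week, and an engagement label derived from post counts. The thresholds in `engagement_label` are entirely arbitrary and the data is invented; the point is only how quickly a stored date and an author identifier become a judgement about a student.

```python
from collections import Counter
from datetime import date

# Hypothetical (author, posted_on) pairs and cohort, invented for illustration.
posts = [
    ("ana", date(2024, 3, 4)),
    ("ben", date(2024, 3, 4)),
    ("cat", date(2024, 3, 5)),
    ("ana", date(2024, 3, 8)),
]
cohort = ["ana", "ben", "cat", "dev"]

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Posts per day of the week; date.weekday() returns 0 for Monday.
by_day = Counter(DAYS[posted_on.weekday()] for _, posted_on in posts)


def engagement_label(post_count):
    """Arbitrary thresholds, chosen only to show how a label gets derived."""
    if post_count == 0:
        return "none"
    if post_count == 1:
        return "low"
    return "active"


post_counts = Counter(author for author, _ in posts)
# Counter returns 0 for students who never posted.
engagement = {student: engagement_label(post_counts[student]) for student in cohort}

print(dict(by_day))
print(engagement)
```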

You can start to combine some of this too – for example weaving conversational topics into your social network analysis to see which students are clustering around which topics of conversation. You could start to create even more derived data by thinking about network centrality as another rudimentary measure of engagement.
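Network centrality sounds grander than it is. The simplest version, degree centrality, is just the share of other participants each person has exchanged a reply with. A minimal sketch over invented reply pairs, treating replies as undirected links (a graph library such as NetworkX would do this for you, but the arithmetic is short enough to show directly):

```python
# Hypothetical reply pairs (replier, original author), invented for illustration.
replies = [("ben", "ana"), ("cat", "ana"), ("ana", "ben")]


def degree_centrality(replies):
    """Fraction of the other participants each person is directly linked to."""
    neighbours = {}
    for replier, original_author in replies:
        neighbours.setdefault(replier, set()).add(original_author)
        neighbours.setdefault(original_author, set()).add(replier)
    n = len(neighbours)
    if n < 2:
        return {person: 0.0 for person in neighbours}
    return {person: len(linked) / (n - 1) for person, linked in neighbours.items()}


centrality = degree_centrality(replies)
print(centrality)
```

Note that students who never posted don’t appear in this graph at all, so a measure like this says nothing about them unless it is combined with the class list.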

What might you do with this data? You might reach out to the students who haven’t posted at all, or to the lurkers, or to the highly engaged. You might use it to inform how you teach the next iteration of this course. You, or a vendor, might use it to inform development of the software. Who decides on these uses though?

The question of whether or not to collect data is then, in practical terms, a fundamental question about whether to use technology at all.

Once you start using whatever system or software, the critical questions very quickly need to move on to what data exists, who sees it, what’s done with it, and how long it persists. We often talk about tracking students in edtech systems as if it were an additional layer of data collection, when in very many cases (though not all) what we’re dealing with is secondary uses for data that already exists as a fundamental part of the systems we use. Polemical arguments about whether or not to collect data then carry a very real risk of being dismissed as simplistic and naive, and with them go the more vital and nuanced questions about uses of data.
