‘Towards Robust Training Data Transparency’

As if on cue, Open Future releases a new brief calling for meaningful training data transparency:

Transparency of the data used to train AI models is a prerequisite for understanding how these models work. It is crucial for improving accountability in AI development and can strengthen people’s ability to exercise their fundamental rights. Yet, opacity in training data is often used to protect AI-developing companies from scrutiny and competition, affecting both copyright holders and anyone else trying to get a better understanding of how these models function.

The brief invokes core Mertonian science norms in its argument to put muscle behind Europe’s AI Act:

The current situation highlights the need for a more robust and enabled ecosystem to study and investigate AI systems and critical components used to train them, such as data, and underscores the importance of policies that allow researchers the freedom to conduct scientific research. These policies must include a requirement that AI providers be transparent about the data used to train models […] as it will allow researchers to critically evaluate the implications and limitations of AI development, identify potential biases or discriminatory patterns in the data, and reduce the risk of harm to individuals and society by encouraging provider accountability.

‘AI Act fails to set meaningful dataset transparency standards for open source AI’

Open Future’s Alek Tarkowski, writing in March about Europe’s AI Act:

Overall, the AI Act does not introduce meaningful obligations for training data transparency, despite the fact that they are crucial to the socially responsible development of what the Act defines as general purpose AI systems.

Tarkowski’s post is nuanced, and well worth a read. My mind kept drifting to the scholarly-publishing case—in which scholars’ tracked behavior, citation networks, and full-text works might train proprietary models built by the likes of Elsevier. As Tarkowski hints here—echoing Open Future’s July 2023 position paper—open science norms around data sharing should be brought to bear on legislation and regulation. The case for applying FAIR-like principles to models trained on scholarly data is stronger still.

‘Publishers can’t be blamed for clinging to the golden goose’

I missed this Stevan Harnad piece from last May. It is trademark Harnad:

So, you should ask, with online publishing costs near zero, and quality control provided gratis by peer reviewers, what could possibly explain, let alone justify, levying a fee on S&S [scientists and scholars] authors trying to publish their give-away articles to report their give-away findings? The answer is not as complicated as you may be imagining, but the answer is shocking: the culprits are not the publishers but the S&S authors, their institutions and their funders! The publishers are just businessmen trying to make a buck. […] Under mounting ‘open access’ pressure from S&S authors, institutional libraries, research funders and activists, the publishers made the obvious business decision: ‘You want open access for all users? Let the authors, their institutions or their research funders pay us for publication in advance, and you’ve got it!’

Harnad, the original (and wittiest) advocate for the “green” repository route, is basically right. It’s not just scholars, of course—we’re not free agents when it comes, say, to productivity metrics imposed by university managers. But the academic system as a whole (funders included) is responsible for letting the oligopolist publishers laugh, as Harnad has it, all the way to the bank.

‘He Wanted Privacy. His College Gave Him None’

I missed this great Markup piece when it was published last November. It tells the story of dorm-to-classroom surveillance through the lens of a California college student:

By the time Natividad went to bed that night, Google and Facebook had data about which Mt. SAC webpages he’d visited, and a company called Instructure had gathered information for his professors about how much time he’d spent looking at readings for his classes and whether he had read messages about his courses. Campus police and a company called T2 Systems potentially had information about what kind of car he was driving and where he parked. And as he drifted off to sleep, Natividad had to contend with the worry that, later this semester, his professors could subject him to the facial detection software incorporated into the remote proctoring tools used at Mt. SAC.

The Markup story touches on textbook surveillance:

This semester, one of Natividad’s professors assigned a digital textbook through Cengage, a publishing company turned ed tech behemoth. […] According to Cengage’s online privacy policy, the company collects information about a student’s internet network and the device they use to access online textbooks as well as webpages viewed, links clicked, keystrokes typed, and movement of their mouse on the screen, among other things. The company then shares some of that data with third parties for targeted advertising. For students who sign into Cengage websites with their social media accounts, the company collects additional information about them and their entire social networks.

The Markup story might have added: When a student turns to research a term paper, they’re also being tracked there. Surveillance publishers like Elsevier harvest a shocking amount of data through their article-delivery platforms. Your journals, to paraphrase Sarah Lamdan, are spying on you.

‘Thomson Reuters announces expanded vision to provide GenAI assistant for every professional it serves’

The information conglomerate Thomson Reuters, in a press release announcing an “expanded vision” for its “professional-grade GenAI assistant”:

CoCounsel is an AI assistant that acts like a team member – handling complex tasks with natural language understanding. Completing tasks at superhuman speeds, CoCounsel provides high-quality information at the right time, maintains multiple threads of work, as well as keeping context and memory across the different tasks and products customers use each day. By augmenting professional work with GenAI skills, CoCounsel delivers accelerated and streamlined workflows, enables professionals to produce higher-quality work more quickly, all while keeping customer data secure.1

The CoCounsel name is, it seems, a nod to Thomson Reuters’ Westlaw and other legal businesses—and a lazy riff on Microsoft’s Copilot. Either way, it’s another publishing-adjacent colossus picking up its pace in the race to re-monetize “content” through AI.


  1. Probably written by CoCounsel.

‘Academic Life Is About Humiliation and Envy. This Novel Gets It.’

My short piece in the Chronicle Review (paywalled, alas, but here’s a PDF), on C.P. Snow’s The Masters (1951):

What Snow captures is the outsize role pride plays in faculty life. We are, nearly all of us, vulnerable like this — a single snub is enough. We live in a hothouse of peer esteem, poised for humiliation, our dignity always in question. Snow shows this — or, rather, he tells it, through paragraphs of psychological portraiture. It’s this tell-not-show realism that struck Leavis as ponderous and cringeworthy. But Leavis is wrong: What’s best about The Masters is its sharply observed phenomenology. This is how the book transcends the cloistered male world of an unnamed Cambridge college in the late 1930s — why it feels fresh, even contemporary.

The piece—whose headline should be “Academic Life Is About Injured Pride”—is really a footnote to Vivian Gornick’s brilliant 2021 Harper’s essay “‘Put on the Diamonds’: Notes on Humiliation.”

Jeff Pooley is affiliated professor of media & communication at Muhlenberg College, lecturer at the Annenberg School for Communication at the University of Pennsylvania, and director of mediastudies.press.
