As if on cue, Open Future releases a new brief calling for meaningful training data transparency:

Transparency of the data used to train AI models is a prerequisite for understanding how these models work. It is crucial for improving accountability in AI development and can strengthen people’s ability to exercise their fundamental rights. Yet, opacity in training data is often used to protect AI-developing companies from scrutiny and competition, affecting both copyright holders and anyone else trying to get a better understanding of how these models function.

The brief invokes core Mertonian norms of science in its argument, putting muscle behind Europe's AI Act:

The current situation highlights the need for a more robust and enabled ecosystem to study and investigate AI systems and critical components used to train them, such as data, and underscores the importance of policies that allow researchers the freedom to conduct scientific research. These policies must include a requirement that AI providers be transparent about the data used to train models […] as it will allow researchers to critically evaluate the implications and limitations of AI development, identify potential biases or discriminatory patterns in the data, and reduce the risk of harm to individuals and society by encouraging provider accountability.