The Scholarly Fingerprinting Industry

Note: This essay was recently published in Amerikastudien/American Studies, as part of a Forum on Digitization, Digital Humanities, and American Studies. The essay carries a CC BY-NC-ND 4.0 license.



Elsevier, Taylor & Francis, Springer Nature, Wiley, and SAGE: Many researchers know that the five giant firms publish most of the world’s scholarship. Fifty years of acquisitions and journal launches have yielded a stunningly profitable oligopoly, built up from academics’ unpaid writing-and-editing labor. Their business is a form of IP rentiership—collections of title-by-title prestige monopolies that, in the case of Nature or The Lancet, underwrite a stable of spinoff journals on the logic of the Hollywood franchise.

Less well-known is that Elsevier and its peers are layering a second business on top of their legacy publishing operations, fueled by data extraction. They are packaging researcher behavior, gleaned from their digital platforms, into prediction products, which they sell back to universities and other clients. Their raw material is scholars’ citations, abstracts, downloads, and reading habits, repurposed into dashboard services that, for example, track researcher productivity. Elsevier and the other oligopolist firms are fast becoming, in other words, surveillance publishers (Pooley). And they are using the windfall profits from their existing APC-and-subscription business to finance their moves into predictive analytics.

Elsevier is the farthest along. In 2015, its parent company RELX Group announced its “transformation” from publisher to a “technology, content and analytics-driven business,” adding that the firm is “systematically migrating all of our businesses towards electronic decision tools” (RELX Group, Annual Report 2014 5, 4). By then, Elsevier’s decade-long acquisition binge, up and down the research lifecycle, was already underway. In the past decade, it acquired Pure (2012), Mendeley (2013), Newsflo (2015), SSRN (2016), bepress (2017), Parity Computing (2019), and, in spring 2022, Interfolio, the “Faculty Information System” provider. Together with ScienceDirect, the firm’s web-based journal delivery platform, and Scopus, its citation index, Elsevier has assembled a portfolio of knowledge products that spans from lab software to research assessment. These are, in a sense, services with benefits: reference management from Mendeley and journal access from ScienceDirect both furnish scholars’ behavioral data back to Elsevier. The company then sells the processed data back to universities and other clients in the form of “research intelligence,” i.e., prediction products like SciVal and Pure that score researcher impact and productivity.

Elsevier, to borrow a computing phrase, has become a full-stack publisher. Its thousands of journals might be seen as data-delivery vehicles—in themselves and by way of trackable engagement. Though some of these researcher-facing services are costly indeed, the core dynamic is not unlike the surveillance businesses built by Google and Facebook (Zuboff). The key difference is that Elsevier gets to charge its customers twice, first through sky-high subscription-and-APC rates and, secondly, for the “decision tools” generated by the legacy business’s behavioral surplus (RELX Group, Annual Report 2021 5). As CUNY law professor Sarah Lamdan put it in a 2021 talk, “[y]our journals are spying on you” (Your Journals). Earlier this year, internet sleuths discovered that Elsevier had embedded a per-download tracker in its PDF metadata (Hansen). Psychologist Eiko Fried followed up with a GDPR data request, which yielded a spreadsheet haul of torrential size. The company, Fried revealed, is tracking article engagement at the granularity of specific image views. The precise ways that these and other data are mined, sorted, and processed into prediction products like SciVal are, of course, shrouded in proprietary secrecy. Elsevier touts what it calls its Fingerprint® Engine, which applies machine learning to its vast trove of researcher data (“signals”) to assign, for example, a list of weighted concepts to a particular researcher (Picadio). As the RELX Group boasts in its latest annual report, the company’s “research intelligence portfolio”—sold to university management, corporate R&D executives, funders, and policy-makers—now generates over a third of Elsevier’s revenue (Annual Report 2021 21, 23). The company states that it expects to improve on its 2021 profit margin which, at 38 percent, places Elsevier among the world’s most lucrative businesses.

The other publishing colossi are playing catch-up. Taylor & Francis, a unit of the UK-based intelligence conglomerate Informa Group, has been expanding its “knowledge services” through acquisitions like the Faculty of 1000 platform last year (Informa Group, Annual Report 2021 51–55). The division’s profit margin, at 37 percent, was just hairs off the Elsevier pace (51). Wiley, meanwhile, recently rolled out its journal platform Literatum, built by the software firm it acquired in 2016, Atypon. “Know thy reader,” reads the firm’s pitch. “Literatum’s analytics module tracks and combines publishing-specific content usage data with readers’ site behavior” (Atypon). Wiley’s margin last year was 35 percent (John Wiley & Sons 32). Springer Nature’s parent company, Holtzbrinck, for its part, owns its own full-stack research lifecycle offerings, including the Scopus competitor Dimensions, Pure competitor Symplectic, impact tracker Altmetric, and data repository figshare (Holtzbrinck).

Elsevier’s main competitor, tellingly, is Clarivate, a firm that began as the Institute for Scientific Information (ISI) in the late 1950s (Wouters). ISI’s founder, Eugene Garfield, helped establish the field of bibliometrics through the company’s Science Citation Index. In 2016, ISI was spun off as Clarivate in a $3.5 billion private equity deal, with Garfield’s citation index—renamed Web of Science—the new company’s crown jewel (Clarivate 5, 12–13). Sold to over 9,000 universities and other customers, Web of Science builds on what was, in Garfield’s citation graph, the original academic prediction product. What Clarivate is selling, after all, is bets on future scholarly productivity and impact. A key growth strategy, the company states, is “moving up the value chain by providing our customers with predictive and prescriptive analytics” (Clarivate 10). Late last year Clarivate—which reported an astonishing 42 percent profit margin—acquired ProQuest, the sprawling library vendor, for over $5 billion (Clarivate 9, 13). The data generated from ProQuest’s library products will almost certainly feed Clarivate’s own “research intelligence” offerings, Converis and InCites. If anything, Elsevier’s leg up on Clarivate has been its access to the rich behavioral surplus produced by its publishing business.

More acquisitions and inter-firm jockeying will proceed at the pace of Wall Street. What is fast emerging is a small band of vertically integrated knowledge brokers, most of them, in Björn Brembs’s phrase, “corporations formerly known as publishers” (“Off to Paris”). Elsevier and its peers, indeed, have used their enormous publishing profits to finance their full-stack acquisitions. In that respect, surveillance publishing is an insult-to-injury story. Scholars justly complain about the insanely lucrative scholarly publishing industry, whose subscription and APC windfalls are made off their unpaid labor. Now Wiley and the others are extracting a second rent, without the consent or notice of scholars.

Most scholars, after all, have no idea that their behavioral cream is getting skimmed for profit. If widely exposed, these next-level predations could build momentum for a nonprofit, academy-led alternative to the oligopolists. As historian Aileen Fyfe has chronicled, the current joint-custody arrangement—nonprofit universities and for-profit publishers—is a recent and reversible development. A community-owned infrastructure is, with slow care, getting built out, with the aim of supporting new and established scholar-led publishing initiatives. Another scholarly communication world really is possible. We need, however, researcher buy-in in light of predictable—if short-run—prestige penalties; funders and librarians, too, must be shaken from their APC-and-subscription slumbers. The emerging surveillance publishing economy, in that respect, is an opportunity of sorts. A range of scholar-critics, including Renke Siems, George Chen, Leslie Chan, Björn Brembs (“Algorithmic Employment”), and Sarah Lamdan (Data Cartels), have begun to sound the alarm. Our task is to amplify their accounts—to spread the word about surveillance profits—in support of the campaign to restore custody over scholarly publishing.




Works Cited

Atypon. “Analytics.” Atypon, n. d. Web. 20 Aug. 2022. https://www.atypon.com/products/literatum/analytics/.

Brembs, Björn. “Algorithmic Employment Decisions in Academia?” björn.brembs.blog. Björn Brembs, 23 Sept. 2021. Web. 12 Sept. 2022. http://bjoern.brembs.net/2021/09/algorithmic-employment-decisions-in-academia/.

—. “Off to Paris for #FENS2022 with Two Posters.” björn.brembs.blog. Björn Brembs, 8 July 2022. Web. 12 Sept. 2022. http://bjoern.brembs.net/2022/07/off-to-paris-for-fens2022-with-two-posters/.

Chen, George, and Leslie Chan. “University Rankings and Governance by Metrics and Algorithms.” Research Handbook on University Rankings. Ed. Ellen Hazelkorn and Georgiana Mihut. Cheltenham: Edward Elgar, 2021. 425-43. Print.

Clarivate. “Form 10-K.” 1-153. Web. 12 Sept. 2022. https://s25.q4cdn.com/843006813/files/doc_downloads/2022/05/2021_12-Clarivate-Plc-FSs-DOC-10K-(32).pdf

Elsevier. “Elsevier Fingerprint Engine.” Elsevier. Elsevier, n. d. Web. 12 Sept. 2021. https://www.elsevier.com/solutions/elsevier-fingerprint-engine.

Fried, Eiko. “Welcome to Hotel Elsevier: You Can Check-Out Any Time You Like … Not.” Eiko-fried.com. Eiko Fried, 9 May 2022. Web. 12 Sept. 2022. https://eiko-fried.com/welcome-to-hotel-elsevier-you-can-check-out-any-time-you-like-not/.

Fyfe, Aileen. “Self-Help for Learned Journals: Scientific Societies and the Commerce of Publishing in the 1950s.” History of Science 60.2 (2022): 255-79. Web. 15 Dec. 2022. https://doi.org/10.1177/0073275321999901.

Hansen, Morten. “Building Education Assets, One Crumb at a Time.” The Post-Pandemic University 20 Mar. 2022. Web. 12 Sept. 2022. https://postpandemicuniversity.net/2022/03/20/building-education-assets-one-crumb-at-a-time/.

Holtzbrinck Publishing Group. “About Us.” Holtzbrinck Publishing Group. Georg von Holtzbrinck GmbH & Co., n. d. Web. 12 Sept. 2022. https://www.holtzbrinck.com/.

Informa Group. Annual Report 2021: Digital & Data Acceleration. London: Informa Group, 2022. Web. 12 Sept. 2022. https://www.informa.com/globalassets/documents/investor-relations/2022/informa-annual-report-2021.pdf.

John Wiley & Sons. “Form 10-K.” 2022, 1-111. Web. 12 Sept. 2022. https://s27.q4cdn.com/812717746/files/doc_financials/2022/q4/Wiley-10K-Annual-Report.pdf.

Lamdan, Sarah. Data Cartels: The Companies That Control and Monopolize Our Information. Stanford, CA: Stanford UP, 2022. Print.

—. “Your Journals Are Spying on You: Research Surveillance in Library Products.” Videotaped Presentation, Indiana University Bloomington Libraries, 22 Oct. 2021. Web. 12 Sept. 2022. https://media.dlib.indiana.edu/media_objects/76537m18z.

Picadio, Doug. “Fingerprinting: What Is It, and How Can I Use It.” Presentation. Pure International Conference, Barcelona, 10 Oct. 2017. Web. 12 Sept. 2022. https://www.elsevier.com/__data/assets/pdf_file/0004/525613/Day1_Sala3_11_50_D_Picadio.pdf.

Pooley, Jefferson. “Surveillance Publishing.” The Journal of Electronic Publishing 25.1 (2022): 39-49. Web. 15 Dec. 2022. https://doi.org/10.3998/jep.1874.

RELX Group. “Annual Report and Financial Statements 2014.” London: RELX Group, 2015. Web. 3 Oct. 2022. https://www.relx.com/~/media/Files/R/RELX-Group/documents/reports/annual-reports/2014-annual-report.pdf.

—. “Annual Report and Financial Statements 2021.” London: RELX Group, 2022. Web. 12 Sept. 2022. https://www.relx.com/~/media/Files/R/RELX-Group/documents/reports/annual-reports/relx-2021-annual-report.pdf.

Siems, Renke. “When Your Journal Reads You: User Tracking on Science Publisher Platforms.” Elephant in the Lab. Zenodo, 14 Apr. 2021. Web. 12 Sept. 2022. https://zenodo.org/record/4683778#.Y1A0xi8RpQI.

Wouters, Paul. “Eugene Garfield (1925–2017).” Nature 543 (2017): 492. Web. 12 Sept. 2022. https://www.nature.com/articles/543492a.

Zuboff, Shoshana. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: Public Affairs, 2019. Print.

‘University of Texas System and Coursera Launch the Most Comprehensive Industry Micro-Credential Program Offered by a U.S. University System’

Jeff Maggioncalda, CEO of Silicon Valley for-profit Coursera, on the Coursera blog:

The job market is changing rapidly, and to meet new employer and student demands, universities must also evolve. Today, I’m excited to announce that Coursera and the University of Texas System (UT) have launched a new industry micro-credential program with a goal to prepare every UT campus student, faculty, staff, and alumni for the state’s workforce demands, at no cost to them.

Adds Maggioncalda:

This innovative new program shows where the future of higher education is headed. 

The post is full of innovate-or-die braggadocio. The rhetorical cocktail of hype and “must evolve” necessity is, here as elsewhere, in the service of corporate capture of the nonprofit university tradition. It’s depressing to read UT describe the deal—an embarrassing surrender-cum-outsourcing of its core educational mission—in the same breathless key.

‘American Sociological Association, in absentia but not silent on open science’

Philip Cohen, on his blog, addressing the American Sociological Association’s (ASA) shameful obstructionism on open access:

Alondra Nelson has had a storied career in American social science. After joining the Yale sociology faculty in 2009, she wrote, among many other works, two crucial books: Body and Soul: The Black Panther Party and the Fight Against Medical Discrimination (2013), and The Social Life of DNA: Race, Reparations, and Reconciliation after the Genome (2016). After moving to Columbia, she became Dean of Social Science in 2014, and then, in 2017, President of the Social Science Research Council.

And:

Needless to say, ASA was delighted to report it when, in 2021, she was named by President Biden to be Principal Deputy Director of the Office of Science and Technology Policy (OSTP) for Science and Society. … Then, in 2022, she was named acting head of OSTP, “the first African American and first woman of color to lead US science and technology policy.” At which point — ASA said nothing. … What happened? Long story short: ASA is fundamentally, strongly, consistently, organizationally, opposed to the crowning achievement of Nelson’s work at OSTP, known around the world as the “Nelson Memo.” Its subject: “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research.” Which is exactly what ASA does not want.

The ASA was a signatory to the notorious and jingoistic 2019 “Dear President Trump” letter, with silence since.

As Cohen concludes:

The organization is a perpetual stagnation machine addicted to a toxic diet of publishing rents…

The key issue, at the ASA and some (but certainly not all) learned societies, is dependence on tolled publishing revenue. It’s a hard nut to crack, without resorting to APCs, but there’s lots of interesting experimentation going on, including subscribe-to-open.

MIT’s New Full-Book PDF Download Button

Speaking of the MIT Press, sometime in mid-April the press’s OA books began including a full-book, single-button download.[1] Finally!

[Image: a screenshot of an MIT Press online book, with a full-book PDF button]

As I and others have complained, the chapter-by-chapter download mode used by JSTOR, Project MUSE, and a number of OA publishers (MIT too, until recently) is a download-and-concatenate nightmare. It’s also baffling: Beyond edited collections, who wants just a single chapter? I always wondered if the chapter approach was publisher-driven sand-in-the-download-gears, to make OA access inconvenient enough to drive sales. Who knows. Either way, a big win for the MIT Press.


  1. The last book I could find without the Book PDF button was published April 18, 2023. 

‘The Corporate Capture of Open-Access Publishing’

An excellent Chronicle piece [paywalled, alas] from Sarah Kember (Goldsmiths Press) and Amy Brand (the MIT Press), on the slate of well-intentioned OA policies from the U.S., Europe, and Britain:

As the heads of progressive university presses on two sides of the North Atlantic, we support open and equitable access to knowledge. If history is any guide, however, the new policies may unintentionally contribute to greater consolidation in academic publishing — and encourage commercial publishers to value quantity over quality and platforms over people. Unless the new open-access policies are accompanied by direct investment from funders, governments, and universities in nonprofit publishers and publishing infrastructure, they could pose a threat to smaller scholarly and scientific societies and university presses, and ultimately to trust in published knowledge.

The commentary includes sharp takedowns of read-and-publish deals, as well as commercial-publisher data hoovering.

If I have a critique, it’s that the authors are vague about whether “truly public knowledge” should or must be open. They imply as much, and suggest direct (or collective) funding along the lines of MIT’s Direct-to-Open, with a nod to “state-owned, noncommercial platforms” (Europe!). Still, it would be possible to read the piece’s incisive critique of corporate OA as a warning against the “false promise of ‘openness’” tout court.

I suspect the ambiguity is a result, in part, of the very challenging OA economics of university presses—especially those, unlike Kember’s Goldsmiths, built on legacy, print-based models. Though a small number of legacy presses—MIT and Michigan, for example—are leveraging direct funding (with back-catalogue access as a carrot) to open up new books, most other U.S. university presses can’t—not with their cost structure—easily publish OA monographs without a large, author-excluding book processing charge (BPC). It’s telling that BPCs aren’t mentioned in the piece, even as Kember and Brand (rightly) call out Springer Nature et al. for their usurious APCs.

They’re right, to wrap the point, that the nonprofit university press sector is an indispensable part of any future community-led publishing infrastructure. Yes. Still, the UP world will need to drop the BPC route, and turn instead to direct funding from libraries, host universities, and other funders.

‘What’s the point of having open scholarly infrastructures and how do we test their resilience?’

Martin Eve:

For me, the fundamental meta-principle, or ideal, that underpins POSI (the Principles of Open Scholarly Infrastructure) is forkability and persistence. Taken on aggregate and implemented, an organization that signs up for POSI should be duplicable. That is: I should be able, as a reasonably technically competent individual, to acquire all the components of a POSI-posse signatory, and rebuild/resurrect their technical architecture.

Adds Eve:

Certainly, this can be a scary proposition to those unschooled in thinking this way. Might not other organizations just usurp us if we do this? What’s to stop someone else just stepping in and re-selling all of our data?

Forkability and persistence, for sure. But why not foreclose some of the nightmare scenarios with non-commercial licensing? Eve lists NC licenses as among the ways that an organization might skirt POSI principles without fulfilling their spirit:

Likewise, you might comply with the spirit of POSI by licensing your data openly, but under conditions that limit who could ever resurrect the project (e.g. CC-BY-ND, CC-BY-NC, or, even, CC-BY-SA – even though I am usually a fan of ShareAlike licenses).

I respectfully disagree. Indeed, a major flaw in the POSI principles is that they don’t make an explicit call-out to nonprofit status. Scholarly infrastructure shouldn’t just be open, but nonprofit too. The alternative is capture-by-acquisition.[1]


  1. Eve had his own OA monograph experience with CC BY profiteering.

Jeff Pooley is professor of media & communication at Muhlenberg College and director of mediastudies.press, an open access scholarly publisher.

pooley@muhlenberg.edu | press@mediastudies.press
