The new technology permits us to do exciting things with tracking software. Wave of the future, Dude. 100% electronic.
Have you heard? There’s a war on. The target? IP-based, proxy-enabled, authenticated access to the commercially-owned scholarly literature. The combatants? The largest and most powerful scholarly publishers vs. the librarians and our user base.
As reported by Times Higher Education and Coda, The Scholarly Networks Security Initiative (SNSI) sponsored a webinar in which Corey Roach, CISO for University of Utah, floated the idea of installing a plug-in to library proxy servers, or a subsidized low-cost proxy, for additional data collection. (To be clear he did not advocate for sharing this information with publishers, only that it be collected and retained by libraries for user behavior analysis.) Examples of the data collected in library logs (as distinguished from publisher logs) via the proposal are:
- extensive browser information
- account information
- customer IP
- URLs requested
- 2-factor device information
- geographic location
- user behavior
- biometric data
- threat correlation
I question whether such rich personally identifiable information (PII) is required to prevent illicit account access. If it is collected at all, there are more than enough data points here (even setting aside the obviously identifying username and account information) to deanonymize individuals and reveal exactly what they looked at and when, so it should not be retained for long for later analysis.
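To make the retention point concrete, here is a minimal sketch of how a library might minimize and expire its own proxy log records. It is not drawn from the SNSI proposal or from any proxy product: the field names (`username`, `timestamp`, `host`), the 30-day window, and the secret key are all assumptions chosen for illustration.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# Both values are assumptions for illustration, not recommendations.
RETENTION = timedelta(days=30)
SECRET_KEY = b"replace-and-rotate-this-key"


def pseudonymize(username):
    """Replace a username with a keyed hash so repeated activity by one
    account can still be correlated without storing the name itself."""
    digest = hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def minimize(record):
    """Keep only the fields assumed necessary to spot account abuse.
    Everything else in the record (full URL, IP, browser details) is dropped."""
    return {
        "user": pseudonymize(record["username"]),
        "timestamp": record["timestamp"],
        "host": record["host"],  # the vendor host only, not the page requested
    }


def prune(records, now):
    """Discard minimized records older than the retention window."""
    return [r for r in records if now - r["timestamp"] <= RETENTION]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    raw = {
        "username": "jdoe",
        "timestamp": now - timedelta(days=2),
        "host": "vendor.example.org",
        "url": "https://vendor.example.org/article/123",
        "ip": "203.0.113.5",
    }
    print(prune([minimize(raw)], now))
```

A keyed hash is pseudonymization rather than anonymization: anyone holding the key can link records back to accounts, which is one more reason to keep the retention window short.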
Another related, though separate, endeavor is GetFTR, which aims to bypass proxies (and thereby potential library oversight of use) entirely. There is so much that could be written about both these efforts, and this post only scratches the surface of some of the complex issues and relationships affected by them.
The first thing I was curious about was who is bankrolling these efforts. They list the backers on their websites, but I always find it interesting to see who is willing to fund the coders and infrastructure. I looked up both GetFTR and SNSI in the IRS Tax Exempt database as well as the EU Find a Company portal and did not find any results. So I decided to do a little more digging, matching WHOIS data in the hopes that something might pop out. Nothing interesting came of this, so I put it at the very bottom.
A simple matrix can help visualize the main players behind the efforts to ‘improve’ authentication and security.
| |Part of SNSI: Yes|Part of SNSI: No|
|---|---|---|
|Part of GetFTR: Yes|American Chemical Society Publications (ACS); Taylor & Francis|Researcher (Blenheim Chalcot)|
|Part of GetFTR: No|American Institute of Physics (AIP); American Medical Association (AMA); American Physical Society (APS); American Society of Mechanical Engineers (ASME); Cambridge University Press (CUP); Institute of Electrical and Electronics Engineers (IEEE); Institute of Physics (IOP); International Association of Scientific, Technical and Medical Publishers (STM); International Water Association Publishing (IWA); The Optical Society (OSA)|Any/all other publishers and supporting firms.|

Accurate as of 2020-11-16.
It should come as no surprise that Elsevier, Springer Nature, ACS, and Wiley, which previous research has shown are the publishers producing the most research downloaded in the USA from Sci-Hub, are supporting both efforts. Taylor & Francis presumably feels sufficiently threatened that they are along for the ride.
I think it is important to conceptually separate GetFTR from the obviously problematic snooping proposed by SNSI. It would be theoretically possible for GetFTR to dramatically improve the user experience while not resulting in additional data collection about users.
But… as Philipp Zumstein pointed out on Twitter, there are already some ways to improve the linking “problems” and user experience that GetFTR is working on. The fact that they are instead building something new gives them opportunities to control and collect data on usage patterns and users.
Given the corporate players involved here, I am not optimistic. However, I can also see large gains in usability if GetFTR works as advertised. In an ideal world, the usability/privacy tradeoff would be minimal; but as we are reminded on a daily basis, Dr. Pangloss was not a reliable guide.
For now, I have “registered my interest” with the group and am waiting to see how things are fleshed out.
A few years ago O’Reilly introduced a new type of authentication based on user email, and they tried to default to it as part of a platform migration. We informed them that we wanted to continue using EZproxy, which they still support but have made very cumbersome. As it currently stands, our users are presented with a different login experience depending upon how they enter the platform. While O’Reilly representatives have not been unresponsive, they clearly want users to authenticate with their “patron validation” method, which collects user emails, rather than the shared-secret/proxy method, which is technically supported but only triggered when users enter from our alphabetical database list.
Fitch Solutions ended support for proxy access. However, we achieved a slight simplification of the login experience for users that still satisfied our policy obligations, through a back-and-forth conversation about the user variables we as IdP would release to the vendor. It was not a pleasant experience but a tolerable one. If vendors recognize and work with university SSO systems but do not require PII, improvements to user workflow and access are possible. To be clear, what O’Reilly and Fitch have done by moving away from IP access is not GetFTR, which is still in pilot phase.
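The idea of an IdP releasing only non-identifying variables to a vendor can be sketched as a simple allow-list filter. This is an illustration only, not how any particular SSO product is configured: the attribute names (`pairwise_id`, `entitlement`, and the rest) and the function `release_attributes` are hypothetical.

```python
# Hypothetical allow-list: an opaque per-vendor identifier and a statement
# that the user is entitled to library resources. Nothing here is PII.
ALLOWED_ATTRIBUTES = frozenset({"pairwise_id", "entitlement"})


def release_attributes(user_attributes, allowed=ALLOWED_ATTRIBUTES):
    """Return only the attributes on the allow-list.

    Anything not explicitly allowed (name, email, department, and so on)
    is withheld by default rather than released by default.
    """
    return {
        name: value
        for name, value in user_attributes.items()
        if name in allowed
    }


if __name__ == "__main__":
    user = {
        "pairwise_id": "7f3a9c",          # opaque, differs per vendor
        "entitlement": "library-user",    # hypothetical value
        "email": "student@example.edu",
        "display_name": "A. Student",
        "department": "Chemistry",
    }
    print(release_attributes(user))
```

The design choice worth noting is the default: with an allow-list, a new attribute added to the directory later stays private until someone decides to release it.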
How might librarians push back against (likely) excessive data collection by SNSI or GetFTR-using platforms? I can think of three tools at our disposal, though the discussion below is not meant to be exhaustive. I cover them in order of their possible strength/severity; in actuality, the textual support they provide for pushback against vendors will vary.
Might state laws have any clauses that could resist a publisher data grab? In California there are two relevant sections which cover library privacy: GOV § 6267 and GOV § 6254 (j). GOV § 6254 (j) is no help, as it specifically refers to “records kept for the purpose of identifying the borrower of items”, which is not the case with authentication data, as in most cases it is not being used to ‘borrow’ anything. GOV § 6267, however, could be interpreted in interesting ways. I reproduce the relevant clauses here with my own emphasis.
All patron use records of any library which is in whole or in part supported by public funds shall remain confidential and shall not be disclosed by a public agency, or private actor that maintains or stores patron use records on behalf of a public agency, to any person, local agency, or state agency except as follows:
(a) By a person acting within the scope of his or her duties within the administration of the library.
(b) By a person authorized, in writing, by the individual to whom the records pertain, to inspect the records.
(c) By order of the appropriate superior court.
As used in this section, the term “* * * patron use records” includes the following:
(1) Any written or electronic record, that is used to identify the patron, including, but not limited to, a patron’s name, address, telephone number, or e-mail address, that a library patron provides in order to become eligible to borrow or use books and other materials.
(2) Any written record or electronic transaction that identifies a patron’s borrowing information or use of library information resources, including, but not limited to, database search records, borrowing records, class records, and any other personally identifiable uses of library resources information requests, or inquiries.
This section shall not apply to statistical reports of patron use nor to records of fines collected by the library.
I am not well versed in the law, nor am I aware of any litigation involving § 6267, but it seems to me that a straightforward interpretation is that any personally identifiable information collected by vendors is subject to this law and thus must remain confidential. But confidential does not mean that vendors can’t use that PII for their own internal purposes, which is what some in the library community are worried about.
Ultimately, I did not invoke either of these sections in negotiations with O’Reilly or Fitch as there was a more clear and less legalistic option, university policy, detailed below.
The ALA has put together a page of the various state library legislative codes. http://www.ala.org/advocacy/privacy/statelaws Since these don’t change often, I assume it is up to date (unlike some other ALA pages) should any readers want to check how things might shake out in a non-California jurisdiction.
Also, I have yet to take a deep dive into the newly effective California Consumer Privacy Act of 2018, but perhaps that will be useful going forward. Unfortunately, most jurisdictions are not as proactive about privacy as California so they will have to avail themselves of the other tactics listed here.
CalState Long Beach, like virtually all universities - I should hope - has internal policies governing the release of information. Presently, the Information Classification Standard delineates three types of data: confidential, internal, and public.
There is a subheading in this section for Library Patron Information. I include it here in full.
Library database for faculty, staff, students and community borrowers which may contain:
- Home address
- Home phone
- Social Security Numbers
Note the word may. That might lead us to think that this would be a clause that could be liberally interpreted in negotiations with vendors but unfortunately the items explicitly listed as Public (below) make it clear that this section is about shielding employees’ personal and home information, not any data they might generate in the course of their remunerated activities as employees.
As a creature of the state, a lot of institutional information (which vendors no doubt would like to have and incorporate into their models of user behavior) is, and should be public, such as:
Internal information is where things get interesting.
There are a number of demographic characteristics/variables in this category which firms would love to hoover up and feed into whatever models they run on data about their users. Users might voluntarily disclose this information, e.g. by uploading a photograph of themselves to a profile on a vendor platform site. But the policy says this is information which must be protected. The implication is that this information is not in the Public category and that the University (and thus the library) should not routinely disclose it. Importantly, there is a subheading in this section for Library Circulation Information.
Information which links a library patron with a specific subject the patron has accessed or requested
That was (in my opinion) the crucial piece of documentation that I provided to the Fitch Solutions staff, which helped us carry the day and minimize data exchange and harvesting.
At present, we don’t have any in-house policies specific to authentication. Though I am open to such a move, my feeling is that the stronger play here for us is to continue to use the University policies (and applicable CA laws) in order to push back against overcollection of user data by publishers. A local library-specific policy is surely better than nothing in the absence of such a university policy, but when faculty in need of a specific resource such as a future SNSI-GetFTR-enabled ACS Digital Library come knocking, my suspicion is that some libraries will yield to the demands and implement the publisher’s preferred authentication mechanism. We can’t all be Jenica Rogers.
Or can we? A coordinated effort on the part of libraries around the world to draft and enact clear in-house policies that reject SNSI-supported spyware (or anything similar) might just work. The ACS-RELX-SpringNature-T&F-Wiley-leviathan does not conceal its views and aims; neither should we. They want to collect “information about them as a student or an employee” and change contract language in order to “ensure attribute release compliance.” They tremble at the threat that piracy poses and, as pointed out by Sam Popowich, are working to convince everyone that “security and vendor profits should trump user privacy.” The stakes are high.
This post originally listed American Society of Clinical Oncology (ASCO) as a supporter of GetFTR. Angela Cochran, ASCO’s VP of Publishing, who served on the GetFTR advisory board has clarified this via correspondence with me. The ASCO is not a participating partner in GetFTR. I regret the error.
Here are the relevant WHOIS data for each site. Both use the privacy options their hosting providers offer to not reveal important information. In the end, comparison of WHOIS data did not reveal anything interesting.
Registrant Org Domain Proxy Service. LCN.com Limited
Registrant Country gb
Registrar Register SPA
IANA ID: 168
Whois Server: whois.register.it
Registrar Status clientDeleteProhibited, clientTransferProhibited, clientUpdateProhibited
Dates 245 days old
Created on 2020-03-16
Expires on 2021-03-16
Updated on 2020-09-23
Name Servers BRAD.NS.CLOUDFLARE.COM (has 18,590,729 domains)
PAM.NS.CLOUDFLARE.COM (has 18,590,729 domains)
Tech Contact —
IP Address 184.108.40.206 - 73 other sites hosted on this server
IP Location United States Of America - California - San Francisco - Cloudflare Inc.
ASN United States Of America AS13335 CLOUDFLARENET, US (registered Jul 14, 2010)
Domain Status Registered And Active Website
IP History 14 changes on 14 unique IP addresses over 15 years
Hosting History 9 changes on 6 unique name servers over 7 years
Website Title 500 SSL negotiation failed:
Response Code 500
Domain Name: SNSI.INFO
Registry Domain ID: D503300001183540550-LRMS
Registrar WHOIS Server: whois.register.it
Registrar URL: http://we.register.it/
Updated Date: 2020-09-23T07:14:11Z
Creation Date: 2020-03-16T12:23:58Z
Registry Expiry Date: 2021-03-16T12:23:58Z
Registrar Registration Expiration Date:
Registrar: Register SPA
Registrar IANA ID: 168
Registrar Abuse Contact Email:
Registrar Abuse Contact Phone: +39.5520021555
Reseller:
Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
Registrant Organization: Domain Proxy Service. LCN.com Limited
Registrant State/Province: Worcestershire
Registrant Country: GB
Name Server: BRAD.NS.CLOUDFLARE.COM
Name Server: PAM.NS.CLOUDFLARE.COM
DNSSEC: unsigned
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
The Registrar of Record identified in this output may have an RDDS service that can be queried
for additional information on how to contact the Registrant, Admin, or Tech contact of the
queried domain name.
Registrant On behalf of getfulltextresearch.com owner
Registrant Org Whois Privacy Service
Registrant Country us
Registrar Amazon Registrar, Inc.
IANA ID: 468
Whois Server: whois.registrar.amazon.com
Registrar Status ok, renewPeriod
Dates 446 days old
Created on 2019-08-28
Expires on 2021-08-28
Updated on 2020-07-24
Name Servers NS-1001.AWSDNS-61.NET (has 39,473 domains)
NS-1171.AWSDNS-18.ORG (has 35,686 domains)
NS-143.AWSDNS-17.COM (has 17,174 domains)
NS-1760.AWSDNS-28.CO.UK (has 304 domains)
Tech Contact On behalf of getfulltextresearch.com technical contact
Whois Privacy Service
P.O. Box 81226,
Seattle, WA, 98108-1226, us
(p) 12065771368
IP Address 220.127.116.11 - 13 other sites hosted on this server
IP Location United Kingdom Of Great Britain And Northern Ireland - England - London - Google Llc
ASN United Kingdom Of Great Britain And Northern Ireland AS15169 GOOGLE, US (registered Mar 30, 2000)
Domain Status Registered And Active Website
IP History 1 change on 1 unique IP addresses over 1 years
Registrar History 1 registrar
Hosting History 2 changes on 3 unique name servers over 1 year
Website Title 500 SSL negotiation failed:
Response Code 500
Domain Name: getfulltextresearch.com
Registry Domain ID: 2427634873_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.registrar.amazon.com
Registrar URL: https://registrar.amazon.com
Updated Date: 2020-07-24T22:01:02.974Z
Creation Date: 2019-08-28T12:53:20Z
Registrar Registration Expiration Date: 2021-08-28T12:53:20Z
Registrar: Amazon Registrar, Inc.
Registrar IANA ID: 468
Registrar Abuse Contact Email:
Registrar Abuse Contact Phone: +1.2067406200
Reseller:
Domain Status: renewPeriod https://icann.org/epp#renewPeriod
Domain Status: ok https://icann.org/epp#ok
Registry Registrant ID:
Registrant Name: On behalf of getfulltextresearch.com owner
Registrant Organization: Whois Privacy Service
Registrant Street: P.O. Box 81226
Registrant City: Seattle
Registrant State/Province: WA
Registrant Postal Code: 98108-1226
Registrant Country: US
Registrant Phone: +1.2065771368
Registrant Phone Ext:
Registrant Fax:
Registrant Fax Ext:
Registrant Email: .whoisprivacyservice.org
Registry Admin ID:
Admin Name: On behalf of getfulltextresearch.com administrative contact
Admin Organization: Whois Privacy Service
Admin Street: P.O. Box 81226
Admin City: Seattle
Admin State/Province: WA
Admin Postal Code: 98108-1226
Admin Country: US
Admin Phone: +1.2065771368
Admin Phone Ext:
Admin Fax:
Admin Fax Ext:
Admin Email: .whoisprivacyservice.org
Registry Tech ID:
Tech Name: On behalf of getfulltextresearch.com technical contact
Tech Organization: Whois Privacy Service
Tech Street: P.O. Box 81226
Tech City: Seattle
Tech State/Province: WA
Tech Postal Code: 98108-1226
Tech Country: US
Tech Phone: +1.2065771368
Tech Phone Ext:
Tech Fax:
Tech Fax Ext:
Tech Email: .whoisprivacyservice.org
Name Server: ns-1001.awsdns-61.net
Name Server: ns-1171.awsdns-18.org
Name Server: ns-143.awsdns-17.com
Name Server: ns-1760.awsdns-28.co.uk
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
For more information on Whois status codes, please visit
Notably, the main corporate players themselves do not use privacy services for their domains.
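For anyone who wants to repeat the comparison, the matching of WHOIS records described above can be roughly sketched as follows. This is not the procedure actually used for this post; it assumes each record has been saved as plain text with one `Key: Value` field per line, and the function names are made up for the example. The two sample records are abbreviated from the data above.

```python
def parse_whois(text):
    """Parse 'Key: Value' lines into a dict mapping each key to the set of
    values seen for it (lowercased). Fields with empty values are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip(), set()).add(value.strip().lower())
    return fields


def shared_values(a, b):
    """Return the fields where two parsed records have a value in common."""
    common = {}
    for key in a.keys() & b.keys():
        overlap = a[key] & b[key]
        if overlap:
            common[key] = overlap
    return common


SNSI = """\
Registrar: Register SPA
Registrant Country: GB
Name Server: BRAD.NS.CLOUDFLARE.COM
DNSSEC: unsigned
"""

GETFTR = """\
Registrar: Amazon Registrar, Inc.
Registrant Country: US
Name Server: ns-1001.awsdns-61.net
DNSSEC: unsigned
"""

if __name__ == "__main__":
    print(shared_values(parse_whois(SNSI), parse_whois(GETFTR)))
```

On these abbreviated records the only overlap is the unsigned DNSSEC field, which is consistent with the comparison turning up nothing of interest.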