This version is published by LIBER at ZENODO, October 2020.
Publishers and suppliers of licensed online resources want to provide authorized users of institutions for higher education and research with access to their services in a controlled way. The commonly used access method based on IP address has limits when users want access from anywhere, on any device, at any time. Solutions based on federated authentication and Single Sign-On (SSO) are viable alternatives, as long as attention is paid to how these connections are configured. Libraries should protect the privacy of their users, who in turn should have control over their own privacy.
To make the configuration and management of federated authentication easier for both libraries and publishers, scholarly libraries from around the world have agreed on the following guidelines to control access to services based on licensed content.
This document aims to function as a reference for libraries and publishers who want to set up an SSO connection. Principle 4 is the core principle for this action. The library must choose whether to implement Principle 4.A or 4.B. This reference is intended to be beneficial for both libraries and publishers.
The two terms below are helpful for understanding the content of this document.
· Publishers are Service Providers (SP)
· Institutions/libraries are Identity Providers (IdP)
Please refer to the table of Terms and Definitions below.
SSO Implementation Principles
Principle 1: Legal Compliance
The configuration and solution have to be in line with data protection regulations, in particular the General Data Protection Regulation (EU GDPR).
Principle 2: Protocol
For access to services based on licensed content, it is recommended to use the SAML 2.0 protocol (or the more recent OIDC/OAuth2, if the IdPs involved are able to handle it) to connect and control access, alongside the option of access based on IP addresses.
Principle 3: Federation
eduGAIN has been established as a proper means to interfederate between identity federations, and thus enables service providers to greatly expand their user base. FIM4L encourages publishers to make use of eduGAIN.
Principle 4: Authentication
There are two recommended options for authentication attributes: Transitory Access (4.A) and Personalized Access (4.B). Both are defined by degree of privacy control. Transitory Access is the most private.
If the purpose of the service is to recognize returning users, so that it can present personalized features such as saved searches, profile-based reading recommendations, etc., then Personalized Access is recommended for providing these options to users.
4.A. Transitory Access – This access holds the highest level of privacy.
The publisher only requires a transient identifier: “privacy star”. During a session the user is identified by a transient identifier (NameID) containing a unique alphanumeric string for a certain Service Provider (SP). If the user logs in again, a new transient identifier is generated. This allows for maximum privacy. It does not allow the publisher to recognize a returning customer, so the publisher cannot tell which resources were downloaded by the same user across sessions. In exceptional cases, for example where misconduct is suspected, users could be identified if libraries (IdPs) have configured their systems to allow for a thorough investigation of log files, and if libraries are willing to carry out this investigation.
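The behaviour of a transient identifier can be illustrated with a minimal sketch. It assumes the IdP simply draws a fresh random value at each login; the function name `new_transient_id` and the 16-byte length are illustrative choices, not part of these guidelines or of any particular IdP product.

```python
import secrets


def new_transient_id() -> str:
    """Return a fresh, random identifier for one login session.

    Hypothetical helper: a real IdP generates this value internally.
    The leading underscore keeps the value from starting with a digit.
    """
    return "_" + secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


first_login = new_transient_id()
second_login = new_transient_id()
# Each login gets a new random value, so the SP has nothing to link the sessions with.
print(first_login != second_login)
```

Because nothing about the user enters the value, only the IdP's own log files could connect a given identifier back to a person.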
4.B. Personalized Access – Maintains a high level of privacy based on a pseudonym, and more user information and tracking can be added.
The publisher requires a persistent but targeted identifier: “personalisation and subject tracking possible”. A persistent identifier (ID) contains a unique alphanumeric string, like the transient one, identifying the user for a specific SP, but persisting over multiple sessions. The same ID is then used for the same user on every authentication. This is an option for services that need to recognize returning customers, for instance so that they can present the user's files, orders, etc. In SAML the Pairwise Subject Identifier is preferred over eduPersonTargetedID (deprecated) and SAML 2.0 persistent NameID.
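The properties of a persistent, targeted identifier (stable per user, different per SP) can be sketched as follows. This is only one possible approach, assuming the IdP derives the value with a keyed hash over its local user ID and the SP's entityID; real IdPs may instead store a random value per user and SP. The secret, the scope `example.edu`, the entityIDs and the function name are all hypothetical. The `value@scope` shape follows the general form of pairwise-id.

```python
import hashlib
import hmac

# Hypothetical values: a real IdP keeps its own secret and uses its own scope.
IDP_SECRET = b"replace-with-a-long-random-secret"
SCOPE = "example.edu"


def pairwise_id(local_user_id: str, sp_entity_id: str) -> str:
    """Derive a pseudonym that is stable per (user, SP) pair.

    The keyed hash hides the local user ID from the SP, and including the
    SP's entityID means two SPs receive unrelated values for the same user.
    """
    message = f"{local_user_id}\x00{sp_entity_id}".encode("utf-8")
    digest = hmac.new(IDP_SECRET, message, hashlib.sha256).hexdigest()
    return f"{digest[:32]}@{SCOPE}"


# Same user, same SP: identical value on every authentication.
a = pairwise_id("jdoe", "https://publisher-a.example.org/sp")
b = pairwise_id("jdoe", "https://publisher-a.example.org/sp")
# Same user, different SP: a different value, so SPs cannot correlate users.
c = pairwise_id("jdoe", "https://publisher-b.example.org/sp")
print(a == b, a == c)
```

Only the IdP, which holds the secret and the local user ID, can map such a value back to a patron.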
When opting for a persistent ID, consider the following:
· A persistent ID allows the library (not the publisher) to translate the ID to a patron in case of misconduct.
· It is possible to lock down access for a particular user in case of misconduct.
· A persistent ID (like the Pairwise Subject Identifier, pairwise-id) is sufficient for the SP to provide personalization features. Sometimes an SP requests more information, like a name and email address. Adding personal information such as name and email to enrich the user profile should be optional (not mandatory) for the user. Libraries/institutions are advised not to transfer that information during authentication, but to have the SP offer the user a profile page in its service, where users give consent and can voluntarily provide name, email or other information. Minimize the attribute set provided to the service during the authentication flow.
· Before a service that receives a persistent identifier creates a profile for the user, the service should ask the user's permission to store and process their personal data, for instance via a “personalize account” button, or at least inform the user with a message on data privacy. In no way should the permission request be mandatory or seemingly mandatory for the user; the user must be free to choose whether or not to have a personal profile.
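The advice to minimize the attribute set during the authentication flow can be sketched as a simple allow-list filter on the IdP side. The policy contents, the function name `minimize` and the sample record are illustrative assumptions; actual IdP software expresses attribute release policies in its own configuration format.

```python
# Hypothetical release policy: only these attributes leave the IdP.
RELEASED_ATTRIBUTES = {"pairwise-id", "eduPersonEntitlement"}


def minimize(attributes: dict) -> dict:
    """Keep only the attributes named in the release policy."""
    return {
        name: values
        for name, values in attributes.items()
        if name in RELEASED_ATTRIBUTES
    }


# Illustrative directory record; all values are made up.
user_record = {
    "pairwise-id": ["3f2a9c@example.edu"],
    "eduPersonEntitlement": ["urn:mace:dir:entitlement:common-lib-terms"],
    "displayName": ["Jane Doe"],
    "mail": ["jane.doe@example.edu"],
}

released = minimize(user_record)
print(sorted(released))  # ['eduPersonEntitlement', 'pairwise-id']
```

Name and email never reach the SP in this flow; a user who wants them in a profile can enter them on the SP's profile page.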
For both privacy preferences, the SP can require extra non-identifiable information. If more information is needed to allow for billing, access control, etc., identity providers can supply one or more of the following attributes (from most to least preferred):
· eduPersonEntitlement, with the specific value urn:mace:dir:entitlement:common-lib-terms
· eduPersonEntitlement, with other values, representing group or role memberships in alignment with AARC Guidelines on expressing group membership and role information
· schacLocalReportingCode, whose use is recommended for statistical purposes once it is well defined.
Any combination of extra attributes like these needs to be agreed upon between the SP and the federation or in bilateral agreements with the IdP.
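On the SP side, an access decision based on the preferred entitlement value needs no identifying information at all. The sketch below assumes the SP has already parsed the received attributes into a dictionary of lists; that structure and the function name are hypothetical.

```python
COMMON_LIB_TERMS = "urn:mace:dir:entitlement:common-lib-terms"


def may_access_licensed_content(attributes: dict) -> bool:
    """Return True if the IdP asserted the common-lib-terms entitlement."""
    return COMMON_LIB_TERMS in attributes.get("eduPersonEntitlement", [])


# Illustrative attribute sets as an SP might see them after login.
licensed_user = {"eduPersonEntitlement": [COMMON_LIB_TERMS]}
walk_in_without_entitlement = {}

print(may_access_licensed_content(licensed_user))                # True
print(may_access_licensed_content(walk_in_without_entitlement))  # False
```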
Principle 5: Personally Identifiable Information (PII)
SPs should not require attributes with personally identifiable information (PII). Some publishers state “I need an email address, as my software can’t function without it.” Publishers with (older) systems that require more attributes for authentication to function should adapt their systems as soon as possible. Libraries are advised to stop using, or not to start using, services that require more PII than a transient or persistent ID during authentication.
Principle 6: Consent
Apart from generally working according to the GDPR, when requesting information from users, for instance in a profile page, publishers have to adhere to the most recent EU “Guidelines on Consent” to make sure that free consent is given in compliance with the GDPR.
Principle 7: Data Processing Agreement
When providing PII to an SP, whether based on consent or not, a corresponding data processing agreement (DPA) may be needed.
Principle 8: GÉANT Compliance
Publishers are encouraged to declare compliance with the GÉANT Data Protection Code of Conduct.
Principle 9: REFEDS Compliance
Publishers are encouraged to declare compliance with the assertions of the REFEDS Sirtfi framework (Research and Education FEDerations group, Security Incident Response Trust Framework for Federated Identity).
Principle 10: Seamless Access
Publishers are encouraged to follow the guidelines from the SeamlessAccess.org coalition (formerly known as RA21).
Risks & Concerns
The privacy recommendations carry some risks, which we want to make explicit.
● Deanonymization: If you provide a targeted ID, as recommended in Principle 4, Part B above, you have to be aware that other data, already collected by the SP, could be linked to this ID.
● Apart from the fact that under the GDPR pseudonymous IDs (and even IP addresses) are PII, users would normally see a consent or information screen when accessing an SP for the first time, showing which attribute release policy the IdP has selected. If the SP wants to collect more information from the user, the SP needs to ask for consent via a registration form, as recommended in Principle 4.B.
Terms and Definitions
[1.] Regulation (EU) 2016/679 (General Data Protection Regulation) in the current version of the OJ L 119, 04.05.2016; cor. OJ L 127, 23.5.2018, https://gdpr-info.eu
[2.] This is in line with this argumentation.
[3.] E.g., “By connecting to this service, I agree that the service provider stores my person-related data (ID, affiliation, entitlements sent by my IdP, my IP address sent by my client, and my actions on this platform). Only if I want to receive emails from the service or if I want to be addressed by my name, I will add my email address and name respectively, but this is not needed for any other personalisation features like ‘point me to the last document and its last page I read’, ‘my last searches’, etc. Whenever I wish to do so, I may request to see and to have deleted all data stored about me.”
[4.] Please note that this attribute is not available in many federations and IdPs, so if the SP would like to receive that attribute, it will take specific communication between SP and IdP and possibly the federation.
[5.] Guidelines on Consent under Regulation 2016/679, https://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=623051
[6.] We know of and are tracking the Internet2 CAR initiative about consent for optional release of attributes.