5.5 Content accessibility

5.5.1 Suggested training topics

Content provisioning
1. XML: JATS, ONIX, TEI
2. PDF
3. HTML: website features and SEO
4. Citations in metadata and content data
5. Research data
6. Version control
Content harvesting and indexing
1. OAI-PMH protocol and infrastructure
2. Introduction to indexing services and their requirements
3. Website optimization for indexing
Content depositing and export
1. Repositories and journal hosting services
2. Depositing protocols: SWORD
3. Export format types: CSV (DC), JSON, XML (JATS, ONIX)
Content long-term archiving
1. Archiving services: CLOCKSS, Portico, PKP Preservation Network, PubMed Central, national / institutional services

5.5.2 Notes on the training topics

This block covers the extensive domain of content accessibility and is divided into four sections: content provisioning in human- and machine-readable formats, content harvesting and indexing, content depositing and export, and long-term archiving. The topics in this block partially mirror those defined for metadata (§5.2 “Metadata I”), but add some important new features relevant for the content side of the journals’ output.

The “Report on challenges and help measures faced by Open Access journals and platforms” (D3.2 (Laakso et al. 2024)) underlines the importance of content availability in machine-readable formats, which support text and data mining. The training topics on XML provisioning thus cover common standards such as JATS, ONIX and TEI. PDF is a thematic area of its own: alongside layout features (title, headers, footers, marginal notes and endnotes, margins, borders, etc.), it covers the optimization of PDF documents for findability online. Google Scholar, for example, provides the following recommendations for PDF articles on the web:

The full text of the paper should be in a PDF file that ends with ".pdf".
The title of the paper should appear in large font size on top of the first page.
The authors of the paper should be listed right below the title on a separate line.
There should be a bibliography section titled, e.g., "References" or "Bibliography" at the end.
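To illustrate why machine-readable XML supports text and data mining, the following minimal sketch reads a title and author list from a small JATS-like fragment using only the Python standard library. The sample fragment, the function name `extract_title_and_authors` and the author names are illustrative assumptions; the fragment is hand-written for this example and is not a complete, valid JATS article.

```python
import xml.etree.ElementTree as ET

# Hand-written, JATS-like fragment for illustration only (not a complete article).
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def extract_title_and_authors(jats_xml: str) -> dict:
    """Return the article title and the list of author names (hypothetical helper)."""
    root = ET.fromstring(jats_xml)
    title = root.findtext(".//article-title", default="")
    authors = []
    # Only contributors marked as authors are collected.
    for contrib in root.iterfind(".//contrib[@contrib-type='author']"):
        given = contrib.findtext("name/given-names", default="")
        surname = contrib.findtext("name/surname", default="")
        authors.append(f"{given} {surname}".strip())
    return {"title": title, "authors": authors}


if __name__ == "__main__":
    print(extract_title_and_authors(JATS_SAMPLE))
```

The same information is much harder to recover reliably from a PDF, which is one reason the layout recommendations above matter for PDF-only content.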

The importance of journals’ website features should not be underestimated, both for article landing pages (always in HTML and generated by the publishing system) and for full-text articles in HTML (in many cases not generated by the system). The topic on HTML provisioning therefore covers website features such as the selection of a suitable domain name, matching keywords, and unique URLs for article landing pages and related research data. It also considers the optimisation of website features for search engines, such as placing each article and each abstract in a separate HTML file, configuring meta tags, structuring the robots.txt file, etc. Alignment with the COUNTER Code of Practice, alerting services, sharing on social networks, post-publication evaluation and commenting, support for multimedia and open peer review are additional elements that enhance the visibility of the resources on the web.
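As a rough illustration of meta tag configuration and robots.txt structuring, the sketch below generates bibliographic `<meta>` tags of the `citation_*` kind described in Google Scholar's inclusion guidelines and checks whether a landing page URL is crawlable under a given robots.txt. The function names, the example domain `journal.example.org` and the sample rules are assumptions made for this example; which tags a particular indexing service actually requires should be checked against that service's documentation.

```python
from html import escape
from urllib.robotparser import RobotFileParser


def scholar_meta_tags(title, authors, publication_date, pdf_url):
    """Build <meta> tags for the <head> of an article landing page (hypothetical helper)."""
    pairs = [("citation_title", title)]
    # One citation_author tag per author, in the order of the byline.
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", publication_date))
    pairs.append(("citation_pdf_url", pdf_url))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the text of a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(scholar_meta_tags(
        "An example article",
        ["Doe, Jane", "Roe, Richard"],
        "2024/01/15",
        "https://journal.example.org/article/1.pdf",
    ))
    # Sample rules: keep crawlers out of the admin area, allow everything else.
    robots = "User-agent: *\nDisallow: /admin/\n"
    print(is_crawlable(robots, "https://journal.example.org/article/view/1"))
```

A check like `is_crawlable` can be useful when a journal's articles do not appear in an index, since an overly broad `Disallow` rule is a possible cause.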

The next topic in the thematic area of “Content provisioning” deals with the correct structuring of articles’ references in XML, HTML and PDF. The citations should follow the Open Citations standard to be correctly deposited with Crossref. To be properly indexed, the references section should have a standard heading (e.g. “References” or “Bibliography”) in HTML and PDF output. Content production often includes accompanying research data, which is why this topic is also included in this block. The thematic area closes with version control, which is gaining importance with the spread of preprints and overlay journals.
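The requirement of a standard heading for the references section can be checked automatically for HTML output. The following is a minimal sketch using Python's standard `html.parser`; the class and function names and the set of accepted headings are assumptions for this example, and a production check would probably need to handle translated headings and nested markup more carefully.

```python
from html.parser import HTMLParser

# Headings treated as "standard" in this sketch (assumption, based on the examples above).
STANDARD_HEADINGS = {"references", "bibliography"}


class HeadingCollector(HTMLParser):
    """Collect the text of all h1-h6 headings in an HTML document."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # text parts of the heading currently being read

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._current is not None:
            self.headings.append("".join(self._current).strip())
            self._current = None


def has_standard_reference_heading(html_text: str) -> bool:
    """Return True if any heading is exactly 'References' or 'Bibliography'."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    return any(h.lower() in STANDARD_HEADINGS for h in collector.headings)


if __name__ == "__main__":
    page = "<h1>An example article</h1><p>Text.</p><h2>References</h2>"
    print(has_standard_reference_heading(page))
```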

The section on content harvesting and indexing starts by explaining the technical set-up of the OAI-PMH infrastructure and the core features of this protocol. It proceeds to an introduction of indexing services, such as BASE, CORE, OpenAIRE and Google Scholar, and their varying requirements. As these aggregators usually use repository registries (OpenDOAR, re3data) to obtain information on the data sources they intend to harvest, the procedure for registering in both registries, as well as directly with the indexing services, is explained.
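The core of an OAI-PMH harvest can be sketched in a few lines: a harvester sends a `ListRecords` request with a `metadataPrefix`, reads the returned records, and repeats the request with the `resumptionToken` until no token is returned. The example below builds such request URLs and parses an embedded sample response instead of contacting a server. The base URL `https://journal.example.org/oai`, the sample record and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses with unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened sample response, written by hand for this sketch.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:journal.example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


if __name__ == "__main__":
    print(list_records_url("https://journal.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

In a real harvest, the response text would come from an HTTP request to the URL built by `list_records_url`, and the loop would continue with the returned token until it is `None`.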

The section on content depositing and hosting services introduces repository software such as EPrints (eprints.org), Digital Commons (digitalcommons.bepress.com) and DSpace (dspace.org). The SWORD protocol, which enables the remote depositing of resources into repositories, is also covered there. Another topic in this section is content export in CSV, XML or JSON, structured according to DC, JATS, ONIX or any other common standard that enables data mining. The final training section is devoted to long-term archiving and preservation and presents services such as CLOCKSS, Portico, PKP Preservation Network and PubMed Central, and potentially any national or institutional service. Archiving functions as a backup in case the platforms where publishers store their books and journals cease to exist, or the publishers themselves go out of business.
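To make the export formats more concrete, the following sketch writes the same article record once as CSV with Dublin Core column names and once as JSON, using only the Python standard library. The selection of DC elements, the `dc:` column prefix, the `"; "` separator for repeated values and the sample record (including its DOI) are assumptions chosen for this example and do not represent a prescribed export profile.

```python
import csv
import io
import json

# Dublin Core elements exported in this sketch (illustrative selection).
DC_FIELDS = ["title", "creator", "date", "identifier", "rights"]


def to_dc_csv(records) -> str:
    """Serialise records as CSV with one column per DC element."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f"dc:{field}" for field in DC_FIELDS],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        row = {}
        for field in DC_FIELDS:
            value = record.get(field, "")
            # Repeated elements (e.g. several creators) are joined into one cell.
            if isinstance(value, list):
                value = "; ".join(value)
            row[f"dc:{field}"] = value
        writer.writerow(row)
    return buffer.getvalue()


def to_json(records) -> str:
    """Serialise the same records as JSON, keeping repeated values as lists."""
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    sample = [{
        "title": "An example article",
        "creator": ["Doe, Jane", "Roe, Richard"],
        "date": "2024",
        "identifier": "https://doi.org/10.1234/example",  # placeholder DOI
        "rights": "CC BY 4.0",
    }]
    print(to_dc_csv(sample))
    print(to_json(sample))
```

The JSON variant preserves repeated elements as lists, whereas the flat CSV layout has to join them into a single cell; this trade-off is worth keeping in mind when choosing an export format for data mining.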

5.5.3 Modules build-up: Content accessibility I: provisioning, harvesting, depositing and archiving

| Topic | Level | Audience | F-A-I-R |
| --- | --- | --- | --- |
| Content provisioning: XML | Advanced | Software developers, Technical professionals | F-A-I-R |
| Content provisioning: PDF | Advanced | Software developers, Technical professionals | F-A |
| Content provisioning: HTML | Foundational / Advanced | Editors, Reviewers, Researchers, Software developers, Technical professionals | F-A |
| Content provisioning: citations | Advanced | Software developers, Technical professionals | F-A |
| Content provisioning: research data | Advanced | Software developers, Technical professionals | F-A-R |
| Content provisioning: version control | Advanced | Software developers, Technical professionals | |
| Content harvesting and indexing: OAI-PMH | Advanced | Software developers, Technical professionals | F-A-I |
| Content harvesting and indexing: indexing services | Foundational | Editors, Reviewers, Researchers | F-A |
| Content harvesting and indexing: website optimization for indexing | Advanced | Software developers, Technical professionals | F-A |
| Content depositing: repositories and journal hosting services | Foundational | Editors, Reviewers, Researchers | F-A-R |
| Content depositing: SWORD protocol | Advanced | Software developers, Technical professionals | F-A-I |
| Content export | Advanced | Software developers, Technical professionals | F-A-I-R |
| Content long-term archiving | Advanced | Software developers, Technical professionals | F-A-R |

Table 14: Modules for the training block “Content accessibility”

5.5.4 Training materials

Existing training materials.

| Title | Creator | Comment |
| --- | --- | --- |
| Project JASPER: Open access journals must be preserved forever (https://doaj.org/preservation/#open-access-journals-must-be-preserved-forever) | DOAJ | The project website outlines the importance of the digital preservation of OADJs and how they can become part of the project. It gives readers valuable insights into the important topics in this field as well as practical advice. |
| OA Journals Toolkit: Journal and Article Indexing (https://www.oajournals-toolkit.org/indexing/journal-and-article-indexing) | DOAJ, OASPA | In this article of the OA Journals Toolkit the topic of indexing and choosing the right index is discussed concisely. |
| OA Journals Toolkit: Search engine optimisation and technical improvements (https://www.oajournals-toolkit.org/indexing/search-engine-optimisation-and-technical-improvements) | DOAJ, OASPA | SEO and other technical improvements for better indexation are summarized in this article of the OA Journals Toolkit. |

Table 15: Existing materials for the training block “Content accessibility I: provisioning, harvesting, depositing and archiving”

Training materials planned to be produced by CRAFT-OA

| Title | Created in context of | Institution |
| --- | --- | --- |
| Introducing principles of long-term archiving | WP3 / T3.3 | TIB |
| Accessibility requirements for online publications (based on the quality standards of AG Universitätsverlage) | WP3 / T3.3 | UGOE |
| Editorial requirements shaping technical functionalities incl. PID registration and versioning | WP3 / T3.3 | UGOE |
| Accessibility requirements for online publications | WP3 / T3.3 | DOAJ |
| Creating JATS XML from DOCX documents on HRČAK | WP3 / T3.3 | SRCE |
| Indexes knowledge base | WP5 / T5.1 | AMU |

Table 16: Potential materials for the training block “Content accessibility I: provisioning, harvesting, depositing and archiving”

