Research Output Repositories
Tomasz Miksa, TU Wien

Project number: 573700-EPP-1-2016-1-PS-EPPKA2-CBHE-JP
This project has been co-funded with support from the European Commission. The European Commission support for the production of this publication does not constitute endorsement of the contents which reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Research Output Management in PS Higher Education

Agenda
  What is a repository system?
  How to compare repository systems?
  What systems are out there?
  What can we learn from Portugal?
  (How to introduce a repository system?)
  This presentation is NOT about recommending you a specific system

What is a repository system?

Introduction
  Expectations evolved over time from digitization to preservation of e-Science experiments
  http://www.infotoday.com/cilmag/apr16/Uzwyshyn--Research-Data-Repositories.shtml
  https://phaidra.univie.ac.at
  https://phaidra.univie.ac.at/view/o:423816
  http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5993/
  https://data.ccca.ac.at/dataset/oks15-bias-corrected-euro-cordex-models-global-radiation/resource/88d350e9-5e91-4922-8d8c-8857553d5d2f?view_id=eeeb1e17-c707-46eb-bf24-dd5ed169f1c6

Repositories scope
  Specialised
    disciplinary data, e.g. DNA sequencing
  General
    covering large knowledge areas, e.g. social sciences
  Aggregate experts data
    globally
    locally
      university
      country

Architecture
  Conceptually like any web system, consisting of
    frontend
    backend
    Example: online shopping

How to compare repository systems?

Infrastructure
  Locally hosted solution
    own ICT infrastructure required
    IT staff required
      developers, system administrators
  Externally hosted solution
    outsourcing of infrastructure
    lack of control over where the data is
    can the external party be trusted?
  Open source or proprietary
    who owns the code?
    is it allowed to introduce changes?
  Community support
    number of similar instances
    forum and mailing lists
    professional support

Front-end Design
  Out of the box or development needed?
    Fedora Commons is just a backend
  Customisable?
    branding
  Multi-lingual support?
  Mobile-optimized design?

Content Organization
  Aggregations and Collections
    proceedings, department outputs, etc.
    help in navigating the repository
    faceted search
  Metadata
    what are the default standards?
    how easy is it to add another standard?
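
To make the metadata question more concrete, here is a minimal sketch of what a record in one common standard could look like. It assumes simple Dublin Core as the default standard (the slides do not name one), and the function name `to_dublin_core` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set. That a given repository uses
# Dublin Core by default is an assumption of this sketch.
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_dublin_core(record: dict) -> str:
    """Serialise a flat metadata record as simple Dublin Core XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for field, values in record.items():
        # Accept either a single value or a list of repeated values.
        if isinstance(values, str):
            values = [values]
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record for a conference paper.
sample = {
    "title": "Sample proceedings paper",
    "creator": ["A. Author", "B. Author"],
    "type": "Text",
}
xml_record = to_dublin_core(sample)
print(xml_record)
```

Adding another standard would, in a sketch like this, mean adding another serialiser next to `to_dublin_core`. How much work that is in a real system is exactly the question to ask of each candidate.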

Content Organization & Multimedia
  Single object may have many representations
  Content presentation
    PDF viewer
    Video streaming
    Image previews
    Audio playback
    Slideshows

Content Discovery
  Inside of the repository
    advanced search
    full text indexing
    graphical navigation
    geolocation
  Outside of the repository
    OAI-PMH
    Search engine optimization
    Google Scholar indexing
    DOI and persistent identifiers
  Social Features and Notifications
    Share, Bookmark, Comment, RSS

Publication tools
  Customisable submit forms
    specify required information for a submission
      metadata
      license
      etc.
  Publishing workflow
    roles
      editor, reviewer
    notifications
  Batch processing

Access Control & Authentication
  Access control
    IP ranges, user accounts, Access Control Lists
    Embargo periods
  Authentication
    Not an issue for open access
    Possible integrations to be considered
      LDAP
      CAS
      System accounts
      Shibboleth
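
The access control options above (IP ranges, user accounts, embargo periods) can be combined in a single check. The following is only a sketch under simplifying assumptions: the names `AccessPolicy` and `may_download` are hypothetical, the embargo is applied to everyone, and real systems typically delegate authentication to LDAP, CAS or Shibboleth rather than a plain set of user names.

```python
from dataclasses import dataclass, field
from datetime import date
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class AccessPolicy:
    """Hypothetical per-object policy; not taken from any specific system."""
    allowed_networks: list = field(default_factory=list)  # e.g. campus IP ranges
    allowed_users: set = field(default_factory=set)       # simple access control list
    embargo_until: Optional[date] = None                  # None means no embargo


def may_download(policy: AccessPolicy, client_ip: str,
                 user: Optional[str], today: date) -> bool:
    # Assumption: an active embargo blocks everyone, whoever they are.
    if policy.embargo_until is not None and today < policy.embargo_until:
        return False
    # No restrictions configured: the object is open access.
    if not policy.allowed_networks and not policy.allowed_users:
        return True
    # A listed user account is enough on its own.
    if user is not None and user in policy.allowed_users:
        return True
    # Otherwise the client address must fall into an allowed range.
    address = ip_address(client_ip)
    return any(address in ip_network(net) for net in policy.allowed_networks)


today = date(2024, 1, 1)
open_access = AccessPolicy()
restricted = AccessPolicy(allowed_networks=["192.0.2.0/24"], allowed_users={"editor"})
embargoed = AccessPolicy(embargo_until=date(2030, 1, 1))

print(may_download(open_access, "203.0.113.7", None, today))
print(may_download(restricted, "203.0.113.7", None, today))
```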

Interoperability
  OAI-PMH - Open Archives Initiative Protocol for Metadata Harvesting
    query to discover repository contents
    only for metadata
    not for depositing
  SWORD - Simple Web-service Offering Repository Deposit
    deposit to multiple repositories at once
    deposit by third party systems (e.g. lab equipment)
  Export to Mendeley, DataCite, RefWorks, BibTeX, etc.

Reporting
  Needed for feedback and building the case
  Download reports
  Active users
  Google Analytics integration

Preservation
  Back-ups
    file system backup
    import / export functionality
  LOCKSS compatibility
    Lots of Copies Keep Stuff Safe
    peer to peer network
  Preservation tools
    format migration tools
    risk management tools
    preservation specific metadata collection
      PREMIS, METS
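
One concrete piece of preservation metadata is a checksum per file, which lets a later check detect silent changes in a backup or copy. The sketch below is an illustration only: the slides do not prescribe checksums, and the helper names `build_manifest` and `changed_files` are hypothetical. It uses SHA-256 and a temporary directory with made-up sample files.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory: Path) -> dict:
    """Map each file (relative path) under the directory to its checksum."""
    return {
        str(path.relative_to(directory)): sha256_of(path)
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def changed_files(directory: Path, manifest: dict) -> list:
    """List files whose current checksum differs from the stored manifest."""
    current = build_manifest(directory)
    return sorted(name for name in manifest if current.get(name) != manifest[name])


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "dataset.csv").write_text("a,b\n1,2\n")
        (root / "readme.txt").write_text("sample object\n")
        manifest = build_manifest(root)                    # taken at ingest time
        (root / "dataset.csv").write_text("a,b\n1,3\n")    # simulate corruption
        return changed_files(root, manifest)


print(demo())  # prints ['dataset.csv']
```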

Common survey and report shortcomings
  Simplifications
    customisable metadata
      adding an XML template vs reprogramming the software
  Not up to date
    systems evolve fast
  Superficial
    deployment mode - sometimes both options exist
      Zenodo as a service or open source at GitHub
    community support
      check how many posts there are and how fast people got answers
  Currently no survey to compare them all

Best way is to get your hands dirty
  Sort out the basics
    What is the purpose of the repository?
    Do I need more than one repository?
    Which functionality is a must and which one is nice to have?
    How should the system integrate with the existing infrastructure?
  Browse through
    official websites
    wiki pages
    GitHub issues
    mailing lists
  Contact people who already have an instance
  Make test deployments of a few systems
    install
    configure
    populate with sample data
    evaluate
  Be agile!

What repository systems are out there?
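
When going through the systems that follow, it can help to write the must-have versus nice-to-have distinction down explicitly. This is a minimal sketch, not a recommended method: the feature names, weights and the candidates `system_a` to `system_c` are all hypothetical placeholders for the results of your own test deployments.

```python
# Hypothetical requirements; replace with the outcome of "sort out the basics".
MUST_HAVE = {"oai_pmh", "embargo"}
NICE_TO_HAVE = {"doi_minting": 3, "video_streaming": 2, "multilingual_ui": 2}


def score(features: set):
    """Return None if a must-have is missing, else the weighted nice-to-have sum."""
    if not MUST_HAVE <= features:
        return None
    return sum(weight for name, weight in NICE_TO_HAVE.items() if name in features)


def rank(candidates: dict) -> list:
    """Drop candidates missing a must-have and sort the rest by score."""
    eligible = {}
    for name, features in candidates.items():
        value = score(features)
        if value is not None:
            eligible[name] = value
    return sorted(eligible, key=lambda name: (-eligible[name], name))


# Placeholder feature sets as they might be noted down after evaluation.
candidates = {
    "system_a": {"oai_pmh", "embargo", "doi_minting"},
    "system_b": {"oai_pmh", "video_streaming", "multilingual_ui"},
    "system_c": {"oai_pmh", "embargo", "video_streaming", "multilingual_ui"},
}
ranking = rank(candidates)
print(ranking)  # prints ['system_c', 'system_a']
```

Here `system_b` drops out because it lacks a must-have, however many nice-to-have features it offers.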

Fedora based
  NOT a Linux OS distribution!
  Fedora Commons provides a backend only
    Content model
    RDF linked data
    Persistent identifier
    Versioning
  Requires further development

Fedora based
  Hydra -> Samvera
    Ruby on Rails frontend
    Apache Solr for indexing
  Islandora
    Drupal frontend
    Virtual machines available for download
    Integrates well with Archivematica
  Phaidra
    Perl + Catalyst
    Each installation becomes a partner of a consortium

Archivematica
  Digital preservation system
  Automates the process of preparing digital objects for ingest into a repository, for example:
    Scan for viruses
    Generate METS
    Generate DIP (e.g. migrate)
  Integrates with repository systems
    For dissemination
    e.g. Islandora, DSpace
  https://wiki.archivematica.org/Format_policies

DSpace
  Open source
  Popular for scholarly publication repositories
    some data repositories
      DataShare or Dryad
  Large open source community, variety of support and consultancy providers
  Aims for turnkey local installation, but can be complex to set up and maintain if customisation is required

CKAN - Comprehensive Knowledge Archive Network
  Repository for datasets
    Stores data
    Or holds metadata for datasets hosted externally
  Pros:
    faceted search
    unrestricted metadata
    data viewers
    customisable
    well-established open source community
    wide take-up in government sector
  Cons:
    Lacks support for OAI-PMH
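
As an illustration of the faceted search listed under the pros, the sketch below builds a request URL for CKAN's action API. It assumes the `package_search` action with its `q`, `rows` and `facet.field` parameters as described in the CKAN API guide. The host `ckan.example.org` and the helper name are hypothetical, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url: str, query: str, facet_fields: list,
                       rows: int = 10) -> str:
    """Build a CKAN package_search URL with facets (sketch, nothing is sent)."""
    params = {
        "q": query,
        "rows": rows,
        # Assumption: facet.field is passed as a JSON-encoded list of field names.
        "facet.field": json.dumps(facet_fields),
    }
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


# Hypothetical CKAN instance and search term.
url = package_search_url("https://ckan.example.org/", "radiation",
                         ["tags", "organization"])
print(url)
```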

Zenodo
  Runs on the open source Invenio platform
  Developed by CERN
  Enables upload of any file
    data, publication or code
  Enables compliance for European (H2020) projects
  Integration with Dropbox and GitHub
  Source code available on GitHub
  Customisation requires software development
  No access or download statistics
  No data viewers

Dataverse
  Repository system
  Own instance -> becomes part of community
    22 installations, e.g. Harvard, DANS (NL)
  Supports citation, versioning
  Specific disciplinary metadata standards
  Easy to install
    Vagrant (virtual machine)
  Popular in social science domain

EPrints
  Available since 2000
  Wide take-up for scholarly publication repositories
    But also supports research data (ReCollect)
  Support for wide range of content types, metadata schema, interoperability standards
  EPrints Services
    not-for-profit commercial services organisation
    help build your own repository
    host it for you

Commercial solutions
  Figshare
    Publicly available repository
    Free of charge
    Alternative to Zenodo
  Figshare for Institutions
    Dedicated, partially customised instance hosted by Figshare
    Loss of institutional control
    Proprietary API instead of common standards
    Uncertain succession plan
  Preservica

Repository registries
  Directory of Open Access Repositories (DOAR)
    Based on registrations
    http://www.opendoar.org/
  Registry of Open Access Repositories (ROAR)
    Automatically harvested list based on OAI-PMH
    http://roar.eprints.org/
  Projection of DOAR and ROAR onto Google Maps
    http://maps.repository66.org
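
Harvesting over OAI-PMH, as ROAR does, comes down to plain HTTP requests with a `verb` parameter. The sketch below only builds such request URLs and sends nothing. The endpoint `https://repository.example.org/oai` and the helper `oai_request` are hypothetical, and only one of the protocol's argument rules is checked (`metadataPrefix` for the list verbs).

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL (sketch; the request is not sent)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    needs_prefix = verb in {"ListRecords", "ListIdentifiers"}
    if (needs_prefix and "resumptionToken" not in arguments
            and "metadataPrefix" not in arguments):
        raise ValueError("metadataPrefix is required unless resuming a list")
    params = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(params)}"


BASE = "https://repository.example.org/oai"  # hypothetical endpoint

identify_url = oai_request(BASE, "Identify")
# "from" is a Python keyword, so it is passed through a dict.
records_url = oai_request(BASE, "ListRecords", metadataPrefix="oai_dc",
                          **{"from": "2017-01-01"})
print(identify_url)
print(records_url)
```

The response is XML carrying metadata only, which matches the point made earlier: OAI-PMH is for discovery, not for depositing.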

List of repository software (not instances)
  http://wiki.lib.sun.ac.za/index.php?title=List_of_Repository_Software

Portugal for Palestine

Scientific Open Access Repository of Portugal (RCAAP)
  Objectives
    Increase the visibility, accessibility and dissemination of Portuguese research results
    Facilitate access to information about Portuguese scientific output
    Integrate Portugal in the wide range of international initiatives in this domain
  https://www.rcaap.pt

RCAAP Portal
  Meta-repository
    Aggregates metadata from Portuguese and Brazilian repositories
    Actual data remains inside of these repositories
  Currently:
    1.5 million documents
    126 repositories
  Runs on DSpace

RCAAP repository types
  Local Repository
    installation, configuration and operation of a repository with own facilities and infrastructure
  Hosting Service for Institutional Repositories (SARI)
    Software as a Service
    Centrally hosted
      hardware, hosting, connectivity, foundation systems, applications, security, backup service, monitoring
    Institution administrates, defines policies, gets customization
  Common Repository
    Centrally hosted
    Shared with others
    Intended for institutions whose scientific production does not justify the creation of a repository
    also serves as an incubator for repositories

Conclusion
  Analysed functions
  Test systems on your own
  You may need more than one system
  You may decide on the scope based on domain, institution, etc.
  Depends on what skills you have and how much money you have
    Fedora developed on your own vs paid service

References
  A comparison of research data management platforms: architecture, flexible metadata and interoperability
    https://doi.org/10.1007/978-3-319-16486-1
  Institutional Repository Software Comparison
    https://works.bepress.com/jean_gabriel_bankier/22/
  Open Source Software for Digital Preservation Repositories: a Survey
    https://arxiv.org/abs/1707.06336
  Research Data Repositories: Review of current features, gap analysis, and recommendations for minimum requirements
    https://www.rdc-drc.ca/wp-content/uploads/Review-of-Research-Data-Repositories-2015.pdf
  Institutional repository software comparison: DSpace, EPrints, Digital Commons, Islandora and Hydra
    https://dx.doi.org/10.14288/1.0075768