
Research Data Management: Preserving and sharing data

Preserving and Sharing

Research funders increasingly set requirements for the long-term preservation of research data and, sometimes, for sharing it. Even if your research was not funded by an external body, you should still give careful consideration to the long-term value of your data.

Appraising

It is not practical, desirable or, in some cases, permissible to preserve and share all research data. Deciding which data to preserve, what level of access to allow (if any), and which data to dispose of can be difficult, especially where there are ethical or commercial considerations.

How to appraise data

Where to preserve

Choosing a long-term service to look after data means asking questions similar to those you ask when choosing a publisher: ‘If I hand this over, will they review it, safeguard the content, and make sure it is accessible for as long as it is of value?’ The following are some key considerations:

  • is a reputable repository available?
  • will it take the data you want to deposit?
  • will it be safe in legal terms?
  • will the repository sustain the data value?
  • will it support analysis and track data usage?

Where to preserve your data?

Preparing data

When preparing data for preservation you need to consider the following:

  • Know what you want users to be able to do with your data and for how long.
  • Pin down and communicate the significant properties of your data.
  • Ensure that any restrictions on access and use are communicated and respected.
  • Provide enough context for your data to be located and used, either by the originally designated user community or by new users over time.

Preparing data checklist

How to preserve

In data management, data preservation is the process of maintaining access to data so that it can still be found, understood and used in the future.

Why preservation goes beyond (immediate) storage issues

  • Storage media (particularly portable media) are at risk of degrading or being lost.

  • Backing up keeps data available in the short term, but saving your data in one or more places does not guarantee its longevity. Preservation means active management in the long term.

  • Your data may become incompatible with future software. File formats change, which can make older files unreadable.

  • Data may become unintelligible if no supporting documentation has survived.

  • Files may be altered when opened with new software, so that they are no longer understandable or reliable for continued research.

  • Many funders now require data to be preserved for 10 years or more.

  • Access control needs to be considered to ensure that final versions of data are not changed, accidentally or deliberately.

  • Preservation needs to be considered as early as possible as part of data management planning – what preservation requirements will you need to meet, and how will you do it?
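One widely used way to detect whether preserved files have been altered or lost is to record a checksum for each file and re-check it later. The following is a minimal sketch in Python, not a prescribed procedure: the function names (`sha256_of`, `build_manifest`, `changed_files`) and the choice of SHA-256 are assumptions made for illustration, and a repository may use its own fixity tools.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or have different content."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

In this sketch, the manifest would be built when the final version of the data is deposited and stored alongside the documentation; running `changed_files` at intervals then reports any file whose content no longer matches.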

How to preserve data

 

Licensing your data

The most common way of giving permission for others to use your data is through a licence.

A licence in this context is a legal instrument for a rights holder to permit a second party to do things that would otherwise infringe on the rights held.

Only the rights holder (or someone with a right or licence to act on their behalf) can grant a licence; it is therefore imperative that the intellectual property rights (IPR) pertaining to the data are established before any licensing takes place.

Although it is in the nature of a licence to expand rather than restrict what a licensee can do, some licences are presented within contracts, and contracts can place additional restrictions on the licensee and indeed the licensor.

For further information, including the different kinds of licence available, see the Digital Curation Centre's 'How to License Research Data'.

 

Data access statement

Data access statements are used in publications to describe where data directly supporting the published paper can be found, and under what circumstances they can be accessed. Statements are required by many funders as part of their data management and open-access policies.

Some journals provide a section for a data access statement; where this is not the case, you should still include a statement in your manuscript.

A data access statement should include:

  • The name of the data repository where the data is held, and any persistent identifiers (e.g. a DOI) for the data set.
  • Any ethical or commercial reasons why the data is not openly available.
  • Instructions on how to request data that is not openly available.
  • Any specific terms of re-use.

It is not sufficient to suggest that interested parties contact the author for access to data.
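The elements listed above can be treated as fields of a small record from which a statement is assembled. The sketch below is only an illustration in Python: the `DataAccess` class, its field names and the `access_statement` function are hypothetical, and the wording it produces is one possible phrasing rather than a required form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataAccess:
    repository: str                           # name of the data repository
    identifier: str                           # persistent identifier, e.g. a DOI
    restriction_reason: Optional[str] = None  # e.g. "ethical" or "commercial"
    request_url: Optional[str] = None         # where to request restricted data
    reuse_terms: Optional[str] = None         # any specific terms of re-use


def access_statement(info: DataAccess) -> str:
    """Assemble a data access statement from the recorded elements."""
    held = f"{info.repository} ({info.identifier})"
    if info.restriction_reason:
        # Restricted data must say how access can be requested.
        if not info.request_url:
            raise ValueError("restricted data needs instructions for requesting access")
        parts = [
            f"Because of {info.restriction_reason} sensitivity issues, "
            "supporting data are not openly available.",
            f"The data are held in {held}; conditions for access "
            f"can be found at {info.request_url}.",
        ]
    else:
        parts = [f"All data supporting this paper are available from {held}."]
    if info.reuse_terms:
        parts.append(f"Terms of re-use: {info.reuse_terms}.")
    return " ".join(parts)
```

For example, `access_statement(DataAccess("Example Repository", "doi:10.1234/example"))` uses a made-up repository name and identifier and yields an open-data statement. A restricted record without a `request_url` raises an error, reflecting the point that simply telling readers to contact the author is not sufficient.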

Sample data access statements

Open data

"All data cited in this paper are available from [name of data repository, and persistent identifier]"

"All data supporting this paper are provided as supplementary information accompanying this paper."

Secondary use of data

"This paper was based on data already available from [insert location and any persistent identifier e.g. a DOI]"

"This paper was based on existing data obtained under licence. Details of how the data were obtained can be found at [insert URL]"

Restricted data

"Because of [ethical, commercial, other] sensitivity issues, supporting data is not openly available. Further information about the data, and conditions for access, can be found at [insert URL]"

"Supporting data will be available from [insert data repository and persistent identifier] after a 6-month embargo period to allow for commercialisation of findings"

"Because of confidentiality agreements, supporting data can only be made available to researchers on acceptance of a non-disclosure agreement. Details of how to request access are available from [insert URL]."

No new data

"No new data were created during this research."

Multiple datasets

If the underlying data is held in a variety of locations, it might be appropriate to cite each dataset individually, including a persistent identifier, and direct readers to the references, e.g.

"This paper is supported by multiple datasets which are available at locations cited in the reference section."