Data management

Most research projects generate a significant amount of data. This data should be of good quality because it underpins the quality of the study. Therefore, good data management is fundamental to high-quality research. Good practice in data management also helps researchers ensure that the required processes of data collection and analysis are organized, understandable, and transparent.

The main data management responsibilities include:

  • Organizing and ensuring the collection of accurate data.
  • Capturing the data on a database.
  • Validating and correcting the data.
  • Providing data in a form that will enable analysis.
  • Storing and sharing the data.

It is important to remember that the confidentiality of the respondents’ identities must be guaranteed at all times in the data management process. This is usually stipulated in the ethical approval you will receive (e.g. keeping files on a password-protected computer, locked cabinets, limited number of persons with access to any anonymized data). In addition, you will outline how long you will keep the data after the research has been completed. You must ensure that these ethical criteria are maintained throughout the course of your IR project.

Data management is a cyclical process (Figure 4). The data life cycle starts with creating data, followed by processing and using data for analysis. The last two stages of the cycle are storing and sharing the data.

Creating data is the first step in the data management process. In quantitative studies, this stage consists of defining what type of data will be collected, their format and the procedure to create them. The researcher must ensure that all the collected data reflect reality by using standardized instruments and data collection procedures, and by checking error rates during data collection (e.g. checking the completeness and consistency of respondents’ responses in the questionnaires, and checking the validity of the responses through a random re-interview process).

In a qualitative study this stage starts with defining different types of information the researcher intends to gather, different tools (e.g. interview or FGD guidelines) and data collection activities. The researcher needs to ensure that all recording devices are placed in a way that will best record the conversation or discussion, and that the space for interview or discussion creates a safe atmosphere for open discussion while ensuring privacy.

Processing data is the stage of translating information from its rawest form into a form that is ready for analysis. In a quantitative study, this means creating an electronic database that is appropriate for managing different types of data (e.g. multiple responses, numerical data, visual analogue scale data). It involves creating file and coding structures that are understandable, developing a codebook, and deciding which data should be kept in the database and which should be discarded. As data are entered, data entry errors should be prevented by double entry and by checking the consistency of responses. In qualitative studies, this means that all recorded data are transcribed verbatim, and in some cases transcripts can be shared with respondents to verify content. It also entails the development of a codebook, particularly when more than one researcher is conducting the analysis. All collected qualitative data should be saved in a qualitative data management application.
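The double-entry check described above can be sketched in a few lines. The sketch below is an illustrative assumption, not part of any toolkit: it supposes both entries are keyed by a respondent identifier (the names `double_entry_discrepancies` and `respondent_id` are hypothetical) and reports every field where the two independent entries disagree, so discrepancies can be traced back to the source documents.

```python
def double_entry_discrepancies(entry_a, entry_b, key="respondent_id"):
    """Compare two independent data entries (lists of dicts) and return
    a list of (record key, field, first value, second value) mismatches."""
    second = {row[key]: row for row in entry_b}
    discrepancies = []
    for row in entry_a:
        other = second.get(row[key])
        if other is None:
            # Record was entered only once; flag it for follow-up.
            discrepancies.append((row[key], "__missing_in_second_entry__", None, None))
            continue
        for field, value in row.items():
            if other.get(field) != value:
                discrepancies.append((row[key], field, value, other.get(field)))
    return discrepancies

# Example: the two clerks disagree on one respondent's age.
entry_a = [{"respondent_id": "001", "age": "34", "sex": "F"},
           {"respondent_id": "002", "age": "51", "sex": "M"}]
entry_b = [{"respondent_id": "001", "age": "43", "sex": "F"},
           {"respondent_id": "002", "age": "51", "sex": "M"}]
print(double_entry_discrepancies(entry_a, entry_b))
```

Every flagged mismatch is then resolved by checking the original paper questionnaire, rather than by guessing which entry is correct.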

Data analysis in quantitative studies consists of identifying patterns through descriptive analysis, comparing data, hypothesis testing and finding relationships between variables. In qualitative studies, this process consists of identifying and understanding meaning and assigning codes to the data, identifying patterns and emerging themes, and constructing a framework to explain certain phenomena. This activity will be described in a subsequent section.

Storing data involves activities not only during the study period, but also in the long term by archiving data in a repository or data centre. Presently, electronic data storage/repository is the medium of choice as it requires little space and is simple to back up. However, a data storage strategy is needed as digital storage media also have several limitations e.g. quality and life cycle of storage media, software interoperability, relevant data reading equipment and power supply. Data security is another issue in storing data. Security issues include physical data security (e.g. locked room or cabinet, access log book), and electronic data security (e.g. secure access using password, level of access, and data encryption for sharing and transmission). The WHO Good Clinical Practice Guideline recommends that data and essential documents should be stored for at least two years after the research project has ended.20
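One small, widely used safeguard for stored and backed-up data is to record a checksum when a file is archived and verify it after any copy or transfer. This complements, but does not replace, the security measures above (encryption, access control). The sketch below uses only Python's standard library; the function name and file path are illustrative assumptions.

```python
import hashlib

def file_sha256(path, chunk_size=8192):
    """Compute the SHA-256 checksum of a file, reading in chunks so
    large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# At archiving time, store the checksum alongside the file; after a
# restore or transfer, recompute it and compare. A mismatch means the
# copy is corrupted or altered and should not be used for analysis.
```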

Sharing data is particularly important in collaborative multi-centre/country studies. Data sharing, together with data transfer, data storage and access for all collaborative partners or institutions can be challenging as it may involve different regulations. Cloud-based file sharing may be preferable, although it may not be suitable for all types of data, particularly identifiable, confidential data. Furthermore, researchers do not have control over where data is actually stored.

Data sharing is becoming mandatory in many fields as a way to ensure transparency and to avoid both duplication and plagiarism. Since IR may involve different institutions/organizations, the guidelines for data sharing and ownership should be clearly spelt out at the beginning through agreements such as a memorandum of understanding. Data sharing should follow a clear process and can be carried out between two research institutions, though not between two individuals. Please check your own institutional and national guidelines before designing any data sharing agreements.

Collection and storage/documentation of accurately recorded and retrievable results are essential for any research. Good data collection practices will ensure that data can be traced to their source and their original form (i.e. the raw data that constitutes the first recording of the observation). To ensure these characteristics, raw data must be recorded:

  • Promptly: After a specific task is completed. Delaying data recording will reduce data quality as memory may fail or be inaccurate.
  • Accurately: Inaccurate data recording will reduce the reliability of the data collected; accuracy is therefore a critical part of the integrity of the study.
  • Legibly: Hand-written data should be clearly written and electronic records should not be difficult to decipher.
  • Indelibly: Handwritten raw data should be recorded in permanent ink. Any changes to the raw data should not obscure the previous entry. The date, reason for the change and signature of the person responsible for the change should be added.
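The "indelible" rule above has a direct electronic counterpart: corrections are appended to an audit log rather than overwriting the original value. The following sketch shows one way this could be done; the function and field names are hypothetical assumptions for illustration.

```python
from datetime import datetime, timezone

def record_correction(audit_log, record_id, field, old_value, new_value,
                      reason, changed_by):
    """Append a correction entry to an audit log. The previous value is
    preserved, and the date, reason and responsible person are recorded,
    mirroring the rule that changes must not obscure the original entry."""
    entry = {
        "record_id": record_id,
        "field": field,
        "old_value": old_value,   # original entry stays visible
        "new_value": new_value,
        "reason": reason,
        "changed_by": changed_by,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.append(entry)
    return entry

# Example: correcting a transposed weight found during verification.
log = []
record_correction(log, "R-017", "weight_kg", "65", "56",
                  "transcription error found at verification", "A. Field")
```

Because entries are only ever appended, the full history of every change remains available for audit.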

Clear and regularly checked data flow prevents data loss. As IR collects different types of data (i.e. patient, organizational and surveillance-related data) from various sources (i.e. human subjects, medical records, health services and laboratory registers, surveillance systems, and administrative systems), a detailed chart should be made describing the critical pathway(s) for the data collection process: handling questionnaires, coding, data entry, data verification, cleaning, storage of hard copies, and back-up of data files.

Data quality is key to having authentic and scientific data and therefore should be taken seriously. Activities such as staff training, supportive supervision and data feedback can be used to enhance the quality of data. Refer to the planning of an IR project module for details.

TDR Implementation research toolkit (Second edition)

© 2024 TDR. All rights reserved