Basics of Data Collection

Data collection is a structured approach to gathering and measuring information from a wide range of sources. This systematic way of collecting data allows people to answer research questions, assess outcomes, and ultimately gain insight into any topic of interest. The main goal of any data collection effort is to compile quality data that can be easily analyzed and result in a meaningful conclusion. While the process of collecting data can differ greatly from discipline to discipline, one aspect remains the same throughout: maintaining data integrity.

Data Integrity

Data integrity means that data remains reliable and accurate over its entire lifecycle (for more information on data integrity, read this lesson featured in our Data School Course, “Data Governance”). Data collection is where this lifecycle begins, so integrity has to be built in from the start. The fundamental reason for maintaining data integrity is to detect errors in the data collection process, whether they are made intentionally (deliberate falsifications) or not (systematic or random errors). Some consequences of improperly collected data include:

  • Inability to answer research questions accurately
  • Inability to repeat and validate the study
  • Distorted or incorrect findings resulting in wasted resources
  • Misleading future research with incorrect methods of investigation
  • Causing harm to human participants and animal subjects

Throughout the data collection process, there are two means of preserving data integrity: data quality assurance and data quality control.

  1. Quality Assurance refers to actions taken before data is collected. The focus of quality assurance is prevention, i.e., ruling out sources of invalid data before the process even begins. Stopping issues before they start can save you a lot of time and, most importantly, money on the data collection process. To prevent a loss of integrity in your data, it’s recommended to create and follow a standardized protocol, documented in a procedure manual for data collection. If you create a procedure manual for your collection process, here are some things to consider:
    • Make sure the manual is clearly written so that issues can be identified early in the process.
    • Properly train staff according to the protocol of the manual to prevent deviation from acceptable methods.
  2. Quality Control refers to actions taken during and after data collection. Although these actions take place while collection is ongoing or after it ends, they should be explicitly stated in the procedure manual and acknowledged before the process begins. Quality control can take the form of many different activities, but routinely auditing records is necessary to ensure that data collection follows the procedures established in the manual. To further protect data integrity, it’s important to identify the actions needed to correct faulty data collection practices, whether they are still in progress or already completed. These actions will also help prevent malpractice in the future.
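As a rough illustration of what routinely auditing records could look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Record` fields, the allowed responses, the age range, and the `audit` function are hypothetical stand-ins for whatever rules your own procedure manual defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical fields; a real study would define these in its manual.
    participant_id: str
    age: Optional[int]
    response: str


# Hypothetical protocol rules standing in for a procedure manual.
ALLOWED_RESPONSES = {"yes", "no", "unsure"}
AGE_RANGE = (18, 99)


def audit(records):
    """Return (participant_id, problem) pairs for records that break a rule."""
    problems = []
    seen_ids = set()
    for rec in records:
        # A participant should only appear once in the collected data.
        if rec.participant_id in seen_ids:
            problems.append((rec.participant_id, "duplicate participant_id"))
        seen_ids.add(rec.participant_id)

        # Age must be present and within the range the protocol allows.
        if rec.age is None:
            problems.append((rec.participant_id, "missing age"))
        elif not AGE_RANGE[0] <= rec.age <= AGE_RANGE[1]:
            problems.append((rec.participant_id, "age out of range"))

        # Responses must be one of the answer choices in the protocol.
        if rec.response not in ALLOWED_RESPONSES:
            problems.append((rec.participant_id, "unexpected response"))
    return problems


# Made-up records used only to exercise the checks.
records = [
    Record("p01", 34, "yes"),
    Record("p02", None, "no"),
    Record("p03", 17, "maybe"),
    Record("p01", 34, "yes"),
]

issues = audit(records)
for participant_id, problem in issues:
    print(participant_id, problem)
```

A check like this only flags records for review. Deciding how to correct them, and how to fix the practice that produced them, is still up to the people running the collection.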

While the impact of inaccurate data collection varies by discipline, every field risks serious harm when flawed results are used to support important decisions. Data collection methods also vary depending on the discipline, the nature of the information needed, and the goals of the users. Let’s take a look at the different collection methods.

Collection Methods

There is a great variety of data collection methods and, as stated before, they can differ for numerous reasons. However, two main types of data are commonly collected: qualitative and quantitative.

Qualitative data is closely associated with words, sounds, feelings, emotions, colors, and other elements that are non-quantifiable. Though this kind of data is difficult to measure, it provides a depth of understanding that can’t be obtained from numbers or mathematical calculations.

Quantitative data deals with quantities, values, or numbers, making it the measurable type of data. It is usually expressed in numerical form and can be analyzed with mathematical calculations.

On the surface it may seem that these data types have very similar collection methods, but the specifics of each method differ to suit each data type. These collection methods are often used to gather information from a sample of a population; more information on sampling can be found here.
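To give a feel for what drawing a sample from a population can involve, here is a minimal Python sketch of a simple random sample. The population list, the `draw_sample` helper, and the seed value are all hypothetical, and simple random sampling is only one of several possible sampling approaches.

```python
import random


def draw_sample(population, k, seed):
    """Draw k distinct members of the population without replacement."""
    # A fixed seed makes the draw repeatable, so it can be audited later.
    rng = random.Random(seed)
    return rng.sample(population, k)


# Hypothetical sampling frame: 200 made-up resident identifiers.
population = [f"resident_{i:03d}" for i in range(1, 201)]

sample = draw_sample(population, k=20, seed=42)
print(len(sample), "of", len(population), "residents selected")
```

Recording the seed alongside the sampling frame is one way to let someone else reproduce exactly who was selected.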

Interview
  • Description: A formal meeting between two people in which one person asks the other questions in order to obtain information. An interview is used when interpersonal contact is important and when opportunities to follow up on interesting comments are deemed useful.
  • Qualitative: Unstructured or semi-structured conversation provides very personalized data from the interviewee.
  • Quantitative: Structured conversation consisting of pre-planned questions provides more analytical data from the interviewee.

Survey
  • Description: A structured way of collecting data consisting of two components: questions and responses. Surveys can be constructed in many different forms, but are typically used when information is collected from a large number of people and standardization is important.
  • Qualitative: Open-ended, short questions provide uniquely detailed answers and very descriptive data.
  • Quantitative: Close-ended questions with supplied answer choices provide structured and limited data.

Observation
  • Description: A systematic way of collecting data that simply requires watching ongoing behavior and recording observations. Observations are made without disturbing, influencing, or altering the environment or the participants in any way.
  • Qualitative: Subjective collection of data that focuses on physical qualities or properties. Provides data that can be observed with our senses, which is helpful in comparative analyses.
  • Quantitative: Objective collection of data that focuses on numbers and precise measurements. Provides accurate numerical data that is useful in statistical and numerical analyses.

Focus Group
  • Description: A controlled group interview made up of individuals who have shared characteristics relating to the evaluation. Focus groups are used instead of individual interviews when group interaction can promote data and insights that would be unlikely to emerge otherwise.
  • Qualitative: A session moderator presents the evaluation topic to the group and opens up the floor for group discussion. Provides unique data that results from the group members’ interactions and influences.
  • Quantitative: Not a suitable quantitative data collection method.

Case Study
  • Description: An up-close, in-depth, and detailed examination of a subject of study, as well as its related contextual conditions. Case studies often use a combination of other collection methods to gather data for a holistic analysis.
  • Qualitative: An investigator examines a small number of items (towns, projects, people, schools, etc.) while becoming deeply involved in each situation. Provides very engaging, rich, and realistic data about a case as it develops in a real-world setting.
  • Quantitative: Not a suitable quantitative data collection method.

Experiment
  • Description: A structured method for testing different assumptions, or hypotheses, under certain conditions. Experiments should follow strict procedures to ensure accurate results and allow for replication.
  • Qualitative: Not a suitable qualitative data collection method.
  • Quantitative: An independent variable is manipulated and its effect on a dependent variable is measured while other variables are held constant. Provides data that is used later on for analysis of relationships and correlations.
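To make the difference between the two kinds of survey data a little more concrete, here is a small Python sketch that summarizes a handful of survey responses. The questions, field names, and answers are invented for illustration. The close-ended answers can be summarized numerically, while the open-ended comments are simply collected as text for someone to read and interpret.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses: two close-ended questions and one open-ended.
responses = [
    {"satisfaction": 4, "recommend": "yes", "comment": "Fast checkout, friendly staff."},
    {"satisfaction": 2, "recommend": "no", "comment": "The store was hard to navigate."},
    {"satisfaction": 5, "recommend": "yes", "comment": ""},
    {"satisfaction": 3, "recommend": "yes", "comment": "Prices were fine, parking was not."},
]

# Quantitative: close-ended answers reduce to numbers and counts.
average_rating = mean(r["satisfaction"] for r in responses)
recommend_counts = Counter(r["recommend"] for r in responses)

# Qualitative: open-ended answers stay as text for later reading and coding.
comments = [r["comment"] for r in responses if r["comment"]]

print("Average satisfaction:", average_rating)
print("Would recommend:", dict(recommend_counts))
print("Comments to review:", len(comments))
```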



There are many different types of data collection methods that can be used in any evaluation. Each has its advantages and disadvantages and must be carefully chosen based on the circumstances of the situation; no one approach is always the best. No matter which collection method you choose, it’s essential to maintain data integrity. Implementing quality assurance and quality control helps ensure that your data is reliable and accurate, and that your business practices good data governance.



About Bryn Burns

Hi! I'm Bryn Burns. I am a current senior at Virginia Tech pursuing degrees in Statistics and Mathematics. Data science and visualization are two things I'm very passionate about, as well as working with numbers and helping people learn. I'm thrilled to share my knowledge here at The Data School!