(Volume: 4, Issue: 5)
Does dataset size influence research quality?
Many researchers firmly believe that only experiments on large datasets lead to quality research. However, this may be a myth, because journals with high impact factors do publish high-quality articles that use smaller datasets, some of which are even synthetic. So, which factors related to dataset size influence research quality?
Let’s take a look…
Datasets and sizes
Research usually involves data, which is classified as primary or secondary according to how it is collected. Primary data is collected first-hand by the researcher to carry out the research. In contrast, secondary data is data that the researcher reuses from prior relevant research. A researcher might use primary or secondary data as a whole, or only subsets of it, to support the research idea. A researcher often chooses a large dataset to demonstrate that a method generalizes, or when the research involves data-driven approaches. Conversely, a researcher might choose a smaller dataset if the research idea is novel and he/she prioritizes controlled experiments, or if the research involves risky experimentation.
Does dataset size affect research quality?
A researcher chooses the dataset size in accordance with his/her research. For instance, the available secondary data might be too small or otherwise inadequate for the research, so the researcher turns to primary data collection. The primary data might be scarce as well, if the research problem is newly addressed in the literature and acquiring the data involves riskier experimental procedures. It is at this point that a researcher starts to doubt the quality of his/her research. Even if the researcher is satisfied with the data acquisition, or uses the newly collected data to test his/her methodology and produces impactful results, one question still troubles him/her: “Will a high-impact journal with a wider scientific audience deem my research trivial, citing the small dataset size as the reason?” This worry is largely unwarranted, because research quality does not depend solely on dataset size. In fact, journals consider several factors related to the dataset before judging the research quality of a submission. They are explained as follows:
Relevance to the research problem: Journals check whether the dataset is truly associated with the research, rather than looking at its size. Using a smaller dataset that is relevant to the research makes the research more credible than employing a larger dataset that has little to do with the research idea.
Dataset adequacy: Journals check whether the dataset is sufficient to support the research claim. For instance, a smaller dataset is enough to demonstrate a method’s superiority on problems that involve few samples. On the other hand, if the research needs to generalize or to provide a solution for a wider population, large datasets have to be used. Hence, journals prefer research articles that are adequately tested for the problem their outcomes are meant to address, rather than judging them by dataset size.
Dataset transparency: This factor deals with the dataset collection procedures, the sources, and the pre-processing steps applied to the dataset to carry out the research. With this information, the journal editor or the reviewers can understand why the researcher chose a small or a large dataset. Since the limitations and the significance of the dataset are stated explicitly, neither the credibility nor the quality of the research is affected by the dataset size.
Ethical considerations: This factor relates to the consent obtained for involving human participants or animals in the research. Such consent makes clear the significance of the collected data in serving a societal purpose, and hence dataset size does not greatly affect the research quality.
Benchmarking and dataset validation: This is one of the important factors by which a journal determines the research quality of an article. The editor or the reviewer confirms that the dataset is sufficient to compare the research claims against baseline models or benchmark datasets and to support them. In other words, the reviewer checks that the reported outcomes are not accidental results of experimenting with that particular dataset.
Dataset reproducibility: Journals check the reproducibility of the dataset in order to promote similar research or future advancements. Research that involves a smaller dataset with better reproducibility is considered to be of higher quality than research involving massive, non-reproducible datasets.
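To make the benchmarking factor more concrete, one possible way to check that a gain over a baseline on a small dataset is not accidental is a paired permutation (sign-flip) test. The sketch below is only an illustration under stated assumptions: the function name `paired_permutation_test`, its parameters, and the accuracy scores are hypothetical and are not taken from any journal's review procedure.

```python
import random


def paired_permutation_test(method_scores, baseline_scores,
                            n_resamples=10_000, seed=0):
    """Return (mean gain, one-sided p-value) for method vs. baseline.

    Assumes both lists hold scores for the same test cases, in the same order.
    """
    if len(method_scores) != len(baseline_scores):
        raise ValueError("score lists must have the same length")

    # Per-case gain of the method over the baseline.
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    observed = sum(diffs) / len(diffs)

    # Fixed seed so that the check itself is reproducible.
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # If the method were no better, each gain is equally likely to be + or -.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / len(flipped) >= observed:
            at_least_as_large += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_large + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical accuracy scores on eight test cases (illustration only).
method = [0.81, 0.79, 0.85, 0.80, 0.83, 0.78, 0.84, 0.82]
baseline = [0.76, 0.77, 0.80, 0.78, 0.79, 0.77, 0.80, 0.79]

observed_gain, p_value = paired_permutation_test(method, baseline)
print(f"mean gain = {observed_gain:.3f}, p = {p_value:.4f}")
```

A small p-value suggests that the observed gain is unlikely to be a chance artifact of the particular samples, even though the dataset is small. It is only one of several possible checks and does not replace relevance, transparency, or reproducibility.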
Hence, The Research Seer’s Rationale on “Does dataset size influence research quality?” is:
“Dataset Adequacy, Relevance, Transparency & Reproducibility Decide Research Quality, Not Dataset Size”