Deborah, a third-year graduate student, and Kathleen, a postdoc, have made a series of measurements on a new experimental semiconductor material using an expensive neutron source at a national laboratory. When they return to their own laboratory and examine the data, they obtain the following data points. A newly proposed theory predicts the results indicated by the curve.

During the measurements at the national laboratory, Deborah and Kathleen observed power fluctuations that they could neither control nor predict. Furthermore, they had discussed their work with another group doing similar experiments, and they knew that the other group had obtained results confirming the theoretical prediction and was writing a manuscript describing its results.

In writing up their own results for publication, Kathleen suggests dropping the two anomalous data points near the abscissa (the solid squares) from both the published graph and the statistical analysis. She proposes that the paper mention the existence of these data points, noting that they may be due to power fluctuations and that they lie outside the expected standard deviation calculated from the remaining data points. “These two runs,” she argues to Deborah, “were obviously wrong.”

  • How should the data from the two suspected runs be handled?
  • Should the data be included in tests of statistical significance? Why or why not?
  • What other sources of information, in addition to their faculty advisor, can Deborah and Kathleen use to help decide?

“How Do Scientists Select Data?” is Part 3 of our Keeping it Real in STEM series, where we explore science within the context of society. Stay tuned for Part 4 of our series.

Adapted from “On Being a Scientist: Responsible Conduct in Research”, National Academy Press, Second Edition

