
Data Literacy (HC)

How to Find, Evaluate, Organize, and Document Data

Data Ethics Overview

A big part of data literacy is acting ethically, and being aware of potential ethical pitfalls, in data collection, data analysis, and data dissemination.

View this video for an overview (sign in with your Haverford credentials): 

Data Ethics (8-min)

(From LinkedIn Learning. Data Literacy: Exploring and Describing Data. Accessed Jan 29, 2025.)


A variety of potential concerns are also outlined below.

Potential Ethical Concerns around Data

Harm to Subjects

  • Data collected without consent (permission)
      • Consent should be (1) voluntary (the subject was not coerced) and (2) informed (the subject knows what they are agreeing to)
  • Violation of anonymity, if anonymity was promised (i.e., individuals can be identified)
  • Violation of confidentiality or privacy, if it was promised (i.e., data are shared without consent)

__________

Harm to Data Workers

  • Data workers, particularly in the AI industry, are often overworked, underpaid, and exposed to harmful content
  • See Data Workers' Inquiry, in which data workers worldwide report on their workplaces and labor conditions.

__________

Bias

  • Sample bias (such as self-selection, attrition, or over-/under-representation)
  • Conflict of interest (data collection or analysis tied to a financial, political, or social interest)
  • Missing data
      • Data was never collected
          • Consciously or subconsciously, certain data is simply not collected.
          • See The Library of Missing Datasets: "...a physical repository of those things that have been excluded in a society where so much is collected....Spots that we've left blank reveal our hidden social biases and indifferences." (Mimi Ọnụọha, 2016)
      • Data was collected but later purged
          • This results in an incomplete and inaccurate representation of reality.
          • For example, a large volume of U.S. government data is currently being purged after being haphazardly deemed harmful, not useful, or incorrect. Many data rescue projects are underway to save at-risk data.

__________

Data Fabrication & Falsification

While data fabrication (inventing data) and data falsification (manipulating data) are not rampant in the scientific community, each occurrence is a serious breach of responsible research conduct.

  • A handful of high-profile cases of data fabrication and falsification (Stapel, Wansink, LaCour) have been uncovered in recent years, largely due to the Open Science movement, which makes it possible to analyze research materials for oneself.
  • Retraction Watch reports on retracted published papers, some of which were retracted because of data-related research misconduct.
  • Less egregious than falsification, but still a concern: data can be visualized or otherwise finessed in a way that favors one conclusion over other plausible conclusions. This can be intentional or unintentional.

Data Ethics Resources

Find more detailed information in these resources.