Volume 58, Issue S2 p. 262-267
COMMENTARY
Open Access

Extending critical race, racialization, and racism literatures to the adoption, implementation, and sustainability of data equity policies and data (dis)aggregation practices in health research

Matthew Lee DrPH, MPH

Corresponding Author

Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA

Correspondence

Matthew Lee, Department of Population Health, NYU Grossman School of Medicine, 180 Madison Avenue, 17th Floor, New York, NY 10016, USA.

Email: matthew.lee@nyulangone.org

Jake Ryann C. Sumibcay DrPH, MPH

Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA

Hannah Cory PhD, MPH, RD

Division of Epidemiology and Community Health, School of Public Health, University of Minnesota – Twin Cities, Minneapolis, Minnesota, USA

Catherine Duarte PhD, MSc

Department of Epidemiology & Population Health, Stanford University School of Medicine, Stanford, California, USA

Arrianna Marie Planey PhD, MA

Department of Health Policy and Management, UNC Gillings School of Global Public Health, Chapel Hill, North Carolina, USA

First published: 18 May 2023

1 INTRODUCTION

Improved data equity, specifically a transparent, critically grounded approach to race and ethnicity data (dis)aggregation, is necessary to document, understand, and address the health effects of racism.1 The absence of such a systematic process may result in unintended and unaddressed public health harms. For example, in early efforts to monitor the rapid spread of COVID-19 across the United States at the pandemic's onset, disaggregated race and ethnicity data were largely unavailable, rendering inequities in morbidity and mortality invisible.2-4 Among the most vulnerable were smaller minoritized populations, such as Native Hawaiians and Pacific Islanders, who by the end of 2020 had among the highest COVID-19 rates in 15 of the 20 states reporting data.5, 6 Subsequent “one-size-fits-all” interventions, such as dissemination of COVID-19 vaccines and treatments through public–private partnerships with pharmacy chains that underserve rural, predominantly Latinx communities and majority-Black neighborhoods in large metropolitan areas, were ultimately ill-equipped to address the systems-level determinants driving disproportionately higher exposure risk among minoritized communities and likely exacerbated inequities.2, 7-9 Later, as racialized inequities in COVID-19 morbidity and mortality surfaced more broadly,10 prevailing explanations often invoked debunked essentialist understandings of race and ethnicity, requiring dedicated rebuttals11 and highlighting the need for critically, theoretically, and historically grounded data disaggregation efforts.

Though specific, this example illustrates a more widespread concern.12 Current approaches to health data collection, analysis, interpretation, and reporting (i.e., across the data life cycle13) often obscure meaningful detail by race and ethnicity in several ways: (1) omission of racial and ethnic groups at measurement, which can drive systematic nonresponse or self-selection into groups not aligned with lived experiences; (2) failure to disaggregate race and ethnicity data, without stated theoretical justification, even when more granular data are collected; (3) miscategorization during data cleaning; (4) re-aggregation in data visualization and dissemination; and (5) failure to report how race and ethnicity variables were conceptualized and operationalized. Centering data equity across the data life cycle by operationalizing actionable, sustainable, and theoretically grounded disaggregation practices is essential for validly documenting the growing diversity of the US population, a crucial step in monitoring progress toward achieving and sustaining health equity.14

Despite decades of local and national advocacy demanding greater granularity in data collection (e.g., among Asian American, Native Hawaiian, American Indian, Alaska Native, Pacific Islander, Indo-Caribbean, and Middle Eastern/North African groups) and the resulting efforts to improve data equity, advances in the conceptualization, sample representativeness, measurement, interpretation, and reporting of race and ethnicity data have been limited and scattered.15-17 For example, analyses of the 2020 Census demonstrate persistent data quality concerns, including undercounts of some groups (people identifying as Black, Latinx, or American Indian), overcounts of others (Asian or non-Latinx White), and, in some cases, an inability to determine count accuracy (Native Hawaiian or “Other” Pacific Islander).18, 19

More concerted efforts, including new policies and dedicated funding to improve data quality across jurisdictions (e.g., states, cities) and systems (e.g., hospitals, insurers), are currently underway. These include the Office of Management and Budget's efforts to review and revise the standards governing federal data on race and ethnicity20 and the 2021 Presidential Executive Order on “Advancing Racial Equity and Support for Underserved Communities through the Federal Government.”21 Both present key opportunities to reflect on the benefits and challenges of data (dis)aggregation.22, 23

In this commentary, we extend critical conceptualizations of race, racialization, and racism to guide a series of recommendations for improving data equity through grounded data (dis)aggregation practices. Specifically, drawing on insights from critical theories, we explore the benefits of data disaggregation and the risks of mis-implementing disaggregation practices,24, 25 including cautionary examples of disaggregated data being weaponized to target historically marginalized groups, and we outline recommendations for policy makers, practitioners, and researchers. In doing so, we aim to define a critically grounded roadmap for the pragmatic implementation and sustainability of data equity practices that explicitly attend to historical harms and iteratively guard against potential risks.

1.1 A note on language

Throughout this commentary, we refer to the “data life cycle,” which includes the following stages: funding, motivation, study design, data collection and sourcing, data analysis, interpretation, and communication and distribution of data.13, 46 We use “(dis)aggregation” to highlight that data on race and ethnicity are often re-aggregated during analysis and dissemination, even when more granular data are available. The term also encompasses circumstances in which critically implementing data equity practices requires that data remain aggregated to protect participants' privacy. Thus, the critical considerations and pragmatic recommendations outlined below address both the disaggregation and the re-aggregation of demographic race and ethnicity variables. In Supplemental Table 1, we provide a glossary of terminology to inform interdisciplinary discussions of data equity and data (dis)aggregation for health research, policy, and practice.

1.2 Explicit considerations for critical theory

Engaging critical theories of race, racialization, and racism is a first step in understanding why existing racialized categories were created and maintained, why data equity matters, and whose interests are served when race and ethnicity variables are used uncritically (e.g., preserving racialized hierarchies and power dynamics). That is, using theoretical frameworks to ground the consideration, selection, and implementation of categorical race across the data life cycle helps make implicit assumptions explicit. This, in turn, allows researchers and practitioners to evaluate biases, supports transparent dissemination that clearly explains data decisions to audiences, and encourages dialogue about underlying assumptions. Supplemental Table 2 outlines key readings across several critical theories of race, racism, and racialization, along with example applications in health research, policy, and practice. Notably, these literatures are a collection of ideas rather than the product of a uniform worldview; they evolve through cross-disciplinary insights, intersectional inquiry, and action.

The importance of reimagining how race and ethnicity variables are gathered and presented in health data becomes clearer once we acknowledge that public health and medicine are disciplines with histories of investing in eugenics movements, both explicitly and implicitly, through the production of ostensibly “objective” or “neutral” research.26 Critically grounded inquiry, therefore, serves as a key tool for systematically interrogating, and guarding against, the ways study design and analytic methodologies may continue to reify principles of White supremacy, colonialism, and anti-Indigeneity.27, 38

A growing body of work has engaged critical race theory (CRT) as a particularly well-suited approach for developing an antiracist public health praxis.28, 29 CRT is designed to explicitly name the centrality of racism, illuminate contemporary racial phenomena, expand the discourse about complex racial concepts, and challenge racialized hierarchies. Thus, CRT emphasizes the need to critique notions of the “objectivity” and “neutrality” of data throughout the data life cycle. For example, CRT would prompt researchers to recognize when poor-quality data infrastructure is a form of, and intrinsically linked to, structural racism.30 Consider how structural racism undergirds a US health care system that systematically excludes racially minoritized populations through barriers to health insurance coverage and health care access, among others.31 Because these populations are underrepresented among health care users, studies conducted with nonrepresentative samples of health care utilizers likely underestimate the health harms of structural racism.

Lessons in the use of race variables across the data life cycle can also be drawn from efforts in the field of education to critically approach data (dis)aggregation in quantitative methods. These approaches use the central tenet of CRT (understanding, changing, and challenging relationships between race, racism, and power32-34) to interrogate and dispel key misrepresentations of minoritized groups (e.g., deficit framings that cast certain groups as “lesser” or “needier”). This scholarship posits that antiracist research must prioritize understanding how policies and practices have been enacted over time and situate them within power structures. Also central to this work is disrupting evidence hierarchies that uniformly privilege quantitative approaches, and instead valuing qualitative data, such as personal narratives and counter-storytelling, both for their independent contributions and for their capacity to contextualize quantitative analyses. This approach elevates the voices, perspectives, and experiences of those who have been historically marginalized or invisibilized in mainstream literature. One specific approach in the field of education is QuantCrit, which calls for engaging CRT and Critical Race Quantitative Intersectionality (CRQI) whenever quantitative data are used.35, 36 For example, a 2005 study drawing on these critical approaches intentionally disaggregated Latinx populations, illuminating how race and racism are woven into the structures, policies, and practices of higher education and drive disproportionate pushout (multicausal early exit from educational trajectories) among Salvadoran and Chicanx students, patterns that were masked when Latinx students were aggregated.37

1.3 Consideration for histories of data practices

While data (dis)aggregation has rapidly gained traction as a means of monitoring efforts toward a more just and equitable society, it is important to critically reflect on the ways “counting” has served to subvert this aim. Historical and contemporary cases document the (mis)use of both how we count (including when and where) and whom we count, with implications for reifying oppressive social and political orders.

For example, racializing policies of the settler-colonial state deployed American Indian and Native Hawaiian blood quantum thresholds to invisibilize Indigenous people, thereby limiting state recognition and undermining Indigenous land claims.38, 39 At the same time, policies with the opposite effect advanced the “one-drop rule,” which expanded who was categorized as Black, with the intent of sustaining chattel slavery and, later, racial segregation.40, 41 In both cases, these investments in an oppressive political order have served as structural determinants of health, linked to adverse health outcomes among American Indians, Native Hawaiians, and Black Americans (including descendants of Africans enslaved in the United States, Black immigrants to the United States, and their US-born children).42

Other historical and contemporary examples demonstrate the conflation of “surveillance” in the demographic sense with “surveillance” in the policing sense. During World War II, US Census Bureau officials provided technical support and small-area data tabulations from the 1940 Census to facilitate the US military-led identification and incarceration of Japanese Americans in concentration camps. Then, in 2002 and 2003, Census Bureau officials released small-area tabulations on Arab Americans by ZIP code to the Department of Homeland Security.43 These actions set a precedent for weaponizing US Census data against politically marginalized groups, and their consequences persist, including concerns over 2020 Census undercounts potentially resulting from a proposed “citizenship” question regarding immigration status. Other contemporary practices demonstrate the durability of these harms, re-normed to assuage present-day sensibilities. Consider, for example, the Census Bureau's policy of counting incarcerated people, who, due to structural racism, are predominantly Black and Latinx, in the often rural, White communities where they are imprisoned. The implications of this practice range from misallocation of federal resources to artificially inflated political power for residents of these communities, while the incarcerated people who are “counted” remain excluded from political participation.44, 45 Without critical consideration of these histories of harm, the (mis)implementation of data collection and (dis)aggregation may therefore cause our efforts to default to the (il)logics of enduring instruments of oppression.

1.4 Pragmatic steps toward data equity practices

Building on the data equity framework of the WE ALL COUNT project, which outlines seven key steps in data-driven work,46 we offer additional considerations and recommendations for policy makers, practitioners, and researchers that focus more concretely and pragmatically on translating data equity into data (dis)aggregation practices. These recommendations reflect points along the data life cycle at which critical reflection on the histories and theories guiding data equity may be especially necessary. While progress has been made in specific settings on some of these recommendations, we encourage leaders and decision-makers to take a more comprehensive approach that recognizes the interrelatedness of every step of the data life cycle, rather than addressing individual steps in isolation.

The first step is funding. For this step, we recommend dedicating specific and sufficient funding to engage community-based organizations in the co-design, recruitment, administration, and follow-up/dissemination of research for historically undercounted communities.

The second step is motivation. Here, we encourage explicitly identifying both your data equity goals (e.g., Is the goal a more representative dataset? Better reach to historically excluded or undercounted groups?) and the decisions (e.g., resource allocation, community priorities) that the final analyses, interpretations, and dissemination will inform.

The third step is design. When designing instruments (e.g., surveys), it is essential to engage community leaders and organizations as partners who provide substantive feedback on survey items and response options for demographic race and ethnicity variables. When feasible, compensate these partners to culturally adapt the survey using a transcreation approach (going beyond literal translation of words to integrate cultural relevance) that meaningfully reaches and engages people with limited English proficiency.47 Justify racialization categories according to historically informed principles (e.g., disaggregating to document whether and how participants traditionally collapsed into a single racial category are differentially affected by nativity, ancestry, immigration, or generational status) and theoretically informed principles (e.g., CRT), and show how these categories specifically advance the aims identified in the motivation step.

The fourth step is data collection and sourcing (recruitment, sampling, and conduct). For primary data, explicitly communicate data protection measures to participants (e.g., how data will be used and stored, whether data will be kept or destroyed once the analysis is complete, and who will have access). For secondary data, prioritize datasets whose stewards explicitly communicated data protection measures to participants, and hold researchers to the highest standards of data protection. In both cases, articulate these measures in all resulting products. In manuscripts and reports, clearly describe how demographic variables were measured (e.g., Are these data self-reported or assigned? Why? What latent construct does this measure operationalize? Are the implications addressed in the limitations section?). For example, where existing data include administrator-assigned race (e.g., birth certificate data), be explicit about the impact on data quality (i.e., how assigned race differs from self-reported race), the particular latent construct that assigned race proxies (e.g., racialization processes as a domain of racism), and potential mechanisms through which such constructs affect health outcomes (e.g., how systems may interact with people and patients as a function of the racial category to which they are assumed to belong).

The fifth step is analysis. Here, we recommend keeping transparent data cleaning records, with codebooks that clearly describe how demographic variables were operationalized (e.g., Were any response options collapsed? Why? What latent construct is the measure intended to reflect, such as race as a proxy for exposure to racism, with racism being the true exposure and underlying driver of health and health inequities?). Address the implications in the limitations sections of subsequent manuscripts, with recommendations for future efforts. Also, present population estimates at the most disaggregated level possible, given the purpose of the analysis, participant permissions, and the intended product. When sample sizes are small (small “n” concerns), re-aggregate to larger categories as needed for public-facing figures and reports, but keep data granular for as long as possible when informing internal documents and decision-making (e.g., within health systems and health departments). Aggregate responses into an “Other” category only when necessary (e.g., for privacy), and then use the limitations section to highlight the need for targeted sampling in future work. Provide justification and detailed explanations for both disaggregated data (e.g., “estimates for X group should be interpreted with extreme caution due to smaller sample sizes”) and re-aggregated data (see step 7 below).
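The small-“n” re-aggregation guidance in the analysis step can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not a disclosure-control standard: the function name `reaggregate_for_public`, the detailed-to-broader category mapping, the threshold of 11, and the counts in the usage example are all hypothetical. The sketch pools below-threshold subgroups into an “Other <category>” cell only when pooling actually protects privacy, and suppresses the cell otherwise. It leaves the granular input untouched for internal use and returns a log of each decision that could feed the codebook and limitations section.

```python
from collections import defaultdict

# Hypothetical minimum cell size, for illustration only; real thresholds come
# from the data steward's disclosure-review policy.
MIN_CELL = 11


def reaggregate_for_public(counts, parent_of, min_cell=MIN_CELL):
    """Build a public-facing table from granular counts and log each decision.

    counts    -- {detailed_group: n} at the most granular level collected
    parent_of -- {detailed_group: broader_category}; must cover every group
    `counts` itself is never modified, so the granular table stays available
    for internal decision-making.
    """
    public, log = {}, []
    pooled = defaultdict(list)  # broader category -> its small subgroups

    for group, n in counts.items():
        if n >= min_cell:
            public[group] = n  # large enough to report as collected
        else:
            pooled[parent_of[group]].append(group)

    for parent, groups in pooled.items():
        label = f"Other {parent}"
        total = sum(counts[g] for g in groups)
        if len(groups) > 1 and total >= min_cell:
            public[label] = total
            action = "re-aggregated"
        else:
            # A lone small subgroup (its count would simply reappear under a
            # new label) or a pooled total still under the threshold: suppress.
            public[label] = None
            action = "suppressed"
        log.append({"action": action, "into": label, "groups": sorted(groups),
                    "reason": f"each subgroup n < {min_cell}"})
    return public, log


if __name__ == "__main__":
    # Made-up counts for illustration; not real data.
    counts = {"Chinese": 120, "Filipino": 95, "Hmong": 6,
              "Bhutanese": 7, "Samoan": 40, "Tongan": 4}
    parent_of = {"Chinese": "Asian", "Filipino": "Asian", "Hmong": "Asian",
                 "Bhutanese": "Asian", "Samoan": "Pacific Islander",
                 "Tongan": "Pacific Islander"}
    public, log = reaggregate_for_public(counts, parent_of)
    print(public)
    # {'Chinese': 120, 'Filipino': 95, 'Samoan': 40, 'Other Asian': 13,
    #  'Other Pacific Islander': None}
    for entry in log:
        print(entry)
```

The sketch does not perform complementary suppression. If a broader total (e.g., all Pacific Islander respondents) is published elsewhere, a suppressed subgroup could still be recovered by subtraction, so a real disclosure review would need to check for that as well.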

The sixth step is interpretation. Recognize the implications of the twin histories of structurally mediated migration and racial and ethnic segregation for the identifiability of racially and ethnically minoritized groups in spatially referenced data (recall the earlier examples of targeting Japanese American and Arab American communities). Because these histories concentrate some groups in particular neighborhoods, combining detailed race and ethnicity categories with fine-grained geography can make small groups identifiable. In an era of “data-driven” decision-making, this means that disaggregation may render historically minoritized groups, such as immigrants of color, hypervisible in ways that may harm their communities.48
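One narrow facet of this identifiability risk, small counts in geography-by-group cells, can be checked mechanically. The Python sketch below is a minimal illustration, assuming individual records that pair a geography with a detailed race and ethnicity group; the function `flag_identifiable_cells`, the tract and county labels, the threshold of 11, and the example records are hypothetical. It cross-tabulates geography by group, flags small cells, and can recompute the table at a coarser geography to show how a group that is hypervisible at the tract level may be reportable at the county level. It does not address linkage with other datasets, which can re-identify people even when every cell is large.

```python
from collections import Counter

# Hypothetical minimum cell size, for illustration only; set it according to
# the relevant data-use agreement or disclosure-review policy.
MIN_CELL = 11


def flag_identifiable_cells(records, min_cell=MIN_CELL, coarsen=None):
    """Flag geography-by-group cells with fewer than `min_cell` people.

    records -- iterable of (geography, group) pairs, one per person
    coarsen -- optional function mapping a geography to a larger unit
    Returns a sorted list of (geography, group, n) tuples. Counter only
    stores observed combinations, so every flagged n is at least 1.
    """
    cells = Counter(
        (coarsen(geo) if coarsen else geo, group) for geo, group in records
    )
    return sorted(
        (geo, group, n) for (geo, group), n in cells.items() if n < min_cell
    )


if __name__ == "__main__":
    # Made-up records: one small group split across two tracts.
    records = (
        [("tract-01", "Group A")] * 8
        + [("tract-02", "Group A")] * 7
        + [("tract-01", "Group B")] * 30
    )
    tract_to_county = {"tract-01": "county-X", "tract-02": "county-X"}

    print(flag_identifiable_cells(records))
    # [('tract-01', 'Group A', 8), ('tract-02', 'Group A', 7)]
    print(flag_identifiable_cells(records, coarsen=tract_to_county.get))
    # []
```

Coarsening geography is one design choice among several; the alternative of coarsening the race and ethnicity categories instead trades away exactly the granularity that disaggregation aims to preserve, which is why the choice should be justified in the reporting step.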

Finally, the seventh step is communication and distribution (reporting and dissemination). Be transparent in public-facing reports and in manuscripts' methods sections by explicitly describing measurement and operationalization decisions and the motivations for those decisions (e.g., theory, data limitations, and recommendations for data improvement).49 Support minoritized and marginalized groups in sharing ownership of their data, and meaningfully engage them so that they are represented in data visualization, reporting, and dissemination on their own terms (e.g., advance data sovereignty and data justice by supporting Indigenous decision-making across the data life cycle and acknowledging Indigenous rights and interests in stewardship of their data).50

2 CONCLUSION

Ongoing critical engagement with the history and theory guiding public health data practices, together with a reimagining of the data life cycle as a whole, is key to ensuring that data equity goals and data disaggregation practices can be ethically sustained. On its own, data disaggregation is inadequate to achieve and sustain data equity, so leaders, researchers, and decision-makers must remain mindful of the risk that mis-implementing these policies and practices will exacerbate historical and contemporary harms. While not intended to be exhaustive, the reflections in this manuscript aim to lay the foundation for a roadmap for the pragmatic implementation and sustainability of data equity practices that remain engaged with critical literatures and explicitly acknowledge historical and potential harms. If health equity research aims to move toward a more just and equitable society, centering equity at each step of the data life cycle is essential.

ACKNOWLEDGMENTS

This manuscript was supported in part by the New York University (NYU) Global Center for Implementation Science Pilot Awards (Dr. Lee), National Institute on Minority Health and Health Disparities of the National Institutes of Health (Award Number U54MD000538) (Dr. Lee), National Cancer Institute of the National Institutes of Health (Award Number 3R01CA240092-03S1) (Dr. Planey), National Heart, Lung, and Blood Institute of the National Institutes of Health (Award Number T32HL150452) (Dr. Cory), the IDEAL Provostial Fellowship at Stanford University (Dr. Duarte), and the Yerby Postdoctoral Fellowship in the Department of Health Policy and Management and the FXB Health & Human Rights Fellowship in the François-Xavier Bagnoud Center for Health & Human Rights, both at the Harvard T.H. Chan School of Public Health (Dr. Sumibcay). The content is solely the responsibility of the authors and does not necessarily represent the official views of NYU, the National Institutes of Health, Stanford University, or Harvard University.
