
Accurate prediction of cell composition, age, smoking consumption and infection serostatus based on blood DNA methylation profiles

Abstract : DNA methylation is a stable epigenetic alteration that plays a key role in cellular differentiation and gene regulation, and that has been proposed to mediate environmental effects on disease risk. Epigenome-wide association studies have identified and replicated associations between methylation sites and several disease conditions, which could serve as biomarkers in predictive medicine and forensics. Nevertheless, heterogeneity in cellular proportions between the compared groups could complicate interpretation. Reference-based cell-type deconvolution methods have proven useful in correcting epigenomic studies for cellular heterogeneity, but they rely on reference libraries of sorted cells and only predict a limited number of cell populations. Here we leverage >850,000 methylation sites included in the MethylationEPIC array and use elastic net-regularized, stability-selected regression models to predict the circulating levels of 70 blood cell subsets, measured by standardized flow cytometry in 962 healthy donors of Western European descent. We show that our predictions, based on one hundred methylation sites or fewer, are less error-prone than existing methods, and extend the number of cell types that can be accurately predicted. Application of the same methods to age, smoking consumption and several serological responses to pathogen antigens also provides accurate estimates. Together, our study substantially improves predictions of blood cell composition based on methylation profiles, which will be critical in the emerging field of medical epigenomics.
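
The abstract names elastic net regularization combined with stability selection but does not spell out the procedure. The following is a minimal sketch of that general idea, not the authors' implementation: an elastic net fitted by coordinate descent is refitted on random half-samples, and a predictor is kept only if it receives a nonzero coefficient in a large share of the refits. The penalty parameterization (the `alpha` and `l1_ratio` convention also used by scikit-learn), the parameter values, the 0.8 selection threshold, and the synthetic data standing in for methylation values are all assumptions made for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.2, l1_ratio=0.9, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    X is a list of rows; no intercept is fitted (assumes centered data)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def stability_selection(X, y, n_subsamples=30, threshold=0.8, seed=0, **enet_args):
    """Refit on random half-samples; keep features selected in >= threshold of fits."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        w = elastic_net([X[i] for i in idx], [y[i] for i in idx], **enet_args)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    freq = [c / n_subsamples for c in counts]
    selected = [j for j in range(p) if freq[j] >= threshold]
    return selected, freq


# Synthetic stand-in data (hypothetical): 80 samples, 10 predictors,
# with only predictors 0 and 3 truly related to the outcome.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(80)]
y = [2.0 * row[0] - 1.5 * row[3] + data_rng.gauss(0.0, 0.1) for row in X]

selected, freq = stability_selection(X, y)
print("selected predictors:", selected)
print("selection frequencies:", [round(f, 2) for f in freq])
```

On this toy data the two informative predictors should be the ones retained, while the noise predictors are rarely or never selected. With hundreds of thousands of methylation sites, a vectorized library implementation would be the practical choice; the pure-Python loop here is only meant to make the steps explicit.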
Document type : Preprints, Working Papers, ...
Contributor : Marie-Christine Vougny
Submitted on : Tuesday, June 1, 2021 - 11:16:56 AM
Last modification on : Monday, June 14, 2021 - 4:24:03 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License




Jacob Bergstedt, Alejandra Urrutia, Darragh Duffy, Matthew Albert, Lluís Quintana-Murci, et al. Accurate prediction of cell composition, age, smoking consumption and infection serostatus based on blood DNA methylation profiles. 2021. ⟨pasteur-03244420⟩


