
A peptide-level multiple imputation strategy accounting for the different natures of missing values in proteomics data

Q. Giai Gianetto (1, 2, 3, *), S. Wieczorek (3), Y. Couté (3), T. Burger (4, 3)
* Corresponding author
2 Plateforme de Protéomique / Proteomics platform
UTechS MSBio - Spectrométrie de Masse pour la Biologie – Mass Spectrometry for Biology
Abstract:
Motivation: Quantitative mass spectrometry-based proteomics data are characterized by high rates of missing values, which may be of two kinds: missing completely at random (MCAR) and missing not at random (MNAR). Despite the numerous imputation methods available in the literature, none accounts for this duality, as doing so would require diagnosing the missingness mechanism behind each missing value.
Results: We propose a multiple imputation strategy that combines MCAR-devoted and MNAR-devoted imputation algorithms. First, we propose an estimator for the proportion of MCAR values and show that it is asymptotically unbiased under assumptions adapted to label-free proteomics data. This allows us to estimate the number of MCAR values in each sample and to account for the nature of missing values through an original multiple imputation method. We evaluate this approach on simulated data and show that it outperforms traditionally used imputation algorithms.
Availability: The proposed methods are implemented in the R package imp4p (available on CRAN; Giai Gianetto (2020)), which is itself accessible through the Prostar software.
Contact: quentin.giaigianetto@pasteur.fr ; thomas.burger@cea.fr
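The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of mixing MCAR-devoted and MNAR-devoted imputations once a per-sample MCAR proportion is available. It is not the imp4p implementation. The proportions `pi_mcar` are assumed to have been estimated beforehand, and the two stand-in imputers (peptide mean for MCAR values, a draw from a down-shifted normal for MNAR values, a Perseus-style heuristic) as well as every function name here are hypothetical choices made for illustration.

```python
import random
import statistics
from typing import List, Optional

# Rows are peptides, columns are samples; None marks a missing (log-scale) intensity.
Matrix = List[List[Optional[float]]]


def impute_mnar(rng: random.Random, col_obs: List[float]) -> float:
    """Hypothetical MNAR stand-in: draw from a normal shifted toward low intensities
    (Perseus-style heuristic, not the imp4p method). Needs >= 2 observed values."""
    mu, sd = statistics.fmean(col_obs), statistics.stdev(col_obs)
    return rng.gauss(mu - 1.8 * sd, 0.3 * sd)


def impute_mcar(row: List[Optional[float]]) -> Optional[float]:
    """Hypothetical MCAR stand-in: mean of the peptide's observed intensities."""
    obs = [v for v in row if v is not None]
    return statistics.fmean(obs) if obs else None


def multiple_impute(matrix: Matrix, pi_mcar: List[float],
                    n_draws: int = 20, seed: int = 0) -> List[List[float]]:
    """Hypothetical sketch: in each draw and each sample j, randomly label a fraction
    pi_mcar[j] of the missing values as MCAR, impute them with the MCAR method and
    the others with the MNAR method; return the average over all draws."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    sums = [[0.0] * n_cols for _ in range(n_rows)]
    for _ in range(n_draws):
        for j in range(n_cols):
            col_obs = [matrix[i][j] for i in range(n_rows) if matrix[i][j] is not None]
            missing = [i for i in range(n_rows) if matrix[i][j] is None]
            n_mcar = round(pi_mcar[j] * len(missing))
            mcar = set(rng.sample(missing, n_mcar))  # random MCAR/MNAR labelling
            for i in missing:
                value = impute_mcar(matrix[i]) if i in mcar else None
                if value is None:  # MNAR label, or peptide with no observed value
                    value = impute_mnar(rng, col_obs)
                sums[i][j] += value
    # Observed intensities are kept; missing ones get the mean of their draws.
    return [[matrix[i][j] if matrix[i][j] is not None else sums[i][j] / n_draws
             for j in range(n_cols)] for i in range(n_rows)]


# Usage on a toy 4-peptide x 4-sample matrix, assuming half of each sample's
# missing values are MCAR.
data = [
    [20.1, 20.4, None, 20.3],
    [18.0, None, 17.8, 18.2],
    [25.2, 25.0, 25.1, None],
    [15.3, None, None, 15.1],
]
completed = multiple_impute(data, pi_mcar=[0.5, 0.5, 0.5, 0.5])
```

In this sketch the between-draw variability comes from the random MCAR/MNAR labelling and from the MNAR draws; a real multiple imputation scheme would typically also inject noise into the MCAR step and propagate the between-draw variance into downstream tests.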

https://hal-pasteur.archives-ouvertes.fr/pasteur-03243577
Contributor: Quentin Giai Gianetto
Submitted on: Monday, May 31, 2021 - 4:43:21 PM
Last modification on: Wednesday, August 25, 2021 - 3:26:45 AM

Citation

Q. Giai Gianetto, S. Wieczorek, Y. Couté, T. Burger. A peptide-level multiple imputation strategy accounting for the different natures of missing values in proteomics data. 2021. ⟨pasteur-03243577⟩
