A peptide-level multiple imputation strategy accounting for the different natures of missing values in proteomics data - Institut Pasteur
Preprint, Working Paper. Year: 2021

Abstract

Motivation: Quantitative mass spectrometry-based proteomics data are characterized by high rates of missing values, which may be of two kinds: missing completely-at-random (MCAR) and missing not-at-random (MNAR). Despite the numerous imputation methods available in the literature, none accounts for this duality, because doing so would require diagnosing the missingness mechanism behind each missing value.

Results: A multiple imputation strategy is proposed that combines MCAR-devoted and MNAR-devoted imputation algorithms. First, we propose an estimator for the proportion of MCAR values and show that it is asymptotically unbiased under assumptions adapted to label-free proteomics data. This allows us to estimate the number of MCAR values in each sample and to take the nature of missing values into account through an original multiple imputation method. We evaluate this approach on simulated data and show that it outperforms traditionally used imputation algorithms.

Availability: The proposed methods are implemented in the R package imp4p (available on CRAN; Giai Gianetto, 2020), which is itself accessible through the Prostar software.

Contact: quentin.giaigianetto@pasteur.fr; thomas.burger@cea.fr
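
As an illustration of the general idea summarized in the abstract, the following minimal Python sketch combines an MCAR-style and an MNAR-style imputation within a multiple imputation loop. It is not the imp4p algorithm. The function names (`impute_once`, `multiple_impute`), the choice of a Gaussian draw for MCAR values and a uniform draw below the smallest observed value for MNAR values, and the assumption that the MCAR proportion `pi_mcar` is already known are all hypothetical simplifications made for this example.

```python
import random
import statistics


def impute_once(values, pi_mcar, rng):
    """Return one completed copy of `values` (None marks a missing value).

    Assumption for this sketch: each missing value is MCAR with
    probability `pi_mcar`, and MNAR (left-censored) otherwise.
    """
    observed = [v for v in values if v is not None]
    mean_obs = statistics.mean(observed)
    sd_obs = statistics.stdev(observed) if len(observed) > 1 else 0.0
    low = min(observed)

    completed = []
    for v in values:
        if v is not None:
            completed.append(v)  # observed values are kept unchanged
        elif rng.random() < pi_mcar:
            # MCAR-style draw: centered on the observed intensities
            completed.append(rng.gauss(mean_obs, sd_obs))
        else:
            # MNAR-style draw: at or below the smallest observed intensity
            completed.append(rng.uniform(low - sd_obs, low))
    return completed


def multiple_impute(values, pi_mcar, n_draws=20, seed=0):
    """Average `n_draws` independent imputations, position by position."""
    rng = random.Random(seed)
    draws = [impute_once(values, pi_mcar, rng) for _ in range(n_draws)]
    return [statistics.mean(column) for column in zip(*draws)]


# Hypothetical log-intensities of one peptide across six samples
intensities = [21.3, None, 20.8, 22.1, None, 21.7]
completed = multiple_impute(intensities, pi_mcar=0.3)
```

Averaging over several draws, rather than committing to a single MCAR or MNAR label for each missing value, is what lets the uncertainty about the missingness mechanism propagate into the imputed value. In the paper, the MCAR proportion is estimated from the data rather than supplied by hand.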
Main file
2020.05.29.122770v1.full.pdf (1.07 MB)

Dates and versions

pasteur-03243577, version 1 (31-05-2021)

Cite

Q. Giai Gianetto, S. Wieczorek, Y. Couté, T. Burger. A peptide-level multiple imputation strategy accounting for the different natures of missing values in proteomics data. 2021. ⟨pasteur-03243577⟩
201 views
170 downloads
