Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts
Naeimeh Atabaki-Pasdar
(1)
,
Mattias Ohlsson
(1, 2)
,
Ana Viñuela
(3, 4)
,
Francesca Frau
(5)
,
Hugo Pomares-Millan
(1)
,
Mark Haid
(6)
,
Angus G. Jones
(7)
,
E. Louise Thomas
(8)
,
Robert W. Koivula
(9, 1)
,
Azra Kurbasic
(1)
,
Pascal M. Mutie
(1)
,
Hugo Fitipaldi
(1)
,
Juan Fernandez
(1)
,
Adem Y. Dawed
(10, 11)
,
Giuseppe N. Giordano
(1)
,
Ian M. Forgie
(10, 11)
,
Timothy J. McDonald
(7)
,
Femke Rutters
(12)
,
Henna Cederberg
(13)
,
Elizaveta Chabanova
(14)
,
Matilda Dale
(15)
,
Federico Masi
(16)
,
Cecilia Engel Thomas
(15)
,
Kristine H. Allin
(17, 18)
,
Tue H. Hansen
(17)
,
Alison Heggie
(19)
,
Mun-Gwan Hong
(20)
,
Petra J. M. Elders
(12)
,
Gwen Kennedy
(10)
,
Tarja Kokkola
(21)
,
Helle Krogh Pedersen
(17)
,
Anubha Mahajan
(22)
,
Donna McEvoy
(23)
,
Francois Pattou
(24)
,
Violeta Raverdy
(24)
,
Ragna S. Häussler
(20)
,
Sapna Sharma
(25)
,
Henrik S. Thomsen
(14)
,
Jagadish Vangipurapu
(21)
,
Henrik Vestergaard
(17)
,
Leen M. 't Hart
(26, 12)
,
Jerzy Adamski
(27, 28, 6)
,
Petra B. Musholt
(5)
,
Soren Brage
(29)
,
Søren Brunak
(16, 30)
,
Emmanouil Dermitzakis
(31, 4)
,
Gary Frost
(32)
,
Torben Hansen
(17, 33)
,
Markku Laakso
(21, 34)
,
Oluf Pedersen
(17, 30)
,
Martin Ridderstråle
,
Hartmut Ruetten
(5)
,
Andrew T. Hattersley
(35)
,
Mark Walker
(19)
,
Joline W. J. Beulens
(12, 36)
,
Andrea Mari
,
Jochen M. Schwenk
(20)
,
Ramneek Gupta
(16)
,
Mark I. McCarthy
(9, 37, 38, 22)
,
Ewan R. Pearson
(10)
,
Jimmy D. Bell
(8)
,
Imre Pavo
,
Paul W. Franks
(1, 39)
1 Lund University
2 Halmstad University
3 SIB - Swiss Institute of Bioinformatics [Geneva]
4 Department of Genetic Medicine and Development [Geneva]
5 Sanofi-Aventis Deutschland GmbH [Frankfurt, Germany]
6 HMGU - Helmholtz Zentrum München = German Research Center for Environmental Health
7 University of Exeter
8 UOW - University of Westminster [London]
9 OCDEM - Oxford Centre for Diabetes, Endocrinology and Metabolism
10 Ninewells Hospital and Medical School [Dundee]
11 University of Dundee
12 Amsterdam UMC - Amsterdam University Medical Centers
13 HUS - Helsinki University Hospital [Finland]
14 Herlev and Gentofte Hospital
15 CBH - School of Engineering Sciences in Chemistry, Biotechnology and Health [Stockholm]
16 DTU - Danmarks Tekniske Universitet = Technical University of Denmark
17 CBMR - Novo Nordisk Foundation Center for Basic Metabolic Research
18 BUH - Bispebjerg University Hospital
19 Newcastle University [Newcastle]
20 Science for Life Laboratory [Solna]
21 University of Eastern Finland
22 The Wellcome Trust Centre for Human Genetics [Oxford]
23 Institute of Cellular Medicine [Newcastle]
24 RTD - Recherche translationnelle sur le diabète - U 1190
25 DZD - German Center for Diabetes Research - Deutsches Zentrum für Diabetesforschung [Neuherberg]
26 LUMC - Leiden University Medical Center
27 TUM - Technische Universität Munchen - Technical University Munich - Université Technique de Munich
28 NUS - National University of Singapore
29 CAM - University of Cambridge [UK]
30 UCPH - University of Copenhagen = Københavns Universitet
31 iGE3 - Institute of Genetics and Genomics in Geneva
32 Imperial College London
33 SDU - University of Southern Denmark
34 University of Kuopio
35 University of Exeter Medical School
36 Julius Center for Health Sciences and Primary Care
37 University of Oxford
38 Genentech, Inc. [San Francisco]
39 Harvard School of Public Health
Abstract
Background
Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning.
Methods and findings
We used baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%), available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries or to people exposed to environmental risk factors that differ from those of the present cohort. Another key limitation is that the prediction was made on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one.
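To make the outcome definition and the ROCAUC metric concrete, the following minimal Python sketch binarizes liver fat at the 5% cut-off stated above and computes ROCAUC as the probability that a randomly chosen case (≥5%) receives a higher predicted score than a randomly chosen non-case. It is an illustration only and not the study's code: the function names and all toy values are hypothetical, and the LASSO selection and random forest fitting steps are not shown.

```python
from typing import List, Sequence

FAT_THRESHOLD_PCT = 5.0  # cut-off from the abstract: <5% vs >=5% liver fat


def binarize_liver_fat(fat_pct: Sequence[float]) -> List[int]:
    """Return 1 where liver fat is >= 5%, otherwise 0."""
    return [1 if value >= FAT_THRESHOLD_PCT else 0 for value in fat_pct]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of cases and non-cases.

    Each (case, non-case) pair counts 1 if the case has the higher score,
    0.5 if the scores tie, and 0 otherwise; the AUC is the mean over pairs.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (not from IMI DIRECT): MRI liver fat in percent and
# predicted probabilities from some already-fitted model.
toy_fat_pct = [2.1, 7.4, 4.9, 12.0, 5.0, 3.3]
toy_scores = [0.20, 0.80, 0.55, 0.90, 0.40, 0.30]

toy_labels = binarize_liver_fat(toy_fat_pct)
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_labels, round(toy_auc, 3))
```

In this toy example, 8 of the 9 case/non-case pairs are ranked correctly, so the AUC is 8/9. The pairwise form is quadratic in sample size and is chosen here for readability; a rank-based computation gives the same value more efficiently.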
Conclusions
In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community.
Domains
Life Sciences [q-bio]
Origin: Publisher files allowed on an open archive