A systematic analysis of regression models for protein engineering

Michael, Richard and Kæstel-Hansen, Jacob and Mørch Groth, Peter and Bartels, Simon and Salomon, Jesper and Tian, Pengfei and Hatzakis, Nikos S. and Boomsma, Wouter and Fariselli, Piero (2024) A systematic analysis of regression models for protein engineering. PLOS Computational Biology, 20 (5). e1012061. ISSN 1553-7358

journal.pcbi.1012061.pdf - Published Version

Abstract

Optimizing proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: how much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric and on how one defines generalization. We highlight the fundamental issue of sample bias in typical regression scenarios and show how it can lead to misleading conclusions about regressor performance. Finally, we make the case for the importance of calibrated uncertainty in this domain.

Item Type: Article
Subjects: Eprints AP open Archive > Multidisciplinary
Date Deposited: 06 May 2024 10:06
Last Modified: 06 May 2024 10:06
URI: http://asian.go4sending.com/id/eprint/2135
