DATE: Derivative Alignment Training for Extrapolation with Neural Networks
Lopedoto, E., Weyde, T. ORCID: 0000-0001-8028-9905 & Salako, K. ORCID: 0000-0003-0394-7833 (2024). DATE: Derivative Alignment Training for Extrapolation with Neural Networks. Paper presented at the SGAI BCS AI2024, 17-19 Dec 2024, Cambridge.
Abstract
In this work we introduce DATE (Derivative Alignment Training for Extrapolation), a method to improve the extrapolation behaviour of neural networks (NNs) with Rectified Linear Unit (ReLU) activation on univariate regression tasks. ReLU NNs naturally lend themselves to linear extrapolation beyond the training data range. However, trained ReLU NNs have two known limitations in their extrapolation properties, which we address in this paper. First, even when the prediction error is minimised, the derivative of the NN model function can still vary strongly, which can cause variable extrapolation. Second, non-linearities of the model function outside the training data range can lead to inconsistent extrapolation behaviour. In prior work, these issues have been addressed with a set of regularisation functions called ReLEx. To address these issues, we introduce two new regularisation terms: the D1-loss and the IR-loss. The D1-loss directly penalises the deviation of the model derivative from a target derivative estimated from the data as the interpolation between neighbouring data points. The IR-loss penalises positions of the non-linearities of the ReLU units outside a given range. Optimising the combination of the D1-loss with the IR-loss and/or some of the ReLEx functions constitutes the DATE method. We evaluate DATE on regression tasks with noiseless data generated from analytic functions. We test different DATE configurations and find that training with DATE can reduce the variability of the model slope, prevent non-linearities outside the training data range, and improve extrapolation consistency as measured by different metrics.
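The abstract describes the two regularisers only in words. Below is a minimal PyTorch sketch of how they could be realised for a one-hidden-layer univariate ReLU network. All names (`ReLUNet`, `d1_loss`, `ir_loss`), the squared penalties, and the choice of comparing the model derivative at the left endpoint of each data interval are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ReLUNet(nn.Module):
    """One-hidden-layer univariate ReLU network: f(x) = V relu(Wx + b) + c."""
    def __init__(self, hidden=32):
        super().__init__()
        self.hidden = nn.Linear(1, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

def d1_loss(model, x, y):
    """Penalise deviation of the model derivative f'(x) from target slopes
    obtained by linear interpolation between neighbouring data points
    (finite differences). Assumes x is sorted and shaped (N, 1)."""
    xg = x.clone().requires_grad_(True)
    f = model(xg)
    # df/dx via autograd; create_graph so the penalty is differentiable
    # with respect to the model parameters.
    df = torch.autograd.grad(f.sum(), xg, create_graph=True)[0]
    # Target slope on each interval between neighbouring points.
    slope = (y[1:] - y[:-1]) / (x[1:] - x[:-1])
    # Compare the derivative at the left point of each interval to that
    # interval's slope (one of several possible alignment choices).
    return ((df[:-1] - slope) ** 2).mean()

def ir_loss(model, x_min, x_max):
    """Penalise ReLU kink positions outside [x_min, x_max]. For a unit
    relu(w*x + b), the non-linearity (kink) sits at x = -b / w."""
    w = model.hidden.weight.squeeze(1)  # shape (hidden,)
    b = model.hidden.bias               # shape (hidden,)
    kinks = -b / (w + 1e-8)             # small epsilon avoids division by zero
    below = torch.relu(x_min - kinks)   # distance below the allowed range
    above = torch.relu(kinks - x_max)   # distance above the allowed range
    return (below ** 2 + above ** 2).mean()

# Combined objective (lambda weights are illustrative hyperparameters):
# loss = mse(model(x), y) \
#        + lam_d1 * d1_loss(model, x, y) \
#        + lam_ir * ir_loss(model, x.min(), x.max())
```

With the interval set to the training data range, the IR penalty pushes all kinks inside the observed data, so the model is exactly linear outside it, which is the consistent extrapolation behaviour the abstract aims for.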
| Publication Type: | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information: | This version of the contribution has been accepted for publication after peer review, but it is not the Version of Record and does not reflect post-acceptance improvements or corrections. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms |
| Publisher Keywords: | Extrapolation, Regression, Neural Networks, Derivative |
| Subjects: | Q Science > QA Mathematics; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Departments: | School of Science & Technology; School of Science & Technology > Computer Science |
This document is not freely accessible due to copyright restrictions.