City Research Online

What Really Counts: Theoretical and Practical Aspects of Counting Behaviour in Simple RNNs

El-Naggar, N. (2024). What Really Counts: Theoretical and Practical Aspects of Counting Behaviour in Simple RNNs. (Unpublished Doctoral thesis, City, University of London)

Abstract

Recurrent Neural Networks (RNNs) are commonly used for sequential tasks and have been shown to be Turing complete; in theory, they can therefore compute any computable function given the correct configuration. However, there is a long-standing debate on the systematicity of neural network learning, and there has recently been increased interest in the ability of RNNs to learn systematic tasks such as counting. In this thesis, we develop a better understanding of how RNNs learn counting behaviour.

We conduct experiments to evaluate the learning and generalisation of counting behaviour with several commonly used RNN models, under different initialisations and configurations. We formalise counting as Dyck-1 acceptance and focus on generalisation to long sequences. We find that, in general, RNN models do not learn to count exactly and eventually fail on longer sequences. We also find that weights correctly initialised for Dyck-1 acceptance are unlearned during training. Analysing the results, we identify different failure modes for different models.
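
For reference, Dyck-1 acceptance reduces to maintaining a single non-negative counter over the input sequence. The sketch below (in Python; the function name and bracket alphabet are illustrative, not taken from the thesis) shows the behaviour the RNN models are expected to reproduce:

    def accepts_dyck1(sequence: str) -> bool:
        """Accept a string over {'(', ')'} iff it is a well-formed Dyck-1 word."""
        count = 0
        for symbol in sequence:
            if symbol == '(':
                count += 1          # opening bracket: increment the counter
            elif symbol == ')':
                count -= 1          # closing bracket: decrement the counter
            else:
                return False        # symbol outside the Dyck-1 alphabet
            if count < 0:
                return False        # more closing than opening brackets so far
        return count == 0           # balanced iff the counter returns to zero

    assert accepts_dyck1("(()())")
    assert not accepts_dyck1("())(")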

For single-cell linear RNNs and ReLU RNNs, we propose two theorems establishing Counter Indicator Conditions (CICs) on the weights of the model that yield exact counting behaviour. We formally prove that the CICs are necessary and sufficient for exact counting. However, in experiments we find that the CICs are not reached during training and are even unlearned by correctly initialised models. For ReLU RNNs, we find a mismatch between the task and the loss function, such that the correct model does not coincide with the minimum of the loss. This indicates that gradient-descent-based optimisation is unlikely to reach exact counting behaviour with a standard setup.
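
To illustrate the kind of single-cell counting the theorems characterise, consider a single ReLU cell h_t = ReLU(w_h * h_(t-1) + w_x * x_t + b). The weight values in the sketch below are an assumed configuration chosen only for illustration; the precise Counter Indicator Conditions are the ones stated and proved in the thesis.

    def relu_counter_step(h_prev: float, x: float,
                          w_h: float = 1.0, w_x: float = 1.0, b: float = 0.0) -> float:
        # One step of a single-cell ReLU RNN: h_t = ReLU(w_h * h_prev + w_x * x + b).
        # With the illustrative setting w_h = 1, w_x = 1, b = 0 and inputs encoded as
        # +1 for '(' and -1 for ')', the hidden state equals the running bracket count
        # for any prefix whose count never goes negative.
        return max(0.0, w_h * h_prev + w_x * x + b)

    h = 0.0
    for symbol in "(()())":
        h = relu_counter_step(h, +1.0 if symbol == '(' else -1.0)
    print(h)  # 0.0 on a balanced sequence: the counter returns to zero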

As an alternative approach, we develop a discrete non-negative counter module to use with RNNs. To allow training with backpropagation, we equip the discrete non-negative counter with artificial gradients. The discrete non-negative counter is implemented and unit-tested. We find in experiments that RNNs equipped with this module achieve improved performance over the standard RNN models.
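
As a rough sketch of the idea, a discrete non-negative counter can be wrapped in a custom autograd function whose backward pass substitutes an artificial (straight-through) gradient for the non-differentiable rounding and clamping. The PyTorch code below is an assumed illustration only; the concrete update rule and surrogate gradient used in the thesis may differ.

    import torch

    class DiscreteNonNegativeCounter(torch.autograd.Function):
        # Forward: discrete, non-negative counter update.
        # Backward: artificial gradient that treats rounding and clamping as identity.

        @staticmethod
        def forward(ctx, count, increment):
            return torch.clamp(count + torch.round(increment), min=0.0)

        @staticmethod
        def backward(ctx, grad_output):
            # Straight-through: pass the incoming gradient to both inputs unchanged.
            return grad_output, grad_output

    count = torch.zeros(())                                    # counter starts at zero
    increments = torch.tensor([1.2, 0.9, -1.1], requires_grad=True)
    for i in range(3):
        count = DiscreteNonNegativeCounter.apply(count, increments[i])
    count.backward()
    print(count.item(), increments.grad)                       # 1.0, gradient of 1 per step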

In this thesis, we address the ability of different Recurrent Neural Networks to learn counting behaviour. In the first part, we conduct experiments and find that even when RNNs learn to count, they eventually fail to generalise to arbitrarily long sequences. In the second part, we propose and prove two theorems establishing Counter Indicator Conditions (CICs) in Recurrent Neural Networks. In our experiments, we find that the CICs are not reached by the models during training. We investigate the learning dynamics and find that there is a mismatch between the CICs and the minimum of the loss function. In the third part, we propose a discrete non-negative counter module, which we design, implement and integrate into RNNs. The integration of this module results in improved performance over our baseline models.

Publication Type: Thesis (Doctoral)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Departments: School of Science & Technology > Computer Science
School of Science & Technology > School of Science & Technology Doctoral Theses
Doctoral Theses
