City Research Online

Abstract pattern learning with neural networks for improved sequential modeling

Kopparti, R. M. (2020). Abstract pattern learning with neural networks for improved sequential modeling. (Unpublished Doctoral thesis, City, University of London)

Abstract

Deep neural networks have been widely used for various applications and have produced state-of-the-art results in domains like speech recognition, machine translation, and image recognition. Despite the impressive successes achieved with deep neural networks, there has been an increasing awareness that there are tasks that still elude neural network learning, specifically the learning of abstract grammatical patterns and generalisation of the abstract patterns beyond the training data.

This thesis addresses the problem of learning abstract patterns based on equality (also called identity) with neural networks. The study finds that feed-forward neural networks do not learn equality, which leads to the inability of both feed-forward and recurrent neural networks to learn abstract patterns. This problem is studied empirically and constructive solutions are developed.

A solution called ‘Relation Based Patterns’ (RBP) is proposed, which models abstract relationships based on equality by using fixed weights and a special type of neuron. An extension of RBP called ‘Embedded Relation Based Patterns’ (ERBP) is also proposed, which models RBP as a Bayesian prior on network weights, implemented as a regularisation term in otherwise standard neural network learning. Both RBP and particularly ERBP are very easy to integrate into standard neural network models. Experiments show that integrating (E)RBP structures leads to almost perfect generalisation in abstract pattern learning tasks with synthetic data, and also to improvements in neural language and music modeling. (E)RBP has been successfully applied to various neural network models, including feed-forward neural networks (FFNNs), RNNs and their gated variants such as GRUs and LSTMs, Transformers and Graph Neural Networks. It leads to improvements on real-world tasks like melody prediction, character and word prediction, abstract compositionality and graph edit distance.
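
As a rough illustration of the two ideas named above (comparing inputs through fixed weights, and expressing a weight pattern as a prior via a regularisation term), the following minimal sketch may help. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names `equality_features` and `prior_penalty`, the absolute-difference unit, the squared-distance penalty, and the `strength` parameter are hypothetical and should not be read as the thesis's definitions.

```python
# Hypothetical sketch: names and formulas are illustrative assumptions,
# not taken from the thesis.

def equality_features(a, b):
    """Compare two equal-length vectors with fixed weights +1 and -1.

    Each unit computes |(+1) * a_i + (-1) * b_i|, so the output is all
    zeros exactly when the two vectors are equal.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [abs(x - y) for x, y in zip(a, b)]


def prior_penalty(weights, target, strength=0.1):
    """Squared-distance penalty pulling learnable weights towards a target.

    Added to a training loss, a term of this shape corresponds to a
    Gaussian prior centred on `target` (an assumed form of regulariser).
    """
    return strength * sum((w - t) ** 2 for w, t in zip(weights, target))


# One-hot tokens: A and A are equal, A and B differ.
A = [1.0, 0.0, 0.0]
B = [0.0, 1.0, 0.0]
same = equality_features(A, A)
diff = equality_features(A, B)
```

In this sketch the comparison output is independent of which token is presented, which is one way to read why an equality signal could generalise to tokens unseen in training; the penalty version leaves the weights trainable and only biases them towards the comparison pattern.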

Publication Type: Thesis (Doctoral)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: Doctoral Theses
Doctoral Theses > School of Mathematics, Computer Science and Engineering Doctoral Theses
School of Mathematics, Computer Science & Engineering > Computer Science
Date Deposited: 14 Jan 2021 15:21
URI: https://openaccess.city.ac.uk/id/eprint/25524
Text - Accepted Version