City Research Online - Representation decomposition for knowledge extraction and sharing using restricted Boltzmann machines

Tran, Son (2016). Representation decomposition for knowledge extraction and sharing using restricted Boltzmann machines. (Unpublished Doctoral thesis, City University London)

Abstract

Restricted Boltzmann machines (RBMs), with many variations and extensions, are an efficient neural network model that has recently been applied with great success as a building block for deep networks in diverse areas ranging from language generation to video analysis and speech recognition. Despite their success and the creation of increasingly complex network models and learning algorithms based on RBMs, the question of how knowledge is represented, and could be shared by such networks, has received comparatively little attention. Neural networks are notorious for being difficult to interpret. The area of knowledge extraction addresses this problem by translating network models into symbolic knowledge. Knowledge extraction has normally been applied to feed-forward neural networks trained in a supervised fashion using the back-propagation learning algorithm. More recently, research has shown that the use of unsupervised models may improve the performance of network models at learning structures from complex data. In this thesis, we study and evaluate the decomposition of the knowledge encoded by training stacks of RBMs into symbolic knowledge that can offer: (i) a compact representation for recognition tasks; (ii) an intermediate language between hierarchical symbolic knowledge and complex deep networks; (iii) an adaptive transfer learning method for knowledge reuse. These capabilities are the fundamentals of a Learning, Extraction and Sharing (LES) system, which we have developed. In this system, learning can automate the process of encoding knowledge from data into an RBM, extraction then translates the knowledge into symbolic form, and sharing allows parts of the knowledge base to be reused to improve learning in other domains. To this end, in this thesis we introduce confidence rules, which are used to allow the combination of symbolic knowledge and quantitative reasoning.
Inspired by Penalty Logic, which was introduced for Hopfield networks, confidence rules establish a relationship between logical rules and RBMs. However, instead of representing propositional well-formed formulas, confidence rules are designed to account for the reasoning of a stack of RBMs, to support modular learning and hierarchical inference. This approach shares common objectives with the work on neural-symbolic cognitive agents. We show, both in theory and through empirical evaluations, that a hierarchical logic program in the form of a set of confidence rules can be constructed by decomposing representations in an RBM or a deep belief network (DBN). This decomposition is at the core of a new knowledge extraction algorithm which is computationally efficient. The extraction algorithm seeks to benefit from the symbolic knowledge representation that it produces in order to improve network initialisation in the case of transfer learning. To this end, confidence rules offer a language for encoding symbolic knowledge into a deep network, resulting, as shown empirically in this thesis, in an improvement in modular learning and reasoning. As far as we know, this is the first attempt to extract, encode, and transfer symbolic knowledge among DBNs. In a confidence rule, a real value, called the confidence value, is associated with a logical implication rule. We show that the logical rules with the highest confidence values can perform similarly to the original networks. We also show that by transferring and encoding representations learned from a domain onto another related or analogous domain, one may improve the performance of representations learned in this other domain. To this end, we introduce a novel algorithm for transfer learning called “Adaptive Profile Transferred Likelihood”, which adapts transferred representations to target domain data. This algorithm is shown to be more effective than the simple combination of transferred representations with the representations learned in the target domain. It is also less sensitive to noise and therefore more robust against the problem of negative transfer.
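
To make the idea of a confidence rule more concrete, the following is a minimal sketch and not the extraction algorithm of the thesis, which the abstract does not specify. It assumes, purely for illustration, that the weights connecting one hidden unit of an RBM to the visible units are decomposed into a sign pattern (read as positive or negated literals) and a single real-valued confidence (here the mean absolute weight). The names `ConfidenceRule`, `extract_rule`, `top_rules`, and the toy weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ConfidenceRule:
    confidence: float   # real value attached to the implication rule
    head: str           # hidden unit the rule concludes
    body: List[str]     # literals over visible units; "~" marks negation

    def __str__(self) -> str:
        return f"{self.confidence:.2f} : {self.head} <- {' ^ '.join(self.body)}"


def extract_rule(head: str, weights: Sequence[float],
                 names: Sequence[str], eps: float = 1e-6) -> ConfidenceRule:
    """Illustrative decomposition (an assumption, not the thesis's method):
    the sign of each weight gives a literal, and the mean absolute weight
    of the non-zero entries gives the confidence value."""
    literals: List[str] = []
    magnitudes: List[float] = []
    for name, w in zip(names, weights):
        if abs(w) <= eps:
            continue  # near-zero weights contribute no literal
        literals.append(name if w > 0 else "~" + name)
        magnitudes.append(abs(w))
    confidence = sum(magnitudes) / len(magnitudes) if magnitudes else 0.0
    return ConfidenceRule(confidence, head, literals)


def top_rules(rules: Sequence[ConfidenceRule], k: int) -> List[ConfidenceRule]:
    """Keep the k rules with the highest confidence values."""
    return sorted(rules, key=lambda r: r.confidence, reverse=True)[:k]


# Toy weights for two hidden units over three visible units (made up).
names = ["x1", "x2", "x3"]
weight_columns: Dict[str, List[float]] = {
    "h1": [2.0, -4.0, 0.0],
    "h2": [0.5, 0.5, -0.5],
}

rules = [extract_rule(h, w, names) for h, w in weight_columns.items()]
best = top_rules(rules, 1)
print(best[0])  # prints "3.00 : h1 <- x1 ^ ~x2"
```

Under these assumptions, ranking rules by confidence and keeping only the strongest ones mirrors, in spirit, the abstract's claim that the rules with the highest confidence values can perform similarly to the original network.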

Publication Type:	Thesis (Doctoral)
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments:	Doctoral Theses; School of Science & Technology > School of Science & Technology Doctoral Theses; School of Science & Technology > Computer Science

Creators:	Tran, Son
Status:	Unpublished
URI:	https://openaccess.city.ac.uk/id/eprint/14423
Date available in CRO:	18 Apr 2016 15:49
Date deposited:	26 July 2017
Dates:	2016 (Completed)