High Resolution Capabilities of Free-space Optical Neural Networks
Ibadulla, R. (2024). High Resolution Capabilities of Free-space Optical Neural Networks. (Unpublished Doctoral thesis, City, University of London)
Abstract
Deep Learning (DL) models are powerful tools for computer vision tasks, such as image classification and segmentation. To meet the computational demands of modern deep learning, many DL models rely on AI accelerators. In addition to these hardware-based accelerators, optical accelerators, such as 4f free-space systems, take advantage of Fourier optics to efficiently perform convolutions, bypassing Moore’s law limitations. While the 4f system offers high-resolution capabilities, it faces limitations in modulation speed and data readout.
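As a rough numerical illustration of why Fourier optics suits convolution, the sketch below applies the convolution theorem: multiplying two spectra element-wise and transforming back gives a (circular) convolution. It is a NumPy analogy only, not the thesis's simulator; the function names `fft_convolve2d` and `direct_circular_convolve2d` are hypothetical, and the circular boundary handling is an assumption of this sketch.

```python
import numpy as np


def fft_convolve2d(image, kernel):
    """Circular 2D convolution computed via the convolution theorem."""
    # Zero-pad the kernel to the image size so both spectra have the same shape.
    padded = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Element-wise product in the Fourier domain corresponds to convolution in space.
    spectrum = np.fft.fft2(image) * np.fft.fft2(padded)
    return np.real(np.fft.ifft2(spectrum))


def direct_circular_convolve2d(image, kernel):
    """Reference circular convolution built from shifted copies of the image."""
    out = np.zeros(image.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # np.roll by (i, j) yields image[m - i, n - j] at position (m, n).
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out


rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))
fourier_result = fft_convolve2d(image, kernel)
direct_result = direct_circular_convolve2d(image, kernel)
```

In this analogy the two Fourier transforms stand in for the lenses of a 4f arrangement and the element-wise product stands in for the modulation in the Fourier plane; `fourier_result` and `direct_result` agree to floating-point precision.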
This thesis addresses these limitations by developing methods to adapt traditional neural network architectures for high-resolution tasks within 4f free-space optical AI accelerators. We introduce FatNet, an algorithm specifically designed to convert conventional neural network models into a format optimised for the 4f system by accounting for the system’s advantages and constraints. FatNet reduces the number of channels while increasing the resolution of feature maps, aligning with the high-resolution capabilities of the 4f system. Since a bottleneck in 4f optical accelerators lies in the readout process, FatNet enhances model efficiency by decreasing the number of channels. This conversion assumes that the number of trainable parameters and pixels in the feature maps remains equal or as close as possible to those in the original layers.
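The pixel-count constraint described above can be sketched with simple arithmetic. The helper below is a hypothetical illustration, not the FatNet algorithm itself: it assumes square feature maps and only balances the pixel count, whereas the thesis also keeps the number of trainable parameters as close as possible to the original layer. The name `fat_shape` and its parameters are invented for this example.

```python
import math


def fat_shape(channels, height, width, new_channels):
    """Return a (channels, side, side) shape with fewer channels and
    approximately the same total number of feature-map pixels.

    Hypothetical helper: assumes the converted feature maps are square.
    """
    total_pixels = channels * height * width
    # Choose the side length whose square best matches the per-channel pixel budget.
    side = round(math.sqrt(total_pixels / new_channels))
    return new_channels, side, side


# A 64-channel 8x8 layer holds 4096 pixels; with 4 channels each map becomes 32x32.
converted = fat_shape(64, 8, 8, 4)
converted_pixels = converted[0] * converted[1] * converted[2]
```

Here the converted shape `(4, 32, 32)` holds exactly the same 4096 pixels as the original 64 × 8 × 8 layer, while requiring far fewer channels to be read out.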
FatNet was applied to convert architectures such as ResNet, AlexNet, and VGGNet into Res-FatNet, Alex-FatNet, and VGG-FatNet, respectively. These models were trained and evaluated on the CIFAR-100 dataset using a custom-built simulator of the 4f system. Our results demonstrate significant acceleration with minimal loss in accuracy. Furthermore, the FatNet approach was scaled to the U-Net architecture, resulting in Fat-U-Net, which was tested on image segmentation tasks using the Oxford-IIIT Pet and HeLa cells datasets, showcasing its effectiveness in image segmentation within the free-space optical accelerator. The efficacy of FatNet was further examined in the Fat-U-Net study through experiments involving Intuitive-Fat-U-Nets, which prioritised layer weight equality over pixel count in feature maps to avoid overfitting, demonstrating that the FatNet conversion is the optimal approach. Additionally, the impact of skip connections in U-Net and Fat-U-Net was investigated to evaluate Fat-U-Net’s ability to preserve localisation accuracy.
Moreover, this thesis explored the potential for implementing Vision Transformers (ViTs) within the 4f optical system. Methods are proposed for realising ViTs using only convolutional operations to enable full functionality on the 4f system, with a particular focus on investigating potential parallelism techniques suitable for optical settings. Additionally, the study included visualising attention maps to determine whether the proposed models learn through feature extraction, similar to CNNs, or genuinely learn attention mechanisms as intended by ViTs.
This research also addressed challenges in optical computing, such as the lack of support for negative values, by introducing algorithmic solutions to mitigate the issue.
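The abstract does not state which algorithmic solution the thesis uses. One common workaround, shown here purely as an assumption, is to split a signed kernel into two non-negative kernels, run each through a non-negative-only convolution, and subtract the results electronically. All function names below are hypothetical.

```python
import numpy as np


def split_signed_kernel(kernel):
    """Split a signed kernel into non-negative parts with kernel = pos - neg."""
    pos = np.maximum(kernel, 0.0)
    neg = np.maximum(-kernel, 0.0)
    return pos, neg


def circular_convolve2d(image, kernel):
    """Circular 2D convolution built from shifted copies of the image."""
    out = np.zeros(image.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out


def signed_convolve(image, kernel):
    """Emulate a signed convolution using two passes with non-negative kernels."""
    pos, neg = split_signed_kernel(kernel)
    # Convolution is linear, so conv(x, pos - neg) = conv(x, pos) - conv(x, neg).
    return circular_convolve2d(image, pos) - circular_convolve2d(image, neg)


rng = np.random.default_rng(1)
image = rng.random((6, 6))             # non-negative input, like light intensity
kernel = rng.standard_normal((3, 3))   # signed weights
pos_part, neg_part = split_signed_kernel(kernel)
two_pass = signed_convolve(image, kernel)
reference = circular_convolve2d(image, kernel)
```

The cost of this particular workaround is two passes per signed kernel, which is why it is presented only as one possible option rather than as the method adopted in the thesis.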
Overall, this work contributes to the advancement of optical neural networks, providing a pathway toward faster and more efficient deep learning models tailored for the emerging era of optical computing.
Publication Type: | Thesis (Doctoral) |
---|---|
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Departments: | School of Science & Technology > Computer Science; School of Science & Technology > School of Science & Technology Doctoral Theses; Doctoral Theses |