Trends in Computer-Aided Diagnosis Using Deep Learning Techniques: A Review of Recent Studies on Algorithm Development

With the recent focus on deep neural network architectures for developing computer-aided diagnosis (CAD) algorithms, we provide a review of studies from the last 3 years (2015-2017) reported in selected top journals and conferences. The 29 studies that met our inclusion criteria were reviewed to identify trends in this field and to inform future development. Studies have focused mostly on cancer-related diseases within internal medicine, while diseases within gender-/age-focused fields like gynaecology and pediatrics have not received much focus. All reviewed studies employed image datasets, mostly sourced from publicly available databases (55.2%), with fewer based on data from human subjects (31%) and non-medical datasets (13.8%), while a CNN architecture was employed in most (70%) of the studies. Confirmation of the effect of data manipulation on quality of output and adoption of multi-class rather than binary classification also require more focus. Future studies should leverage collaborations with medical experts to support actual clinical testing, with reporting based on some generally applicable index to enable comparison. Our next steps, which are plans for CAD development for osteoarthritis (OA) that consider multi-class classification and comparison across deep learning approaches and unsupervised architectures, are also highlighted.


Introduction and Background
Growth in advanced computational techniques, including machine learning, has lent great support to predictive modelling, which supports pattern recognition and has applications in several fields including medicine, sales and marketing. Algorithms modelled after human neural architecture, that is, Artificial Neural Networks (ANN), later emerged, with Deep Neural Network (DNN)-based algorithms gaining popularity in recent times across several fields, including medicine, where developments in disease diagnosis are on the rise [1]. Deep learning algorithms are adaptive systems that have shown great effectiveness in feature classification for low- to high-level features.
They have found application in many popular systems like Google, Instagram, Pinterest, and Facebook. Their effectiveness lies in the multiple layers hidden between the input and output layers, which enable the modeling of complex, non-linear relationships. Their application in medical diagnosis has supported the development of several diagnostic algorithms in the last couple of years and within various medical fields [2,3]. Considering that such systems are relatively new and that several studies have already been done within the short period since their emergence, identifying trends in the field is crucial to future work. Though some studies have reviewed work within deep learning [1,4], extensive work on trends within the medical field is scarce, as are reviews that highlight important gaps or employ systematic approaches. We focus on the most recent work to identify areas requiring attention in terms of development and other key issues for future consideration, and to assist us and other researchers and/or developers in properly channeling future efforts into useful projects.

Significance of the Review
The future of every job, including medical diagnosis, will depend a lot on algorithm-based solutions. Thus, the faster the progress in various fields of medicine, the earlier we can solve the problems of easy access, on-time attention and more affordable medical services, especially among poor populations. This review focuses on areas where work on the development of CAD has been concentrated, and highlights areas where such work is lacking, so that neglected fields can benefit from similar developments in the future. The review also highlights effective methodologies to aid in the design of such algorithms with higher accuracy and precision. Future systems can then address the limitations of existing ones. In addition, when properly focused, reviews can bring together related studies conducted in various domains, across global regions and by different groups of researchers who otherwise may not have any contact, thereby helping to highlight the state of the art, as well as to address frivolous claims that may not be totally true.

Objectives of the Review
Limited availability of equipment and a dearth of medical experts, indicated by physician-patient ratios as low as 1:3500 in some countries [5], are among the key healthcare issues in many developing nations. With poverty complicating these issues, CAD underscores the potential benefits of technology-mediated medical services, and efforts at developing more CAD algorithms can help ensure that global health goals are achieved quickly. In addition to supporting early detection and accurate, efficient diagnosis, CAD algorithms can also serve as effective instructional systems. This review therefore focuses on identifying i) trends within this field, by capturing the fields of medicine that work on CAD development has focused on and those that have received less focus, the types of data employed in the CAD developments, and the deep learning architectures or methodologies engaged in these works and their significance; ii) the main findings/results reported, their significance, and suggestions regarding limitations and future work; and iii) conclusions regarding trends within DNN-based development.
These conclusions are intended to guide our fourth objective, to be captured in iv) next steps.

Related Work
Machine Learning (ML) refers to the ability of machines to take data as input and teach themselves how to make decisions based on these data through defined procedures or processes referred to as algorithms. These algorithms are often categorized as being supervised (learning based on a definite or known goal or output) or unsupervised (no output is defined). ML is based on pattern recognition and has been employed in many fields including fraud detection, translation, information retrieval, facial recognition, classification of DNA sequences, handwriting recognition, and many others. In medicine, ML has been applied for various purposes including image annotation, registration, computer-aided diagnosis (CAD), and guided therapy. In recent times, new algorithms like deep learning have begun to gain popularity in disease diagnosis by medical imaging, and developments have been reported in several studies [6][7][8].

Artificial Neural Networks
ANNs are artificial models of the decision-making power of the human brain [9]. The general scheme is composed of three main parts: the input layer, the output layer, and one or more hidden layers between them. The number of neurons in a layer is a function of system complexity. The input layer provides information on the conditions for which the network is being trained, and each neuron represents an independent variable related to the expected output. The number of neurons in the output layer is a function of the intended use of the output. Data fed to the neurons in the input layer are transferred to the hidden layer, where they undergo some complex mathematical computation, and are then transferred to neurons in the next layer, and the next, until the result is finally transferred to the output layer. Several complex mathematical computations go into determining the optimum network architecture for a system.
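The layered scheme described above can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the sigmoid activation, the layer sizes (4 inputs, 3 hidden neurons, 1 output), the random untrained weights and the helper `random_layer` are all illustrative choices, not taken from any of the reviewed studies.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of all inputs."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the input through every layer in turn and return the output activations."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


def random_layer(n_in, n_out, rng):
    """Hypothetical helper: an untrained layer with random weights and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# 4 input variables -> 3 hidden neurons -> 1 output neuron (sizes are illustrative).
network = [random_layer(4, 3, rng), random_layer(3, 1, rng)]
output = forward([0.2, 0.7, 0.1, 0.9], network)
print(output)
```

In a trained network the weights and biases would be set by the training algorithm rather than drawn at random; only the flow of data from layer to layer is shown here.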
In ANNs, learning is based on the training algorithm, a computational rule that forms the basis on which the network learns to approximate the transfer function, f, between an input and a corresponding output vector. The network 'learns' from 'examples' provided by a combination of inputs and outputs in a training database, that is, the information or features that indicate what the network learns; for example, symptoms/results of laboratory analysis (inputs) and the diagnostic decisions (outputs) in medical diagnosis. Between the input and output layers are the hidden layers responsible for the complex processing of the input data, the basis on which the ANN architecture is regarded as a black box [10]. With linear problems, one hidden layer is sufficient to address the required processing; with complex problems, more layers will be required [9], and the number of neurons in each layer must be estimated to achieve the optimum network architecture. This 'best fit' value is determined by several methods; one method uses a regression plot of the training stopping/error function (MSE) against the number of nodes in the hidden layer, the optimal value being the one with the lowest error.

CNNs are the best-known architecture within image processing. AlexNet [13,14] is the most well-known general classification CNN architecture.
CNNs are ANN models of human visual cortex [15].

Methodology
We employed a systematic approach in our study because it supports reproducibility and allows a focus on a specific area for in-depth review, rather than the general overview offered by unsystematic reviews. Systematic reviews follow a definite approach to the selection, review and evaluation of studies for answering specific research questions. Considering the vast amount of work that has been done in the development of CAD algorithms, it is impractical to conduct a review that captures every study there is. In addition, other studies have already provided general reviews. Noting that research articles are deposited in several repositories, some of which are not well-known, and the impracticality of reviewing every possible study that falls within the group in focus, we sampled articles from top medical journals/conferences related to neural networks and medicine, with purposeful selection of a few articles that meet the first and second criteria. Based on these criteria, we sampled from the top 10 databases as provided by OMICS International (2017) in April-May, 2017 (Note: OMICS' lists are updated regularly). The full list of articles reviewed is provided in Appendix A (Table A1). Over 600 articles were returned from our initial search; however, only 67 met our basic criteria on abstract screening. Further screening and full content filtering based on the inclusion criteria and objectives yielded a total of 29 papers, which were reviewed and whose findings are reported in this paper.
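For this study, we focused on identifying, among other things: (i) the field of medicine covered and the type of patient (gender, age-group, etc.) where applicable, while noting that a study can hardly be focused on a single field (e.g. a study on breast cancer with ultrasound data combines oncology, mammography, and radiology); (ii) data information, including the type and size of data employed and for which part of the work (feature extraction, training, etc.) where possible, as well as the source (simulated, real clinical data, medical/non-medical data); (iii) the methodology employed, including the procedure for CAD development, where we aim to identify what architecture(s) is/are used in the different stages of the work; (iv) key issues noted in the results of the study, including the accuracy/precision reported and the limitations of the techniques used; and (v) suggestions for future work noted in the studies, for integration with our findings to draw conclusions that can inform future developments, system upgrades, and research studies.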

Results and Discussion
In this section, we address each of the six objectives identified regarding the study. Each subsection addresses an objective while sub-sub-sections address separate concepts captured in the subsection.

Distribution of Studies, the fields of medicine focused and those that have received less focus
In this sub-section, we address the first objective; hence, we focus on the distribution of studies to capture the year of publication, the medical field or disease focused on, the type and source(s) of data employed in the CAD developments, and the methodologies engaged in the studies, with a focus on the deep learning architectures and their significance. We noted that most of the studies reported fall within internal medicine, that is, diseases of younger adults as opposed to those of older adults, whose ailments are usually complicated by sarcopenia and frailty [17], as shown in Figure 3. Only one development focused on populations of younger persons (pediatrics) and none on older adults (geriatrics) in the reviewed studies, highlighting a huge gap within two major global populations. Further details on the fields captured within internal medicine are shown in Figure 4. There is obviously no specialized field of medicine that focuses on men's diseases, whereas obstetrics and gynaecology are devoted to the diseases of women, indicating their importance to global medical practice. In our review, apart from cancer-related fields like mammography, diseases of women have not been the focus of CAD algorithm developments. In addition, apart from heart- and lung-related diseases, diseases of other internal organs, including male and female reproductive organs, the digestive system, circulatory system, and bones and joints, have not received extensive focus in terms of algorithm developments.

Types and Sources of data employed in the CAD developments
One of the most striking things noted in the review is that only image datasets (MRIs, x-rays, CT scans, HRCT images, and ultrasound) were employed in the studies, highlighting the current focus of deep learning applications within medical imaging. This necessitated the use of imaging techniques in the studies. We also noted that three types of data sources were employed in the projects, as shown in Figure 5. Datasets from human subjects [18] were small, while public medical datasets [19][20][21][22] were relatively larger in size. Some of the studies [23][24][25] also engaged non-medical image datasets for algorithm training. This appears to be a recent approach to system training that attempts to bypass the limitation caused by non-availability or inaccessibility of medical data, especially for researcher-developers who in many cases are not health professionals. However, fine details on how this works were not provided in the studies; it was suggested that this might be a novel attempt that could yield great benefits, but it requires further validation.
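As a quick arithmetic check of the data-source shares reported in the abstract (55.2%, 31% and 13.8% of the 29 reviewed studies), the sketch below recomputes them from per-source counts. The counts of 16, 9 and 4 studies are an assumption inferred from the percentages; they are not stated in the text.

```python
TOTAL_STUDIES = 29  # number of reviewed studies, as stated in the review

# Assumed counts (not stated in the text), chosen to match the reported shares.
assumed_counts = {
    "public medical databases": 16,
    "human subjects": 9,
    "non-medical datasets": 4,
}

# Share of each data source as a percentage of all reviewed studies.
shares = {
    source: round(100 * count / TOTAL_STUDIES, 1)
    for source, count in assumed_counts.items()
}

for source, share in shares.items():
    print(f"{source}: {share}%")
```

Under these assumed counts the three shares sum to the full set of 29 studies and round to the reported figures.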

Deep learning architectures or methodologies engaged in the studies and their significance
We noted the use of CNN techniques [19] either alone or in combination with other approaches like least squares-SVM [26], ELM [27], random forests [28], AdaBoost [29], etc. This is not very surprising, since the data are mostly image datasets. The distribution of studies by deep learning technique is shown in Figure 6. In some of the studies, the same dataset was divided into training and testing sets, while in others, one dataset was used for training and another for testing. The latter is the case in studies that employed non-medical image datasets [24,[30][31][32]. In such cases, the methodologies are mostly domain-transfer CNNs.

Main findings noted in the studies reported and the significance for future works
We were interested in a general overview of the quality of results in terms of the data size, type or quality, hence, we mapped deep learning techniques employed with the dataset used and the quality of result.We also identified the quality indicator employed for reporting in each study.
Though it is difficult to reach a conclusion on the comparative effectiveness of different methods (or a combination of methods) due to the different reporting indices, we made the following general observations which might help in future studies. The quality indicators reported include [...] [33], sensitivity and specificity [34], error rate, Jaccard index [32], error score [19], Area Under Curve, precision, percent performance [35], and F1 score [36], among others. There appear to be no fixed standards or agreed-upon indices for reporting these types of studies. It may help for all work to report the quality achieved based on some fixed standard, to aid comparison across approaches. This might offer a lot of leverage for future works in deciding on methods. The quality metrics employed in the reviewed studies are described below [37,38].
• Diagnostic Accuracy describes how close a measure is to the true/standard value; it can be described using other indicators like sensitivity, specificity, AUC, etc.
• Sensitivity and specificity refer to how well a system or test classifies diseased and healthy conditions. Sensitivity is the proportion of diseased cases correctly classified as diseased (True Positives), and specificity is the proportion of healthy cases correctly classified as healthy (True Negatives). The corresponding errors are diseased cases classified as healthy (False Negatives) and healthy cases classified as diseased (False Positives).
• Area Under Curve (AUC) is the area under the ROC curve, which is a plot of sensitivity (y-axis) against 1 - specificity (x-axis). The AUC can take values up to 1.0 (best); a value of 0.5 corresponds to chance-level classification, so values at or below 0.5 are not acceptable. The closer the AUC is to 1.0, the better the combined sensitivity and specificity.
• Precision is the proportion of cases labelled positive by the classifier that are truly positive according to the data labels.
• F1 Score describes a relationship between the positive labels in the test data and those provided by the classifier. It measures the accuracy of the test as the harmonic mean of the recall (r), i.e. sensitivity, and the precision (p): F1 = 2pr / (p + r).
• Jaccard Index is a statistical measure of the similarity and diversity of sample sets, computed as the size of their intersection divided by the size of their union; it is used to identify the similarity between pairs of procedures.
• Error Score/Rate is the average of the per-class classification error; it is also referred to as the False Acceptance Rate or the False Rejection Rate.
• Performance evaluates the system or the classification task based on the overall confusion-matrix results, by testing which classes are recognized correctly.
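The metrics listed above can be computed directly from the counts of a binary confusion matrix. The following is a minimal sketch, not the procedure of any reviewed study; the function names `confusion_metrics` and `auc`, and the example counts and scores, are hypothetical. The AUC is computed here through its rank interpretation (the probability that a randomly chosen diseased case receives a higher score than a randomly chosen healthy one), which equals the area under the ROC curve.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Common diagnostic metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # diseased cases correctly flagged (recall)
    specificity = tn / (tn + fp)   # healthy cases correctly cleared
    precision = tp / (tp + fp)     # flagged cases that are truly diseased
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    jaccard = tp / (tp + fp + fn)  # overlap of predicted and true positive sets
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "jaccard": jaccard,
        "error_rate": 1.0 - accuracy,
    }


def auc(scores_diseased, scores_healthy) -> float:
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly; ties count half."""
    wins = 0.0
    for d in scores_diseased:
        for h in scores_healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))


# Hypothetical example: 100 test cases and six classifier scores.
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(f"auc: {example_auc:.4f}")
```

Reporting several of these values from the same confusion matrix is one inexpensive way for a study to remain comparable with others that favour a different index.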

Effect of Different Metrics Employed
The type of image (2D/3D) appears to influence the quality achieved; for example, we noted that 70,000 3D images achieved a higher accuracy (99.9%) than 215,000 2D images [25]. We also noted that authors reported generally higher quality metrics for hybrid approaches than for single ones. Ahn et al. [19] employed a combination of DT-CNN and Sparse Spatial Pyramid and reported an error score that ranked second among 13 techniques. Bar et al. [24] also achieved an AUC of up to 0.94 with their CNN-GIST combination. Similarly, Saraf and Tofighi [18] achieved an accuracy of up to 96.85% by combining SVM and CNN. Single-method approaches (CNN and DBN), like those of Miki et al. [39], Sharma et al. [30] and Alcantara et al. [31], reported comparatively lower metrics.

Classification Approaches
Many of the algorithms focused on binary classification, which appears to support higher accuracy and precision than multi-class classification. For example, accuracies of 89.60% vs 62.07% for the binary vs multi-class approach were reported by Alcantara et al. [31]. However, real-life medical diagnosis is not a mere identification of the presence or absence of a disease (binary classification), but a multi-stage classification that can identify levels of severity to support proper treatment. Hence, multi-class approaches are more accurate simulations of real-life medical diagnosis, suggesting the need for future studies to focus on improving the accuracy of these types of classification.
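One reason binary accuracy tends to exceed multi-class accuracy on the same predictions can be seen in a small sketch: a prediction with the wrong severity grade still counts as correct once grades are collapsed to disease present/absent. The severity grades (0 = healthy, 1 to 3 = increasing severity) and the predictions below are invented for illustration and do not come from Alcantara et al. [31] or any other reviewed study.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of cases where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_binary(grades):
    """Collapse severity grades to 0 (healthy) or 1 (any grade of disease)."""
    return [int(grade > 0) for grade in grades]


# Hypothetical severity grades for ten cases and a classifier's predictions.
true_grades = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
pred_grades = [0, 1, 1, 2, 2, 3, 3, 2, 0, 1]

multi_class_acc = accuracy(true_grades, pred_grades)
binary_acc = accuracy(to_binary(true_grades), to_binary(pred_grades))

print(f"multi-class accuracy: {multi_class_acc:.2f}")
print(f"binary accuracy:      {binary_acc:.2f}")
```

Every exact grade match is also a binary match, so on identical predictions the binary figure can never be lower; this is part of why the two kinds of result are hard to compare directly.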

Effect of Data Manipulation
Data cleaning (e.g. de-noising) is a standard practice in the pre-processing of data prior to data-mining procedures. It assumes the 'dirtiness' of raw data and its inability to provide useful or accurate information. The findings of Acharya et al. [34] appear to contradict this; they reported an average accuracy of 93.53% with noise removal and 95.22% without noise removal. Miki et al. [39], on their part, noted an increase in accuracy of 5% with data augmentation. These observations suggest the need for more studies to highlight issues within data manipulation.
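Data augmentation, mentioned above, enlarges a training set by adding transformed copies of existing images. The sketch below shows three common geometric transforms on an image stored as a nested list of pixel values. The choice of transforms and the function names are assumptions for illustration; the text does not state which transforms Miki et al. [39] applied.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in image[::-1]]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image together with its transformed copies."""
    return [image, flip_horizontal(image), flip_vertical(image), rotate_90(image)]


# A tiny 2x2 'image' stands in for a real scan.
image = [[1, 2],
         [3, 4]]
augmented = augment(image)
for variant in augmented:
    print(variant)
```

Each transformed copy keeps the label of its source image, so one labelled scan yields several training examples.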

Significance of Data Type/Source
Real patient data, image data from public databases, and non-medical or natural image data were the 3 types of data noted. The use of non-medical/natural image datasets was described by the authors as a novel approach that can address the challenge of data scarcity while at the same time yielding useful results in terms of classification accuracy [24]. However, we noted that the use of real patient datasets yielded good results despite the small sizes employed [18,29,40,41]. The implication is that even better results may be possible with larger real patient datasets than with public medical datasets or natural image datasets.

Training Mode
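We consider it worthy of note that every article reviewed employed supervised learning techniques for training the algorithms. At a time when the greater benefits of unsupervised learning are being highlighted, it is noteworthy that none of the studies employed unsupervised learning. Vaidhya's presentation [42] highlights the advantages of unsupervised learning in medical imaging, especially when compared with the need for, and cost of, the 'strong, pixel-level annotations' of images, possibly running into millions, that are required for very accurate image-based classification.

Suggestions regarding limitations of the studies and future work
Several limitations were reported in the studies, including the use of retrospective and non-clinical data in about 70% of the studies, trials with only one type of data or one disease, and testing by developers in simulated settings in most cases. The necessity of assessing the usefulness of the algorithms for applications in point-of-care solutions was suggested by Luong et al. (2016), while trials with other techniques, data and diseases are recommended in studies employing novel approaches (Miki et al., 2017b; Wang et al., 2015). The need to establish the generalizability of findings across different diseases was also noted, though [21] and [43] reported the greater effectiveness of dedicated systems over multi-purpose ones. The authors of [31] noted deployment on mobile devices as a means that might represent the ultimate usefulness of these systems for supporting self-diagnosis and timely access, especially in poor populations. Other suggestions include the use of DT-CNNs with lower layers pre-trained on generic data and deeper (semantic) layers fine-tuned for specific image types, and further tuning with real data of algorithms trained with non-medical data. The authors of [22] also suggested the use of an ensemble teacher for labeling unlabelled samples to augment the training set of a student model, to address the problem of limited annotated data. Overall, the need for larger datasets with more real patients, better features and more robust classifiers, and for datasets and results to be made available as public assets and reference points for future studies [29], cannot be overemphasized.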

Conclusions regarding the general trend within DNN-based development of CAD algorithms, and directions for future work
The review highlighted important issues that require focus in future works, including the scarcity of studies within some fields of medicine, like obstetrics, gynaecology, paediatrics, geriatrics, psychiatry, and musculoskeletal disorders. The image datasets employed in all the studies informed the focus on CNN approaches with supervised learning. Future studies should examine the efficacy of non-image data for the development of useful applications within fields like mental health, where clinical diagnosis remains an almost uncertain procedure complicated by comorbidity. The quality indicators reported are diverse, making comparison across studies difficult; we suggest that some generally applicable index should always be reported. More focus should be placed on multi-class approaches while efforts are made to improve the quality of results. More studies to confirm the effect of data manipulation on quality of output are required, in addition to the availability of large, real clinical datasets and direct collaboration between medical experts, hospitals, relevant researchers and machine learning experts to achieve better results. Finally, regarding our submission on the significance of reviews in clarifying claims that may not be completely true, we noted that Suzuki et al. [32] claimed in their report that their 'study is the first demonstration of DCNNs for detecting the masses in mammographic images'; however, we found a similar work [44], which also employed a deep CNN and was reported in a MICCAI conference paper in October, 2015.

Next Steps
In our follow-up work, we will be addressing some of the findings reported in this paper. Due to the complications of working within the paediatrics field and the certification requirements of medical data handling, we will be focusing on a CAD development project for a common geriatric ailment associated with ageing, osteoarthritis (OA). We will consider multi-class classification and a comparison of various deep learning approaches using the same data, in addition to the possibility of comparison across supervised and unsupervised learning approaches.

Figure 1: Mean Error of Training (a) and Testing (b).

Deep Neural Networks
Deep Neural Networks (DNN) are based on deep learning, which has gained popularity in general data analysis and was listed among the top technology breakthroughs of 2013 [12]. Neural networks have great applicability in the handling of noisy datasets or those with missing variables. One disadvantage, however, lies in their longer training times. Deep architectures are generally based on neural networks with multiple layers of stacked neurons that allow the backpropagation of a signal. Convolutional Neural Networks (CNN) have been exceptionally prevalent and have gained more popularity than others. The two commonest groups of deep learning architectures [1] are systems based on unsupervised training and those based on supervised training. Unsupervised systems use layer-by-layer pre-training of DNNs, with supervised fine-tuning of the network; examples are Deep Belief Networks (DBNs), Stacked Auto-Encoders (SAEs) and Restricted Boltzmann Machines (RBMs), which are essentially SAEs in nature. Supervised systems are based on supervised end-to-end training of an entire DNN. Examples are Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs), with CNNs being, in recent times, the most well-known.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 17 October 2017 doi:10.20944/preprints201710.0117.v1

ML algorithms include Support Vector Machines (SVMs), Fuzzy Logic (FL), Decision Trees (DT), k-Nearest Neighbors (k-NN), Neural Networks (NN) and, more recently, the deep learning algorithms. SVMs are supervised learning algorithms used for classification. FL, which operates within the domain of 'computer understanding of natural language', is based on 'degrees of truth' rather than the true-false or zero-one (0, 1) binary/Boolean logic of modern computing, making it a closer representation of human cognitive abilities. DTs are nonlinear classifiers; they employ a flow-chart or tree-like model of decisions and their possible outcomes, and attempt to capture important factors including unexpected consequences. In k-NN, classification is based on the closest training cases; the probability of an event is estimated from information on its occurrence in similar cases in the training data.
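The k-NN idea described above, classification by the closest training cases, can be sketched in a few lines. The two-feature training points, the labels and the function name `knn_predict` are hypothetical, and Euclidean distance with k = 3 is an assumed choice rather than one prescribed by the text.

```python
import math
from collections import Counter


def knn_predict(training_cases, query, k=3):
    """Label a query point by majority vote among its k nearest training cases."""
    nearest = sorted(
        training_cases,
        key=lambda case: math.dist(case[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (feature vector, label) pairs.
training_cases = [
    ((1.0, 1.0), "healthy"),
    ((1.2, 0.8), "healthy"),
    ((0.9, 1.1), "healthy"),
    ((3.0, 3.2), "disease"),
    ((3.1, 2.9), "disease"),
    ((2.8, 3.0), "disease"),
]

print(knn_predict(training_cases, (1.1, 1.0)))
print(knn_predict(training_cases, (3.0, 3.0)))
```

The share of votes for the winning label can also serve as a rough probability estimate for the predicted class.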

Figure 2. This distribution reflects the recent focus within this area and the popularization of deep learning techniques from 2015, with many articles published in 2016. There is, however, an indication that several more studies may become available before the end of 2017.

Figure 3: Distribution of Studies by Disease or Medical Field Focused

Figure 4: Distribution of Studies within Internal Medicine

Figure 6: Distribution of Studies by Deep Learning Architecture Employed

Figure 5. Distribution of Study by Source of Dataset.
CNNs are among the commonest deep learning architectures, in the same group as RNNs and DBNs, and are state-of-the-art within the field of computer vision. CNNs can learn both local and global structures in images, hence their usefulness

1) provided a comprehensive review of studies that employed deep learning in medical image analysis, identifying studies per application area within image classification, object detection, segmentation, registration, and other related tasks. For our study, we considered a tighter selection of articles that reflects the focus of our study, namely studies that: i) are most recent, ii) employed DNN, and iii) focused on CAD development. We applied the search strings 'diagnosis medical algorithm', 'deep neural network diagnosis medical algorithm', 'diagnosis algorithm', 'diagnosis algorithm medical', 'diagnosis medical algorithm deep neural network' and 'deep neural network algorithm diagnosis medical' for identifying relevant articles in selected databases. Final samples for our study were selected based on three inclusion/exclusion criteria: being published within 2015-2017 (based on the popularization of deep learning in 2015); reporting on deep learning approaches for CAD algorithm development; and reporting information on procedure, training and methodology, with findings clearly laid out.

the type of patient (gender, age-group, etc.) where applicable, while noting that a study can hardly be focused on a single field (e.g. a study on breast cancer with ultrasound data combines oncology, mammography, and radiology). (ii) Data information, including the type and size of data employed and for which part of the work (feature extraction, training, etc.) where possible, as well as the source (simulated, real clinical data, medical/non-medical data). (iii) Methodology employed, including the procedure for CAD development; we aim to identify what architecture(s) is/are used in the different stages of the work. (iv) Key issues noted in the results of the study, including accuracy/precision reported, and limitations of the techniques used. (v) Suggestions for future work

a combination of methods) due to different reporting indices, we made the following general observations which might help in future studies.

highlights the advantages of unsupervised learning in medical imaging, especially when compared with the need for, and cost of, 'strong, pixel-level annotations' for several images, possibly running into millions, required for very accurate image-based classifications.

Suggestions regarding limitations of the studies and future work
Several limitations were reported in the studies, including the use of retrospective and non-clinical data in about 70% of the studies, trials with only one type of data or one disease, and testing by developers in simulated settings in most cases. The necessity of assessing the usefulness of the algorithms for applications in point-of-care solutions was suggested by Luong et al.

Table A1. Information on Deep Learning Architecture and Dataset, Summary of Result and Quality