"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem."
-John Tukey
Using a CNN for Pneumonia Identification from Chest X-Rays
Pneumonia is an inflammatory condition of the lung affecting primarily the small air sacs known as alveoli. Pneumonia is usually caused by infection with viruses or bacteria and less commonly by other microorganisms, certain medications or conditions such as autoimmune diseases. Chest X-ray, blood tests, and culture of the sputum may help confirm the diagnosis. This disease may be classified by where it was acquired, such as community- or hospital-acquired or healthcare-associated pneumonia.
![Pnumo.png](https://static.wixstatic.com/media/905dab_7594d25bc08f45238a496105a5935ddf~mv2.png/v1/fill/w_312,h_314,al_c,lg_1,q_85,enc_avif,quality_auto/Pnumo.png)
VS
![Figure 2021-01-26 234919.png](https://static.wixstatic.com/media/905dab_90b81396bc86441f98a593d4a831b9c2~mv2.png/v1/fill/w_312,h_318,al_c,lg_1,q_85,enc_avif,quality_auto/Figure%202021-01-26%20234919.png)
CNN Architecture
For this project there are several considerations for building a successful CNN. The first is that the pixel values of the grayscale images we are analyzing range from 0 to 255. To avoid this wide variability in the data, batch normalization is used so that the convolutions operate on well-scaled inputs. Remember that backpropagation relies on derivatives: if these derivatives are close to 0, the weights barely change and training will not be as effective as we want.
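As a minimal sketch in NumPy (the post does not name the framework used, so this is not the project's code), this is what a batch-normalization layer computes in training mode: each feature is shifted to zero mean and scaled to unit variance using the statistics of the current batch, then rescaled by the learnable parameters `gamma` and `beta`. The 4 x 3 pixel batch is made up, and real layers also keep running averages for inference, which this sketch leaves out.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0).

    x has shape (batch, features). Each feature is normalized with the
    mean and variance of the current batch, then scaled by gamma and
    shifted by beta (learnable parameters in a real layer).
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * x_hat + beta

# Hypothetical batch: 4 "images", each flattened to 3 grayscale pixels in [0, 255]
pixels = np.array([[0, 128, 255],
                   [64, 32, 200],
                   [255, 0, 10],
                   [128, 255, 90]], dtype=np.float64)

normed = batch_norm(pixels)
# Each column of `normed` now has mean close to 0 and standard deviation close to 1,
# instead of values spread over 0..255.
```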
![0_iqNdZWyNeCr5tCkc_.gif](https://static.wixstatic.com/media/905dab_94efbd5def3f4bde834249dc2101304e~mv2.gif/v1/fill/w_344,h_391,al_c,usm_0.66_1.00_0.01,pstr/0_iqNdZWyNeCr5tCkc__gif.gif)
Convolution 2D
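The animation above shows a kernel sliding over an image. A minimal NumPy sketch of the same "valid" operation (stride 1, no padding) follows; the 4 x 4 image and the 2 x 2 kernel are hypothetical. Like most deep-learning libraries, it computes a cross-correlation, meaning the kernel is not flipped.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution as used in CNN layers (stride 1, no padding).

    Computes a cross-correlation: the kernel is slid over the image
    without flipping, and each output value is the sum of the
    element-wise product between the kernel and the window under it.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4x4 image
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # hypothetical 2x2 filter
print(conv2d_valid(image, kernel).tolist())
# [[-5.0, -5.0, -5.0], [-5.0, -5.0, -5.0], [-5.0, -5.0, -5.0]]
# (each pixel minus its lower-right neighbour)
```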
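In addition, we use max pooling to reduce the size of the feature maps, a dropout layer near the end to reduce overfitting (during training it randomly deactivates a fraction of the neurons so the network cannot rely too heavily on any of them), and finally a flatten step that turns the output into a vector just before the final classification.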
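The three steps named above can be sketched in NumPy as follows, assuming a single-channel feature map, 2 x 2 non-overlapping pooling and the common "inverted" form of dropout, which rescales the surviving units by 1/(1 - rate) during training. The feature-map values and the 0.5 rate are hypothetical, and the order of flatten and dropout here is only for illustration, not necessarily the layer order used in this project.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling (pool size = stride = `size`).
    Rows/columns that do not fill a whole window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    # Group pixels into (h, size, w, size) blocks and take each block's max
    return x.reshape(h, size, w, size).max(axis=(1, 3))

def dropout(x, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and scale survivors by 1/(1 - rate) so the expected activation
    is unchanged; at inference time the input passes through untouched."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [3, 2, 1, 6]], dtype=float)  # hypothetical feature map

pooled = max_pool2d(fmap)   # [[4, 5], [3, 7]]: 4x4 map reduced to 2x2
flat = pooled.flatten()     # [4, 5, 3, 7]: vector ready for a dense layer
dropped = dropout(flat, rate=0.5, rng=np.random.default_rng(0))
# Each entry of `dropped` is either 0 or twice the corresponding entry of `flat`
```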
Training Performance
![Valeant.png](https://static.wixstatic.com/media/905dab_8ba898c87c6c4c7982f568e34793b694~mv2.png/v1/fill/w_336,h_1122,al_c,q_90,usm_0.66_1.00_0.01,enc_avif,quality_auto/Valeant.png)
For this project, "ReLU" was used as the activation function.
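ReLU and its derivative are simple enough to write down directly. In this minimal NumPy sketch the derivative is exactly 1 for every positive input, so, unlike saturating activations, it does not shrink the gradient that backpropagation passes through active units; using 0 as the derivative at exactly 0 is a convention.

```python
import numpy as np

def relu(x):
    # max(0, x), applied element-wise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    # (the value at exactly 0 is a convention; 0 is used here)
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z).tolist())       # [0.0, 0.0, 0.0, 0.5, 3.0]
print(relu_grad(z).tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0]
```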
Results: Confusion Matrix and Final Thoughts
Finally, training went well, reaching up to 96% accuracy. Unfortunately, because the validation set is so small (16 images), validation accuracy could not get past 56%. Nonetheless, the loss graph shows better behavior as training progresses, and this carries over to the metrics obtained when the model is evaluated on the test set.
![learning.png](https://static.wixstatic.com/media/905dab_61070486526744769d5237581c5483d8~mv2.png/v1/fill/w_888,h_168,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/learning.png)
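Some quick arithmetic shows why a 16-image validation set is so limiting: accuracy can only move in steps of 1/16, and 56% is consistent with 9 of the 16 images classified correctly (an inference from the numbers, not something reported separately). The binomial standard error below is a rough, standard approximation, not a figure from this project.

```python
import math

n_val = 16  # size of the validation set reported above

# With 16 images, validation accuracy can only change in steps of 1/16
step = 1 / n_val
print(step)        # 0.0625

# 9 correct out of 16 gives 56.25%, consistent with the ~56% ceiling
p = 9 / n_val
print(p)           # 0.5625

# Rough binomial standard error of an accuracy estimate p measured on n images
se = math.sqrt(p * (1 - p) / n_val)
print(round(se, 3))  # 0.124
```

A standard error of roughly 12 percentage points means the validation curve can swing widely from epoch to epoch, so on its own it says little about how the model will behave on the larger test set.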
The metrics obtained for this CNN were the following:
Recall: 0.70
Precision: 0.95
Accuracy Test: 0.84
Accuracy Train: 0.96
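For reference, this is how the three metrics are derived from confusion-matrix counts. The counts are hypothetical, chosen only for illustration; they are not this model's confusion matrix. The second call swaps which class is treated as positive, which changes precision and recall but leaves accuracy unchanged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy for one chosen positive class.

    tp: positives predicted positive, fp: negatives predicted positive,
    fn: positives predicted negative, tn: negatives predicted negative.
    """
    precision = tp / (tp + fp)  # of the images flagged positive, share truly positive
    recall = tp / (tp + fn)     # of the truly positive images, share that were flagged
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical counts, for illustration only
p, r, a = classification_metrics(tp=40, fp=2, fn=10, tn=48)
print(round(p, 3), round(r, 3), round(a, 3))  # 0.952 0.8 0.88

# Same hypothetical matrix with the other class treated as positive
p2, r2, a2 = classification_metrics(tp=48, fp=10, fn=2, tn=40)
print(round(p2, 3), round(r2, 3), round(a2, 3))  # 0.828 0.96 0.88
```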
These are not the best possible metrics for a CNN, but remember that recall and precision behave more like a trade-off. In this case, the model classifies more images as pneumonia than there actually are in the dataset. Given the context, this is actually not bad: the likely consequence is some extra work for the physician, who has to confirm whether it really is pneumonia. So for this specific case a recall this low is not a big issue, and in exchange the model identifies 96% of the pneumonia cases. This interpretation assumes the reported recall and precision are computed with the normal class as positive: over-calling pneumonia lowers the recall of the normal class, whereas with pneumonia as the positive class a recall of 0.70 would mean that 30% of pneumonia cases are missed.
![CM.png](https://static.wixstatic.com/media/905dab_f3ff666b95134192a10300e177e80a52~mv2.png/v1/fill/w_431,h_446,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/CM.png)
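One common source of the precision/recall trade-off discussed above is the decision threshold applied to the network's output score. The post does not say whether a threshold other than the default was used, so the scores and labels below are made up purely to show the effect: lowering the threshold flags more images as positive, which raises recall and, in this example, lowers precision.

```python
import numpy as np

def precision_recall_at(threshold, scores, labels):
    """Precision and recall when images with score >= threshold are
    predicted positive (labels: 1 = positive, 0 = negative)."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

# Hypothetical model scores (probability of the positive class) and true labels
scores = np.array([0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10])
labels = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0])

for t in (0.85, 0.5, 0.25):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```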