Visión Artificial y Grafismos: septiembre 2020

sábado, 12 de septiembre de 2020

Assignment 03 – Course COMPUTER VISION II: APPLICATIONS (PYTHON)

Performing OCR on Invoice Documents

Situation

Mark works at an office where his job is to go through the invoices received and record the sender and the amount billed to a spreadsheet. It takes him only 1 minute to see the invoice and update the sheet. Still he finds it tedious and wants to automate this task. The main motivation for him is that he can then play on his system without any distractions. Will you help Mark in his pursuit of happiness?

Your Task

In this assignment, you will implement a OCR system, which looks at an image of the invoice and finds the following:

1. The Billing Amount ( 15 Marks )

2. Sender and Receiver Email IDs ( 15 Marks )

That's it!

It requires knowledge of OCR to get the data from the image and then some basic python skills to get the required values from the recognized text.

The assignment will be manually graded ( so that you cannot simply look at the image and simply do a print(billing_amount).

More information is given in the respective section.

Import Libraries.

import pytesseract

import keras_ocr

import matplotlib.pyplot as plt

import cv2

Read and display the Test Image

doc_img = cv2.imread('../resource/asnlib/publicdata/invoice.jpg', cv2.IMREAD_COLOR)

fig = plt.figure(figsize=(20, 10))

plt.imshow(doc_img[:,:,::-1])

plt.show()

Perform OCR

You need to write the code for performing OCR on the input image. You will have to apply the concepts learned in the previous sections to perform OCR on the above image.

After you perform the OCR - you have to parse the output so that you only print the email IDs and billing amount in $

# The next lists will store the outputs:

billing_amount = []

email_ids = []

###
### YOUR CODE HERE
###

# Pre-processing image for OCR, convert image to gray scale

doc_img_gray=cv2.cvtColor(doc_img,cv2.COLOR_BGR2GRAY)

# OCR with pytesseract

all_text = pytesseract.image_to_string(doc_img_gray)

# Parse to get the required values from the recognized text

# The email adresses have the character '@' and the amount has the character '$'

# Split the OCR text in lines

all_lines = all_text.splitlines() # all_lines is a list with all lines, each one with several or none words

# We have to split each line in words

all_words=[] # Empty list to sawe all the words

for line in all_lines:

words = line.split()

all_words += words

# Search for the words with special characters

for word in all_words:

if(word.find('@')!=-1):

email_ids.append(word)

elif(word.find('$')!=-1):

billing_amount.append(word)

# Print output

viernes, 11 de septiembre de 2020

Assignment 02 – Course COMPUTER VISION II: APPLICATIONS (PYTHON)

Assignment 02 – Course COMPUTER VISION II: APPLICATIONS (PYTHON)

Instructions:

Improving CNN Training

This assignment is aimed at reinforcing your knowledge about the training process for a CNN. Given below is the task:

Situation

We have given the code for training a CNN using the SGD optimizer. On running the training for 5 epochs

we found that the Training accuracy we get is only 10%.

Your Task

Your task is to improve the overall training process, so that we get higher Training accuracy.

You can change any parameter you may find suitable ( we have provided some hints in Section 4. ).

Only Section 4 is the place where you need to make changes.

The distribution of marks is as follows:

1. Training Accuracy > 30% - 5 marks

1. Training Accuracy > 50% - 10 marks

1. Training Accuracy > 65% - 15 marks

P.S. We were able to achieve 68% Training accuracy by changing a few things from the current configuration.

NOTE: You can also download the NB and run your experiments on Google Colab or Kaggle

and when you think you have got the solution, you can add the code in section 4 and submit.

The assignment carries 30 marks and you will have a total of 5 attempts. The grading will be done manually.

# 4. TODO

“””

Currently, the model and optimizer is configured such that it gives very low accuracy ~10%.

Your task is to explore options by modifying model and optimizer to get to more than 65% Training accuracy.

Here are a few hints of what changes can help increase the accuracy in just 5 epochs:

Changing the Model parameters like activation type, droupout ratio etc.

Changing the optimizer

Changing optimizer parameters

“””

What did I do?

I wrote a list of possible parameter values in each change.

Activation = [relu, sigmoid, softmax, softplus, softsign, tanh, selu, elu, exponential]

Dropout_rate = [0.2, 0.5]

Posible_keras_optimizers = [SGD, RMSprop, Adam, Adadelta, Adagrad, AdaMax, Nadam, Ftrl]

Only import Optimizer = [ SGD, Adam, Adagrad, RMSprop]

Learning_rate = [0.001, 0.0001]

Then, I did only one change each time and run the Jupiter notebook. Biggest improves in accuracy in the Optimizer.

This is my output with best accuracy.

Asignment 01 – Course COMPUTER VISION II: APPLICATIONS (PYTHON)

Asignment 01 – Course COMPUTER VISION II: APPLICATIONS (PYTHON)

Instructions

In this assignment, you will be working on Smile Detection using Dlib's facial landmarks model. You can download the video from this link and check out the video on your local system.

Task

You will have to complete the smile_detector function. In the function, you will receive an image and you have to write the logic for smile detection and return a boolean variable indicating whether the frame has a smiling person or not.

We have provided the supporting code for keeping track of the smiling frames and we will use it to compare with the ground truth. If the percentage of overlap is high enough, you will obtain full marks.

Finally, we also write the output video in a avi file which you can download and see from the jupyter dashboard whether the smile is getting detected correctly.

You can see the output video to check whether the code detects smile or not and help to debug your code.

You will have a total of 5 attempts. This assignment carries 30 marks.

Some hints:

Find the lip and/or jaw coordinates using the facial landmarks.
For a person to be smiling, the ratio of width of the lip and the jaw width should be high.
Return True if a smile is detected, else return False.

Also, When you complete the assignment, you can use your own video to generate the output and share on your social media channels for fun.

https://youtu.be/Pngwsy-Swz0

I have to write a code snippet inserted in the TODO section(s)

Landmarks numbers

The program steps:

1.- Import all the necessary libraries, in this case cv2, dlib and numpy.

2.- Initialize dlib’s face detector

3.- Create the facial landmark predictor

4.- Define the function smile_detector(image_dlib):

This function get an input image in dlib format. (color space RGB), detects faces in it.

If there is not any face, the function return false (There isn’t face à there isn’t smile)

If there is a face the function continues.

Assign False to the variable isSmiling

###

### Here is the place for our code

### YOUR CODE HERE

###

# My code explanation:

# The program find the square distance "lips_square_distance" between the

# external corners of the lips, landmarks 48 and 54.

# The program find the square distance "jaws_square_distance" between two jaw

# points in the line of the corners of the lips, landmarks 3 and 13.

# The ratio between this two values "lips_square_distance/jaws_square_distance"

# varies when someone smile.

# The distance between the points in the jaws varies very little but the distance

# between the corners of the mouth increase a lot when someone smiles.,

# another fixed distance would have been the distance between

# the eyes external corners landmarks 36 and 45.

# I looked to the numbers to find a person smiling and the ratio is greater than 0.21

# The open smile is for a ratio value greater than 0.23

If ratio greater than 0.21: isSmiling=True

### end of my code

# Return True if smile is detected

return isSmiling

# The main function is supplied

# Initialize the video capture and writer cv2 objects.

# Capture a video, in this case load a video file

# Initialize frame_number to zero and the list smile_frames to empty

# Loop the video file

# grab the next frame using capture.read()

# convert frame from BGR to RGB

# call the function smile_detector(image_dlib) and save return in frame_has_smile

# if( frame_has_smile == True):

# put text on frame and smile_frames.append(frame_number)

# after 50 frames, print the number of processed frames and the “order number” of # frames with smiles detected.

# save the processed frame

# increase frame_number in 1

# clean the buffers

# destroyAllWindows

# release capture and smileDetectionOut

jueves, 10 de septiembre de 2020

Project 3 part 2 Mask / No mask, train a Yolov4 cnn with Darknet

Project 3 part 2"Mask / No mask", trained with Darknet and Yolov4.cfg

Task 2: Train the object detector using Yolo v4 [30 Marks]

You should repeat the above experiment using the Yolo v4 architecture and model.

HINT:

1. You need to change Step 6 along with the above changes. You can use the following link as a reference for training the model using Yolo v4 architecture and pre-trained model.

https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

I wrote a notebook with the basic pipeline for train a yolov4 cnn with Darknet in Google colab, this are the steps:

1.- I created a folder in Google drive to upload the test files.

2.- Mount a google drive and check its files.

3.- Clone, configure and compile darknet.

4.- Configure yolov3.cfg file.

5.- Create names and data files and save to Google drive.

6.- Create a folder where unzip the image dataset.

7.- Create a train.txt file.

8.- Download pre-trained weights for the convolutional layers file.

9.- Start training.

10.- Object detection with the test images. ...OOPS!

I got the weights after 12 hours training with the best "activation: mist" in .cfg file

Unsuccesfully I tried to install a version of OpenCV newer than 4.1.2 in Google Colab, I didn't get the trick.

I know that OpenCV 4.3.0 supports "activation: swish" that is almost like "activation: mish", I also tried that here after I modified "yolov4_testingb.cfg".

OpenCV 4.4.0 supports "activation: mish" in function 'ReadDarknetFromCfgStream', Then I run the tests yolov4 with OpenCV 4.4.0 in my own system, here are the outputs.

I found that yolov4 takes longer to train, but is faster and more precise than yolov3.

10.- Object detection with the test images.

Look at the right, the man wearing a black mask, the mask is detected with Yolov4

Look at the girl in black, "No Mask" is detected with yolov4.

11.- Object detection with the test videos.

Youtube link: https://youtu.be/CxfY2xjaDQ0

Youtube link: https://youtu.be/42iEsn5D5JI

I wished to run all the yolov4 using "activation: mist" in one notebook at colab, I couldn’t do that because the OpenCV version in colab was 4.1.2, so I left this project aside for several days, when I went back I understood from the hint in the specifications that darknet is a polymorph program, it can do train and test on images, videos and camera stream, it draws the standard rectangles over the images and don’t need OpenCV to do that. I trained yolov3 and yolov4 again; here are some screenshots from the notebook. I changed the file image.c from darknet source files, so the rectangle on people with facemask is green and magenta(danger) with No mask.

predictions-image1.jpg

predictions-image2.jpg

predictions-image3.jpg

predictions-image4.jpg

The videos with the Darknet standard rectangle:

The prediction for test-video1:

The prediction for test-video2

miércoles, 9 de septiembre de 2020

Project 3 part 1 Mask / No mask, train a Yolov3 cnn with Darknet

Project 3 part 1 "Mask / No mask", trained with Darknet and Yolov3.cfg

Video output from test-video1 with standard darknet rectangles:

https://www.youtube.com/watch?v=7oWrAIpfl_Y&feature=youtu.be

Video output from test-video1.mp4:

Youtube link: https://www.youtube.com/watch?v=xGH5R71luBc&feature=youtu.be

Video output from test-video2 with standard darknet rectangles:

https://www.youtube.com/watch?v=m5CghY882TE&feature=youtu.be

Video output from test-video1.mp4:

Youtube link: https://www.youtube.com/watch?v=YaLWdxp-xOw&feature=youtu.be

Task 1: Train the object detector using Yolo v3 [70 Marks]

We had shared a notebook for training a custom object detector in the last section. You can use that as a reference and make relevant changes to the files and code to train the network.

HINT: You will have to make a few changes to the different files before training as given in Step 7 in the Notebook. You might also need to change some code in Step 4.

You also need to run the model on the videos given above.

HINT: You can use the following command to run the model on the video

!./darknet detector demo yolo_mask.data yolo_mask.cfg backup/yolo_mask_best.weights test-video1.mp4 -thresh .6 -out_filename out-vid1.avi -dont_show