CV - Libraries / Tools
SSD - Single Shot MultiBox Detector
- Single Shot: this means that the tasks of object localization and classification are done in a single forward pass of the network
- MultiBox: this is the name of a technique for bounding box regression developed by Szegedy et al.
- Detector: The network is an object detector that also classifies those detected objects
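A minimal sketch of running an SSD-style detector in one forward pass, using OpenCV's dnn module (covered below) with a MobileNet-SSD Caffe model; the model files, image path, and preprocessing constants are assumptions here, not the canonical SSD release.
import cv2
import numpy as np

# Assumed, separately downloaded MobileNet-SSD Caffe files (hypothetical paths).
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")

img = cv2.imread("street.jpg")  # hypothetical input image
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 0.007843, (300, 300), 127.5)
net.setInput(blob)
detections = net.forward()  # the single forward pass: boxes and classes come out together

# detections has shape (1, 1, N, 7): [_, class_id, confidence, x1, y1, x2, y2] per detection,
# with box coordinates normalized to [0, 1].
for _, class_id, conf, x1, y1, x2, y2 in detections[0, 0]:
    if conf > 0.5:
        box = (np.array([x1, y1, x2, y2]) * [w, h, w, h]).astype(int)
        print(int(class_id), float(conf), box)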
OpenCV (CV2)
- The most widely used computer vision library. Highly efficient. Facilitates real-time image processing.
https://www.youtube.com/watch?v=P4Z8_qe2Cu0
https://www.freecodecamp.org/news/opencv-full-course
https://www.freecodecamp.org/news/how-to-use-opencv-and-python-for-computer-vision-and-ai
pip install opencv-python
Functions
import cv2
cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
cv2.Canny(blur_gray, low_threshold, high_threshold)
cv2.GaussianBlur(gray, (kernel_size, kernel_size), 0)
cv2.HoughLinesP(masked_edges, rho, theta, threshold, np.array([]), min_line_length, max_line_gap)
First off, rho and theta are the distance and angular resolution of our grid in Hough space. Remember that, in Hough space, we have a grid laid out along the (θ, ρ) axes. You need to specify rho in units of pixels and theta in units of radians.
The threshold parameter specifies the minimum number of votes (intersections in a given grid cell) a candidate line needs to have to make it into the output.
min_line_length is the minimum length of a line (in pixels) that you will accept in the output, and max_line_gap is the maximum distance (again, in pixels) between segments that you will allow to be connected into a single line.
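Putting the calls above together, here is a minimal lane-detection sketch; the file name and all parameter values are illustrative assumptions, and the region-of-interest masking step (the masked_edges input) is omitted.
import cv2
import numpy as np

image = cv2.imread("road.jpg")  # hypothetical input image (cv2.imread returns BGR)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

kernel_size = 5  # Gaussian kernel size must be odd
blur_gray = cv2.GaussianBlur(gray, (kernel_size, kernel_size), 0)

low_threshold, high_threshold = 50, 150  # Canny hysteresis thresholds
edges = cv2.Canny(blur_gray, low_threshold, high_threshold)

rho = 2                # distance resolution of the Hough grid (pixels)
theta = np.pi / 180    # angular resolution of the Hough grid (radians)
threshold = 15         # minimum number of votes for a candidate line
min_line_length = 40   # shortest segment (pixels) accepted in the output
max_line_gap = 20      # largest gap (pixels) bridged into a single line
lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]),
                        min_line_length, max_line_gap)

if lines is not None:
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        cv2.line(image, (x1, y1), (x2, y2), (0, 0, 255), 3)  # draw detected segments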
MLKit Vision APIs
- Barcode scanning
- Face detection
With ML Kit's face detection API, you can detect faces in an image, identify key facial features, and get the contours of detected faces. Note that the API detects faces; it does not recognize people.
With face detection, you can get the information you need to perform tasks like embellishing selfies and portraits, or generating avatars from a user's photo. Because ML Kit can perform face detection in real time, you can use it in applications like video chat or games that respond to the player's expressions.
Key capabilities
- Recognize and locate facial features. Get the coordinates of the eyes, ears, cheeks, nose, and mouth of every face detected.
- Get the contours of facial features. Get the contours of detected faces and their eyes, eyebrows, lips, and nose.
- Recognize facial expressions. Determine whether a person is smiling or has their eyes closed.
- Track faces across video frames. Get an identifier for each unique detected face. The identifier is consistent across invocations, so you can perform image manipulation on a particular person in a video stream.
- Process video frames in real time. Face detection is performed on the device, and is fast enough to be used in real-time applications, such as video manipulation.
https://developers.google.com/ml-kit/vision/face-detection
https://github.com/ipazc/mtcnn (a Python face detector; see the sketch after this list)
https://www.pyimagesearch.com/2019/03/11/liveness-detection-with-opencv
- Image labeling
- Object detection and tracking
- Text recognition
- Digital ink recognition
- Pose detection
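ML Kit itself is called from Android/iOS code, but the same face detection idea can be tried from Python with the MTCNN library linked above. A minimal sketch, assuming mtcnn and opencv-python are installed and photo.jpg is a stand-in image:
import cv2
from mtcnn import MTCNN

detector = MTCNN()
img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)  # MTCNN expects RGB
for face in detector.detect_faces(img):
    x, y, w, h = face["box"]        # face bounding box
    landmarks = face["keypoints"]   # eyes, nose, mouth corners
    print(face["confidence"], (x, y, w, h), landmarks)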
Image Similarity API
https://deepai.org/machine-learning-model/image-similarity
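A minimal sketch of calling the hosted endpoint with the requests library; the endpoint path, form field names, and response shape follow DeepAI's documented pattern and should be checked against the page above, and the file names and API key are placeholders.
import requests

resp = requests.post(
    "https://api.deepai.org/api/image-similarity",
    files={
        "image1": open("a.jpg", "rb"),  # placeholder image paths
        "image2": open("b.jpg", "rb"),
    },
    headers={"api-key": "YOUR_API_KEY"},  # placeholder key
)
print(resp.json())  # expected to contain an output distance; lower means more similar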
Darknet
Convolutional Neural Networks
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
https://github.com/pjreddie/darknet
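A minimal sketch of running a YOLOv4 Darknet model without compiling Darknet, via OpenCV's dnn module; the yolov4.cfg/yolov4.weights files and the test image are assumptions and must be downloaded separately.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")  # assumed local files
img = cv2.imread("dog.jpg")  # hypothetical test image
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (608, 608), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())  # one array per YOLO output layer
# Each row of each output: [cx, cy, w, h, objectness, class scores...] in normalized coordinates.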
Pillow - image processing / fork of the Python Imaging Library (PIL)
Albumentations
Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to increase the quality of trained models. The purpose of image augmentation is to create new training samples from the existing data.
- Albumentations supports all common computer vision tasks such as classification, semantic segmentation, instance segmentation, object detection, and pose estimation.
- The library provides a simple unified API to work with all data types: images (RGB images, grayscale images, multispectral images), segmentation masks, bounding boxes, and keypoints.
- The library contains more than 70 different augmentations to generate new training samples from the existing data.
- Albumentations is fast. We benchmark each new release to ensure that augmentations provide maximum speed.
- It works with popular deep learning frameworks such as PyTorch and TensorFlow. By the way, Albumentations is a part of the PyTorch ecosystem.
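A minimal sketch of that unified API; the specific transforms, bbox format, and image path are illustrative assumptions.
import albumentations as A
import cv2

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.2),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
)

image = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical image
bboxes = [[20, 30, 200, 180]]  # pascal_voc format: [x_min, y_min, x_max, y_max]
augmented = transform(image=image, bboxes=bboxes, class_labels=["person"])
aug_image, aug_bboxes = augmented["image"], augmented["bboxes"]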
List of augmentations
Pixel-level transforms
Pixel-level transforms will change just the input image and will leave any additional targets such as masks, bounding boxes, and keypoints unchanged (see the sketch after this list). The list of pixel-level transforms:
- AdvancedBlur
- Blur
- CLAHE
- ChannelDropout
- ChannelShuffle
- ChromaticAberration
- ColorJitter
- Defocus
- Downscale
- Emboss
- Equalize
- FDA
- FancyPCA
- FromFloat
- GaussNoise
- GaussianBlur
- GlassBlur
- HistogramMatching
- HueSaturationValue
- ISONoise
- ImageCompression
- InvertImg
- MedianBlur
- MotionBlur
- MultiplicativeNoise
- Normalize
- PixelDistributionAdaptation
- PlanckianJitter
- Posterize
- RGBShift
- RandomBrightnessContrast
- RandomFog
- RandomGamma
- RandomGravel
- RandomRain
- RandomShadow
- RandomSnow
- RandomSunFlare
- RandomToneCurve
- RingingOvershoot
- Sharpen
- Solarize
- Spatter
- Superpixels
- TemplateTransform
- TextImage
- ToFloat
- ToGray
- ToRGB
- ToSepia
- UnsharpMask
- ZoomBlur
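A small sketch, using stand-in arrays, of the guarantee described above: a pixel-level transform such as GaussianBlur alters only the image, while a mask passed alongside it comes back unchanged.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # stand-in image
mask = np.zeros((256, 256), dtype=np.uint8)                       # stand-in segmentation mask

aug = A.Compose([A.GaussianBlur(p=1.0)])
out = aug(image=image, mask=mask)
assert np.array_equal(out["mask"], mask)  # pixel-level transform leaves the mask untouched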
Spatial-level transforms
Spatial-level transforms will simultaneously change both the input image and additional targets such as masks, bounding boxes, and keypoints. The following table shows which additional targets are supported by each transform; a short usage sketch follows the table.
Transform | Image | Mask | BBoxes | Keypoints |
---|---|---|---|---|
Affine | ✓ | ✓ | ✓ | ✓ |
BBoxSafeRandomCrop | ✓ | ✓ | ✓ | ✓ |
CenterCrop | ✓ | ✓ | ✓ | ✓ |
CoarseDropout | ✓ | ✓ | ✓ | ✓ |
Crop | ✓ | ✓ | ✓ | ✓ |
CropAndPad | ✓ | ✓ | ✓ | ✓ |
CropNonEmptyMaskIfExists | ✓ | ✓ | ✓ | ✓ |
D4 | ✓ | ✓ | ✓ | ✓ |
ElasticTransform | ✓ | ✓ | ✓ | ✓ |
GridDistortion | ✓ | ✓ | ✓ | ✓ |
GridDropout | ✓ | ✓ | ✓ | ✓ |
GridElasticDeform | ✓ | ✓ | ✓ | ✓ |
HorizontalFlip | ✓ | ✓ | ✓ | ✓ |
Lambda | ✓ | ✓ | ✓ | ✓ |
LongestMaxSize | ✓ | ✓ | ✓ | ✓ |
MaskDropout | ✓ | ✓ | ✓ | ✓ |
Morphological | ✓ | ✓ | ✓ | ✓ |
NoOp | ✓ | ✓ | ✓ | ✓ |
OpticalDistortion | ✓ | ✓ | ✓ | ✓ |
OverlayElements | ✓ | ✓ | | |
PadIfNeeded | ✓ | ✓ | ✓ | ✓ |
Perspective | ✓ | ✓ | ✓ | ✓ |
PiecewiseAffine | ✓ | ✓ | ✓ | ✓ |
PixelDropout | ✓ | ✓ | ✓ | ✓ |
RandomCrop | ✓ | ✓ | ✓ | ✓ |
RandomCropFromBorders | ✓ | ✓ | ✓ | ✓ |
RandomGridShuffle | ✓ | ✓ | ✓ | |
RandomResizedCrop | ✓ | ✓ | ✓ | ✓ |
RandomRotate90 | ✓ | ✓ | ✓ | ✓ |
RandomScale | ✓ | ✓ | ✓ | ✓ |
RandomSizedBBoxSafeCrop | ✓ | ✓ | ✓ | ✓ |
RandomSizedCrop | ✓ | ✓ | ✓ | ✓ |
Resize | ✓ | ✓ | ✓ | ✓ |
Rotate | ✓ | ✓ | ✓ | ✓ |
SafeRotate | ✓ | ✓ | ✓ | ✓ |
ShiftScaleRotate | ✓ | ✓ | ✓ | ✓ |
SmallestMaxSize | ✓ | ✓ | ✓ | ✓ |
Transpose | ✓ | ✓ | ✓ | ✓ |
VerticalFlip | ✓ | ✓ | ✓ | ✓ |
XYMasking | ✓ | ✓ | ✓ | ✓ |
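A minimal sketch, with illustrative coordinates and labels, showing a spatial-level transform moving image, mask, bounding boxes, and keypoints together:
import albumentations as A
import numpy as np

image = np.zeros((200, 300, 3), dtype=np.uint8)  # stand-in image
mask = np.zeros((200, 300), dtype=np.uint8)      # stand-in mask
bboxes = [[50, 40, 150, 120]]                    # pascal_voc: [x_min, y_min, x_max, y_max]
keypoints = [(100, 80)]                          # (x, y)

aug = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
    keypoint_params=A.KeypointParams(format="xy"),
)
out = aug(image=image, mask=mask, bboxes=bboxes, keypoints=keypoints, labels=["obj"])
# out["mask"], out["bboxes"], and out["keypoints"] are all flipped consistently with the image.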