
Computer Vision / CV Intro

  1. 13.1. Image Augmentation
  2. 13.2. Fine-Tuning
  3. 13.3. Object Detection and Bounding Boxes
  4. 13.4. Anchor Boxes
  5. 13.5. Multiscale Object Detection
  6. 13.6. The Object Detection Dataset
  7. 13.7. Single Shot Multibox Detection
  8. 13.8. Region-based CNNs (R-CNNs)
  9. 13.9. Semantic Segmentation and the Dataset
  10. 13.10. Transposed Convolution
  11. 13.11. Fully Convolutional Networks (FCN)
  12. 13.12. Neural Style Transfer
  13. 13.13. Image Classification (CIFAR-10) on Kaggle
  14. 13.14. Dog Breed Identification (ImageNet Dogs) on Kaggle

https://d2l.ai/chapter_computer-vision/index.html

Computer vision involves analyzing patterns in visual images and reconstructing the real-world objects that produced them. The process is often broken up into two phases: feature detection and pattern recognition. Feature detection involves selecting important features of the image; pattern recognition involves discovering patterns in the features.

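Content-Based Image Retrieval (CBIR) is the task of searching an image collection by visual content (color, texture, shape) rather than by text metadata. It is the basis for building image search engines.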

ANPR - Automatic Number Plate Recognition

Face Detection Concepts

Face detection locates human faces in visual media such as digital images or video. A detected face has an associated position, size, and orientation, and it can be searched for landmarks such as the eyes and nose.

Here are some of the terms that we use regarding the face detection feature of ML Kit:

  • Face tracking extends face detection to video sequences. Any face that appears in a video for any length of time can be tracked from frame to frame. This means a face detected in consecutive video frames can be identified as being the same person. Note that this isn't a form of face recognition; face tracking only makes inferences based on the position and motion of the faces in a video sequence.
  • A landmark is a point of interest within a face. The left eye, right eye, and base of the nose are all examples of landmarks. ML Kit provides the ability to find landmarks on a detected face.
  • A contour is a set of points that follow the shape of a facial feature. ML Kit provides the ability to find the contours of a face.
  • Classification determines whether a certain facial characteristic is present. For example, a face can be classified by whether its eyes are open or closed, or if the face is smiling or not.
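
To make the face tracking idea concrete, here is a minimal sketch of linking detections across frames by bounding-box overlap alone. It is an illustration, not ML Kit's algorithm or API: the `iou` and `track` helpers, the `(x, y, width, height)` box format, and the 0.3 threshold are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def track(frames, threshold=0.3):
    """Give each box a tracking ID using only its overlap with the previous frame."""
    next_id = 0
    previous = []  # (track_id, box) pairs from the previous frame
    result = []
    for boxes in frames:
        current, used = [], set()
        for box in boxes:
            # Greedy match: best-overlapping previous box that is not taken yet.
            candidates = [(iou(box, pbox), tid)
                          for tid, pbox in previous if tid not in used]
            best_iou, best_id = max(candidates, default=(0.0, None))
            if best_id is None or best_iou < threshold:
                best_id = next_id  # no plausible match: start a new track
                next_id += 1
            used.add(best_id)
            current.append((best_id, box))
        previous = current
        result.append([tid for tid, _ in current])
    return result


# Hypothetical detections: one face drifts slightly, a second appears in frame 2.
frames = [
    [(10, 10, 50, 50)],
    [(14, 12, 50, 50), (200, 40, 40, 40)],
    [(205, 42, 40, 40), (18, 13, 50, 50)],
]
ids = track(frames)
print(ids)
```

The IDs follow the boxes even when the detection order changes between frames, and nothing about the face's identity is used, which is the distinction from face recognition drawn above.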

Cases

  • Perfect Light
  • Dull light
  • Black and white (BW)
  • Half face
  • At one side
  • Side face
  • Multiple faces, dull light

https://developers.google.com/ml-kit/vision/face-detection/face-detection-concepts

Image Gradient

An image gradient is a directional change in the intensity or color in an image. The gradient of the image is one of the fundamental building blocks in image processing. For example, the Canny edge detector uses the image gradient for edge detection. In graphics software for digital image editing, the term gradient or color gradient also refers to a gradual blend of color, that is, an even gradation from low to high values such as from white to black. Another name for this is color progression.

Mathematically, the gradient of a two-variable function (here the image intensity function) at each image point is a 2D vector with the components given by the derivatives in the horizontal and vertical directions. At each image point, the gradient vector points in the direction of largest possible intensity increase, and the length of the gradient vector corresponds to the rate of change in that direction.
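
Written out, the definition above looks as follows. The symbols are notation chosen for these notes, not taken from the source: f is the image intensity function, g_x and g_y are its horizontal and vertical derivatives, and θ is the gradient direction.

```latex
\nabla f(x, y) =
\begin{bmatrix} g_x \\ g_y \end{bmatrix} =
\begin{bmatrix} \dfrac{\partial f}{\partial x} \\[4pt] \dfrac{\partial f}{\partial y} \end{bmatrix},
\qquad
\lVert \nabla f \rVert = \sqrt{g_x^2 + g_y^2},
\qquad
\theta = \operatorname{atan2}(g_y, g_x)
```

The magnitude is the rate of change in the direction of steepest intensity increase, and θ is that direction measured from the horizontal axis.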

Since the intensity function of a digital image is only known at discrete points, derivatives of this function cannot be defined unless we assume that there is an underlying continuous intensity function which has been sampled at the image points. With some additional assumptions, the derivative of the continuous intensity function can be computed as a function on the sampled intensity function, i.e., the digital image. Approximations of these derivative functions can be defined at varying degrees of accuracy. The most common way to approximate the image gradient is to convolve an image with a kernel, such as the Sobel operator or Prewitt operator.
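
As a rough sketch of the kernel approach, the snippet below applies the 3x3 Sobel kernels with plain NumPy. The helper names (`filter3x3`, `sobel_gradient`) and the edge-replicated border are choices made for this example. The kernels are applied as a cross-correlation (no kernel flip), so a positive `gx` means intensity increases to the right; a strict convolution would flip the sign.

```python
import numpy as np

# SOBEL_X responds to horizontal intensity change, SOBEL_Y to vertical change.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Cross-correlate a 2D image with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_gradient(image):
    """Return the x and y derivative estimates, gradient magnitude, and direction."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return gx, gy, np.hypot(gx, gy), np.arctan2(gy, gx)


# Synthetic test image: a vertical step edge (dark left half, bright right half).
image = np.zeros((5, 6))
image[:, 3:] = 1.0
gx, gy, magnitude, direction = sobel_gradient(image)
print(gx[2])  # non-zero only in the two columns next to the step
```

For this step edge, `gy` is zero everywhere and `gx` is non-zero only beside the edge. This matches the idea that the x-gradient image highlights horizontal changes in intensity.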

Figure: On the left, an intensity image of a cat. In the center, a gradient image in the x direction measuring horizontal change in intensity. On the right, a gradient image in the y direction measuring vertical change in intensity. Gray pixels have a small gradient; black or white pixels have a large gradient.

Calculus - Gradient

https://en.wikipedia.org/wiki/Image_gradient

Hough Transform

The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.
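
The voting procedure can be sketched for straight lines as below. This assumes the common (rho, theta) parameterization rho = x·cos(theta) + y·sin(theta) with 1-degree and 1-pixel bins. The function name `hough_lines` and the toy point set are made up for this example.

```python
import numpy as np


def hough_lines(edge_points, width, height, n_theta=180):
    """Accumulate votes in (rho, theta) space for rho = x*cos(theta) + y*sin(theta)."""
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))
    max_rho = int(np.ceil(np.hypot(width, height)))
    # Row r of the accumulator stands for rho = r - max_rho, so negative rho fits.
    accumulator = np.zeros((2 * max_rho + 1, n_theta), dtype=int)
    for x, y in edge_points:
        # Each point votes once per theta, for the line through it at that angle.
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[rhos + max_rho, np.arange(n_theta)] += 1
    return accumulator, thetas, max_rho


# An "imperfect" horizontal line y = 5: one missing point plus one outlier.
points = [(x, 5) for x in range(10) if x != 4] + [(2, 8)]
acc, thetas, max_rho = hough_lines(points, width=10, height=10)

# The line y = 5 corresponds to theta = 90 degrees, rho = 5.
votes_for_line = acc[max_rho + 5, 90]
print(votes_for_line, acc.max())
```

All nine points on the line vote for the cell (rho = 5, theta = 90°), and the outlier does not, so that cell holds the accumulator maximum despite the gap. On a segment this short, neighbouring angles can tie with it because the bins are coarse. Practical implementations therefore look for local maxima rather than a single global one.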

https://en.wikipedia.org/wiki/Hough_transform

Canny Edge Detection

https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_canny/py_canny.html

Peak Signal to Noise Ratio (PSNR)

Peak signal-to-noise ratio (PSNR) is an engineering term for the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation. Because many signals have a very wide dynamic range, PSNR is usually expressed as a logarithmic quantity using the decibel scale.

PSNR is commonly used to quantify reconstruction quality for images and video subject to lossy compression.
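
For images, PSNR is usually computed from the mean squared error (MSE) between a reference and a degraded image as PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. A minimal sketch, assuming 8-bit images (MAX = 255); the function name `psnr` and the toy arrays are illustrative only:

```python
import numpy as np


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    # Convert to float first so uint8 subtraction cannot wrap around.
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise power at all
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy example: a flat 4x4 image with a single pixel off by 16 (MSE = 16).
original = np.full((4, 4), 100, dtype=np.uint8)
noisy = original.copy()
noisy[0, 0] = 116
value = psnr(original, noisy)
print(round(value, 2))  # about 36.09 dB
```

Higher values mean the degraded image is closer to the reference. Identical images give an infinite PSNR because the MSE is zero.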

https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

References