Chris McCormick

UCF Lecture 01 - Introduction To Computer Vision

This lecture covers the very basics of what a digital image or digital video is. It also gives a brief overview of the main topics and applications in computer vision, without going into much detail about algorithms or implementations. I found it valuable to hear what the current hot topics and research areas in computer vision are.

My notes from the lecture:

  • He discusses what constitutes an image: a 2D array of pixels, each with a value from 0 to 255.

  • He discusses how an image is formed: the projection of a 3D object onto a 2D image plane.

  • Discusses approaches for reconstructing 3D information from 2D images.

    • Stereo - Depth information from two cameras.

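    • Shading - Variations in shading reveal the shape of a surface. Makeup, for example, changes the shading on your face and fools the human brain into perceiving a different face shape.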

    • Texture - Texture is a repeated pattern. You can look at distortions in the pattern to recover 3D information.

    • Shape from Motion - Looking at just a small collection of moving dots, we can make out that it’s a person based on the motion.

  • He recommends a book on computer vision by Rick Szeliski, a principal researcher at Microsoft Research.

  • He shows a demo video of Microsoft's Photosynth, which attempts to reconstruct 3D scenes from 2D images gathered on the web.

  • Shows some example applications of computer vision:

    • Mosaic - Stitching together images from a video sequence to construct a complete view of the scene.

      • One example is video from a UAV tracking a car down a road. The mosaic stitches together all of the frames of the road to create more of a map of the area.
    • “Human Detection” - Does the frame contain a person?

    • Airplane detection

    • Face Recognition

    • Facial Expressions

    • Detecting Driver Alertness

    • Lip Reading - Our brain supplements audio with lip reading.

    • Video Surveillance and Monitoring

      • Automated Surveillance System - Detection & Tracking
    • They are working on a project for airport surveillance, where there are lots of people. Multiple high-resolution cameras provide a 360-degree view. This is called wide-area surveillance (WAS).

      • Homeland Security Advanced Research Projects Agency - HSARPA

      • The system is called NONA, though I couldn't find any links for it online.

    • UAV Surveillance

      • Currently the surveillance footage is reviewed by humans, because we don't yet have techniques that can analyze it with high accuracy.

      • Part of the challenge is that you need to remove camera motion from the equation.

    • Unmanned Ground Vehicle (UGV) - Self-driving cars.

    • Human Action Recognition - Recognizing the actions and activities that people are performing.

      • Weizmann Action Dataset - A collection of videos of 9 actors performing 10 actions. The task is to figure out which action the person is performing.
    • Accurate Image Localization - “Where Am I?”

    • Layer Based Video Composition - Removing a foreground object from a video and filling in the gap with background information acquired over the sequence of frames. This is possibly used by the film industry. It also enables background replacement.
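
To make the note about what constitutes an image concrete, here is a minimal Python sketch that treats a tiny 8-bit grayscale image as a 2D list of intensities from 0 (black) to 255 (white). The pixel values and the helper names `pixel` and `invert` are made up purely for illustration; real code would usually use an array library rather than nested lists.

```python
# A tiny 3x4 8-bit grayscale image (rows x columns). The values are
# hypothetical, chosen only to illustrate the 0..255 intensity range.
image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [255, 128, 64, 0],
]

height = len(image)     # number of rows
width = len(image[0])   # number of columns


def pixel(img, row, col):
    """Return the intensity stored at (row, col)."""
    return img[row][col]


def invert(img):
    """Return the photographic negative: each value v becomes 255 - v."""
    return [[255 - v for v in row] for row in img]


print(height, width)       # 3 4
print(pixel(image, 0, 3))  # 255
print(invert(image)[0])    # [255, 191, 127, 0]
```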
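
The note on image formation (projecting a 3D object onto a 2D image plane) can be sketched with the ideal pinhole camera model, where a point (X, Y, Z) in camera coordinates lands at x = f*X/Z, y = f*Y/Z on the image plane. The lecture notes don't name a camera model, so the pinhole assumption, the focal length `f`, and the `project` function are my own illustrative choices. The example also hints at why recovering 3D is hard: two points at different depths on the same ray through the camera center project to the same image point.

```python
from typing import Tuple


def project(point: Tuple[float, float, float], f: float) -> Tuple[float, float]:
    """Project a 3D point (X, Y, Z), given in camera coordinates, onto the
    image plane of an ideal pinhole camera with focal length f.
    Assumes the camera sits at the origin looking down the +Z axis."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)


# Two points on the same ray, one twice as far away as the other,
# land on the same image point: the depth Z is lost in the projection.
near = project((1.0, 2.0, 4.0), f=2.0)
far = project((2.0, 4.0, 8.0), f=2.0)
print(near, far)  # (0.5, 1.0) (0.5, 1.0)
```

This lost depth is exactly what the stereo, shading, texture, and motion cues above try to recover.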
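
For the stereo approach to recovering depth from two cameras, a common textbook formulation (assumed here, since the lecture notes don't give one) uses two rectified, parallel cameras separated by a baseline B. The same point appears at horizontal positions that differ by a disparity d between the two images, and its depth is Z = f*B/d. The hypothetical function `depth_from_disparity` below sketches only this final triangulation step; it does not address the harder problem of matching points between the two images.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline: float) -> float:
    """Depth of a point seen by two rectified, parallel cameras.

    x_left, x_right: horizontal image coordinate (in pixels) of the same
        point in the left and right images.
    focal_px: focal length in pixels (assumed equal for both cameras).
    baseline: distance between the two camera centers, e.g. in meters.
    Returns Z = focal_px * baseline / disparity, in the units of baseline.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline / disparity


# Example with made-up numbers: f = 700 px, cameras 0.1 m apart,
# disparity = 400 - 365 = 35 px, so Z = 700 * 0.1 / 35 = 2.0 m.
print(depth_from_disparity(400.0, 365.0, focal_px=700.0, baseline=0.1))
```

Because depth is inversely proportional to disparity, nearby objects shift more between the two views than distant ones.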