Learning Library

Understanding CNNs with Simple House Example

Key Points

  • Humans recognize objects (like a house) effortlessly, but computers need specialized techniques such as convolutional neural networks (CNNs) to achieve similar object identification.
  • A CNN is a deep‑learning architecture that augments a standard artificial neural network with layers of learnable filters, making it especially good at pattern‑recognition tasks.
  • Each filter (often a small 3×3 kernel) scans across an image pixel‑by‑pixel, scoring how closely local pixel groups match the filter’s pattern (e.g., a right‑angle edge).
  • By sliding these filters over the entire image, the network builds feature maps that capture visual cues such as straight lines, enabling it to recognize varied representations of the same object (e.g., different window shapes).
  • The hierarchy of layered transformations and filter responses allows the CNN to translate low‑level pixel data into high‑level object classifications, such as telling a house from an apartment or a skyscraper.
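
The filter-sliding step in the key points above can be sketched in a few lines of plain Python. This is a minimal illustration, not the method shown in the video: the 5×5 `IMAGE`, the values in the `RIGHT_ANGLE` kernel, and the `slide_filter` helper are all assumptions made up for this example, and a real CNN learns its filter values during training rather than having them written by hand.

```python
# Hypothetical 5x5 binary image (1 = ink, 0 = background) containing a
# right angle, like the top-left corner of a drawn window frame.
IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

# Assumed 3x3 filter for that right angle: +1 where ink is expected,
# -1 where background is expected.
RIGHT_ANGLE = [
    [1, 1, 1],
    [1, -1, -1],
    [1, -1, -1],
]


def slide_filter(image, kernel):
    """Score every kernel-sized patch of `image` against `kernel`.

    Each score is the sum of elementwise products between the patch and
    the kernel (the operation deep-learning libraries call convolution).
    The filter moves one pixel at a time, with no padding.
    """
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_rows):
        row_scores = []
        for c in range(out_cols):
            score = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row_scores.append(score)
        feature_map.append(row_scores)
    return feature_map


feature_map = slide_filter(IMAGE, RIGHT_ANGLE)
for row in feature_map:
    print(row)
# [-3, 0, -1]
# [0, 5, 3]
# [-1, 3, 0]
```

With these assumed values the highest score, 5, appears at row 1, column 1 of the feature map, which is the patch where the drawn corner lines up exactly with the filter.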

Full Transcript

# Understanding CNNs with Simple House Example

**Source:** [https://www.youtube.com/watch?v=QzY57FaENXg](https://www.youtube.com/watch?v=QzY57FaENXg)

**Duration:** 00:06:19

## Sections

- [00:00:00](https://www.youtube.com/watch?v=QzY57FaENXg&t=0s) **Introducing Convolutional Neural Networks** - The speaker uses a quick drawing quiz to show how humans instantly identify objects, then explains that convolutional neural networks allow computers to achieve comparable pattern‑recognition capabilities.
- [00:03:04](https://www.youtube.com/watch?v=QzY57FaENXg&t=184s) **Sliding 3×3 Filters in CNN** - The speaker explains how a convolutional neural network scans an image with 3×3 filter kernels, scores each patch, aggregates results via pooling, and builds increasingly abstract feature representations in deeper layers.
- [00:06:06](https://www.youtube.com/watch?v=QzY57FaENXg&t=366s) **Video Outro Call to Action** - The presenter concludes the video by inviting viewer questions, urging likes and subscriptions, and thanking the audience.

## Full Transcript

0:00 OK, pop quiz. What am I drawing? I'm going to make three predictions here. Firstly, you think it's a house, and you'd be right. Secondly, that just came pretty easily to you; it was effortless. And thirdly, you're thinking that I'm not much of an artist, and you'd be right on all counts there.

0:25 But how can we look at this set of geometric shapes and think, "Oh, house"? If you live in a house, I bet it looks nothing like this. Well, that ability to perform object identification that comes so easily to us does not come so easily to a computer, but that is where we can apply something called convolutional neural networks to the problem.

0:51 Now, a convolutional neural network, or a CNN, is an area of deep learning that specializes in pattern recognition. My name is Martin Keen, and I work in the IBM Garage at IBM.

1:10 Now let's take a look at how a CNN works at a high level. Well, let's break it down: CNN, convolutional neural network. Let's start with the artificial neural network part. This is a standard network that consists of multiple layers that are interconnected, and each layer receives some input, transforms that input to something else, and passes an output to the next layer. That's how neural networks work, and a CNN is a particular part of the neural network, or a section of layers. Let's say it's these three layers here. And within these layers, we have something called filters. And it's the filters that perform the pattern recognition that a CNN is so good at.

2:04 So let's apply this to our house example now. If this house were an actual image, it would be a series of pixels, just like any image.
2:17 And if we zoom in on a particular part of this house, let's say we zoom in around here, then we would get, well, the window. And what we're saying here is that a window consists of some perfectly straight lines. Almost perfectly straight lines. But, you know, a window doesn't need to look like that. A window could equally look like this, and we would still say it was a window. The cool thing about a CNN is that, using filters, a CNN could also say that these two objects represent the same thing.

2:52 The way they do that, then, is through the application of these filters. So let's take a look at how that works. Now, a filter is basically just a three by three block. And within that block, we can specify a pattern to look for. So we could say, let's look for a pattern like this, a right angle, in our image.

3:17 So what we do is we take this filter, and it's a three by three block here. We will analyze the equivalent three by three block up here as well. So we'll look at, first of all, this first group of three by three pixels, and we will see how close they are to the filter shape. And we'll get that numeric score. Then we will move across one column to the right and look at the next three by three block of pixels and score how close they are to the filter shape. And we will continue to slide over, or convolve over, all of these pixel layers until we have mapped every three by three block.

4:00 Now, that's just for one filter. But what that will give us is an array of numbers that say how closely the image matches the filter. But we can add more filters, so we could add another three by three filter here. And perhaps this one looks for a shape like this.
4:18 And we could add a third filter here, and perhaps this looks for a different kind of right angle shape. If we take the numeric arrays from all of these filters and combine them together in a process called pooling, then we have a much better understanding of what is contained within this series of pixels.

4:40 Now that's just the first layer of the CNN. And as we go deeper into the neural network, the filters become more abstract, or they can do more. So the second layer of filters perhaps can perform tasks like basic object recognition. So we can have filters here that might be able to recognize the presence of a window, or the presence of a door, or the presence of a roof. And as we go deeper into the CNN and into the next layer, well, maybe these filters can perform even more abstract tasks, like being able to determine whether we're looking at a house, or we're looking at an apartment, or whether we're looking at a skyscraper.

5:29 So you can see the application of these filters increases as we go through the network and can perform more and more tasks. And that's a very high level, basic overview of what a CNN is. It has a ton of business applications. Think of OCR, for example, for understanding handwritten documents. Think of visual recognition and facial detection and visual search. Think of medical imagery, and looking at that and determining what is being shown in an imaging scan. And even think of the fact that we can apply a CNN to perform object identification for badly drawn houses.

6:08 If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.
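
The multi-filter and pooling step described in the transcript might look like the sketch below. It is an illustration under stated assumptions, not the presenter's implementation: the 6×6 `WINDOW` image, the two hand-written corner filters in `FILTERS`, and the helpers `slide_filter` and `max_pool` are all hypothetical. It uses pooling in its common sense of downsampling each filter's score array by keeping the maximum of every 2×2 block, which is one way to read the "combine them together" step.

```python
# Hypothetical 6x6 binary image of a square window outline (1 = ink).
WINDOW = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Assumed hand-written 3x3 filters; a trained CNN would learn these values.
FILTERS = {
    "top_left_corner": [[1, 1, 1], [1, -1, -1], [1, -1, -1]],
    "bottom_right_corner": [[-1, -1, 1], [-1, -1, 1], [1, 1, 1]],
}


def slide_filter(image, kernel):
    """Score every kernel-sized patch (stride 1, no padding)."""
    k = len(kernel)
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


def max_pool(feature_map, size=2):
    """Keep the strongest score in each non-overlapping size x size block."""
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# One pooled 2x2 summary per filter: 6x6 image -> 4x4 scores -> 2x2 pooled.
pooled = {
    name: max_pool(slide_filter(WINDOW, kernel))
    for name, kernel in FILTERS.items()
}
for name, grid in pooled.items():
    print(name, grid)
# top_left_corner [[5, 1], [1, 1]]
# bottom_right_corner [[1, 1], [1, 5]]
```

In this toy setup each pooled grid keeps only a coarse record of where its filter matched best: the top-left corner filter peaks in the top-left quadrant and the bottom-right corner filter peaks in the bottom-right quadrant. A later layer could read that pair of responses as evidence of a window.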