CAT + GPU = GPT
How Cat Photos Sent a Graphics Card to the Gym
Last time, we talked about AI as a giant guessing game for words.
Before that game ever existed, there was an older cousin: a system that spent years looking at pictures—especially cats—and quietly changed what computers could do.
This article is about that cousin.
It’s a story about:
A special computer part (a graphics card)
A huge pile of internet photos
A model called AlexNet
And how “cat recognition” accidentally paved the way for the tools we use now
We’ll end with a pointer back to words and notes, but we’re not going there yet.
For now, it’s just us, GPUs, and cats.
The Late-Night Problem: Too Many Images, Not Enough Muscle
Imagine yourself at 9:47 p.m.:
Last note of the day
Brain gently fried
EHR cursor blinking like it has all the time in the world
Now translate that feeling to early AI researchers.
Back in the early 2010s, they had:
Millions of labeled photos (of animals, objects, outdoor scenes—everything)
Promising ideas for teaching computers to recognize what was in those photos
Regular computer chips that were… tired
Asking those chips to learn from all those photos was like asking you to write all your notes for the week in one sitting, by hand, after a full day of sessions.
Technically possible. Emotionally: no.
So they did the thing all good tinkerers do.
They looked around the room and asked, “What else could help?”
Meet the GPU: The Graphics Card That Got a Promotion
If you’ve ever seen a gaming laptop or a big desktop, you’ve met a GPU—even if you didn’t know its name.
A GPU (graphics card) is:
A separate piece inside the computer
Originally designed to make games look smooth and pretty
Very good at doing lots of tiny calculations at the same time
Games need that because every frame—every shadow, every bit of movement—is made of millions of little dots that have to be drawn quickly.
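If you want to see that difference in miniature, here's a toy sketch in Python. It uses NumPy on an ordinary CPU rather than a real graphics card, and the sizes and timings are invented for illustration, but the contrast is the same one GPUs push much further: handing over a whole batch of work at once beats handling it one piece at a time.

```python
# A toy illustration of "lots of tiny calculations at the same time."
# NumPy on a regular CPU stands in for the GPU here; the numbers are only illustrative.
import time
import numpy as np

numbers = np.random.rand(1_000_000)          # a big pile of values to double

start = time.perf_counter()
one_by_one = [x * 2 for x in numbers]        # handle each value separately
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
all_at_once = numbers * 2                    # hand the whole batch over in one step
batch_seconds = time.perf_counter() - start

print(f"one at a time: {loop_seconds:.3f}s   whole batch: {batch_seconds:.4f}s")
```

A real GPU takes that batch idea much further, running thousands of those tiny calculations side by side.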
It turns out, teaching a computer to see is… kind of similar (there's a tiny code sketch just after this list):
You take an image
You turn every tiny piece of it into numbers
You do a lot of small calculations again and again
You repeat that for millions of images until patterns start to emerge
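Here's the sketch: a made-up miniature, in Python, of "turn the image into numbers, then repeat a small calculation." The 8×8 "photo" and the little two-by-two "pattern" are invented for this illustration; real systems learn their patterns from the photos instead of having them typed in by hand.

```python
# A toy miniature of "turn the image into numbers, then repeat a small calculation."
# The 8x8 "photo" and the 2x2 "pattern" are invented for illustration only.
import numpy as np

image = np.random.rand(8, 8)        # pretend 8x8 grayscale photo: every pixel is a number
pattern = np.array([[1.0, -1.0],    # a tiny hypothetical "bright-next-to-dark" edge pattern
                    [1.0, -1.0]])

scores = np.zeros((7, 7))           # one score for every position we slide the pattern to
for row in range(7):
    for col in range(7):
        patch = image[row:row + 2, col:col + 2]
        scores[row, col] = np.sum(patch * pattern)  # the same small multiply-and-add, over and over

print(scores.round(2))              # higher score = that spot looks more like the pattern
```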
Researchers realized:
“This graphics card is already great at moving lots of dots around.
What if we ask it to move numbers for learning instead of just for games?”
That’s the heart of the story:
a game part quietly got promoted to a learning part.
AlexNet: The System That Looked at a Lot of Pictures (Including Cats)
In 2012, three researchers—Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton—entered a big image competition called ImageNet.
Very roughly, the challenge was:
“Here are over a million labeled photos, sorted into about a thousand categories.
Can your program tell us what’s in them?”
Their system, later known as AlexNet, did a few important things:
It used two gaming-style graphics cards (NVIDIA GTX 580s) as its main engine.
It stacked many simple steps so the computer could slowly notice what made, say, a “cat” different from “not a cat” (there’s a toy sketch of this “stacking” idea just after this list).
It trained on a huge variety of images—dogs, birds, furniture, landscapes… and plenty of cats.
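Here's that "stacking" sketch, again in Python. The random weights, the layer sizes, and the final "cat score" are all invented; in the real AlexNet those numbers are learned from the million-plus photos, and most of the steps are convolutions rather than the plain mixing shown here. The only point is the shape of the idea: each step is simple, and the power comes from piling them up.

```python
# A toy sketch, not the real AlexNet: "stacking simple steps" just means
# feeding the output of one small calculation into the next one.
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random(16)  # pretend this is a tiny flattened 4x4 photo

def simple_step(values, n_outputs):
    """One made-up layer: mix the inputs with random weights, keep only positive signals."""
    weights = rng.normal(size=(n_outputs, values.size))  # real systems LEARN these numbers
    return np.maximum(0.0, weights @ values)             # multiply-add, then a ReLU-style cutoff

step1 = simple_step(pixels, 8)     # early steps: rough, low-level patterns
step2 = simple_step(step1, 4)      # later steps: combinations of those patterns
cat_score = simple_step(step2, 1)  # last step: one (untrained, meaningless) "cat-ness" number

print(cat_score)
```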
When the results came out, AlexNet:
Did far better than any other entry at recognizing what was in the pictures, beating the runner-up’s error rate by roughly ten percentage points
Convinced a lot of skeptical people that this “let’s give it lots of data and a strong graphics card” approach really worked
Helped start a wave of work that eventually led Geoffrey Hinton, Yann LeCun, and Yoshua Bengio to receive the Turing Award (often described as the Nobel Prize of computing)
People summarize it as:
“That’s when AI finally got good at cat photos.”
What actually happened was more like:
“We learned how to give computers enough practice,
on the right hardware,
so they could notice patterns we couldn’t hard-code ourselves.”
The cats were the cute front end of a much deeper shift.
Why We Care About This (Besides the Cat Memes)
From our side, working on Simcha, this story matters for a few reasons:
It shows that big changes can come from reusing something ordinary (a game card) in a new way.
It reminds us that constraints are helpful: AlexNet ran on just two graphics cards, a modest setup by today’s standards, so its creators had to be careful and thoughtful.
It’s a clear moment where we went from “computers kind of see” to “computers can actually pick patterns out of messy, real-world data.”
And now, more than a decade later:
Your phone and laptop have their own small “pattern helpers” built in.
Smaller, more efficient versions of these ideas can run on devices you already own.
The kind of power that once needed a loud desktop in a lab is inching closer to fitting quietly on your desk or in your pocket.
We’re not saying your phone is AlexNet.
We are saying the same style of thinking—let the machine practice a lot, on the right hardware—is now part of everyday technology.
Where This Series Goes Next: From Pictures to Words
In the last article, we talked about AI as a guessing game for words—something like a supercharged autocomplete that has seen a lot of sentences.
This article is the “prequel”:
Before words, there were pictures
Before note helpers, there were cat recognizers
Before language models, there were graphics cards doing image practice at scale
We’re stopping the story here on purpose.
A small team, two graphics cards, and a pile of everyday photos
showed how far practice plus the right hardware could go.
In the next article, we’ll walk across the hallway—from the “cat cousin” to the “word cousin”:
From images to sentences
From “Is this a cat?” to “Does this look like part of a useful note?”
From GPUs staring at pictures to models that help us shape words
For now, we’re staying with this simple truth:
Cats helped teach computers how to see. 🐱⚡
Next up: how the same lineage learned to write—and what that might eventually mean for the way we build notes.

