HKU 2026 Workshop
Ahnjili ZhuParris
Focusing on Medical Surveillance
Artificial intelligence names a broad field in which machines perform tasks that mimic human intelligence.
Computer vision (CV) extracts visual information such as edges, textures, and colors.
Model → Predictions → Mario Probability / Luigi Probability
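The step from raw model scores to class probabilities can be sketched in a few lines. This is a minimal, dependency-free illustration; the Mario/Luigi classifier and its logit values are hypothetical, and real models apply the same softmax over thousands of classes.

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a Mario-vs-Luigi classifier head.
logits = [2.0, 0.5]  # [Mario, Luigi]
probs = softmax(logits)
print({"Mario": round(probs[0], 3), "Luigi": round(probs[1], 3)})
```

Whatever the input, the output is forced into the label set the designers chose: the model can only ever answer "Mario" or "Luigi".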



Modern CV models don't just detect — they describe. Models like CLIP and ViT-GPT2 generate captions and classify images using language.
Their vocabulary is inherited from millions of image-caption pairs scraped from the internet. The biases, assumptions, and blind spots of those captions become the model's worldview.
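The mechanics can be shown with a toy sketch of CLIP-style zero-shot classification. The three-number "embeddings" below are made up (real CLIP vectors have hundreds of dimensions), but the key behavior is real: the model picks the closest caption from the vocabulary it was given, and nothing outside that vocabulary can ever be said.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for learned embeddings (all values hypothetical).
image_embedding = [0.9, 0.1, 0.3]   # a photo of a scientist in a lab
label_embeddings = {                # the only "words" this model knows
    "a woman posing for a picture": [0.8, 0.2, 0.3],
    "a dog on a beach":             [0.1, 0.9, 0.2],
}

# Zero-shot classification: pick the caption whose embedding is closest.
best = max(label_embeddings, key=lambda lbl: cosine(image_embedding, label_embeddings[lbl]))
print(best)  # "scientist" never appears; it is not in the vocabulary
```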
An image captioning model describes a photo of a woman in a lab:
"A woman posing for a picture"
The machine saw woman and posing. It missed scientist, researcher, laboratory.
ImageNet — the dataset that powered the deep learning revolution — contained 2,832 subcategories under "person." Many were racial slurs, sexual orientations, and derogatory terms.
Every label is a simplification. When a model classifies your face as "happy" or "threatening," it is mapping your pixels onto a category that someone, somewhere, decided was meaningful.
What gets collected and what is ignored?
Define the labels, categories, and the taxonomies
They control what gets optimized, and who and what gets to use the data.
"Meaningful" data is equated with big data. But once datasets reach a certain scale, they become harder to audit, and therefore more opaque.
Data Collection
Czech artist Jakub Geltner
What would tech look like if it were built from an imagination of care rather than control?
It is not enough to ask, "Is the dataset biased?"
One must also ask:
The Follower, 2023–2026
Paolo Cirio
ImageNet Roulette, 2019
Trevor Paglen & Kate Crawford
A web app that classified visitors' faces using ImageNet's "person" categories — exposing labels like "rape suspect", "alcoholic", and racial slurs that the dataset had quietly been using for a decade.
"Classification systems are political, and they have consequences."
— Kate Crawford, Atlas of AI
data → labels → distribution → norms → policy
What you shoot (and what you choose not to shoot) shapes class balance and representation.
CV models and retrieval systems are deeply influenced by text. Your captions, keywords, and metadata shape how machines interpret images.
Photographers influence what enters the pipeline — and how it gets compressed, cropped, and filtered before a model ever sees it.
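A minimal sketch of that compression step, assuming a toy 4x4 "image" of pixel values: a center crop silently discards the borders before the model ever sees them. Real pipelines also resize and recompress, but the principle is the same.

```python
def center_crop(pixels, size):
    """Keep only the central size x size window; everything else is discarded."""
    h, w = len(pixels), len(pixels[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in pixels[top:top + size]]

# A 4x4 toy "image" with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped = center_crop(image, 2)
print(cropped)  # the border pixels never reach the model
```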
What is considered a "correct" detection, a "beautiful" image, or a "faithful" reproduction reflects photographic conventions — conventions you help define.
If you're working with a lab, company, or museum digitization effort, you can influence system design directly — from dataset curation to annotation guidelines to what counts as ground truth.
Let's build a Surveillance State!
* All of the media shown here are deepfakes.