Introduction to Computer Vision & AI

HKU 2026 Workshop

Ahnjili ZhuParris

Outline

Part 1: Intro to Computer Vision

Part 2: Intro to Generative AI & Deepfakes

Who Am I

Ex-Academic

Focusing on Medical Surveillance

AI Engineer (Computer Vision)

AI Engineer (Media Researcher)

Computational Artist

What is AI

Artificial intelligence is a field of study in which machines are built to perform tasks that mimic human intelligence.

Why AI

Why AI Now

Introduction to CV

Computer Vision Prep

Image Acquisition & Preprocessing

  • Capture an image or video
  • Adjust image size, lighting, or cropping
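
The adjustments listed above can be sketched in a few lines. This is a minimal illustration, assuming a grayscale image stored as a list of rows with pixel values from 0 to 255; the function names (crop, resize_nearest, adjust_brightness, preprocess) and the specific crop window and brightness offset are hypothetical choices for this example, not part of any particular library.

```python
def crop(img, top, left, height, width):
    """Keep only the rectangle starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def resize_nearest(img, new_h, new_w):
    """Resize by copying the nearest source pixel (nearest-neighbour)."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def adjust_brightness(img, delta):
    """Add delta to every pixel, clipping to the valid 0..255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def preprocess(img):
    """Hypothetical pipeline: crop the centre, upscale to 4x4, brighten."""
    cropped = crop(img, top=1, left=1, height=2, width=2)
    resized = resize_nearest(cropped, 4, 4)
    return adjust_brightness(resized, 40)
```

Every one of these steps is a decision about what the model gets to see: the crop discards context, the resize discards detail, and the brightness change shifts how light and dark regions are represented.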

Feature Extraction

CV extracts information such as edges, textures, colors, etc.
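
As one concrete example of edge extraction, here is a minimal sketch of the classic Sobel filter, assuming the same kind of grayscale image (a list of rows of 0..255 values). The function name sobel_edges is a hypothetical choice for this illustration; modern models learn their own filters rather than using a fixed one like this.

```python
import math

# Sobel kernels: GX responds to left/right changes, GY to up/down changes.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img):
    """Return the edge strength per pixel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    pixel = img[r + dr][c + dc]
                    gx += GX[dr + 1][dc + 1] * pixel
                    gy += GY[dr + 1][dc + 1] * pixel
            # Gradient magnitude: large where brightness changes sharply.
            out[r][c] = math.hypot(gx, gy)
    return out


# A dark left half next to a bright right half: one vertical edge.
image = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
edges = sobel_edges(image)
```

In this toy image the flat regions score 0 and only the pixels on either side of the dark/bright boundary get a high value, which is what "an edge" means to the machine.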

Training the AI

Model → Predictions → Mario Probability / Luigi Probability
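
The last step of that diagram, turning a model's raw scores into a Mario probability and a Luigi probability, is commonly done with a softmax function. A minimal sketch follows; the scores are made-up numbers for illustration, not the output of a real model.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps exp() numerically stable.
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical raw scores a trained model might output for one image.
scores = {"Mario": 2.0, "Luigi": 0.5}
probs = softmax(scores)

# The prediction is simply the label with the highest probability.
prediction = max(probs, key=probs.get)
```

Note that the probabilities always sum to 1 across the labels the model was given. Show this model a photo of someone who is neither Mario nor Luigi and it will still split 100% between the two.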

Computer Vision Applications

Object Recognition

Segmentation

3D Rendering

The Role of Language in CV

When Machines Name Things

But whose language?

Modern CV models don't just detect; they describe. Models like ViT-GPT2 generate captions, while models like CLIP classify images by matching them against text labels.

Their vocabulary is inherited from millions of image-caption pairs scraped from the internet. The biases, assumptions, and blind spots of those captions become the model's worldview.

Example

An image captioning model describes a photo of a woman in a lab:

"A woman posing for a picture"

The machine saw woman and posing. It missed scientist, researcher, laboratory.
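
To make the vocabulary problem concrete, here is a minimal sketch of label matching, the approach CLIP-style models use for classification: the image and each candidate text are turned into vectors, and the text whose vector points in the most similar direction wins. The three-number vectors below are invented for illustration (real embeddings have hundreds of dimensions and come from trained encoders), and the function names are hypothetical. This is the simpler matching setup, not caption generation.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(image_vec, label_vecs):
    """Pick the label whose text vector is closest to the image vector."""
    return max(label_vecs, key=lambda label: cosine(image_vec, label_vecs[label]))


# Made-up embedding of the photo: (person, posing, laboratory).
image_vec = [0.9, 0.4, 0.8]

# A narrow vocabulary: the model can only answer with what it was given.
narrow = {
    "a woman posing for a picture": [1.0, 0.9, 0.0],
    "a dog in a park": [0.0, 0.1, 0.0],
}

# The same vocabulary with one more label available.
wider = dict(narrow)
wider["a scientist working in a laboratory"] = [1.0, 0.1, 1.0]

narrow_answer = classify(image_vec, narrow)
wider_answer = classify(image_vec, wider)
```

With the narrow vocabulary the closest available answer is "a woman posing for a picture"; once a laboratory label exists, the same image matches it instead. The image did not change, only the language the system was allowed to use.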

Labels Are Not Neutral

ImageNet's "Person" Categories

ImageNet — the dataset that powered the deep learning revolution — contained 2,832 subcategories under "person." Many were racial slurs, sexual orientations, and derogatory terms.

The act of labeling is an act of power

  • Who decides the categories?
  • What gets collapsed into a single label?
  • What is made invisible by the taxonomy?

Every label is a simplification. When a model classifies your face as "happy" or "threatening," it is mapping your pixels onto a category that someone, somewhere, decided was meaningful.

Politics of Datasets

Those who control the data control the narrative

They control Representation

What gets collected and what is ignored?

They control The Language

Define the labels, categories, and the taxonomies

They control The Outcome

They control what gets optimized, and who and what gets to use the data.

A dataset is treated as meaningful only once it is big. But once datasets reach a certain scale, they become harder to audit, and therefore more opaque.

Surveillance & Imagination

Surveillance as World Building

Data Collection

Czech artist Jakub Geltner

Machine Vision as Frozen Imagination

Violence

Frozen Imagination is Profitable

  • 1.5 million Uyghurs and Turkic Muslims
  • European & American Predictive Policing Software

Profit in 3 ways:

  • State contracts
  • Advanced Software
  • Cheap Labour

Frozen Imagination: Expression

Frozen Imagination: Health

Imagination as Abolitionist Counter-Tech

What would tech look like if it were built from an imagination of care rather than control?

Abolitionist Counter-Tech

Not enough to ask, "Is the dataset biased?"

One must also ask:

Self-Surveillance & Beauty

Training a Cosmetic AI: World Building

Criteria

  • No Textured Skin
  • Lighter Skin
  • Smaller nose
  • Bigger Eyes
  • Slimmer or Wider Chin
  • Fuller Lips

Training Photos

  • Celebrities
  • Pre/Post Surgeries

Training a Cosmetic AI

Frozen Imagination: The Algorithmic Ideal Face

Living in the Mirror World

When Computers Fail

Is AI better than pigeons?

When Computer Vision Fails

Creative Responses

CV Dazzle

Tomo Kihara

How Not to Get Hit By A Self-Driving Car

Tomo Kihara — Is this Violence? Am I too sexy?

Dries Depoorter

The Follower, 2023–2026

Capture

Paolo Cirio

Trevor Paglen & Kate Crawford

ImageNet Roulette, 2019

A web app that classified visitors' faces using ImageNet's "person" categories — exposing labels like "rape suspect", "alcoholic", and racial slurs that the dataset had quietly been using for a decade.

  • Went viral — millions of people classified themselves
  • Forced ImageNet to remove 600,000+ images from the "person" subtree
  • Proved that making the system visible is itself a form of resistance

"Classification systems are political, and they have consequences."

— Kate Crawford, Atlas of AI

Conclusion

data → labels → distribution → norms → policy

Data Control

What you shoot (and what you choose not to shoot) shapes class balance and representation.
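
Class balance is easy to measure once images have labels. A minimal sketch, assuming a hypothetical photo archive whose labels ("portrait", "street", "protest") are invented for this example:

```python
from collections import Counter


def class_balance(labels):
    """Return the share of the dataset that each label takes up."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical archive: ten photos, labelled by what was shot.
archive = ["portrait"] * 6 + ["street"] * 3 + ["protest"] * 1
balance = class_balance(archive)
```

Here portraits make up 60% of the archive and protests 10%. A model trained on it would see six times more examples of one than the other, a skew that comes from what was photographed, before any algorithm is involved.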

Label / Metadata Control

CV models and retrieval systems are deeply influenced by text. Your captions, keywords, and metadata shape how machines interpret images.

Pipeline Influence

Photographers influence what enters the pipeline — and how it gets compressed, cropped, and filtered before a model ever sees it.

Norm-Setting

What is considered a "correct" detection, a "beautiful" image, or a "faithful" reproduction reflects photographic conventions — conventions you help define.

Direct Technical Participation

If you're working with a lab, company, or museum digitization effort, you can influence system design directly — from dataset curation to annotation guidelines to what counts as ground truth.

Questions?

Workshop

DIY Surveillance State

Tomorrow

Let's build a Surveillance State!

Part 1: Intro to Computer Vision

Part 2: Intro to Generative AI & Deepfakes

DEEPFAKE — A Video Essay

* All of the media content here is deepfaked
