HKU 2026 Workshop
Ahnjili ZhuParris
Focusing on Medical Surveillance
Artificial intelligence names a broad field in which machines perform tasks that mimic human intelligence.
Computer vision (CV) extracts visual information such as edges, textures, and colors.
Model → Predictions → Mario Probability / Luigi Probability
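The step from raw model scores to class probabilities can be sketched in a few lines. This is a minimal, dependency-free illustration; the Mario/Luigi classifier and its logit values are hypothetical, and real models apply the same softmax over thousands of classes.

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a Mario-vs-Luigi classifier head.
logits = [2.0, 0.5]  # [Mario, Luigi]
probs = softmax(logits)
print({"Mario": round(probs[0], 3), "Luigi": round(probs[1], 3)})
```

Whatever the input, the output is forced into the label set the designers chose: the model can only ever answer "Mario" or "Luigi".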



Modern CV models don't just detect — they describe. Models like CLIP and ViT-GPT2 generate captions and classify images using language.
Their vocabulary is inherited from millions of image-caption pairs scraped from the internet. The biases, assumptions, and blind spots of those captions become the model's worldview.
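The mechanics can be shown with a toy sketch of CLIP-style zero-shot classification. The three-number "embeddings" below are made up (real CLIP vectors have hundreds of dimensions), but the key behavior is real: the model picks the closest caption from the vocabulary it was given, and nothing outside that vocabulary can ever be said.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for learned embeddings (all values hypothetical).
image_embedding = [0.9, 0.1, 0.3]   # a photo of a scientist in a lab
label_embeddings = {                # the only "words" this model knows
    "a woman posing for a picture": [0.8, 0.2, 0.3],
    "a dog on a beach":             [0.1, 0.9, 0.2],
}

# Zero-shot classification: pick the caption whose embedding is closest.
best = max(label_embeddings, key=lambda lbl: cosine(image_embedding, label_embeddings[lbl]))
print(best)  # "scientist" never appears; it is not in the vocabulary
```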
An image captioning model describes a photo of a woman in a lab:
"A woman posing for a picture"
The machine saw woman and posing. It missed scientist, researcher, laboratory.
ImageNet — the dataset that powered the deep learning revolution — contained 2,832 subcategories under "person." Many were racial slurs, sexual orientations, and derogatory terms.
Every label is a simplification. When a model classifies your face as "happy" or "threatening," it is mapping your pixels onto a category that someone, somewhere, decided was meaningful.
What gets collected and what is ignored?
Define the labels, categories, and the taxonomies
They control what gets optimized, and who and what gets to use the data.
"Meaningful" data is equated with big data. But once datasets reach a certain scale, they become harder to audit, and therefore more opaque.
Data Collection
Czech artist Jakub Geltner
What would tech look like if it were built from an imagination of care rather than control?
It is not enough to ask, "Is the dataset biased?"
One must also ask:
The Follower, 2023–2026
Paolo Cirio
ImageNet Roulette, 2019
Trevor Paglen & Kate Crawford
A web app that classified visitors' faces using ImageNet's "person" categories — exposing labels like "rape suspect", "alcoholic", and racial slurs that the dataset had quietly been using for a decade.
"Classification systems are political, and they have consequences."
— Kate Crawford, Atlas of AI
data → labels → distribution → norms → policy
What you shoot (and what you choose not to shoot) shapes class balance and representation.
CV models and retrieval systems are deeply influenced by text. Your captions, keywords, and metadata shape how machines interpret images.
Photographers influence what enters the pipeline — and how it gets compressed, cropped, and filtered before a model ever sees it.
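A minimal sketch of that compression step, assuming a toy 4x4 "image" of pixel values: a center crop silently discards the borders before the model ever sees them. Real pipelines also resize and recompress, but the principle is the same.

```python
def center_crop(pixels, size):
    """Keep only the central size x size window; everything else is discarded."""
    h, w = len(pixels), len(pixels[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in pixels[top:top + size]]

# A 4x4 toy "image" with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped = center_crop(image, 2)
print(cropped)  # the border pixels never reach the model
```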
What is considered a "correct" detection, a "beautiful" image, or a "faithful" reproduction reflects photographic conventions — conventions you help define.
If you're working with a lab, company, or museum digitization effort, you can influence system design directly — from dataset curation to annotation guidelines to what counts as ground truth.
Let's build a Surveillance State!
* All of the media shown here are deepfakes.