Doodle Your Keypoints

Sketch-Based Few-Shot Keypoint Detection

¹Department of Computer Science, University of Central Florida ²SketchX, CVSSP, University of Surrey, United Kingdom

Abstract

Keypoint detection, integral to modern machine perception, faces challenges in few-shot learning, particularly when source data from the same distribution as the query is unavailable. This gap is addressed by leveraging sketches, a popular form of human expression, providing a source-free alternative. However, challenges arise in mastering cross-modal embeddings and handling user-specific sketch styles. Our proposed framework overcomes these hurdles with a prototypical setup, combined with a grid-based locator and prototypical domain adaptation. We also demonstrate success in few-shot convergence across novel keypoints and classes through extensive experiments.

Architecture

Overview of the proposed few-shot keypoint detection framework, which takes sketches or edgemaps as the support set and photos as the query set. An encoder extracts deep feature maps, from which keypoint embeddings are derived by Gaussian pooling. Support keypoint prototypes are built by averaging the keypoint embeddings after the de-stylization network has disentangled style information. Each prototype is correlated with the query feature map by point-to-point multiplication; a descriptor network then forms a query descriptor from the result, which the grid-based locator (GBL) module uses for localization.

Design of the de-stylization network \(Z\), which disentangles style by fusing global context into the local keypoint embeddings.
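The pooling, prototype-averaging, and correlation steps named in the overview can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the encoder is replaced by random feature maps, the de-stylization network, descriptor network, and GBL module are omitted, and the function names (`gaussian_pool`, `build_prototype`, `correlate`), the `sigma` parameter, and all tensor shapes are assumptions made for illustration.

```python
import numpy as np


def gaussian_pool(feat, kp_xy, sigma=1.0):
    """Pool a (C, H, W) feature map into a (C,) keypoint embedding.

    Weights follow a 2D Gaussian centred on the keypoint (x, y), given in
    feature-map coordinates. sigma is a hypothetical parameter.
    """
    _, height, width = feat.shape
    ys, xs = np.mgrid[0:height, 0:width]
    x0, y0 = kp_xy
    weights = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))
    weights /= weights.sum()  # normalise so the weights sum to 1
    return (feat * weights).sum(axis=(1, 2))


def build_prototype(support_feats, support_kps, sigma=1.0):
    """Average the keypoint embeddings of all support shots.

    In the described framework the embeddings are de-stylized first;
    that step is skipped in this sketch.
    """
    embeddings = [
        gaussian_pool(feat, kp, sigma)
        for feat, kp in zip(support_feats, support_kps)
    ]
    return np.mean(embeddings, axis=0)


def correlate(prototype, query_feat):
    """Multiply the (C,) prototype with every location of a (C, H, W) map."""
    return prototype[:, None, None] * query_feat


# Stand-in encoder outputs: 3 support shots and 1 query, 8 channels, 16x16.
rng = np.random.default_rng(0)
support_feats = rng.standard_normal((3, 8, 16, 16))
support_kps = [(4.0, 5.0), (6.0, 5.5), (5.0, 7.0)]  # same keypoint, per shot
query_feat = rng.standard_normal((8, 16, 16))

proto = build_prototype(support_feats, support_kps)
corr = correlate(proto, query_feat)
print(proto.shape, corr.shape)
```

In this sketch the correlated map keeps the channel dimension, so a later module (the descriptor network in the overview) would still have to reduce it to a descriptor before localization.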

Dataset Visualization

Sample support and query images for training (top) and evaluation (bottom) across all evaluation settings. The support set consists of edgemaps obtained by applying edge detectors to the photos.

Results

			PCK@0.1 on Animal Pose						PCK@0.1 on Animal Kingdom
Class	Keypoints	Methods	Cat	Cow	Dog	Horse	Sheep	Mean	Mammal	Amphibian	Reptile	Bird	Fish	Mean
Seen	Base	B-Vanilla	54.12	39.27	44.65	45.58	37.17	44.16	22.32	21.07	18.94	21.77	18.19	20.46
		FSKD	58.95	44.61	49.53	50.21	40.47	48.75	25.52	24.42	21.81	25.73	22.24	23.94
		Proposed	67.34	49.89	56.28	56.35	45.65	55.10	31.31	29.93	28.41	30.88	27.87	29.68
	Novel	B-Vanilla	24.70	15.62	19.08	12.45	18.44	18.06	9.67	7.24	6.55	8.41	4.96	7.37
		FSKD	47.70	35.44	39.81	35.42	31.59	37.99	18.89	17.39	16.72	19.23	15.94	17.63
		Proposed	55.69	43.09	46.58	43.94	36.39	45.14	25.05	23.27	22.56	24.04	21.78	23.34
Unseen	Base	B-Vanilla	43.03	40.31	36.28	44.72	38.03	40.47	12.83	14.12	12.57	13.68	12.41	13.12
		FSKD	41.54	38.10	33.72	41.02	36.30	38.14	14.86	14.28	13.79	15.65	12.16	14.15
		Proposed	47.36	42.97	38.30	46.17	41.03	43.17	21.98	20.15	18.96	21.52	17.19	19.96
	Novel	B-Vanilla	22.71	15.56	16.92	13.58	18.18	17.39	7.26	5.12	3.93	5.69	4.08	5.22
		FSKD	36.75	35.76	32.84	32.66	31.58	33.92	10.96	9.34	9.68	11.45	8.86	10.06
		Proposed	44.42	40.13	36.91	37.77	35.77	39.00	16.48	14.62	13.76	15.91	11.33	14.42
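For readers unfamiliar with the metric, the following is a minimal sketch of how a percentage-of-correct-keypoints (PCK) score is commonly computed. It assumes the scores in the table are standard PCK values, that a prediction counts as correct when it lies within `alpha` times the longer bounding-box side of the ground truth, and that `alpha = 0.1`; the page does not spell out these details, and the function name `pck` and its arguments are hypothetical.

```python
import numpy as np


def pck(pred, gt, bbox_wh, alpha=0.1, visible=None):
    """Percentage of keypoints predicted within a distance threshold.

    pred, gt : (K, 2) arrays of (x, y) keypoint coordinates.
    bbox_wh  : (width, height) of the object bounding box.
    alpha    : fraction of the longer box side used as threshold (assumed).
    visible  : optional (K,) boolean mask of keypoints to score.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)
    threshold = alpha * max(bbox_wh)
    correct = dist <= threshold
    if visible is not None:
        correct = correct[np.asarray(visible, dtype=bool)]
    if correct.size == 0:
        return float("nan")  # nothing to score
    return 100.0 * float(correct.mean())


# Toy example: the threshold is 0.1 * 100 = 10 px, and the three
# predictions are 2 px, 2 px, and 20 px away from the ground truth.
pred = [[10, 10], [50, 52], [90, 40]]
gt = [[12, 10], [50, 50], [70, 40]]
score = pck(pred, gt, bbox_wh=(100, 80))
print(round(score, 2))
```

Here two of the three keypoints fall within the threshold, so the score is two thirds of 100.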

Results on Real Sketches

			PCK@0.1 on Animal Pose
Class	Keypoints	Support	Cat	Cow	Dog	Horse	Sheep	Mean
Seen	Base	Edgemap	67.34	49.89	56.28	56.35	45.65	55.10
	Base	Sketch	66.69	45.79	55.43	56.13	43.40	53.29
	Novel	Edgemap	55.69	43.09	46.58	43.94	36.39	45.14
	Novel	Sketch	55.45	42.96	46.35	43.88	36.31	44.99
Unseen	Base	Edgemap	47.36	42.97	38.30	46.17	41.03	43.17
	Base	Sketch	45.90	42.47	37.82	45.36	40.45	42.40
	Novel	Edgemap	44.42	40.13	36.91	37.77	35.77	39.00
	Novel	Sketch	43.79	39.91	36.17	37.56	35.02	38.49

Despite being trained only on synthetic sketches or edgemaps, our framework generalizes well to real sketches: across the four settings, the mean score with real sketch support drops by at most 1.81 points relative to edgemap support.

Predicted keypoints (⨯) and ground truth (●) for base (top) and novel (bottom) keypoints, using annotated sketches as support prompts.

Photo-based Few-shot Keypoint Detection

			PCK@0.1
Class	Keypoints	Method	Cat	Cow	Dog	Horse	Sheep	Mean
Seen	Base	FSKD	68.66	52.70	59.24	58.53	45.04	56.83
		Ours	66.97	51.38	57.72	57.31	43.81	55.44
		Ours (MM)	80.16	61.34	73.70	67.44	57.85	68.10
	Novel	FSKD	60.84	47.78	53.44	49.21	38.47	49.95
		Ours	59.17	46.49	51.89	47.93	37.65	48.63
		Ours (MM)	67.51	49.92	59.05	53.06	43.45	54.60
Unseen	Base	FSKD	56.38	48.24	51.29	49.77	43.95	49.93
		Ours	55.67	46.94	50.47	48.21	42.88	48.83
		Ours (MM)	57.68	52.06	51.75	52.27	47.74	52.30
	Novel	FSKD	52.36	44.07	47.94	42.77	36.60	44.75
		Ours	50.88	43.34	46.67	42.52	35.19	43.72
		Ours (MM)	54.61	45.92	48.02	43.86	40.31	46.54

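Quantitative comparison on query photos: FSKD and our method with photo-only support, and our method with both edgemaps and photos as support (MM). With photo-only support our method scores slightly below FSKD, while adding a second modality such as sketches or edgemaps alongside the photos in the support set provides extra guidance and gives the best score in every reported setting (for example, the mean on seen classes and base keypoints rises from 55.44 to 68.10).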

BibTeX

@inproceedings{maity2025dykp,
  title={Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection},
  author={Subhajit Maity and Ayan Kumar Bhunia and Subhadeep Koley and Pinaki Nath Chowdhury and Aneeshan Sain and Yi-Zhe Song},
  booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}