S5Mars: Self-Supervised and Semi-Supervised Learning for Mars Segmentation

Jiahang Zhang *     Lilang Lin *     Zejia Fan     Wenjing Wang     Jiaying Liu

* indicates equal contribution.

Wangxuan Institute of Computer Technology, Peking University, Beijing.

Figure 1. Examples of each label category (highlighted in red). Our dataset includes 9 categories and is sparsely annotated.


Deep learning has become a powerful tool for Mars exploration. Mars terrain segmentation is an important Martian vision task that underpins autonomous planning and safe driving for rovers. However, existing deep-learning-based terrain segmentation methods face two problems: a lack of sufficiently detailed, high-confidence annotations, and an over-reliance of models on annotated training data. In this paper, we address both problems through a joint design of data and method. We first present a new Mars terrain segmentation dataset that contains 6K high-resolution images and is sparsely annotated based on confidence, ensuring high label quality. To learn from this sparse data, we then propose a representation-learning-based framework for Mars terrain segmentation, consisting of a self-supervised learning stage (for pre-training) and a semi-supervised learning stage (for fine-tuning). Specifically, for self-supervised learning, we design a multitask mechanism based on the masked image modeling (MIM) concept to emphasize the texture information of images. For semi-supervised learning, since our dataset is sparsely annotated, we encourage the model to exploit the information in the unlabeled areas of each image by generating and utilizing pseudo-labels online. We name our dataset and method Self-Supervised and Semi-Supervised Segmentation for Mars (S5Mars). Experimental results show that our method outperforms state-of-the-art approaches and improves terrain segmentation performance by a large margin.
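The MIM-based multitask pretext task can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes square patch masking at a fixed ratio and a mean-squared-error loss computed on masked pixels only, and it uses the per-pixel gradient magnitude of the grayscale image as a hypothetical stand-in for the paper's texture features. The names `random_patch_mask`, `texture_target`, `mim_multitask_loss` and the weight `lam` are illustrative; the network's predictions are simply passed in as arrays.

```python
import numpy as np


def random_patch_mask(h, w, patch, ratio, rng):
    """Boolean (h, w) mask; True marks masked pixels.

    The image is split into non-overlapping patch x patch blocks and a
    fraction `ratio` of the blocks is chosen at random. Assumes h and w
    are multiples of `patch`.
    """
    gh, gw = h // patch, w // patch
    n_blocks = gh * gw
    chosen = rng.choice(n_blocks, size=int(round(ratio * n_blocks)), replace=False)
    block_mask = np.zeros(n_blocks, dtype=bool)
    block_mask[chosen] = True
    block_mask = block_mask.reshape(gh, gw)
    # Upsample the block-level mask to pixel resolution.
    return np.repeat(np.repeat(block_mask, patch, axis=0), patch, axis=1)


def texture_target(img):
    """Hypothetical texture descriptor: gradient magnitude of the grayscale image."""
    gray = img.mean(axis=-1)
    gy, gx = np.gradient(gray)
    return np.sqrt(gx ** 2 + gy ** 2)


def mim_multitask_loss(img, pred_pixels, pred_texture, mask, lam=1.0):
    """Pixel-reconstruction MSE plus weighted texture MSE, on masked pixels only."""
    pixel_loss = np.mean((pred_pixels[mask] - img[mask]) ** 2)
    texture_loss = np.mean((pred_texture[mask] - texture_target(img)[mask]) ** 2)
    return float(pixel_loss + lam * texture_loss)


rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
mask = random_patch_mask(32, 32, patch=8, ratio=0.75, rng=rng)
# A prediction that is exact on masked pixels but wrong elsewhere still gets
# zero loss, because only the masked regions are supervised.
pred = img.copy()
pred[~mask] += 1.0
print(mask.sum(), mim_multitask_loss(img, pred, texture_target(img), mask))  # 768 0.0
```

Supervising only the masked regions is what forces the model to infer missing content from context; adding a texture target alongside raw pixels is one way to push the representation toward the surface texture cues that distinguish terrain types such as sand, soil, and bedrock.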



    title={S^5Mars: Self-Supervised and Semi-Supervised Learning for Mars Segmentation},
    author={Zhang, Jiahang and Lin, Lilang and Fan, Zejia and Wang, Wenjing and Liu, Jiaying},
    journal={arXiv preprint arXiv:2207.01200},


The S5Mars dataset provides rich geomorphological data for terrain semantic segmentation, which can guide rovers and support space research missions. The dataset includes 6,000 high-resolution images captured on the Martian surface by the color Mast Camera (Mastcam) of the Curiosity rover (Mars Science Laboratory, MSL). The spatial resolution of the RGB images is 1200 × 1200 pixels. The dataset is annotated at the pixel level in a deterministic sparse labeling style: only regions whose category can be determined with high confidence are annotated, and the rest are left unlabeled. There are 9 label categories: sky, ridge, soil, sand, bedrock, rock, rover, trace, and hole.
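Because many pixels carry no label, a supervised loss on this data has to skip them. The sketch below shows a pixel-wise cross-entropy that averages only over annotated pixels. The class order follows the list above, but the integer encoding and the marker value `IGNORE_INDEX = 255` for unlabeled pixels are assumptions, not the dataset's documented format.

```python
import numpy as np

CLASSES = ["sky", "ridge", "soil", "sand", "bedrock", "rock", "rover", "trace", "hole"]
IGNORE_INDEX = 255  # hypothetical marker for unannotated pixels


def sparse_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over annotated pixels only.

    logits: (H, W, C) unnormalized class scores; labels: (H, W) class indices,
    with `ignore_index` at pixels that were left unlabeled.
    """
    valid = labels != ignore_index
    if not valid.any():
        return 0.0
    z = logits[valid]                        # (N, C) scores of annotated pixels
    y = labels[valid].astype(np.int64)       # (N,) their ground-truth classes
    z = z - z.max(axis=1, keepdims=True)     # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())


labels = np.full((4, 4), IGNORE_INDEX, dtype=np.uint8)
labels[0, 0] = CLASSES.index("sand")
logits = np.zeros((4, 4, len(CLASSES)))
logits[3, 3, 0] = 100.0  # scores at an unlabeled pixel do not affect the loss
print(sparse_cross_entropy(logits, labels))  # log(9), about 2.197
```

Deep-learning frameworks typically provide the same behavior through an ignore-index option on their cross-entropy loss; the explicit version just makes the masking visible.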

Figure 2. Numerical statistics of the S5Mars dataset. The plots show the richness of the categories contained in the images from two aspects: the distribution of the number of labels and the distribution of label area.
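The two statistics in the caption can be computed from the label maps roughly as follows. This sketch assumes that "number of labels" means the number of distinct categories annotated in an image and that "label area" is each category's share of all annotated pixels; both readings, the function name `label_statistics`, and `IGNORE_INDEX = 255` are assumptions.

```python
import numpy as np

NUM_CLASSES = 9
IGNORE_INDEX = 255  # hypothetical marker for unannotated pixels


def label_statistics(label_maps):
    """Return (categories_per_image, area_fraction_per_class).

    categories_per_image[i] is the number of distinct categories annotated in
    image i; area_fraction_per_class[c] is the share of all annotated pixels
    in the dataset that belong to class c.
    """
    per_image = []
    area = np.zeros(NUM_CLASSES, dtype=np.int64)
    for lab in label_maps:
        annotated = lab[lab != IGNORE_INDEX].astype(np.int64)
        per_image.append(len(np.unique(annotated)))
        area += np.bincount(annotated, minlength=NUM_CLASSES)
    total = area.sum()
    fractions = area / total if total > 0 else np.zeros(NUM_CLASSES)
    return np.array(per_image), fractions


maps = [np.array([[0, 0], [1, 255]]), np.array([[2, 2], [2, 2]])]
counts, frac = label_statistics(maps)
print(counts.tolist(), frac[:3])
```

Histogramming `counts` gives the first distribution; `frac` (or per-image fractions, if preferred) gives the second.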


Figure 3. The framework of our method for Mars image segmentation. The framework consists of two stages: a pre-training stage, in which representations are learned through a self-supervised pretext task, and a semi-supervised fine-tuning stage based on pseudo-label fusion. In the self-supervised stage, the network predicts both the raw pixel values and the texture features of the masked image regions, which drives it to learn effective feature representations. In the semi-supervised fine-tuning stage, we introduce task uncertainty to generate and select high-quality pseudo-labels, making full use of the supervisory information in the unlabeled areas of the data.
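The pseudo-label fusion step described in the caption can be sketched as follows. This is not the paper's task-uncertainty criterion: as a plainly swapped-in stand-in, it measures uncertainty with the normalized per-pixel entropy of the model's softmax output and keeps only predictions below a hypothetical threshold `max_uncertainty`. Annotated pixels always keep their ground truth. `IGNORE_INDEX = 255` and the function names are assumptions.

```python
import numpy as np

IGNORE_INDEX = 255  # hypothetical marker for unannotated pixels


def normalized_entropy(probs):
    """Per-pixel entropy of (H, W, C) class probabilities, scaled to [0, 1]."""
    c = probs.shape[-1]
    return -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=-1) / np.log(c)


def fuse_pseudo_labels(probs, labels, max_uncertainty=0.3, ignore_index=IGNORE_INDEX):
    """Merge sparse ground truth with confident pseudo-labels.

    Annotated pixels keep their ground-truth class. Unannotated pixels take
    the model's argmax class only if its normalized entropy is at most
    `max_uncertainty`; all other pixels remain `ignore_index`.
    """
    pseudo = probs.argmax(axis=-1)
    confident = normalized_entropy(probs) <= max_uncertainty
    fused = labels.copy()
    fill = (labels == ignore_index) & confident
    fused[fill] = pseudo[fill]
    return fused


probs = np.full((1, 3, 9), 1.0 / 9)   # pixel 2: uniform, maximally uncertain
probs[0, 0] = np.eye(9)[4]            # pixel 0: confident, but already annotated
probs[0, 1] = np.eye(9)[4]            # pixel 1: confident and unannotated
labels = np.array([[7, IGNORE_INDEX, IGNORE_INDEX]])
print(fuse_pseudo_labels(probs, labels).tolist())  # [[7, 4, 255]]
```

To make this "online", `probs` would be recomputed from the current model at every training iteration, and the fused map would then be trained with a loss that skips pixels still marked as ignored.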