The dominant framework takes as input one or two groups of images manually annotated with pixel-level and category labels (e.g., Apple and Horse), and then uses these supervisory signals to train a model in a ...