This paper demonstrates that a representation which balances natural image encoding with metabolic energy efficiency shows many similarities to the neural organisation observed in the early visual system. A simple linear model was constructed that learned receptive fields by optimally balancing information coding with metabolic expense for an entire visual field in a two-stage visual system. The input to the model consisted of a space-variant retinal array of photoreceptors. Natural images were first encoded through a bottleneck analogous to the retinal ganglion cells that form the optic nerve. The natural images represented by the activity of the retinal ganglion cells were then encoded by many more ‘cortical’ cells in a divergent representation. Qualitatively, the system learnt by jointly optimising information coding and energy expenditure matched (1) the centre-surround organisation of retinal ganglion cells; (2) the Gabor-like organisation of cortical simple cells; (3) higher densities of receptive fields in the fovea, decreasing in the periphery; (4) smaller receptive fields in the fovea, increasing in size in the periphery; (5) spacing ratios of retinal cells; and (6) aspect ratios of cortical receptive fields. Quantitatively, however, there are small but significant discrepancies between the density slopes of the model and those observed biologically, which may be explained by taking optic blur and fixation-induced image statistics into account. In addition, the model cortical receptive fields are more broadly tuned than biological cortical neurons; this may be a consequence of the computational limitation of modelling a relatively low number of neurons. This paper shows that retinal receptive field properties can be understood in terms of balancing coding with synaptic energy expenditure, and cortical receptive field properties in terms of balancing coding with firing-rate energy expenditure. It thereby provides a sound biological explanation of why ‘sparse’ distributions are beneficial.