Human perception naturally decomposes a scene into objects and background. Our model, SPACE, provides a unified probabilistic framework for modeling scenes with multiple objects and complex backgrounds. Combining the strengths of previous approaches (namely mixture-scene models and spatial-attention models), SPACE explicitly provides a factorized representation for each foreground object while also decomposing the background into segments of complex morphology. With the proposed parallel-spatial attention, SPACE resolves the scalability problem of previous methods, making the model applicable to scenes with a much larger number of objects without performance degradation.