Computer Vision System Marries Image Recognition and Generation

Computers possess two remarkable capabilities with respect to images: They can both identify them and generate them anew. Historically, these functions have stood separate, akin to the disparate acts of a chef who is good at creating dishes (generation), and a connoisseur who is good at tasting dishes (recognition).

Yet, one can’t help but wonder: What would it take to orchestrate a harmonious union between these two distinctive capacities? Both chef and connoisseur share a common understanding in the taste of the food. Similarly, a unified vision system requires a deep understanding of the visual world.

Now, researchers in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have trained a system to infer the missing parts of an image, a task that requires deep comprehension of the image’s content. In successfully filling in the blanks, the system, known as the Masked Generative Encoder (MAGE), achieves two goals at the same time: accurately identifying images and creating new ones with striking resemblance to reality.

Blog