DALL·E and CLIP: Two AI Models That Can Recognize and Produce Text and Images
Published 2021.02.08
Artificial intelligence (AI) neural networks such as DALL·E and CLIP can create images from text descriptions and classify images using ordinary language.
  • OpenAI, an AI research and deployment company, released DALL·E and CLIP in early January. OpenAI was co-founded by Elon Musk, among others, and counts major investors such as Microsoft. The company's mission is to create safe artificial general intelligence that "benefits all of humanity".
    ▲ DALL·E and GPT-3/Image credit to OpenAI

    DALL·E and CLIP are both neural networks: series of algorithms, loosely modeled on the neurons of the human brain, that recognize relationships and patterns in sets of data. Like the brain, a neural network learns through trial and error: it analyzes the data it is fed, makes predictions, measures how wrong they are, and adjusts itself to make fewer mistakes.
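
    To make "learning through trial and error" concrete, here is a minimal Python sketch of a single artificial neuron (a perceptron) learning the logical AND function. It is a toy under simple assumptions, not how DALL·E or CLIP are actually trained; the function names and the AND example are chosen only for illustration. Each time the neuron guesses wrong, it nudges its weights toward the right answer.

```python
# Toy example (hypothetical, for illustration only): one artificial neuron,
# a perceptron, learns logical AND by trial and error.
# Integer weights keep the arithmetic exact.


def step(x: int) -> int:
    """Fire (return 1) only if the weighted sum is above zero."""
    return 1 if x > 0 else 0


def predict(w, b, x1, x2):
    """Weighted sum of the two inputs plus a bias, passed through step()."""
    return step(w[0] * x1 + w[1] * x2 + b)


def train_perceptron(samples, epochs=20):
    w = [0, 0]  # one weight per input, starting from zero
    b = 0       # bias shifts the firing threshold
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(w, b, x1, x2)  # 0 if right, +1 or -1 if wrong
            # Adjust only when the guess was wrong, in the direction of the error.
            w[0] += error * x1
            w[1] += error * x2
            b += error
    return w, b


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_perceptron(AND_DATA)
    for (x1, x2), target in AND_DATA:
        print(f"{x1} AND {x2} -> {predict(w, b, x1, x2)} (expected {target})")
```

    After a few passes over the four examples the neuron stops making mistakes, which is the same guess, check, and adjust cycle that much larger networks repeat over millions of examples.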

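    DALL·E is built on GPT-3, an OpenAI technology whose name stands for Generative Pre-trained Transformer 3 (the third version): DALL·E is a 12-billion-parameter version of GPT-3 trained on pairs of text and images. CLIP likewise uses a Transformer network to process text. GPT-3 is a deep learning model that can generate almost anything with a language structure: it can write essays, answer questions, or even write computer code. It does so by processing the input text with a Transformer and predicting, one small piece of text (a token) at a time, the most likely continuation (the output).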
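
    GPT-3's core job, predicting what comes next, can be illustrated with a deliberately tiny stand-in. The Python sketch below is a hypothetical bigram model that simply counts which word follows which in a short sample text, then generates text greedily, one predicted word at a time. GPT-3 replaces the counting with a huge Transformer network and works on tokens rather than whole words, so treat this only as an analogy for the predict-the-next-piece loop.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

# Hypothetical toy "language model": for each word, count which words follow
# it in the training text. GPT-3 learns far richer patterns with a Transformer.


def build_bigram_model(text: str) -> Dict[str, Counter]:
    words = text.lower().split()
    model: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the word most often seen after `word`, or None if never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    # Ties are broken by whichever follower appeared first in the text.
    return followers.most_common(1)[0][0]


def generate(model: Dict[str, Counter], start: str, max_words: int = 5) -> str:
    """Greedily extend `start` one predicted word at a time."""
    output = [start.lower()]
    for _ in range(max_words):
        nxt = predict_next(model, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ate the fish"
    model = build_bigram_model(corpus)
    print(predict_next(model, "the"))
    print(generate(model, "the", max_words=3))
```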

    ▲ CLIP and DALL·E generating an image from text/ Image credit to OpenAI

    So far, DALL·E can create plausible images from a wide range of sentences. It relies on the compositional structure of the sentence, such as which objects are named and how they relate to one another, to generate images. The key takeaway is that the quality of DALL·E's results depends heavily on how the text is written.

    CLIP is special in its own right because its neural network can recognize images and predict their correct labels by choosing the text description that best matches each image. CLIP's real-world and future implications are significant, mainly because of its accuracy in object recognition, although CLIP struggles with more abstract or systematic tasks such as counting the number of objects in an image. Companies could plausibly use CLIP for image recognition in web search or for pedestrian and object recognition in self-driving cars.
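
    CLIP's label prediction can be sketched as a similarity search. CLIP turns an image and each candidate label (written as a short caption, for example "a photo of a dog") into vectors in a shared space, then picks the label whose vector points most nearly in the same direction as the image's. In the Python sketch below, the three-number vectors are made-up placeholders standing in for the outputs of CLIP's image and text encoders, and the scale factor of 100 is an assumption; only the scoring step (cosine similarity followed by softmax) is shown.

```python
import math

# Hypothetical 3-number embeddings standing in for CLIP encoder outputs.
IMAGE_EMBEDDING = [0.9, 0.1, 0.2]
LABEL_EMBEDDINGS = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, label_embs, scale=100.0):
    """Rank candidate labels by how well their text matches the image."""
    labels = list(label_embs)
    sims = [cosine_similarity(image_emb, label_embs[label]) for label in labels]
    probs = softmax([scale * s for s in sims])  # scale is an assumed value
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for label, prob in zero_shot_classify(IMAGE_EMBEDDING, LABEL_EMBEDDINGS):
        print(f"{label}: {prob:.3f}")
```

    Because only the list of candidate captions changes, the same model can be pointed at new labels without retraining, which is why this style of classification is called "zero-shot".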

    For more in-depth information on DALL·E and CLIP, please refer to OpenAI’s website.