Mukul Pathak
2 min read · Oct 30, 2023

Lost in Translation: The Disconnect Between User Prompts and DALL-E3 Outputs

Overview

DALL-E3 passes each user prompt through a ChatGPT-like model that rewrites it before image generation. This mediating layer aims to keep generated images within ethical guardrails and to improve “understandability” for DALL-E3. Despite these objectives, the middle layer causes several issues: false expectations about the output, inaccurate output, and hallucinations that arise from repeated attempts to get the desired result.

Figure: Flow of how DALL-E3 takes user input

Benefits/What It Aims to Do

  1. Guardrails: Ensures that the images generated align with organizational and ethical guidelines.
  2. Enhance for Understandability: Refines the prompt for better comprehension by DALL-E3.

What’s Really Happening

  1. False Expectations: The user prompt is not the same as the GPT-enhanced prompt that DALL-E3 actually receives, so users expect one result and get another.
  2. Wrong Output: The enhanced prompt may drop or alter important details, causing DALL-E3 to generate unintended images.
  3. Hallucination: Frustrated users retry repeatedly, and the errors can compound with each attempt, resulting in hallucinated or irrelevant outputs.
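
One rough way to see the gap between the two prompts is to check which of the user's keywords survive the rewrite. The sketch below is only an illustration, not how DALL-E3 works internally: the `prompt_drift` helper, the stop-word list, and both example prompts are hypothetical. In practice the enhanced prompt would come from wherever the service exposes it (at the time of writing, the OpenAI Images API appears to return it as a `revised_prompt` field for DALL-E 3).

```python
import re

# Small illustrative stop-word list (an assumption; extend as needed).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "with", "and", "or", "to", "for", "is"}


def keywords(text: str) -> set[str]:
    """Return the lowercased alphanumeric words in text, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def prompt_drift(user_prompt: str, enhanced_prompt: str) -> dict:
    """Compare the user's prompt with the rewritten prompt (hypothetical helper)."""
    wanted = keywords(user_prompt)
    got = keywords(enhanced_prompt)
    # Share of the user's keywords that still appear after the rewrite.
    retained = 1.0 if not wanted else len(wanted & got) / len(wanted)
    return {
        "dropped": sorted(wanted - got),  # details the rewrite lost
        "added": sorted(got - wanted),    # details the rewrite introduced
        "retained": retained,
    }


# Both prompts are made up for illustration.
user_prompt = "a red bicycle leaning on a brick wall at night"
enhanced_prompt = (
    "A vintage bicycle resting against a weathered brick wall, "
    "lit by warm street lamps, in a photorealistic style"
)

report = prompt_drift(user_prompt, enhanced_prompt)
print("dropped:", report["dropped"])
print("retained:", report["retained"])
```

In this made-up case the rewrite keeps "bicycle", "brick", and "wall" but loses "red", "leaning", and "night", which are exactly the kind of details a user would notice missing from the image. Word overlap is a crude signal (it ignores synonyms and paraphrase), so treat it as a quick check rather than a measure of fidelity.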
