Oct 30, 2023
Lost in Translation: The Disconnect Between User Prompts and DALL-E3 Outputs
Overview
DALL-E3 uses a ChatGPT-like model to rewrite ("enhance") user prompts before generating an image. This intermediary layer aims to keep outputs within ethical guardrails and to make prompts more “understandable” to DALL-E3. Despite these goals, it introduces several problems: false expectations about the output, inaccurate output, and hallucinations that compound as users make repeated attempts to get the result they want.
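The mediator arrangement can be sketched as a two-stage pipeline. This is a minimal sketch, not OpenAI's implementation: `enhance_prompt` and `generate_image` are hypothetical stubs standing in for the language model and for DALL-E3. What it shows is that the image model receives only the rewritten prompt, so any detail the rewrite drops never reaches it.

```python
from dataclasses import dataclass


@dataclass
class GenerationResult:
    user_prompt: str      # what the user typed
    enhanced_prompt: str  # what the image model actually received
    image: str            # placeholder for the generated image


def enhance_prompt(user_prompt: str) -> str:
    """Hypothetical stand-in for the ChatGPT-like rewriting layer.

    A real mediator would call a language model; this stub just wraps
    the prompt in stylistic detail to show that the text changes.
    """
    return f"A highly detailed digital illustration of {user_prompt}, soft lighting"


def generate_image(prompt: str) -> str:
    """Hypothetical stand-in for the image model (DALL-E3)."""
    return f"<image rendered from: {prompt!r}>"


def pipeline(user_prompt: str) -> GenerationResult:
    # The mediator sits between the user and the image model.
    enhanced = enhance_prompt(user_prompt)
    # The image model only ever sees the enhanced prompt, never the original.
    image = generate_image(enhanced)
    return GenerationResult(user_prompt, enhanced, image)


result = pipeline("a red fox in the snow")
print(result.enhanced_prompt)
```

Because the user only sees `user_prompt` while the image is built from `enhanced_prompt`, the gap between the two strings is exactly where expectation and result can diverge.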
Benefits/What It Aims to Do
- Guardrails: Ensures that the images generated align with organizational and ethical guidelines.
- Enhance for Understandability: Refines the prompt for better comprehension by DALL-E3.
What’s Really Happening
- Delusion About What’s Expected: The User Prompt is not equivalent to the GPT-Enhanced User Prompt that DALL-E3 actually receives, so the user expects one image and gets another.
- Wrong Output: The Enhanced Prompt may miss or modify important details, causing DALL-E3 to generate unintended images.
- Hallucination: As frustrated users make repeated attempts, the model's errors compound, producing increasingly hallucinated or irrelevant outputs.
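One way to make the "Wrong Output" problem visible is to compare the user's prompt with the enhanced prompt and list the details that disappeared. OpenAI's Images API returns the rewritten text for `dall-e-3` in a `revised_prompt` field, which could supply the second string. In the sketch below both strings are hard-coded, the rewrite is invented for illustration, and `dropped_details` is a hypothetical helper based on simple word overlap.

```python
import re

# Small illustrative stopword list (an assumption); a real check would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "and", "to", "at"}


def content_words(text: str) -> set[str]:
    """Lowercase alphabetic tokens, minus common stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def dropped_details(user_prompt: str, enhanced_prompt: str) -> set[str]:
    """Content words the user wrote that the enhanced prompt no longer contains."""
    return content_words(user_prompt) - content_words(enhanced_prompt)


user = "Three green apples on a wooden table"
# Hypothetical rewrite, invented for illustration: the count and color were lost.
enhanced = "A photorealistic still life of apples on a rustic wooden table at golden hour"
print(sorted(dropped_details(user, enhanced)))  # ['green', 'three']
```

This lexical check is deliberately crude: a rewrite that swaps "red" for "crimson" would be flagged even though the meaning survived, so it is better suited to spotting candidates for review than to proving that a detail was lost.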