Lost in Translation: The Disconnect Between User Prompts and DALL-E3 Outputs


2 min read · Oct 30, 2023

Overview

DALL-E3 does not work from the user's prompt directly. A ChatGPT-like model first rewrites ("enhances") the prompt, and the rewritten version is what drives image generation. This mediator aims to keep image generation within ethical guardrails and to make the prompt more “understandable” to DALL-E3. Despite these objectives, the middle layer causes several issues: mismatched expectations about the output, inaccurate output, and hallucinations that build up as users retry to get the desired result.

Benefits/What It Aims to Do

Guardrails: Keep generated images aligned with organizational and ethical guidelines.
Enhance for Understandability: Refine the prompt so that DALL-E3 can interpret it more reliably.

What’s Really Happening

Mismatched Expectations: The user's prompt is not the same as the GPT-enhanced prompt that DALL-E3 actually receives, so the result can differ from what the user expects.
Wrong Output: The enhanced prompt may drop or alter important details, causing DALL-E3 to generate unintended images.
Hallucination: Frustrated users retry repeatedly, and because each attempt goes through the rewriting step again, errors can accumulate until the outputs are hallucinated or irrelevant.
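One rough way to see the gap between the two prompts is to compare them word by word. The sketch below is only an illustration under stated assumptions: `dropped_terms`, the stopword list, and both example strings are hypothetical, and the "enhanced" text is made up rather than real DALL-E3 output. It assumes you can get hold of the rewritten prompt somehow (the OpenAI Images API, for instance, reports one as `revised_prompt` for DALL-E 3 requests).

```python
import re

# Small, hypothetical stopword list; extend it as needed.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "and", "at", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def dropped_terms(user_prompt, enhanced_prompt):
    """Return the user's words (in order, without repeats) that are
    missing from the enhanced prompt. Exact word matching only, so
    synonyms count as dropped."""
    enhanced = set(tokenize(enhanced_prompt))
    seen = set()
    dropped = []
    for word in tokenize(user_prompt):
        if word in STOPWORDS or word in enhanced or word in seen:
            continue
        seen.add(word)
        dropped.append(word)
    return dropped


# Illustrative strings only; not real DALL-E3 output.
user_prompt = "A red vintage bicycle leaning on a brick wall at dusk"
enhanced_prompt = (
    "A classic bicycle resting against a textured brick wall, warm evening light"
)

print(dropped_terms(user_prompt, enhanced_prompt))
# prints ['red', 'vintage', 'leaning', 'dusk']
```

Exact word matching is deliberately crude: "dusk" versus "evening light" is arguably a fair paraphrase, while losing "red" changes the image. A check like this will not judge meaning, but it does surface which of your details did not make it through the rewrite.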

Written by Mukul Pathak
