How to create effective prompts for AI image generation
If you're working with AI image generation, it's important to write effective prompts that steer the AI model toward realistic and accurate images. This guide gives you a quick overview of prompt engineering before you waste all your free trial credits. Keep in mind that this is a general guide, and text-to-image AI generators such as DALL·E 2, Stable Diffusion, and Midjourney differ from one another, so not every tip will apply to the specific generative model you are using.
Without further ado, let's move on to the detailed guide that explains how to come up with effective prompts for text-to-image AI generation.
What are Prompts?
A prompt is a set of instructions given to a machine learning algorithm that is used to generate a specific output. The user can provide the AI with a prompt, such as a colour or a subject, and the AI will generate a piece of art based on that prompt. For example, a prompt might be: “Generate a picture of a brown dog on a skateboard.”
Prompts are descriptions in natural language that act as input for AI art generators such as DALL·E, DALL·E Mini, DALL·E 2, Midjourney, Stable Diffusion, Disco Diffusion, and other machine learning models. They are the medium through which you communicate with these models.
Prompts convey your idea of what an image should contain to the text-to-image machine learning models.
Prompts can be as simple as a single line of text, and they can be precise or vague. Sometimes you can even use emojis as prompts and still get a good output.
Where can I try my prompts or DALL·E alternatives?
If you're looking for an ultra-imaginative text-to-image AI generator, you may want to check out DALL·E and DALL·E 2. Getting access to these generators is not always easy, so the list below also includes similar models that are making the rounds in the world of AI. All of them are web-based and don't require a ton of computing power on your side.
- DALL·E
- DALL·E Mini
- DreamStudio.AI
- Midjourney - in Discord
- CLIP
- ruDALL-E - in Telegram & Discord
Prompt Basics
A good prompt must contain a noun, adjective, and verb to create an interesting subject.
- Write at least 3 to 7 words: A prompt with more than 3 words gives the AI a clearer context.
- Use multiple adjectives: Multiple adjectives will infuse multiple feelings into the artwork. Eg: beautiful, realistic, colorful, massive.
- Include the name of the artist: Including an artist’s name in the prompt will mimic the style of that artist. Eg: Picasso, Vincent Van Gogh, Paul Gauguin.
- Add a style: If you want the art to be in a particular style, name it in the prompt. Eg: surrealism, symmetry, contemporary, minimalism.
- Add a computer graphics term: Naming a rendering engine or technique pushes the output toward the look of that renderer. Eg: Octane render, Cycles, Unreal Engine, Ray tracing
- Quality: Mention the quality of the art such as low, medium, high, 4k, and 8k.
- Avoid words that the AI generator bans, otherwise your account may be banned.
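The checklist above can be sketched as a small helper that assembles a prompt from its parts. This is only an illustration: the function name `build_prompt` and its parameters are hypothetical, and the comma-separated layout is an assumption, not a format required by any particular generator.

```python
from typing import Optional, Sequence


def build_prompt(
    subject: str,
    adjectives: Sequence[str] = (),
    artist: Optional[str] = None,
    style: Optional[str] = None,
    rendering: Optional[str] = None,
    quality: Optional[str] = None,
) -> str:
    """Join the optional ingredients into one comma-separated prompt."""
    # Adjectives go directly in front of the subject.
    core = " ".join([*adjectives, subject])
    parts = [core]
    if artist:
        parts.append(f"by {artist}")
    # Style, rendering term, and quality are appended only when supplied.
    parts.extend(p for p in (style, rendering, quality) if p)
    return ", ".join(parts)


prompt = build_prompt(
    subject="dog riding a skateboard",
    adjectives=["realistic", "colorful"],
    artist="Vincent Van Gogh",
    style="surrealism",
    rendering="Octane render",
    quality="4k",
)
print(prompt)
# realistic colorful dog riding a skateboard, by Vincent Van Gogh, surrealism, Octane render, 4k
```

Keeping each ingredient as a separate argument makes it easy to drop or swap one of them between iterations and compare the results.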
Basic Anatomy of effective Prompts for AI Image Generation
When it comes to AI image generation, the prompt is everything. A good prompt can mean the difference between a realistic and accurate image, and one that looks like it was drawn by a child or a complete mess.
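Generally, a text input for AI image generation has the same structure. In most cases you need three ingredients: a subject, a description of what the subject is doing or where it is, and a style. The first example below separates the three with dashes.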
Example:
“A squirrel – on a tree eating nut, – expressionist painting.”
“A dream of a distant galaxy, by Caspar David Friedrich, matte painting trending on artstation HQ.”
There are a few things that make up a good prompt. The input language must be clear and concise, because a text-to-image model is typically trained mainly on image alt text and on image descriptions from stock photography websites. If there is any ambiguity or error in your input, the image generation will be affected.
Your input must be a complete sentence. This is to ensure that the model can learn the context of the input and generate a corresponding image.
It should also be grammatically correct. This is to avoid any confusion for the model when it is trying to generate an image.
Concrete words = All images depict the concrete thing (a squirrel).
Concrete words examples: cat, squirrel, microscope, lamp, tree, ketchup, bird, car, bike, guinea pig, seagull, spaghetti, cage, peacock, tomato, mousetrap, lantern, axe, umbrella, cupcake.
Abstract words = surprises.
Abstract words examples: hope, dream, fantasy, success, progressive, sainted, adequate, happiness, sufficient, reality, intend, likability, agnosticism, ugly, determinate, dignify, standpoint, imperative, absurd.
Concrete words produce images that depict the concrete thing. Abstract words on the other hand will give you more varied results. If you want to get an accurate image of a concrete thing, you should use concrete words. All of the images in the following output depict a squirrel, so you can be sure that if you use concrete words like "squirrel" you'll get an image of a squirrel.
On the other hand, if you use abstract words, you'll get more varied results. Some text-to-image AIs are better at dealing with abstract concepts than others, so you may get an image that makes sense, or you may get an image of book covers that contain the abstract word that you specified in the title.
"Happiness, detailed image, 4K"
Write positive prompts – describe what should exist, not what is missing.
If you want to avoid having a bearded man in your image, don't write "a man without a beard." Instead, write "a cleanly shaved man." The AI takes things literally, so if something is mentioned in the prompt, it's more likely to appear in the image.
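A quick way to catch negative phrasing before spending credits is to scan the prompt for negation words. The sketch below is a rough illustration only: the word list and the function name `find_negations` are hypothetical, and the list is far from exhaustive.

```python
import re
from typing import List

# Hypothetical, non-exhaustive list of words that describe what is missing.
NEGATION_WORDS = ("without", "no", "not", "none", "never")


def find_negations(prompt: str) -> List[str]:
    """Return the negation words found in the prompt, in order of appearance."""
    words = re.findall(r"[a-z']+", prompt.lower())
    return [w for w in words if w in NEGATION_WORDS]


print(find_negations("a man without a beard"))  # ['without']
print(find_negations("a cleanly shaved man"))   # []
```

Any hit is a cue to rephrase the prompt so that it names what should be in the image.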
Keep in mind that vague plural words like “cats” leave the text-to-image AI a lot of room for interpretation. Did you mean two cats or 12 cats? Therefore, when you want multiple subjects, use plural nouns with specific numbers. However, it has been reported that while DALL·E 2 has no problem creating multiple subjects in a scene, it struggles to keep the characteristics of each subject separate from the others. So it is better to stick to prompts with up to three subjects.
It is also important to know which language the model was trained on. This is because different languages can have different word order, and this can impact the image generation. For example, if a model was trained on English but the prompt is in French, the image generation may not be as accurate due to translation issues.
If you're using a text-to-image generator that has automatic translation built in, expect a lot of failures due to incorrect translation. Normally, image-generating platforms don't show you the translation of your prompt until the image is generated, which can delay the process quite a bit. I've seen some really odd translations, so it is always best to translate the prompt first in a separate program and then paste the translated prompt into the text-to-image AI.
If you're looking to get the most out of your prompt, you should phrase it in English when using one of the following: Stable Diffusion, DALL·E, DALL·E 2, Midjourney, or DreamStudio. Stable Diffusion was trained on a subset of the LAION-5B database, which contains 2.3 billion English image-text pairs and 2.2 billion image-text pairs from 100+ other languages. Because the training data also covers other languages, you're not limited to the Western European alphabet – you can use non-Roman character sets like Arabic or Chinese, and you can even use emojis.
The country of origin of the AI can make a big difference in the output. AIs trained with data from a specific country are more likely to be familiar with the local artists from that country. So, if you're using a Russian AI, it will likely know more about Russian and Eastern European artists than it will about Western artists. However, this isn't as much of a concern if you're using an AI focused on English, since English is a global language. I highly recommend keeping this in mind and selecting the best text-to-image generator for the look that you are after.
AIs are biased: Don't forget that the image databases an AI generator was trained on are likely to contain biases. You need to be aware of them in order to get the desired output on the first try. A typical example is the prompt "developer working in the office": such a prompt is highly likely to return an image of a white male working on a computer. So if you need an image that represents a more diverse workforce, you will need to fine-tune your prompt.
"A developer working in an office, photo, detailed image, 4K"
Lastly, the prompt should be specific. This is because the more specific the prompt, the more likely it is that the AI will generate an accurate image. For example, if the prompt is simply "generate an image of a dog", the AI may generate a generic image of a dog that does not look realistic. However, if the prompt is "generate an image of a golden retriever", the AI is more likely to generate a realistic image of a golden retriever.
Overall, the prompt is essential for AI image generation. A good prompt will result in a realistic and accurate image, while a bad prompt will result in an inaccurate and poor quality image.
More Examples:
"A squirrel – on a tree eating nut, – watercolour painting in pastel colours, detailed image."
Surprise - AI completely forgot about the "eating nut" part. When this happens you can try a couple more iterations or rewrite your prompt.
"A squirrel seating on a tree and eating nuts, – watercolour painting in pastel colours, detailed image, 4K"
DreamStudio.ai keeps outputting red squirrels 🐿 with green background even when being asked for an image in pastel colors. So let's be more specific and say that the squirrel should be grey.
"A grey squirrel seating on a tree and eating nuts, – watercolour painting in pastel colours, detailed image, 4K"
Sometimes the AI will surprise you by creating a nice image that is nothing like what you expected. Here we have two subjects even though we specified that we want one squirrel. This must be due to "2matte" in the prompt below, where "--quality 2" runs straight into "matte painting". It reminds us that the AI takes the prompt literally, which is also why we use positive prompts – describe what should exist, not what is missing.
Plus the image looks more like a photo than a Botticelli painting.
"Cute squirrel:: wind::1 natural lighting::1 Sandro botticelli::1 white::1 --quality 2matte painting trending on Midjourney, pastel colours"
This is looking more like the expected painting of a cute squirrel by Botticelli. This proves our hypothesis that "2matte" was resulting in an output image having two squirrels.
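A slip like "2matte" can be caught before the prompt is submitted. The sketch below is not Midjourney's actual parser. It is a simplified check with a hypothetical function name, and it assumes that parameters are written as `--name value` and that the value of `--quality` should be a plain number.

```python
from typing import List

# Parameters assumed to take a numeric value; only the one used above is listed.
NUMERIC_PARAMS = ("--quality",)


def find_fused_values(prompt: str) -> List[str]:
    """Return "--name value" pairs whose value is not a plain number."""
    tokens = prompt.split()
    problems = []
    for name, value in zip(tokens, tokens[1:]):
        if name in NUMERIC_PARAMS:
            try:
                float(value)
            except ValueError:
                # The value has text glued to it, e.g. "2matte".
                problems.append(f"{name} {value}")
    return problems


original = (
    "Cute squirrel:: wind::1 natural lighting::1 Sandro botticelli::1 "
    "white::1 --quality 2matte painting trending on Midjourney, pastel colours"
)
print(find_fused_values(original))  # ['--quality 2matte']
print(find_fused_values("cute squirrel, matte painting --quality 2"))  # []
```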
Tips to Keep in Mind:
Here are a few tips to keep in mind when creating prompts for AI image generation:
1. Think about what kind of images you want to generate. Do you want to create realistic images, or abstract ones? Once you know what kind of images you want to generate, it will be easier to come up with appropriate prompts.
2. Consider what kind of information you want the AI to have access to. For example, if you're generating realistic images, you'll need to provide the AI with data about the scene, such as the location, lighting, and objects present. On the other hand, if you're generating abstract images, you might want to provide the AI with a list of colors, shapes, and patterns.
3. Try to be as specific as possible with your prompts. The more specific the prompt, the easier it will be for the AI model to generate images that match it.
4. Use different art styles like filters in your prompts. If you're looking to add a little more personality to your AI-generated images, you can try using different art styles as filters for your prompts. By doing this, you can give the AI a specific set of instructions to follow that will result in an image that's more in line with your desired aesthetic.
5. Keep the prompt simple. Complex prompts can be difficult for the AI model to understand and can lead to inaccurate image generation. The prompt should be relatively short! For example, Midjourney prompts should stay under 60 words, while prompts for DALL·E2 must stay under 400 characters.
6. Define a colour palette in your prompts. If you want more control over the final output of your image, you can try defining a colour palette in the prompt for text-to-image AI. This will allow you to choose the colours that you want your image to be, and the AI will then use those colours to generate the image.
7. Write a prompt that contains the names of multiple famous artists to get a unique style. Don't worry about whether to use "and" or a comma in your prompt, or about the order of the artists' names. Most text-to-image generators handle these variations well.
8. Be creative! There are no wrong answers when it comes to prompts for AI image generation. So don't be afraid to experiment and see what kind of results you can get.
9. It can be a major challenge for text-to-image AI to get relationships right. This is because AI doesn't always understand how things are connected. You can try iterating the description, changing the order of the words, repeating elements, or adding more objects. Another option is to use the style of the painting to help with the direction.
10. Be patient. It can take some time to understand the behaviour of the AI model and to start generating images that are realistic and accurate. Don't expect perfection from the outset; give yourself time to learn and improve.
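The length limits from tip 5 are easy to check automatically. The following is a minimal sketch that uses the two figures quoted above (under 60 words for Midjourney, under 400 characters for DALL·E 2). The function name `check_length` is hypothetical, and the limits themselves may change over time.

```python
# Limits quoted in tip 5; treat them as guidelines that may change.
MIDJOURNEY_MAX_WORDS = 60
DALLE2_MAX_CHARS = 400


def check_length(prompt: str) -> dict:
    """Count words and characters and compare them with the quoted limits."""
    words = len(prompt.split())
    chars = len(prompt)
    return {
        "words": words,
        "chars": chars,
        "fits_midjourney": words < MIDJOURNEY_MAX_WORDS,
        "fits_dalle2": chars < DALLE2_MAX_CHARS,
    }


report = check_length(
    "A grey squirrel sitting on a tree and eating nuts, "
    "watercolour painting in pastel colours, detailed image, 4K"
)
print(report["words"], report["fits_midjourney"], report["fits_dalle2"])
# 18 True True
```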
Prompt builders to try
There are many tools available to help you create prompts for text-to-image AI generators. If you're not interested in creating them manually, here is a list of some of the most popular tools.
- Midjourney Random Commands Generator
- Phraser.tech - prompt builder for multiple neural networks (AIs).
- Huggingface Midjourney prompt generator
- Text to Image prompt generator - prompt builder for artistic prompts.
- Noonshot.com
- Promptomania.com
- Lexica.art - a collection of prompts and their resulting images produced with Stable Diffusion.
Here's a list of words that normally improve outputs for the text-to-image AIs.
If you're suffering from writer's block but still want to try text-to-image AI, here are a few styles, artists, and mediums that may help improve your results.
- Art styles: Abstract, Abstract expressionism, Academism, Action painting, American realism, Analytical cubism, Anime, Art Deco, Art Nouveau, Baroque, Bauhaus, Biopunk, Color Field painting, Classical realism, Conceptual art, Cubism, Cybernoir, Cyberpunk, Dada, Dark fantasy, De Stijl, Decopunk, Dieselpunk, Digital art, Expressionism, Fauvism, Futurism, Fine art, Gothic, Impressionism, Installation art, Land art, Lyrical Abstraction, Manga, Modern art, Minimalism, Modernism, Neo-Dada, Neo-expressionism, Neoclassical, Neo-Impressionism, New realism, Nouveau Realisme, Op Art, Orphism, Photorealism, Pixel art, Pop art, Post-Impressionism, Post-minimalism, Post-painterly abstraction, Precisionism, Purism, Realism, Rococo, Romanticism, Socialist realism, Steampunk, Surrealism, Synthwave, Symbolism, Synchromism, Tonalism, Ukiyo-e, Video art, and Zouave.
- Paint types: Acrylic paint, Airbrush, Canvas, Cave art, Chinese painting, Coffee paint, Color field painting, Dripping paint, Fine art, Glass paint, Gouache, Graffiti, Hard edge painting, Hydrodipped, Mural, Oil on canvas, Oil paint, Painting, Paper-marbling, Puffy paint, Rock art, Scroll painting, Splatter paint, Spray paint, Still-life, Street art, Tempera paint, Tibetan painting, Watercolor, Wet paint.
- Print Styles: Advertisement, Aquatint, Banner, Barcode, Block printing, Blueprint, Booklet, Business card, Collage, Coloring book, Comic book, Cyanotype, Election photo, Election poster, Etching, Graphic novel, Halftone, Illuminated manuscript, Illustrated booklet, Instruction manual, Intaglio, Linocut, Lithograph, Logo, Magazine, “Magic the Gathering” card, Manuscript, Map, Mezzotint, Mono printing, Movie poster, Newspaper, Newsprint, Photocollage, Photograph, Postage stamp, Poster, Product photo, Propaganda Poster, QR code, Schematic, Signage, Silver gelatin, Sticker, Storyboard, Storybook illustration, Tarot card, Visual novel, Wall decal, Woodblock print.
- Adjectives: alien, ancient, angelic, angry, anxious, athletic, award-winning, basic, beautiful, chaotic, cheerful, clean, cold, colorful, confusing, cozy, creepy, cute, depressing, detailed, dirty, disgusting, dreamy, dry, ecstatic, elderly, ethereal, evil, excited, expensive, fancy, fat, flat, flat design, flat shading, fluffy, friendly, furry, fuzzy, gloomy, good, gorgeous, greeble, hairy, happy, highly detailed, huge, hyperrealistic, impossible, incoherent, intricate, intricate maximalist, joyful, large, lonely, lucid, luminous, massive, massive scale, mature, mellow, micro, mini, minimalist, moody, morbid, mottled, muted, nano, nervous, OCD, old, ornate, otherworldly, photorealistic, plain, powerful, pretty, priceless, psychedelic, quiet, rainy, realistic, refreshing, sad, simple, sinister, sleepy, smooth, spooky, strong, surface detail
- Lighting: accent lighting, afternoon, artificial lighting, backlighting, beautiful lighting, blue hour, bright lighting, lit by candlelight, Christmas lights, cinematic lighting, colorful lighting, contre-jour, crepuscular rays, dark lighting, dawn, daylight, daytime, dim lighting, dramatic lighting, dusk, evening, film noir lighting, lit by firelight, flickering light, floodlight, fluorescent light, front lighting, global illumination, golden hour, halfrear lighting, halogen light, hard lighting, high key lighting, incandescent light, low key lighting, low lighting, moody lighting, morning, natural lighting, nighttime, noon, portrait lighting, ray tracing, ray tracing global illumination, rays of light, rays of shimmering light, realistic lighting, Rembrandt lighting, rim lighting, silhouette lighting, soft lighting, split lighting, spotlight, studio lighting, sunlight, sunrise, sunset, twilight, ultraviolet light, volumetric lighting, Xray
- Time periods: ancient Egypt, ancient Greece, ancient Rome, antique, Assyrian Empire, Aztec, Babylonian Empire, Benin Kingdom, Bronze Age, Byzantine Empire, Carolingian, Dark Ages, Edwardian, Elizabethan, Georgian, Gilded Age, Great Depression, Heian Period, Incan, Industrial Revolution, Iron Age, Maori, Mayan, medieval, Meiji Period, midcentury, Middle Ages, Ming Dynasty, Minoan, modern, Moorish, Mughal Era, Nasrid, Navajo, Neolithic, Olmec, Ottoman Empire, Paleolithic, Persian Empire, pre-Columbian, prehistoric, Qing Dynasty, Regency, Renaissance, retro, Shang Dynasty, Songhai, Stone Age, Sumerian, Tokugawa Shogunate, Tudor, Victorian, Viking, World War I, World War II, Zhou Dynasty, Zuni-Pueblo, 1100s, etc.
- Decorative Art: 3D print, amigurumi, applique, balloon modelling, balloon twisting, bas-relief, beadwork, blown glass, bone china, carved, carved ivory, carved lacquer, carving, chip-carving, claymation, cloisonne, crochet, cross stitch, diorama, embroidery, enameling, felting, fretwork, glass mosaic, ice-carving, impressionist mosaic, inlaid, intarsia, jigsaw puzzle, lacquer, lampwork, latte art, leather carving, leatherwork, marble, marquetry, micromosaic, miniature painting, modular origami, mosaic, needlework, origami, paper model, papercutting, papier-mache, photographic mosaic, Pietra dura, porcelain, pottery, puppet, puzzle, pysanky, quillwork, quilted, relief-carving, repousse, rigid origami, sand art, scrimshaw, sculpture, stained glass, statue, string-art, tapestry, tattoo, tattoo art, Venetian glass, weaving, wet-folding, whittling, wood-burning
- Rendering techniques: 3D model, 3ds Max, 500px, Arnold render, ArtStation, Blender Render, CGsociety, Cinema4D Render, CryEngine, Cycles Render, Daz 3D, DeviantArt, DirectX Render, Doughy Render, Houdini Render, Infini-D Render, KitBash3D, Luxcore Render, Marvelous Designer, MentalRay Render, OctaneRender, Optix Render, Photobashed, Photoshop, physically based render, Pixia, Quixel Megascans, Raylectron Render, Redshift Render, Sketchfab, Substance 3D, Terragen, Unreal Engine, Vray Render, Weta Digital, Zbrush Render
- Photography Styles: daguerreotype, tintype, film negative, Tri-X, Kodachrome, slide film, portra 800, Natura 1600, ilford delta 3200, polaroid, hasselblad, double exposure, multiple exposure, large format camera, wide angle lens, fisheye lens, tilt shift lens, anamorphic, lensbaby, telephoto, prime lens, f1.8, f2.8, f4, f11, f16, photoshoot, commercial, thermographic, x-ray, infrared
- Artists: William Logsdail, Beatrix Potter, Roy Lichtenstein, Richard Corben, Michelangelo, Gerhard Richter, Bjarke Ingels, John Berkey, George Inness, Peter Andrew Jones, J.M.W. Turner, Todd McFarlane, Caravaggio, Atey Ghailan, Hirohiko Araki, Huang Guangjian, Ray Caesar, Takeshi Obata, Antoine Blanchard, Diego Velázquez, Romero Britto, Guido Borelli da Caluso, Lucas Cranach the Elder, Nele Zirnite, Bob Ross, Zdzislaw Beksinski, Glen Fabry, Jane Graverol, Krenz Cushart
- Colors: black, silver, gray, white, maroon, red, purple, pastel colours, fuchsia, green, lime, olive, yellow, navy, blue, teal, aqua
- Common phrases that can boost your results: masterpiece, trending on artstation, trending on pixiv, vivid, vibrant, geometric, intricate, high quality, high resolution, detailed, 4K
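If writer's block persists, the word lists above can be sampled at random to produce starting points. The sketch below is just one way to do it: the list names and the function `random_prompt` are hypothetical, and each list holds only a short excerpt of the entries above.

```python
import random

# Short excerpts from the lists above; extend them with any other entries.
ART_STYLES = ["Art Deco", "Cubism", "Impressionism", "Pixel art", "Ukiyo-e"]
LIGHTING = ["golden hour", "soft lighting", "cinematic lighting", "backlighting"]
COLORS = ["pastel colours", "teal", "maroon", "silver"]
BOOSTERS = ["highly detailed", "high resolution", "4K"]


def random_prompt(subject: str, rng: random.Random) -> str:
    """Combine the subject with one random entry from each word list."""
    picks = [rng.choice(words) for words in (ART_STYLES, LIGHTING, COLORS, BOOSTERS)]
    return ", ".join([subject, *picks])


rng = random.Random(42)  # fixed seed so that a run can be repeated
for _ in range(3):
    print(random_prompt("a squirrel eating nuts on a tree", rng))
```

Treat the generated prompts as drafts: pick the one closest to your idea and refine it by hand.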