By Andreas Ervik
Abstract: This paper explores generative AI images as new media through the central questions: What do AI-generated images show, how does image generation (imagenesis) occur, and how might AI influence notions of the imaginary? The questions are approached with theoretical reflections on other forms of image production. AI images are identified here as radically new, distinct from earlier forms of image production as they do not register light or brushstrokes. The images are, however, formed from the stylistic and media technological remains of other forms of image production, from the training material to the act of prompting – the process depends on a connection between images and words. AI image generators take the form of search engines in which users enter prompts to probe into the latent space with its virtual potential. Agency in AI imagenesis is shared between the program, the platform holder, and the users’ prompting. Generative AI is argued here to create a uniquely social form of images, as the images are formed from training datasets of human-created and/or human-tagged images and are themselves shared on social networks. AI image generation is further conceptualized as giving rise to a near-infinite variability, termed a ‘machinic imaginary’. Rather than comparable to an individualized human imagination, this is a social imaginary characterized by the techniques, styles, and fantasies of earlier forms of media production. AI-generative images add themselves to and become an acquisition of the reservoirs of this already existing collective media imaginary. Since the discourse on AI images is so preoccupied with what the technology might become capable of, the AI imaginary would seem to also be filled with dreams of technological progress.
Generating New Images
As Matt Dryhurst and Holly Herndon, two of the artists experimenting early with DALL·E 2, point out: “[T]his act of conjuring artworks from language feels very very new” (Dryhurst/Herndon 2022; original emphasis). OpenAI’s DALL·E 2 is just one of several easy-to-use online artificial intelligence image generators, others including Midjourney, Stable Diffusion, Imagen, Wombo Dream, and Craiyon. Some discussions of AI image generators concern aspects of copyright in the produced images, whether or not the creations are ‘works of art’, and, if so, how they will impact the livelihood of producers of artistic images. Others focus on the tendencies of AI image generators to replicate biases and discriminatory stereotypes. These are meaningful queries into generative AI images, yet they do not necessarily address the feeling of ‘newness’ that these pieces of software produce.
The newness of generative AI images will be approached here in three parts: Firstly, by considering the specific qualities of these images: What and how do the images show? Secondly, by discussing the process of AI image generation: How are these images produced? Thirdly, by reflecting on the notion of AI image generators as a form of artificial imagination: In what way could generators be considered forms of imagination? The analysis thus moves from reflecting on our understanding of images, to considering the specific technological processes of generation (or imagenesis), to speculating on notions of human and machinic imaginaries.[1]
Figure 1:
All images accompanying this essay have been produced by using certain terms from this text as prompts for generative AI.
This image has been generated with DALL·E 2 in March 2023 by using the prompt “a completely new kind of images”
AI Images
As is stated on the introductory page of a guidebook for DALL·E 2, “nothing you are about to see is real”, as the images shown are “photos that are not real photos”, “paintings that are not real paintings and people, places and things that do not exist” (dall·ery gall·ery 2022: 2, emphases removed from original). The reality of the images produced by DALL·E 2 is put into question by negatively comparing them with paintings and photography. AI image generators produce images with neither the registering of light, which is central to photography, nor the brushstrokes of painting. The image generation is thus an alternative form of image-making, without lenses to capture visual reality or traces of a painterly process. AI can nevertheless produce images that look like a broad range of other forms of images: from painting and photography to CGI and medical imaging technologies. In this sense, one could say that image generators turn other image-making technologies into their content. With Marshall McLuhan (2001 [1964]), this could be considered less an innovation of AI than a general tendency of media: the content of a new medium is an earlier medium.
As image generators turn other forms of images into their content, they are also influenced by and can influence our perception of these forms of image-making. This can be explicated through an update of what William J.T. Mitchell presents as a central aspect of the human capacity to recognize an image as an image. Mitchell points out that identifying an image requires a paradoxical dual frame of mind in which humans utilize “an ability to see something as ‘there’ and ‘not there’ at the same time” (Mitchell 1986: 4). Humans at once see something as depicted and as a depiction. Mitchell contrasts this with what happens “[w]hen a duck responds to a decoy, or when the birds peck at the grapes in the legendary paintings of Zeuxis, they are not seeing images: they are seeing other ducks, or real grapes – the things themselves, and not images of the things” (Mitchell 1986: 4). This is not to suggest that humans have a perfect ability to maintain the dual frame of mind required to see something as an image. In discussions of photography, Roland Barthes (1981) notes that photos act as pointers, stating in a childlike manner ‘there’. People have a tendency to treat images as providing direct access to what is depicted.[2] Despite this tendency to naively consider what and who images show rather than how, humans have the capability of both looking through and at images. What image generators introduce is another layer of potential challenge in identifying what and how one looks at images.
Figure 2:
An image generated with DALL·E 2 in March 2023 by using the prompt “this photograph does not exist”
Take, for instance, AI-generated images of human faces such as those produced on the website aptly titled “This person does not exist”.[3] The images found on this site are not photographs that capture the visual features of persons located in some real-world context. They are images that photo-realistically display something that has never occurred, someone that has never existed. For viewers, the images pose a novel challenge. When looking at a photograph, one risks looking through the image to simply consider what it shows, without taking into account how the image-producing technology mediates the viewer’s relation to what is depicted. For AI-created images, one also risks looking at the image as a photograph. The challenge for viewers thus becomes not only that of potentially mistaking a non-existent person for an actually existing one, but also that of mistaking an AI-generated image for an actual photograph.
AI Imagenesis
What is unique to AI-based image generators is that they not only make older forms of media the ‘content’ of the images in the McLuhan-sense. Other image media are also vital for the process of AI imagenesis. Generative AI is made possible by a learning process in which an enormous dataset of different kinds of images has been used as training data.[4] In training, the images are gradually transformed into noise. The process is then reversed to generate images. As the website of OpenAI explains, it occurs through “a process called ‘diffusion’, which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image” (OpenAI n.d.: n.pag.). The images produced may seem like concrete solids; they may resemble photographs or some other products of traditional image production. However, they are, in fact, localized zones of coherence, drawn from a flux of potential intensities in a field of noise. The generated images themselves are not solid endpoints either, as the process can be restaged indefinitely to produce virtually infinite variations.
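For readers curious about the mechanics, the diffusion process described above can be sketched schematically in a few lines of Python. The sketch is purely illustrative: the denoising function is a trivial placeholder standing in for the trained, prompt-conditioned neural network that actual systems such as DALL·E 2 or Stable Diffusion employ, and the ‘target’ it nudges the noise towards is a hypothetical stand-in for what such a network would predict.

```python
# Schematic sketch only: a placeholder denoiser stands in for the trained,
# prompt-conditioned neural network used by actual diffusion models.
import numpy as np

rng = np.random.default_rng(seed=0)

def denoise_step(noisy_image, step, total_steps):
    # Hypothetical stand-in for the model's prediction of what the noise
    # should coalesce into; a real model would be conditioned on the prompt.
    target = np.full_like(noisy_image, 0.5)
    strength = 1.0 / (total_steps - step)  # later steps remove more noise
    return noisy_image + strength * (target - noisy_image)

total_steps = 50
image = rng.normal(size=(64, 64))  # begin with a pattern of random dots
for step in range(total_steps):
    image = denoise_step(image, step, total_steps)

# The field of noise has converged toward a coherent 'image'.
print(image.mean(), image.std())
```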
In the previous section, AI images were rendered as something other than processes of capture or recording; perhaps AI imagenesis might instead be considered a form of recoding of the material it has been trained on. One could conceptualize AI-generated images as visualizations of the data in a database, but it is more apt to say that AI imagenesis turns the database of training images virtual.[5] The virtual is the AI’s latent space, which contains the visual connections learned from the training material, and the possibilities for generating images.
The actualization of images from the latent space is generally produced by users entering ‘prompts’. Prompts are written statements, acting as requests for the program to run its diffusion, detailing what the field of noise is supposed to coalesce into displaying. The prompts can include descriptions of motifs of varying specificity, as well as stylistic registers and media technologies to be simulated. The process thereby seems to be a continuation of what Walter Benjamin described in his influential essay on photography, in which he pointed to that particular media technology as “free[ing] the hand of the most important artistic functions which henceforth devolved only upon the eye looking into a lens” (Benjamin 2007 [1935]: 2). With AI image generators, the most important artistic functions can be freed also from the eye, requiring simply the act of typing words. The relation between text and image thereby further echoes what Benjamin notes of photography:
For the first time, captions have become obligatory. And it is clear that they have an altogether different character than the title of a painting. The directives which the captions give to those looking at pictures in illustrated magazines soon become even more explicit and more imperative in the film where the meaning of every single picture appears to be prescribed by the sequence of all preceding ones (Benjamin 2007 [1935]: 8).
The essential role of the caption is introduced with photography, but with AI image generators it becomes solidified. No longer only stabilizing the interpretation of what is seen, the caption-as-prompt is also causal for the visualization. The training material is itself a dataset of captioned images from which the connections between visual properties and words are formed. In imagenesis, captions at once produce what is seen and guide viewers in what to look for when engaging with the result. This double role could be seen as part of the reason why prompts are often included when AI images are shared. The images are grounded by captions, which serve as a textual explanation. This grounding is also influenced by the tendency of image generators to offer multiple image versions in response to individual prompts. These parallel versions prescribe meaning by providing variable forms of legibility and illegibility, of convincing and unconvincing instances of the caption’s concept.
When DALL·E 2 was released in April 2022, OpenAI CEO Sam Altman tweeted “AGI is gonna be wild” (Altman 2022: n.pag.). Advances in AI image generation are not necessarily to be viewed as indications of steps taken toward a so-called artificial general intelligence (AGI) that is able to learn and perform any task humans are capable of (cf. Bennett/Maruyama 2021; Marcus et al. 2022). In response to notions of artificial intelligence in image generators, one might make counterarguments akin to John Searle’s (1980), holding that the programs do not actually understand the relations between captions and what is visualized. One might counter such arguments with the assertion that such relations may often be fuzzy for humans as well. In his essay in this issue, Hannes Bajohr (2023) proposes that AI has a form of ‘dumb meaning’, in that its understanding consists of correlations between signs rather than of what the signs refer to. What is important for generative AI is not necessarily the grand question of whether or not the program actually ‘understands’ the connection between words and images. AI image generators turn the relation between images and words into a problem with solutions that can be evaluated and improved along different parameters.
OpenAI presents the parameters for improvement in terms of caption similarity, photorealism, and diversity (cf. Ramesh et al. 2022). The first is concerned with how well the generated images match a common understanding of the relation between the prompt words used and their visual referents. Photorealism is a media technological and stylistic signifier (which CGI also often strives towards). Finally, diversity refers to how varied the results for individual prompts will be. Researchers have probed DALL·E 2 with the intent of uncovering weaknesses in the synthesis (cf. Conwell/Ullman 2022; Marcus et al. 2022). In general, research on generative capacities finds that “images in realistic style are almost always physically plausible” whereas images in “non-realistic styles conform to the particular norms of the style” (Marcus et al. 2022: 2). DALL·E 2 nevertheless has difficulties with understanding relations between objects, struggling even with the most basic spatial relations. AI image-making has made great improvements over the last decade.[6] Certain identifiers for AI imagenesis still persist, such as the commonly observed inability of AI to render hands and fingers properly (cf. Wasielewski 2023).
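How ‘caption similarity’ might be operationalized can be illustrated with a rough sketch that uses the openly available CLIP model via the Hugging Face transformers library. OpenAI’s own evaluations rely in part on human judgment, so the code below is only an assumed, automated proxy for the same idea: scoring how closely a generated image and its prompt sit together in a shared text-image embedding space. The file name and prompt are illustrative examples.

```python
# Rough sketch: scoring prompt-image agreement with the open CLIP model.
# The file name and prompt are hypothetical examples, not OpenAI's pipeline.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated_image.png")       # a previously generated image
prompt = "a completely new kind of images"      # the prompt it was generated from

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher values indicate closer agreement between caption and image.
print(float(outputs.logits_per_image[0][0]))
```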
AI imagenesis remains dependent on human effort, yet is often framed as a fully automated process. An example is found in the June 2022 issue of the magazine Cosmopolitan. Its cover stated “Meet the World’s First Artificially Intelligent Magazine Cover”, while its second tagline played into the notion of imagenesis as automated: “And it only took 20 seconds to make it”. The second line glosses over the human work that has gone into producing the image generator, the training material, and formulating prompts. The designer of the cover, Karen X. Cheng, worked with the generator to visualize an idea of a powerful, female astronaut. In an Instagram post she later detailed the process and the multitude of decisions, discussions, attempts, and editing involved in generating the finalized prompt to produce an image that would convey the central idea: “A wide angle shot from below of a female astronaut with an athletic feminine body walking with swagger towards camera on mars in an infinite universe, synthwave digital art” (Cheng 2022: n.pag.).
Cheng’s Instagram post could be considered as much a display of artistic prowess as a strategic move by the designer to indicate the continued need for what could be termed ‘DALL·E 2 artists’ or ‘prompt poets’ who develop skills in AI image generation as additions to their repertoire of other digital imaging techniques. Prominently, a DALL·E Prompt Book (dall·ery gall·ery 2022) has been produced which offers guidance on how to prompt for specific styles, camera angles, lens types, or light conditions. The book gives the overall impression of a practical textbook in creative image production. Other resources online detail the possibilities of combining AI-generated images with tools for more-or-less automatic upscaling, for facial adjustments and other forms of editing, for adding movement to the motif, for simulating lens depth, or for adding camera movement (cf. Parsons 2022).
Figure 3:
An image generated in March 2023 by using the prompt “Meet the World’s First Artificially Intelligent Magazine Cover” in Stable Diffusion
While the magazine cover has novelty in being a first of sorts, the details of the effort by the designer in producing the desired result point toward AI image generation adding itself as another tool for digital image production rather than outright replacing creators. The aforementioned Herndon and Dryhurst introduce a novel term to describe the process of prompting: “spawning”, which “affords artists the ability to create entirely new artworks in the style of other people from AI systems trained on their work or likeness” (Dryhurst/Herndon 2022: n. pag.). The term spawning opens up an understanding of image generation as a co-creative process between the human and the generator. It is thus a form of computational symbiogenesis in which the genesis of the images is characterized by the symbiotic relationship between technology and humans.[7] The symbiogenesis of generative AI not only includes the user and the AI, but also the platforms and the delimitations that are put on the process by its providers. An example of how platform holders can shape the process comes in the form of restrictions over the words that can be entered, which for DALL·E include names of prominent public individuals, as well as terms connected to politics, violence, and nudity. It can also take the form of OpenAI’s implementation of techniques to preempt stereotypes in the results. This has been done by covertly adding words such as “woman” or “black” into user prompts to diversify the results. As pointed out by Fabian Offert and Thao Phan, this “did not fix the model but the user” through “literally putting words in the user’s mouth” (Offert/Phan 2022: 2).
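The mechanics of such covert prompt modification are simple to imagine. The following sketch is speculative and not OpenAI’s actual implementation; the word lists and the selection logic are invented for illustration only.

```python
# Speculative illustration of platform-side prompt rewriting; the term list,
# trigger words, and selection logic are assumptions, not OpenAI's code.
import random

DIVERSITY_TERMS = ["woman", "black", "elderly", "asian"]  # hypothetical list
TRIGGER_WORDS = {"person", "doctor", "ceo", "engineer"}   # hypothetical triggers

def rewrite_prompt(user_prompt: str) -> str:
    """Covertly append a diversity term when the prompt mentions a generic person."""
    if any(word in user_prompt.lower().split() for word in TRIGGER_WORDS):
        return f"{user_prompt}, {random.choice(DIVERSITY_TERMS)}"
    return user_prompt  # either way, the user never sees the modification

print(rewrite_prompt("a portrait of a doctor"))
```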
The platforms of generative AI come with different affordances. OpenAI offers a sign-in service granting a limited number of free generation-tokens each month, and paid subscription for further use. Craiyon offers entirely free versions without sign-in requirements. Stable Diffusion can be downloaded and run on one’s own hardware. With any of these tools, the user can type prompts into something akin to a search engine. The similarities to processes of searching (and the layout of images as search results) give these tools the peculiarly familiar feel of a Google image search (cf. Meyer 2023). It also renders the process a form of searching a vast latent space of images in which the AI can seemingly endlessly come up with and vary its visualizations. Midjourney is available as a free-to-start service and then through paid subscription, accessed through the gaming-oriented chat service Discord. On the tool’s Discord server, prompting takes place within a seemingly endless stream of other users engaged in the same activity. The experience thus becomes undeniably social, but this applies to AI imagenesis in general. AI imagenesis is made possible by training data consisting of an enormous number of images, and the generated images are often shared in social networks, entering into ecosystems of likes, re-sharing, influencers, followers, trends, and algorithmic influence. AI creates a uniquely social form of images.
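For the locally runnable option mentioned above, a minimal sketch of the prompt-to-image workflow might look as follows, assuming the open-source diffusers library and the publicly released Stable Diffusion weights; the model identifier, prompt, and number of variants are illustrative choices rather than requirements.

```python
# Minimal sketch of prompting a locally run Stable Diffusion model via the
# open-source diffusers library; model identifier and settings are examples.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")  # runs on a consumer GPU

prompt = "this photograph does not exist"  # example prompt (cf. Figure 2)
images = pipe(prompt, num_images_per_prompt=4).images  # several variants per prompt

for i, img in enumerate(images):
    img.save(f"variant_{i}.png")
```

Returning several variants per prompt mirrors how the hosted platforms present parallel versions of each request.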
AI Imaginaries
Jill Walker Rettberg (2022) links image generators to the term machine vision. A perspective on machine vision, which presents a challenge to the notion of generative imagenesis as a technology of vision, can be developed from the work of Alexander Galloway (2021). Galloway theorizes virtual cameras through discussion of real-world capturing by photo and film cameras. Whereas the photographic camera presents a view of something from a singular point of view, the computer camera (of, for instance, a videogame) is untied from a unified, specific location and can instead display objects that can be rotated and potentially viewed from any angle. Galloway goes on to frame cinema, with a term adopted from Gilles Deleuze, as a schizophrenic machine: “[C]inema is a schizophrenic machine with its jump cuts and multiple cameras and parallel montage” (Galloway 2021: 59). Contrary to this, the virtual camera is instead rendered gnostic: “[T]he computer is most certainly a gnostic one, promising immediate knowledge of all things at all times from all places” (Galloway 2021: 59). What is important here is that in opposition to both the schizophrenic and the gnostic visions offered by cameras and computers, AI image generators offer something entirely different again. In contrast to either ‘a view’ or ‘any view’ of what is placed in front of a recording apparatus or produced with computer graphics, image generators could be said to produce multiple versions of views of nowhere. Could AI image generators perhaps instead be conceptualized as virtual imaginaries?
Lev Manovich (2022a) has argued against a notion of AI imagination. He instead conceptualizes AI image generation as a form of media art. Manovich does this to emphasize the software’s dependence on publicly available online images as training data. Without disagreeing with Manovich, pursuing notions of AI imaginaries might be productive to form an understanding of the novelty of these image-making technologies. To start with, AI as a form of imagination might be approached through reflecting on how imagination takes place in the minds of humans. While difficult to verify empirically, the way that humans imagine tends to be framed as a mental process of visualization (cf. Mitchell 1986). The process of imagining is likely informed by what one has witnessed, comparable to how image generators are dependent on training data.
Figure 4:
Images produced by Midjourney in March 2023 by using the prompt “views of nowhere”
Comparable to how image generators turn whatever textual prompt they are given into visuals, people tend to conjure mental images as responses to constellations of words.[8] (DALL·E has been used to turn poems into visuals, which in a sense literalizes this notion of literary visual imagery; cf. Osinga 2022.) And as some formulations are more suggestive to the visual imaginary than others, the image generator can offer either vague or highly detailed images based on different prompts. For Manovich, part of the reason for arguing against a notion of AI imagination is the specificity of technical and stylistic registers often used in prompts. When AI produces images, however, there is a tendency towards invention as the machine contributes to what it is prompted with. Manovich points out that the AI, in a certain sense, “‘amplifies’ your short phrase (e.g., a prompt), generating nuances, details, atmospheres, meanings, associations, and moods you did not specify – and often would never even imagine” (Manovich 2022b: n. pag.). Part of the intrigue of AI image generators may lie in the unpredictability of the results, as the program associates and interprets one’s prompts through a process that can be described as imaginary; and, similar to how human imagination is varied, image generators are capable of visualizing in a broad range of styles and media registers. Such a perspective renders the style of the image generator DeepDream, which introduces spirals of animal eyes and snouts into images, as a form of machine hallucination. More broadly, it offers a perspective on glitches and mistakes not as unconvincing or unrealistic visualizations but as indications of the different forms that the machinic imaginary can take.
It would be a mistake, however, to consider image generators as processes that make it possible to share what would otherwise be occurring in individual minds, hidden from others. Mitchell complicates the common notion that people’s imagination takes the form of mental imagery: “[M]ental images don’t seem to be exclusively visual the way real pictures are; they involve all the senses. Verbal imagery, moreover, can involve all the senses, or it may involve no sensory component at all, sometimes suggesting nothing more than a recurrent abstract idea” (Mitchell 1986: 13). A lack of visual memory and imagination has become a recognized part of normal neurological diversity, termed aphantasia (cf. Dawes et al. 2020). Aphantasia highlights that, despite the etymologically close connection between the ‘imaginary’ and images, visualization is only one specific form of imagination.[9] Regardless of their privileging of the visual sense over other sensory modalities, image generators seem to be infusing machines with imagination – with the ability to conjure up and, in a certain sense, visually dream. Whether one accepts framing AI image generation as machinic imagination might be a question of whether one is also prepared to consider the characteristically human ability to mentally visualize as something that machines are capable of. One might insist on differences between the two in order to maintain the notion that the ability to imagine is an exclusively human feat. Compared to humans, one might still consider AI as lifeless, without intention or imagination. For Steven J. Frank, AI image generators give reason to question the value of human intentionality and whether it can be “faked if we can identify enough examples” (Frank 2022: 2). This leads him to provocatively state: “You search in vain for the quintessentially human but it turns out there’s an app for that”, before he back-pedals and asks: “Or is there?” (Frank 2022: 2).
The AI imaginary can be conceptualized as something beyond an externalized process of what otherwise occurs in (some) people’s minds. To rephrase Benjamin writing on film: AI image generators are an acquisition and extension of the collective imaginary (Benjamin 2007 [1935]). Our collective imaginary exists today in a feedback mechanism with media, which act at once as reservoirs and prompts for it. What humans mentally visualize and what generators produce is characterized by the techniques, styles, and fantasies of media productions. The concept of AI imagination thus need not be a way of anthropomorphizing or ascribing human attributes to a piece of software, but rather a way of describing the new technological access to and potential for influence over the collective cultural imaginary. Following from such a concept of AI imaginaries, it is unsurprising that among the most widespread uses of image generators is infusing prompts with characters from videogames, animation, and movie franchises in order to produce memes that can further spread and vary in social networks.[10]
Generative images are themselves ‘generative’ for the collective imaginary in another way as well: They produce excitement or concern, often imaginatively preoccupied with what AI may become capable of. Influenced by media representations of artificial intelligence, the AI imaginary seems to be filled with dreams of technological progress and how any and all aspects of culture will be fundamentally altered as a consequence of these technologies. As much as the present potential of generative AI, the imaginary is filled with desires and fears over what seems to be approaching, what could become possible through technological development. Yet the outcomes of media shifts are rarely as grandiose as our dreams, nor as easily aligned with our most optimistic aspirations or worst nightmares. The reality tends to be both more mundane and less predictable than we imagine.
Conclusions
What does the newness of AI-generative images consist of? This paper has reflected upon ways that our understanding of images, imagenesis, and imaginaries are shifted by generative AI. This section offers a summary of the key findings of the paper:
Views of nowhere. AI-generated images are radically distinct from other images in the sense that neither light nor brushstrokes are registered for their production, nor are they renderings of graphical computer models as is the case in video games. The images are nevertheless steeped in the stylistic remains of other image media. This leads to potential uncertainty about whether an image is, for instance, an actual photograph of a person or whether both the person and the photograph are AI fabrications.
Symbiogenesis. In generating images, agency is shared between the prompting user, the platform holders, and the AI. Users write prompts that trigger and steer the diffusion process of the AI towards actualizing the possibilities of the latent space. Platform holders can both exclude certain terms and add others without user knowledge. The AI adds to the process through imaginatively associating and interpreting prompts. Part of the novelty of and interest in AI image generators can be traced to their ease of use as well as to how unpredictable the results can be.
De-skilling and re-skilling. In addition to the grand question of whether or not one can make works of art with generative AI, there are smaller, more practical challenges. Image generation no longer requires visual training in capturing or producing but can be performed by anyone as a process of formulating descriptive prompts. Among ‘prompt poets’, know-how about prompting for desirable and viable results is developed and shared in order to add generative AI to the toolsets of established digital image-making.
An imagenesis for our time. Generative AI is formed from networks, trained on datasets of captioned images posted online, and the generated images feed back into social networks. This makes for a uniquely social form of images. On social networks, the images are exposed to the social and algorithmic formatting of attention. In their production and function AI-generated images have the ephemeral, decontextualized quality of social network posts.
The collective media imaginary. Rather than a technology of machine vision, generative AI influences and is influenced by the machinic imaginary. The machinic imaginary is conceptualized here not foremost as an externalization of individual human imagination, but rather as a collective media imaginary that the AI adds itself to. Generative AI is at once formed by and influences this media imaginary, with prompts oriented towards media styles and franchises. Central to this imaginary are also anticipatory fantasies about what might become possible.
The drive of novelty. While the images themselves may hide the labor (involved in programming, training, and prompting) going into the process, indicators of AI imagenesis remain vital for the actual interest in these images. Such interest seems to focus on AI imagenesis as much as (or perhaps even more than) on the images themselves. From artwork to social network posts, the images are commonly presented in ways that make explicit the fact that what we see is AI-generated. This could also be taken as indication of the novelty of the technology, as people are still working out its possibilities and potential uses.
Bibliography
Altman, Sam (@sama): AGI is gonna be wild. Tweet on Twitter. April 6, 2022. https://twitter.com/sama/status/1511735572880011272?lang=en [accessed February 24, 2023]
Bajohr, Hannes: Dumb Meaning: Machine Learning and Artificial Semantics. In: Generative Imagery: Towards a ‘New Paradigm’ of Machine Learning-Based Image Production, special-themed issue of IMAGE: The Interdisciplinary Journal of Image Sciences, 37(1), 2023, pp. 58-70
Barthes, Roland: Camera Lucida: Reflections on Photography. Translated by Richard Howard. New York [Hill & Wang] 1981 [1980]
Benjamin, Walter: The Work of Art in the Age of its Technological Reproducibility. Translated by Harry Zohn. In: Illuminations: Essays and Reflections. New York [Schocken] 2007 [1935], pp. 217-254
Bennett, Michael Timothy; Yoshihiro Maruyama: Intensional Artificial Intelligence: From Symbol Emergence to Explainable and Empathetic AI. arXiv:2104.11573. April 23, 2021. https://arxiv.org/abs/2104.11573 [accessed February 16, 2023]
Cheng, Karen X: Creating the First Ever Artificially Intelligent Magazine Cover for Cosmopolitan. Post on Instagram. June 21, 2022. https://www.instagram.com/p/CfEwohiJdXW/?hl=en [accessed February 16, 2023]
Conwell, Colin; Tomer Ullman: Testing Relational Understanding in Text-Guided Image Generation. arXiv:2208.00005. July 29, 2022. https://arxiv.org/abs/2208.00005 [accessed February 16, 2023]
dall·ery gall·ery (ed.): The DALL·E 2 Prompt Book. In: Dall·ery gall·ery: Resources for Creative DALL·E Users. July 14, 2022. https://dallery.gallery/the-dalle-2-prompt-book/ [accessed February 2, 2023]
Dawes, Alex J.; Rebecca Keogh; Thomas Andrillon; Joel Pearson: A Cognitive Profile of Multi-Sensory Imagery, Memory and Dreaming in Aphantasia. In: Scientific Reports, 10, 2020. https://www.nature.com/articles/s41598-020-65705-7 [accessed February 16, 2023]
Dryhurst, Matt; Herndon, Holly: Infinite Images and the Latent Camera. In: Herndon Dryhurst Studio. May 6, 2022. https://mirror.xyz/herndondryhurst.eth/eZG6mucl9fqU897XvJs0vUUMnm5OITpSWN8S-6KWamY [accessed February 16, 2023]
Ervik, Andreas: Becoming Human Amid Diversions: Playful, Stupid, Cute and Funny Evolution. London [Palgrave Macmillan] 2022
Faldalen, Jon Inge: Still Einstellung: Stillmoving Imagenesis. In: Discourse, 35(2), 2014, pp. 228-247
Frank, Steven J.: The Work of Art in an Age of Mechanical Generation. In: Leonardo, 55(4), 2022.
Galloway, Alexander: Uncomputable: Play and Politics in the Long Digital Age. London [Verso] 2021
Marcus, Gary; Ernest Davis; Scott Aaronson: A Very Preliminary Analysis of DALL-E 2. arXiv:2204.13807. April 25, 2022. https://arxiv.org/abs/2204.13807 [accessed February 16, 2023]
Manovich, Lev: A New Post with My Observations about #Midjourney. Post on Facebook. September 1, 2022a. https://www.facebook.com/softwarestudies/posts/pfbid0aCxgn7FetRqCjkCRbHMcWhdjVMDL4Vj9v1wqHgi1ZYHumpjpdChocSv94JW4Jbi4l [accessed February 16, 2023]
Manovich, Lev: (#midjourney Theory Notes): 5. Image – Text Relations in AI Image Synthesis (after Roland Barthes). Post on Facebook. September 3, 2022b. https://www.facebook.com/softwarestudies/posts/pfbid02EAxtGVyTbk5igjLRvGpZakh4yqBwsJELwbucq7KBDsS7DPJAoWAREmWquvmVkK5ql [accessed February 16, 2023]
McLuhan, Marshall: Understanding Media: The Extensions of Man. London [Routledge] 2001 [1964]
Meyer, Roland: The New Value of the Archive: AI Image Generation and the Visual Economy of ‘Style’. In: Generative Imagery: Towards a ‘New Paradigm’ of Machine Learning-Based Image Production, special-themed issue of IMAGE: The Interdisciplinary Journal of Image Sciences, 37(1), 2023, pp. 100-111
Mitchell, William J.T.: Iconology: Image, Text, Ideology. Chicago [University of Chicago Press] 1986
Offert, Fabian: Ten Years of Image Synthesis. In: Zentralwerkstatt.org. November 10, 2022. https://zentralwerkstatt.org/blog/ten-years-of-image-synthesis [accessed February 16, 2023]
Offert, Fabian; Thao Phan: A Sign That Spells: DALL-E 2, Invisual Images and the Racial Politics of Feature Space. arXiv:2211.06323. October 26, 2022. https://arxiv.org/abs/2211.06323 [accessed February 22, 2023]
OpenAI: DALL·E 2. In: OpenAI.com. No date. https://openai.com/product/dall-e-2 [accessed March 1, 2023]
Osinga, Douwe: Visualizing Poetry Using DALL-E. In: Medium.com. May 31, 2022. https://dosinga.medium.com/visualizing-poetry-using-dall-e-ff3a901a0d4e [accessed February 16, 2023]
Parsons, Guy: 12 Awesome Free Image Editing Tools to Supercharge your DALL·E Generations. In: Dall·ery gall·ery: Resources for Creative DALL·E Users. July 28, 2022. https://dallery.gallery/free-photo-image-editing-tools-AI-dalle/ [accessed February 16, 2023]
Ramesh, Aditya; Prafulla Dhariwal; Alex Nichol; Casey Chu; Mark Chen: Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv:2204.06125. April 13, 2022. https://arxiv.org/abs/2204.06125 [accessed February 16, 2023]
Rettberg, Jill Walker: Dall-E and Human-AI Assemblages. In: jilltxt.net. June 23, 2022, https://jilltxt.net/dall-e-and-human-AI-assemblages/ [accessed February 20, 2023]
Searle, John R.: Minds, Brains and Programs. In: Behavioral and Brain Sciences, 3, 1980, pp. 417-457
Wasielewski, Amanda: “Midjourney Can’t Count”: Questions of Representation and Meaning for Text-to-Image Generators. In: Generative Imagery: Towards a ‘New Paradigm’ of Machine Learning-Based Image Production, special-themed issue of IMAGE: The Interdisciplinary Journal of Image Sciences, 37(1), 2023, pp. 71-82
Footnotes
1 The term imagenesis was coined by Faldalen (2014).
2 An example of this is the tendency people have to write in response to images of individuals posted on social media as if they were talking directly to the depicted person – even if the depicted is a pet rather than a human, cf. Ervik 2022.
3 https://thispersondoesnotexist.com/ [accessed February 16, 2023].
4 It is possible to access a small subset of the LAION training material for Stable Diffusion (about 0.5% of 2.3 billion images) here: https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/images [accessed February 20, 2023].
5 This point is emphasized by Roland Meyer’s (2023) contribution in the present special issue of IMAGE.
6 For an account of developments of AI image synthesis in the previous decade, see Offert 2022.
7 The cybernetic tradition has been framed as one of steering. Yet, following the work of Alexander Galloway (2021) in resurrecting the early artificial life pioneer Nils Aall Barricelli, I have come to frame interaction with dynamic and unpredictable computer simulation as one of symbiogenesis, cf. Ervik 2022.
8 That is, when not talking about people with aphantasia, a phenomenon discussed further in the next paragraph.
9 Human imagination might involve any mode of sensory responses – including sound, smell, and taste as well as tactility – in conjunction with, or instead of, visualizations. For some, imagination may bear no similarity to sensations.
10 Cf. the Twitter account “Weird AI Generations” (@weirddalle), https://twitter.com/weirddalle [accessed February 16, 2023].
About this article
Copyright
This article is distributed under Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to share and redistribute the material in any medium or format. The licensor cannot revoke these freedoms as long as you follow the license terms. You must however give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits. More information at https://creativecommons.org/licenses/by/4.0/deed.en.
Citation
Andreas Ervik: Generative AI and the Collective Imaginary. The Technology-Guided Social Imagination in AI-Imagenesis. In: IMAGE. Zeitschrift für interdisziplinäre Bildwissenschaft, Vol. 37, Year 19, No. 1, 2023, pp. 42-57
ISSN
1614-0885
DOI
10.1453/1614-0885-1-2023-15450
First published online
May 2023