Generative Imagery: Towards a ‘New Paradigm’ of Machine Learning-Based Image Production
Editors: Lukas R.A. Wilde, Marcel Lemmes and Klaus Sachs-Hombach
Table of Contents
Goda Plaum, Lars Grabbe and Klaus Sachs-Hombach
Lukas R.A. Wilde, Marcel Lemmes and Klaus Sachs-Hombach
By Lukas R.A. Wilde | This introduction examines whether generative imagery represents a new paradigm for image production and an emerging research field. It explores a humanities approach to machine learning-based image generation and questions posed by media studies. Rather than focusing on radical shifts in media history, it emphasizes continuities and connections. It highlights the unique aspects of generative imagery compared to photography, painting, and earlier computer-generated imagery. The ‘new paradigm’ is based on emergent or stochastic features, the interplay between immediacy-oriented and hypermediacy-oriented forms of realism, and a novel text-image relationship grounded in human language. The survey then discusses the conditions under which generative imagery should be seen as a distinct media form rather than a new technology. It suggests viewing it as a mediation within evolving socio-technological configurations that reshape agency and subject positions in contemporary media cultures, particularly between human and non-human actors. To understand the cultural distinctness, the essay proposes examining the establishment, attribution, and negotiation of cultural ‘protocols’ within existing and emerging media forms.
By Lev Manovich | I’ve been using computer tools for art and design since 1984 and have already seen a few major visual media revolutions, including the development of desktop media software and photorealistic 3D computer graphics and animation, the subsequent rise of the web, and later social media sites and advances in computational photography. The new AI ‘generative media’ revolution appears to be as significant as any of them. Indeed, it is possible that it is as significant as the invention of photography in the nineteenth century or the adoption of linear perspective in Western art in the sixteenth. In what follows, I will discuss four aspects of AI image media that I believe are particularly significant or novel. To better understand these aspects, I situate this media within the context of visual media and human visual arts history, ranging from cave paintings to 3D computer graphics.
Generative AI and the Collective Imaginary: The Technology-Guided Social Imagination in AI-Imagenesis
By Andreas Ervik | This paper explores generative AI images as new media, focusing on the questions of what these images depict, how image generation occurs, and how AI impacts the imaginary. It reflects on other forms of image production and identifies AI images as radically new, distinct from traditional methods as they lack light or brushstroke registration. However, they draw from the remains of other production forms, relying on connections between images and words as well as other forms of images as training data. AI image generators function as search engines, allowing users to enter prompts and explore the virtual potential of the latent space. Agency in AI image generation is shared among the program, the platform holder, and the users’ prompts. Generative AI creates a social form of images, relying on human-created training datasets and shared on social networks. It gives rise to a ‘machinic imaginary,’ characterized by techniques, styles, and fantasies from earlier media production. AI-generated images become part of the existing collective media imaginary. As discourse on AI images focuses on their future capabilities, the AI imaginary is filled with dreams of technological progress.
By Hannes Bajohr | The ongoing debate around machine learning focuses on ‘big’ terms like intentionality, consciousness, and intelligence; the philosophical challenge lies in more nuanced concepts. This contribution explores a limited type of meaning called “dumb meaning.” Traditionally, computers were seen as handling only syntax, their semantic abilities being limited by the “symbol grounding problem.” Since they operate with mere symbols lacking any indexical relation to the world, their understanding is restricted to empty signifiers whose meaning is ‘parasitically’ dependent on a human interpreter. This was true for classic or symbolic AI. With subsymbolic AI and neural nets, however, an artificial semantics seems possible that operates below meaning proper. I explore this limited semantics brought about by the correlation of data types by looking at two examples: the implicit knowledge of large language models and the indexical meaning of multimodal AI such as DALL·E 2.
By Amanda Wasielewski | Text-to-image generation tools, such as DALL·E, Midjourney, and Stable Diffusion, were released to the public in 2022. In their wake, communities of artists and amateurs sprang up to share prompts and images created with the help of these tools. This essay investigates two of the common quirks or issues that arise for users of these image generation platforms: the problem of representing human hands and the attendant issue of generating the desired number of any object or appendage. First, I address the issue that image generators have with generating normative human hands and how DALL·E has tried to correct this issue by only providing generations of normative human hands, even when a prompt asks for a different configuration. Secondly, I address how this hand problem is part of a larger issue in these systems where they are unable to count or reproduce the desired number of objects in a particular image, even when explicitly prompted to do so. This essay ultimately argues that these common issues indicate a deeper conundrum for large AI models: the problem of representation and the creation of meaning.
By Eryk Salvaggio | Generated images are data patterns inscribed into pictures, and close readings can reveal aspects of these image-text datasets and the human decisions behind them. Examining AI-generated images as ‘infographics’ informs the methodology, described in this paper, for the analysis of these images within a media studies framework. The proposed analytical methodology determines how information patterns manifest through visual representations: a series of images of interest is generated and examined as a non-linear sequence; patterns, absences, strengths, and weaknesses are identified and connected to structures of the underlying model and dataset; and the resulting hypothesis is extended to a broader sample. The paper offers a case study, a reading of images of humans kissing created through DALL·E 2, before drawing conclusions and presenting avenues of future exploration.
By Roland Meyer | Text-to-image generators such as DALL·E 2, Midjourney, or Stable Diffusion promise to produce any image on command, thus transforming mere ekphrasis into a means of production. However, prompts should not be understood as instructions to be carried out, but rather as generative search commands that guide AI models through the stochastic spaces of possible images. A comparison can thus be drawn between text-image generators and stock photography databases. But while stock photography searches retrieve pre-existing images, prompts are used to explore latent possibilities. This, the article argues, fundamentally changes how value is attributed to individual images. AI image generation fosters the emergence of a new networked model of visual economy, one that does not rely on closed image archives as monetizable assets, but rather conceives of the entire web as a freely available resource that can be mined at scale. Whereas in the older model each image has a precisely determinable value, what DALL·E, Midjourney, and Stable Diffusion monetize is not the individual image itself, but rather ‘styles’: repeatable visual patterns derived from the aggregation and analysis of large ensembles of images.
By Jens Schröter | As has been remarked several times in the recent past, the images generated by AI systems like DALL·E, Stable Diffusion, or Midjourney have a certain surrealist quality. In the present essay I want to analyze the dreamlike quality of (at least some) AI-generated images. This dreaminess is related to Freud’s comparison of the mechanism of condensation in dreams with Galton’s composite photography, which he reflected on explicitly with regard to statistics – which are also a basis of today’s AI images. The superimposition of images results at the same time in generalized images of an uncanny sameness and in a certain blurriness. Does the fascination of (at least some) AI-generated images result from their relation to a kind of statistical unconscious?
By Fabian Offert | What is the concept of history inherent in contemporary models of visual culture like CLIP and DALL·E 2? This essay argues that, counter to the corporate interests behind such models, any understanding of history facilitated by them must be heavily politicized. This, the essay contends, is a result of a significant technical dependency on traditional forms of (re-)mediation. Polemically, for CLIP and CLIP-dependent generative models, the recent past is literally black and white, and the distant past is actually made of marble. Moreover, proprietary models like DALL·E 2 are intentionally cut off from the historical record in multiple ways as they are supposed to remain politically neutral and culturally agnostic. One of the many consequences is a (visual) world in which, for instance, fascism can never return because it is, paradoxically at the same time, censored (we cannot talk about it), remediated (it is safely confined to a black-and-white media prison), and erased (from the historical record).
Fuzzy Ingenuity: Creative Potentials and Mechanics of Fuzziness in Processes of Image Creation with AI-Based Text-to-Image Generators
By Erwin Feyersinger, Lukas Kohmann, and Michael Pelzer | This explorative paper focuses on fuzziness of meaning and visual representation in connection with text prompts, image results, and the mapping between them by discussing the question: How does the fuzziness inherent in artificial intelligence-based text-to-image generators such as DALL·E 2, Midjourney, or Stable Diffusion influence creative processes of image production – and how can we grasp its mechanics from a theoretical perspective? In addressing these questions, we explore three connected interdisciplinary approaches: (1) Text-to-image generators give new relevance to Hegel’s notion of language as ‘the imagination which creates signs’. They reinforce how language itself inevitably acts as a meaning-transforming system and extend the formative dimension of language with a technology-driven facet. (2) From the perspective of speech act theory, we discuss this explorative interaction with an algorithm as a form of performative utterance. (3) In further examining the pragmatic dimension of this interaction, we discuss the creative potential arising from the visual feedback loops it includes. Following this thought, we show that the fuzzy variety of images which DALL·E 2 presents in response to one and the same text prompt contributes to a highly accelerated form of externalized visual thinking.
By Nicolle Lamerichs | Generative AI, exemplified by tools like DALL·E, Midjourney, and Stable Diffusion, is gaining popularity and impacting various industries. This essay explores the rise of generative AI from a fan and media studies perspective, focusing on its reception within fandom. Fan cultures, driven by data and new media platforms, embrace generative art as a means to create transformative works based on beloved characters and stories. Platforms like Reddit foster communities where users share generative art and exchange tips. However, ethical concerns arise in fandom, including issues of copyright, monetization, and unauthorized use of fan art as training data. The essay analyzes how artists and stakeholders discuss and regulate generative AI within their communities, such as implementing bans on AI-generated art at fan conventions. While AI enables playful interactions and inspiring outcomes, users are critical of turning generative images into a business model. The essay highlights the potential of AI in empowering artistic practice but acknowledges concerns regarding its misuse. Fandom serves as a case study to explore user engagement with the innovative potential and challenges of generative AI.
AI in Scientific Imaging: Drawing on Astronomy and Nanotechnology to Illustrate Emerging Concerns About Generative Knowledge
By Konstantinos Michos | Recent advances in AI technology have enabled an unprecedented level of control over the processing of digital images. This breakthrough has sparked discussions about many potential issues, such as fake news, propaganda, the intellectual property of images, the protection of personal data, and possible threats to human creativity. Susan Sontag (2005) recognized the strong causal relationship involved in the creation of photographs, upon which scientific images rely to carry data (cf. Cromey 2012). First, this essay presents a brief overview of AI image generation techniques and their status among the other computational methodologies employed in scientific imaging. Then it outlines their implementation in two specific examples: the Black Hole image (cf. Event Horizon Telescope Collaboration 2019a-f) and medical imagery (cf., e.g., Oren et al. 2020). Finally, conclusions are drawn regarding the epistemic validity of AI images. Considering the exponential growth of available experimental data, scientists are expected to resort to AI methods to process them quickly. An overreliance on AI lacking proper ethics will not only result in academic fraud (cf. Gu et al. 2022; Wang et al. 2022) but will also expose an uninitiated public to images where a lack of sufficient explanation can shape distorted opinions about science.
AI Body Images and the Meta-Human: On the Rise of AI-generated Avatars for Mixed Realities and the Metaverse
By Pamela C. Scorzin | This paper examines the impact of AI on modern visual culture, focusing specifically on the design of AI avatars for social media, mixed reality, and the Metaverse. The term “AI imagery” encompasses a variety of AI-generated representations, including prompt engineering. Images produced by advanced AI generators such as Midjourney, DALL·E 2, and Stable Diffusion raise questions about their nature, reality, and connection to new body concepts and ideologies. As AI-generated images become more (photo-)realistic, their connection to reality and truth becomes less clear. Nevertheless, these synthetic images created from vast amounts of internet metadata are not considered fictional or unreal. Instead, they offer a unique perspective by revealing previously hidden information and sharing it through digital platforms. Consequently, generative images act as meta-images, representing a distinct form of reality in a simulated photo-realistic style (known as “promptography”) that effectively communicates with globally connected communities. Additionally, generative images also serve as operative images, creating a technology-based visual language within a vast platform network. As networked and meta-images, they are capable of constructing and narrating the ‘meta-human’.
By Jay David Bolter | As the essays in this collection demonstrate, AI generative imagery raises compelling theoretical and historical questions for media studies. One fruitful approach is to regard these AI systems as a medium rooted in the principle of remediation, because the AI models depend on vast numbers of samples of other media (painting, drawing, photography, and textual captions) scraped from the web. This algorithmic remediation is related to, but distinct from, earlier forms of remix, such as hip-hop. To generate new images from the AI models, the user types in a textual prompt. The resulting text-image pairs constitute a kind of metapicture, as defined by William J.T. Mitchell in Picture Theory (1994).