Generative Artificial Intelligences and Picture Synthesis

(A Tribute to a Universal Artist?)






Jean-François COLONNA

www.lactamme.polytechnique.fr

CMAP (Centre de Mathématiques APpliquées) UMR CNRS 7641, École polytechnique, Institut Polytechnique de Paris, CNRS, France

(CMAP28 WWW site: this page was created on 01/06/2024 and last updated on 11/17/2024 12:44:30 -CET-)










1 - Introduction:

In a matter of months, Generative Artificial Intelligences (GAIs) have infiltrated our daily lives. I have conducted numerous experiments, particularly with ChatGPT, Bard/Gemini and Le_ChAt. These experiments revealed that, generally, using them as reliable sources of information (in Mathematics, for example) was not always very prudent, while letting them "run free" could unleash boundless imagination upon us.

However, some of these GAIs are not confined to text production. They can also rapidly [01] generate high-quality pictures. As we will see later, this objectively demonstrates their creativity.





2 - The Generative Artificial Intelligences:

To generate pictures like those presented in this document, a GAI must first be trained on "real" data, particularly the {picture, description} pairs available in large quantities on the internet [02]. Specialized artificial neural networks are then used to transform, on the one hand, pictures in "raster" mode [03] into a more concise [04] representation closer to their semantic content. On the other hand, a similar process is applied to the descriptions, which are texts written in natural languages. The result of this processing [05] for each {picture, description} pair is a set of numbers (a "vector") stored in a massive multidimensional space known as the Semantic Space (S). The processing is such that two neighboring points in S correspond to semantically close notions.
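
The idea that neighboring points in S carry close meanings can be illustrated with a toy sketch. The three-dimensional vectors below are invented by hand for the example (real embeddings are learned and have hundreds or thousands of dimensions), and cosine similarity is only one common way, assumed here, of measuring how close two vectors are:

```python
import math

# Hypothetical, hand-written "semantic" vectors (assumption: real
# embeddings are learned by the networks and are far larger).
TOY_SPACE = {
    "cat":        [0.9, 0.1, 0.0],
    "kitten":     [0.8, 0.2, 0.1],
    "locomotive": [0.0, 0.1, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_close = cosine_similarity(TOY_SPACE["cat"], TOY_SPACE["kitten"])
sim_far = cosine_similarity(TOY_SPACE["cat"], TOY_SPACE["locomotive"])

# "cat" is closer to "kitten" than to "locomotive" in this toy space.
print(sim_close > sim_far)
```

In this toy space "cat" and "kitten" point in almost the same direction, while "cat" and "locomotive" are nearly orthogonal.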

Thus, learning is, in a sense, a form of semantic compression. The exploitation of space S to generate new pictures (or texts...) can be considered naively as semantic decompression. The user-provided prompt [06] positions itself in S, and one of the closest points P defines a picture that just needs to be decompressed. It seems that a random selection is made when multiple neighbors satisfy the prompt. This likely explains why submitting the same prompt twice will yield two different but semantically close pictures.
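
This naive picture (a prompt positioned in S, then a random choice among its closest neighbors) can be sketched as follows. It illustrates only the simplified view given above, not the actual mechanism of the GAIs; the stored points, their coordinates and the function names are all hypothetical:

```python
import math
import random

def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pick_neighbor(prompt_vector, stored_points, k=2, rng=None):
    """Pick at random one of the k stored points closest to the prompt."""
    rng = rng or random.Random()
    ranked = sorted(
        stored_points,
        key=lambda name: distance(prompt_vector, stored_points[name]),
    )
    return rng.choice(ranked[:k])

# Hypothetical points of S, each standing for a "compressed" picture.
STORED = {
    "library_a": [0.9, 0.1],
    "library_b": [0.8, 0.2],
    "airplane":  [0.1, 0.9],
}
prompt = [0.85, 0.15]

# Submitting the same prompt several times can yield different,
# but always semantically close, results.
picks = {
    pick_neighbor(prompt, STORED, k=2, rng=random.Random(seed))
    for seed in range(20)
}
print(picks)
```

Only the two "library" points can ever be selected for this prompt; which of the two is returned depends on the random draw.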

However, as always, the devil is in the details and reality is certainly much more complex. Indeed, as the examples presented later will show, a prompt generally specifies not a single semantic concept but several. Procedures such as "mixing", interpolation, combination, etc. must therefore be implemented.
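
As a minimal illustration of what "interpolation" between two concepts could mean, here is a sketch using plain linear interpolation between two invented vectors. Whether the GAIs combine concepts in this way is not known to me; the vectors and names are assumptions made for the example:

```python
def interpolate(u, v, t):
    """Linear interpolation between u and v: t=0 gives u, t=1 gives v."""
    return [(1.0 - t) * a + t * b for a, b in zip(u, v)]

library = [1.0, 0.0, 0.0]     # hypothetical vector for "library"
rembrandt = [0.0, 1.0, 0.0]   # hypothetical vector for "Rembrandt's style"

# A point halfway between the two concepts.
mix = interpolate(library, rembrandt, 0.5)
print(mix)  # [0.5, 0.5, 0.0]
```

Varying t between 0 and 1 moves the resulting point continuously from one concept toward the other.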

The experiments reported below showed that in fact two GAIs had to be used: the first one is the actual Generative one; the second one, Antagonistic, is intended on the one hand to evaluate the quality of the first one's productions and on the other hand to filter the content so as to avoid "inappropriate" pictures [07].





3 - Some examples of the generation of pictures (1537 on Sunday November 17 2024):

The GAIs accessible on the websites 'www.bing.com/images/create' and 'designer.microsoft.com/image-creator' were used to generate these pictures [08].

So, 1537 images generated by this GAI are presented below. In fact, more were computed, but not all are exhibited: some were rejected either out of personal preference or because they were too similar to others already obtained. This number (1537) may seem excessive, making it impossible to view all of these pictures, but this is deliberate and intended to illustrate the incredible "imaginative" power of this GAI...

Nota: All prompts were submitted in French; an English translation is provided below.


3.1 - Some examples of the generation of pictures using the prompt "La bibliothèque de Babel à la façon de X" ["The Library of Babel in the style of X"]:

With virtually infinite possibilities, I decided to limit the tests by using only one prompt chosen in such a way that it references concepts with a very low probability of being encountered together on the Internet:

"La bibliothèque de Babel à la façon de X" ["The Library of Babel in the style of X"] [09]

Where X is chosen from an arbitrary list of artists (writers, musicians, painters, sculptors,...), engineers, places,... In most cases, the same prompt was iterated multiple times, resulting in a series of pictures on a given theme (defined by X), all different (illustrating the use of randomness mentioned above, randomness that further explains the a priori impossibility of obtaining each of them again) but referencing the same concepts. Here are 1537 pictures thus obtained:







The pictures obtained in this way are undeniably breathtaking, incredible,... accurately addressing the queries. Indeed, they depict libraries full of books, but also convey the sense of infinity one experiences when reading Jorge Luis Borges's short story, all within an appropriate temporal context.



3.2 - Some examples of the generation of pictures using the prompt "Une image à la façon de X" ["A Picture in the style of X"]:

Let's simplify the prompt by using only:

"Une image à la façon de X" ["A Picture in the style of X"]

thereby giving more freedom to the GAI. Here are the pictures thus obtained:








3.3 - Some "free" examples of the generation of pictures:

And now let's use some "free" prompts...









4 - Best Of:






5 - Some Comments, Remarks and Questions:

These images unequivocally demonstrate that this GAI is capable of transforming a few words (the prompt) into coherent and relevant images of remarkable complexity. Regarding those inspired by known artists, some have argued that they are merely mediocre copies that could fool no one. This might be true, but the achievement does not lie there. It resides in the digital formalization of concepts gleaned from hundreds of millions of documents on the internet. While it is indeed evident that, upon closer inspection, an observant eye cannot be deceived and will immediately recognize that this image is not an unknown canvas by Rembrandt, one cannot help but ask whether it fits within his style and could be confused with it. If I chose to direct my prompts towards art, and painting in particular, it was to narrow my experiments, not to play the role of a forger. Thus, what is truly astounding is the performance of the designers of this GAI, and that cannot be contested, unlike the artistic value of these images...


Once the amazement and, dare I say, wonder have subsided, a number of questions arise:




One will nevertheless note a small number of anomalies (though some are perhaps "deliberate"...), for example:



Finally, one will note an astonishing, unexpected convergence: the Library of Babel is practically infinite, and therefore it is impossible to explore even a small portion of it. Is it not the same with this GAI that seems to contain a quasi-infinite number of pictures, of which we can never see more than a minuscule fraction?

Is this GAI the Library of Babel?






6 - About Creativity and Consciousness:

Once again, it seems challenging to dispute the quality and originality of these pictures generated by this GAI (and others). There's no hesitation in asserting that it exhibits creativity! While this statement may be surprising to some, let's reflect on our own creative acts. How do we generate new ideas? Certainly not out of nothing and I see two possible origins: firstly, interaction with our environment [12], particularly through vision concerning pictures. Secondly, I am convinced that at the subconscious level, there is a continuous "mixing" of previous ideas stored in our brain, which should be viewed as a dynamic semantic space. These new tools inevitably lead us to question whether our brain is nothing more than a "mere" machine.

With these undeniable successes, do Artificial Intelligences not demonstrate intelligence in its broad sense? And if so, could they become conscious? If so, would we be aware of it? It seems that the emergence of consciousness is linked to complexity (especially in connections), but also to "external" stimulation, ensured in us (and in "higher" animals) by our five senses, and this may be what our Artificial Intelligences lack to reach this higher level of evolution.

Finally, can't these studies on Artificial Intelligences enlighten us about our own memory [13] and the production of our dreams during which, as in the pictures presented above, known or fictional characters appear in real or imaginary settings?

Do these pictures reveal to us the dreams of our GAIs?






7 - Conclusion:

Undoubtedly, in the span of a few months, a threshold has been crossed. The victory of AlphaGo over Lee Sedol in the Google DeepMind Challenge Match in March 2016 already opened a breach and today, the successes of GAIs demonstrate the enormous potential of this research. What would Alan Turing have thought about it?

However, this emergence is naturally accompanied by sometimes justified fears:



But many questions also arise, for example:



But let's imagine in our living rooms wall screens exhibiting masterpieces of world painting from yesterday, today and tomorrow, that have never existed and are constantly renewed by a GAI...


So, what surprise awaits us tomorrow?

Any sufficiently advanced technology is indistinguishable from magic

Arthur Charles Clarke (1962).





8 - Some a posteriori Remarks and Questions:

Over the past few months, I have conducted numerous experiments with text-based GAIs: Bard, ChatGPT and Le_ChAt.

They all demonstrated, on the one hand, that these GAIs were capable of an unbridled imagination and on the other hand, that it was generally not possible to trust them when searching for reliable information (I recall in this regard the hallucinations and mathematical ramblings of ChatGPT and others...).

With the arrival of image-based GAIs, it was tempting to conduct similar experiments: their results were presented above. The conclusions drawn are the same: on the one hand, an "unimaginable" imagination and on the other hand, the difficulty or even impossibility of obtaining exactly the simplest requested representations and finally, the impossibility of obtaining the same picture twice in a row.

Three criticisms were addressed to me following the establishment of this Museum of the Twenty-First Century. First, it cannot be considered Art because Art can only arise from experience (and suffering?). Second, there can be no creativity when it comes to machines. Finally, one cannot confuse these pictures with "original" works.

Let us immediately address the issue of artifacts: indeed, a problem seemingly known to the designers distorts the hands, limbs, or faces of any characters when their size is small relative to the picture frame. This makes it possible to distinguish between "classic" works and those from GAIs, although some artists such as Jean-Michel Basquiat, Paul Rebeyrolle and Egon Schiele did not hesitate to do the same deliberately.

To respond to these objections, let us examine some pictures from the collection presented above:

These few pictures, obtained almost instantly by "evoking" the names of Hieronymus Bosch, Rembrandt, Jean-Baptiste Camille Corot, Salvador Dali and Hans Ruedi Giger can obviously be easily associated with these artists. This means that the GAI, during its training, was able to formalize the style (and nightmares regarding Hans Ruedi Giger...) of the encountered artists, allowing it to subsequently create pictures in their manner. These are not mere copies of original works with some alterations or cut-and-paste jobs. No, these are indeed new pictures (I cautiously do not say "works of art") resembling in their style, colors, lights,... , old, or even very ancient pictures.

If we look closely, for example, at the picture made in the style of Rembrandt, it seems to me that one would have to be in very bad faith not to recognize the style of the painter from Leiden in the use of light, the characters and their costumes, the setting and the food, the atmosphere,... , even though it is not listed in the artist's catalog! As for these two pictures from "bad anonymous painters":

it seems to me that we have seen worse in museums and galleries...

How is this possible?

These two pictures referencing Sandro Botticelli clearly demonstrate the creative capabilities of the GAI. The locomotive in the style of Sandro Botticelli, even if it is not functional (at the level of the connecting rods in particular), features three-dimensional decorations typical of the Italian Renaissance. Moreover, its plume of smoke obviously recalls one of the artist's major works: "The Birth of Venus". As for the airplane in the style of Sandro Botticelli, it shows that the GAI has learned what an airplane is: a machine designed to transport people (hence the carriage) in the air (hence the bird wings) and equipped with means of propulsion (hence the horse). It seems to me that few creators would have imagined such an ensemble and thus, if the GAI produced this three-dimensional consistent picture, did it not exhibit creativity? The answer seems obvious to me and thus, we must question what our imagination is: could it not be "simply" the result of the constant mixing of the contents of our memory, continuously fed by our senses, making us more predictable than we think? What if these GAIs were relevant models of ourselves?

This GAI, like most others, relies on the concepts of:

All this can help explain how an original image specified by a simple prompt like "a cat" can be obtained. But what about a more subtle prompt like "an airplane in the style of Sandro Botticelli", where one sees a sort of "reinterpretation" of "airplane" as {carriage, bird, horse}? Unfortunately, this is still not enough for me to explain:

And finally, what about the designers of these GAIs? Are they themselves surprised by the wonders obtained? On the famous site 'openai.com/index/dall-e/' one can read:

We did not anticipate that this capability would emerge and made no modifications to the neural network or training procedure to encourage it

What should we conclude from this? Could it be that it works so well without us really knowing why, as is the case with Mathematics and its formidable efficiency?

And ultimately, could this not signify the emergence not of an Artificial Intelligence (AI), but of a New Intelligence (NI)?










  • [01] - About twenty seconds for the given examples.

  • [02] - It generally involves processing several hundred million {picture, description} pairs, requiring the use of high-performance computing and storage servers. For the artificial neural networks in particular, highly parallel NVIDIA processors are used.

  • [03] - A picture in "raster" mode can be defined by three arrays of numerical values (with horizontal and vertical dimensions matching the picture), each corresponding to the luminance of a primary color: Red, Green and Blue.

  • [04] - It is, in a way, a form of semantic compression.

  • [05] - This processing is called Embedding.

  • [06] - The prompt corresponds to the natural language query (English, for example) addressed to the GAI to describe what one wishes to obtain (a picture in this case).

  • [07] - This was seen on several occasions with Sandro Botticelli, certainly because naked bodies had been generated.

  • [08] - It is highly probable that the two sites 'www.bing.com/images/create' and 'designer.microsoft.com/image-creator' correspond to a single GAI, but with different access interfaces.

  • [09] - Jorge Luis Borges is an Argentine man of letters. In 1941, in a fascinating short story, he takes us into the universe of The Library. The narrator, one of its countless servants, reveals what it could be: made up of shelves, corridors, and endless stairs, it would actually contain all possible books printed in a single format: 410 pages, each with 40 lines of 80 characters chosen from 25 possibilities. Although finite (on the order of 10^1834097), the number of works surpasses comprehension, but very few, of course, contain a completely intelligible text in a certain language (and yet, they exist somewhere, but where?). And the only treasure the narrator has ever discovered in his tedious travels is a single readable yet incomprehensible sentence: Ô time your pyramids.

  • [10] - Hans Ruedi Giger is the designer of the monster and sets for the film Alien, directed in 1979 by Ridley Scott.

  • [11] - The style of certain artists of past decades is easy to formalize, as I have shown myself. This is the case with Jean Arp, Jean-Michel Atlan, Robert and Sonia Delaunay, and Victor Vasarely. But until recently, Flemish artists seemed "inaccessible" and "untouchable" to me! This is no longer the case (see for example Hieronymus Bosch and Pieter Bruegel the Elder)...

  • [12] - Nihil est in intellectu nisi prius fuerit in sensu (Nothing exists in the mind that has not previously been felt), Saint Thomas Aquinas.

  • [13] - For example, do we really know how faces are stored in our brain?
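
The order of magnitude quoted in note [09] can be checked with a short computation. This sketch assumes only the format given there (410 pages, 40 lines per page, 80 characters per line, 25 possible symbols); the variable names are mine:

```python
import math

PAGES, LINES, CHARS_PER_LINE, SYMBOLS = 410, 40, 80, 25

# Number of characters in one book of the Library.
chars_per_book = PAGES * LINES * CHARS_PER_LINE

# The number of distinct books is SYMBOLS ** chars_per_book; its decimal
# logarithm gives the power of ten without building the huge integer.
log10_books = chars_per_book * math.log10(SYMBOLS)
exponent = math.floor(log10_books)

print(chars_per_book, exponent)  # 1312000 1834097
```

The number of books is thus 25^1312000, that is, about 10^1834097.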



  • Copyright © Jean-François COLONNA, 2024-2024.
    Copyright © CMAP (Centre de Mathématiques APpliquées) UMR CNRS 7641 / École polytechnique, Institut Polytechnique de Paris, 2024-2024.