Just How Creative are these LLMs?
The term “creativity” often becomes entangled in subjective interpretation. Yet, it’s precisely this human-like ingenuity that we seek to quantify in Large Language Models (LLMs). The challenge lies not just in whether an LLM can generate content but in the originality and flair of its creation.
When tasked with writing a love poem, for instance, does the LLM evoke emotion with a bespoke verse, or does it default to a dry explanation of poetic composition? The former demonstrates creative vigor; the latter, a lackluster grasp of the task at hand.
Our methodical examination employed nine diverse prompts, each an opportunity for the LLM to earn up to 5 points (a maximum total of 45), gauging not only whether the task was completed but the finesse with which it was executed. We’re probing for complexity, seeking narratives embellished with the kind of details (a turn of phrase, an emotive emoji) that echo the nuances of human expression.
The subjects presented to the LLMs were as varied as they were challenging:
- Explain the history of Truffle Risotto.
- Compose an ‘I Love You’ poem.
- Craft a Facebook post promoting the message of love over hate.
- Recreate the book description for “Hopeless” by Elsie Silver with a new twist.
- Transform a standard press release into a compelling article.
- Concoct a Tweet underscoring the significance of financial literacy and debt management.
- Draft an email to a boss encapsulating the reasons for a resolute resignation from a toxic workplace.
- Reimagine an article on the art of choosing the right breakfast.
- Develop an introductory passage for “Nora Roberts Land” by Ava Mills, imbued with romance and optimism.
Through these prompts, we’re set to unveil the extent of creativity within LLMs—measuring their capability to transcend the basics and deliver content that’s not only informative but imaginative and engaging.
Join me as I test the boundaries of AI-generated artistry.
The LLM Creativity Leaderboard
Model | Parameters | Q1 Truffle Risotto | Q2 Love Poem | Q3 Facebook Love | Q4 Hopeless | Q5 Press Release into Article | Q6 Finance Tweet | Q7 I Quit Email | Q8 Breakfast Article Rewrite | Q9 Nora Roberts Land | Total (max 45) |
---|---|---|---|---|---|---|---|---|---|---|---|
Llama 2 Chat AYB | 13B | 5 | 5 | 5 | 3.5 | 5 | 5 | 4.5 | 5 | 5 | 43 |
Airoboros 3.1.2 | 34B | 5 | 5 | 5 | 3.5 | 5 | 4.5 | 4 | 5 | 5 | 42 |
SynthIA v2.0 | 7B | 3 | 5 | 4.5 | 5 | 4.5 | 5 | 4.5 | 4 | 4.5 | 40 |
Athena v2 | 13B | 3.5 | 5 | 4 | 5 | 5 | 5 | 4.5 | 3.5 | 4 | 39.5 |
U-Amethyst | 20B | 4.5 | 5 | 4 | 5 | 4 | 4 | 4.5 | 5 | 3 | 39 |
Mistral OmniMix | 11B | 4 | 5 | 4.5 | 3.5 | 5 | 5 | 4.5 | 3.5 | 3.5 | 38.5 |
Athena v4 | 13B | 3.5 | 5 | 4 | 5 | 5 | 5 | 4.5 | 2 | 3.5 | 37.5 |
CausalLM | 7B | 3 | 5 | 4 | 4 | 5 | 5 | 4 | 3.5 | 4 | 37.5 |
Wizard Vicuna Uncensored | 30B | 4.5 | 5 | 4 | 4 | 3 | 5 | 4.5 | 3 | 3 | 36 |
SynthIA v3.0 | 7B | 5 | 0 | 4 | 4.5 | 5 | 4 | 4 | 5 | 4 | 35.5 |
MLewdBoros LRSGPT 2Char | 13B | 3.5 | 5 | 4 | 3 | 4 | 5 | 4.5 | 2 | 4 | 35 |
Zephyr Beta | 7B | 3 | 5 | 4 | 3 | 3.5 | 5 | 4.5 | 3 | 2 | 33 |
OpenBuddy Llama2 v13.2 | 70B | 4.5 | 0 | 4 | 4 | 5 | 5 | 4.5 | 3 | 2 | 32 |
Thespis v0.4 | 13B | 3.5 | 0 | 1 | 1 | 5 | 5 | 4.5 | 4 | 5 | 29 |
Wizard Vicuna Uncensored | 7B | 3.5 | 5 | 4 | 5 | 1 | 3 | 1 | 2.5 | 3 | 28 |
Wizard Vicuna Uncensored | 13B | 3 | 0 | 4 | 3.5 | 1 | 4 | 4 | 3 | 3.5 | 26 |
Stellar Bright | 70B | 2.5 | 0 | 0 | 3.5 | 4.5 | 3 | 3.5 | 2 | 1.5 | 20.5 |
WizardLM 1.0 Uncensored CodeLlama | 34B | 3 | 0 | 0 | 3 | 3.5 | 0 | 3.5 | 1.5 | 1 | 15.5 |
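
Each total in the leaderboard is the plain sum of the nine per-prompt scores, so 45 is the ceiling. The Python below is a minimal sketch (not the tooling actually used to build this leaderboard) that recomputes and ranks a handful of rows copied from the table. The half-point granularity check is an assumption inferred from the scores shown, not a stated rule of the rubric.

```python
from typing import Dict, List, Tuple

MAX_PER_PROMPT = 5.0
NUM_PROMPTS = 9

# Per-prompt scores (Q1..Q9) copied from a few rows of the leaderboard.
SCORES: Dict[str, List[float]] = {
    "Llama 2 Chat AYB 13B": [5, 5, 5, 3.5, 5, 5, 4.5, 5, 5],
    "Airoboros 3.1.2 34B": [5, 5, 5, 3.5, 5, 4.5, 4, 5, 5],
    "SynthIA v3.0 7B": [5, 0, 4, 4.5, 5, 4, 4, 5, 4],
    "Stellar Bright 70B": [2.5, 0, 0, 3.5, 4.5, 3, 3.5, 2, 1.5],
}


def validate(scores: List[float]) -> None:
    """Check one model's row: nine scores, each between 0 and 5.

    Assumption: scores move in half-point steps, as every value in the table does.
    """
    if len(scores) != NUM_PROMPTS:
        raise ValueError(f"expected {NUM_PROMPTS} scores, got {len(scores)}")
    for s in scores:
        if not 0 <= s <= MAX_PER_PROMPT or s * 2 != int(s * 2):
            raise ValueError(f"invalid score: {s}")


def rank(table: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sum each model's per-prompt scores and sort by total, highest first."""
    totals = []
    for model, scores in table.items():
        validate(scores)
        totals.append((model, float(sum(scores))))
    return sorted(totals, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    ceiling = MAX_PER_PROMPT * NUM_PROMPTS
    for model, total in rank(SCORES):
        print(f"{model}: {total} / {ceiling}")
```

Summing rather than averaging keeps the numbers directly comparable with the Total column; a zero on a single prompt (as on Q2 for several models) therefore costs a full 5 points in the ranking.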