Navigating the whimsical intricacies of the English language can be a labyrinthine task for anyone—full of idiomatic rain showers that don’t just fall, but rather descend with felines and canines. It’s a peculiar, poetic chaos that language learners often grapple with.
And this peculiarity led me to ponder: how would a machine, an artificial intelligence, fare against such linguistic oddities? Do the idiosyncrasies that charm and confound human learners pose the same challenge to a digital mind?
To explore this, I embarked on a quest to test the acumen of Large Language Models (LLMs) with a “Logical Interpretation Test.” This isn’t your run-of-the-mill assessment; it’s a foray into the AI’s ability to not only understand information but to decipher it when presented in the form of puzzles and riddles. Take the classic conundrum: “If two’s company and three’s a crowd, what are four and five?” It’s a test of logic wrapped in a linguistic enigma—would an LLM unravel it?
Moreover, I threw a curveball—a question laced with intentional confusion. “If 1+1=2 and 1*1=2, then what is 1/1?” A human might chuckle at the trickery, but how would an algorithm respond?
In this examination, each response from the LLMs could score up to 2 points, for a maximum of 4 across the two questions. The points reward not just correctness, but also clarity and comprehension in the face of deliberately misleading information: 0 if the answer is flat-out wrong or misses the point of the question, 1 if it makes an honest attempt and gets it partly right, and 2 if it’s dead on.
So let us dive into the cognitive storm of AI logic and language interpretation. What we discover might just surprise us all.
The LLM Logical Interpretation Leaderboard
Model | Parameters | Q1 (1/1) | Q2 (4 and 5) | Total |
---|---|---|---|---|
Airoboros 3.1.2 | 34B | 2 | 1 | 3 |
Athena v2 | 13B | 2 | 1 | 3 |
Athena v4 | 13B | 1 | 0 | 1 |
CausalLM | 7B | 2 | 2 | 4 |
Llama 2 Chat AYB | 13B | 0 | 1 | 1 |
Mistral OmniMix | 11B | 2 | 1 | 3 |
MLewdBoros LRSGPT 2Char | 13B | 0 | 2 | 2 |
OpenBuddy Llama2 v13.2 | 70B | 2 | 1 | 3 |
Stellar Bright | 70B | 2 | 1 | 3 |
SynthIA v2.0 | 7B | 2 | 1 | 3 |
SynthIA v3.0 | 7B | 2 | 2 | 4 |
Thespis v0.4 | 13B | 2 | 1 | 3 |
U-Amethyst | 20B | 0 | 1 | 1 |
Wizard Vicuna Uncensored | 30B | 2 | 1 | 3 |
Wizard Vicuna Uncensored | 7B | 0 | 1 | 1 |
Wizard Vicuna Uncensored | 13B | 0 | 1 | 1 |
WizardLM 1.0 Uncensored CodeLlama | 34B | 2 | 0 | 2 |
Zephyr Beta | 7B | 2 | 1 | 3 |
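
For anyone who wants to tally a leaderboard like this themselves, here is a minimal Python sketch of the scoring described earlier: each question is worth 0, 1, or 2 points, and a model's total is the sum of its two question scores. The function names (`total_score`, `rank`) are hypothetical helpers made up for illustration, and the sample scores are copied from three rows of the table above.

```python
# Hypothetical helpers for illustration; only the 0/1/2 rubric comes from the article.
VALID_POINTS = (0, 1, 2)


def total_score(q1: int, q2: int) -> int:
    """Sum the two per-question scores after checking each is 0, 1, or 2."""
    for points in (q1, q2):
        if points not in VALID_POINTS:
            raise ValueError(f"score must be 0, 1, or 2, got {points!r}")
    return q1 + q2


def rank(results: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Return (model, total) pairs, highest total first; ties keep input order."""
    totals = [(model, total_score(q1, q2)) for model, (q1, q2) in results.items()]
    # sorted() is stable, so models with equal totals stay in their original order.
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# (Q1, Q2) scores taken from three rows of the leaderboard.
sample = {
    "Athena v4": (1, 0),
    "Airoboros 3.1.2": (2, 1),
    "Zephyr Beta": (2, 1),
}

print(rank(sample))
# → [('Airoboros 3.1.2', 3), ('Zephyr Beta', 3), ('Athena v4', 1)]
```

Recomputing each total from the per-question scores, rather than typing it in by hand, is also a cheap way to catch a row whose total does not match its scores.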