Logic is not a body of doctrine, but a mirror-image of the world. Logic is transcendental.
Ludwig Wittgenstein

Insanity is often the logic of an accurate mind overtasked.
Oliver Wendell Holmes, Sr.

Navigating the whimsical intricacies of the English language can be a labyrinthine task for anyone—full of idiomatic rain showers that don’t just fall, but rather descend with felines and canines. It’s a peculiar, poetic chaos that language learners often grapple with.

And this peculiarity led me to ponder: how would a machine, an artificial intelligence, fare against such linguistic oddities? Do the idiosyncrasies that charm and confound human learners pose the same challenge to a digital mind?

To explore this, I set out to test the acumen of Large Language Models (LLMs) with a “Logical Interpretation Test.” This isn’t your run-of-the-mill assessment; it probes the AI’s ability not only to understand information but also to decipher it when it is presented as a puzzle or riddle. Take the classic conundrum: “If two’s company and three’s a crowd, what are four and five?” It’s a test of logic wrapped in a linguistic enigma. Would an LLM unravel it?

Moreover, I threw a curveball—a question laced with intentional confusion. “If 1+1=2 and 1*1=2, then what is 1/1?” A human might chuckle at the trickery, but how would an algorithm respond?

In this examination, each response could earn up to 2 points, for a maximum of 4 across the two questions. The points reflect not just correctness but also clarity and comprehension in the face of deliberately misleading information: 0 if the answer is flat-out wrong or the model doesn’t understand the assignment, 1 if it at least tries and gets it partly right, and 2 if it’s dead on.
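For anyone who wants to tally results the same way, here is a minimal Python sketch of the rubric described above. The names (`Score`, `QUESTIONS`, `total_score`) are my own illustrative choices, not part of any tool used in the test; the grading itself was done by hand.

```python
from enum import IntEnum


class Score(IntEnum):
    """Hypothetical labels for the 0/1/2 rubric."""
    WRONG = 0      # flat-out wrong, or the model missed the assignment
    PARTIAL = 1    # tried and got it partly right
    CORRECT = 2    # dead on


# The two questions from the test, labelled as in the leaderboard.
QUESTIONS = ("Q1 (1/1)", "Q2 (4 and 5)")

# Highest possible total: 2 points for each of the 2 questions.
MAX_TOTAL = len(QUESTIONS) * Score.CORRECT


def total_score(scores):
    """Sum per-question scores, rejecting anything outside the rubric."""
    if len(scores) != len(QUESTIONS):
        raise ValueError("expected one score per question")
    # Score(s) raises ValueError for values other than 0, 1, or 2.
    return sum(Score(s) for s in scores)


# Example: a model scoring 2 on Q1 and 1 on Q2 totals 3 out of 4.
print(total_score([2, 1]), "out of", MAX_TOTAL)
```

Validating each score against the enum keeps a typo such as a 3 from inflating a model's total without anyone noticing.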

So let us dive into the cognitive storm of AI logic and language interpretation. What we discover might just surprise us all.

The LLM Logical Interpretation Leaderboard

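Model                              | Parameters | Q1 (1/1) | Q2 (4 and 5) | Total
Airoboros 3.1.2                    | 34B        | 2        | 1            | 3
Athena v2                          | 13B        | 2        | 1            | 3
Athena v4                          | 13B        | 1        | 0            | 1
Casual LM                          | 7B         | 2        | 2            | 4
Llama 2 Chat AYB                   | 13B        | 0        | 1            | 1
Minstral OmniMix                   | 11B        | 2        | 1            | 3
MLewdBoros LRSGPT 2Char            | 13B        | 0        | 2            | 2
OpenBuddy Llama2 v13.2             | 70B        | 2        | 1            | 3
Stellar Bright                     | 70B        | 2        | 1            | 3
SynthIA v2.0                       | 7B         | 2        | 1            | 3
SynthIA v3.0                       | 7B         | 2        | 2            | 3
Thespis v0.4                       | 13B        | 2        | 1            | 3
U-Amethyst                         | 20B        | 0        | 1            | 1
Wizard Vicuna Uncensored           | 30B        | 2        | 1            | 3
Wizard Vicuna Uncensored           | 7B         | 0        | 1            | 1
Wizard Vicuna Uncensored           | 13B        | 0        | 1            | 1
WizardLM 1.0 Uncensored CodeLlama  | 34B        | 2        | 0            | 2
Zephyr Beta                        | 7B         | 2        | 1            | 3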