Leaderboard – Logical Interpretation

Logic is not a body of doctrine, but a mirror-image of the world. Logic is transcendental. Insanity is often the logic of an accurate mind overtasked.

Oliver Wendell Holmes, Sr.

Navigating the whimsical intricacies of the English language can be a labyrinthine task for anyone—full of idiomatic rain showers that don’t just fall, but rather descend with felines and canines. It’s a peculiar, poetic chaos that language learners often grapple with.

And this peculiarity led me to ponder: how would a machine, an artificial intelligence, fare against such linguistic oddities? Do the idiosyncrasies that charm and confound human learners pose the same challenge to a digital mind?

To explore this, I embarked on a quest to test the acumen of Large Language Models (LLMs) with a “Logical Interpretation Test.” This isn’t your run-of-the-mill assessment; it’s a foray into the AI’s ability to not only understand information but to decipher it when presented in the form of puzzles and riddles. Take the classic conundrum: “If two’s company and three’s a crowd, what are four and five?” It’s a test of logic wrapped in a linguistic enigma—would an LLM unravel it?

Moreover, I threw a curveball—a question laced with intentional confusion. “If 1+1=2 and 1*1=2, then what is 1/1?” A human might chuckle at the trickery, but how would an algorithm respond?

In this examination, each response could score up to 2 points per question, rewarding not just correctness but also clarity and comprehension in the face of deliberately misleading information. A response earns 0 points if it’s flat-out wrong or misunderstands the assignment, 1 point if it makes a real attempt and gets it partly right, and 2 points if it’s dead on. With two questions, the maximum total is 4.
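For anyone who wants to keep a similar tally, the rubric can be sketched in a few lines of Python. This is only an illustration of the 0/1/2 scheme described above: the names `Score` and `total_score` are hypothetical and are not part of the original test.

```python
from enum import IntEnum


class Score(IntEnum):
    """Per-question rubric. The member names are illustrative assumptions."""
    WRONG = 0    # flat-out wrong, or the model misread the assignment
    PARTIAL = 1  # a real attempt that is partly right
    CORRECT = 2  # dead on


def total_score(per_question):
    """Sum per-question scores, rejecting any value outside the 0-2 rubric."""
    # Score(s) raises ValueError for anything other than 0, 1, or 2.
    return sum(int(Score(s)) for s in per_question)


# Hypothetical model: dead on for Q1, partly right on Q2.
print(total_score([2, 1]))  # prints 3
```

Validating each score against the enum means a typo such as a 3 fails loudly instead of quietly inflating a total.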

So let us dive into the cognitive storm of AI logic and language interpretation. What we discover might just surprise us all.

The LLM Logical Interpretation Leaderboard

Model	Parameters	Q1 (1/1)	Q2 (4 and 5)	Total
Airoboros 3.1.2	34B	2	1	3
Athena v2	13B	2	1	3
Athena v4	13B	1	0	1
CausalLM	7B	2	2	4
Llama 2 Chat AYB	13B	0	1	1
Mistral OmniMix	11B	2	1	3
MLewdBoros LRSGPT 2Char	13B	0	2	2
OpenBuddy Llama2 v13.2	70B	2	1	3
Stellar Bright	70B	2	1	3
SynthIA v2.0	7B	2	1	3
SynthIA v3.0	7B	2	2	4
Thespis v0.4	13B	2	1	3
U-Amethyst	20B	0	1	1
Wizard Vicuna Uncensored	30B	2	1	3
Wizard Vicuna Uncensored	7B	0	1	1
Wizard Vicuna Uncensored	13B	0	1	1
WizardLM 1.0 Uncensored CodeLlama	34B	2	0	2
Zephyr Beta	7B	2	1	3


Useful Links