ChatGPT is very smart. In OpenAI's [GPT-4 technical report](https://cdn.openai.com/papers/gpt-4.pdf), dated March 2023, GPT-4 scored 163 on the LSAT, which, to my understanding, places it in the 88th percentile. 88th percentile! Pick a random LSAT taker: 88 times out of 100, GPT-4 will outscore them. And, of course, GPT-4 is _so last year_.

But being smart isn't enough. In _The Hitchhiker's Guide to the Galaxy_, the supercomputer Deep Thought is asked for the answer to the ultimate question of life, the universe, and everything. It took Deep Thought 7.5 million years to come up with the answer: 42. If only the pan-dimensional beings had asked an LLM instead!

Compared with Deep Thought, LLMs are very fast. When chatting with ChatGPT, answers sometimes feel almost immediate. They're clearly much faster than human answers. But how much faster?

The fastest human talker was [Steve Woodmore](https://en.wikipedia.org/wiki/Steve_Woodmore), a British salesman able to articulate 637 words per minute, about 4 times faster than the average human. [Here's a video](https://www.google.com/search?q=Steve+Woodmore#fpstate=ive&vld=cid:86ba2137,vid:YUEho571Lts,st:0). Anecdotally, typing (212 wpm) and handwriting (68 wpm) are nowhere near as fast as talking. A kilobyte of plain text holds roughly 180 words, so Steve was producing about 3.5 kilobytes of text per minute. The average OpenAI token contains 0.77 words, so Steve could generate about 827 tokens per minute (tpm).

At [Yodel](https://www.hiyodel.com), we generate approximately 10 billion tokens per year (all on gpt-3.5-turbo or gpt-4o-mini). That's about 19,025 tokens per minute, on average. And that's not even the technical limit. Our peak throughput hit a mind-boggling 147,999 tokens per minute (very close to OpenAI's rate limit for gpt-4o-mini). To put this into perspective, ChatGPT generated:

* 113,959 words in a single minute
* 179x faster than Steve could talk
* for about 9 cents

These numbers are aggregated over several concurrent API requests. gpt-4o-mini is slower when tested on a per-request basis: in a recent test, it took 18 seconds to generate 1,000 completion tokens, a rate of about 3,300 tpm. Other LLMs are much faster: Groq claims to generate 240 tokens per second, or just under 15,000 tpm, in a single thread in production! (The arithmetic, and sketches of how you might measure this yourself, are collected in the scripts at the end of this post.)

Isn't that wild? These models are so fast they can produce a book the length of _Harry Potter and the Sorcerer's Stone_ in about 7 minutes, or 1.5 minutes when using multiple threads. While we can still hold our own in raw intelligence, when it comes to speed, humans are completely left in the dust - if we were ever in the race.

*Note to self: another element of this that is worth considering is input time. It takes models considerably longer to process input tokens. Humans are also slower at that, however.*
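For the curious, here's the back-of-the-envelope arithmetic from this post as a runnable script. The 0.77 words-per-token conversion and all of the rates come straight from the numbers above; the ~77,000-word length of _Harry Potter and the Sorcerer's Stone_ is the commonly cited word count.

```python
# Back-of-the-envelope numbers from this post.
# Assumption (from the post): the average OpenAI token contains ~0.77 words.
WORDS_PER_TOKEN = 0.77

steve_wpm = 637                          # Steve Woodmore's speaking rate, words/min
steve_tpm = steve_wpm / WORDS_PER_TOKEN  # words/min divided by words/token = tokens/min
print(f"Steve: {steve_tpm:.0f} tokens/min")  # ~827

yodel_tokens_per_year = 10_000_000_000
minutes_per_year = 365 * 24 * 60
print(f"Yodel average: {yodel_tokens_per_year // minutes_per_year:,} tokens/min")  # ~19,025

peak_tpm = 147_999  # our fastest observed minute, aggregated across requests
print(f"Peak vs. Steve: {peak_tpm / steve_tpm:.0f}x")  # ~179x
print(f"Peak in words: {peak_tpm * WORDS_PER_TOKEN:,.0f} words/min")  # ~113,959

# Groq's claimed single-thread rate vs. the length of
# Harry Potter and the Sorcerer's Stone (~77,000 words).
groq_tps = 240
book_tokens = 77_000 / WORDS_PER_TOKEN  # ~100,000 tokens
print(f"Book in {book_tokens / (groq_tps * 60):.1f} minutes")  # ~6.9 minutes
```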
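The per-request test (1,000 completion tokens in 18 seconds) was essentially a stopwatch around one API call. Here's a minimal sketch using the official `openai` Python package; the model name and prompt are illustrative placeholders, and `max_tokens` caps the completion length.

```python
import time

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a long story about a fast talker."}],
    max_tokens=1000,
)
elapsed = time.perf_counter() - start

# The API reports exactly how many completion tokens it produced.
tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed * 60:,.0f} tokens/min")
```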
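And because the headline 147,999 tpm figure was aggregated across parallel requests, the same stopwatch can wrap a thread pool to measure aggregate throughput. A hedged sketch, assuming you stay under your account's rate limits; the request count here is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()

def one_request(_: int) -> int:
    """Fire a single completion and return how many tokens it produced."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a long story."}],
        max_tokens=1000,
    )
    return resp.usage.completion_tokens

N_REQUESTS = 16  # keep this well under your tier's rate limits

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N_REQUESTS) as pool:
    total_tokens = sum(pool.map(one_request, range(N_REQUESTS)))
elapsed = time.perf_counter() - start

# Aggregate throughput: total tokens across all threads over wall-clock time.
print(f"{total_tokens} tokens in {elapsed:.1f}s "
      f"-> {total_tokens / elapsed * 60:,.0f} tokens/min aggregate")
```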