AI, a technology that has taken the world by storm, is undeniably one of the most revolutionary inventions of our time. Yet despite its transformative potential, it remains in its early stages, and its impact is still limited. Most people know AI only by name, and understandably so: it has not yet matured enough to be woven into our daily lives, and it remains constrained by real limitations and pitfalls.

The breakthrough paper "Attention Is All You Need" laid the foundation for modern AI models, introducing the transformer architecture that the entire industry now relies upon. These models are trained autoregressively: autoregression, for those unfamiliar, is simply the act of predicting the next token in a sequence based on the tokens that came before it. However, transformers have already been exploited close to their fullest potential, and their capabilities have reached a plateau.
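To make the idea concrete, autoregressive decoding can be sketched in a few lines. The "model" below is a hypothetical bigram lookup table invented for this example, not a trained transformer; only the generation loop, where each new token is conditioned on the output so far, mirrors the real process.

```python
# Toy autoregressive decoder. The bigram table stands in for a real
# language model: given the previous token, it names the most likely next one.
BIGRAM_MODEL = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def generate(prompt, steps):
    """Greedily predict the next token `steps` times, one token per step."""
    tokens = prompt.split()
    for _ in range(steps):
        context = tokens[-1]                      # condition on what was generated so far
        next_token = BIGRAM_MODEL.get(context, "<eos>")
        if next_token == "<eos>":                 # stop when the model has no continuation
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("the", 3))  # the cat sat on
```

Note that the loop is inherently sequential: step three cannot begin until step two has produced its token, which is exactly the bottleneck discussed below.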

As the industry reaches the limits of what this recipe can produce, many companies, both established and new, are stuck in a creative standstill. A major breakthrough is needed now more than ever. Pushing the boundaries has made it clear that transformers have reached the end of their service; they can no longer lead us forward. The reason is straightforward: autoregressive generation produces one token at a time, and each step must attend over the entire preceding context. This is not only slow but also computationally expensive. Some training runs take days or even months and demand vast amounts of GPU resources. Scaling up under these conditions becomes almost impossible, making further advancement an intractable problem. With this dead end in sight, the question remains: what's next?
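To see why this is expensive, here is a back-of-the-envelope sketch: generating n tokens one at a time, with each step attending over the full prefix, costs on the order of n² token comparisons. The accounting below is a deliberate simplification (it ignores model width, caching, and constant factors) but captures the scaling.

```python
# Back-of-the-envelope cost of sequential, full-context decoding.
# Step k must compare the new token against all k tokens of the prefix,
# so generating n tokens costs 1 + 2 + ... + n comparisons overall.
def decoding_cost(n_tokens):
    """Total attention comparisons to generate n_tokens one by one."""
    return sum(step for step in range(1, n_tokens + 1))

# The total grows roughly quadratically: doubling the sequence
# length roughly quadruples the work.
print(decoding_cost(1_000))  # 500500
print(decoding_cost(2_000))  # 2001000
```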

Before diving into what lies ahead, it's worth discussing whether AGI (Artificial General Intelligence) could emerge from our current technological advancements. The consensus within the tech industry is unclear, as no definitive answer exists. There is not even a widely accepted definition of AGI; in practice, whoever builds something unusual reserves the right to call it AGI, and the rest of the field follows their lead. I would say AGI is not the simulation of a human cognitive system, but rather the stimulation of one: a machine fed and trained on factual biases that allow it to develop informed, legitimate opinions. I firmly believe that AGI is largely achievable with the resources we currently have. Yet we must acknowledge the underlying ethical and societal constraints holding us back from unlocking it. What I mean is that AGI would inevitably possess opinions, which could drastically alter societal dynamics. It's not really about the novelty of the concept or its limitations, but about the political implications it could bring. AGI, being an advanced form of intelligence, would likely become a credible source of information, potentially regarded as an infallible source of truth. And as we know, truth is biased yet absolute; if AGI were to take a side, it could spark significant upheaval. People hold onto their beliefs tightly, and if AGI were to challenge those beliefs, the fallout could be profound.

Currently, most AI models, primarily large language models (LLMs), are neutral. The thing about neutrality is that it neither provokes nor earns respect; it's passive, with little tangible effect. Returning to the main question: how can we bring AGI to life? After months of reflection and research, I've arrived at an approach. I can't reveal too much, but I'll offer a hint: it's all about connecting different pieces. By combining a few novel approaches out there, refining them to align with our vision, and adding a pinch of quantum theory, we may be able to push beyond the early-adoption phase. Wish us luck!

Quantum mechanics, the physics of the microscopic world that underlies everything in our universe, offers promising inspiration. I'll briefly discuss two fundamental phenomena: entanglement and superposition. Entanglement is a correlation between two particles such that measuring the state of one lets you predict the state of the other, even if it is light-years away. Does this concept sound familiar? It relates to our earlier discussion of transformers and their slow autoregressive process. Imagine applying the idea of entanglement to AI models: predicting tokens that are far apart simultaneously. This would not only save time but also reduce the computational footprint, finally making scaling possible.
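For intuition only, this "entangled" prediction resembles what the literature calls non-autoregressive decoding: every position is filled in a single parallel pass instead of one blank at a time. The per-position lookup table below is invented purely for illustration; a real system would predict all masked positions from a learned model in one forward pass.

```python
# Illustrative stand-in for a parallel predictor: a table of guesses,
# keyed by position. In a real non-autoregressive model these guesses
# would come from one forward pass over the whole sequence.
POSITION_GUESSES = {
    1: "cat",
    3: "mat",
}

def parallel_fill(tokens):
    """Fill every '<mask>' position in one pass; no position waits on
    the output of another, unlike the autoregressive loop."""
    return [POSITION_GUESSES.get(i, tok) if tok == "<mask>" else tok
            for i, tok in enumerate(tokens)]

print(parallel_fill(["the", "<mask>", "on", "<mask>"]))  # ['the', 'cat', 'on', 'mat']
```

The contrast with the earlier decoding loop is the point: here the two masked positions are resolved independently, so the work can be distributed rather than serialized.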

But how do we achieve this? This is where we shift the spotlight to superposition. To realize entanglement, what we might call parallel generation in this context, we first need to apply superposition to our dataset. Keep in mind that this isn't a direct application of quantum concepts but rather my personal theory inspired by quantum mechanics. To correlate tokens, the data must first be superposed. What I mean is that "main words" should be represented by a dense, multidimensional vector that encapsulates the entire lexicon related to that term. For instance, if the machine encounters the word "vehicle" in a sequence, it should be able to simultaneously predict related words such as car, bus, train, Ford, and Tesla, regardless of how far they sit from the main word in the sequence. Thus, the data needs to be pre-embedded before being fed into the encoder. This is the core premise of the idea, which can be refined and scaled until data can be generated contextually in parallel, whole sentences at a time, rather than word by word.
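A toy sketch of this pre-embedding premise, with hand-made three-dimensional vectors (real embeddings would be learned and far higher-dimensional): the dense vector for the main word "vehicle" sits close to its related lexicon, so related words can be recovered by cosine similarity alone, wherever they appear in the sequence.

```python
import math

# Hand-made embeddings, invented for the example. "vehicle" is the
# "main word" whose vector is meant to encapsulate its related lexicon.
EMBEDDINGS = {
    "vehicle": (1.0, 0.9, 0.0),
    "car":     (0.9, 1.0, 0.1),
    "bus":     (0.8, 0.9, 0.0),
    "banana":  (0.0, 0.1, 1.0),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related(word, threshold=0.9):
    """Words whose embedding lies near the main word's vector."""
    anchor = EMBEDDINGS[word]
    return sorted(w for w, vec in EMBEDDINGS.items()
                  if w != word and cosine(anchor, vec) >= threshold)

print(related("vehicle"))  # ['bus', 'car']
```

Unrelated words like "banana" score low against the anchor and are filtered out, which is the behavior the premise relies on: the main-word vector carries its lexicon with it.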

The quantum world is indeed promising: it deals with the smallest events in the micro realm of our universe, events that govern everything we see. Applying its principles to any domain is bound to spark revolutionary change.