There is a lot of noise in the present state of AI discourse. Some of it comes from legitimate technical excitement about what current AI systems can do, be it coding, image and video generation, or anything else. Some of it comes from venture capital, public markets, product marketing, and the prestige incentives of major AI labs. We have seen AI labs make stupid and exaggerated claims, and the leaders of companies like Nvidia and OpenAI push unsubstantiated metrics about how much you should use AI, or how you should use it. Then there is the war in West Asia, the latest in a series of global conflagrations since the early 2020s, which contributes various risks not just to society but specifically to how AI is built, deployed, and used. The result is that discussion of AI often swings between breathless utopianism or lurid dystopia on the one hand, and shallow dismissal on the other. Neither is helpful to the general audience of practitioners, users, and ordinary people who want to understand what this is all about, use AI as a tool to get things done, and ultimately thrive as a result. The current state of AI is more interesting than either extreme, and it can still be shaped by policy, people, money, and resources. It is powerful, but narrow in important ways. It is transformative, but dependent on an enormous industrial and computational base. It is impressive, but also brittle.

Paradigms and Abstractions

One reality is that large language models are still fundamentally predictive systems. They generate outputs by modeling patterns in data and selecting likely continuations under a learned distribution. Calling them “reasoning models” or “multimodal systems” does not erase this underlying fact. They do not possess meaning in the human sense. They do not understand in the way conscious beings understand. They do not have grounded intentionality, lived experience, or an intrinsic sense of what their symbols refer to. They are extraordinarily capable statistical systems operating over representations. This matters, because much of the confusion in public discourse comes from anthropomorphic language. If we describe these systems as though they think, know, want, or understand in the same way human beings do, we smuggle in assumptions that the underlying machinery does not justify.
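To make the "predictive system" point concrete, here is a toy sketch, in pure Python, of the operation at the heart of generation: sampling a next token from a softmax distribution over scores. Everything here is illustrative; real models do this over vocabularies of tens of thousands of tokens with learned logits, and the numbers below are made up.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token index by sampling from a softmax distribution.

    `logits` is a list of raw scores, one per vocabulary item. This is a
    toy illustration of 'selecting likely continuations under a learned
    distribution'; it says nothing about meaning or understanding.
    """
    rng = rng or random.Random(0)
    # Softmax with temperature: lower temperature sharpens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample an index proportional to its probability.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return i
    return len(probs) - 1

# With near-zero temperature, the highest-logit token dominates.
print(sample_next_token([1.0, 5.0, 2.0], temperature=0.01))  # -> 1
```

Nothing in this loop knows what the tokens refer to, which is exactly the point: the machinery is distributional, however capable the learned distribution is.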

A second reality is that AI, in its current paradigm, is software sitting atop a very hard physical substrate. It is not magic. It is not abstract intelligence floating free of material constraints. It depends on compute, memory, storage, networking, electrical power, cooling, and specialized chips. It depends on the design and manufacture of CPUs, GPUs, TPUs, high-bandwidth memory, interconnects, servers, data centers, and power infrastructure. It depends on firmware, drivers, kernels, compilers, distributed systems, model serving frameworks, observability stacks, and orchestration software. Behind every “intelligent” model is a dense pyramid of engineering and industrial effort.

Another important reality is that model training alone does not solve business problems. Pre-training, post-training, instruction-tuning, RLHF, constitutional methods, and alignment layers may produce models that are more usable, safer, or more polished, but none of this automatically yields a system that can do useful work in a real organizational setting. To solve actual problems, models need harnesses around them. They need retrieval, memory, task decomposition, orchestration, tools, APIs, permissions, guardrails, evaluators, routing logic, fallback logic, caching, monitoring, and user interface design. Much of what is sold as “AI” in practice is not the model alone but the surrounding software system that turns raw capability into usable work.
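A hedged sketch of what such a harness looks like in miniature. The retriever, guardrail, and router below are toy stubs; `retrieve`, `call_model`, and `answer` are hypothetical names, not any real API. The point is that the useful behavior lives in the surrounding logic, not in the model call alone.

```python
# A minimal harness: retrieval, a guardrail, and fallback routing around a
# model call. The model and retriever are stand-in stubs for illustration.

def retrieve(query, corpus):
    """Toy retrieval: return corpus entries sharing a word with the query."""
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def call_model(prompt, model="small"):
    """Stub model call. In practice this would hit an inference endpoint."""
    if model == "small" and len(prompt) > 200:
        return None  # simulate the small model failing on long inputs
    return f"[{model}] answer grounded in: {prompt[:40]}..."

def answer(query, corpus):
    # 1. Retrieval: ground the prompt in relevant context.
    context = retrieve(query, corpus)
    # 2. Guardrail: refuse queries outside the task boundary.
    if not context:
        return "No supporting context found; escalating to a human."
    prompt = f"Context: {' | '.join(context)}\nQuestion: {query}"
    # 3. Routing with fallback: try the cheap model first, then a larger one.
    result = call_model(prompt, model="small")
    if result is None:
        result = call_model(prompt, model="large")
    return result

corpus = ["invoices are due in 30 days", "refunds take 5 business days"]
print(answer("when are invoices due", corpus))
```

Every branch above, retrieval, refusal, fallback, is ordinary software; swapping the model changes less than swapping the harness.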

The AI Supply Chain

That leads to another reality: AI has a long and fragile supply chain. The AI stack begins far below the application layer. It starts with mining, materials, refining, semiconductor equipment, wafer fabrication, packaging, energy systems, logistics, and geopolitical stability. The concentration of advanced chip manufacturing in Taiwan and the dependencies on TSMC create critical vulnerabilities in this supply chain. Above that sit chip design firms, foundries, cloud providers, networking vendors, model labs, open-source communities, data pipelines, storage systems, and inference platforms. Then come application developers, integrators, and domain teams building systems that ordinary users actually touch. This means that AI is not only a software story or a model story. It is also a manufacturing story, an energy story, a capital expenditure story, and a geopolitical story.

This is why the present AI landscape is better understood as a systems landscape rather than purely a model landscape. The raw model matters, but the surrounding architecture matters just as much. The data matters, and deployment matters too. The supply chain of power, GPUs, compute, ASICs, and other pieces all ends up mattering in due course if you're building a product that touches customers. A mediocre deployment of a frontier model can be less useful than a well-designed system using a smaller model, better prompting, good retrieval, narrow task boundaries, and strong operational logic. In many domains, intelligence in practice is emerging less from one giant leap in the model than from careful engineering of the whole pipeline around it. Policy around the use of AI matters, and region-specific policies are common these days. And without a doubt, as of April 2026, an errant missile in the Middle East can put paid to your entire company if you don't have cross-region availability for your AI product.
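As a small illustration of the operational side of this: cross-region failover is plain software, not model capability. The region names and health check below are hypothetical.

```python
# A sketch of cross-region failover for an inference endpoint. The region
# names and the health check are made up; the point is that availability
# logic lives in ordinary software, outside the model.

def pick_region(regions, is_healthy):
    """Return the first healthy region in preference order, else None."""
    for region in regions:
        if is_healthy(region):
            return region
    return None

preferred = ["me-central-1", "eu-west-1", "us-east-1"]
down = {"me-central-1"}  # simulate a regional outage
region = pick_region(preferred, lambda r: r not in down)
print(region)  # -> eu-west-1
```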

Compute Constraints in AI

A further reality is that current AI systems remain heavily compute-constrained. This is true not only because training frontier models requires a huge amount of compute, following scaling laws established by Kaplan et al. and refined by Hoffmann et al.'s Chinchilla research, in the form of large Nvidia server farms, but because serving them at scale - inference engineering - is computationally and financially expensive too. Inference is not free, and latency and concurrency matter for a good user experience. Concurrency, longer context windows, tool use, reasoning capabilities: all of these need extensive compute to innovate on. Even the largest AI labs face hard trade-offs around pricing, rate limits, model availability, and product design because compute remains a binding constraint. As of early April 2026, there's a running joke on X about how Claude Opus pushes back on executing repetitive tasks, through some kind of weird alignment: it identifies repetitive tasks that are embarrassingly simple and tells the user to complete them manually. Not only is this the definition of pushing the tedium back to users, but it also tells us where Anthropic's priorities lie. Recently, when Dario Amodei was on Dwarkesh Patel's podcast (March 2026), he mentioned that it is hard for Anthropic to manage the economics of inference, even though they fell only marginally short of their sales targets. Compute is expensive, and the big AI labs are feeling the pinch. It is therefore misleading to speak as though AI capability unfolds in a frictionless digital realm. It unfolds under very real economic and infrastructural bottlenecks.
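To see why serving is expensive, a back-of-envelope sketch helps. Every price and usage number below is an assumption I've made up for illustration, not any vendor's real rate card.

```python
# Back-of-envelope inference economics under assumed, illustrative numbers:
# a per-token price and a per-user usage profile. None of these figures are
# real vendor prices; they only show why serving at scale is a real cost.

PRICE_PER_MILLION_INPUT_TOKENS = 3.00    # assumed, in USD
PRICE_PER_MILLION_OUTPUT_TOKENS = 15.00  # assumed, in USD

def monthly_cost(users, requests_per_user_per_day,
                 input_tokens_per_request, output_tokens_per_request,
                 days=30):
    requests = users * requests_per_user_per_day * days
    input_cost = (requests * input_tokens_per_request / 1e6
                  * PRICE_PER_MILLION_INPUT_TOKENS)
    output_cost = (requests * output_tokens_per_request / 1e6
                   * PRICE_PER_MILLION_OUTPUT_TOKENS)
    return input_cost + output_cost

# 100k users, 20 requests/day, 2k input + 500 output tokens per request.
print(round(monthly_cost(100_000, 20, 2_000, 500)))  # -> 810000
```

Even under these modest assumptions, a mid-sized product is spending close to a million dollars a month on inference alone, before any training, staffing, or infrastructure costs.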

There is also a reality that many people miss: test-time compute is becoming part of the capability story. The performance of a system is no longer just a property of the base model’s weights. It increasingly depends on how much computation is spent at inference time on search, deliberation, self-correction, tool calls, branching, reranking, verification, or multi-sample generation. In other words, capability is no longer simply “what the model is,” but also “how the system uses the model.” This muddies the benchmark comparisons we so often do, which have been the internet's AI horse race for a few years even though few people actually rely on them. It also explains why some apparently small models or modest systems can do surprisingly well when embedded in better workflows. This is also a big reason why harness engineering is so important these days. Harnesses can make or break a model release, because the harness and the agents are what tap the model's potential, whatever the ARC-AGI or other benchmarks say.
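A minimal sketch of one test-time-compute technique, self-consistency by majority vote, makes the idea concrete. The "model" here is a deterministic stub standing in for a stochastic model call; only the voting logic matters.

```python
from collections import Counter

def sample_answer(i):
    """Deterministic stand-in for a stochastic model call: correct ("42")
    on 3 of every 5 draws, a wrong answer otherwise."""
    return "42" if i % 5 < 3 else ["41", "43"][i % 2]

def self_consistent_answer(n_samples):
    """Self-consistency: draw n samples, return the majority vote.
    More samples means more test-time compute and a more reliable answer."""
    votes = Counter(sample_answer(i) for i in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistent_answer(25))  # -> 42
```

The base "model" here is only right 60% of the time, yet the system around it answers correctly, which is exactly why capability comparisons that ignore the harness are murky.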

Another reality is that quantization, distillation, fine-tuning, pruning, adapter methods, and inference optimizations have changed the practical landscape. It is no longer reasonable to think only in terms of giant models crushing everything else. Small models can be made efficient, cheap, and useful, especially for narrow domains. The 2025-2026 releases of Granite models from IBM/Red Hat,[4] Qwen models from Alibaba,[5] and Google's Gemma family[6] demonstrate this trend compellingly. With the right data, the right objective, and the right task framing, these smaller models can outperform a much larger one on a specific problem, whether the task involves text, code, or images - many of these are multi-modal models too. This matters commercially and strategically. It means that the future is not necessarily one in which only a few gigantic general models dominate every use case. There is room for specialization, efficiency, sovereignty, and targeted optimization, even assuming some baseline capability shared across proprietary and open-source models.
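As a sketch of one of these techniques, here is symmetric int8 quantization in miniature, assuming a single per-tensor scale. Real systems use per-channel or per-group scales and fused kernels, but the core idea is this simple.

```python
# Symmetric int8 weight quantization: floats are mapped to 8-bit integers
# with a per-tensor scale, then dequantized at use time. This is the basic
# idea behind shrinking models for cheap inference.

def quantize_int8(weights):
    """Return (int8 values, scale) for a list of float weights."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Reconstruction error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 5))  # q == [50, -127, 0, 90]
```

Four times less memory per weight, at the cost of a small, bounded reconstruction error; that trade is why small, cheap models have become commercially viable.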

The AGI Pipedream

There is still no good reason to casually declare the arrival of AGI. “Artificial General Intelligence” is often used as though its meaning were obvious, but it is not.[12][13] It is a vague and overloaded term. Sometimes it means human-level performance across a broad array of tasks. Sometimes it means autonomy. Sometimes it means economic substitution, and sometimes it means recursive self-improvement. Sometimes it means a machine mind comparable to a person, or one with anthropomorphic qualities. These are not the same thing, and even together they don't constitute a coherent definition. Current systems are undeniably powerful, but they remain bounded by their architectures, their data, their interfaces, and their lack of grounded agency.[14] They can appear startlingly general in language-rich settings while still failing in basic ways outside those settings, such as on out-of-distribution data.[15] That is not AGI in any clear scientific sense - in fact, I don't even know what AGI is in a clear scientific sense, and this is perhaps the problem.

It is also true that present-day models remain highly dependent on exposure. They perform better in domains, languages, styles, and problem formats that are well represented in their training and post-training regimes. LLMs in general perform worse under distribution shift: when they're exposed to languages that occur rarely or not at all in their data, unusual symbolic systems, novel workflows, niche data regimes, or domains where the ground truth is sparse and poorly captured in text. This problem has been studied extensively for low-resource languages, with organizations like Sarvam AI developing comprehensive data pipelines to address these gaps.[2] A human being can, as of 2026, invent a symbolic scheme or a new language with its own grammar, symbols, phonemes, and syllables, attach meaning to it all, adapt socially to it with others whom they can convince to use the language, and stay insulated from AI while using it - and I mean this both in a good way (AI cannot interface with them) and a bad way (for literally the same reason). When this new language is used to converse with an AI system, the system will barely understand the human in question and will respond with gibberish, for understandable reasons. This is because, as of 2026, meaning is created in a human mind, not in the internal representation of an LLM (whatever the grokking paper shows,[3] I don't think it implies that models understand anything). An LLM's apparent flexibility has real limits, and those limits become visible when the task departs from the structure and content of its training data.

Benchmark performance creates another distortion. We know what Dieselgate did to Volkswagen a decade ago: car ECUs that knew they were being tested produced different emissions than cars on the road.[1] Similarly, models often look better in curated evaluations than they do in production environments. Benchmarks are useful, but they are still simplified abstractions. Real environments contain interruptions, malformed inputs, missing context, bad tooling, contradictory instructions, edge cases, policy constraints, stale data, and long-tail user behavior. A system that aces a benchmark can still fail embarrassingly in the real world. This is one reason why so many organizations discover a gap between AI demos and AI deployment. The latter is harsher, messier, and less forgiving. Furthermore, "benchmaxxing" is a real problem in AI. We see many labs, including the big ones, treating benchmarks not as an honest post-hoc representation of a model's performance, but as a target, deliberately training models to improve their scores.
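The demo-versus-deployment gap can be sketched in a few lines: the same toy "model" scored on clean benchmark inputs versus inputs perturbed the way production traffic is. The classifier and perturbations below are deliberately trivial and made up for illustration.

```python
# The same "model" scored on a clean benchmark versus perturbed,
# production-like inputs. One number on curated data hides fragility.

def toy_classifier(text):
    """Stub 'model': flags a message as urgent if it contains 'urgent'."""
    return "urgent" in text.split()

benchmark = [("this is urgent please help", True),
             ("just checking in", False)]

def perturb(text):
    # Production messiness: casing drift, truncation, stray punctuation.
    return text.upper()[:15] + "!!"

def accuracy(dataset, transform=lambda t: t):
    correct = sum(toy_classifier(transform(t)) == label
                  for t, label in dataset)
    return correct / len(dataset)

print(accuracy(benchmark))                     # clean inputs: 1.0
print(accuracy(benchmark, transform=perturb))  # perturbed inputs: 0.5
```

The clean score is perfect and the perturbed score is a coin flip, which is, in caricature, exactly the demo-to-deployment gap.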

AI Autarky - Local and Sovereign AI

Then there is the reality of local AI. Many organizations and individuals do not want their core workflows, proprietary data, or strategic capabilities to depend entirely on a handful of external vendors. Self-hosted, on-prem, edge, and privately controlled deployments matter for reasons of privacy, cost control, customization, compliance, resilience, and independence. “Local AI” is therefore not merely a hobbyist phenomenon. It is also a strategic response to centralization. It reflects a desire to retain control over models, infrastructure, data, and operational behavior. This may not eliminate dependence entirely, especially when hardware and some software layers remain externally sourced, but it does change the locus of control.

Sovereign AI is related to and yet distinct from local AI: sovereign AI allows countries and blocs of states to practice AI autarky. Think Sarvam AI, Mistral AI, or open-source models, set against the use of Claude by the US military. Being beholden to the big labs in times of crisis presents supply chain risks. In the recent war in West Asia, we saw reporting on the use of Anthropic's Claude models embedded within Palantir systems for battlefield intelligence and target proposal workflows.[9] Public reporting also highlights that strike-level attribution of responsibility remains difficult to verify independently. Needless to say, sovereign AI does not prevent such problems, and may even exacerbate the use of AI for defence purposes. However, it gives conscientious objectors who don't want their data used to embolden such actors a way out. It gives them the means to avoid tools that are used for war. We need not use the same tools that are being used for purposes we don't believe in. The impact of this is not superficial but systemic, because with every use of non-sovereign AI, we risk having the data, context, and tools we have developed for our workflows used against us.

The Multi-Modal Reality of AI

Another reality is that the AI world is no longer just about text. Vision, audio, speech, video, and multimodal models are moving rapidly, and image and video generation have become materially useful in design, marketing, entertainment, prototyping, and content production. But the same capability also creates obvious avenues for misuse. Deepfakes, impersonation, synthetic propaganda, forged evidence, mass-produced disinformation, and emotionally manipulative media are not side issues. They are direct consequences of capability growth in generative systems. The frontier of creative synthesis is therefore inseparable from the frontier of authenticity collapse.

Whether in the context of the war in West Asia or elsewhere on social media, we're seeing increased use of AI to generate images, videos, and even deep fakes designed to throw ordinary citizens off the truth. Simultaneously, vision models are being used for drone warfare.[10] Models auto-detect incoming drones, or even soldiers, and through sensor fusion, friend/foe identification algorithms, and on-device vision models, autonomous drones are being used to target soldiers and equipment on the battlefield. Although the bulk of drones are still manually flown and guided as of April 2026, an increasing number are autonomous, and this is a cause for concern.

On the positive front, many engineering advances may be anticipated from multi-modal AI. Multi-modality, even in the current context, could be extended to the likes of point-cloud models or neural radiance fields. This enables applications of greater complexity and real-world utility, such as design, computer-aided engineering, digital mock-ups, and digital visualization as precursors to engineered products and systems.

Economics of AI and Agents

One more reality is that the economics of AI are awkward. The public imagination often assumes that once a model exists, value simply pours out of it. In practice, value capture is uneven. Some firms spend vast sums on training and infrastructure while downstream application companies, consultants, cloud vendors, or hardware providers capture large portions of the economic return. These downstream players are able to articulate their use cases, and their data and application architectures lend themselves well to capturing value from AI. In some cases, open-source models compress margins further, and fine-tuning and similar advantages come to the fore. In some cases, the best model does not win, because distribution, workflow fit, trust, and product integration matter more than abstract intelligence. This means AI is not just a technical competition. It is a contest over capital intensity, distribution, defaults, workflow control, and enterprise fit.

The present agent wave also needs sober interpretation. Agents are real in the sense that systems can now take multi-step actions, call tools, generate and run code, inspect outputs, retry, plan, and pursue goals within bounded environments. That is meaningful progress. But much of what is marketed as autonomous agency is still structured software operating within carefully defined rails, with an LLM call at the center. However complex the tool workflows look and however elaborate the prompts are, it bears repeating that these systems are not true intelligence. They are often less like independent minds and more like workflow engines with flexible language interfaces. Their usefulness is substantial, but their mythology often outruns their reality.
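A hedged sketch of the "workflow engine with a language interface" claim: a bounded loop around a planner and tools. The planner below is a stub; in a real agent it would be an LLM call choosing the next action, but the rails around it look much like this.

```python
# What most "agents" actually are: a loop around a model call with tools,
# retries, and a hard step budget. The planner here is a stand-in stub.

def stub_planner(goal, history):
    """Stand-in for an LLM deciding the next step toward a goal."""
    if not history:
        return ("tool:add", (2, 3))
    return ("finish", history[-1])

TOOLS = {"add": lambda a, b: a + b}

def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):  # bounded: the rail that keeps it safe
        action, arg = stub_planner(goal, history)
        if action == "finish":
            return arg
        if action.startswith("tool:"):
            tool = TOOLS[action.split(":", 1)[1]]
            history.append(tool(*arg))  # observe the result, loop again
    return None  # budget exhausted: fall back rather than run forever

print(run_agent("add 2 and 3"))  # -> 5
```

The loop, the tool registry, and the step budget are all ordinary software; the "agency" is the planner's choices inside rails someone else defined.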

I am compelled to discuss OpenClaw and similar ecosystems given how popular they are today.[11] Such systems show that large language models become much more operationally potent once combined with memory, code execution, tool use, application synthesis, and multimodal interfaces. Messaging apps, browsers, local runtimes, and common software affordances become substrates for practical action. You can be on a walk and send a message to your Claw on Telegram, be reminded by it about a task, or have it pick up something new and interesting. All these conveniences notwithstanding, we are not dealing with a true autonomous intelligence. There is a popular narrative on the internet that OpenClaw somehow represents AGI, but this obviously isn't true, because these systems are not capable of imagination or of dealing with things outside their training data the way humans are.[16] This lack of compositional and systematic generalization is a fundamental limitation distinguishing current systems from genuine general intelligence. OpenClaw is important because language models, when embedded in richer control loops, can act with more agency and are therefore more useful. The leap is important, but it is a computational systems leap rather than a metaphysical one.

AI's Fear Psychosis

One stark reality that seems to have arisen from the user base of AI coding agents is the fear psychosis that has replaced optimism across large parts of the technosphere. We've seen large-scale job losses recently, with Meta's roughly 5% workforce reduction announced in January 2025 and Oracle's additional reductions through early 2026[7] among the biggest in recent months. This seems to be just the beginning. Anthropic is shipping Claude at a furious pace, while Copilot and Microsoft in general are going through a rough patch, with the bulk of the office application software used by white-collar workers now subject to disruption. The integration of AI capabilities into Claude via Cowork, and excellent plugins for Office and the like, have increased the odds of job losses in many professions, such as law, writing, and media. It goes without saying that the code sphere is the first to see tangible replacement of humans with AI. Perhaps this is no surprise: the systems that AI models interface with most closely are likely to be the first to be automated wholesale. It is perhaps also no surprise that formally verifiable systems are among the first to be automated, because in this specific domain results can be made objective, and the outputs of AI models can be evaluated against clear metrics.

Despite the intellectual wrangling, beauty, and sophistication of the human minds that build software, the tangible output produced through programming is highly amenable to formal verification. This paradox has perhaps always sat with humans who have programmed computers for decades, seeming to suggest that computing is alien to our minds even when it wasn't so in reality. It follows that there is a strange psychosis associated with the use of AI for knowledge work. Everyone feels like they're generating training data as they work, that they're building on top of tall scaffolding erected by AI systems, or that they're partaking in a ceremony, a charade, without delivering actual value. This puts humans in a state of fear about eventual replacement by AI systems that may perform specific tasks tirelessly and more efficiently than they can. Conversations with reputed engineers such as Simon Willison often bring up the notion of being "mentally spent" when using AI coding tools, simply because of the sheer intensity of developing applications with AI coding agents.[8] Formal verification aside, there's the question of utility too: yes, we can scratch an itch with a vibe-coded app, but can it become truly useful?

More here on this topic: https://x.com/aiexplorations/status/2040156695989821616

AI and Human Judgement

Another reality worth stating quite plainly is that the current AI wave has not removed the need for human judgment; in some ways it actually increases it. Humans are needed to frame objectives, define acceptable behavior, evaluate outputs, resolve ambiguity, choose trade-offs, interpret downstream consequences, and decide where automation is appropriate and where it is dangerous. Humans are required to tell apart results that look useful from those that are not, just as some trained humans can still tell deep fakes apart as of April 2026 (though we have to admit this is becoming harder for most of us; I certainly struggle with it). AI can compress the effort needed to do something in many areas, but it also raises the premium on judgment, domain understanding, verification, and system design. The fantasy that AI eliminates human responsibility is not only wrong; it is risky.

Finally, perhaps the most important reality is that current AI is simultaneously overhyped and underappreciated. It is overhyped when people speak as though language models are conscious minds, imminent gods, or inevitable replacements for all skilled labor. It is underappreciated when critics dismiss them as mere autocomplete and fail to see what large-scale predictive systems plus tool use, memory, retrieval, and orchestration can already do. The truth is more demanding. AI today is neither magic nor trivial. It is a new layer of computational capability with real power, real limitations, and real dependence on the industrial world beneath it.

That is the state of the field as it stands: powerful prediction systems, embedded in vast physical and software infrastructures, increasingly useful when wrapped in the right harnesses, still far from anything that should casually be called general intelligence, and already consequential enough that economics, politics, security, law, labor, and culture are all being reshaped around them.

References

  1. Dieselgate Scandal: Creutzig, F., et al. "Real-world emissions of conventional and plug-in hybrid electric cars." Nature Energy, 2021. Also see: Volkswagen emissions scandal Wikipedia, accessed 2026.
  2. Sarvam AI and Low-Resource Languages: Sarvam AI. Sarvam Model Releases and Language Coverage. Their 2024-2025 work on comprehensive multilingual NLP pipelines demonstrated approaches to low-resource language modeling.
  3. Grokking Paper: Power, A., Burda, Y., Edwards, H., Babuschkin, I., & Misra, V. "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets." arXiv preprint arXiv:2201.02177, 2022.
  4. IBM/Red Hat Granite Models: IBM Research. Granite Model Series. Open Granite model releases began in 2024 and expanded through 2025-2026.
  5. Alibaba Qwen Models: Alibaba Cloud. Qwen Model Family. Series of multilingual, multimodal models released through 2025-2026.
  6. Google Gemma: Google DeepMind. Gemma: Open Models Based on Gemini Research and Technology. Released open-weight efficient models in 2024-2025.
  7. Meta and Oracle Layoffs: Reporting from Reuters, CNBC, and TechCrunch on Meta's January 2025 ~5% workforce reduction and subsequent Oracle-related reductions through early 2026. See also company disclosures and earnings commentary.
  8. Simon Willison on AI Development Intensity: Willison, S. "The Challenges of Building with AI Agents," personal blog and social media posts, 2025-2026.
  9. Claude and Palantir in Warfare: Moneycontrol. "How Palantir and Anthropic AI helped the US hit 1,000 Iran targets in 24 hours," 2026.
  10. Vision Models in Drone Warfare: Various reporting on Ukraine and Middle East conflicts from 2024-2026. See: MIT Technology Review, "AI-Powered Drones Transform Modern Warfare," and reporting from conflict zones on autonomous systems deployment.
  11. OpenClaw and Agent Ecosystems: X/Twitter post from @aiexplorations on the widespread adoption of agent-based systems and their integration with consumer platforms (April 2026). https://x.com/aiexplorations/status/2040156695989821616
  12. AGI Definitions: Legg, S., & Hutter, M. "A Collection of Definitions of Intelligence." (Technical report, 2007; later published versions). Also: Legg, S., & Hutter, M. "A Formal Measure of Machine Intelligence." CoRR abs/cs/0605024, 2006.
  13. AGI Concepts (Autonomy, Economic Substitution, Self-Improvement): Marcus, G. "Deep Learning: A Critical Appraisal." arXiv preprint arXiv:1801.00631, 2018. Discusses limitations of deep learning approaches to achieving AGI concepts like true autonomy and recursive self-improvement.
  14. Grounded Agency and Architecture Constraints: Harnad, S. "The Symbol Grounding Problem." Physica D: Nonlinear Phenomena, 1990. Extended in contemporary work on embodied cognition and the limitations of purely statistical models.
  15. Out-of-Distribution Generalization: Hendrycks, D., & Dietterich, T. "Benchmarking Neural Network Robustness to Common Corruptions." ICLR, 2019. Also: Butz, M. V., & Locqueville, W. "On Open Worlds and How to Investigate Them." arXiv preprint arXiv:2202.07356, 2022.
  16. Compositional and Systematic Generalization: Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. "Building Machines That Learn and Think Like People." Behavioral and Brain Sciences, 40, 2017. Foundational discussion of the gap between neural networks and human-like compositional generalization, imagination, and systematic reasoning.