The App-Centric Era Is Ending
The app-centric way of interacting with computers is dying—and honestly, I can't wait.

When the iPhone arrived, it triggered the big bang of apps. We've spent the last fifteen years training ourselves to think in apps. Everything begins with an app: unlock your phone, hunt for the right icon, tap, scroll, repeat.
But if you pause for a second, it feels wrong. Our brains don't naturally organize life into apps; we think in terms of intent.
Before Large Language Models (LLMs), apps were the best abstraction we had to translate our intent into actions. And, to be fair, it worked perfectly well, until ChatGPT happened. Suddenly, hunting through endless screens of icons feels like busywork from another era. I think we all kind of feel the same thing: now that computers have learned to talk, we should be able to tell them what we want, and they should take care of the rest.
The next decade is all about closing the gap between where we are today and this ideal state. The immediate steps are quite clear. Broadly, there are three gaps that most people seem to agree on:
- AI isn’t personal enough – Despite all its magic, it still knows very little about you.
- AI isn’t ubiquitous enough – It doesn’t meet you where you are.
- AI isn’t proactive enough – It tells you what to do, but more often than not doesn’t do it for you.
So, where will this transformation happen? Which platform will a more personalized, ubiquitous, and proactive AI take root in?
It's hard to predict the distant future, but the ideal immediate candidate is already right in front of us. It's not a new app, a wearable pendant, or a futuristic interface. It's the software we've been using all along: the browser.
There are a few reasons why the browser is uniquely positioned:
First is the context layer. Tabs, domains, sessions, iframes, cookies, credentials, extensions, bookmarks, history, sidebars, URLs: these are all rich sources of context through which browsers continuously gather data. They observe how you navigate the web, what you search for, what you ignore, and what you revisit. Your default browser already holds deep context about you, and a lot of it.

Next, a lot of work happens in the browser. Much like the iPhone outgrew the “phone” label, the browser has outgrown “browsing.” We carry most of our knowledge and productivity work (email, docs, calendars, meetings) on the web. Modern browsers have become quasi-operating systems, hosting not just static content but full-fledged apps: Google Workspace (Docs/Sheets/Slides/Drive), Slack, Figma, Notion, Asana, PWAs, crypto wallets, and the list goes on. With AI, the browser is ready to host a new kind of software: intelligent agents capable of chat, coding, deep research, computer use, image generation, and more. By leveraging these agents, the browser stops being just the place where work happens and becomes an active participant in that work. We won’t just use apps in the browser; the browser itself will become an "embedded" part of our lives.
Finally, the browser is cloud-native and therefore cross-platform. iOS, Android, macOS, Windows, iPadOS—even TVs and VR headsets—the browser runs everywhere. Agents within the browser would feel ever-present, offering a unique quality that standalone AI chatbots can't match.
All of this highlights the motivations behind the looming Browser War III—a fight to control the web’s most critical gateway in the age of AI.
Browser War III
The Browser War might be the highest-stakes war of all because it's a winner-take-all market. Browsers are extremely sticky. The longer you use a browser, the more footprint you leave—profiles, passwords, credentials, bookmarks, extensions—and the less likely you are to switch. With AI agents capturing your valuable web behavior data and offering hyper-personalized services, this stickiness will only deepen.
But this intense market concentration isn't driven by consumer choice alone. The browser market is tightly controlled, and operating systems and OEMs (Original Equipment Manufacturers), which sit upstream of browsers, play a significant role. iOS and Android intentionally design their interfaces to lock users into their preferred browsers, Safari and Chrome, through measures like restricting browser engines, obscuring default browser settings, and nudging users toward defaults. OEMs further reinforce these walled gardens through pre-installs (like Edge on Windows PCs or Chrome on Android devices), often because they are legally bound or financially incentivized to do so.
Under normal conditions, there would be no chance for a new winner to emerge in such a controlled environment, which explains the historical lack of competition.

But we live in unprecedented times. No incumbent—browser, OEM, or OS—is safe amid such disruptive technology. AI browsers offer entirely new experiences, blending chatbot interfaces directly into browsing. Navigating product design and engineering breakthroughs in this space is part science, part art. Those who succeed in doing so now have the chance to build lasting moats for decades.
Within this evolving landscape, several emerging players are making their mark, including Perplexity Comet, The Browser Company’s Dia, Genspark, Fellou, Opera Neon, and the much-speculated browser from OpenAI. However, before delving into these new entrants, it's crucial to first grasp the current state of affairs.
Game of Thrones
Below is web traffic flow measured in page views across key players—ranging from OEMs and operating systems upstream, to search engines downstream of browsers. Here are a few highlights from the data:

Chrome is king: Chrome commands more web traffic than both Safari and Edge on macOS and Windows, despite those browsers being preinstalled on their respective platforms. One of the first things people do when they buy a Mac or Windows PC is download Chrome. Unironically, an Apple VC calls Chrome's URL bar “the most popular text box on all of macOS,” and Satya Nadella, CEO of Microsoft, admits, “Google makes more money on Windows than all of Microsoft.” Chrome is considered the most widely used consumer software product in the world, installed on over three billion devices, with a market share of 66.5%. Chrome also dominates browsers both in usage and time spent, averaging around 3.2 hours daily.
Ad Business Model Conflict & DOJ Case Hold Google Back: Google faces two major threats to its dominance: a significant antitrust lawsuit and a fundamental conflict within its business model.
- Antitrust Threat: The U.S. Department of Justice (DOJ) has secured a ruling that Google holds a monopoly in search. Potential remedies could even force Google to divest Chrome—an asset competitors like OpenAI and Perplexity are keen to acquire.
- Business Model Conflict: AI-powered answers directly threaten Google's ad-driven business model. Direct answers reduce user clicks on ads, creating a financial dilemma: Google's revenue per user from search ads significantly exceeds the $20/month subscription price of its premium AI offering, Gemini. Google Search/Ads made ~$175B in 2023, so even a 10–15% revenue erosion driven by falling click-through rates (CTR) could put $17–$26B at risk annually. Early signs suggest $5–10B in annualized exposure already if current CTR trends persist.
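The $17–$26B range follows from simple proportional arithmetic. Below is a minimal sketch, assuming (as the estimate above appears to) that a given percentage drop in ad clicks removes the same percentage of Search/Ads revenue. It also runs the calculation backwards to show what erosion the $5–10B early-signs figure implies. The one-for-one mapping is a simplification, not a model of how Google actually prices ads.

```python
SEARCH_ADS_REVENUE_2023_B = 175.0  # ~$175B, in billions of dollars (figure cited above)

def revenue_at_risk(base_b: float, erosion: float) -> float:
    # Simplifying assumption: an X% drop in ad clicks removes X% of revenue.
    return base_b * erosion

def implied_erosion(exposure_b: float, base_b: float) -> float:
    """Fraction of annual revenue that a given exposure estimate represents."""
    return exposure_b / base_b

low, high = (revenue_at_risk(SEARCH_ADS_REVENUE_2023_B, r) for r in (0.10, 0.15))
print(f"10-15% erosion -> ${low:.2f}B to ${high:.2f}B per year")
# 10-15% erosion -> $17.50B to $26.25B per year

lo_e, hi_e = (implied_erosion(x, SEARCH_ADS_REVENUE_2023_B) for x in (5.0, 10.0))
print(f"$5-10B exposure -> {lo_e:.1%} to {hi_e:.1%} erosion")
# $5-10B exposure -> 2.9% to 5.7% erosion
```

In other words, the early-signs estimate corresponds to roughly a 3–6% erosion, well below the 10–15% scenario.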
While Google currently wears the crown, these threats have likely constrained its ability to innovate aggressively at the browser forefront. As we'll explore shortly, Gemini’s integration within Chrome already feels behind competitors, despite Google's uniquely dominant position.
Samsung<>Google: Google not only owns Google Search and Chrome but also leads the development of open-source Android, leveraging Samsung’s (and other Android OEMs’) vast global hardware footprint.
Google and Samsung maintain a mutually beneficial yet strategically cautious relationship. Samsung needs Google’s software ecosystem; Google needs Samsung’s hardware reach. Despite some competitive overlap—especially in AI, voice assistants, and app stores—both companies often find it more lucrative to cooperate than fully compete. Yet, the balance is always shifting.
Today, Google pays Samsung billions through revenue-share and licensing agreements, ensuring services like Google Search, Chrome, Play Store, and recently Gemini AI are preinstalled as defaults on over a billion Samsung devices. However, Google's grip over Samsung might be weakening.
Perplexity is reportedly in talks with Samsung to become the default AI assistant on future devices—potentially starting with the Galaxy S26 in early 2026. If successful, Gemini could lose its default position. Samsung is also negotiating a substantial investment in Perplexity’s upcoming $500 million funding round, valuing the AI startup at $14 billion.
Microsoft<>OpenAI: The Microsoft–OpenAI partnership emerged to counter Samsung–Google and other rivals in the escalating AI arms race. Microsoft invested approximately $13 billion in OpenAI, securing a deal to earn back a portion of profits (up to 49 cents on every dollar) until it recoups its investment with a capped return. In exchange, Microsoft gained exclusive licensing and resale rights to OpenAI’s models, which it has deeply integrated into Azure, Copilot, Bing, and Edge. Those rights hold until OpenAI’s board declares that AGI has been achieved, which would cut off Microsoft’s access to subsequent models and technology. This partnership gives Microsoft a powerful competitive Edge (pun intended) in the emerging AI browser war.
Apple’s iOS Control: As both OEM and developer of the browser engine and OS, Apple maintains tight control over iOS.
Until 2020, users couldn’t set a default browser other than Safari. Safari still can't be fully removed, and third-party browsers must use Apple's WebKit engine instead of their native engines (like Blink for Chromium). Apple's control has significantly benefited the company—especially through its deal with Google, reportedly worth $18–20 billion annually, to keep Google Search as Safari's default.
However, Apple's iOS restrictions—particularly on browser engines—limit innovation by others. As AI-native interfaces evolve, Apple risks falling behind unless it develops its own leading alternatives. Apple Intelligence has underwhelmed, and WWDC 2025 landed with a thud. The much-hyped Siri improvements were quietly delayed, and announcements felt incremental rather than transformative.
Apple’s reported interest in acquiring Perplexity AI for $14 billion signals a recognition of what's at stake. Such a move could strengthen Apple's AI search capabilities, reduce its reliance on Google, and inject new life into Siri, Safari, and Spotlight. However, Perplexity CEO Aravind Srinivas has expressed a desire to remain independent, making a strategic partnership—like integrating Perplexity’s technology into Apple's ecosystem—a more viable path.
iOS/Android Duopoly: For years, iOS and Android have dominated the mobile OS landscape, powering devices responsible for over 63% of global web traffic. That dominance isn't disappearing anytime soon—but it no longer feels guaranteed. Ambitious challengers are rising, and for the first time in over a decade, iOS and Android might need to watch their backs.
Deutsche Telekom, parent company of T-Mobile, is backing two experimental mobile platforms—one in collaboration with Perplexity, the other with Brain Technologies—both aiming for "app-less" experiences. Huawei is ramping up its AI-driven vision through HarmonyOS, asserting independence from Western ecosystems. Meanwhile, OpenAI and Jony Ive are teaming up to create a radically reimagined AI-native device.
AI Browser Market Map
There's a lot to unpack here, but let's zero in on our core topic: AI browsers. With its recent launch, Comet, Perplexity’s new AI-first browser, has taken center stage. In CEO Aravind’s own words, here's what Comet brings to the table:

It’s no surprise that Aravind starts with the omnibox. Google Search is morphing before our eyes, with AI-generated answers increasingly replacing traditional blue links. Comet, Dia, and other AI-native browsers are envisioning a single, unified omnibox for search, chat, and direct navigation. The next great browser is all about owning this one gateway to the web.
Next is the sidecar. Think of it as your typical chatbot but with deep awareness of your current tab by default. You can effortlessly @mention other tabs to shift context or highlight text to feed directly into the sidecar conversation. Want to know more about the host of a podcast you're watching on YouTube? Just ask—no need to switch tabs, copy-paste, or upload anything. To me, this felt like a significant leap beyond my current experience with chatbots. That said, I'd argue this feature is already table stakes for what I'd consider a true AI browser.
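To make the "current tab by default, @mention to add more" behavior concrete, here is a minimal sketch of how a sidecar might assemble its prompt. The `Tab` type, the one-word `@Title` mention syntax, and `build_sidecar_prompt` are hypothetical illustrations, not Comet's or Dia's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Tab:
    """Hypothetical stand-in for an open browser tab."""
    title: str
    url: str
    text: str  # page text already extracted from the tab

MENTION = re.compile(r"@(\w+)")  # assumed syntax: @ followed by a one-word tab title

def build_sidecar_prompt(question: str, active: Tab, tabs: List[Tab]) -> str:
    """Include the active tab by default, plus any tab the user @mentions."""
    mentioned = {m.lower() for m in MENTION.findall(question)}
    extra = [t for t in tabs if t is not active and t.title.lower() in mentioned]
    sections = [f"[{t.title}] {t.url}\n{t.text}" for t in [active, *extra]]
    return "\n\n".join(sections) + f"\n\nUser: {question}"

podcast = Tab("Podcast", "https://example.com/episode", "Transcript of the episode")
notes = Tab("Notes", "https://example.com/notes", "My notes on the guest")
print(build_sidecar_prompt("Who hosts this, and how does it compare to @Notes?",
                           active=podcast, tabs=[podcast, notes]))
```

The user never copies, pastes, or uploads anything: the browser already holds the page text, so gathering context is just a lookup over open tabs.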
I'll skip ads for now (more on that later) and jump straight to Aravind's last point—personalization. This one's intriguing. If Comet is your default browser and you've recently had a therapy session on chatgpt.com, Comet implicitly knows all about your childhood trauma. This is because if you're logged into ChatGPT, Comet can effortlessly pull context from your past conversations. Interestingly, the reverse isn't true—web apps can't directly access browser-level context. Browsers naturally exist one layer above web apps, which is precisely why I see AI browsers quickly surpassing standalone AI chatbots.
Where I think AI browsers will differentiate is in agentic browsing: taking actions for you on the web (e.g., clicking, scrolling, or filling forms) reliably and quickly. Comet has an early advantage here, largely due to Perplexity’s specialized Sonar models, post-trained specifically for browsing tasks.
Perplexity Comet
Comet currently supports agentic browsing in two modes: headful and headless. In headful mode, the assistant visibly takes control of your active tab, allowing you to watch step-by-step as it carries out your instructions. Headless mode is quieter—it runs discreetly in the background, performing a search or updating a shopping cart in a new tab.
But even though Comet is ahead, its current agentic capabilities don’t always impress. If I’m lazy with my prompt, the assistant often misunderstands my intent, and it is still too slow to help me with my daily tasks. That said, Perplexity keeps investing in post-training for web-specific actions (like scrolling, form-filling, and interacting with logged-in sessions), skills that general LLMs lack. Progress here will be critical, perhaps decisive, in Comet’s ultimate success.
Dia
The next challenger to watch is The Browser Company’s Dia. Dia enters the AI browser race at a disadvantage in distribution, user base, and funding, which is especially tough when it's up against not only fast-growing startups like Perplexity but also industry giants like Google and OpenAI. Still, given The Browser Company’s impressive track record (turning Arc from a niche browser for power users into a beloved brand) and its powerhouse cap table (featuring founders of Instagram, Shopify, Pinterest, Slack, Stripe, Zoom, Figma, and Notion), Dia certainly deserves attention.
Leveraging lessons learned from Arc, Dia emphasizes minimizing user friction and onboarding complexity. Its design is deliberately familiar and minimalistic, slowly introducing sophisticated AI features as users grow comfortable. This strategy mirrors Cursor’s successful approach of transforming an established interface—like an IDE—into an AI-native experience without radically altering its core. In other words, what Cursor did to VS Code, Dia aims to do to Chrome.
The Dia team places significant weight on personalization and emotional affinity, and it shows. Right from onboarding, Dia proactively asks users about their tastes, preferences, and style, whether in writing or coding. Sidecar chat interactions then become uniquely tailored to each user. Considering how many users now treat chatbots as pseudo-therapists (engaging in hours-long conversations, seeking life advice, even sharing deeply personal secrets), Dia’s emphasis on emotional connection could help it carve out a meaningful niche.
However, unlike Comet, Genspark, and Fellou, Dia doesn’t yet support agentic browsing or any type of actions. You currently can’t instruct Dia to open tabs, click buttons, or fill out shopping carts. Since basic AI tasks—such as chatting with open tabs or question-answering—will quickly become commoditized, Dia's thoughtful UX and deep personalization alone might not move the needle significantly.
To differentiate itself further, Dia’s near-term roadmap includes launching a public “skills” marketplace. This will allow third-party developers to create specialized AI capabilities for specific tasks. Imagine creating a personalized “travel” skill preloaded with your favorite airlines and departure times. Activating it on Dia would simply require typing “/travel” into Dia’s omnibox. While this skills marketplace could evolve into a powerful differentiator, Dia hasn't clearly articulated the incentive structure for developers yet—something that will be crucial for sustained growth and engagement.
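One way to picture a skill is as a named bundle of instructions and saved preferences that expands into the request sent to the model when you type its slash command. Here is a minimal sketch. The `Skill` structure, the registry, and the example preferences are hypothetical, since Dia has not published its skills format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Skill:
    """A hypothetical skill: a slash command bundled with instructions and saved preferences."""
    name: str
    instructions: str
    preferences: Dict[str, str] = field(default_factory=dict)

SKILLS: Dict[str, Skill] = {}

def register(skill: Skill) -> None:
    SKILLS[skill.name] = skill

def expand(omnibox_input: str) -> Optional[str]:
    """Expand '/name request' into a prompt; return None if it is not a known skill."""
    if not omnibox_input.startswith("/"):
        return None
    command, _, request = omnibox_input[1:].partition(" ")
    skill = SKILLS.get(command)
    if skill is None:
        return None
    prefs = "; ".join(f"{k}: {v}" for k, v in skill.preferences.items())
    return f"{skill.instructions}\nPreferences: {prefs}\nRequest: {request.strip()}"

# Example preferences are made up for illustration.
register(Skill("travel", "Help me plan and book a trip.",
               {"airlines": "Alaska, Delta", "departure": "after 9am"}))
print(expand("/travel SFO to JFK next Friday"))
```

A third-party marketplace would, in effect, let developers publish and share entries like these. That is also why the developer incentive question matters: the registry is only as valuable as what people put into it.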
Genspark
Genspark, from MainFunc Inc., is an early mover run by an ex-Baidu team, and it's the surprise dark horse that already has significant traction and revenue, having reached $36 million ARR in under two months. Unlike many competitors narrowly focused on browsing alone, Genspark bundles its AI browser with a full productivity suite, encompassing tools like Drive, Docs, Slides, Sheets, and even integrated phone calls. This broad strategy places Genspark not just against browsers but head-to-head with productivity titans like Microsoft 365 Copilot and Google's Duet AI (since rebranded as Gemini for Google Workspace). While its technology may feel more cutting-edge or flashy, challenging entrenched office suites directly is never easy.
Geographically, Genspark holds an advantage with strong roots in Asia (backed notably by investors in Singapore) and presumably a developing presence in the US market. There’s also potential for expansion into China, where Google's absence leaves room for competitors—and where local giants like Baidu have their own AI browser offerings. It will be interesting to watch whether Genspark strategically focuses on regions where competition is lighter.
Another big advantage for Genspark is OpenAI's explicit support. Being showcased by OpenAI suggests Genspark may enjoy privileged access to the latest models, early technical insights, or other strategic resources, an advantage that shouldn't be underestimated.
Fellou
Fellou might be the most aggressively autonomous platform relative to its size.
I’ll share my experience to explain why. To test Fellou, I prompted it to reorder the last item I'd purchased on an e-commerce site. Just a few minutes later, my mobile banking app notified me that a transaction had gone through. As it turns out, Fellou had completed my prompt in one shot and purchased the item, with no additional confirmation, using a saved card in my profile that I hadn't even realized was there.
Intrigued by this experience, I decided to dig deeper. Fellou’s agentic capabilities go beyond just the browser environment. It has a computer-use agent that can take actions not only within browser tabs but directly on my desktop, provided I grant permission.
Fellou is positioning itself uniquely by blending Robotic Process Automation (RPA), browser technology, and AI-driven agents into one cohesive experience. This deep level of automation could resonate strongly with certain groups: growth hackers, power users, and perhaps SMB employees who frequently juggle multiple roles.
That said, trust and safety remain major hurdles. At some point during my experience with Fellou, it asked to record my screen and control my computer directly—arguably as intrusive as it gets, and personally, that was a non-starter for me. However, purely from an engineering standpoint, I've found its automation capabilities genuinely impressive.
Opera Neon
Opera brings decades of browser-building experience and an existing user base in the hundreds of millions to which it can market Neon once it’s ready; even a 1% conversion would be a huge user base compared to any startup here.
Neon isn’t on the market yet, so it’s hard to comment on its value add. Opera's challenge will be to prove that Neon is worth a subscription and to deal with a potential cannibalization issue: if Neon is amazing, why use regular Opera? Will Opera end up maintaining two browsers or migrating features across?
Another challenge is scope management: Neon is trying a lot of new ideas at once (automating browsing, cloud-made apps, etc.). Opera will need to prioritize which use-cases to perfect. It’s easy to envision cool demos (like “make a mini-game website for me”), but they must identify what actual Neon subscribers will do daily and ensure those workflows shine.


My Personal Experience: Which AI Browser I Prefer and Why
Bringing an assistant with you to every site you visit is undeniably useful. I’m highly confident that traditional browsers without any AI assistant are becoming a thing of the past. I expect all mainstream browsers to soon have at least a sidecar for AI chats by default.
I’ve tried both Dia and Comet as my default browser, and for now, I’ve ended up sticking with Dia. The main reason is that I don’t want all my sidecar AI chats to be powered exclusively by Perplexity models. While I find Perplexity useful for research, I generally prefer ChatGPT for most cases. I often found myself going to ChatGPT’s website anyway, which made Comet’s assistant feel redundant.
Another reason is that agentic capabilities still feel early. While I was impressed by Comet’s actions, I found them slow and not consistently reliable. Given that, the absence of such functionality in Dia doesn’t bother me for now.
Finally, I think the biggest area of improvement for all AI browsers is understanding user intent; specifically, figuring out when the user wants a traditional search with blue links, when they want to navigate to a site, and when they’re looking for an AI-generated answer. I found both Dia and Comet giving me AI responses when I simply wanted to run a search or go directly to a website. Interpreting user intent from a single omnibox is a hard problem, but solving it could make a huge difference.
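To see why this is hard, consider a deliberately naive, rule-based router for omnibox input. This is a hypothetical sketch, not how Dia or Comet decide. Its rules are assumptions for illustration: URL-like input means navigate, questions or long sentence-like input mean an AI answer, and everything else means search. A production system would more likely use a learned classifier.

```python
import re
from enum import Enum

class Intent(Enum):
    NAVIGATE = "navigate"  # go straight to a site
    SEARCH = "search"      # classic results page with blue links
    ANSWER = "answer"      # AI-generated response

# URL-like input: optional scheme, dotted host, optional path.
URL_LIKE = re.compile(r"^(https?://)?[\w-]+(\.[\w-]+)+(/\S*)?$", re.IGNORECASE)
QUESTION_STARTS = {"what", "why", "how", "who", "when", "where", "which",
                   "should", "can", "is", "are", "does", "explain", "compare"}

def classify(query: str) -> Intent:
    q = query.strip()
    if URL_LIKE.match(q):
        return Intent.NAVIGATE
    words = q.lower().split()
    # Assumed heuristic: questions and long, sentence-like inputs want an answer.
    if q.endswith("?") or (words and words[0] in QUESTION_STARTS) or len(words) > 6:
        return Intent.ANSWER
    return Intent.SEARCH

for query in ["github.com", "running shoes", "why is the sky blue?", "v2.0"]:
    print(f"{query!r} -> {classify(query).value}")
```

The last input exposes the problem. `v2.0` looks like a domain, so this router tries to navigate to it when the user almost certainly wanted search results. Every hand-written rule produces edge cases like this.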
Where is this headed?
Even with today's relatively basic "chat with your tabs" capabilities, AI browsers already represent a significant leap forward from standalone chatbots—no more copy-pasting of paragraphs and links, or uploading PDFs and screenshots.
As an early pioneer with agentic capabilities, Comet already impresses me with what it can achieve today—but what excites me even more is where all of this is headed. As compute costs continue to fall, AI assistants become increasingly hyper-personalized, and their ability to autonomously act on users' behalf rapidly expands, the possibilities begin to look genuinely transformative.
Because AI browsers sit so close to the user, they’ll increasingly have the final say in what content you see. Imagine a future where your browser doesn’t just mediate your experience of the web—it curates it. A personal “home feed” for the entire internet.
Your browser’s homepage could evolve into a unified, personalized content stream—discovering, filtering, remixing, and even transforming content from your favorite social apps, media sources, and the broader web. All of it is delivered through a customizable interface uniquely shaped by you.
AI browsers today mostly rely on paid subscriptions, with some like Comet considering usage-based pricing models. While minor sponsorship opportunities (e.g., sponsored suggested questions) could occasionally appear, Comet and others’ clear preference is charging users directly for the value provided—rather than monetizing user data or relying on ads. Depending on how these business models unfold, the algorithm powering your personalized browser feed could authentically reflect your genuine interests, values, and tastes—in stark contrast to today’s platforms, which primarily optimize for endless engagement and doomscrolling.
Computer Use as End Game
The solutions that succeed in AI tend to ride its momentum rather than resist it. When raw compute can brute-force a problem into submission, that often ends up being the winning approach.
Earlier, we outlined the world's push to make AI more personal, ubiquitous, and proactive. I see AI browsers and Anthropic’s Model Context Protocol (MCP) as complementary but contrasting approaches to this end.
Let me explain.
Anthropic’s MCP is best described as “USB-C for AI.” It’s an open standard designed to let assistants seamlessly connect to data and apps via structured APIs. It allows AI to pull your calendar from one app, cross-reference your email in another, and book a meeting automatically. Thanks to solid timing and early ecosystem buy-in, MCP has gained real traction.
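MCP messages are JSON-RPC 2.0 under the hood. Below is a minimal sketch of the request an assistant would send to invoke a tool on an MCP server. The `tools/call` method and the `name`/`arguments` params follow the MCP specification. The `create_event` tool and its arguments are hypothetical stand-ins for whatever a calendar app's server might actually expose.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need an id to match responses

def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical tool exposed by a calendar app's MCP server.
msg = tool_call_request("create_event", {
    "title": "Sync with Dana",
    "start": "2025-07-01T10:00:00",
    "duration_minutes": 30,
})
print(msg)
```

This only works if some server actually exposes a `create_event` tool, which is exactly the limitation discussed next.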
But for MCP to work, every app needs to explicitly support it. Developers must build and maintain MCP-compatible endpoints. That’s no small task—especially when there are millions of apps and little short-term incentive to comply. If an app doesn’t implement MCP, it’s simply invisible to the AI. The pipe doesn’t connect.
In contrast, browsers already are the integration layer. They don't wait on third-party teams. They run on top of an existing, universal standard: the Document Object Model (DOM). Because the DOM is a text-based universal interface embedded in nearly every web page, agents can read structure, extract content, and take actions without asking permission or needing special APIs. That’s what makes browsers so powerful: they are the de facto MCP for the entire web.
But even browsers have limits. The DOM ends where the web ends. Native apps, terminal environments, desktop software, legacy enterprise tools: none of these expose a DOM. And that’s where the browser’s reach stops.
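Because the DOM is text, an agent can enumerate what is actionable on a page without any site-specific API. Here is a minimal sketch using Python's standard-library `html.parser` on static HTML. It is an illustration under stated assumptions: a real browser agent would walk the live DOM, including script-rendered content, which this does not attempt. The sample page is made up.

```python
from html.parser import HTMLParser

ACTIONABLE = {"a", "button", "input", "select", "textarea"}
LABELLED = {"a", "button"}  # elements whose visible text serves as their label

class ActionableExtractor(HTMLParser):
    """Collect links, buttons, and form fields an agent could act on."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list = []
        self._open: list = []  # labelled elements whose text is still being read

    def handle_starttag(self, tag, attrs):
        if tag in ACTIONABLE:
            element = {"tag": tag, "attrs": dict(attrs), "text": ""}
            self.elements.append(element)
            if tag in LABELLED:
                self._open.append(element)

    def handle_endtag(self, tag):
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        if data.strip():
            for element in self._open:
                element["text"] = (element["text"] + " " + data.strip()).strip()

# Made-up product page for illustration.
page = """
<form action="/cart">
  <input name="qty" type="number" value="1">
  <button type="submit">Add to cart</button>
</form>
<a href="/checkout">Checkout</a>
"""
parser = ActionableExtractor()
parser.feed(page)
parser.close()
for element in parser.elements:
    print(element["tag"], element["attrs"], repr(element["text"]))
```

The output is a compact, text-only inventory (a quantity field, an "Add to cart" button, a "Checkout" link) that a language model can reason over directly. No MCP server or site cooperation is needed.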
This leads us to the next frontier: computer use agents—systems that can operate not only on structured DOMs, but literally in the pixel space, the most universal abstraction of digital life. In this paradigm, screen images become the observation space, and mouse, keyboard, and screen-coordinate actions become the action space. These agents don’t need APIs or DOM access—they simply see what we see and do what we’d do: move the mouse, click a button, type a command.
That’s the motivation behind computer use agents, models built to operate across graphical user interfaces (GUIs) just like a human would. Virtually every major AI lab has been working on this:
- OpenAI — Operator
- Anthropic — Computer Use
- Google DeepMind — Project Mariner
- Amazon — Nova Act
- Bytedance — UI-TARS
- Apple — Ferret-UI
Independent labs and startups are moving fast, too.
- General Agents’ Ace — known for high-speed desktop automation
- Adept’s Act models — now licensed to Amazon
- Simular’s S2 — an open-source, vision-driven computer-use agent
These agents are powered primarily by vision-based models. They perceive the screen visually, identify interface elements like buttons, fields, and menus—down to their pixel coordinates—and move the mouse and keyboard accordingly. Combined with planning and advanced reasoning, they can interpret high-level user goals and take step-by-step actions across extended workflows to fulfill them.
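Here is a minimal sketch of this observation-to-action loop. A vision model reports labeled bounding boxes for on-screen elements. The agent matches each planned step to an element and emits a click at its center, or keystrokes. Everything here is hypothetical: the detector output is hard-coded, simple label matching stands in for a learned grounding model, and the type names are not any lab's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Action space: screen-coordinate clicks and keyboard input.
@dataclass(frozen=True)
class Click:
    x: int
    y: int

@dataclass(frozen=True)
class TypeText:
    text: str

Action = Union[Click, TypeText]

# Observation: UI elements a (hypothetical) vision model found in one screenshot.
@dataclass(frozen=True)
class Detected:
    label: str
    role: str  # "button", "text_field", ...
    left: int
    top: int
    right: int
    bottom: int

    def center(self) -> Tuple[int, int]:
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)

def ground(target: str, elements: List[Detected]) -> Optional[Click]:
    """Turn a sub-goal like 'Add to Cart' into a click at the matching element's center."""
    for el in elements:
        if target.lower() in el.label.lower():
            x, y = el.center()
            return Click(x, y)
    return None

def plan_to_actions(steps: List[Tuple[str, str]], screen: List[Detected]) -> List[Action]:
    """Map (kind, argument) steps from a planner onto concrete actions."""
    actions: List[Action] = []
    for kind, arg in steps:
        if kind == "click":
            click = ground(arg, screen)
            if click is None:
                break  # a real agent would re-observe here (scroll, take a new screenshot)
            actions.append(click)
        elif kind == "type":
            actions.append(TypeText(arg))
    return actions

# Hard-coded stand-in for one screenshot's detections.
screen = [
    Detected("Search", "text_field", 400, 20, 900, 60),
    Detected("Add to Cart", "button", 1200, 640, 1400, 700),
]
steps = [("click", "search"), ("type", "running shoes"), ("click", "add to cart")]
print(plan_to_actions(steps, screen))
```

Nothing here touches a DOM or an API. The only inputs are what is visible on screen, which is why the same loop could in principle drive a native app or a legacy enterprise tool.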
Andrej Karpathy captured the analogy perfectly:
“Projects like OpenAI’s Operator are to the digital world as humanoid robots are to the physical world. One general setting (monitor, keyboard and mouse—or human body) that can in principle gradually perform arbitrarily general tasks, via an I/O interface originally designed for humans. In both cases, it leads to a gradually mixed autonomy world, where humans become high-level supervisors of low-level automation. A bit like a driver monitoring the Autopilot.”
“This will happen faster in the digital world than in the physical world because flipping bits is somewhere around 1000x less expensive than moving atoms.”

While I don't think computer-use agents will subsume everything, I expect them to be the connective tissue that gets us to something close to AGI. Deep Research (DR) agents will scrape the web, coding agents will write code, and GUI agents will glue it all together, chaining actions across apps and modalities. They’ll be the ones to upload files, download attachments, drag and drop, use password managers, navigate clunky enterprise tools, configure desktop settings, and hop between apps.
We’re not there yet.
Today, mature AI browsers like Comet, Fellou, and Genspark still rely mostly on structured, text-based methods—DOM traversal, APIs, clean HTML flows. Infrastructure projects like BrowserBase and BrowserUse do the same. If you’ve ever tested an early GUI agent like Operator or Mariner, you know the experience can be… rough. My lovely, late grandmother, bless her, would’ve outclicked both—easily.
Still, the progress is undeniable.
OpenAI’s ChatGPT Agent combines the vision-based interface of Operator with the text-based capabilities of deep research (plus a terminal) to get the best of both worlds, using reinforcement learning to decide which modality to use and when. It’s one of the first serious attempts to coordinate multiple modalities—text, vision, and code—into a single agentic system. Manus, which appears architecturally similar, likely leverages vision-based agents as well. Comet is known to use vision as a fallback in edge cases like captchas and other visually complex flows.
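The "text first, vision as a fallback" pattern described for Comet can be sketched as a small dispatcher. Both executors below are hypothetical stubs. A fixed priority order stands in for the learned, reinforcement-learning-based policy that ChatGPT Agent reportedly uses to choose between modalities.

```python
class ActionFailed(Exception):
    """Raised when an executor cannot complete an action."""

def dom_executor(action: str, page: dict) -> str:
    # Hypothetical stub: succeeds only if the target exists in the parsed DOM.
    if action in page["dom_targets"]:
        return f"dom:{action}"
    raise ActionFailed(action)

def vision_executor(action: str, page: dict) -> str:
    # Hypothetical stub: works on pixels, standing in for flows the DOM can't express.
    return f"vision:{action}"

EXECUTORS = [dom_executor, vision_executor]  # text-based first, vision as fallback

def execute(action: str, page: dict) -> str:
    """Try each executor in priority order; fall back only when the previous one fails."""
    for run in EXECUTORS:
        try:
            return run(action, page)
        except ActionFailed:
            continue
    raise ActionFailed(action)

page = {"dom_targets": {"click_checkout"}}
print(execute("click_checkout", page))  # dom:click_checkout
print(execute("solve_captcha", page))   # vision:solve_captcha
```

A fixed order is the simplest possible policy. Learning the choice instead lets an agent skip a DOM attempt it predicts will fail, rather than paying for the failure first.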
In fact, the line between agentic chatbots (like ChatGPT Agent or Manus) and AI browsers is already starting to blur. The underlying architecture is increasingly similar, and the real difference comes down to the product surface: standalone web apps versus a home-like browser. And to me, AI browsers feel like the winning product form. They’re embedded closer to where users already work, and in software, proximity to the user tends to win.
Meanwhile, a growing ecosystem of infrastructure startups—Cua, Bytebot, Scrapybara, Haluminate—is quietly laying the picks and shovels that will make this future possible.
For simplicity’s sake, I’m rooting for “computer use” to catch on as the umbrella term for this entire space. It may be slow, awkward, and expensive today, but the need is undeniable. And if it works, it’ll be everywhere.
Who Wins: AI Labs or Browser Companies?
Chatbots and browsers may appear to be distinct product forms, but as they gain agentic capabilities, including deep research, web browsing, multimodal input, coding, and video/image generation, their boundaries are quickly dissolving. Whether it’s ChatGPT Agent, Gemini in Chrome, or Perplexity Comet, they’re all converging toward a singular goal:
Be the user’s default interface for interacting with AI every day.
That means whoever wins becomes the new homepage, search engine, and productivity environment all in one. Stakes are high.
As I’ve said before, I expect AI browsers to emerge as the mainstream product form for everyday use. I wouldn’t be surprised if, two years from now, people think of “ChatGPT” as OpenAI’s browser rather than the standalone web or mobile app we use today.
That leads to the next big question: Who will win this race: AI labs or browser companies?
One could argue there’s a world where a Cursor-like player emerges: no in-house models, just a brilliant combination of off-the-shelf proprietary and open-source models, packaged into the best possible UX for human–AI collaboration inside a browser. While I won't write it off, I think the chances are very slim.
The winning AI browser will need a very broad and deep capability stack: visual UI understanding, text-based browsing, intent parsing (from both text and voice), reasoning to chain actions, a controller to decide when and how to invoke tools and more. Any breakthrough in these areas creates an outsized advantage. Given their direct access to cutting-edge models and vast compute resources, AI labs are best positioned to capitalize on such breakthroughs.
All eyes now turn to OpenAI’s rumoured browser.