Recently, I’ve been busy with the local deployment of Large Language Models (LLMs), sharing my insights in various groups. People often ask why I’m so keen on deploying AI locally. My straightforward response is:
That need is genuine, but it is not my only reason for deploying AI locally. It is simply the first and most pressing one, because it is exactly the kind of request that the various online models restrict.
In reality, in my day-to-day use, what actually diminishes my user experience are scenarios like:
- ChatGPT is great, but its web browsing function is tied exclusively to Bing. As a search engine, Bing is significantly lacking, especially for Chinese language searches. Using ChatGPT Plus for searches feels like asking a college student to find answers in ancient bamboo scrolls.
- Bard’s search functionality is superior, thanks to its reliance on Google. So, although its model might not match GPT’s capabilities, using Bard feels like having a high school student Google things for me, which is definitely better than a college student rummaging through bamboo scrolls. However, it strictly adheres to robots.txt, which limits the pages it can access. This means I can’t ask Bard to summarize an article from a WeChat public account.
- Grok has its clear advantages, as it can directly pull search results from X (Twitter). This means its external knowledge base is updated by the minute, allowing you to get answers about very recent events. But I’m not keen on paying a separate monthly fee just for this advantage.
- Wenxin Yiyan (a Chinese AI model) also has its strengths, as it can access Baidu’s hot searches, making it more responsive to domestic trending events.
It appears that every tech giant, or online AI, has specialized based on its own business interests or advantages.
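To make the robots.txt restriction mentioned for Bard concrete, here is a minimal sketch of how a well-behaved crawler decides whether it may fetch a page. It uses Python’s standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are all invented for illustration; I am not claiming this is how Bard or any particular site is actually configured.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, made up for this example only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/articles/123", "https://example.com/about"):
        print(url, may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

An assistant that honors these rules never even requests the article page, no matter how capable the model behind it is. A human user with a browser faces no such limit.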
However, these “advantages” are essentially formed by creating “disadvantages” for others. ChatGPT’s search isn’t as effective as Bard’s because if ChatGPT were to integrate Google search, it would incur hefty search API fees. Similarly, Musk would likely set a prohibitive price for X (Twitter)’s API to ensure that Bard and ChatGPT can’t compete with Grok in this arena.
If we look at the Chinese internet landscape, the battle intensifies further, and it’s not even about money anymore. Baidu would never share its search interface with Tencent’s Hunyuan, and Tencent wouldn’t allow Wenxin Yiyan to access WeChat public account articles.
This reflects a significant outcome of the past 20 years of internet development: a cyberspace dominated by a few monopolies, each operating in its own silo, unwilling to acknowledge the others.
In the era of mobile internet, users have become accustomed to the way businesses seem to disregard each other at the product level. Scannable QR codes are blocked, links are not directly clickable and must be copied, and even sending links is restricted, requiring users to copy and share “passwords” or use coded messages in comment sections to send private messages.
It’s not as if I’ll stop watching videos on Douyin just because I can’t open its links in WeChat. Nor will I stop shopping on Taobao simply because Xiaohongshu (Little Red Book) doesn’t allow Taobao QR codes.
The Maginot Lines established by these monopolistic giants have, for a long time, only served to inconvenience users. Their primary function in the business world has been to act as a defense against ‘sudden raids’ – preventing competitors from poaching users directly from their apps, which could lead to a significant shift in user loyalty in a single day.
However, this approach has significantly shackled their own AI developments.
When we talk about this round of AI advancements, what users envision is AGI – Artificial General Intelligence. They expect an entity that can operate in cyberspace just as a human would. For instance, if I can read an article on WeChat, my AI should be able to as well; otherwise, it’s of no use to me.
This means that if these tech giants can’t break down their commercial barriers and let their AIs operate on each other’s platforms, all of their online models, whether GPT, Gemini, or Wenxin Yiyan, will lose out to open-source local models or third-party models.
Let me illustrate this with a specific scenario: planning a trip.
Those who frequently travel for business or leisure know that planning an itinerary is often the most daunting task. With the rise of AI, not just users but many entrepreneurs have wondered if AI could help us customize travel plans with a single click, even using APIs to directly book flights, hotels, and attraction tickets.
However, the reality is that such AI is most likely to be developed by travel platforms like Ctrip or Feizhu. The reason is the same as the advantages I listed for the current giants’ AI. Only Ctrip and Feizhu have access to real-time updates of flight and hotel databases and the ability to complete “booking” operations directly within their systems.
From their business perspective, neither Ctrip nor Feizhu would ever offer this kind of data through an API to independent entrepreneurs.
But the question remains: if Ctrip or Feizhu were to launch a travel AI capable of generating itineraries and processing bookings in a conversational manner, would I use it?
The answer is no. Why? Because as a user, I seek flexibility and a wider range of options. These AI systems, tied to their parent companies, would inherently limit my choices to what’s available in their own databases. This limitation undermines the very essence of a comprehensive, user-centered AI solution.
Planning a trip also involves a preliminary step: researching the destination on Xiaohongshu (Little Red Book) to find out what’s interesting there. Once I’ve decided what to do, I often switch back and forth between Feizhu and Ctrip to ensure I’m getting the best prices. Sometimes, I even use Baidu Maps to figure out the distance and transportation options between various attractions within a destination city to decide which ones to visit and in what order.
It’s clear, then, that the AI tools developed by Feizhu and Ctrip can’t possibly provide a personalized, comprehensive travel planning experience for consumers.
At best, they can help users conduct more intuitive searches on their platforms. But for those who travel or go on business trips frequently, this might be less effective than directly using structured searches with specific filters. Not to mention, I also want to compare prices between Ctrip and Feizhu, which is something a single platform’s AI simply can’t do.
In essence, while these individual AI tools can offer some convenience, they fall short in delivering a fully integrated and personalized travel planning experience. The ability to seamlessly integrate information and services from various platforms remains a significant challenge, one that a single-platform AI cannot yet overcome. This highlights the need for more collaborative and open systems in the realm of AI-driven travel assistance.
In the context of travel planning, the kind of Artificial General Intelligence (AGI) I have in mind would revolutionize the process. Here’s how it would ideally function:
- Initial Inquiry: You ask the AI for travel destinations with fewer people during the Spring Festival.
- Comprehensive Research: The AI searches platforms like Douyin and Xiaohongshu for terms like “Spring Festival”, “offbeat”, and “travel cities”. It then compiles a list of potential destinations and presents them to you.
- Refinement and Specifics: You express interest in Yiwu, Anshan, and Huainan. You request more information about what these cities have to offer.
- Detailed Search and Presentation: The AI conducts detailed searches on Xiaohongshu for each of the three cities and provides you with a more detailed introduction to each.
- Decision and Logistics: You decide on Yiwu. The AI then checks Ctrip for flight prices and schedules. It gathers information about attractions in Yiwu, obtains their operating hours from Dazhong Dianping, and calculates travel times between them using Baidu Maps to arrange a daily itinerary.
- Iterative Customization: Through several rounds of dialogue, you fine-tune the itinerary and attractions (for instance, mentioning a preference not to start activities early in the morning).
- Final Itinerary Creation: The AI develops a final travel plan based on your preferences.
- Confirmation: You review and confirm the itinerary.
- Booking: The AI proceeds to book the necessary arrangements.
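The logistics part of the steps above (comparing prices across platforms, then fitting attractions into a day around a "no early mornings" preference) can be sketched roughly as follows. This is only an illustration under loud assumptions: the price sources are stub functions standing in for Ctrip and Feizhu, the prices and attractions are placeholder data, and the fixed visit and travel durations replace what a real agent would look up on Dazhong Dianping and Baidu Maps. No real platform API is called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attraction:
    name: str
    opens: int   # opening hour, 24h clock
    closes: int  # closing hour, 24h clock


def cheapest_offer(city: str, price_sources: dict[str, Callable[[str], float]]) -> tuple[str, float]:
    """Ask every platform for a price and return the (platform, price) pair with the lowest price."""
    offers = ((platform, fetch(city)) for platform, fetch in price_sources.items())
    return min(offers, key=lambda offer: offer[1])


def plan_day(attractions: list[Attraction], earliest_start: int,
             visit_hours: int = 2, travel_hours: int = 1) -> list[tuple[int, str]]:
    """Greedy schedule: take attractions in opening order, skip any that no longer fit."""
    schedule = []
    clock = earliest_start
    for attraction in sorted(attractions, key=lambda a: a.opens):
        start = max(clock, attraction.opens)
        if start + visit_hours <= attraction.closes:
            schedule.append((start, attraction.name))
            clock = start + visit_hours + travel_hours
    return schedule


if __name__ == "__main__":
    # Placeholder stubs; a real agent would read these values from each platform.
    sources = {"ctrip": lambda city: 820.0, "feizhu": lambda city: 760.0}
    sights = [Attraction("A", 9, 17), Attraction("B", 8, 11), Attraction("C", 13, 21)]
    print(cheapest_offer("Yiwu", sources))
    print(plan_day(sights, earliest_start=10))
```

The point of the sketch is that the planning logic itself is trivial. The hard part is that `price_sources` has to span platforms that refuse to talk to each other.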
Reflecting on this analysis, it becomes apparent that no single internet giant, especially those in oligopolistic positions, can successfully launch such an AI. The reason is straightforward: competitive dynamics would lead to mutual blocking. For instance, if Ctrip developed this AI, Feizhu might block it, and vice versa. If Xiaohongshu created it, Dazhong Dianping might block it.
So, what’s the feasible solution?
The only viable option seems to be an AI Agent that is either locally deployed on a user’s device or is independent of any major tech giant. This AI would simulate user interactions – clicks, swipes, visual processing – operating above and beyond the constraints of any single app or website, ignoring the barriers erected due to business competition.
It could even manage tasks like copying a Douyin link from WeChat, opening Douyin, and navigating to the link, rather than just directly clicking on it. Essentially, it would do anything a user can do.
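As a rough sketch of what "doing anything a user can do" means in practice, the Douyin link example can be written as a plain sequence of user-level actions. Everything here is hypothetical: `Action`, `FakeDevice`, and `open_blocked_link` are names I made up, and the fake device only records state instead of driving a real phone or desktop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str         # "open_app", "copy_text", or "paste_and_go"
    target: str = ""  # app name or text, depending on kind


class FakeDevice:
    """Stand-in for a real UI automation backend; it only tracks state."""

    def __init__(self) -> None:
        self.foreground = None
        self.clipboard = ""
        self.visited = []  # (app, link) pairs that were opened

    def perform(self, action: Action) -> None:
        if action.kind == "open_app":
            self.foreground = action.target
        elif action.kind == "copy_text":
            self.clipboard = action.target
        elif action.kind == "paste_and_go":
            self.visited.append((self.foreground, self.clipboard))
        else:
            raise ValueError(f"unknown action: {action.kind}")


def open_blocked_link(link: str, source_app: str, target_app: str) -> list[Action]:
    """Plan the manual workaround a user performs when a link is not clickable."""
    return [
        Action("open_app", source_app),
        Action("copy_text", link),
        Action("open_app", target_app),
        Action("paste_and_go"),
    ]


if __name__ == "__main__":
    device = FakeDevice()
    for step in open_blocked_link("https://example.com/video/1", "WeChat", "Douyin"):
        device.perform(step)
    print(device.visited)
```

Because the agent works at the level of taps and clipboard contents, the blocking between apps costs it a few extra steps but does not stop it.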
Technically, this isn’t far-fetched. Fei-Fei Li’s team once developed a prototype called VoxPoser that used Large Language Models (LLMs) to control robotic arms in the physical world. Manipulating a virtual environment is simpler than manipulating a physical one; it’s essentially about directing software to perform certain tasks.
Especially with the introduction of features like function calling in GPT-4 Turbo, creating a demo might already be feasible. However, the workload behind each task would be significant, potentially making each request costly, and cost has always been a major barrier for Agent-type AI.
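For readers who haven’t used function calling, the general pattern looks roughly like this: the application describes its tools to the model with a name and a JSON Schema for the parameters, the model replies with a tool name plus JSON-encoded arguments, and the application runs the matching function. The sketch below simulates the model’s reply instead of calling any API, and `search_flights` is a made-up stub, so treat the shapes as an approximation rather than any vendor’s exact format.

```python
import json


def search_flights(origin: str, destination: str) -> dict:
    """Hypothetical tool; a real agent would query a booking platform here."""
    return {"origin": origin, "destination": destination, "results": []}


# Tool description the application would send to the model (JSON Schema style).
TOOL_SPECS = [
    {
        "name": "search_flights",
        "description": "Search flights between two cities.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
            },
            "required": ["origin", "destination"],
        },
    }
]

# Registry mapping tool names to the local functions that implement them.
TOOLS = {"search_flights": search_flights}


def dispatch(tool_call: dict) -> str:
    """Run the tool the model asked for and return its result as a JSON string."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    arguments = json.loads(tool_call["arguments"])
    return json.dumps(TOOLS[name](**arguments))


if __name__ == "__main__":
    # Simulated model output; no API request is made in this sketch.
    simulated_call = {
        "name": "search_flights",
        "arguments": json.dumps({"origin": "Beijing", "destination": "Yiwu"}),
    }
    print(dispatch(simulated_call))
```

A full travel agent would repeat this request-and-dispatch loop many times per task, which is where the cost concern comes from.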
But, if we optimistically assume that AI’s cost-efficiency improvements continue at the 2023 rate for another 2-3 years, then cost may not be an issue.
Legal and compliance issues, however, could be a concern, especially if the product relies on centralized MaaS (Model-as-a-Service) offerings like those provided by OpenAI, where it might simply be prohibited. Therefore, the best outcome might be advancements in end-user hardware and further optimization of AI models, enabling locally deployed large models on PCs to achieve effective Agent capabilities.
In such a scenario, the long-standing oligopoly of the internet might be fundamentally disrupted. And currently, it’s hard to foresee how the tech giants could counter such a development effectively.