At Computex Taipei 2026, Perplexity announced a new feature: Perplexity Computer for Windows, scheduled for release in July. This system automatically determines which parts of an AI task run locally and which are handled by cloud-based models, eliminating the need for users to manually switch modes.
Sensitive content is processed locally first.
The system was jointly announced by Perplexity CEO Aravind Srinivas and Intel CEO Lip-Bu Tan. Perplexity calls it a hybrid local-server inference orchestration system, designed to balance privacy, performance, and computing cost within a single workflow.
Perplexity states that data such as financial records, health information, and personal documents is best handled first by a lightweight on-device model, which decides whether it should stay on the device. Work that requires more complex reasoning is sent to a larger model in the cloud.
According to the company, tasks such as document summarization, text formatting, and lightweight classification can be completed locally, while complex reasoning is handled by the server. The handoff happens automatically while a task is running, so the user barely notices it.
However, this does not mean that Perplexity is giving users a fully controllable offline model. The local component is a compact Perplexity model bundled into the application, and the cloud portion still runs through Perplexity's servers, so the product cannot be considered a completely offline solution.
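The division of labor described in this section can be illustrated with a minimal sketch. It is not Perplexity's implementation: the company has not published how its scheduler works, so every name below (`Step`, `route`, the category labels) is hypothetical, and the rule that sensitive data always stays on the device is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"   # compact model bundled with the app
    CLOUD = "cloud"   # larger model on the vendor's servers


# Hypothetical labels. The article names these categories but does not
# say how the real system detects them.
SENSITIVE_KINDS = {"financial", "health", "personal_document"}
LIGHTWEIGHT_TASKS = {"summarize", "format", "classify"}


@dataclass(frozen=True)
class Step:
    task: str                    # e.g. "summarize" or "reason"
    data_kind: str = "general"   # e.g. "health" or "general"


def route(step: Step) -> Target:
    """Choose where a single step of a task runs."""
    # Assumption: sensitive data never leaves the device in this sketch.
    if step.data_kind in SENSITIVE_KINDS:
        return Target.LOCAL
    # Lightweight work is cheap enough to run on the device.
    if step.task in LIGHTWEIGHT_TASKS:
        return Target.LOCAL
    # Everything else is treated as complex reasoning.
    return Target.CLOUD


def plan(steps: list[Step]) -> list[Target]:
    """Route every step of a task, with no mode chosen by the user."""
    return [route(step) for step in steps]


if __name__ == "__main__":
    job = [Step("summarize"), Step("reason"), Step("reason", "health")]
    for step, target in zip(job, plan(job)):
        print(step.task, step.data_kind, "->", target.value)
```

The point of the sketch is that the decision is made per step rather than per session, which is what separates an orchestration layer from a manual local/cloud toggle.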
Cost pressure is an important background factor.
In an interview during Computex, Srinivas stated that the goal of AI systems should be to provide higher "value per watt" for each user, rather than concentrating all computing power on servers and the largest models. He noted that some companies are already spending hundreds of millions of dollars per month on computing power.
Perplexity previously disclosed that its revenue had increased from $100 million to $500 million, while its headcount had only grown by 34%. In this context, offloading some inference workloads to user computers can directly reduce cloud computing costs.
This is also one of the key reasons why the AI industry is currently pushing for edge-based inference. For businesses, local operation reduces server costs; for users, it means that some sensitive data does not have to leave the device.
The industry is shifting towards on-device and hybrid models.
Many technology companies are now advancing local or hybrid inference. Apple performs some sensitive processing on-device, using its own chips; Microsoft's Foundry Local, which officially became available in April of this year, supports local AI inference on Windows, macOS, and Linux.
NVIDIA also launched RTX Spark at Computex, targeting local large-model inference on laptops and desktops. What sets Perplexity apart is not the model itself but the scheduling layer: the system decides in real time, task by task, how to divide work between the device and the cloud, rather than asking users to choose a mode in advance.
Perplexity stated that the feature is not limited to Intel chipsets. While the demonstration used an Intel Core Ultra Series 3 processor, NVIDIA processors are also supported. For now, the feature is confirmed to launch only on Windows PCs; release dates for other platforms have not yet been announced.












