#22. Agent Mode in ChatGPT: Field Notes on the EA in Your Pocket and the Business Operator It Could Become

A measured look at where ChatGPT Agent Mode delivers as a personal assistant, where it still needs handholding, and how it could eventually run core workflows.

Aug 10, 2025

Why I’m testing Agent Mode

Beyond my love of testing shiny new tech, I wanted to see where Agent Mode can credibly act as a low-cost EA for both life and work. The win would be less context switching and the ability to offload structured work so I can focus on higher-leverage decisions: in other words, the death of busywork.

Over the course of a week, I put it through real scenarios. Some it handled smoothly end-to-end, while others needed more steering than they should have. Through those tests, I started to see where it works today and how, with the right evolution, it could shift from assistant to operator.

What Agent Mode is

Think junior EA or analyst. Regular ChatGPT works step by step, one prompt at a time. The promise of Agent Mode is that you set a goal and it plans the tasks, chooses tools, and executes end-to-end with minimal back-and-forth.

I initially found it difficult to understand what Agent Mode can do that the default and reasoning models cannot (GPT-4o, o3, etc. at the time; now GPT-5 and GPT-5 Thinking). Here’s the description from the OpenAI announcement that helped me:

“At the core of this new capability is a unified agentic system. It brings together three strengths of earlier breakthroughs: Operator’s ability to interact with websites, deep research’s skill in synthesizing information, and ChatGPT’s intelligence and conversational fluency.
ChatGPT carries out these tasks using its own virtual computer, fluidly shifting between reasoning and action to handle complex workflows from start to finish, all based on your instructions.”

Note that Operator was previously available only to Pro users in the US ($200/mo).

In case you missed it: here’s my previous article on supercharging my personal and professional productivity with AI.

Findings From Experimentation

Examples of where it worked well

Personal: reservations.

“Book sushi for 2 at 7 pm Thursday, about $50 per person, vegetarian options, strong reviews.” It searched, filtered, and booked without intervention.

Prompt: Look for highly rated but affordable sushi restaurants in San Francisco, CA and make me a reservation on OpenTable for 2 people for around 7pm on either Wednesday or Thursday this week. It should be within $50 per person and offer vegetarian options.

Output: Sample work product for the above prompt to show how Agent Mode works. This is 25-30 minutes of video compressed into under 2 minutes.

Professional: straightforward modeling.

It returned numbers with clearly stated assumptions and some sensitivities. While not a top-grade, comprehensive model, it could serve as another data point in personal investing decisions, especially as it gets better.

Prompt: Build a discounted cash flow (DCF) model in Excel for NVIDIA (NVDA) using historical financial data from its 10-K and 10-Q filings. Extract revenue, EBIT, CapEx, D&A, working capital changes, and free cash flow from the last 3 years, and forecast unlevered free cash flows for the next 5 years. Calculate the terminal value using both the Gordon Growth and Exit Multiple methods, discount all cash flows using a WACC derived from company and market data, and arrive at an estimated equity value per share. Include sensitivity analysis and clearly document all assumptions. Format the model with clean, professional styling: consistent fonts, blue for inputs, black for formulas, and use separate, clearly labeled tabs for assumptions, calculations, outputs, and charts—as expected in a top-tier investment banking presentation.
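For readers who want to sanity-check the numbers an agent returns, here is a minimal sketch of the arithmetic the prompt asks for: discounting forecast free cash flows, the two terminal value methods, and equity value per share. The function names are my own, the inputs are placeholders rather than NVIDIA figures, and this is not how Agent Mode builds its spreadsheet. It also assumes year-end cash flows and a single flat WACC.

```python
from typing import Sequence


def present_value(cash_flows: Sequence[float], rate: float) -> float:
    """Discount year-end cash flows for years 1..n back to today."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows, start=1))


def gordon_terminal_value(final_fcf: float, rate: float, growth: float) -> float:
    """Terminal value at year n via Gordon Growth: FCF_(n+1) / (rate - growth)."""
    if growth >= rate:
        raise ValueError("growth must be below the discount rate")
    return final_fcf * (1 + growth) / (rate - growth)


def exit_multiple_terminal_value(final_metric: float, multiple: float) -> float:
    """Terminal value at year n as a multiple of a final-year metric such as EBITDA."""
    return final_metric * multiple


def equity_value_per_share(
    fcfs: Sequence[float],
    terminal_value: float,
    wacc: float,
    net_debt: float,
    shares: float,
) -> float:
    """Enterprise value (PV of FCFs plus PV of terminal value) less net debt, per share."""
    enterprise_value = present_value(fcfs, wacc) + terminal_value / (1 + wacc) ** len(fcfs)
    return (enterprise_value - net_debt) / shares


if __name__ == "__main__":
    # Placeholder inputs for illustration only; NOT NVIDIA figures.
    fcfs = [100.0, 110.0, 121.0, 133.0, 146.0]
    net_debt, shares = 50.0, 25.0
    # Simple sensitivity grid over WACC and terminal growth.
    for wacc in (0.08, 0.09, 0.10):
        row = []
        for growth in (0.02, 0.03):
            tv = gordon_terminal_value(fcfs[-1], wacc, growth)
            row.append(round(equity_value_per_share(fcfs, tv, wacc, net_debt, shares), 2))
        print(f"WACC {wacc:.0%}: {row}")
```

Running the two terminal value methods side by side is a quick way to see how much of the valuation depends on the terminal assumption rather than the five forecast years.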

Examples of where it was just okay

Professional: first-pass sourcing.

It returned names, roles, and companies that were broadly on target. However, measured against the same JD, candidates were technically relevant but mis-leveled: some too senior, others too junior.

Prompt: You are an Executive Recruiting Sourcer. For a series B consumer marketplace company looking for a Chief Product Officer with prior experience with rapid growth, a team of 30 reportees, and whitelabeling, find 50 ideal candidate profiles (titles, companies, public LinkedIn summaries) and group them by tier 1/2/3 priority. Draft personalized outreach templates for each tier. Output: CSV + email draft + recommended outreach sequence.

Professional: market-sizing and competitive-analysis problems.

Prompt: You are a Venture Capital Analyst. For AI-driven customer support automation, estimate TAM, SAM, SOM using bottom-up and top-down approaches. Include key assumptions, reference data sources, and a concise 2-page memo with charts. Output also includes a .xlsx with editable data and one-sentence key takeaways for a pitch deck.
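To make the two approaches in that prompt concrete, here is a rough sketch of how bottom-up and top-down sizing differ in their starting point. Everything in it is an assumption for illustration: the function names, the share parameters, and the sample inputs are hypothetical and are not data about the customer support automation market.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketSize:
    tam: float  # total addressable market, annual revenue
    sam: float  # serviceable addressable market
    som: float  # serviceable obtainable market


def _check_share(name: str, value: float) -> None:
    """Shares are fractions, so they must sit between 0 and 1."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")


def bottom_up(
    num_customers: float,
    annual_revenue_per_customer: float,
    serviceable_share: float,
    obtainable_share: float,
) -> MarketSize:
    """Start from a customer count and a price, then narrow down."""
    _check_share("serviceable_share", serviceable_share)
    _check_share("obtainable_share", obtainable_share)
    tam = num_customers * annual_revenue_per_customer
    sam = tam * serviceable_share
    som = sam * obtainable_share
    return MarketSize(tam, sam, som)


def top_down(
    total_category_spend: float,
    addressable_share: float,
    serviceable_share: float,
    obtainable_share: float,
) -> MarketSize:
    """Start from a broad industry spend figure, then narrow down."""
    _check_share("addressable_share", addressable_share)
    _check_share("serviceable_share", serviceable_share)
    _check_share("obtainable_share", obtainable_share)
    tam = total_category_spend * addressable_share
    sam = tam * serviceable_share
    som = sam * obtainable_share
    return MarketSize(tam, sam, som)


if __name__ == "__main__":
    # Hypothetical inputs only; replace with sourced figures.
    print(bottom_up(200_000, 12_000.0, 0.40, 0.05))
    print(top_down(50_000_000_000.0, 0.10, 0.40, 0.05))
```

If the two methods land far apart, that gap is usually the most useful thing to interrogate in the agent's memo, since it points at whichever assumption is least grounded.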

Examples of where it failed

Personal: trip planning.

“Plan a December week in Seattle with constraints.” It missed half the constraints I added, ignored seasonal context and pacing, and produced poorly formatted output.

Professional: layered research and analysis.

“Map the semiconductor supply chain and run a DCF for each firm.” It returned a jumbled set of numbers and charts that wasn’t usable.

Strengths I’m seeing

Complex multi-step workflows that break a goal into sensible subtasks.
Tool orchestration across search, code, and file parsing in one flow.
Data-heavy work: cleaning, formatting, analysis, and basic visualizations.
Knowledge synthesis from multiple sources into a single answer.
“AI intern” tasks: competitive analysis, draft briefs, notes, and outlines without constant supervision.

Limitations to factor in

Judgment-heavy or context-specific calls still need human guidance.
Ambiguous goals cause drift or over-engineering.
Real-time interaction is slow. A restaurant booking I could have done myself in 5 minutes took ~30.
Structured outputs vary. Markdown and CSV are fine for light content, but heavier presentations and large CSVs can be inconsistent.

Early take, where I hope it goes, and why it matters for growth, retention, and loyalty.

Today, Agent Mode feels like a capable junior EA. It’s good for well-scoped, multi-step execution: it can book dinner, compile lists, and run single-threaded research with minimal oversight. But the real opportunity lies in chaining these capabilities into longer, cross-functional workflows.

For example, imagine Agent Mode embedded inside a loyalty program stack, automatically identifying at-risk cohorts, pulling the latest engagement data, drafting retention offers, and pushing them live, all before a churn report even lands on your desk. That’s the leap from “assistant” to “operator.” Agent Mode isn’t there yet, but experimenting with it now builds the muscle memory to spot those leverage points early. When it can run a retention campaign as easily as it books a sushi reservation, the organizations ready to plug it in will have a compounding advantage.

Looking forward to seeing what it can do in a few months’ time!

Be One Percent Better
