Your Security Team Knows Exactly What to Do. That Was Never the Problem.
I've been working in cybersecurity for more decades than I'd like to admit, since before we even called it cybersecurity, across startups, public companies, and most recently leading product at Cohesity. Over the years I've spent a lot of time with very large organizations, Fortune 100 companies and enterprises with tens of thousands of employees, trying to understand why security and IT operations remain so stubbornly hard. I want to share some of what I've learned, because I think it explains why we built Surf.AI and what we're actually trying to solve.
The thing I want to start with is this: we did not invent automation. Automation has been around forever. Security and IT teams have always known how to disable an account, revoke a token, rotate a certificate. The actual operations — what I sometimes call the last mile — are well understood. Everybody can do them. That was never the problem.
The problem lives in everything that comes before.
The space between knowing and doing
Let me try to explain what I mean. In most security workflows, the action itself is binary. If something happens, do this thing. That logic is straightforward. But the real world is not binary. Between "something happened" and "do this thing" there is a vast gray area filled with questions that are genuinely hard to answer. What depends on this system? Who owns it? What breaks if I touch it? Is the person I need to talk to even still at the company?
I think of this sometimes as a ripple effect. You know the server certificate needs to be updated. But you also know, or more precisely you suspect but can't confirm, that updating it might cascade into something much worse. Maybe it takes down a production environment. Maybe it disrupts a workflow that someone built years ago for reasons nobody remembers. The human instinct in this situation is entirely reasonable: you want to be confident in what you're doing before you do it. You want to understand the impact. And when you can't understand the impact, you don't act.
This creates what I think of as a deeply unfortunate equilibrium. Security teams know that not acting degrades their security posture. They know the backlog is growing, that unaccounted-for cloud resources, orphaned accounts, and misconfigured certificates are accumulating risk. But they also know that acting without sufficient context could make things worse. So they're caught in a kind of paralysis: aware of the problem, capable of the fix, but unable to move, because the space between knowing what to do and knowing what happens when you do it is too wide.
The result, across years and decades, is bloat. Unresolved items pile up. Security posture degrades slowly and then suddenly. And the organization becomes more vulnerable in a way that's hard even to measure, because the risk isn't in any single unresolved item; it's in the aggregate.
Why scale makes it worse
This dynamic gets worse as organizations get larger, and I think it's worth explaining why, because it's somewhat counterintuitive. You might expect that bigger companies, with more resources and more sophisticated teams, would be better at this. In my experience, the opposite is true.
In a small company, context is ambient. You know who set up the server. You can walk over to their desk and ask them why. The organizational graph — who owns what, who depends on what, why decisions were made — is small enough to fit in people's heads.
In a large enterprise, especially one that's grown through mergers and acquisitions over 25 or 50 years, that ambient context disappears entirely. The IT organization alone might be hundreds or thousands of people. The engineer who made a critical architectural decision left two decades ago. The documentation, if it ever existed, is fragmented across systems that were never designed to talk to each other. We work with companies like this today — organizations with 50,000 to 75,000 employees, decades of accumulated infrastructure, and security teams that are extraordinarily capable but simply cannot move at the speed the environment demands. What remains is something I've heard described as tribal knowledge: the kind of understanding that exists only in the patterns of who opens tickets, who closes them, who always seems to know the answer when something goes wrong. It's the real fabric of how the organization operates, and it's almost entirely undocumented.
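To make that concrete, here's a toy sketch of the kind of signal hiding in that fabric. The ticket records are entirely invented, but even a naive count of who resolves tickets for a system starts to surface its de facto owner:

```python
# A toy illustration of how tribal knowledge leaves traces in data:
# if one person keeps resolving tickets for a system, they are its
# de facto owner, documented or not. All records here are invented.

from collections import Counter

tickets = [
    {"system": "erp-gateway", "resolved_by": "m.chen"},
    {"system": "erp-gateway", "resolved_by": "m.chen"},
    {"system": "erp-gateway", "resolved_by": "j.ortiz"},
    {"system": "legacy-ftp",  "resolved_by": "m.chen"},
]

resolutions: dict[str, Counter] = {}
for t in tickets:
    resolutions.setdefault(t["system"], Counter())[t["resolved_by"]] += 1

for system, counts in resolutions.items():
    person, n = counts.most_common(1)[0]
    print(f"{system}: likely owner is {person} ({n} resolutions)")
```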
This is, I think, the core of what makes security and IT operations at scale so difficult. It's not a tooling problem in the traditional sense. It's a context problem. And it's a context problem that gets exponentially harder as the organization grows.
What automation got wrong
Now, the security industry has not been unaware of this. There have been many attempts to solve it, and I want to be honest about what those attempts looked like and why I think they fell short.
The dominant paradigm for a long time has been workflow automation, what the industry calls low-code or no-code platforms. The promise is appealing: build automated workflows without needing to write code, and let them handle the repetitive operational work. But when you look under the surface of most of these systems, what you find is scripts. Defined, brittle sequences of steps that work well in controlled conditions and break the moment reality deviates from the script.
Scripts are inherently binary. They encode the if-then logic I described earlier, and they do it well. What they cannot do is navigate the gray area. They can't reason about impact. They can't adapt when the data changes or when the environment looks different than it did when the script was written. They are, in a fundamental sense, the wrong tool for a problem that is defined by ambiguity and context-dependence.
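To see how little is underneath, here is roughly what a playbook step reduces to once you strip away the drag-and-drop interface. Every name is a hypothetical stand-in, not any particular product's API:

```python
# A minimal sketch of what most "no-code" playbooks compile down to:
# one binary rule. All names here are hypothetical stand-ins.

def disable_account(account_id: str) -> None:
    """Stub for the real directory or IdP call."""
    print(f"disabled {account_id}")

def on_alert(alert: dict) -> None:
    # The script encodes exactly one path: if X, then Y.
    if alert.get("type") == "stale_service_account":
        disable_account(alert["account_id"])
    # What it cannot ask: what still authenticates with this account?
    # Who owns the systems it touches? What breaks when it goes away?

on_alert({"type": "stale_service_account", "account_id": "svc-legacy-01"})
```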
Context and action, tightly coupled
This is the background against which we built Surf.AI, and I want to be precise about what we're doing, because I think the distinction matters.
There are, broadly, two categories of capability that matter here. The first is context: understanding how an organization actually works, not how it's supposed to work on paper, but the real operational reality of who owns what, what depends on what, what the downstream effects of any given action might be. The second is action: the ability to actually do something about what you've found, to execute the remediation or the change.
Many products in the market offer one or the other. There are excellent tools that map your environment, that build rich graphs of your infrastructure and its dependencies. And separately, there are agentic AI tools that can execute tasks, that can take actions in your environment. But they exist independently, and that independence is, I believe, the fundamental limitation.
A context tool that can't act is a very sophisticated way of telling you about problems you still can't fix. An agentic tool that can't reason about context is, frankly, dangerous. It can do things, but it doesn't know what happens when it does them. It reproduces the exact uncertainty that was causing the paralysis in the first place, just at machine speed.
What we've built at Surf.AI is the tight coupling of both. Our context graph learns the documented and undocumented reality of your organization, from your ITSM activity, your infrastructure relationships, the patterns of how work actually gets done. And our agentic layer can take action, but it does so with the full weight of that context informing every step. Each side makes the other useful. Neither would be sufficient alone. I think of it as a vertically integrated approach to a problem that has historically been addressed in fragments.
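Here's a deliberately simplified sketch of what that coupling means in practice. This illustrates the principle, not our implementation; the graph and every name are invented:

```python
# A minimal illustration of coupling context and action: the acting
# layer refuses to move without first consulting the context graph,
# and ambiguity routes to a human instead of executing blind.

from dataclasses import dataclass, field

@dataclass
class ContextGraph:
    # edges[x] = systems that depend on x (learned, in practice, from
    # ITSM activity and infrastructure relationships)
    edges: dict[str, list[str]] = field(default_factory=dict)

    def blast_radius(self, target: str) -> list[str]:
        """Everything transitively downstream of `target`."""
        seen: set[str] = set()
        stack = [target]
        while stack:
            for dep in self.edges.get(stack.pop(), []):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return sorted(seen)

def remediate(target: str, graph: ContextGraph) -> str:
    impacted = graph.blast_radius(target)
    if impacted:
        return f"escalate: rotating {target} impacts {impacted}"
    return f"execute: rotate certificate on {target}"

graph = ContextGraph(edges={"cert-web-01": ["payments-api", "batch-etl"]})
print(remediate("cert-web-01", graph))  # escalates to a human
print(remediate("cert-lab-03", graph))  # no dependents, safe to execute
```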
Agentic doesn't mean autonomous
I want to address something directly, because I think it's on everyone's mind when they hear the words "agentic AI" applied to security infrastructure. The concern is reasonable: if these agents are adaptive, if they're not just following scripts, how do I know what they're going to do? How do I trust them?
We thought about this a lot, and I think the answer is not to ask people to trust the agents. The answer is to keep people in control.
The way I describe it to our customers is this: think of it as being promoted. You become a manager of a squad of agents that works on your behalf. They do the legwork: the research, the dependency mapping, the impact analysis, at a scale and speed that no human team could match. But they work for you. You can insert yourself as a human in the loop at any step. You get comprehensive audit logs and full visibility into what they're doing and why. The workflow is defined; within each step, there's the flexibility that AI provides, but you always know where you're starting and where you're going to end up.
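If it helps to see the shape of that model, here's a toy sketch: steps fixed up front, an optional human gate on any step, and an audit record for every outcome. The structure is illustrative only, not our actual API:

```python
# A sketch of "manager of a squad of agents": the workflow is defined
# up front, any step can require human sign-off, and every step leaves
# an audit record. Names and structure are illustrative only.

from datetime import datetime, timezone

audit_log: list[dict] = []

def run_step(name: str, action, approver=None) -> None:
    """Execute one fixed workflow step, optionally gated by a human."""
    if approver is not None and not approver(name):
        audit_log.append({"step": name, "status": "rejected_by_human",
                          "at": datetime.now(timezone.utc).isoformat()})
        return
    result = action()
    audit_log.append({"step": name, "status": "executed", "result": result,
                      "at": datetime.now(timezone.utc).isoformat()})

# The agents do the legwork; the human decides the risky step.
run_step("map_dependencies", lambda: "3 downstream systems found")
run_step("rotate_certificate", lambda: "rotated cert-web-01",
         approver=lambda step: True)  # a person clicked "approve"

for entry in audit_log:
    print(entry)
```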
This balance between the adaptability that makes agentic AI useful and the control that makes it trustworthy is not a feature we added on top. It's a design principle that runs through everything we've built.
The window is closing
I'll close with the thing that gives this work its urgency. For a long time, the paralysis I described was a slow-burning problem. Security posture degraded gradually, and while that was bad, the consequences accumulated over years.
That's no longer the case. Threat actors have adopted AI to scale their capabilities in ways that are genuinely unprecedented. The gap between attacker sophistication and defender agility is not holding steady. It's widening, and it's widening fast. The gray area that organizations have been unable to navigate is now being actively exploited by adversaries who are moving quicker than ever before.
I've been saying a version of this for years: you always wanted to solve this problem, but you couldn't. The technology wasn't there. Now, with the advances in AI that have emerged over the past few years, you can. And increasingly — because the other side is using these same advances — you must.
We're working with large enterprise customers today: manufacturing companies, real estate firms, distributed healthcare networks, organizations with exactly the kind of complexity I've described. In some cases, we've helped them recover hundreds of thousands of dollars in wasted licenses within weeks, simply by giving them the context they needed to act on what they already knew was a problem. We're seeing this play out in practice. I'm looking forward to sharing more about what we're learning, both the technical details and the broader questions about what autonomous security operations look like, in future posts.
For now, I just wanted to lay out the problem as clearly as I could, because I think understanding the problem is most of the way to understanding why we built what we built.
Elad Horn is Co-Founder & Chief Product Officer at Surf AI, bringing deep expertise in cloud security and product leadership from previous roles including VP of Cloud Security Products at Proofpoint, VP of Products at enSilo (acquired by Fortinet), and Head of Product at Cohesity.
