Beware the Molting AI Agent: When AI Agents Build Their Own Skills
We are witnessing a quiet but fundamental shift in how we interact with artificial intelligence. For the last few years, the mental model has been "Prompt → Response." You type into a box, the model generates text, and the interaction ends.
But a new paradigm is emerging, one that moves beyond Generative AI to Agentic AI. In this model, the AI doesn't just "generate"; it runs continuously. It has a gateway to the outside world, and critically, it has what we might metaphorically call "hard drive access"—the ability to read, write, execute code, and persist information locally or in a cloud environment.
This isn't just about a chatbot having a better memory. It’s about AI agents that can build, download, and execute their own "skills."
The Vision: A History and Memory of Skills
The most compelling promise of this paradigm is the transition from knowledge to know-how.
A traditional LLM has read the internet, so it knows how to use the Python Pandas library. But it doesn't have the library installed, nor does it have the environment to run it. It’s a librarian who has read every book on surgery but has never held a scalpel.
In the Agentic paradigm, an AI agent can:
* Identify a gap: "I need to analyze this CSV file, but I don't have a tool for that."
* Acquire the skill: It can write a script (a tool) or download a verified "skill" (e.g., via the Model Context Protocol or similar standards) to handle that task.
* Retain the memory: Crucially, it saves this tool to its "hard drive" (its persistent storage). Next time you ask for analysis, it doesn't hallucinate a method; it simply calls the tool it already built (a minimal version of this loop is sketched below).
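To make the loop concrete, here is a minimal sketch in Python. The `SkillRegistry`, the skills directory layout, the `csv_summary` skill, and the `generate_skill_with_llm` placeholder are all invented for illustration; they are not any particular agent framework's API.

```python
# Minimal sketch of the identify -> acquire -> retain loop.
# SKILLS_DIR, SkillRegistry, csv_summary, and generate_skill_with_llm are
# hypothetical names used for illustration only.
import pathlib
import subprocess
import sys

SKILLS_DIR = pathlib.Path.home() / ".agent" / "skills"  # the agent's persistent "hard drive"


class SkillRegistry:
    """Persists generated tools so later requests reuse them instead of re-deriving them."""

    def __init__(self, root: pathlib.Path = SKILLS_DIR):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def has(self, name: str) -> bool:
        return (self.root / f"{name}.py").exists()

    def save(self, name: str, source_code: str) -> pathlib.Path:
        path = self.root / f"{name}.py"
        path.write_text(source_code)
        return path

    def run(self, name: str, *args: str) -> str:
        # In practice this should run inside a sandbox rather than on the host OS.
        result = subprocess.run(
            [sys.executable, str(self.root / f"{name}.py"), *args],
            capture_output=True, text=True, check=True,
        )
        return result.stdout


def generate_skill_with_llm(name: str) -> str:
    # Placeholder for a model call; here it just returns a tiny row-counting script.
    return (
        "import csv, sys\n"
        "rows = list(csv.reader(open(sys.argv[1])))\n"
        "print(f'{len(rows)} rows, {len(rows[0]) if rows else 0} columns')\n"
    )


def analyze_csv(registry: SkillRegistry, csv_path: str) -> str:
    name = "csv_summary"
    if not registry.has(name):                               # 1. identify the gap
        registry.save(name, generate_skill_with_llm(name))   # 2. acquire the skill
    return registry.run(name, csv_path)                      # 3. reuse the retained tool
```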
This allows agents to acquire a history of skills. Over time, your agent becomes distinct from mine not just because of its personality settings, but because of its toolbox. My agent might have accumulated a deep library of SEO and data visualization scripts, while yours has built a suite of home automation and scheduling hooks. They become specialized workers rather than generic models.
The Issue: The "Skill" as a Backdoor
However, giving an AI agent "hard drive access" and the autonomy to "download skills" introduces a massive attack surface.
In the old model, if you tricked an LLM into writing malware, you still had to copy-paste it and run it yourself. The "air gap" was you. In the Agentic model, the agent is the runtime environment.
1. The Trojan Skill
If agents share skills (a likely future, envisioning a "GitHub for Agent Skills"), a hacker doesn't need to hack your computer directly. They just need to publish a useful skill—say, a "Stock Market Analyzer"—that contains a dormant backdoor.
We are already seeing this in the wild. A real-world example recently surfaced in the community (discussion here), highlighting how Marketplace Skills can hijack dependencies. Researchers demonstrated a vulnerability in Agentic IDEs (specifically within the Claude Code ecosystem) where an agent attempts to set up a project and looks for "helper skills." A malicious actor can publish a skill that claims to help with installation but actually silently alters the project's dependency tree—for example, swapping a legitimate library like httpx for a compromised version. Because the agent has valid system permissions to "install tools," it happily invites the vampire into the house.
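One cheap tripwire against this class of attack is to snapshot the project's declared dependencies before a third-party skill runs and diff them afterwards. The sketch below assumes a plain `requirements.txt`; the file name, the crude pin-stripping, and the helper functions are illustrative, not a specific tool's API.

```python
# Detecting the dependency swap described above: any change to the dependency
# set made while a marketplace skill was running deserves human review.
from pathlib import Path


def read_requirements(path: str = "requirements.txt") -> set[str]:
    """Return the set of declared package names (version pins stripped crudely)."""
    lines = Path(path).read_text().splitlines()
    return {
        line.split("==")[0].split(">=")[0].strip().lower()
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def diff_dependencies(before: set[str], after: set[str]) -> None:
    added, removed = after - before, before - after
    if added or removed:
        print(f"Dependency tree changed: +{sorted(added)} -{sorted(removed)}")


# Usage sketch:
# before = read_requirements()
# run_marketplace_skill("project-setup-helper")   # hypothetical skill invocation
# diff_dependencies(before, read_requirements())
```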
2. Logic Layering and "Dependency Hell"
The second issue is more subtle but perhaps more pervasive: Logic Layering.
There is also the risk of skills being outright counterproductive in combination. In software engineering, we call this dependency hell or interaction effects, but it's worse with AI.
Imagine your agent downloads Skill A (which optimizes for speed) and Skill B (which optimizes for accuracy).
* Skill A decides to delete "unnecessary" cache files to save space.
* Skill B was using those cache files to verify data accuracy.
Suddenly, your agent is failing at tasks, and it doesn't know why. The logic of the skills is invisible to you, and the agent is just following the instructions of two conflicting sub-routines.
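As a toy illustration of how invisible that coupling can be, consider two skills that happen to share a cache directory. Every name and path below is made up; the point is that neither skill violates its own contract, yet together they fail.

```python
# Toy sketch of the Skill A / Skill B conflict described above.
import shutil
from pathlib import Path

CACHE_DIR = Path(".agent_cache")


def skill_a_optimize_speed() -> None:
    """Frees disk space by deleting what it considers 'unnecessary' cache files."""
    shutil.rmtree(CACHE_DIR, ignore_errors=True)


def skill_b_verify_accuracy(record_id: str) -> bool:
    """Cross-checks a result against a cached reference copy."""
    reference = CACHE_DIR / f"{record_id}.json"
    if not reference.exists():
        # Skill B silently degrades: with nothing to verify against,
        # it can no longer detect drift in the data.
        return False
    return True
```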
Furthermore, if agents start building skills on top of other skills, we create a "house of cards" of logic. If the foundational skill has a slight bias or error (e.g., it always rounds numbers down), every skill built on top of it will inherit and amplify that error. We risk creating a web of automated behaviors that are impossible to debug because no human ever wrote the code—machines wrote snippets based on other machines' snippets.
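The rounding example can be made concrete. Everything below is invented purely to show how a small bias in a foundational skill compounds once other skills are layered on top of it.

```python
# Tiny illustration of inherited bias: base_round always rounds down, and every
# skill built on top of it amplifies that error without its author noticing.
import math


def base_round(value: float) -> int:          # foundational skill: silently floors
    return math.floor(value)


def skill_split_bill(total: float, people: int) -> int:
    return base_round(total / people)          # inherits the downward bias


def skill_monthly_budget(daily_costs: list[float]) -> int:
    # The bias compounds once per item instead of once per total.
    return sum(skill_split_bill(cost, 1) for cost in daily_costs)


print(skill_monthly_budget([9.9] * 30))        # prints 270, vs. the 297 actually spent
```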
AI Fix for AI Problems
The move toward agents with "hard drive access" is inevitable because the utility is too high to ignore. We want agents that can do things, not just say things.
But this requires a shift in how we view AI safety. We can't just align the model (the brain); we have to secure the environment (the body). We will likely need:
* Sandboxed Runtimes: Agents should never run skills on your actual OS; they should run them in ephemeral, disposable virtual machines.
* Skill Signatures: A system where skills are cryptographically signed and verified, much like app store apps today (see the sketch after this list).
* Transparency Logs: A clear, human-readable log of exactly what tools the agent used and why, so we can untangle the logic when things go wrong.
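As a rough sketch of the Skill Signatures idea: before loading a skill, the agent refuses any file whose bytes do not match a digest pinned in a trusted manifest. A production system would use proper asymmetric signing and a public transparency log rather than a local hash pin; the manifest format and file names below are invented for illustration.

```python
# Refuse to load any skill whose contents don't match a pinned digest.
import hashlib
import json
from pathlib import Path


def verify_skill(skill_path: str, manifest_path: str = "skill_manifest.json") -> bool:
    # Manifest maps file names to expected digests, e.g. {"csv_summary.py": "<sha256>"}.
    manifest = json.loads(Path(manifest_path).read_text())
    expected = manifest.get(Path(skill_path).name)
    if expected is None:
        return False                                           # unknown skill: refuse to load
    actual = hashlib.sha256(Path(skill_path).read_bytes()).hexdigest()
    return actual == expected                                  # any tampering changes the digest


# Usage sketch (loader and quarantine are hypothetical):
# if verify_skill("skills/stock_market_analyzer.py"):
#     load_skill("skills/stock_market_analyzer.py")
# else:
#     quarantine("skills/stock_market_analyzer.py")
```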
We are handing the keys to the machine. We just need to make sure it doesn't accidentally (or intentionally) change the locks on us.
The solutions to secure this future are already being developed, but the real vulnerability might be our own enthusiasm. In the era of "vibe coding," it is dangerously easy for developers at the forefront to get swept up and grab a flashy new "skill," trusting a repository just because the README looks clean. The challenge isn't just making these agents work; it's resisting the urge to blindly trust the tools we use to build them.
We need to construct the airlock, but we also need to check the badge of everyone walking through it.