Evaluating the Use of LLM Agents to Provide Better Software Security


As large language models (LLMs) continue to take on complex tasks previously done by humans—analyzing RNA for vaccines, writing software code, generating news articles, and much more—how does the technology fare at preventing cyberattacks on critical infrastructures like financial institutions or energy grids?

LLMs are already being used to launch cyberattacks, with cybercriminals crafting malicious prompts that coax the models into generating malware, phishing emails, and phishing sites.

A related class of AI systems, known as LLM agents, is even more capable than an LLM alone: the large language model serves as the core language engine, while additional computational components make the “agent” more powerful and versatile.

Security experts are just now starting to use LLM agents to counter cyberthreats. They would benefit immensely from a comprehensive set of benchmarks that can validate the technology’s efficacy at any level, whether it’s protecting your laptop from teen hackers or safeguarding financial systems and public utilities from foreign adversaries.

A University of Maryland cybersecurity expert is hoping to give these validation efforts a boost, working to develop end-to-end benchmarks as well as state-of-the-art LLM agents that perform a complete cyberdefense workflow—from vulnerability detection, to analysis, to software patching.

Assistant Professor of Computer Science Yizheng Chen is principal investigator of the two-year project, funded by a $1.7 million award from Open Philanthropy, a grantmaking organization that aims to use its resources to help others.

LLM agents are like the robots inside the computer, Chen says. They can use software tools, take actions, self-reflect, interact with the environment, and maintain long-term memory. They are designed to exhibit more autonomous and goal-oriented behavior, providing greater cyberdefense capabilities.
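The loop Chen describes—observe, choose a tool, act, and record the result in memory—can be illustrated with a minimal conceptual sketch. This is not Chen's system: the `plan` method stands in for a real LLM call, and the `scan`/`patch` tools are hypothetical toy examples of vulnerability detection and patching.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Toy LLM-agent loop: plan an action, use a tool, remember the outcome."""
    memory: list = field(default_factory=list)  # long-term record of past steps

    def plan(self, observation: str) -> str:
        # Stand-in for an LLM call: a real agent would prompt a model here.
        return "patch" if "vulnerable" in observation else "scan"

    def act(self, action: str, code: str) -> str:
        # Hypothetical tools the agent can invoke on the environment (the code).
        tools = {
            "scan": lambda c: "vulnerable" if "strcpy" in c else "clean",
            "patch": lambda c: c.replace("strcpy", "strncpy"),
        }
        return tools[action](code)

    def run(self, code: str, max_steps: int = 3) -> str:
        observation = ""
        for _ in range(max_steps):
            action = self.plan(observation)
            result = self.act(action, code)
            self.memory.append((action, result))  # basis for self-reflection
            if action == "patch":
                return result          # return the patched code
            if result == "clean":
                return code            # nothing to fix
            observation = result       # feed the scan result back into planning
        return code

agent = Agent()
print(agent.run("strcpy(buf, user_input);"))  # → strncpy(buf, user_input);
```

The design point is the feedback loop: each tool's output becomes the next observation, and the memory list accumulates an action trace the agent can later reflect on—exactly the autonomy that distinguishes an agent from a single LLM prompt.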

The key to her project, Chen explains, is to build very difficult cybersecurity benchmarks that the LLMs have not been trained on, ruling out the possibility that the agents are simply recalling memorized solutions rather than reasoning about the code.


The Department welcomes comments, suggestions and corrections.  Send email to editor [-at-] cs [dot] umd [dot] edu.