From 44d84528362c13d769b2a06a42bf113a794fe4f3 Mon Sep 17 00:00:00 2001
From: csf123321
Date: Sun, 10 May 2026 21:11:15 +0800
Subject: [PATCH] add framework docs, AI acknowledgment, fix theme order, switch to IEEEtran

Co-Authored-By: Claude Sonnet 4.6
---
 main.tex       | 18 ++++++++++--------
 references.bib | 31 +++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/main.tex b/main.tex
index e8e00c7..01aea3f 100644
--- a/main.tex
+++ b/main.tex
@@ -22,7 +22,7 @@
 \email{u28sc22@abdn.ac.uk}

 \begin{abstract}
-Agentic AI systems---large language models embedded within autonomous execution loops that perceive, plan, invoke tools, and revise behaviour---are reshaping how software is designed and built. This paper presents a systematic literature survey of 13 peer-reviewed and widely cited papers (2023--2026) on the design of software systems incorporating agentic AI. The survey organises findings into four themes: foundational architectures and taxonomies, multi-agent frameworks and coordination, applications across the software engineering lifecycle, and planning/reasoning/tool-use mechanisms. A critical analysis identifies hallucination and reliability, evaluation fragmentation, coordination scalability, and governance as the principal open challenges. Future directions include hybrid neuro-symbolic architectures, lifecycle-spanning benchmarks, persistent long-horizon memory, and principled human-agent collaboration models.
+Agentic AI systems---large language models embedded within autonomous execution loops that perceive, plan, invoke tools, and revise behaviour---are reshaping how software is designed and built. This paper presents a systematic literature survey of 13 peer-reviewed and widely cited papers (2023--2026) on the design of software systems incorporating agentic AI. The survey organises findings into four themes: foundational architectures and taxonomies, multi-agent frameworks and coordination, applications across the software engineering lifecycle, and planning/reasoning/tool-use mechanisms. A critical analysis identifies hallucination and reliability, evaluation fragmentation, coordination scalability, context window limits, and governance as the principal open challenges. Future directions include hybrid neuro-symbolic architectures, lifecycle-spanning benchmarks, persistent long-horizon memory, and principled human-agent collaboration models.
 \end{abstract}

 \keywords{agentic AI, software system design, large language models, multi-agent systems, autonomous software engineering}
@@ -54,6 +54,8 @@ A systematic search was conducted across IEEE Xplore, the ACM Digital Library, a

 The initial search returned over 200 candidates. After de-duplication and title-and-abstract screening, 13 primary papers were retained and grouped into four thematic clusters as described in Section~\ref{sec:themes}.

+\textbf{Use of AI-assisted tools.} DeepSeek was used as a supplementary aid for literature organisation and error checking in accordance with the course guidelines. All paper selection, critical analysis, and editorial judgement are the author's own.
+
 \section{Thematic Overview}
 \label{sec:themes}

@@ -87,20 +89,20 @@ Planning, Reasoning \& Tool Use

 \subsection{Foundations and Architectures of Agentic AI Systems}

-The foundational literature establishes the conceptual vocabulary and architectural patterns that the rest of the field builds upon. Abou Ali and Dornaika \cite{abuali2025agentic} introduce a \emph{dual-paradigm} framework that separates \emph{symbolic/classical} agents (relying on deterministic planning and persistent state machines) from \emph{neural/generative} agents (driven by stochastic generation and prompt-based orchestration). Wang et al.\ \cite{wang2024survey} propose a unified architectural model centred on three sub-systems: a \emph{brain} (the LLM), a \emph{perception} module, and an \emph{action} module. Arunkumar et al.\ \cite{arunkumar2026architectures} extend this by decomposing the brain into Planning, Reasoning, and Memory components. The framework survey by Derouiche et al.\ \cite{derouiche2025frameworks} maps these abstractions onto concrete open-source frameworks---AutoGen, LangGraph, CrewAI, and MetaGPT---analysing their design trade-offs.
+The foundational literature establishes the conceptual vocabulary and architectural patterns that the rest of the field builds upon. Abou Ali and Dornaika \cite{abuali2025agentic} introduce a \emph{dual-paradigm} framework that separates \emph{symbolic/classical} agents (relying on deterministic planning and persistent state machines) from \emph{neural/generative} agents (driven by stochastic generation and prompt-based orchestration). Wang et al.\ \cite{wang2024survey} propose a unified architectural model centred on three sub-systems: a \emph{brain} (the LLM), a \emph{perception} module, and an \emph{action} module. Arunkumar et al.\ \cite{arunkumar2026architectures} extend this by decomposing the brain into Planning, Reasoning, and Memory components. The framework survey by Derouiche et al.\ \cite{derouiche2025frameworks} maps these abstractions onto concrete open-source frameworks---AutoGen \cite{autogendocs}, LangGraph \cite{langgraphdocs}, CrewAI \cite{crewaidocs}, and MetaGPT---analysing their design trade-offs.

 \subsection{Multi-Agent Frameworks and Coordination}

 Once individual agent architectures are established, a natural extension is composing multiple agents into collaborative systems. He, Treude, and Lo \cite{ishibashi2024multiagent} provide a literature review of LLM-based multi-agent (LMA) systems within the software development lifecycle, identifying coordination and trust challenges that arise when agents take on specialised roles. Rajendran et al.\ \cite{ieee2025multiagent} present a conceptual framework for software design and refactoring using auction-based task allocation and consensus protocols to manage agent disagreement. Becattini, Verdecchia, and Vicario \cite{sallma2025} address the architectural layer directly with SALLMA, a reference software architecture that specifies interfaces, shared state management, and real-time agent communication.

-\subsection{Tool Use, Planning, and Reasoning}
-
-The internal mechanisms that allow agents to decompose goals and invoke external resources are surveyed by Masterman et al.\ \cite{masterman2024landscape} and Wang et al.\ \cite{wang2025aiagenticprogrammingsurvey}. Masterman et al.\ examine single-agent and multi-agent implementations and identify three critical phases---\emph{planning}, \emph{execution}, and \emph{reflection}---present in robust systems. Wang et al.\ \cite{wang2025aiagenticprogrammingsurvey} focus on \emph{agentic programming} as an emerging paradigm in which agents autonomously iterate on a task. Park et al.\ \cite{park2023generative} provide a foundational empirical study demonstrating that architectures combining memory retrieval, reflection, and planning can produce coherent long-horizon behaviour.
-
 \subsection{Applications in Software Engineering}

 Three papers evaluate agentic AI directly against software engineering tasks. Jin et al.\ \cite{jin2024llmagents} conduct a broad survey covering six SE domains, establishing clear distinctions between standalone LLMs and agent-based systems in terms of autonomy and self-improvement. Liu et al.\ \cite{liu2024llmse} categorise 124 papers from both the SE and agent-capability perspectives, showing that tool-augmented agents consistently outperform standalone models. Jimenez et al.\ \cite{jimenez2024swebench} introduce SWE-bench, a benchmark of 2,294 real-world GitHub issues drawn from 12 Python repositories, providing the field's most widely used empirical measuring stick.

+\subsection{Tool Use, Planning, and Reasoning}
+
+The internal mechanisms that allow agents to decompose goals and invoke external resources are surveyed by Masterman et al.\ \cite{masterman2024landscape} and Wang et al.\ \cite{wang2025aiagenticprogrammingsurvey}. Masterman et al.\ examine single-agent and multi-agent implementations and identify three critical phases---\emph{planning}, \emph{execution}, and \emph{reflection}---present in robust systems. Wang et al.\ \cite{wang2025aiagenticprogrammingsurvey} focus on \emph{agentic programming} as an emerging paradigm in which agents autonomously iterate on a task. Park et al.\ \cite{park2023generative} provide a foundational empirical study demonstrating that architectures combining memory retrieval, reflection, and planning can produce coherent long-horizon behaviour.
+
 \section{Detailed Discussion}

 \subsection{Foundations and Architectures}
@@ -109,7 +111,7 @@ The dual-paradigm framework of Abou Ali and Dornaika \cite{abuali2025agentic} re

 Wang et al.\ \cite{wang2024survey} complement this with component-level analysis. Their architecture positions the LLM as a central reasoning engine. Memory is divided into \emph{in-context} (working) memory and \emph{external} memory (vector databases, knowledge graphs)---a distinction with direct engineering implications: in-context memory is bounded by the model's context window, while external memory scales arbitrarily but introduces retrieval latency and recall errors.

-Arunkumar et al.\ \cite{arunkumar2026architectures} extend the taxonomy to evaluation, arguing that agents should be assessed across all five architectural layers rather than solely by task completion rate. The authors document how early agent loops such as ReAct adopted flat sequential structures, while more recent designs use hierarchical search and recursive decomposition for non-linear problem solving. The framework comparison in Derouiche et al.\ \cite{derouiche2025frameworks} translates these abstractions into engineering decisions: LangGraph's graph-based execution model supports stateful, cyclical workflows, whereas CrewAI prioritises ease of configuration for role-based pipelines.
+Arunkumar et al.\ \cite{arunkumar2026architectures} extend the taxonomy to evaluation, arguing that agents should be assessed across all five architectural layers rather than solely by task completion rate. The authors document how early agent loops such as ReAct adopted flat sequential structures, while more recent designs use hierarchical search and recursive decomposition for non-linear problem solving. The framework comparison in Derouiche et al.\ \cite{derouiche2025frameworks} translates these abstractions into engineering decisions: LangGraph's \cite{langgraphdocs} graph-based execution model supports stateful, cyclical workflows, whereas CrewAI \cite{crewaidocs} prioritises ease of configuration for role-based pipelines.

 \subsection{Multi-Agent Frameworks}

@@ -175,7 +177,7 @@ At the same time, the survey reveals that the field is far from maturity. Halluc

 The implications for software system design are clear: practitioners adopting agentic AI today must design for human oversight, invest in robust evaluation infrastructure, and treat the agent as an architectural component subject to the same quality attributes---reliability, security, maintainability---as any other system component \cite{sallma2025}. Researchers, meanwhile, have a rich agenda whose resolution will determine how quickly the field moves from promising demonstrations to dependable practice.

-\bibliographystyle{ACM-Reference-Format}
+\bibliographystyle{IEEEtran}
 \bibliography{references}

 \end{document}
diff --git a/references.bib b/references.bib
index f629542..571c3cd 100644
--- a/references.bib
+++ b/references.bib
@@ -170,6 +170,37 @@
   location = {San Francisco, CA, USA},
   series = {UIST '23}
 }
+% -----------------------------------------------
+% Additional references — official framework documentation
+% -----------------------------------------------
+
+% LangGraph official documentation — graph-based stateful agent workflows
+@misc{langgraphdocs,
+  title={LangGraph Documentation},
+  author={{LangChain AI}},
+  year={2025},
+  url={https://langchain-ai.github.io/langgraph/},
+  note={Accessed May 2025}
+}
+
+% CrewAI official documentation — role-based multi-agent orchestration framework
+@misc{crewaidocs,
+  title={CrewAI Documentation},
+  author={{CrewAI Inc.}},
+  year={2025},
+  url={https://docs.crewai.com/},
+  note={Accessed May 2025}
+}
+
+% AutoGen official documentation — Microsoft's conversational multi-agent framework
+@misc{autogendocs,
+  title={AutoGen Documentation},
+  author={{Microsoft Research}},
+  year={2025},
+  url={https://microsoft.github.io/autogen/},
+  note={Accessed May 2025}
+}
+
 % AI agentic programming: planning, memory, tool integration, execution monitoring
 @misc{wang2025aiagenticprogrammingsurvey,
   title={AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities},