Assessment of the potential of emerging large language models in structural analysis: A case study on beam analysis
Abstract
This study explored the potential of emerging large language models (LLMs) in structural
analysis, with a particular focus on beam analysis. OpenAI’s GPT-4 was employed as a
benchmark to evaluate the capabilities of such models. A custom test dataset was developed,
consisting of 90 nominal beam analysis problems and 90 adversarial variants generated through
word- and sentence-level perturbations of the nominal problems.
To facilitate the evaluation, a modular framework was designed to integrate LLMs into the
beam structural analysis workflow. The framework comprises seven components: a User
Interface, Task Packager, Model Interface, Utility Tools, Response Retriever, Instruction
Runner, and PDF Generator. It was implemented using Python 3.10.11.
GPT-4 was assessed within the proposed framework under both few-shot and zero-shot setups.
In the few-shot setup, it achieved 93.3% accuracy on nominal test cases, with performance
dropping by 11 percentage points to 82.3% on adversarial cases, indicating a robustness of
89%. In the zero-shot setup, GPT-4 attained 88.9% accuracy on nominal cases, with a
13-percentage-point drop to 75.9% under adversarial conditions, reflecting a robustness of
87%. Its generated reports received an average quality rating of 8.03 out of 10 from
volunteer assessors.
The findings suggest that LLMs hold significant promise for integration into structural
engineering workflows. The use of LLMs resulted in a simple user experience, and the LLM
demonstrated the ability to use the tools made available to it to solve tasks. The study recommends future
research focused on consolidating currently fragmented approaches into a unified system that
fully utilizes the capabilities of LLMs across the entire structural engineering lifecycle, from
conceptual design through to construction planning.