Abstract: Large language models (LLMs) and multimodal large language models (MLLMs) have shown excellent general capabilities and have even proven adaptable to many professional domains such as law, economics, transportation, and medicine. Many domain-specific benchmarks have been proposed to evaluate the performance of (M)LLMs in specific fields. Among these domains, transportation plays a crucial role in modern society, as it affects the economy, the environment, and the quality of life of billions of people. However, it is unclear how much transportation knowledge (M)LLMs possess and whether they can reliably perform transportation-related tasks. To address this gap, we propose TransportationGames, a carefully designed and thorough evaluation benchmark for assessing (M)LLMs in the transportation domain. By comprehensively considering real-world applications and referring to the first three levels of Bloom's Taxonomy, we test the performance of various (M)LLMs in memorizing, understanding, and applying transportation knowledge through the selected tasks. The experimental results show that although some models perform well on some tasks, there is still much room for improvement overall. We hope the release of TransportationGames can serve as a foundation for future research, thereby accelerating the implementation and application of (M)LLMs in the transportation domain.

Paper language: English

Paper title: TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models

Paper format: PDF

Authors: Xue Zhang, Xiangyu Shi, Xinyue Lou, Rui Qi, Yufeng Chen, Jinan Xu, Wenjuan Han
