{"id":88217,"date":"2025-09-12T11:31:29","date_gmt":"2025-09-12T09:31:29","guid":{"rendered":"https:\/\/insiders-technologies.com\/insiders-llm-benchmarking-september-2025\/"},"modified":"2025-12-11T11:33:16","modified_gmt":"2025-12-11T10:33:16","slug":"insiders-llm-benchmarking-september-2025","status":"publish","type":"post","link":"https:\/\/insiders.next-kmu.de\/en\/insiders-llm-benchmarking-september-2025\/","title":{"rendered":"Insiders LLM Bench\u00admar\u00adking September 2025"},"content":{"rendered":"<p>[et_pb_section fb_built=\u201e1\u201c _builder_version=\u201e4.16\u201c custom_padding=\u201e0px||0px||true\u201c da_disable_devices=\u201eoff|off|off\u201c locked=\u201eoff\u201c global_colors_info=\u201c{}\u201c da_is_popup=\u201eoff\u201c da_exit_intent=\u201eoff\u201c da_has_close=\u201eon\u201c da_alt_close=\u201eoff\u201c da_dark_close=\u201eoff\u201c da_not_modal=\u201eon\u201c da_is_singular=\u201eoff\u201c da_with_loader=\u201eoff\u201c da_has_shadow=\u201eon\u201c][et_pb_row _builder_version=\u201e4.27.4\u201c custom_padding=\u201e0px||||false|false\u201c global_colors_info=\u201c{}\u201c][et_pb_column type=\u201e4_4\u201c _builder_version=\u201e4.16\u201c custom_padding=\u201c|||\u201c global_colors_info=\u201c{}\u201c custom_padding__hover=\u201c|||\u201c][et_pb_post_title author=\u201eoff\u201c date=\u201eoff\u201c categories=\u201eoff\u201c comments=\u201eoff\u201c _builder_version=\u201e4.27.4\u201c _module_preset=\u201edefault\u201c title_font=\u201c|800|||||||\u201c global_colors_info=\u201c{}\u201c][\/et_pb_post_title][et_pb_text _builder_version=\u201e4.27.4\u201c header_font=\u201c|700|||||||\u201c header_4_letter_spacing=\u201e12px\u201c module_alignment=\u201ecenter\u201c saved_tabs=\u201eall\u201c locked=\u201eoff\u201c global_colors_info=\u201c{}\u201c]<\/p>\n<p>The Insiders LLM Bench\u00admar\u00adking in September 2025 continues the series and builds con\u00adsis\u00adt\u00adently on the findings from Q2. 
To ensure com\u00adpa\u00adra\u00adbi\u00adlity, identical dimen\u00adsions and test data are used as in the previous bench\u00admar\u00adking.<\/p>\n<p>[\/et_pb_text][et_pb_text _builder_version=\u201e4.27.4\u201c _module_preset=\u201edefault\u201c header_font=\u201c|700|||||||\u201c header_4_letter_spacing=\u201e12px\u201c module_alignment=\u201ecenter\u201c global_colors_info=\u201c{}\u201c]<\/p>\n<p>The market for large language models (LLMs) is deve\u00adlo\u00adping rapidly. New models appear every month and existing ones are optimized further, but not all of them prove them\u00adselves in practice. With the current Insiders LLM Bench\u00admar\u00adking for Q3 2025, we create trans\u00adpa\u00adrency and provide companies with sound guidance: Which models deliver the best quality? What are the limi\u00adta\u00adtions in production use? And how can per\u00adfor\u00admance and security be recon\u00adciled?<\/p>\n<p><\/p>\n<p>&nbsp;<\/p>\n<p><\/p>\n<h3>A practical com\u00adpa\u00adrison<\/h3>\n<p><\/p>\n<p>As in Q2, we tested the leading models on a stan\u00addar\u00addized IDP dataset of real documents from insurance and finance. This ensures that the results are directly trans\u00adferable to our customers\u2019 requi\u00adre\u00adments. The bench\u00admar\u00adking covers a total of 21 models, including new additions such as GPT\u20115, Gemini 2.5 Pro, and Claude 4 Sonnet.<\/p>\n<p><\/p>\n<p>The com\u00adpa\u00adrison shows that global models set the benchmark thanks to their vast training data and computing resources. 
However, in regulated indus\u00adtries in par\u00adti\u00adcular, data pro\u00adtec\u00adtion, trans\u00adpa\u00adrency, and inte\u00adgra\u00adtion capa\u00adbi\u00adli\u00adties are just as crucial as pure per\u00adfor\u00admance.<\/p>\n<p><\/p>\n<p>By switching to a more powerful model, Insiders Private raised its score from 67.9 in Q2 to 78.2, a gain of 10.3 points, while main\u00adtai\u00adning the same average pro\u00adces\u00adsing time per document. This brings it closer to the top models without com\u00adpro\u00admi\u00adsing on data pro\u00adtec\u00adtion or speed.<\/p>\n<p><\/p>\n<p>The current Insiders LLM bench\u00admar\u00adking illus\u00adtrates that Insiders con\u00adti\u00adnuously monitors the market and masters the balancing act between per\u00adfor\u00admance and security for its customers \u2013 with a clear best-of-breed approach. Under this approach, no single model is expected to cover all tasks; instead, the most powerful LLMs are iden\u00adti\u00adfied, evaluated, and flexibly inte\u00adgrated for each appli\u00adca\u00adtion. New models are therefore imme\u00addia\u00adtely added to the bench\u00admar\u00adking and compared with existing ones. The results flow directly into product deve\u00adlo\u00adp\u00adment and ensure con\u00adsis\u00adt\u00adently high quality.<\/p>\n<p><\/p>\n<p>The question of \u201cthe best LLM\u201d is not a black-and-white issue. Per\u00adfor\u00admance alone is not enough. 
In highly regulated indus\u00adtries such as insurance and finance, relia\u00adbi\u00adlity, data pro\u00adtec\u00adtion, and inte\u00adgra\u00adtion capa\u00adbi\u00adli\u00adties are also key factors.<\/p>\n<p>[\/et_pb_text][et_pb_button button_url=\u201ehttps:\/\/insiders.next-kmu.de\/wp-content\/uploads\/2025\/12\/Benchmarking_September_EN.pdf\u201c url_new_window=\u201eon\u201c button_text=\u201eRead LLM com\u00adpa\u00adrison\u201c button_alignment=\u201eleft\u201c _builder_version=\u201e4.27.4\u201c _module_preset=\u201edefault\u201c custom_button=\u201eon\u201c button_text_color=\u201egcid-a1ce49c7-18bb-4621\u20138275-487db4ef4ea2\u201c locked=\u201eoff\u201c global_colors_info=\u201c{%22gcid-e57f936a-e1ef-478a-a91c-6dc2f7bf0652%22:%91%22button_text_color__hover%22%93,%22gcid-a1ce49c7-18bb-4621\u20138275-487db4ef4ea2%22:%91%22button_text_color%22%93}\u201c button_text_color__hover_enabled=\u201eon|hover\u201c button_text_color__hover=\u201e#000000\u201c button_bg_color__hover_enabled=\u201eon|hover\u201c][\/et_pb_button][et_pb_text disabled_on=\u201eoff|off|off\u201c _builder_version=\u201e4.27.4\u201c _module_preset=\u201edefault\u201c header_font=\u201c|700|||||||\u201c header_4_letter_spacing=\u201e12px\u201c module_alignment=\u201ecenter\u201c global_colors_info=\u201c{}\u201c]<\/p>\n<p>For indi\u00advi\u00addual use cases, Insiders AI experts offer sound advice for your company. We would be happy to include your data in an upcoming industry-specific bench\u00admar\u00adking exercise. 
Simply contact our Insiders AI experts to find out more.<\/p>\n<p>[\/et_pb_text][et_pb_button button_url=\u201emailto:llm-benchmarking@insiders-technologies.de\u201c url_new_window=\u201eon\u201c button_text=\u201eBenchmark my use case\u201c button_alignment=\u201eleft\u201c disabled_on=\u201eoff|off|off\u201c _builder_version=\u201e4.27.4\u201c _module_preset=\u201edefault\u201c custom_button=\u201eon\u201c button_text_color=\u201egcid-a1ce49c7-18bb-4621\u20138275-487db4ef4ea2\u201c locked=\u201eoff\u201c global_colors_info=\u201c{%22gcid-e57f936a-e1ef-478a-a91c-6dc2f7bf0652%22:%91%22button_text_color__hover%22%93,%22gcid-a1ce49c7-18bb-4621\u20138275-487db4ef4ea2%22:%91%22button_text_color%22%93}\u201c button_text_color__hover_enabled=\u201eon|hover\u201c button_text_color__hover=\u201e#000000\u201c button_bg_color__hover_enabled=\u201eon|hover\u201c][\/et_pb_button][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Insiders LLM Bench\u00admar\u00adking in September 2025 continues the series and builds con\u00adsis\u00adt\u00adently on the findings from Q2. 
To ensure com\u00adpa\u00adra\u00adbi\u00adlity, identical dimen\u00adsions and test data are used as in the previous bench\u00admar\u00adking.<\/p>\n","protected":false},"author":26,"featured_media":87289,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","wp_typography_post_enhancements_disabled":false,"_mbp_gutenberg_autopost":false,"footnotes":""},"categories":[1,677,2],"tags":[],"class_list":["post-88217","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-allgemein","category-artificial-intelligence","category-blog-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/posts\/88217","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/users\/26"}],"replies":[{"embeddable":true,"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/comments?post=88217"}],"version-history":[{"count":0,"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/posts\/88217\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/media\/87289"}],"wp:attachment":[{"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/media?parent=88217"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/categories?post=88217"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insiders.next-kmu.de\/en\/wp-json\/wp\/v2\/tags?post=88217"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}