CoWPE: Adaptive Context Window Adjustment in LLMs for Complex Input Queries
Abstract
Recent work has shown that large language models (LLMs) benefit from processing context windows that adapt to the nuance and complexity of their input queries. Prior studies have attempted to extend the context window of LLMs by modifying Rotary Position Embedding (RoPE), a popular position-encoding technique used in prominent LLMs such as LLaMA and GPT-NeoX. In this work, we identify the inherent need for LLMs' attention entropy (i.e., the information entropy of attention scores) to remain stable, and we introduce a novel extension to RoPE that combines adjusting RoPE's base frequency with scaling the attention logits, helping LLMs adapt efficiently to a larger context window according to the complexity and nuance of the input query. Our proposal, CoWPE, adjusts the context window of LLMs by constructing bi-level attention: neighbor attention and grouped attention. Neighbor attention captures dependencies between adjacent tokens within a specified range, while grouped attention captures dependencies among tokens that are far apart. During inference, both levels of attention are computed with the original model's self-attention mechanism. CoWPE requires no fine-tuning and can extend the context window of existing LLMs with only a minor code modification. Extensive experiments on several benchmarks demonstrate that CoWPE effectively extends the context window length of existing LLMs.
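To make the mechanism concrete, the following is a minimal, illustrative sketch of how the components described above could fit together for a single attention head: neighbor attention over exact token positions, grouped attention over floor-divided positions for distant tokens, an adjusted RoPE base frequency, and a logit-scaling term intended to keep attention entropy stable. The hyperparameters (group_size, neighbor_window, rope_base, train_len) and the merge rule are assumptions chosen for illustration; they are not taken from the paper.

```python
# Illustrative sketch only: hyperparameters and the merge rule below are
# assumptions, not the authors' released implementation.
import math
import torch


def rope_inv_freq(head_dim: int, rope_base: float) -> torch.Tensor:
    # Inverse frequencies for RoPE; raising rope_base above the usual 10_000
    # slows the rotation and is one way to stretch the usable context.
    return 1.0 / (rope_base ** (torch.arange(0, head_dim, 2).float() / head_dim))


def apply_rope(x: torch.Tensor, positions: torch.Tensor, inv_freq: torch.Tensor) -> torch.Tensor:
    # x: (seq, head_dim). Rotate channel pairs by position-dependent angles
    # (the usual "rotate half" formulation).
    angles = positions[:, None].float() * inv_freq[None, :]        # (seq, head_dim // 2)
    cos = torch.cat([angles.cos(), angles.cos()], dim=-1)
    sin = torch.cat([angles.sin(), angles.sin()], dim=-1)
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return x * cos + torch.cat([-x2, x1], dim=-1) * sin


def bi_level_attention(q, k, v, group_size=8, neighbor_window=512,
                       rope_base=50_000.0, train_len=4096):
    # q, k, v: (seq, head_dim) for one head. Returns the merged bi-level output.
    seq, head_dim = q.shape
    inv_freq = rope_inv_freq(head_dim, rope_base)
    pos = torch.arange(seq)

    # Neighbor attention: exact positions, covering nearby dependencies.
    qn, kn = apply_rope(q, pos, inv_freq), apply_rope(k, pos, inv_freq)

    # Grouped attention: floor-divided positions so far-apart tokens map onto
    # relative distances seen during pre-training; the query offset keeps the
    # two levels aligned at the neighbor-window boundary.
    qg = apply_rope(q, pos // group_size + neighbor_window - neighbor_window // group_size, inv_freq)
    kg = apply_rope(k, pos // group_size, inv_freq)

    # Logit scaling (assumed log-length rule) to keep attention entropy roughly
    # stable as the context grows beyond the training length.
    scale = (math.log(seq) / math.log(train_len)) / math.sqrt(head_dim)
    neighbor_logits = (qn @ kn.T) * scale
    grouped_logits = (qg @ kg.T) * scale

    # Merge: neighbor logits inside the window, grouped logits outside,
    # under an ordinary causal mask.
    rel = pos[:, None] - pos[None, :]
    causal = rel >= 0
    logits = torch.where((rel < neighbor_window) & causal, neighbor_logits, grouped_logits)
    logits = logits.masked_fill(~causal, float("-inf"))
    return torch.softmax(logits, dim=-1) @ v


if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(2048, 64) for _ in range(3))
    print(bi_level_attention(q, k, v).shape)   # torch.Size([2048, 64])
```

Because both attention levels reuse the model's own self-attention weights, an existing checkpoint can be run this way without fine-tuning; only the position indices, the RoPE base frequency, and the logit scale change.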
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.
©2024 All rights reserved by the respective authors and JAIGC