
Hakaze Cho / Yufeng Zhao

Ph.D. 3rd Year Student @ Graduate School of Information Science, Japan Advanced Institute of Science and Technology
Fully-funded Research Assistant @ RebelsNLU, PI: Assoc. Prof. Naoya Inoue

Official Name: Yufeng Zhao (quite difficult to pronounce, so I use the Japanese reading, Hakaze Cho, for publications; both come from the characters “趙 羽風”)
Born: Beijing, 1999

Contact

Twitter     GitHub     Researchmap     OpenReview     ORCID     CV
Physical Address: Laboratory I-52, Information Science Building I, 1-1 Asahidai, Nomi, Ishikawa, Japan
E-mail: yfzhao [at] jaist.ac.jp
This email address is not always reachable. If I do not reply, please try CC'ing the message to yfZhao495 [at] outlook.com.

Biography

I graduated from Beijing Institute of Technology, a top-ranked university in China, with a Master’s degree in Software Engineering in 2023 and a Bachelor’s degree in Chemistry in 2021. I am now completing my Ph.D. at JAIST on a fast-track schedule, with graduation expected in March 2026. My research explores the internal mechanisms of artificial neural networks, particularly Transformer-based neural language models, during both training and inference, using mathematical and representation-learning methods, and aims to improve their performance robustly through this deeper understanding. I have published over 30 papers and presentations in this area since 2023, some of which have appeared at top-tier international conferences such as ICLR and NeurIPS.

Research Collaboration Statement. I am actively seeking productive research collaborations in the areas mentioned above. (My standard for being “productive” is two to three top-tier conference papers per year; this is a benchmark I set for myself, not an expectation I impose on others.) If you are interested in working together on top-conference papers, please do not hesitate to contact me. I welcome collaborations with both experts and motivated beginners; being a novice is not a drawback if you are eager to learn and work efficiently. I am also open to exploring collaborations in other areas.

Position Interests. I have already signed a full-time contract starting in April 2026, but I still welcome visiting and part-time positions. I am also seeking an associate professor position starting in April 2029.

Research Interests

Keywords: Representation Learning, Mechanistic Interpretability, In-context Learning

  • Interpretability of Artificial Neural Networks: Mechanistic interpretability (especially for Transformers)
    [ICLR 2025] [NeurIPS 2025] [COLING 2025]
  • Controllability of Artificial Neural Networks: Low-resource improvement and control of model behavior from a mechanistic perspective
    [NAACL 2025] [BlackboxNLP 2025]
  • Misc.: Manifold Learning, Low-precision Neural Networks, Neural Network Training Dynamics
    [ArXiv] [ArXiv]

Publications

[Export Publication List as TXT] [Google Scholar] [Semantic Scholar] [DBLP]
Total Publications: 31, Cumulative Impact Factor: 96.4, Total Pages: 865.

International Conference

  1. Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning
    Haolin Yang, Hakaze Cho, Yiqiao Zhong, Naoya Inoue
    Annual Conference on Neural Information Processing Systems (NeurIPS). 2025. 52 pages. [h5=371, IF=23.3]
    [OpenReview] [PDF] [arXiv] [Poster] [Github] [Abstract] [Bibtex]
    The unusual properties of in-context learning (ICL) have prompted investigations into the internal mechanisms of large language models. Prior work typically focuses on either special attention heads or task vectors at specific layers, but lacks a unified framework linking these components to the evolution of hidden states across layers that ultimately produce the model's output. In this paper, we propose such a framework for ICL in classification tasks by analyzing two geometric factors that govern performance: the separability and alignment of query hidden states. A fine-grained analysis of layer-wise dynamics reveals a striking two-stage mechanism: separability emerges in early layers, while alignment develops in later layers. Ablation studies further show that Previous Token Heads drive separability, while Induction Heads and task vectors enhance alignment. Our findings thus bridge the gap between attention heads and task vectors, offering a unified account of ICL's underlying mechanisms.
    @inproceedings{yang2025unifying,
        title={Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning},
        author={Yang, Haolin and Cho, Hakaze and Zhong, Yiqiao and Inoue, Naoya},
        booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
        year={2025},
        url={https://openreview.net/forum?id=FIfjDqjV0B}
    }
  2. Mechanistic Fine-tuning for In-context Learning
    Hakaze Cho, Peng Luo, Mariko Kato, Rin Kaenbyou, Naoya Inoue
    BlackboxNLP Workshop. 2025. 28 pages.  Workshop at EMNLP 2025.
    [ACL Anthology] [PDF] [arXiv] [Github] [Abstract] [Bibtex] (a toy sketch of such an attention objective appears after this list)
    In-context Learning (ICL) utilizes structured demonstration-query inputs to induce few-shot learning on Language Models (LMs), which are not originally pre-trained on ICL-style data. To bridge the gap between ICL and pre-training, some approaches fine-tune LMs on large ICL-style datasets by an end-to-end paradigm with massive computational costs. To reduce such costs, in this paper, we propose Attention Behavior Fine-Tuning (ABFT), utilizing the previous findings on the inner mechanism of ICL, building training objectives on the attention scores instead of the final outputs, to force the attention scores to focus on the correct label tokens presented in the context and mitigate attention scores from the wrong label tokens. Our experiments on 9 modern LMs and 8 datasets empirically find that ABFT outperforms in performance, robustness, unbiasedness, and efficiency, with only around 0.01% data cost compared to the previous methods. Moreover, our subsequent analysis finds that the end-to-end training objective contains the ABFT objective, suggesting the implicit bias of ICL-style data to the emergence of induction heads. Our work demonstrates the possibility of controlling specific module sequences within LMs to improve their behavior, opening up the future application of mechanistic interpretability.
    @inproceedings{cho2025mechanistic,
        title={Mechanistic Fine-tuning for In-context Learning},
        author={Cho, Hakaze and Luo, Peng and Kato, Mariko and Kaenbyou, Rin and Inoue, Naoya},
        booktitle = "Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP",
        year={2025},
        url={https://arxiv.org/abs/2505.14233}
    }
  3. Revisiting In-context Learning Inference Circuit in Large Language Models
    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
    International Conference on Learning Representations (ICLR). 2025. 37 pages. [h5=362, IF=48.9]
    [OpenReview] [PDF] [arXiv] [Github] [Poster] [Review] [Abstract] [Bibtex]
    In-context Learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) with inner mechanisms un-explored. There are already existing works describing the inner processing of ICL, while they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and try to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into linear representation in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint representations of demonstrations similar to the query representation on a task subspace, and copy the searched representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. Through careful measurements, the proposed inference circuit successfully captures and unifies many fragmented phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis by disabling the proposed steps seriously damages the ICL performance, suggesting the proposed inference circuit is a dominating mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.
    @inproceedings{cho2025revisiting,
        title={Revisiting In-context Learning Inference Circuit in Large Language Models},
        author={Hakaze Cho and Mariko Kato and Yoshihiro Sakai and Naoya Inoue},
        booktitle={The Thirteenth International Conference on Learning Representations},
        year={2025},
        url={https://openreview.net/forum?id=xizpnYNvQq}
    }
  4. Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
    Annual Conference of the Nations of the Americas Chapter of the ACL (NAACL.main). 2025. 24 pages. [h5=126, IF=16.5]
    [ACL Anthology] [PDF] [arXiv] [Github] [Poster] [Review] [Abstract] [Bibtex] (an illustrative sketch of the decision rule appears after this list)
    In-Context Learning (ICL) typically utilizes classification criteria from output probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations through translation and constrained rotation applied. To address this problem, we propose Hidden Calibration, which renounces token probabilities and uses the nearest centroid classifier on the LM’s last hidden states. In detail, we assign the label of the nearest centroid previously estimated from a calibration set to the test sample as the predicted label. Our experiments on 6 models and 10 classification datasets indicate that Hidden Calibration consistently outperforms current token-based baselines by about 20%~50%, achieving a strong state-of-the-art in ICL. Our further analysis demonstrates that Hidden Calibration finds better classification criteria with less inter-class overlap, and LMs provide linearly separable intra-class clusters with the help of demonstrations, which supports Hidden Calibration and gives new insights into the principle of ICL. Our official code implementation can be found at https://github.com/hc495/Hidden_Calibration.
    @inproceedings{cho2025token,
        title={Token-based Decision Criteria Are Suboptimal in In-context Learning},
        author={Hakaze Cho and Yoshihiro Sakai and Mariko Kato and Kenshiro Tanaka and Akira Ishii and Naoya Inoue},
        booktitle={Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
        year={2025},
        url={https://aclanthology.org/2025.naacl-long.278/}
    }
  5. Understanding Token Probability Encoding in Output Embeddings
    Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue
    International Conference on Computational Linguistics (COLING). 2025. 16 pages. [h5=81, IF=7.7]
    [ACL Anthology] [PDF] [arXiv] [Poster] [Abstract] [Bibtex]
    In this paper, we investigate the output token probability information in the output embedding of language models. We find an approximate common log-linear encoding of output token probabilities within the output embedding vectors and empirically demonstrate that it is accurate and sparse. As a causality examination, we steer the encoding in output embedding to modify the output probability distribution accurately. Moreover, the sparsity we find in output probability encoding suggests that a large number of dimensions in the output embedding do not contribute to causal language modeling. Therefore, we attempt to delete the output-unrelated dimensions and find more than 30% of the dimensions can be deleted without significant movement in output distribution and sequence generation. Additionally, in the pre-training dynamics of language models, we find that the output embeddings capture the corpus token frequency information in early steps, even before an obvious convergence of parameters starts.
    @inproceedings{cho2025understanding,
        title={Understanding Token Probability Encoding in Output Embeddings},
        author={Hakaze Cho and Yoshihiro Sakai and Kenshiro Tanaka and Mariko Kato and Naoya Inoue},
        booktitle={Proceedings of the 31st International Conference on Computational Linguistics},
        year={2025},
        url={https://aclanthology.org/2025.coling-main.708/}
    }
  6. Find-the-Common: A Benchmark for Explaining Visual Patterns from Images
    Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
    International Conference on Language Resources and Evaluation (LREC). 2024. 7 pages. [h5=68]
    [ACL Anthology] [PDF] [Abstract] [Bibtex]
    Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs), such as GPT-4V and InstructBLIP, have prompted some studies to start an in-depth analysis of the reasoning capabilities of IVLMs. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. In this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches including Image-Based Reasoning, Text-Based Reasoning, and Image-Text-Based Reasoning with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC; thus, the FTC is a new challenge for the visual reasoning research community. Our dataset has been released and is available online: https://github.com/SSSSSeki/Find-the-common.
    @inproceedings{shi2024find,
        title={Find-the-Common: A Benchmark for Explaining Visual Patterns from Images},
        author={Yuting Shi and Naoya Inoue and Houjing Wei and Yufeng Zhao and Tao Jin},
        booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
        year={2024},
        url={https://aclanthology.org/2024.lrec-main.642/}
    }
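
The following is a toy Python sketch of the kind of attention-based objective described in the ABFT paper (item 2 above): it rewards attention mass from the query's final token onto the correct in-context label tokens and penalizes mass onto the wrong ones. It is my own illustration under assumed tensor shapes, not the authors' implementation (see their GitHub repository for that); the function name and arguments are hypothetical.

    import torch

    def abft_style_loss(attn_last_token, correct_label_pos, wrong_label_pos):
        # attn_last_token: [n_heads, seq_len] attention weights of one layer, taken at
        # the query's final token position (e.g., from a forward pass with
        # output_attentions=True). The position lists index label tokens in the prompt.
        correct_mass = attn_last_token[:, correct_label_pos].sum(dim=-1)  # [n_heads]
        wrong_mass = attn_last_token[:, wrong_label_pos].sum(dim=-1)      # [n_heads]
        # Encourage attention to correct label tokens, discourage attention to wrong ones.
        return (wrong_mass - correct_mass).mean()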
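
Likewise, here is a minimal NumPy sketch of the nearest-centroid decision rule described in the Hidden Calibration paper (item 4 above), assuming last-token hidden states have already been extracted from the LM. This is for illustration only; the official code is in the repository linked in that entry, and all names below are my own.

    import numpy as np

    def fit_centroids(calib_hidden, calib_labels):
        # One centroid per class, estimated from a small calibration set of
        # last-token hidden states (calib_hidden: [n_calib, hidden_dim]).
        return {c: calib_hidden[calib_labels == c].mean(axis=0)
                for c in np.unique(calib_labels)}

    def predict_by_nearest_centroid(query_hidden, centroids):
        # Assign each query (query_hidden: [n_query, hidden_dim]) the label of its
        # nearest centroid, instead of comparing label-token probabilities.
        labels = list(centroids)
        dists = np.stack([np.linalg.norm(query_hidden - centroids[c], axis=-1)
                          for c in labels], axis=-1)
        return np.array(labels)[dists.argmin(axis=-1)]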

Pre-print

  1. Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
    Haolin Yang, Hakaze Cho, Naoya Inoue
    Pre-print. 2025. 45 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework based on Task Subspace Logit Attribution (TSLA) to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Using steering experiments with geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within the subspace toward the correct label to facilitate prediction. We further show how previous findings on ICL mechanisms, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how large language models execute ICL across diverse tasks and settings.
    @article{yang2025localizingtaskrecognitiontask,
        title={Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis},
        author={Yang, Haolin and Cho, Hakaze and Inoue, Naoya},
        journal={arXiv preprint arXiv:2509.24164},
        year={2025}
    }
  2. Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight
    Haolin Yang, Hakaze Cho, Kaize Ding, Naoya Inoue
    Pre-print. 2025. 48 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    Large Language Models (LLMs) can perform new tasks from in-context demonstrations, a phenomenon known as in-context learning (ICL). Recent work suggests that these demonstrations are compressed into task vectors (TVs), compact task representations that LLMs exploit for predictions. However, prior studies typically extract TVs from model outputs or hidden states using cumbersome and opaque methods, and they rarely elucidate the mechanisms by which TVs influence computation. In this work, we address both limitations. First, we propose directly training Learned Task Vectors (LTVs), which surpass extracted TVs in accuracy and exhibit superior flexibility-acting effectively at arbitrary layers, positions, and even with ICL prompts. Second, through systematic analysis, we investigate the mechanistic role of TVs, showing that at the low level they steer predictions primarily through attention-head OV circuits, with a small subset of 'key heads' most decisive. At a higher level, we find that despite Transformer nonlinearities, TV propagation is largely linear: early TVs are rotated toward task-relevant subspaces to improve logits of relevant labels, while later TVs are predominantly scaled in magnitude. Taken together, LTVs not only provide a practical approach for obtaining effective TVs but also offer a principled lens into the mechanistic foundations of ICL.
    @article{yang2025taskvectorslearnedextracted,
        title={Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight},
        author={Yang, Haolin and Cho, Hakaze and Ding, Kaize and Inoue, Naoya},
        journal={arXiv preprint arXiv:2509.24169},
        year={2025}
    }
  3. Binary Autoencoder for Mechanistic Interpretability of Large Language Models
    Hakaze Cho, Haolin Yang, Brian M. Kurkoski, Naoya Inoue
    Pre-print. 2025. 36 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex] (an illustrative sketch of the binarization step appears after this list)
    Existing works are dedicated to untangling atomized numerical components (features) from the hidden states of Large Language Models (LLMs) for interpreting their mechanism. However, they typically rely on autoencoders constrained by some implicit training-time regularization on single training instances (i.e., normalization, top-k function, etc.), without an explicit guarantee of global sparsity among instances, causing a large amount of dense (simultaneously inactive) features, harming the feature sparsity and atomization. In this paper, we propose a novel autoencoder variant that enforces minimal entropy on minibatches of hidden activations, thereby promoting feature independence and sparsity across instances. For efficient entropy calculation, we discretize the hidden activations to 1-bit via a step function and apply gradient estimation to enable backpropagation, so that we term it as Binary Autoencoder (BAE) and empirically demonstrate two major applications: (1) Feature set entropy calculation. Entropy can be reliably estimated on binary hidden activations, which we empirically evaluate and leverage to characterize the inference dynamics of LLMs and In-context Learning. (2) Feature untangling. Similar to typical methods, BAE can extract atomized features from LLM's hidden states. To robustly evaluate such feature extraction capability, we refine traditional feature-interpretation methods to avoid unreliable handling of numerical tokens, and show that BAE avoids dense features while producing the largest number of interpretable ones among baselines, which confirms the effectiveness of BAE serving as a feature extractor.
    @article{cho2025binary,
        title={Binary Autoencoder for Mechanistic Interpretability of Large Language Models},
        author={Cho, Hakaze and Yang, Haolin and Kurkoski, Brian M. and Inoue, Naoya},
        journal={arXiv preprint arXiv:2509.20997},
        year={2025}
    }
  4. Mechanism of Task-oriented Information Removal in In-context Learning
    Hakaze Cho, Haolin Yang, Gouki Minegishi, Naoya Inoue
    Pre-print. 2025. 87 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    In-context Learning (ICL) is an emerging few-shot learning paradigm based on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate the mechanism through a novel perspective of information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in hidden states containing information for all possible tasks, leading to arbitrary outputs without focusing on the intended task, resulting in near-zero accuracy. Meanwhile, we find that selectively removing specific information from hidden states by a low-rank filter effectively steers LMs toward the intended task. Building on these findings, by measuring the hidden states on carefully designed metrics, we observe that few-shot ICL effectively simulates such task-oriented information removal processes, selectively removing the redundant information from entangled non-selective representations, and improving the output based on the demonstrations, which constitutes a key mechanism underlying ICL. Moreover, we identify essential attention heads inducing the removal operation, termed Denoising Heads, which enables the ablation experiments blocking the information removal operation from the inference, where the ICL accuracy significantly degrades, especially when the correct label is absent from the few-shot demonstrations, confirming both the critical role of the information removal mechanism and denoising heads.
    @article{cho2025mechanism,
        title={Mechanism of Task-oriented Information Removal in In-context Learning},
        author={Cho, Hakaze and Yang, Haolin and Minegishi, Gouki and Inoue, Naoya},
        journal={arXiv preprint arXiv:2509.21012},
        year={2025}
    }
  5. Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Pre-print. 2025. 6 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    In this study, we measure the Intrinsic Dimension (ID) of token embedding to estimate the intrinsic dimensions of the manifolds spanned by the representations, so as to evaluate their redundancy quantitatively compared to their extrinsic dimensionality. In detail, (1) we estimate the ID of token embeddings in small-scale language models and also modern large language models, finding that the embedding spaces often reside on lower-dimensional manifolds compared to their extrinsic dimensionality; (2) we measure the ID across various model sizes and observe an increase in redundancy rates as the model scale grows; (3) we measure the dynamics of IDs during the training process, and find a rapid ID drop in the early stages of training. Moreover, (4) when LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated IDs, suggesting that the ID can serve as a useful guideline for LoRA application.
    @article{kataiwa2025measuring,
        title={Measuring Intrinsic Dimension of Token Embeddings},
        author={Kataiwa, Takuya and Cho, Hakaze and Ohki, Tetsushi},
        journal={arXiv preprint arXiv:2503.02142},
        year={2025}
    }
  6. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
    Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
    Pre-print. 2025. 8 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    The performance of In-Context Learning (ICL) is highly sensitive to the selected demonstrations. Existing approaches to demonstration selection optimize different objectives, yielding inconsistent results. To address this, we propose a unified metric--affinity and diversity--that leverages ICL model's internal representations. Our experiments show that both affinity and diversity strongly correlate with test accuracies, indicating their effectiveness for demonstration selection. Moreover, we show that our proposed metrics align well with various previous works to unify the inconsistency.
    @article{kato2025affinity,
        title={Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations},
        author={Kato, Mariko and Cho, Hakaze and Sakai, Yoshihiro and Inoue, Naoya},
        journal={arXiv preprint arXiv:2502.14380},
        year={2025}
    }
  7. StaICC: Standardized Evaluation for Classification Task in In-context Learning
    Hakaze Cho, Naoya Inoue
    Pre-print. 2025. 20 pages. 
    [PDF] [arXiv] [Github] [PyPI] [Abstract] [Bibtex]
    Classification tasks are widely investigated in the In-Context Learning (ICL) paradigm. However, current efforts are evaluated on disjoint benchmarks and settings, while their performances are significantly influenced by some trivial variables, such as prompt templates, data sampling, instructions, etc., which leads to significant inconsistencies in the results reported across various literature, preventing fair comparison or meta-analysis across different papers. Therefore, this paper proposes a standardized and easy-to-use evaluation toolkit (StaICC) for in-context classification. Including, for the normal classification task, we provide StaICC-Normal, selecting 10 widely used datasets, and generating prompts with a fixed form, to mitigate the variance among the experiment implementations. To enrich the usage of our benchmark, we also provide a sub-benchmark StaICC-Diag for diagnosing ICL from several aspects, aiming for a more robust inference processing.
    @article{cho2025staicc,
        title={StaICC: Standardized Evaluation for Classification Task in In-context Learning},
        author={Cho, Hakaze and Inoue, Naoya},
        journal={arXiv preprint arXiv:2501.15708},
        year={2025}
    }
  8. NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning
    Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
    Pre-print. 2024. 20 pages. 
    [PDF] [arXiv] [Github] [Abstract] [Bibtex] (an illustrative sketch of the parameter perturbation appears after this list)
    In-Context Learning (ICL) is suffering from unsatisfactory performance and under-calibration due to high prior bias and unfaithful confidence. Some previous works fine-tuned language models for better ICL performance with enormous datasets and computing costs. In this paper, we propose NoisyICL, simply perturbing the model parameters by random noises to strive for better performance and calibration. Our experiments on two models and 12 downstream datasets show that NoisyICL can help ICL produce more accurate predictions. Our further analysis indicates that NoisyICL enables the model to provide more fair predictions, and also with more faithful confidence. Therefore, we believe that NoisyICL is an effective calibration of ICL. Our experimental code is uploaded to Github.
    @article{zhao2024noisyicl,
        title={NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning},
        author={Zhao, Yufeng and Sakai, Yoshihiro and Inoue, Naoya},
        journal={arXiv preprint arXiv:2402.05515},
        year={2024}
    }
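
The following is a minimal PyTorch sketch of the binarization trick described in the Binary Autoencoder pre-print (item 3 above): activations are discretized to 1 bit with a step function, a straight-through estimator keeps the encoder trainable, and a minibatch activation-frequency entropy stands in for the paper's entropy objective. This is an assumption-laden illustration, not the authors' code, and all class and variable names are invented.

    import torch
    import torch.nn as nn

    class StraightThroughBinarize(torch.autograd.Function):
        # Forward: hard 0/1 step function; backward: pass gradients through unchanged
        # (a common straight-through estimator), so the encoder remains trainable.
        @staticmethod
        def forward(ctx, x):
            return (x > 0).float()

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output

    class ToyBinaryAutoencoder(nn.Module):
        def __init__(self, d_model, n_features):
            super().__init__()
            self.encoder = nn.Linear(d_model, n_features)
            self.decoder = nn.Linear(n_features, d_model)

        def forward(self, hidden_states):  # hidden_states: [batch, d_model]
            codes = StraightThroughBinarize.apply(self.encoder(hidden_states))
            recon = self.decoder(codes)
            # Per-feature activation frequency over the minibatch, used here as a crude
            # stand-in for the paper's minibatch entropy objective (illustrative only).
            p = codes.mean(dim=0).clamp(1e-6, 1 - 1e-6)
            entropy = -(p * p.log2() + (1 - p) * (1 - p).log2()).sum()
            return recon, codes, entropy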
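
And here is a minimal sketch of the parameter perturbation described in the NoisyICL pre-print (item 8 above). The Gaussian noise and the single scale lam are illustrative assumptions rather than the paper's exact formulation, and the function name is hypothetical.

    import torch

    @torch.no_grad()
    def perturb_parameters(model, lam=1e-3, seed=0):
        # Add small random noise (scale lam) to every parameter in place, then run
        # in-context learning prompts with the perturbed model as usual.
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            noise = torch.randn(p.shape, generator=gen).to(device=p.device, dtype=p.dtype)
            p.add_(lam * noise)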

Domestic Conferences / Journal / Miscellaneous

† = Japan-domestic secondary publication of an international conference paper; unmarked = non-refereed; ▲ = refereed
  1. Conference Note: Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Naoya Inoue
    Journal of Natural Language Processing (JNLP). 2025. 6 pages. 
    [PDF]
  2. ▲†Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    In this paper, we measure the Intrinsic Dimension (ID), i.e., the number of dimensions necessary and sufficient for representation, of word vectors and embedding layers, and quantitatively evaluate their redundancy. Specifically, (1) we estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) we analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. Experiments show that the embedding spaces tend to lie on manifolds of lower dimension than their extrinsic dimensionality. We also observe changes in the redundancy rate as the model scale grows and a rapid convergence of the ID in the early stages of training, and we show that the estimated ID may serve as a useful guide for rank selection when applying LoRA.
  3. Analysis of Internal Representations of Knowledge with Expressions of Familiarity
    Kenshiro Tanaka, Yoshihiro Sakai, Hakaze Cho, Naoya Inoue, Kai Sato, Ryosuke Takahashi, Benjamin Heinzerling, Kentaro Inui
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    Research on the ability of large language models (LLMs) to judge whether knowledge is already known is progressing, but it has not been examined whether, after learning knowledge accompanied by familiarity-indicating expressions such as “It is known that…”, an LLM can judge the familiarity of that knowledge at inference time. In this study, we train a pre-trained LLM on knowledge descriptions annotated with familiarity-indicating expressions and analyze the internal representations of that knowledge to investigate how familiarity can be represented inside the LLM. We find that (1) the internal representations of knowledge retain familiarity information separately for each linguistic expression given during training, and (2) the familiarity information is retained separately for each position at which the expression appears. This study is a first step toward elucidating the mechanism behind LLMs’ ability to judge familiarity.
  4. Internal Representations of Knowledge Recognition in Language Models
    Kai Sato, Ryosuke Takahashi, Benjamin Heinzerling, Kenshiro Tanaka, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue, Kentaro Inui
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    Although the knowledge acquisition ability of language models (LMs) has been widely studied, the mechanism by which they judge whether acquired knowledge is known to them is not well understood. In this study, we compare and analyze the internal states of an LM when it generates outputs about specific knowledge and when it judges whether that knowledge is known. The results show that LMs can indeed perform such familiarity judgments, and reveal that (1) the information needed to judge familiarity already exists in the internal representations at the time the knowledge is learned, and (2) LMs exhibit different activation patterns for knowledge judged as known and knowledge judged as unknown. These findings provide clues toward understanding the familiarity-judgment mechanism of LMs.
  5. Revisiting In-context Learning Inference Circuit in Large Language Models
    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 6 pages.  Oral, Outstanding Paper.
    [PDF] [Slides] [Abstract]
    In-context Learning (ICL) has attracted attention as a new few-shot learning paradigm for language models, but its underlying mechanism has not been fully elucidated. In this study, we decompose the inference dynamics of ICL into three basic operations, construct an inference circuit on top of them, and perform precise measurements to explain, in a unified way, phenomena observed in previous studies. Furthermore, ablation analysis that disables the proposed circuit confirms a significant drop in ICL performance, suggesting that the proposed inference circuit is a dominant mechanism of ICL.
  6. Beyond the Induction Circuit: A Mechanistic Prototype for Out-of-domain In-context Learning
    Hakaze Cho, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 5 pages. 
    [PDF] [Poster] [Abstract]
    In-context Learning (ICL) is a promising few-shot learning paradigm with unclear mechanisms. Existing explanations heavily rely on Induction Heads, which fail to account for out-of-domain ICL, where query labels are absent in demonstrations. To address this, we model ICL as attribute resolution, where queries are mixtures of some attributes, and ICL identifies and resolves relevant attributes for predictions. In this paper, we propose a mechanistic prototype using toy models trained on synthetic data, and observe: (1) even 1-layer Transformers achieve non-trivial accuracy, with limited benefit from additional demonstrations, (2) scaling models effectively improves accuracy, and (3) inference operations can be decomposed into label space identification and generalized induction, warranting further exploration.
  7. Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 5 pages. 
    [PDF] [Abstract]
    In this study, we measure the Intrinsic Dimension (ID), i.e., the number of dimensions necessary and sufficient for representation, of word vectors and embedding layers, and quantitatively evaluate their redundancy. Specifically, (1) we estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) we analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. Experiments show that the embedding spaces tend to lie on manifolds of lower dimension than their extrinsic dimensionality, and we observe changes in the redundancy rate as the model scale grows as well as rapid formation of the ID in the early stages of training.
  8. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
    Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 6 pages. 
    [PDF] [Abstract]
    In In-Context Learning (ICL), the selection of demonstrations has a large impact on task performance. Prior work has studied procedures for selecting demonstrations, but the properties of demonstrations that serve as selection criteria have not been sufficiently investigated. In this study, we propose two properties of demonstrations, affinity and diversity, and show that affinity in particular is a desirable property for demonstration selection across multiple models and datasets. Furthermore, we show that demonstrations chosen by existing methods converge toward the direction in which these two properties improve task performance, providing insight into the mechanism linking demonstration selection and task performance.
  9. StaICC: Standardized Evaluation for Classification Task in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Naoya Inoue
    Symposium of Young Researcher Association for NLP Studies (YANS). 2025.  Poster Only.
    [Poster]
  10. Image Feature Vectors are Frozen Informative Tokens for Language Models
    Mariko Kato, Hakaze Cho, Zhenzhu Yan, Yuting Shi, Naoya Inoue
    Symposium of Young Researcher Association for NLP Studies (YANS). 2025.  Poster Only.
  11. Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
    The 260th SIG for Natural Language, Information Processing Society of Japan (SIG-NL260, IPSJ). 2024. 17 pages.  Oral, Research Award for Young Scholars.
    [PDF] [Slides] [Abstract]
    In In-Context Learning (ICL) tasks, the prediction is usually determined by comparing the generation probabilities of the label tokens in the label space, but the choice of those label tokens is made arbitrarily by humans. Several prior studies have shown that calibrating the generation probabilities of these label tokens improves ICL performance, but such methods still leave the problem that humans may choose suboptimal label tokens. In this study, we first (1) analyze the hidden states of LLMs and show that current token-based calibration methods cannot fully express the useful information contained in the hidden states. We then (2) propose a new ICL method that reduces the influence of human label-token selection and makes effective use of the information in the hidden states. In experiments on 3 models and 10 classification datasets, our proposed method outperforms current token-based calibration methods by about 20%.
  12. NoisyICL: A Little Noise in Model Parameters Can Calibrate In-context Learning
    Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages.  Oral.
    [PDF] [Slides] [Abstract]
    In-Context Learning (ICL), where language models learn tasks in a generative form from few-shot demonstrations without parameter update, is emerging while scaling up the language models. Nevertheless, the performance of ICL is still unsatisfactory. Some previous studies suggested that it is due to under-calibration and they fine-tuned language models for better ICL performance with enormous datasets and computing costs. In this paper, we propose NoisyICL, simply perturbing the model parameters by random noises to strive for a calibration. Our experiments on 2 models and 7 downstream task datasets show that NoisyICL helps perform ICL better. Our further analysis indicates that NoisyICL can enable the model to provide more fair predictions, with less unfaithful confidence. So, NoisyICL can be considered as an effective calibration.
  13. Can LLM Learn Prompt Format in In-context Learning?
    Yoshihiro Sakai, Hakaze Cho, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages.  SB Intuitions Awards.
    [PDF] [Abstract]
    In-Context Learning (ICL) is the ability of LLMs to learn a task from a few demonstrations given in the prompt without updating parameters, but its mechanism has not been fully clarified. Experiments in prior work suggest that showing the LLM the format “output the label after the task input” may be particularly important. In this study, we directly visualize how an LLM learns the answer format from the given demonstrations. As a result, we find that (1) the LLM indeed learns the answer format from the demonstrations, (2) the format can be learned even for meaningless labels, and (3) even the worst labels greatly improve the Macro-F1 of ICL.
  14. Find-the-Common: Benchmarking Inductive Reasoning Ability on Vision-Language Models
    Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages. 
    [PDF] [Abstract]
    Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs) have revolutionized the landscape of integrated vision and language understanding. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. So, in this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches including implicit reasoning, symbolic reasoning, and implicit-symbolic reasoning with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC; thus, the FTC is a new challenge for the visual reasoning research community. Our dataset is available online.

Thesis

  1. The Mechanistic Basis of In-context Learning
    Yufeng Zhao
    Ph.D. Dissertation @ Japan Advanced Institute of Science and Technology. 2026. 223 pages.
  2. Fine-tuning with Randomly Initialized Downstream Network: Finding a Stable Convex-loss Region in Parameter Space
    Yufeng Zhao
    Master’s Thesis - Rank A @ Beijing Institute of Technology. 2023. 81 pages.
  3. Synthesis and Self-Assembly of Aggregation-induced Emission Compounds
    Yufeng Zhao
    Bachelor Thesis @ Beijing Institute of Technology. 2021. 52 pages.

Resume

Professional Activities

Peer Review

  • Association for Computational Linguistics Rolling Review (ACL ARR): 2025 (May, July, October), 2026 (January)
  • Conference on Neural Information Processing Systems (NeurIPS): 2025
  • International Conference on Learning Representations (ICLR): 2025, 2026
  • International Conference on Machine Learning (ICML): 2025 Actionable Interpretability Workshop
  • Annual Meeting of the Association for Computational Linguistics (ACL): 2025 Student Research Workshop

Society Member

  • Student Member, The Japanese Association for Natural Language Processing
  • Student Member, The Japanese Society for Artificial Intelligence
  • Association for Computational Linguistics (ACL)

Grants

  • Principal Investigator: Towards Mechanistic Controllability: Circuit-based Behavior Correction for Large Language Models
    RIKEN SPDR Grant, 2026.4 ~ 2029.3, JPY 3,000,000.

Awards

  • Outstanding Paper @ The 31st Annual Conference of the (Japanese) Association for Natural Language Processing (NLP2025, ANLP). 2025. (top 14 in 765, 2.0%)
  • Research Award for Young Scholars @ The 260th SIG for Natural Language, Information Processing Society of Japan (SIG-NL260, IPSJ). 2024.
  • SB Intuitions Awards @ The 30th Annual Conference of the Japanese Association for Natural Language Processing (NLP2024, ANLP). 2024.
  • Monbukagakusho Honors Scholarship @ Japanese Ministry of Education, Culture, Sports, Science and Technology. 2023.
  • Outstanding Oral Presentation @ 2022 Euro-Asia Conference on Frontiers of Computer Science and Information Technology. 2022.
  • GPA Improvement Award @ Beijing Institute of Technology. 2020. I missed many exams in 2019 for medical reasons, so my regular GPA in 2020 was considered a significant improvement.
  • Annual Outstanding Academic (GPA) Scholarship @ Beijing Institute of Technology. 2018, 2019, 2021, 2022, 2023.
  • First Prize @ 30th Chinese (High School) Chemistry Olympiad. 2016.
  • Second Prize @ 29th Chinese (High School) Chemistry Olympiad. 2015.

Hakaze Cho / Yufeng Zhao / 趙 羽風

博士後期課程3年生 @ 北陸先端科学技術大学院大学コンピューティング科学研究領域
リサーチアシスタント @ RebelsNLU,指導教員:井之上 直也 准教授

正式名:Yufeng Zhao(漢字表記:「趙 羽風」,発音は難しいので日本語発音を選択しました)
生年:1999年, 北京生まれ

連絡先

Twitter     GitHub     Researchmap     OpenReview     ORCID     CV
住所:石川県能美市旭台1-1 北陸先端科学技術大学院大学 情報科学研究科 I棟 I-52室
E-mail:yfzhao [at] jaist.ac.jp
このメールアドレスは常に受信できるとは限りません. もし私から返信がない場合は,yfZhao495 [at] outlook.comにも併せて送ってみてください.

紹介

私は中国のトップ大学である北京理工大学を卒業し,2021年に化学の学士号,2023年にソフトウェア工学の修士号を取得しました. 現在はJAISTにて博士課程に在籍しており,2026年3月の早期修了を目指しています. 研究テーマは,人工ニューラルネットワーク,特にTransformerベースのニューラル言語モデルにおける訓練・推論中の内部挙動を,数学的および表現学習の手法によって解明し,その理解に基づく性能向上を目指すものです. 2023年以降,この分野において30本以上の論文および研究発表を発信しており,その中にはICLRやNeurIPSといったトップカンファレンスに採択されたものも含まれます.

研究協力募集. この研究分野に関心のある方との効率的な共同研究を積極的に募集しています. (私が考える「効率的な」の基準は,1年にトップカンファレンスの論文を2〜3本発表することである. これは他人に求めるものではなく,あくまで自分自身に課している目標である.)ご興味をお持ちの方は,ぜひお気軽にご連絡ください. 専門家だけでなく,意欲と学習効率の高い初心者との協力も歓迎します. また,他分野での共同研究についても柔軟に対応いたします.

仕事募集. 2026年4月より開始するフルタイムの雇用契約をすでに締結しており,客員および非常勤での職も歓迎いたします. あわせて,2029年4月開始の准教授職を探しております.

研究関心

キーワード:表現学習,機械論的解釈可能性,文脈内学習

  • 人工ニューラルネットワークの解釈可能性:機械論的解釈可能性(特にTransformer)
    [ICLR 2025] [NeurIPS 2025] [COLING 2025]
  • 人工ニューラルネットワークの制御可能性:低リソースモデル改善 / 機械論的視点からのモデル制御
    [NAACL 2025] [BlackboxNLP 2025]
  • その他:多様体学習,低数値精度ニューラルネットワーク,モデル訓練ダイナミクス
    [ArXiv] [ArXiv]

論文一覧

[Export Publication List as TXT] [Google Scholar] [Semantic Scholar] [DBLP]
発表数: 31, 累積IF: 96.4, 総ページ数: 865.

国際会議

  1. Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning
    Haolin YangHakaze Cho, Yiqiao Zhong, Naoya Inoue
    Annual Conference on Neural Information Processing Systems (NeurIPS). 2025. 52 pages. [h5=371, IF=23.3]
    [OpenReview] [PDF] [arXiv] [Poster] [Github] [Abstract] [Bibtex
    The unusual properties of in-context learning (ICL) have prompted investigations into the internal mechanisms of large language models. Prior work typically focuses on either special attention heads or task vectors at specific layers, but lacks a unified framework linking these components to the evolution of hidden states across layers that ultimately produce the model's output. In this paper, we propose such a framework for ICL in classification tasks by analyzing two geometric factors that govern performance: the separability and alignment of query hidden states. A fine-grained analysis of layer-wise dynamics reveals a striking two-stage mechanism: separability emerges in early layers, while alignment develops in later layers. Ablation studies further show that Previous Token Heads drive separability, while Induction Heads and task vectors enhance alignment. Our findings thus bridge the gap between attention heads and task vectors, offering a unified account of ICL's underlying mechanisms.
    @inproceedings{yang2025unifying,
        title={Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning},
        author={Yang, Haolin and Cho, Hakaze and Zhong, Yiqiao and Inoue, Naoya},
        booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
        year={2025},
        url={https://openreview.net/forum?id=FIfjDqjV0B}
    }
  2. Mechanistic Fine-tuning for In-context Learning
    Hakaze Cho, Peng Luo, Mariko Kato, Rin Kaenbyou, Naoya Inoue
    BlackboxNLP Workshop2025. 28 pages.  Workshop at EMNLP 2025.
    [ACL Anthology] [PDF] [arXiv] [Github] [Abstract] [Bibtex
    In-context Learning (ICL) utilizes structured demonstration-query inputs to induce few-shot learning on Language Models (LMs), which are not originally pre-trained on ICL-style data. To bridge the gap between ICL and pre-training, some approaches fine-tune LMs on large ICL-style datasets by an end-to-end paradigm with massive computational costs. To reduce such costs, in this paper, we propose Attention Behavior Fine-Tuning (ABFT), utilizing the previous findings on the inner mechanism of ICL, building training objectives on the attention scores instead of the final outputs, to force the attention scores to focus on the correct label tokens presented in the context and mitigate attention scores from the wrong label tokens. Our experiments on 9 modern LMs and 8 datasets empirically find that ABFT outperforms in performance, robustness, unbiasedness, and efficiency, with only around 0.01% data cost compared to the previous methods. Moreover, our subsequent analysis finds that the end-to-end training objective contains the ABFT objective, suggesting the implicit bias of ICL-style data to the emergence of induction heads. Our work demonstrates the possibility of controlling specific module sequences within LMs to improve their behavior, opening up the future application of mechanistic interpretability.
    @inproceedings{cho2025mechanistic,
        title={Mechanistic Fine-tuning for In-context Learning},
        author={Cho, Hakaze and Luo, Peng and Kato, Mariko and Kaenbyou, Rin and Inoue, Naoya},
        booktitle = "Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP",
        year={2025},
        url={https://arxiv.org/abs/2505.14233}
    }
  3. Revisiting In-context Learning Inference Circuit in Large Language Models
    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
    International Conference on Learning Representations (ICLR). 2025. 37 pages. [h5=362, IF=48.9]
    [OpenReview] [PDF] [arXiv] [Github] [Poster] [Review] [Abstract] [Bibtex
    In-context Learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) with inner mechanisms un-explored. There are already existing works describing the inner processing of ICL, while they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and try to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into linear representation in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint representations of demonstrations similar to the query representation on a task subspace, and copy the searched representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. Through careful measurements, the proposed inference circuit successfully captures and unifies many fragmented phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis by disabling the proposed steps seriously damages the ICL performance, suggesting the proposed inference circuit is a dominating mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.
    @inproceedings{cho2025revisiting,
        title={Revisiting In-context Learning Inference Circuit in Large Language Models},
        author={Hakaze Cho and Mariko Kato and Yoshihiro Sakai and Naoya Inoue},
        booktitle={The Thirteenth International Conference on Learning Representations},
        year={2025},
        url={https://openreview.net/forum?id=xizpnYNvQq}
    }
  4. Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
    Annual Conference of the Nations of the Americas Chapter of the ACL (NAACL.main). 2025. 24 pages. [h5=126, IF=16.5]
    [ACL Anthology] [PDF] [arXiv] [Github] [Poster] [Review] [Abstract] [Bibtex
    In-Context Learning (ICL) typically utilizes classification criteria from output probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations through translation and constrained rotation applied. To address this problem, we propose Hidden Calibration, which renounces token probabilities and uses the nearest centroid classifier on the LM’s last hidden states. In detail, we assign the label of the nearest centroid previously estimated from a calibration set to the test sample as the predicted label. Our experiments on 6 models and 10 classification datasets indicate that Hidden Calibration consistently outperforms current token-based baselines by about 20%~50%, achieving a strong state-of-the-art in ICL. Our further analysis demonstrates that Hidden Calibration finds better classification criteria with less inter-class overlap, and LMs provide linearly separable intra-class clusters with the help of demonstrations, which supports Hidden Calibration and gives new insights into the principle of ICL. Our official code implementation can be found at https://github.com/hc495/Hidden_Calibration.
    @inproceedings{cho2025token,
        title={Token-based Decision Criteria Are Suboptimal in In-context Learning},
        author={Hakaze Cho and Yoshihiro Sakai and Mariko Kato and Kenshiro Tanaka and Akira Ishii and Naoya Inoue},
        booktitle={Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
        year={2025},
        url={https://aclanthology.org/2025.naacl-long.278/}
    }
  5. Understanding Token Probability Encoding in Output Embeddings
    Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue
    International Conference on Computational Linguistics (COLING). 2025. 16 pages. [h5=81, IF=7.7]
    [ACL Anthology] [PDF] [arXiv] [Poster] [Abstract] [Bibtex
    In this paper, we investigate the output token probability information in the output embedding of language models. We find an approximate common log-linear encoding of output token probabilities within the output embedding vectors and empirically demonstrate that it is accurate and sparse. As a causality examination, we steer the encoding in output embedding to modify the output probability distribution accurately. Moreover, the sparsity we find in output probability encoding suggests that a large number of dimensions in the output embedding do not contribute to causal language modeling. Therefore, we attempt to delete the output-unrelated dimensions and find more than 30% of the dimensions can be deleted without significant movement in output distribution and sequence generation. Additionally, in the pre-training dynamics of language models, we find that the output embeddings capture the corpus token frequency information in early steps, even before an obvious convergence of parameters starts.
    @inproceedings{cho2025understanding,
        title={Understanding Token Probability Encoding in Output Embeddings},
        author={Hakaze Cho and Yoshihiro Sakai and Kenshiro Tanaka and Mariko Kato and Naoya Inoue},
        booktitle={Proceedings of the 31st International Conference on Computational Linguistics},
        year={2025},
        url={https://aclanthology.org/2025.coling-main.708/}
    }
  6. Find-the-Common: A Benchmark for Explaining Visual Patterns from Images
    Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
    International Conference on Language Resources and Evaluation (LREC). 2024. 7 pages. [h5=68]
    [ACL Anthology] [PDF] [Abstract] [Bibtex
    Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs), such as GPT-4V and InstructBLIP, have prompted some studies have started an in-depth analysis of the reasoning capabilities of IVLMs. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. In this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches including Image-Based Reasoning, Text-Based Reasoning, and Image-Text-Based Reasoning with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only archive with 48% accuracy on the FTC, for which, the FTC is a new challenge for the visual reasoning research community. Our dataset has been released and is available online: https://github.com/SSSSSeki/Find-the-common.
    @inproceedings{shi2024find,
        title={Find-the-Common: A Benchmark for Explaining Visual Patterns from Images},
        author={Yuting Shi and Naoya Inoue and Houjing Wei and Yufeng Zhao and Tao Jin},
        booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
        year={2024},
        url={https://aclanthology.org/2024.lrec-main.642/}
    }

プレプリント

  1. Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
    Haolin YangHakaze ChoNaoya Inoue
    Pre-print. 2025. 45 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex
    We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework based on Task Subspace Logit Attribution (TSLA) to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Using steering experiments with geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within the subspace toward the correct label to facilitate prediction. We further show how previous findings on ICL mechanisms, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how large language models execute ICL across diverse tasks and settings.
    @article{yang2025localizingtaskrecognitiontask,
        title={Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis},
        author={Yang, Haolin and Cho, Hakaze and Inoue, Naoya},
        journal={arXiv preprint arXiv:2509.24164},
        year={2025}
    }
  2. Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight
    Haolin YangHakaze ChoKaize DingNaoya Inoue
    Pre-print. 2025. 48 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex
    Large Language Models (LLMs) can perform new tasks from in-context demonstrations, a phenomenon known as in-context learning (ICL). Recent work suggests that these demonstrations are compressed into task vectors (TVs), compact task representations that LLMs exploit for predictions. However, prior studies typically extract TVs from model outputs or hidden states using cumbersome and opaque methods, and they rarely elucidate the mechanisms by which TVs influence computation. In this work, we address both limitations. First, we propose directly training Learned Task Vectors (LTVs), which surpass extracted TVs in accuracy and exhibit superior flexibility-acting effectively at arbitrary layers, positions, and even with ICL prompts. Second, through systematic analysis, we investigate the mechanistic role of TVs, showing that at the low level they steer predictions primarily through attention-head OV circuits, with a small subset of 'key heads' most decisive. At a higher level, we find that despite Transformer nonlinearities, TV propagation is largely linear: early TVs are rotated toward task-relevant subspaces to improve logits of relevant labels, while later TVs are predominantly scaled in magnitude. Taken together, LTVs not only provide a practical approach for obtaining effective TVs but also offer a principled lens into the mechanistic foundations of ICL.
    @article{yang2025taskvectorslearnedextracted,
        title={Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight},
        author={Yang, Haolin and Cho, Hakaze and Ding, Kaize and Inoue, Naoya},
        journal={arXiv preprint arXiv:2509.24169},
        year={2025}
    }
  3. Binary Autoencoder for Mechanistic Interpretability of Large Language Models
    Hakaze ChoHaolin YangBrian M. KurkoskiNaoya Inoue
    Pre-print. 2025. 36 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex
    Existing works are dedicated to untangling atomized numerical components (features) from the hidden states of Large Language Models (LLMs) for interpreting their mechanism. However, they typically rely on autoencoders constrained by some implicit training-time regularization on single training instances (i.e., normalization, top-k function, etc.), without an explicit guarantee of global sparsity among instances, causing a large amount of dense (simultaneously inactive) features, harming the feature sparsity and atomization. In this paper, we propose a novel autoencoder variant that enforces minimal entropy on minibatches of hidden activations, thereby promoting feature independence and sparsity across instances. For efficient entropy calculation, we discretize the hidden activations to 1-bit via a step function and apply gradient estimation to enable backpropagation, so that we term it as Binary Autoencoder (BAE) and empirically demonstrate two major applications: (1) Feature set entropy calculation. Entropy can be reliably estimated on binary hidden activations, which we empirically evaluate and leverage to characterize the inference dynamics of LLMs and In-context Learning. (2) Feature untangling. Similar to typical methods, BAE can extract atomized features from LLM's hidden states. To robustly evaluate such feature extraction capability, we refine traditional feature-interpretation methods to avoid unreliable handling of numerical tokens, and show that BAE avoids dense features while producing the largest number of interpretable ones among baselines, which confirms the effectiveness of BAE serving as a feature extractor.
    @article{cho2025binary,
        title={Binary Autoencoder for Mechanistic Interpretability of Large Language Models},
        author={Cho, Hakaze and Yang, Haolin and Kurkoski, Brian M. and Inoue, Naoya},
        journal={arXiv preprint arXiv:2509.20997},
        year={2025}
    }
  4. Mechanism of Task-oriented Information Removal in In-context Learning
    Hakaze Cho, Haolin Yang, Gouki Minegishi, Naoya Inoue
    Pre-print. 2025. 87 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex
    In-context Learning (ICL) is an emerging few-shot learning paradigm based on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate the mechanism through a novel perspective of information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in hidden states containing information for all possible tasks, leading to arbitrary outputs without focusing on the intended task, resulting in near-zero accuracy. Meanwhile, we find that selectively removing specific information from hidden states by a low-rank filter effectively steers LMs toward the intended task. Building on these findings, by measuring the hidden states on carefully designed metrics, we observe that few-shot ICL effectively simulates such task-oriented information removal processes, selectively removing the redundant information from entangled non-selective representations, and improving the output based on the demonstrations, which constitutes a key mechanism underlying ICL. Moreover, we identify essential attention heads inducing the removal operation, termed Denoising Heads, which enables the ablation experiments blocking the information removal operation from the inference, where the ICL accuracy significantly degrades, especially when the correct label is absent from the few-shot demonstrations, confirming both the critical role of the information removal mechanism and denoising heads.
    @article{cho2025mechanism,
        title={Mechanism of Task-oriented Information Removal in In-context Learning},
        author={Cho, Hakaze and Yang, Haolin and Minegishi, Gouki and Inoue, Naoya},
        journal={arXiv preprint arXiv:2509.21012},
        year={2025}
    }
  5. Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Pre-print. 2025. 6 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex
    In this study, we measure the Intrinsic Dimension (ID) of token embedding to estimate the intrinsic dimensions of the manifolds spanned by the representations, so as to evaluate their redundancy quantitatively compared to their extrinsic dimensionality. In detail, (1) we estimate the ID of token embeddings in small-scale language models and also modern large language models, finding that the embedding spaces often reside on lower-dimensional manifolds compared to their extrinsic dimensionality; (2) we measure the ID across various model sizes and observe an increase in redundancy rates as the model scale grows; (3) we measure the dynamics of IDs during the training process, and find a rapid ID drop in the early stages of training. Moreover, (4) when LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated IDs, suggesting that the ID can serve as a useful guideline for LoRA application.
    @article{kataiwa2025measuring,
        title={Measuring Intrinsic Dimension of Token Embeddings},
        author={Kataiwa, Takuya and Cho, Hakaze and Ohki, Tetsushi},
        journal={arXiv preprint arXiv:2503.02142},
        year={2025}
    }
  6. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
    Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
    Pre-print. 2025. 8 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex
    The performance of In-Context Learning (ICL) is highly sensitive to the selected demonstrations. Existing approaches to demonstration selection optimize different objectives, yielding inconsistent results. To address this, we propose a unified metric--affinity and diversity--that leverages ICL model's internal representations. Our experiments show that both affinity and diversity strongly correlate with test accuracies, indicating their effectiveness for demonstration selection. Moreover, we show that our proposed metrics align well with various previous works to unify the inconsistency.
    @article{kato2025affinity,
        title={Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations},
        author={Kato, Mariko and Cho, Hakaze and Sakai, Yoshihiro and Inoue, Naoya},
        journal={arXiv preprint arXiv:2502.14380},
        year={2025}
    }
  7. StaICC: Standardized Evaluation for Classification Task in In-context Learning
    Hakaze Cho, Naoya Inoue
    Pre-print. 2025. 20 pages. 
    [PDF] [arXiv] [Github] [PyPI] [Abstract] [Bibtex
    Classification tasks are widely investigated in the In-Context Learning (ICL) paradigm. However, current efforts are evaluated on disjoint benchmarks and settings, while their performances are significantly influenced by some trivial variables, such as prompt templates, data sampling, instructions, etc., which leads to significant inconsistencies in the results reported across various literature, preventing fair comparison or meta-analysis across different papers. Therefore, this paper proposes a standardized and easy-to-use evaluation toolkit (StaICC) for in-context classification. Including, for the normal classification task, we provide StaICC-Normal, selecting 10 widely used datasets, and generating prompts with a fixed form, to mitigate the variance among the experiment implementations. To enrich the usage of our benchmark, we also provide a sub-benchmark StaICC-Diag for diagnosing ICL from several aspects, aiming for a more robust inference processing.
    @article{cho2025staicc,
        title={StaICC: Standardized Evaluation for Classification Task in In-context Learning},
        author={Cho, Hakaze and Inoue, Naoya},
        journal={arXiv preprint arXiv:2501.15708},
        year={2025}
    }
  8. NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning
    Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
    Pre-print. 2024. 20 pages. 
    [PDF] [arXiv] [Github] [Abstract] [Bibtex
    In-Context Learning (ICL) is suffering from unsatisfactory performance and under-calibration due to high prior bias and unfaithful confidence. Some previous works fine-tuned language models for better ICL performance with enormous datasets and computing costs. In this paper, we propose NoisyICL, simply perturbing the model parameters by random noises to strive for better performance and calibration. Our experiments on two models and 12 downstream datasets show that NoisyICL can help ICL produce more accurate predictions. Our further analysis indicates that NoisyICL enables the model to provide more fair predictions, and also with more faithful confidence. Therefore, we believe that NoisyICL is an effective calibration of ICL. Our experimental code is uploaded to Github.
    @article{zhao2024noisyicl,
        title={NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning},
        author={Zhao, Yufeng and Sakai, Yoshihiro and Inoue, Naoya},
        journal={arXiv preprint arXiv:2402.05515},
        year={2024}
    }
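
The NoisyICL recipe above boils down to "add a small random perturbation to every model parameter, then run ICL as usual." Below is a minimal sketch of that idea, assuming Gaussian noise with a single scale hyperparameter lam and a Hugging Face causal LM; the exact noise distribution and scaling used in the paper may differ, so this is an illustration rather than the authors' released implementation (see the linked GitHub for that).

    # Minimal sketch of the NoisyICL idea: perturb every parameter with small
    # random noise before in-context inference. Gaussian noise with a single
    # scale `lam` is an illustrative assumption, not necessarily the paper's recipe.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def add_parameter_noise(model: torch.nn.Module, lam: float = 0.01, seed: int = 0) -> None:
        """In place: theta <- theta + lam * eps, with eps ~ N(0, I) per parameter tensor."""
        gen = torch.Generator().manual_seed(seed)
        with torch.no_grad():
            for p in model.parameters():
                noise = torch.randn(p.shape, generator=gen, dtype=p.dtype)
                p.add_(lam * noise.to(p.device))

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model.eval()
    add_parameter_noise(model, lam=0.01)

    # A toy 1-shot ICL prompt; the decision compares label-token logits.
    prompt = "Review: great movie!\nSentiment: positive\nReview: boring plot.\nSentiment:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    for label in [" positive", " negative"]:
        token_id = tokenizer(label, add_special_tokens=False).input_ids[0]
        print(label.strip(), logits[token_id].item())

Here lam = 0.01 is only a placeholder; in practice the noise scale would be tuned per model and dataset.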

Domestic Conferences, Journals, and Others

† = domestic (Japanese) re-publication of an international conference paper; by default: not peer-reviewed, ▲ = peer-reviewed
  1. Conference Note: Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Naoya Inoue
    Journal of Natural Language Processing (JNLP). 2025. 6 pages. 
    [PDF
  2. ▲†Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract
    In this paper, we measure the Intrinsic Dimension (ID), the number of dimensions necessary and sufficient to represent word vectors and embedding layers, and quantitatively evaluate their degree of redundancy. Specifically, we (1) estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. The experiments show that the embedding spaces tend to lie on manifolds of lower dimension than their extrinsic dimensionality. We also observe changes in the redundancy rate as the model scale grows, as well as a rapid convergence of the ID in the early stages of training. Furthermore, we show that the estimated ID may serve as a useful guide for rank selection when applying LoRA.
  3. Analysis of Internal Representations of Knowledge with Expressions of Familiarity
    Kenshiro Tanaka, Yoshihiro Sakai, Hakaze Cho, Naoya Inoue, Kai Sato, Ryosuke Takahashi, Benjamin Heinzerling, Kentaro Inui
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract
    Research on the ability of Large Language Models (LLMs) to judge whether knowledge is already known is progressing, but it has not been examined whether, after learning knowledge accompanied by familiarity-indicating expressions such as "It is known that…", LLMs can judge the familiarity of that knowledge at inference time. In this study, we train pre-trained LLMs on knowledge descriptions annotated with familiarity-indicating expressions and analyze the internal representations of that knowledge to examine how familiarity can be represented inside LLMs. The results reveal that (1) the internal representations of knowledge retain familiarity information separately for each expression attached during training, and (2) familiarity information is retained separately for each position at which the expression is written. This work provides a foothold for elucidating the mechanism behind LLMs' ability to judge familiarity.
  4. Internal Representations of Knowledge Recognition in Language Models
    Kai Sato, Ryosuke Takahashi, Benjamin Heinzerling, Kenshiro Tanaka, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue, Kentaro Inui
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract
    The knowledge acquisition ability of Language Models (LMs) has been widely studied, but the mechanism by which they judge whether acquired knowledge is already known is not well understood. In this study, we compare and analyze the internal states of LMs when generating outputs for specific knowledge and when judging its familiarity. The results show that LMs can indeed possess the ability to make familiarity judgments, and reveal that (1) once knowledge has been learned, the information needed to judge its familiarity already exists in the internal representations, and (2) LMs exhibit different activation patterns for knowledge judged as known versus unknown. These findings provide clues toward understanding the familiarity-judgment mechanism of LMs.
  5. Revisiting In-context Learning Inference Circuit in Large Language Models
    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 6 pages.  Oral, Outstanding Paper.
    [PDF] [Slides] [Abstract
    In-context Learning (ICL) has attracted attention as a new few-shot learning paradigm for language models, but its internal mechanism is not yet well understood. In this work, we decompose the inference dynamics of ICL into three basic operations, construct an inference circuit on top of them, perform precise measurements, and attempt to provide a unified explanation of phenomena observed in prior studies. Furthermore, ablation analyses that disable the proposed circuit confirm a marked drop in ICL performance, suggesting that the proposed inference circuit is a dominant mechanism of ICL.
  6. Beyond the Induction Circuit: A Mechanistic Prototype for Out-of-domain In-context Learning
    Hakaze Cho, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 5 pages. 
    [PDF] [Poster] [Abstract
    In-context Learning (ICL) is a promising few-shot learning paradigm with unclear mechanisms. Existing explanations heavily rely on Induction Heads, which fail to account for out-of-domain ICL, where query labels are absent in demonstrations. To address this, we model ICL as attribute resolution, where queries are mixtures of some attributes, and ICL identifies and resolves relevant attributes for predictions. In this paper, we propose a mechanistic prototype using toy models trained on synthetic data, and observe: (1) even 1-layer Transformers achieve non-trivial accuracy, with limited benefit from additional demonstrations, (2) scaling models effectively improves accuracy, and (3) inference operations can be decomposed into label space identification and generalized induction, warranting further exploration.
  7. Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 5 pages. 
    [PDF] [Abstract
    In this study, we measure the Intrinsic Dimension (ID), the number of dimensions necessary and sufficient to represent word vectors and embedding layers, and quantitatively evaluate their degree of redundancy. Specifically, we (1) estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. The experiments show that the embedding spaces tend to lie on manifolds of lower dimension than their extrinsic dimensionality. We also observe changes in the redundancy rate as models scale up and a rapid formation of the ID in the early stages of training.
  8. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
    Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 6 pages. 
    [PDF] [Abstract
    In In-Context Learning (ICL), the choice of demonstrations strongly affects task performance. Existing studies have examined procedures for selecting demonstrations, but the properties of demonstrations that should guide selection have not been sufficiently investigated. In this work, we newly propose two properties of demonstrations, "affinity" and "diversity", and show that affinity in particular is a desirable property for demonstration selection across multiple models and datasets. We further show that demonstrations chosen by existing methods concentrate in the direction in which the two properties improve task performance, providing insights toward understanding the mechanism linking demonstration selection and task performance.
  9. StaICC: Standardized Evaluation for Classification Task in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Naoya Inoue
    Symposium of Young Researcher Association for NLP Studies (YANS). 2025.  Poster Only.
    [Poster
  10. Image Feature Vectors are Frozen Informative Tokens for Language Models
    Mariko Kato, Hakaze Cho, Zhenzhu Yan, Yuting Shi, Naoya Inoue
    Symposium of Young Researcher Association for NLP Studies (YANS). 2025.  Poster Only.
  11. Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
    The 260th SIG for Natural Language, Information Processing Society of Japan (SIG-NL260, IPSJ). 2024. 17 pages.  Oral, Research Award for Young Scholars.
    [PDF] [Slides] [Abstract
    In In-Context Learning (ICL) tasks, the prediction is usually decided by comparing the generation probabilities of the label tokens in the label space, but those label tokens are chosen arbitrarily by humans. Several prior studies have shown that calibrating the generation probabilities of these label tokens improves ICL performance, yet these methods still suffer from the problem that humans may choose suboptimal label tokens. In this work, we first (1) analyze the hidden states of LLMs and show that current token-based calibration methods fail to express the useful information contained in the hidden states. We then (2) propose a new ICL method that reduces the influence of human label-token selection and effectively exploits the useful information in the hidden states. In experiments on 3 models and 10 classification datasets, our method outperforms current token-based calibration methods by about 20%. (A minimal illustrative sketch of the underlying hidden-state decision rule appears after this list.)
  12. NoisyICL: A Little Noise in Model Parameters Can Calibrate In-context Learning
    Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages.  Oral.
    [PDF] [Slides] [Abstract
    In-Context Learning (ICL), where language models learn tasks in a generative form from few-shot demonstrations without parameter update, is emerging while scaling up the language models. Nevertheless, the performance of ICL is still unsatisfactory. Some previous studies suggested that it is due to under-calibration and they fine-tuned language models for better ICL performance with enormous datasets and computing costs. In this paper, we propose NoisyICL, simply perturbing the model parameters by random noises to strive for a calibration. Our experiments on 2 models and 7 downstream task datasets show that NoisyICL helps perform ICL better. Our further analysis indicates that NoisyICL can enable the model to provide more fair predictions, with less unfaithful confidence. So, NoisyICL can be considered as an effective calibration.
  13. Can LLM Learn Prompt Format in In-context Learning?
    Yoshihiro Sakai, Hakaze Cho, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages.  SB Intuitions Awards.
    [PDF] [Abstract
    In-Context Learning (ICL) is the ability of LLMs to learn tasks from a small number of demonstrations given in the prompt without updating parameters, but its mechanism has not been sufficiently clarified. Experiments in prior work suggest that showing the LLM the format "output the label after the task input" may be particularly important. In this study, we directly visualize how LLMs learn the answer format from the given demonstrations. As a result, we find that (1) LLMs do indeed learn the answer format from the demonstrations, (2) format learning is possible even with meaningless labels, and (3) even the worst labels substantially improve the Macro-F1 of ICL.
  14. Find-the-Common: Benchmarking Inductive Reasoning Ability on Vision-Language Models
    Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages. 
    [PDF] [Abstract
    Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs) have revolutionized the landscape of integrated vision and language understanding. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. So, in this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches including implicit reasoning, symbolic reasoning, and implicit-symbolic reasoning with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC, making the FTC a new challenge for the visual reasoning research community. Our dataset is available online.
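
Item 11 above (and its NAACL 2025 version, Hidden Calibration) replaces the usual label-token probability comparison with a nearest-centroid decision on the LM's last hidden state: one centroid per class is estimated from a small calibration set, and a query is assigned to the closest centroid. The sketch below illustrates only this decision rule; the model, layer, pooling position, distance, and calibration prompts are assumptions made for the example, not the paper's exact setup.

    # Minimal sketch of a nearest-centroid decision rule on last-token hidden
    # states, in the spirit of Hidden Calibration (item 11). The model, layer,
    # distance, and calibration prompts below are illustrative assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model.eval()

    def last_hidden(prompt: str) -> torch.Tensor:
        """Final-layer hidden state at the last input position, shape (hidden_dim,)."""
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        return out.hidden_states[-1][0, -1]

    # 1) Estimate one centroid per class from a small labelled calibration set.
    calibration = {
        "positive": ["Review: great movie!\nSentiment:", "Review: loved every minute.\nSentiment:"],
        "negative": ["Review: boring plot.\nSentiment:", "Review: awful acting.\nSentiment:"],
    }
    centroids = {
        label: torch.stack([last_hidden(p) for p in prompts]).mean(dim=0)
        for label, prompts in calibration.items()
    }

    # 2) Classify a query by its nearest centroid (Euclidean distance), instead of
    #    comparing the probabilities of manually chosen label tokens.
    query = "Review: a touching, well-acted film.\nSentiment:"
    h = last_hidden(query)
    prediction = min(centroids, key=lambda label: torch.norm(h - centroids[label]).item())
    print(prediction)

Reading the final-layer hidden state at the last prompt position mirrors where the label token would otherwise be predicted; any labelled calibration split can replace the toy prompts above.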

Theses

  1. The Mechanistic Basis of In-context Learning (大規模言語モデルにおける文脈内学習のメカニズム的基盤)
    Yufeng Zhao
    Doctoral Thesis @ Japan Advanced Institute of Science and Technology. 2026. 223 pages.
  2. Fine-tuning with Randomly Initialized Downstream Network: Finding a Stable Convex-loss Region in Parameter Space
    Yufeng Zhao
    Master's Thesis (Grade A) @ Beijing Institute of Technology. 2023. 81 pages.
  3. Synthesis and Self-Assembly of Aggregation-induced Emission Compounds
    Yufeng Zhao
    Bachelor's Thesis @ Beijing Institute of Technology. 2021. 52 pages.

History

Academic Activities

Paper Reviewing

  • Association for Computational Linguistics Rolling Review (ACL ARR): 2025 (May, July, October), 2026 (January)
  • Conference on Neural Information Processing Systems (NeurIPS): 2025
  • International Conference on Learning Representations (ICLR): 2025, 2026
  • International Conference on Machine Learning (ICML): 2025 Actionable Interpretability Workshop
  • Annual Meeting of the Association for Computational Linguistics (ACL): 2025 Student Research Workshop

Academic Society Memberships

  • Association for Natural Language Processing (Japan), Student Member
  • Japanese Society for Artificial Intelligence, Student Member
  • Association for Computational Linguistics (ACL)

Competitive Research Funding

  • Principal Investigator: Towards Mechanistic Controllability: Circuit-based Behavior Correction for Large Language Models
    RIKEN Special Postdoctoral Researcher (基礎科学特別研究員) Research Fund, April 2026 – March 2029, 3,000,000 JPY.

Awards

  • Outstanding Paper Award @ The 31st Annual Conference of the Association for Natural Language Processing (NLP2025, ANLP), 2025 (top 15 of 765 submissions, 2.0%)
  • Research Award for Young Scholars @ The 260th SIG for Natural Language, Information Processing Society of Japan (SIG-NL260, IPSJ), 2024
  • Sponsor Award (SB Intuitions Award) @ The 30th Annual Conference of the Association for Natural Language Processing (NLP2024), 2024
  • MEXT Honors Scholarship for Privately-Financed International Students @ Ministry of Education, Culture, Sports, Science and Technology (MEXT), 2023
  • Outstanding Oral Presentation Award @ 2022 Euro-Asia Conference on Frontiers of Computer Science and Information Technology, 2022
  • GPA Improvement Award @ Beijing Institute of Technology, 2020 (I missed many exams in 2019 due to health problems, so my regular GPA in 2020 was regarded as a marked improvement)
  • Annual Academic Excellence (GPA) Award @ Beijing Institute of Technology: 2018, 2019, 2021, 2022, 2023
  • First Prize @ The 30th Chinese (High School) Chemistry Olympiad. 2016.
  • Second Prize @ The 29th Chinese (High School) Chemistry Olympiad. 2015.

Copyright © 2025 Hakaze Cho / Yufeng Zhao. All rights reserved. Icon generated by StableDiffusion.
Generated on 2025-12-30 05:53:31 +0900, powered by Github Pages and Jekyll, template designed and programmed by Hakaze Cho.