趙羽風

@ Beijing Institute of Technology, 2023

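Second-year Ph.D. student @ Japan Advanced Institute of Science and Technology (JAIST), Computing Science Research Area
Research assistant and mentor @ RebelsNLU, Advisor: Assoc. Prof. Naoya Inoue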

Also known as: Yufeng Zhao (written in kanji as 趙羽風)
Born: 1999, Beijing

E-mail: yfzhao [at] jaist.ac.jp
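Links: Twitter GitHub Google Scholar ORCID Researchmap Semantic Scholar Blog
Address: Room I-52, Building I, Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa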

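I graduated from Beijing Institute of Technology, a top university in China, earning a bachelor's degree in chemistry in 2021 and a master's degree in software engineering in 2023. I am currently enrolled in the doctoral program at JAIST and aim to complete it early, in March 2026.
My research aims to elucidate the internal behavior of artificial neural networks, particularly Transformer-based neural language models, during training and inference, using mathematical and representation-learning methods, and to improve performance based on that understanding.
Since 2023, I have produced more than 20 papers and research presentations in this field, including some accepted at top conferences such as ICLR and NAACL.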

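I am actively seeking collaborations with anyone interested in this research area. If you are interested, please feel free to contact me. I welcome collaboration not only with experts but also with motivated beginners who learn efficiently. I am also open to collaborations in other fields.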

English Site

Research Interests

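Keywords: representation learning, mechanistic interpretability, in-context learning

Interpretability of artificial neural networks: mechanistic interpretability, low-resource model control
Large language models: internal principles and improvement of Transformer-based large language models
Others: manifold learning, low-precision neural networks, model training dynamics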

Publications

Total Publications: 27, Cumulative IF: 73.1, Total Pages: 434.

International Conferences

Revisiting In-context Learning Inference Circuit in Large Language Models
Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
International Conference on Learning Representations (ICLR). 2025. 37 pages. [h5=304, IF=48.9]
[OpenReview] [PDF] [arXiv] [Github] [Poster] [Abstract] [Bibtex]

In-context Learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) with inner mechanisms un-explored. There are already existing works describing the inner processing of ICL, while they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and try to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into linear representation in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint representations of demonstrations similar to the query representation on a task subspace, and copy the searched representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. Through careful measurements, the proposed inference circuit successfully captures and unifies many fragmented phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis by disabling the proposed steps seriously damages the ICL performance, suggesting the proposed inference circuit is a dominating mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.

@inproceedings{cho2025revisiting,
    title={Revisiting In-context Learning Inference Circuit in Large Language Models},
    author={Hakaze Cho and Mariko Kato and Yoshihiro Sakai and Naoya Inoue},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=xizpnYNvQq}
}
Token-based Decision Criteria Are Suboptimal in In-context Learning
Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
Annual Conference of the Nations of the Americas Chapter of the ACL (NAACL.main). 2025. 24 pages. [h5=132, IF=16.5]
[ACL Anthology] [PDF] [arXiv] [Github] [Poster] [Abstract] [Bibtex]

In-Context Learning (ICL) typically utilizes classification criteria from output probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations through translation and constrained rotation applied. To address this problem, we propose Hidden Calibration, which renounces token probabilities and uses the nearest centroid classifier on the LM’s last hidden states. In detail, we assign the label of the nearest centroid previously estimated from a calibration set to the test sample as the predicted label. Our experiments on 6 models and 10 classification datasets indicate that Hidden Calibration consistently outperforms current token-based baselines by about 20%~50%, achieving a strong state-of-the-art in ICL. Our further analysis demonstrates that Hidden Calibration finds better classification criteria with less inter-class overlap, and LMs provide linearly separable intra-class clusters with the help of demonstrations, which supports Hidden Calibration and gives new insights into the principle of ICL. Our official code implementation can be found at https://github.com/hc495/Hidden_Calibration.

@inproceedings{cho2025token,
    title={Token-based Decision Criteria Are Suboptimal in In-context Learning},
    author={Hakaze Cho and Yoshihiro Sakai and Mariko Kato and Kenshiro Tanaka and Akira Ishii and Naoya Inoue},
    booktitle={Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
    year={2025},
    url={https://aclanthology.org/2025.naacl-long.278/}
}
Understanding Token Probability Encoding in Output Embeddings
Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue
International Conference on Computational Linguistics (COLING). 2025. 16 pages. [h5=65, IF=7.7]
[ACL Anthology] [PDF] [arXiv] [Poster] [Abstract] [Bibtex]

In this paper, we investigate the output token probability information in the output embedding of language models. We find an approximate common log-linear encoding of output token probabilities within the output embedding vectors and empirically demonstrate that it is accurate and sparse. As a causality examination, we steer the encoding in output embedding to modify the output probability distribution accurately. Moreover, the sparsity we find in output probability encoding suggests that a large number of dimensions in the output embedding do not contribute to causal language modeling. Therefore, we attempt to delete the output-unrelated dimensions and find more than 30% of the dimensions can be deleted without significant movement in output distribution and sequence generation. Additionally, in the pre-training dynamics of language models, we find that the output embeddings capture the corpus token frequency information in early steps, even before an obvious convergence of parameters starts.

@inproceedings{cho2025understanding,
    title={Understanding Token Probability Encoding in Output Embeddings},
    author={Hakaze Cho and Yoshihiro Sakai and Kenshiro Tanaka and Mariko Kato and Naoya Inoue},
    booktitle={Proceedings of the 31st International Conference on Computational Linguistics},
    year={2025},
    url={https://aclanthology.org/2025.coling-main.708/}
}
Find-the-Common: A Benchmark for Explaining Visual Patterns from Images
Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
International Conference on Language Resources and Evaluation (LREC). 2024. 7 pages. [h5=59]
[ACL Anthology] [PDF] [Abstract] [Bibtex]

Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs), such as GPT-4V and InstructBLIP, have prompted some studies to start an in-depth analysis of the reasoning capabilities of IVLMs. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. In this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches including Image-Based Reasoning, Text-Based Reasoning, and Image-Text-Based Reasoning with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC, which makes the FTC a new challenge for the visual reasoning research community. Our dataset has been released and is available online: https://github.com/SSSSSeki/Find-the-common.

@inproceedings{shi2024find,
    title={Find-the-Common: A Benchmark for Explaining Visual Patterns from Images},
    author={Yuting Shi and Naoya Inoue and Houjing Wei and Yufeng Zhao and Tao Jin},
    booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    year={2024},
    url={https://aclanthology.org/2024.lrec-main.642/}
}
Methods to Enhance BERT in Aspect-Based Sentiment Classification
Yufeng Zhao, Evelyn Soerjodjojo, et al.
IEEE Euro-Asia Conference on Frontiers of Computer Science and Information Technology. 2022. 7 pages. Outstanding Oral Presentation Award.
[PDF] [Abstract] [Bibtex]

BERT is a widely used pre-trained model in Natural Language Processing tasks, including Aspect-Based sentiment classification. BERT is equipped with sufficient prior language knowledge in the enormous amount of pre-trained model parameters, for which the fine-tuning of BERT has become a critical issue. Previous works mainly focused on specialized downstream networks or additional knowledge to fine-tune the BERT to the sentiment classification tasks. In this paper, we design experiments to find the fine-tuning techniques that can be used by all models with BERT in the Aspect-Based Sentiment Classification tasks. Through these experiments, we verify different feature extraction, regularization, and continual learning methods, then we summarize 8 universally applicable conclusions to enhance the training and performance of the BERT model.

@inproceedings{zhao2022methods,
    title={Methods to enhance bert in aspect-based sentiment classification},
    author={Zhao, Yufeng and Soerjodjojo, Evelyn and Che, Haiying},
    booktitle={2022 Euro-Asia Conference on Frontiers of Computer Science and Information Technology (FCSIT)},
    pages={21--27},
    year={2022},
    organization={IEEE}
}

Preprints

Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning
Haolin Yang, Hakaze Cho, Yiqiao Zhong, Naoya Inoue
Pre-print. 2025. 45 pages.
[PDF] [arXiv] [Abstract] [Bibtex]

The unusual properties of in-context learning (ICL) have prompted investigations into the internal mechanisms of large language models. Prior work typically focuses on either special attention heads or task vectors at specific layers, but lacks a unified framework linking these components to the evolution of hidden states across layers that ultimately produce the model's output. In this paper, we propose such a framework for ICL in classification tasks by analyzing two geometric factors that govern performance: the separability and alignment of query hidden states. A fine-grained analysis of layer-wise dynamics reveals a striking two-stage mechanism: separability emerges in early layers, while alignment develops in later layers. Ablation studies further show that Previous Token Heads drive separability, while Induction Heads and task vectors enhance alignment. Our findings thus bridge the gap between attention heads and task vectors, offering a unified account of ICL's underlying mechanisms.

@article{yang2025unifying,
    title={Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning},
    author={Yang, Haolin and Cho, Hakaze and Zhong, Yiqiao and Inoue, Naoya},
    journal={arXiv preprint arXiv:2505.18752},
    year={2025}
}
Mechanistic Fine-tuning for In-context Learning
Hakaze Cho, Peng Luo, Mariko Kato, Rin Kaenbyou, Naoya Inoue
Pre-print. 2025. 28 pages.
[PDF] [arXiv] [Abstract] [Bibtex]

In-context Learning (ICL) utilizes structured demonstration-query inputs to induce few-shot learning on Language Models (LMs), which are not originally pre-trained on ICL-style data. To bridge the gap between ICL and pre-training, some approaches fine-tune LMs on large ICL-style datasets by an end-to-end paradigm with massive computational costs. To reduce such costs, in this paper, we propose Attention Behavior Fine-Tuning (ABFT), utilizing the previous findings on the inner mechanism of ICL, building training objectives on the attention scores instead of the final outputs, to force the attention scores to focus on the correct label tokens presented in the context and mitigate attention scores from the wrong label tokens. Our experiments on 9 modern LMs and 8 datasets empirically find that ABFT outperforms in performance, robustness, unbiasedness, and efficiency, with only around 0.01% data cost compared to the previous methods. Moreover, our subsequent analysis finds that the end-to-end training objective contains the ABFT objective, suggesting the implicit bias of ICL-style data to the emergence of induction heads. Our work demonstrates the possibility of controlling specific module sequences within LMs to improve their behavior, opening up the future application of mechanistic interpretability.

@article{cho2025mechanistic,
    title={Mechanistic Fine-tuning for In-context Learning},
    author={Cho, Hakaze and Luo, Peng and Kato, Mariko and Kaenbyou, Rin and Inoue, Naoya},
    journal={arXiv preprint arXiv:2505.14233},
    year={2025}
}
Measuring Intrinsic Dimension of Token Embeddings
Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
Pre-print. 2025. 6 pages.
[PDF] [arXiv] [Abstract] [Bibtex]

In this study, we measure the Intrinsic Dimension (ID) of token embedding to estimate the intrinsic dimensions of the manifolds spanned by the representations, so as to evaluate their redundancy quantitatively compared to their extrinsic dimensionality. In detail, (1) we estimate the ID of token embeddings in small-scale language models and also modern large language models, finding that the embedding spaces often reside on lower-dimensional manifolds compared to their extrinsic dimensionality; (2) we measure the ID across various model sizes and observe an increase in redundancy rates as the model scale grows; (3) we measure the dynamics of IDs during the training process, and find a rapid ID drop in the early stages of training. Moreover, (4) when LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated IDs, suggesting that the ID can serve as a useful guideline for LoRA application.

@article{kataiwa2025measuring,
    title={Measuring Intrinsic Dimension of Token Embeddings},
    author={Kataiwa, Takuya and Cho, Hakaze and Ohki, Tetsushi},
    journal={arXiv preprint arXiv:2503.02142},
    year={2025}
}
Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
Pre-print. 2025. 8 pages.
[PDF] [arXiv] [Abstract] [Bibtex]

The performance of In-Context Learning (ICL) is highly sensitive to the selected demonstrations. Existing approaches to demonstration selection optimize different objectives, yielding inconsistent results. To address this, we propose a unified metric--affinity and diversity--that leverages ICL model's internal representations. Our experiments show that both affinity and diversity strongly correlate with test accuracies, indicating their effectiveness for demonstration selection. Moreover, we show that our proposed metrics align well with various previous works to unify the inconsistency.

@article{kato2025affinity,
    title={Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations},
    author={Kato, Mariko and Cho, Hakaze and Sakai, Yoshihiro and Inoue, Naoya},
    journal={arXiv preprint arXiv:2502.14380},
    year={2025}
}
StaICC: Standardized Evaluation for Classification Task in In-context Learning
Hakaze Cho, Naoya Inoue
Pre-print. 2025. 20 pages.
[PDF] [arXiv] [Github] [PyPI] [Abstract] [Bibtex]

Classification tasks are widely investigated in the In-Context Learning (ICL) paradigm. However, current efforts are evaluated on disjoint benchmarks and settings, while their performances are significantly influenced by some trivial variables, such as prompt templates, data sampling, instructions, etc., which leads to significant inconsistencies in the results reported across various literature, preventing fair comparison or meta-analysis across different papers. Therefore, this paper proposes a standardized and easy-to-use evaluation toolkit (StaICC) for in-context classification. Including, for the normal classification task, we provide StaICC-Normal, selecting 10 widely used datasets, and generating prompts with a fixed form, to mitigate the variance among the experiment implementations. To enrich the usage of our benchmark, we also provide a sub-benchmark StaICC-Diag for diagnosing ICL from several aspects, aiming for a more robust inference processing.

@article{cho2025staicc,
    title={StaICC: Standardized Evaluation for Classification Task in In-context Learning},
    author={Cho, Hakaze and Inoue, Naoya},
    journal={arXiv preprint arXiv:2501.15708},
    year={2025}
}
NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning
Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
Pre-print. 2024. 20 pages.
[PDF] [arXiv] [Github] [Abstract] [Bibtex]

In-Context Learning (ICL) is suffering from unsatisfactory performance and under-calibration due to high prior bias and unfaithful confidence. Some previous works fine-tuned language models for better ICL performance with enormous datasets and computing costs. In this paper, we propose NoisyICL, simply perturbing the model parameters by random noises to strive for better performance and calibration. Our experiments on two models and 12 downstream datasets show that NoisyICL can help ICL produce more accurate predictions. Our further analysis indicates that NoisyICL enables the model to provide more fair predictions, and also with more faithful confidence. Therefore, we believe that NoisyICL is an effective calibration of ICL. Our experimental code is uploaded to Github.

@article{zhao2024noisyicl,
    title={NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning},
    author={Zhao, Yufeng and Sakai, Yoshihiro and Inoue, Naoya},
    journal={arXiv preprint arXiv:2402.05515},
    year={2024}
}
SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus
Yufeng Zhao, et al.
Pre-print. 2022. 14 pages.
[PDF] [arXiv] [Abstract] [Bibtex]

BERT is a widely used pre-trained model in natural language processing. However, since BERT is quadratic to the text length, the BERT model is difficult to be used directly on the long-text corpus. In some fields, the collected text data may be quite long, such as in the health care field. Therefore, to apply the pre-trained language knowledge of BERT to long text, in this paper, imitating the skimming-intensive reading method used by humans when reading a long paragraph, the Skimming-Intensive Model (SkIn) is proposed. It can dynamically select the critical information in the text so that the sentence input into the BERT-Base model is significantly shortened, which can effectively save the cost of the classification algorithm. Experiments show that the SkIn method has achieved superior accuracy than the baselines on long-text classification datasets in the medical field, while its time and space requirements increase linearly with the text length, alleviating the time and space overflow problem of basic BERT on long-text data.

@article{zhao2022skin,
    title={SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus},
    author={Zhao, Yufeng and et al.},
    journal={arXiv preprint arXiv:2209.05741},
    year={2022}
}

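Domestic Conferences, Journals, and Others
(† = re-publication in Japan of an international conference paper; unmarked: not peer-reviewed, ▲ = peer-reviewed)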

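▲†トークン埋め込みの内在次元を測る (Measuring the Intrinsic Dimension of Token Embeddings)
片岩拓也, 趙羽風, 大木哲史
Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages.
[PDF] [Abstract]

In this paper, we measure the intrinsic dimension (ID), that is, the number of dimensions necessary and sufficient for representation, of word vectors and embedding layers, which are embedding representations of language, and quantitatively evaluate their degree of redundancy. Specifically, (1) we estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) we analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. The experiments show that embedding spaces tend to be distributed on manifolds of lower dimension than their extrinsic dimension. We also observe changes in the redundancy rate as model scale grows and a tendency for the ID to converge rapidly early in training. In addition, we show that the estimated ID may be useful for rank selection when applying LoRA.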
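▲既知性を示す言語表現を伴う知識に関する内部表象の分析 (Analysis of Internal Representations of Knowledge Accompanied by Linguistic Expressions Indicating Knownness)
田中健史朗, 坂井吉弘, 趙羽風, 井之上直也, 佐藤魁, 高橋良允, Benjamin Heinzerling, 乾健太郎
Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages.
[PDF] [Abstract]

Research on the ability of large language models (LLMs) to judge whether knowledge is known is progressing, but the ability of an LLM to judge, at inference time, the knownness of knowledge that it learned together with a linguistic expression indicating knownness, such as "It is known that…", has not been examined. In this study, we train a pre-trained LLM on knowledge descriptions with such expressions attached and analyze the internal representations of that knowledge to examine how knownness can be represented inside the LLM. The results show that (1) the internal representation of knowledge holds knownness information separately for each linguistic expression attached during training, and (2) knownness information is held separately for each position at which the expression is written. This study provides a foothold for elucidating the mechanism behind LLMs' knownness judgment.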
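▲言語モデルにおける知識の既知性判断の内部表象 (Internal Representations of Knowledge Knownness Judgment in Language Models)
佐藤魁, 高橋良允, Benjamin Heinzerling, 田中健史朗, 趙羽風, 坂井吉弘, 井之上直也, 乾健太郎
Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages.
[PDF] [Abstract]

The knowledge acquisition ability of language models (LMs) has been widely studied, but the mechanism by which they judge whether acquired knowledge is known is not well understood. In this study, we use LMs to compare the internal states during output generation for specific knowledge with those during knownness judgment. The results indicate that language models can actually possess the ability to judge knownness, and show that (1) information for judging knownness exists in the internal representations at the point when the knowledge is learned, and (2) LMs exhibit different activation patterns for knowledge judged as known and knowledge judged as unknown. These findings offer clues toward understanding the knownness judgment mechanism of LMs.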
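†大規模言語モデルにおける In-context Learning の推論回路 (Inference Circuit of In-context Learning in Large Language Models)
趙羽風, 加藤万理子, 坂井吉弘, 井之上直也
Annual Meeting of the Association for Natural Language Processing (NLP). 2025. 6 pages. Oral, Outstanding Paper Award.
[PDF] [Slides] [Abstract]

In-context Learning (ICL) has attracted attention as a new few-shot learning paradigm for language models, but its internal mechanism has not been fully elucidated. In this study, we decompose the inference dynamics of ICL into three basic operations, construct an inference circuit on top of them, and carry out careful measurements in an attempt to explain, in a unified way, the phenomena observed in previous studies. Furthermore, an ablation analysis that disables the proposed circuit confirms a marked drop in ICL performance, suggesting that the proposed inference circuit is the main mechanism of ICL.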
Beyond the Induction Circuit: A Mechanistic Prototype for Out-of-domain In-context Learning
趙羽風, 井之上直也
Annual Meeting of the Association for Natural Language Processing (NLP). 2025. 5 pages.
[PDF] [Poster] [Abstract]

In-context Learning (ICL) is a promising few-shot learning paradigm with unclear mechanisms. Existing explanations heavily rely on Induction Heads, which fail to account for out-of-domain ICL, where query labels are absent in demonstrations. To address this, we model ICL as attribute resolution, where queries are mixtures of some attributes, and ICL identifies and resolves relevant attributes for predictions. In this paper, we propose a mechanistic prototype using toy models trained on synthetic data, and observe: (1) even 1-layer Transformers achieve non-trivial accuracy, with limited benefit from additional demonstrations, (2) scaling models effectively improves accuracy, and (3) inference operations can be decomposed into label space identification and generalized induction, warranting further exploration.
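†埋め込み表現の内在次元を測る (Measuring the Intrinsic Dimension of Embedding Representations)
片岩拓也, 趙羽風, 大木哲史
Annual Meeting of the Association for Natural Language Processing (NLP). 2025. 5 pages.
[PDF] [Abstract]

In this study, we measure the intrinsic dimension (ID), that is, the number of dimensions necessary and sufficient for representation, of word vectors and embedding layers, which are embedding representations of language, and quantitatively evaluate their degree of redundancy. Specifically, (1) we estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) we analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. The experiments show that embedding spaces tend to be distributed on manifolds of lower dimension than their extrinsic dimension. We also observe changes in the redundancy rate as model scale grows and a rapid formation of the ID early in training.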
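†文脈内学習におけるデモの親和性と多様性の提案 (Proposing Affinity and Diversity of Demonstrations in In-context Learning)
加藤万理子, 趙羽風, 坂井吉弘, 井之上直也
Annual Meeting of the Association for Natural Language Processing (NLP). 2025. 6 pages.
[PDF] [Abstract]

In in-context learning (ICL), the selection of demonstrations has a large impact on task performance. Existing work has studied procedures for selecting demonstrations, but the properties of demonstrations that serve as selection criteria have not been sufficiently examined. In this study, we newly propose two properties of demonstrations, "affinity" and "diversity", and show that affinity is a desirable property for demonstration selection across multiple models and datasets. Furthermore, we show that demonstrations selected by existing methods converge toward the direction in which the two properties improve task performance, yielding insights into the mechanism linking demonstration selection and task performance.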
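†StaICC: 文脈内学習における分類タスクの標準的なベンチマーク (StaICC: A Standard Benchmark for Classification Tasks in In-context Learning)
趙羽風, 坂井吉弘, 加藤万理子, 井之上直也
Symposium of the Young Researcher Association for NLP Studies (YANS). 2025. Poster Only.
[Poster]
画像特徴ベクトルは重みを固定した言語モデルで情報豊かなトークンである (Image Feature Vectors Are Informative Tokens for Language Models with Frozen Weights)
加藤万理子, 趙羽風, 閻真竺, 石钰婷, 井之上直也
Symposium of the Young Researcher Association for NLP Studies (YANS). 2025. Poster Only.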
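†In-context Learning におけるトークンベース較正手法の用いる決定境界は最適でない (The Decision Boundaries Used by Token-based Calibration Methods in In-context Learning Are Not Optimal)
趙羽風, 坂井吉弘, 加藤万理子, 田中健史朗, 石井晶, 井之上直也
The 260th Meeting of the IPSJ Special Interest Group on Natural Language (SIG-NL260, IPSJ). 2024. 17 pages. Oral, Young Researcher Encouragement Award.
[PDF] [Slides] [Abstract]

In in-context learning (ICL) tasks, the prediction is usually determined by comparing the generation probabilities of the label tokens in the label space, but the choice of those label tokens is made arbitrarily by humans. Several prior studies showed that calibrating the generation probabilities of these label tokens improves ICL performance, but these methods still leave the problem that humans may choose suboptimal label tokens. In this study, we therefore first (1) analyze the hidden states of LLMs and show that current token-based calibration methods cannot adequately express the useful information contained in the hidden states. We then (2) propose a new ICL method that reduces the influence of human label-token selection and effectively exploits the useful information in the hidden states. In experiments on 3 models and 10 classification datasets, the proposed method outperformed current token-based calibration methods by about 20%.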
†NoisyICL: A Little Noise in Model Parameters Can Calibrate In-context Learning
趙羽風, 坂井吉弘, 井之上直也
Annual Meeting of the Association for Natural Language Processing (NLP). 2024. 6 pages. Oral.
[PDF] [Slides] [Abstract]

In-Context Learning (ICL), where language models learn tasks in a generative form from few-shot demonstrations without parameter update, is emerging while scaling up the language models. Nevertheless, the performance of ICL is still unsatisfactory. Some previous studies suggested that it is due to under-calibration and they fine-tuned language models for better ICL performance with enormous datasets and computing costs. In this paper, we propose NoisyICL, simply perturbing the model parameters by random noises to strive for a calibration. Our experiments on 2 models and 7 downstream task datasets show that NoisyICL helps perform ICL better. Our further analysis indicates that NoisyICL can enable the model to provide more fair predictions, with less unfaithful confidence. So, NoisyICL can be considered as an effective calibration.
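In-context Learning においてLLMはフォーマットを学べるか (Can LLMs Learn the Format in In-context Learning?)
坂井吉弘, 趙羽風, 井之上直也
Annual Meeting of the Association for Natural Language Processing (NLP). 2024. 6 pages. Sponsor Award.
[PDF] [Abstract]

In-Context Learning (ICL) is the ability of LLMs to learn a task from a small number of demonstrations given in the prompt without updating parameters, but its mechanism has not been sufficiently clarified. Experiments in prior work suggest that showing the LLM the format "output a label after the task input" may be particularly important. In this study, we therefore directly visualize how LLMs learn the answer format from the given demonstrations. As a result, we find that (1) LLMs do learn the answer format from demonstrations, (2) the format can be learned even with meaningless labels, and (3) the worst labels greatly improve the Macro-F1 of ICL.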
†Find-the-Common: Benchmarking Inductive Reasoning Ability on Vision-Language Models
Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
Annual Meeting of the Association for Natural Language Processing (NLP). 2024. 6 pages.
[PDF] [Abstract]

Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs) have revolutionized the landscape of integrated vision and language understanding. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. So, in this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches including implicit reasoning, symbolic reasoning, and implicit-symbolic reasoning with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC, which makes the FTC a new challenge for the visual reasoning research community. Our dataset is available online.

(Theses)

Fine-tuning with Randomly Initialized Downstream Network: Finding a Stable Convex-loss Region in Parameter Space
Yufeng Zhao
Master's thesis - Grade A @ Beijing Institute of Technology, 2023. 81 pages.
Synthesis and Self-Assembly of Aggregation-induced Emission Compounds
Yufeng Zhao
Bachelor's thesis @ Beijing Institute of Technology, 2021. 52 pages.

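Career

Ph.D. (Information Science), Research Assistant, October 2023 - March 2026 (expected)
Computing Science Research Area, Japan Advanced Institute of Science and Technology
Advisor: Assoc. Prof. Naoya Inoue
Master's (Software Engineering), September 2021 - June 2023
Graduate School of Information Science and Technology, Beijing Institute of Technology
Advisor: Yufeng Zhao (self-directed research)
Bachelor's (Chemistry), August 2017 - June 2021
Faculty of Basic Sciences, Beijing Institute of Technology
Advisor: Assoc. Prof. 石建兵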

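Awards

Outstanding Paper Award @ The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025, ANLP), 2025 (top 15 of 765 papers, 2.0%)
Young Researcher Encouragement Award @ The 260th IPSJ SIG-NL Meeting (SIG-NL260), 2024
Sponsor Award (SB Intuitions Awards) @ The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024), 2024
Monbukagakusho Honors Scholarship for Privately-Financed International Students @ Ministry of Education, Culture, Sports, Science and Technology (MEXT), 2023
Outstanding Oral Presentation Award @ 2022 Euro-Asia Conference on Frontiers of Computer Science and Information Technology
GPA Improvement Award @ Beijing Institute of Technology, 2020. Because I missed many exams in 2019 due to illness, my ordinary GPA in 2020 was regarded as a notable improvement.
Beijing Institute of Technology Annual (GPA) Excellence Award: 2018, 2019, 2021, 2022, 2023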
