NLPCC 2025 Tutorial Topics and Speakers Announced
The CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC) is the annual academic event organized by the Technical Committee on Natural Language Processing of the China Computer Federation (CCF-NLP).
NLPCC 2025 will be held from August 7 to 9, 2025 in Urumqi, Xinjiang Uyghur Autonomous Region, and will be co-located with the 10th Language and Intelligence Summit (August 10). The conference is honored to have invited a number of renowned scholars in the field to deliver tutorial talks.
The details of the tutorials are given below.
Speaker:
Junfeng Fang, Wanli Yang
Title:
Model Editing in LLMs: Advances and Challenges
Large Language Models (LLMs) have achieved remarkable success in knowledge-intensive tasks, yet they inevitably encode incorrect, outdated, or biased knowledge. Model editing has recently emerged as a promising lightweight approach, enabling efficient and precise modifications to LLMs without the need for full-scale training. In this tutorial, we will provide a comprehensive overview of model editing, covering the latest techniques, evolving evaluations, potential applications, and emerging challenges in the field. We will delve into cutting-edge methods for continuously updating the knowledge of LLMs without compromising their general capabilities, and reliable evaluation protocols for rigorously assessing both the effectiveness and broader impact of these edits.
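To make the idea of lightweight, targeted edits concrete, here is a minimal, self-contained sketch (not taken from the tutorial itself) of a rank-one weight update in the spirit of locate-then-edit methods such as ROME: a single linear layer is modified so that one "key" direction maps to a new "value" activation, while other directions are left essentially untouched. All tensors below are toy placeholders rather than real LLM weights.

```python
# Minimal sketch of a rank-one "locate-then-edit" style weight update on one
# linear layer, in the spirit of methods such as ROME. Toy placeholders only.
import torch

torch.manual_seed(0)
d_in, d_out = 16, 16
W = torch.randn(d_out, d_in)      # weight of one MLP projection inside the model

k = torch.randn(d_in)             # "key": hidden state that triggers the fact
v_target = torch.randn(d_out)     # "value": activation encoding the new fact

# Solve for a rank-one update dW so that (W + dW) @ k == v_target,
# while leaving directions orthogonal to k untouched.
v_current = W @ k
residual = v_target - v_current
dW = torch.outer(residual, k) / (k @ k)

W_edited = W + dW
print(torch.allclose(W_edited @ k, v_target, atol=1e-5))  # True: the edit "took"
```

Real editing methods add further constraints to preserve unrelated knowledge; measuring that trade-off between edit effectiveness and side effects is exactly what the evaluation protocols above target.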
Junfeng Fang is a first-year postdoctoral researcher in the NExT Lab at the National University of Singapore (NUS), supervised by Prof. Chua Tat-Seng. His research focuses on trustworthy large language models, including safety alignment and knowledge editing. In the past two years, he has published nearly ten first-author papers at top-tier venues such as ICLR, NeurIPS, and ACL, with multiple oral papers at NeurIPS and ICLR (average acceptance rate < 2%). His honors include the National Scholarship of China and the ICLR 2025 Outstanding Paper Award.
Wanli Yang is a first-year PhD student at the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS), under the supervision of Prof. Fei Sun and Prof. Xinran Liu. His research focuses on knowledge memorization and updating in large language models, with an emphasis on model editing. He has published several papers on this topic at top-tier NLP conferences, including ACL and EMNLP.
Speaker:
Jiaheng Liu, Yizhi Li
Title:
How to Build Domain-Specific LLMs Effectively?
This tutorial provides an in-depth exploration of key technical practices for developing domain-specific large language models (LLMs), focusing on continued pretraining, fine-tuning, and reinforcement learning. As general-purpose LLMs often lack accuracy and relevance in specialized tasks, there is a growing need for vertical models tailored to fields such as healthcare, finance, law, and scientific research. We discuss continued pretraining to adapt base models to domain-specific corpora, enabling them to internalize relevant terminology and knowledge. Supervised fine-tuning is examined for aligning model outputs with task-specific objectives using curated datasets. Additionally, we cover reinforcement learning for alignment, including RLHF (Reinforcement Learning from Human Feedback) and RLVR (Reinforcement Learning from Verifiable Rewards), addressing challenges such as reward model design and scalable optimization. Practical insights, case studies, and experimental results are provided to demonstrate the impact of these techniques on model performance, serving as a technical reference for building trustworthy, high-performing models in specialized domains.
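As a rough illustration of the continued-pretraining stage (a sketch under assumed placeholders, not material from the tutorial), the snippet below runs the standard causal language-modeling objective over a tiny in-domain corpus with Hugging Face transformers; the model name, corpus, and hyperparameters are all illustrative.

```python
# Minimal sketch of continued pretraining on a small domain corpus.
# Model name, corpus, and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                                  # stand-in for a real base LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

domain_corpus = [
    "The patient presented with acute myocardial infarction.",  # toy in-domain text
    "Troponin levels were elevated on admission.",
]

model.train()
for epoch in range(3):
    for text in domain_corpus:
        batch = tokenizer(text, return_tensors="pt")
        # Standard causal-LM objective: labels are the inputs, shifted inside the model.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: loss {outputs.loss.item():.3f}")
```

The same loop, with curated instruction-response pairs instead of raw domain text, is the skeleton of the supervised fine-tuning stage discussed above.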
Speaker:
Jingjing Chen
Title:
Advancing Responsible Generative AI: Progress and Challenges
The rapid development of generative AI has enabled powerful capabilities in visual content creation, but also raised serious concerns regarding content authenticity, misuse, and societal harm. This tutorial focuses on advancing responsible generative AI, particularly in the context of visual content safety. We begin by outlining the emerging risks of AI-generated visual media, such as deepfakes and synthetic misinformation. The tutorial then covers two main research frontiers: (1) Detection and attribution of generated visual content, including recent techniques for identifying AI-generated images and videos across diverse domains and manipulation methods; and (2) Mitigating unsafe concepts in generative models, such as removing sensitive, harmful, or misuse-prone concepts through concept erasure and editing mechanisms. We will introduce key datasets, evaluation protocols, and technical frameworks, and share insights from practical deployments and challenges. The tutorial concludes by identifying open research questions and future directions toward building trustworthy, safe, and regulation-ready generative AI systems.
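As a toy illustration of the detection frontier (a sketch on assumed placeholder data, not a method from the tutorial), the snippet below trains a small binary classifier to separate "real" from "AI-generated" images; in practice the random tensors would be replaced by a labeled dataset and a stronger backbone.

```python
# Minimal sketch of training a binary detector that flags AI-generated images.
# The data here is random tensors standing in for real vs. generated images.
import torch
import torch.nn as nn

torch.manual_seed(0)
images = torch.rand(64, 3, 64, 64)            # placeholder image batch
labels = torch.randint(0, 2, (64,)).float()   # 1 = AI-generated, 0 = real

detector = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(detector.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):
    logits = detector(images).squeeze(1)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

preds = (torch.sigmoid(detector(images)).squeeze(1) > 0.5).float()
print("train accuracy:", (preds == labels).float().mean().item())
```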
Speaker:
Jian Yang
Title:
From Code Foundation Model to General Code Agents
This tutorial explores the transformative shift from standalone code models to dynamic, tool-augmented code agents that revolutionize software development. By integrating large language models with compilers, debuggers, APIs, and other resources, these agents enhance developer workflows through intelligent automation and real-time feedback, as seen in AI-driven editors like Cursor and AI pair programmers like GitHub Copilot. We examine key advancements in tool invocation, agentic workflows, and frameworks that connect model outputs with execution environments, highlighting applications in automated software engineering and adaptive DevOps pipelines. Challenges such as reliability, tool dependencies, and computational overhead are addressed with solutions like fine-tuning and hybrid architectures. Emerging paradigms, such as cursor-centric agent frameworks, are discussed as vital for seamless human-agent collaboration. We conclude with open research directions, including agent specialization and human-agent interaction ethics, emphasizing the potential of code agents to democratize complex programming tasks and accelerate the transition from natural language specifications to robust, executable systems.
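To illustrate the basic agent loop the abstract describes (a minimal sketch with a stubbed-out model in place of a real LLM API), the snippet below shows a model proposing a tool call, the agent executing it in a Python subprocess, and the result being fed back with a naive success check.

```python
# Minimal sketch of a tool-augmented code-agent loop: the "model" proposes a tool
# call, the agent executes it and feeds the result back. The model is a stub;
# a real agent would call an LLM API at that point.
import subprocess
import sys

def run_python(code):
    """Tool: execute a Python snippet and return its stdout/stderr."""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=10)
    return result.stdout + result.stderr

def fake_model(task, feedback):
    """Stub standing in for an LLM: always proposes the same tool call."""
    return {"tool": "run_python", "args": {"code": "print(sum(range(10)))"}}

TOOLS = {"run_python": run_python}

task = "Compute the sum of the integers 0..9."
feedback = None
for step in range(3):                      # bounded agent loop
    action = fake_model(task, feedback)
    feedback = TOOLS[action["tool"]](**action["args"])
    print(f"step {step}: tool={action['tool']} -> {feedback.strip()}")
    if feedback.strip() == "45":           # naive verifier / success check
        break
```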
Jian Yang is a researcher in the Qwen team at Alibaba Group, where his primary research areas include multilingual natural language understanding and generation, pre-training and fine-tuning of large language models, code understanding and generation, and software fuzzing. He has published multiple papers at top-tier conferences such as ICLR, NeurIPS, ICML, ACL, EMNLP, NAACL, COLING, WWW, AAAI, IJCAI, and WMT, with over 6,500 citations on Google Scholar. During his Ph.D., he was deeply involved in academic-industrial collaboration through a long-term joint Ph.D. program between Beihang University and Microsoft Research Asia. He won first place in three tracks of the multilingual translation competition at WMT 2021 (World Machine Translation, involving tens of thousands of translation directions). He has served as an Area Chair for conferences such as ACL, EMNLP, NAACL, and NeurIPS. During his tenure at Alibaba (as a member of the Alibaba Star Program), he has been a core contributor to Qwen and QwenCoder, responsible for the code capabilities of the Qwen models and the code-specific models CodeQwen1.5/Qwen2.5-Coder. He actively participates in open-source initiatives with the community and universities, including OpenCoder, RoleLLM, FullStackBench, SuperGPQA, YuE, TableBench, SimpleVQA, McEval, COIG-P, and AutoKaggle.
Speaker:
Jingyuan Sun
Title:
Brain Encoding and Decoding with Computational Linguistics
Computational linguistics (CL) has witnessed tremendous advancements in recent years, with models such as large language models demonstrating exceptional performance in various natural language processing tasks. These advancements highlight their potential in helping understand brain language processing, especially through the lens of brain encoding and decoding. Brain encoding involves the mapping of linguistic stimuli to brain activity, while brain decoding is the process of reconstructing linguistic stimuli from observed brain activities. CL models that excel at capturing and manipulating linguistic features are crucial for mapping linguistic stimuli to brain activities and vice versa. Brain encoding and decoding have vast applications, from enhancing human-computer interaction to developing assistive technologies for individuals with communication impairments. This tutorial will focus on elucidating how computational linguistics can facilitate brain encoding and decoding. We will delve into the principles and practices of using computational linguistics methods for brain encoding and decoding. We will also discuss the challenges and future directions of brain encoding and decoding. Through this tutorial, we aim to provide a comprehensive and informative overview of the intersection between computational linguistics and cognitive neuroscience, inspiring future research in this exciting and rapidly evolving field.
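As a small worked example of brain encoding (a sketch on synthetic data, not from the tutorial), the snippet below fits a ridge-regression encoding model from language-model embeddings to simulated voxel responses and evaluates it with per-voxel correlations, a common evaluation in this literature.

```python
# Minimal sketch of a brain-encoding model: ridge regression mapping language-model
# sentence embeddings to voxel responses. Embeddings and "fMRI" data are random
# placeholders; real studies would use recorded brain activity.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stimuli, n_features, n_voxels = 200, 768, 500
X = rng.standard_normal((n_stimuli, n_features))   # LM embeddings of the stimuli
true_map = rng.standard_normal((n_features, n_voxels))
Y = X @ true_map + 0.1 * rng.standard_normal((n_stimuli, n_voxels))  # synthetic voxels

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)
encoder = Ridge(alpha=10.0).fit(X_tr, Y_tr)

# Per-voxel Pearson correlation between predicted and held-out responses.
Y_pred = encoder.predict(X_te)
corr = [np.corrcoef(Y_pred[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
print("mean voxel correlation:", float(np.mean(corr)))
```

The decoding direction simply swaps the roles of X and Y: brain responses become the inputs from which linguistic features of the stimulus are reconstructed.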
Jingyuan Sun is an Assistant Professor in the Department of Computer Science at the University of Manchester. He obtained his Ph.D. from the Institute of Automation, Chinese Academy of Sciences, where he focused on using natural language processing models to predict and decode human brain activity related to language understanding, as well as on improving computational models with inspiration from the human brain. During his postdoctoral research at KU Leuven, funded by a Horizon 2020 project, he continued and expanded this interdisciplinary work, transferring insights between AI foundation models and the brain in language and visual representations. His current research interests lie in seeking causal explanations for the behavioural and neural alignment between the brain and AI systems in language processing, and in building brain-inspired multimodal foundation models. He has published papers in, and served as a senior PC member or reviewer for, top venues in AI and neuroscience, including NeurIPS, AAAI, IJCAI, ICML, ACL, EMNLP, and TNNLS.
Please click the following link or scan the QR code to book your hotel:
https://www.hyterp.cn/website/1631/index.html?v=1749719701000&channel=