AI Safety Reading Group
Exploring the latest research in AI safety, alignment, and interpretability
About Our Reading Group
Our reading group meets bi-weekly to discuss the latest papers in AI safety, alignment, and related fields. We cover both foundational work and cutting-edge research, providing a platform for deep technical discussions and collaborative learning.
📅 Schedule
When: Every other Wednesday, 18:30-20:00
Where: AI Center Lounge, EPFL
Format: Paper presentations followed by group discussion
📖 Paper Selection
We focus on high-impact papers from top-tier venues like NeurIPS, ICML, ICLR, and specialized AI safety conferences. Members can suggest papers for discussion through our Telegram group.
🎯 Focus Areas
Core Topics: Interpretability, robustness, alignment, reward modeling, scalable oversight, societal impact
Emerging Areas: Constitutional AI, RLHF, mechanistic interpretability, AI governance
👥 Who Should Join
Students, researchers, and professionals interested in AI safety. All experience levels are welcome, from undergraduates curious about the field to PhD students working on related research.
Join Us
Next Session: May 27, 2026
Join our Telegram group to find out which paper will be discussed and get all the details.
Previous Sessions (Fall 2025)
Here's what we covered in our previous reading group sessions to give you an idea of our discussions:
Bi-weekly reading group session discussing current AI safety research and papers.
Location: EPFL Campus
Paper: Strong Model Collapse (ICML '25)
Location: EPFL Campus
Paper: Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs ('25)
Location: CM09, EPFL
Paper: Stress Testing Deliberative Alignment for Anti-Scheming Training ('25)
Location: CM09, EPFL
Paper: Detecting Pretraining Data from Large Language Models (ICLR '24)
Location: CO019, EPFL
Paper: Chain-of-Thought Is Not Explainability
Location: CO-0XX (underground level), EPFL
Suggested Reading
New to AI safety research? Here are some foundational papers and resources to get you started:
🌟 Foundational Papers
- Concrete Problems in AI Safety - Amodei et al.
- Risks from Learned Optimization - Hubinger et al.
- Training language models to follow instructions with human feedback - Ouyang et al.
📚 Key Resources
- AI Safety Info - Comprehensive resource hub
- Alignment Forum - Research discussions
- AI Safety Fundamentals - Online course
🔬 Current Trends
- Constitutional AI and RLHF improvements
- Mechanistic interpretability techniques
- Scalable oversight and AI governance
- Robustness and adversarial examples
Contribute to the Discussion
💡 Suggest Papers
Found an interesting paper? Share it in our Telegram group or email us. We're always looking for relevant, high-quality research to discuss.
🎤 Present a Paper
Want to dive deep into a particular paper? Volunteer to present! It's a great way to thoroughly understand the work and share insights with the group.
📝 Discussion Notes
We take collaborative notes during sessions. Registered members can access notes and key insights from previous discussions.
Get Involved
Ready to join our AI safety discussions? Whether you're new to the field or an experienced researcher, we welcome diverse perspectives and thoughtful engagement.