AI Safety Reading Group
Exploring the latest research in AI safety, alignment, and interpretability
About Our Reading Group
Our reading group meets bi-weekly to discuss the latest papers in AI safety, alignment, and related fields. We cover both foundational work and cutting-edge research, providing a platform for deep technical discussions and collaborative learning.
📅 Schedule
When: Every other Wednesday, 18:30-20:00
Where: AI Center Lounge, EPFL
Format: Paper presentations followed by group discussion
📖 Paper Selection
We focus on high-impact papers from top-tier venues like NeurIPS, ICML, ICLR, and specialized AI safety conferences. Members can suggest papers for discussion through our Telegram group.
🎯 Focus Areas
Core Topics: Interpretability, robustness, alignment, reward modeling, scalable oversight, societal impact
Emerging Areas: Constitutional AI, RLHF, mechanistic interpretability, AI governance
👥 Who Should Join
Students, researchers, and professionals interested in AI safety. All experience levels are welcome, from undergraduates curious about the field to PhD students working on related research.
Show Your Interest
Planning Next Sessions
We ran successful reading group sessions in Fall 2024 and are gauging interest for future sessions. If there's enough interest, we'll restart in the coming semester.
Help us plan: Fill out our interest form so we know when to schedule sessions, what topics to focus on, and how to structure the group.
Join our Telegram group for updates and discussions.
Previous Sessions (Fall 2024)
Here's what we covered in our previous reading group sessions to give you an idea of our discussions:
Paper: Strong Model Collapse (ICML '25)
Location: EPFL Campus
Paper: Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs ('25)
Location: CM09, EPFL
Paper: Stress Testing Deliberative Alignment for Anti-Scheming Training ('25)
Location: CM09, EPFL
Paper: Detecting Pretraining Data from Large Language Models (ICLR '24)
Location: CO019, EPFL
Paper: Chain-of-Thought Is Not Explainability
Location: CO-0XX (underground level), EPFL
Want More Sessions Like These?
Show your interest and we'll organize more reading group sessions based on demand!
Express Interest
Suggested Reading
New to AI safety research? Here are some foundational papers and resources to get you started:
🌟 Foundational Papers
- Concrete Problems in AI Safety - Amodei et al.
- Risks from Learned Optimization - Hubinger et al.
- Training language models to follow instructions with human feedback - Ouyang et al.
📚 Key Resources
- AI Safety Info - Comprehensive resource hub
- Alignment Forum - Research discussions
- AI Safety Fundamentals - Online course
🔬 Current Trends
- Constitutional AI and RLHF improvements
- Mechanistic interpretability techniques
- Scalable oversight and AI governance
- Robustness and adversarial examples
Contribute to the Discussion
💡 Suggest Papers
Found an interesting paper? Share it in our Telegram group or email us. We're always looking for relevant, high-quality research to discuss.
🎤 Present a Paper
Want to dive deep into a particular paper? Volunteer to present! It's a great way to thoroughly understand the work and share insights with the group.
📝 Discussion Notes
We take collaborative notes during sessions. Notes from previous discussions and key insights are available to registered members.
Get Involved
Ready to join our AI safety discussions? Whether you're new to the field or an experienced researcher, we welcome diverse perspectives and thoughtful engagement.