EPFL Safe AI Lausanne

Building a safer future through responsible AI development and research. We are a community of students, researchers, and professionals at EPFL working to ensure that artificial intelligence systems are developed safely and beneficially.

Get Involved!

Our Mission

We believe that as AI systems become more powerful, ensuring their safety and alignment with human values becomes increasingly critical. Our goal is to foster research, education, and community engagement around AI safety at EPFL and beyond.

📚 Reading group

We host a reading group on the EPFL campus every other Wednesday, starting October 8th. Please register via this link.

🧠 Research & Discussion

We organize workshops and seminars, and conduct independent research, to help members understand the technical foundations of AI safety.

🤝 Community

Connect with like-minded individuals, collaborate on projects, and contribute to the growing AI safety research community at EPFL.

đŸ”Ŧ Hands-on Projects

Work on practical AI safety research projects, from interpretability studies to robustness evaluations and alignment techniques.

Reading Group

Our reading group meets every other week to discuss the latest papers in AI safety, alignment, and related fields. We cover both foundational work and cutting-edge research.

📅 Biweekly Meetings

Every other Wednesday, 18:30-20:00, in the AI Center Lounge. We alternate between paper presentations and open discussions on current AI safety topics.

📖 Paper Selection

We focus on high-impact papers from top venues. Members can suggest papers for discussion.

đŸŽ¯ Focus Areas

Interpretability, robustness, alignment, reward modeling, scalable oversight, societal impact, and other key areas of AI safety research.

Past Projects (Spring 2025)

Gain hands-on experience with AI safety research through structured semester projects. Work individually or in teams under the guidance of experienced researchers.

🔍 OS-Harm

SAIL Authors: Thomas Kuntz, Agatha Duzan

Lab: Theory of Machine Learning Laboratory (TML)

Created a harmful-capabilities benchmark for agents; the paper was accepted as a spotlight at NeurIPS 2025.

🔗 ArXiv

🔍 Watermarking for LLMs

SAIL Author: Joshua Cohen-Dumani

Lab: Natural Language Processing Lab (NLP)

This project explored synthetic text detection in open-source language models by studying whether watermarking patterns can be learned directly through fine-tuning. To do so, we built a research pipeline that generated custom datasets, applied contrastive training, and evaluated detectability using automated and model-based methods. The work contributes to understanding how watermarking could help mitigate misuse and disinformation in widely available LLMs.
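To give a flavour of what an automated detectability check can look like, here is a minimal sketch of a green-list z-score test in the style of Kirchenbauer et al. (2023). It is illustrative only and not the project's actual pipeline; the hashing scheme and the GAMMA parameter are assumptions made for the example.

```python
# Illustrative only: a standard green-list z-score test for LLM watermarks
# (Kirchenbauer et al., 2023 style). Not the project's pipeline or code.
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green"

def is_green(prev_token: int, token: int) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA

def watermark_z_score(token_ids: list[int]) -> float:
    """z-score of the observed green-token count vs. the no-watermark null (rate GAMMA)."""
    hits = sum(is_green(p, t) for p, t in zip(token_ids, token_ids[1:]))
    n = len(token_ids) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# A large positive z-score suggests the text carries the watermark.
print(watermark_z_score([17, 4032, 911, 23, 5, 88, 412, 9, 300, 71]))
```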

🔍 Toxicity in LLMs

SAIL Author: LÊo Gabriel Paoletti

Lab: Natural Language Processing Lab (NLP)

Investigated the existence and cross-model transferability of multilingual prompts that evade toxicity detection yet trigger toxic outputs in LLMs. Benchmarked Apertus and identified limitations in state-of-the-art jailbreak and toxicity detection systems.

🔍 Memorization in LLMs

SAIL Author: Arthur Wuhrmann

Lab: Natural Language Processing Lab (NLP)

Investigated how perplexity can help detect verbatim memorization in LLM outputs by identifying low-perplexity regions in generated text. We developed an open-source tool for this analysis.

🔗 ArXiv and 🔗 Code link
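To illustrate the core idea, the sketch below scores each token with a causal language model and flags windows whose perplexity is unusually low, i.e. text the model finds suspiciously predictable. This is not the released tool; the model name (gpt2), window size, and threshold are illustrative assumptions.

```python
# Minimal sketch: flag low-perplexity windows as candidate verbatim memorization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def low_perplexity_spans(text: str, window: int = 16, threshold: float = 5.0):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Negative log-likelihood of each token given its prefix.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    nll = -logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    spans = []
    for start in range(0, nll.numel() - window + 1):
        ppl = nll[start:start + window].mean().exp().item()
        if ppl < threshold:  # unusually predictable window -> possible memorization
            spans.append((start + 1, start + 1 + window, ppl))
    return spans

print(low_perplexity_spans("The quick brown fox jumps over the lazy dog. " * 4))
```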

Upcoming Events

October 31st-November 2nd, 2025
AI Safety Hackathon

48-hour hackathon focused on developing predictive models and forecasting methodologies to anticipate AI development timelines and capability advancements.

Get In Touch

Join our community and contribute to making AI safer for everyone. We welcome students, researchers, and anyone interested in AI safety.

Meetings

BC Cafeteria, Fridays 12:00-13:00