Projects

Past SAIL Projects (Spring 2025)

Our members have contributed to cutting-edge AI safety research through semester projects at various EPFL labs.

OS-Harm

SAIL Authors: Thomas Kuntz, Agatha Duzan

Lab: Theory of Machine Learning Laboratory (TML)

Created a harmful-capabilities benchmark for agents. The paper was accepted as a spotlight at NeurIPS 2025.

📄 ArXiv

Watermarking for LLMs

SAIL Authors: Joshua Cohen-Dumani

Lab: Natural Language Processing Lab (NLP)

This project explored synthetic text detection in open-source language models by studying whether watermarking patterns can be learned directly through fine-tuning. To do so, we built a research pipeline that generated custom datasets, applied contrastive training, and evaluated detectability using automated and model-based methods. The work contributes to understanding how watermarking could help mitigate misuse and disinformation in widely available LLMs.

Toxicity in LLMs

SAIL Authors: Léo Gabriel Paoletti

Lab: Natural Language Processing Lab (NLP)

Investigated the existence and cross-model transferability of multilingual prompts that evade toxicity detection yet trigger toxic outputs in LLMs. Benchmarked Apertus and identified limitations in state-of-the-art jailbreak and toxicity detection systems.

Memorization in LLMs

SAIL Authors: Arthur Wuhrmann

Lab: Natural Language Processing Lab (NLP)

Investigated how perplexity can help detect verbatim memorization in LLM output by identifying low-perplexity regions in generated text. We developed an open-source tool for detecting memorization patterns.

📄 ArXiv • 💻 Code
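
To illustrate the general idea of flagging low-perplexity regions, here is a minimal sketch. It is not the project's tool: the function name, the window size, and the threshold are all assumptions chosen for illustration, and the per-token log-probabilities are assumed to come from whatever model is being studied.

```python
import math

def low_perplexity_regions(token_logprobs, window=4, threshold=1.5):
    """Return merged (start, end) token spans whose windowed perplexity is below threshold.

    token_logprobs: natural-log probabilities the model assigned to each generated token.
    window and threshold are illustrative defaults, not values from the project.
    """
    spans = []
    n = len(token_logprobs)
    for start in range(0, n - window + 1):
        chunk = token_logprobs[start:start + window]
        # Perplexity of the window: exp of the mean negative log-probability.
        ppl = math.exp(-sum(chunk) / window)
        if ppl < threshold:
            end = start + window  # end index is exclusive
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous flagged span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans

# Hypothetical log-probabilities: tokens 2..6 are predicted almost with certainty.
logprobs = [-3.0, -2.5, -0.05, -0.02, -0.01, -0.03, -0.04, -2.8, -3.1]
print(low_perplexity_regions(logprobs))  # [(2, 7)]
```

A flagged span only indicates that the model found the text highly predictable; whether it is actually memorized would still need to be checked against the training data.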

Interdisciplinary Opportunities

Industry Partnerships

Interested in working on real-world AI safety challenges? We welcome partnerships with industry for practical applications of AI safety research.

Propose Your Own

Have an AI safety research idea? We're always excited to hear from students with their own project proposals that align with our mission.

Get in touch →

Get Involved

Ready to contribute to AI safety research? Whether you're interested in our reading group, want to apply for a project, or just want to connect with our community, we'd love to hear from you.

Join Our Community • Apply for Projects
