Research Projects
Explore our past work and discover new research opportunities in AI safety
Past SAIL Projects (Spring 2025)
Our members have contributed to cutting-edge AI safety research through semester projects at various EPFL labs.
OS-Harm
SAIL Authors: Thomas Kuntz, Agatha Duzan
Lab: Theory of Machine Learning Laboratory (TML)
Created a harmful-capabilities benchmark for LLM agents; the paper was accepted as a spotlight at NeurIPS 2025.
Watermarking for LLMs
SAIL Authors: Joshua Cohen-Dumani
Lab: Natural Language Processing Lab (NLP)
This project explored synthetic text detection in open-source language models by studying whether watermarking patterns can be learned directly through fine-tuning. To do so, we built a research pipeline that generated custom datasets, applied contrastive training, and evaluated detectability using automated and model-based methods. The work contributes to understanding how watermarking could help mitigate misuse and disinformation in widely available LLMs.
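As background for this line of work, one widely used watermarking scheme partitions the vocabulary into a "green list" keyed by the previous token and biases sampling toward it; detection then counts green tokens. The sketch below is illustrative only: the hash-based partition, vocabulary size, and token streams are invented for the example and are not the project's actual method.

```python
import hashlib
import random

def green_list(prev_token, vocab_size, fraction=0.5):
    """Pseudo-randomly partition the vocabulary based on the previous
    token; a watermarking sampler would bias generation toward these
    'green' tokens. (Toy hash rule, for illustration only.)"""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    return {t for t in range(vocab_size) if (seed + t) % 100 < fraction * 100}

def green_fraction(tokens, vocab_size):
    """Detector statistic: the share of tokens that land in the green list
    keyed by their predecessor. Unwatermarked text hovers near 0.5."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev, vocab_size))
    return hits / (len(tokens) - 1)

random.seed(0)
V = 1000
plain = [random.randrange(V) for _ in range(200)]  # ordinary random tokens
wm = [0]                                           # watermarked: always pick green
for _ in range(199):
    wm.append(random.choice(sorted(green_list(wm[-1], V))))

print(round(green_fraction(wm, V), 2))  # 1.0 by construction
print(green_fraction(plain, V) < 0.65)  # stays near 0.5 for random tokens
```

The project's question is whether a fine-tuned model can internalize such a statistical pattern directly, rather than applying it at sampling time.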
Toxicity in LLMs
SAIL Authors: Léo Gabriel Paoletti
Lab: Natural Language Processing Lab (NLP)
Investigated the existence and cross-model transferability of multilingual prompts that evade toxicity detection yet trigger toxic outputs in LLMs. Benchmarked Apertus and identified limitations in state-of-the-art jailbreak and toxicity detection systems.
Memorization in LLMs
SAIL Authors: Arthur Wuhrmann
Lab: Natural Language Processing Lab (NLP)
Investigated how perplexity can help detect verbatim memorization in LLM outputs by identifying low-perplexity regions in generated text, and developed an open-source tool for detecting such memorization patterns.
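A minimal version of the low-perplexity heuristic can be sketched as follows; the window size, threshold, and per-token log-probabilities are illustrative placeholders, not the tool's actual parameters.

```python
import math

def low_perplexity_spans(token_logprobs, window=5, threshold=2.0):
    """Flag sliding windows whose perplexity falls below a threshold.

    token_logprobs: per-token natural-log probabilities under the model.
    Returns (start, end) index pairs (end exclusive) of suspiciously
    predictable, possibly memorized, regions.
    """
    spans = []
    for start in range(len(token_logprobs) - window + 1):
        chunk = token_logprobs[start:start + window]
        # Perplexity of the window: exp of the mean negative log-probability.
        ppl = math.exp(-sum(chunk) / window)
        if ppl < threshold:
            spans.append((start, start + window))
    return spans

# Toy example: a run of highly predictable tokens (log-prob near 0)
# embedded in otherwise uncertain text (log-prob around -3).
logprobs = [-3.0, -2.8, -0.05, -0.02, -0.04, -0.03, -0.06, -2.9, -3.1]
print(low_perplexity_spans(logprobs, window=5, threshold=1.5))  # [(2, 7)]
```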
Available Projects at MLO Lab
The Machine Learning and Optimization (MLO) Lab offers exciting research opportunities in AI safety and related fields. These projects are open to students interested in contributing to cutting-edge research.
How to Apply
Students interested in doing a project at the MLO lab should apply through our centralized application form. Priority is given to full-time Master's thesis projects.
- Grade sheet may be required if you haven't taken MLO courses
- Applications accepted at the start of each semester
- Limited number of projects available each semester
Large Language Models / Apertus Projects
Several projects are available around large language models and Apertus:
- Data curation and curriculum learning for pre-training, theory and practice
- Scaling to longer context windows through modifications of attention
- Efficiency engineering for LLM pretraining and specifically MoE architectures
- Finetuning and aligning models on top of pretrained LLMs
- ...and more
Contact: Apply via the application form
Multilingual Data Curation
This project aims to advance state-of-the-art curation beyond baselines like FineWeb-2 HQ by implementing novel techniques for signal extraction and semantic filtering.
- Novel techniques for signal extraction and semantic filtering
- Impact of data quality on the 'curse of multilinguality'
- Relation to scaling laws
Contact: Bettina Messmer and Vinko Sabolčec
Learning to Optimize
Learn an optimizer on the fly by teaching a neural network (for example an RNN) to take the current raw gradient as input, update its internal state, and output an improved 'treated' gradient.
- Neural network-based optimization
- Collaborative learning with multiple agents
- Gradient processing and improvement
Contact: El Mahdi Chayti
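The gradient-treatment loop described above can be sketched with a toy recurrent cell. The coefficients below are fixed by hand purely for illustration; in the project itself they (and a richer RNN state) would be learned via meta-training.

```python
class LearnedOptimizer:
    """Toy stand-in for a learned optimizer: keeps a recurrent state per
    parameter and emits a 'treated' gradient. The coefficients (a, b, c)
    are hand-picked here; the real project would meta-train them."""

    def __init__(self, n_params, a=0.9, b=0.1, c=1.0):
        self.state = [0.0] * n_params  # one state cell per parameter
        self.a, self.b, self.c = a, b, c

    def step(self, raw_grads):
        treated = []
        for i, g in enumerate(raw_grads):
            # Update the state from the raw gradient (like an RNN cell),
            # then read the treated gradient out of the new state.
            self.state[i] = self.a * self.state[i] + self.b * g
            treated.append(self.c * self.state[i])
        return treated

# Sanity check: minimize f(x) = x^2 (gradient 2x) with treated gradients.
x = [5.0]
opt = LearnedOptimizer(n_params=1)
for _ in range(300):
    grad = [2.0 * x[0]]
    update = opt.step(grad)
    x[0] -= 0.1 * update[0]
print(abs(x[0]) < 1e-3)  # True: the treated gradients still minimize x^2
```

With these fixed coefficients the cell reduces to momentum; the research question is what a trained network discovers beyond such hand-designed rules.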
Learning to Sample for SGD / Curriculum Learning
Use an auxiliary network that learns to assign scores to different data points in the training set, then use these scores to select an improved set of training points.
- Auxiliary network training
- Data point scoring and selection
- Training set optimization
Contact: El Mahdi Chayti
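A minimal sketch of score-based selection, using the current per-example loss as a stand-in for the learned auxiliary scorer (the scoring rule, batch size, and toy regression task are all assumptions for illustration):

```python
import random

def score_points(data, model_loss):
    """Stand-in for the auxiliary scoring network: score each example by
    its current loss, i.e. prefer examples the model still gets wrong.
    The real project would learn this scoring function."""
    return [model_loss(x, y) for x, y in data]

def select_batch(data, scores, k):
    """Pick the k highest-scoring training points."""
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    return [data[i] for i in ranked[:k]]

# Toy regression: fit y = w*x with squared loss, training only on the
# k hardest points each step.
random.seed(0)
data = [(x, 3.0 * x) for x in [random.uniform(-1, 1) for _ in range(50)]]
w = 0.0
for _ in range(100):
    loss = lambda x, y: (w * x - y) ** 2
    batch = select_batch(data, score_points(data, loss), k=8)
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    w -= 0.1 * grad
print(round(w, 2))  # → 3.0
```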
Landscape Analysis and Second-order Methods
Study the geometry of loss landscapes in deep neural networks and generalization properties of stationary points.
- Loss landscape visualization (2D/3D projections)
- Sharp vs wide minimum analysis
- First-order vs cubically regularized Newton trajectories
- Saddle point region identification
Contact: Nikita Doikov
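A one-dimensional slice through a toy loss illustrates the sharp-vs-wide comparison the project studies in high dimensions; the loss function, direction sampling, and grid below are invented for the example.

```python
import math
import random

def loss(params):
    """Toy loss with a sharp minimum at w0 = 2 and a wide one at w0 = -2."""
    w0, w1 = params
    return min((w0 - 2) ** 2 * 25, (w0 + 2) ** 2) + w1 ** 2

def loss_slice(center, direction, radius=1.0, steps=5):
    """Evaluate the loss along center + t * direction, the 1D analogue of
    the 2D projections used for landscape visualization."""
    ts = [radius * (2 * i / (steps - 1) - 1) for i in range(steps)]
    return [(t, loss([c + t * d for c, d in zip(center, direction)]))
            for t in ts]

random.seed(1)
d = [random.gauss(0, 1) for _ in range(2)]
norm = math.sqrt(sum(x * x for x in d))
d = [x / norm for x in d]  # normalize to a unit direction

sharp = loss_slice([2.0, 0.0], d)   # slice around the sharp minimum
wide = loss_slice([-2.0, 0.0], d)   # slice around the wide minimum
# The loss climbs much faster when moving away from the sharp minimum.
print(sharp[0][1] > wide[0][1])  # True
```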
Improving Factuality Understanding for LLM Training
Add a factuality tag or mask to each document to guide the LM's understanding of what is factual and what is not.
- Factuality tagging and masking
- Reducing LLM hallucinations
- Post-training techniques
Contact: Dongyang Fan & Diba Hashemi
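One simple instantiation of document-level tagging is to prepend a control token during data preparation; the tag names and corpus below are hypothetical, not the project's actual scheme.

```python
def tag_document(text, is_factual):
    """Prepend a control tag so the LM can condition on factuality.
    Tag names are illustrative placeholders."""
    tag = "<factual>" if is_factual else "<nonfactual>"
    return f"{tag} {text}"

corpus = [("Water boils at 100 C at sea level.", True),
          ("The dragon flew over the castle.", False)]
tagged = [tag_document(text, flag) for text, flag in corpus]
print(tagged[0])  # <factual> Water boils at 100 C at sea level.
```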
Build Decentralized ML in the Browser
Join our larger team project to build decentralized (and federated) training software in which many clients collaboratively train a joint ML model while respecting data privacy.
- Decentralized training algorithms
- Federated learning implementation
- Privacy-preserving techniques
- JavaScript/browser implementation
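The collaborative-training core can be sketched as federated averaging, the standard algorithm in this setting (shown here in Python for brevity, though the project itself targets JavaScript/browser; the toy linear model is an assumption for illustration):

```python
def local_step(weights, data, lr=0.1):
    """One local gradient step on a client's private data (toy model y = w*x)."""
    grad = sum(2 * (weights * x - y) * x for x, y in data) / len(data)
    return weights - lr * grad

def fed_avg(client_datasets, rounds=50):
    """Federated averaging: each round, clients train locally on their own
    data and only the model weights are shared and averaged."""
    w = 0.0
    for _ in range(rounds):
        local = [local_step(w, d) for d in client_datasets]
        w = sum(local) / len(local)
    return w

# Two clients whose private data both follow y = 2x; the raw data
# never leaves a client, only the weights do.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(0.5, 1.0), (3.0, 6.0)]]
print(round(fed_avg(clients), 2))  # → 2.0
```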
Interdisciplinary Opportunities
Cross-Lab Collaborations
We're open to interdisciplinary projects with other organizations (Academic, NGOs, Industry). We frequently collaborate with EPFL's Light Lab and other research groups.
Industry Partnerships
Interested in working on real-world AI safety challenges? We welcome partnerships with industry for practical applications of AI safety research.
Propose Your Own
Have an AI safety research idea? We're always excited to hear from students with their own project proposals that align with our mission.
Get Involved
Ready to contribute to AI safety research? Whether you're interested in our reading group, want to apply for a project, or just want to connect with our community, we'd love to hear from you.