Past SAIL Projects (Spring 2025)

Our members have contributed to cutting-edge AI safety research through semester projects at various EPFL labs.

OS-Harm

SAIL Authors: Thomas Kuntz, Agatha Duzan

Lab: Theory of Machine Learning Laboratory (TML)

Created a benchmark of harmful capabilities for LLM agents; accepted as a spotlight paper at NeurIPS 2025.

Watermarking for LLMs

SAIL Authors: Joshua Cohen-Dumani

Lab: Natural Language Processing Lab (NLP)

This project explored synthetic text detection in open-source language models by studying whether watermarking patterns can be learned directly through fine-tuning. To do so, we built a research pipeline that generated custom datasets, applied contrastive training, and evaluated detectability using automated and model-based methods. The work contributes to understanding how watermarking could help mitigate misuse and disinformation in widely available LLMs.

Toxicity in LLMs

SAIL Authors: Léo Gabriel Paoletti

Lab: Natural Language Processing Lab (NLP)

Investigated the existence and cross-model transferability of multilingual prompts that evade toxicity detection yet trigger toxic outputs in LLMs. Benchmarked Apertus and identified limitations in state-of-the-art jailbreak and toxicity detection systems.

Memorization in LLMs

SAIL Authors: Arthur Wuhrmann

Lab: Natural Language Processing Lab (NLP)

Investigated how perplexity can help detect verbatim memorization in LLM outputs by identifying low-perplexity regions in generated text, and developed an open-source tool for detecting memorization patterns.
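The core idea can be sketched as a scan for low-perplexity windows in a generation. This is a minimal illustration, not the project's actual tool: the window length and threshold are illustrative, and it assumes per-token log-probabilities are already available from some language model.

```python
import math

def low_perplexity_spans(token_logprobs, window=8, threshold=2.0):
    """Flag spans whose windowed perplexity falls below `threshold`.

    token_logprobs: per-token log-probabilities (natural log) from any LM.
    Returns (start, end) index pairs with end exclusive.
    """
    spans = []
    for i in range(len(token_logprobs) - window + 1):
        avg_nll = -sum(token_logprobs[i:i + window]) / window
        if math.exp(avg_nll) < threshold:  # window perplexity below threshold
            spans.append((i, i + window))
    # merge overlapping windows into maximal candidate spans
    merged = []
    for s, e in spans:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```

Merging overlapping windows yields maximal candidate spans, which can then be checked against a training corpus for verbatim matches.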

Available Projects at MLO Lab

The Machine Learning and Optimization (MLO) Lab offers exciting research opportunities in AI safety and related fields. These projects are open to students interested in contributing to cutting-edge research.

How to Apply

Students interested in doing a project at the MLO lab should apply through our centralized application form. Priority is given to full-time Master's thesis projects.

  • A grade sheet may be required if you have not taken MLO courses
  • Applications accepted at the start of each semester
  • Limited number of projects available each semester

Apply Now

Large Language Models / Apertus Projects

Several project topics are available around large language models and Apertus.

Focus Areas:
  • Data curation and curriculum learning for pre-training, theory and practice
  • Scaling to longer context windows through modifications of attention
  • Efficiency engineering for LLM pretraining and specifically MoE architectures
  • Finetuning and alignment of models on top of pretrained LLMs
  • ...and more

Contact: Apply via the application form

Multilingual Data Curation

This project aims to advance state-of-the-art curation beyond baselines like FineWeb-2 HQ by implementing novel techniques for signal extraction and semantic filtering.

Focus Areas:
  • Novel techniques for signal extraction and semantic filtering
  • Impact of data quality on the 'curse of multilinguality'
  • Relation to scaling laws

Contact: Bettina Messmer and Vinko Sabolčec

Learning to Optimize

Learn an optimizer on the fly by training a neural network (for example, an RNN) that takes the current raw gradient as input, updates its internal state, and outputs an improved 'treated' gradient.

Focus Areas:
  • Neural network-based optimization
  • Collaborative learning with multiple agents
  • Gradient processing and improvement
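The interface described above can be sketched with a coordinate-wise recurrent cell. This is a minimal sketch: the weights are randomly initialised for illustration, whereas in the project they would be meta-learned over a distribution of training tasks.

```python
import numpy as np

class GradientRNN:
    """Coordinate-wise recurrent 'optimizer': (state, raw gradient) -> treated gradient.

    Weights are randomly initialised here for illustration only; a real
    learned optimizer would meta-learn them across many training runs.
    """
    def __init__(self, hidden=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(hidden, hidden + 1))  # state-update weights
        self.v = rng.normal(scale=0.1, size=hidden)                # readout weights
        self.h = None                                              # per-coordinate state

    def step(self, grad):
        g = np.asarray(grad, dtype=float)
        if self.h is None:
            self.h = np.zeros((g.size, self.W.shape[0]))
        inp = np.concatenate([g.reshape(-1, 1), self.h], axis=1)   # [gradient, state]
        self.h = np.tanh(inp @ self.W.T)                           # update recurrent state
        return (self.h @ self.v).reshape(g.shape)                  # treated gradient
```

Each call to `step` consumes the raw gradient, updates the hidden state, and returns the treated gradient, which then plugs into a standard update rule such as `w -= lr * treated`.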

Contact: El Mahdi Chayti

Learning to Sample for SGD / Curriculum Learning

Use an auxiliary network that learns to assign scores to different data points in the training set, then use these scores to select an improved set of training points.

Focus Areas:
  • Auxiliary network training
  • Data point scoring and selection
  • Training set optimization
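The selection step can be sketched as follows. The linear scorer here is a stand-in for the auxiliary network, which in the project would itself be trained (for example, from validation-loss feedback); `features` and `scorer_weights` are illustrative names.

```python
import numpy as np

def select_batch(features, scorer_weights, batch_size):
    """Score each candidate example and keep the highest-scoring ones.

    A linear scorer stands in for the auxiliary network; in practice the
    scorer would be a trained model, not fixed weights.
    """
    scores = features @ scorer_weights           # one score per data point
    top = np.argsort(scores)[::-1][:batch_size]  # indices of the best examples
    return np.sort(top)                          # return in index order
```

The same scoring loop can implement a curriculum by scheduling which scores are preferred as training progresses (easy examples first, harder ones later).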

Contact: El Mahdi Chayti

Landscape Analysis and Second-order Methods

Study the geometry of loss landscapes in deep neural networks and generalization properties of stationary points.

Focus Areas:
  • Loss landscape visualization (2D/3D projections)
  • Sharp vs wide minimum analysis
  • First-order vs cubically regularized Newton trajectories
  • Saddle point region identification
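A common starting point for such visualizations is evaluating the loss along normalised directions through a trained point. This minimal sketch (the `loss_fn` interface is an assumption for illustration) shows the 1-D case, which extends directly to the 2-D projections mentioned above by sweeping two directions.

```python
import numpy as np

def loss_slice(loss_fn, theta, direction, alphas):
    """Evaluate loss_fn along the 1-D slice theta + alpha * direction.

    The direction is normalised so that slices from different runs are
    comparable (deep nets often use filter-wise normalisation instead).
    """
    d = direction / np.linalg.norm(direction)
    return np.array([loss_fn(theta + a * d) for a in alphas])
```

Plotting the returned values against `alphas` gives a cross-section of the landscape, making sharp versus wide minima visible at a glance.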

Contact: Nikita Doikov

Improving Factuality Understanding for LLM Training

Add a factuality tag or mask to each training document to guide the model's understanding of what is factual and what is not.

Focus Areas:
  • Factuality tagging and masking
  • Reducing LLM hallucinations
  • Post-training techniques
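The tagging half of the idea can be sketched in a few lines. The tag strings below are hypothetical placeholders; in practice they would be reserved tokens in the model's vocabulary.

```python
def tag_document(text: str, is_factual: bool) -> str:
    """Prepend a control tag so the LM can condition on document factuality.

    '<factual>' / '<nonfactual>' are illustrative placeholders, not actual
    vocabulary items of any particular model.
    """
    tag = "<factual>" if is_factual else "<nonfactual>"
    return f"{tag} {text}"
```

The masking variant would instead exclude non-factual spans from the training loss rather than labelling them.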

Contact: Dongyang Fan & Diba Hashemi

Build Decentralized ML in the Browser

Practical Project

Join our larger team project to build decentralized (and federated) training software, where many clients can collaboratively train a joint ML model while respecting data privacy.

Focus Areas:
  • Decentralized training algorithms
  • Federated learning implementation
  • Privacy-preserving techniques
  • JavaScript/browser implementation

Contact: Martin Jaggi

Interdisciplinary Opportunities

Cross-Lab Collaborations

We're open to interdisciplinary projects with other organizations (academia, NGOs, industry). We frequently collaborate with EPFL's Light Lab and other research groups.

Industry Partnerships

Interested in working on real-world AI safety challenges? We welcome partnerships with industry for practical applications of AI safety research.

Propose Your Own

Have an AI safety research idea? We're always excited to hear from students with their own project proposals that align with our mission.

Get in touch →

Get Involved

Ready to contribute to AI safety research? Whether you're interested in our reading group, want to apply for a project, or just want to connect with our community, we'd love to hear from you.

Join Our Community Apply for Projects