Eliciting Latent Knowledge

18 Aug 2022

My friends Uzay Girit, Pranav Gade and I recently participated in an ELK (Eliciting Latent Knowledge) contest held by Prometheus Science with a prize pool of $100,000. Our Predicting The Predictor proposal won 2nd place.

Suppose we train a model to predict what the future will look like according to cameras and other sensors. We then use planning algorithms to find a sequence of actions that lead to predicted futures that look good to us.

But some action sequences could tamper with the cameras so they show happy humans regardless of what’s really happening. More generally, some futures look great on camera but are actually catastrophically bad.

In these cases, the prediction model “knows” facts (like “the camera was tampered with”) that are not visible on camera but would change our evaluation of the predicted future if we learned them. How can we train this model to report its latent knowledge of off-screen events?

Unlike neural network interpretability and transparency work, which focuses on low-level neurons or circuits, ELK aims to generate natural language descriptions that reflect the model’s “true beliefs”. In short, this AI alignment problem asks how to get an AI to tell the truth in a contrived hypothetical situation. Here are our two proposals:

Continue Reading »

CryptoGolf

24 Apr 2021

Challenge Description:

Description: 
nc challs.m0lecon.it 11000

Foreword

CryptoGolf was the first in a series of fun and interesting cryptography challenges that my teammates and I solved in the m0leconCTF World Quals, where we placed 5th globally in the Open Division and qualified for the Grand Finals in Turin, Italy.

Continue Reading »

Access=0000

17 Jul 2020

This is a writeup for a crypto challenge in RACTF 2020, where we placed 6th.

Challenge Description:

Challenge instance ready at 95.216.233.106:57735

We found a strange service, it looks like you can generate an access token for the network service, but you shouldn't be able to read the flag... We think.

Solving :

We are given access.py. Let's take a look at the server file to see what the program does.

From the top, we see that get_flag:

Continue Reading »

Really Smart Acronym

16 Jul 2020

Challenge Description:

Man, oracles are weird.

nc challenges1.hexionteam.com 5000

Solving :

Really Smart Acronym, of course, is RSA. Looking at the code, it uses PyCrypto to generate an RSA key and encrypt the flag. You also get one encryption and 1024 decryptions, but each decryption reveals only the last bit of the result. At first we thought it could be a Franklin-Reiter related-message attack, but there is not enough information for that.
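For readers unfamiliar with this kind of oracle, here is a minimal sketch of the textbook parity (last-bit) oracle attack on RSA. This excerpt does not say which attack was ultimately used, so treat this as an illustration of a standard technique rather than the actual solution. The names `lsb_oracle_attack` and `oracle` are hypothetical, and `oracle` is assumed to return the last bit of the decryption of whatever ciphertext it is given.

```python
import math
from fractions import Fraction


def lsb_oracle_attack(n, e, c, oracle):
    """Recover m from c = m^e mod n, given oracle(ct) -> last bit of ct^d mod n.

    Hypothetical helper: multiplying the ciphertext by 2^e doubles the hidden
    plaintext. Because n is odd, the doubled value is odd exactly when it
    wrapped around n, which tells us which half of the interval m lies in.
    """
    lo, hi = Fraction(0), Fraction(n)
    factor = pow(2, e, n)
    # One oracle query per bit of n narrows the interval to width < 1.
    for _ in range(n.bit_length()):
        c = (c * factor) % n
        mid = (lo + hi) / 2
        if oracle(c):
            lo = mid  # odd result: the doubling wrapped, m is in the upper half
        else:
            hi = mid  # even result: no wrap, m is in the lower half
    # m is the only integer strictly below hi in the final interval.
    return math.ceil(hi) - 1
```

With a 1024-bit modulus this sketch would make 1024 oracle queries, one per bit of `n`. Exact fractions are used for the interval bounds so that rounding never pushes the answer off by one.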

Continue Reading »

S.S.S.

15 Jul 2020

This is a writeup for HexionCTF 2020, where we placed third.

Challenge Description:

Math is so beautiful and can always be used for cryptographic
encryption!
nc challenges1.hexionteam.com 5001

Solving :

We are given sss.py. See here for source.

We found that SSS stands for Shamir’s Secret Sharing by copy-pasting the loop from eval_at, which brought us to this Wikipedia page. Shamir’s Secret Sharing is based on polynomials and Lagrange interpolation.
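To make the connection between polynomials and Lagrange interpolation concrete, here is a minimal sketch of Shamir’s Secret Sharing over a prime field. It is not the challenge’s sss.py: the function names, the choice of prime, and the share x-coordinates are all assumptions made for illustration.

```python
import random

# Assumed field size: the Mersenne prime 2^127 - 1 (not taken from sss.py).
PRIME = 2**127 - 1


def make_shares(secret, threshold, num_shares, prime=PRIME):
    """Hide `secret` as the constant term of a random polynomial of degree
    threshold - 1 and return points (x, f(x)) for x = 1..num_shares."""
    coeffs = [secret] + [random.randrange(prime) for _ in range(threshold - 1)]

    def eval_poly(x):
        # Horner's rule, highest-degree coefficient first.
        acc = 0
        for coeff in reversed(coeffs):
            acc = (acc * x + coeff) % prime
        return acc

    return [(x, eval_poly(x)) for x in range(1, num_shares + 1)]


def recover_secret(shares, prime=PRIME):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

Any `threshold` distinct shares determine the polynomial and therefore the secret, for example `recover_secret(make_shares(1234, 3, 5)[:3])`, while fewer shares leave the constant term undetermined.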

Continue Reading »

The Haven
