CryptoCTF 2021 Lower WRITE-UP

An amazing CTF problem.

Challenge

This is a challenge related to learning with errors (LWE) problems. Basically, there is a secret vector s=(s1,,sn)\mathbf{s}=(s_1,\dots,s_n). Each time an LWE instance (a,b)(\mathbf a, b) is generated, where a=(a1,,an)\mathbf a=(a_1,\dots, a_n) is a random vector, and bb is defined by b=a,s+eb=\langle\mathbf a, \mathbf s\rangle+e, where ee is a error term. In classic LWE problems, ee is usually a small number, e.g. sampled from a narrow discrete gaussian distribution. But in this challenge, ee is sampled from EE, a set of random elements in Fq\mathbb F_{q}. This is not a huge problem, and we have different ways to deal with this. The real trouble is, we do not even know EE, and what we only know is E=5|E|=5. Now, given an oracle that generates LWE instances, we need to recover s\mathbf{s}.

Solution

I am quite familiar with this kind of problems. What arises in my mind is a paper: New Algorithms for Learning in Presence of Errors by Sanjeev Arora and Rong Ge. Their algorithm works perfectly for fixed, narrow error distribution. What they do is basically treat sis_i as variables, then a,s\langle\mathbf a, \mathbf s\rangle is a polynomial in s1,,sns_1, \dots, s_n. Furthermore, we note that

P(s)=eiE(ba,sei)P(\mathbf{s})=\prod_{e_i\in E}(b - \langle\mathbf a, \mathbf s\rangle-e_i)

is always zero. So as we gather LWE instances, we actually gather a collection of polynomial equations P1(s)=0,,Pm(s)=0P_1(\mathbf s)=0,\dots, P_m(\mathbf s)=0. Just solve these equations and we're done.

Unfortunately, solving polynomial equations is still NP-hard. What Arora and Ge do next is they use more variables, say tjt_j, to represent any monomial in s1,,sns_1,\dots,s_n. For example, tjt_j may equals to s1α1s2α2snαns_1^{\alpha_1}s_2^{\alpha_2}\dots s_n^{\alpha_n}. By simple combinatorics, we know the number of such tjt_j is (n+55)\binom{n+5}{5}, which is not too large. So now the only thing to do is to solve a linear system, which is easy. They proved that with high probability, the only solution of the linear system corresponds to the secret vector. Now it seems we have solved the problem.

But wait, we don't know EE! To deal with this, I come up with a simple idea: just make eie_i variables, too. Then eiE(ba,sei)\prod_{e_i\in E}(b - \langle\mathbf a, \mathbf s\rangle-e_i) is now a polynomial in si,ejs_i, e_j for 1in1\le i\le n and 1j51\le j\le 5. Linearize the new polynomials, then we can solve the linear system to recover s\mathbf{s} (and EE). A trick is that note e1,e2,,e5e_1,e_2,\dots, e_5 are interchangeable. To cancel the permutation of these variables, the polynomial I use is actually

P(s,e)=(ba,se)5P(\mathbf s, e)=(b - \langle\mathbf a, \mathbf s\rangle-e)^5

Here the variable ee is not a certain error anymore, but rather a placeholder for any error term. In this way, the kernel of the system is of dimension 11, exactly what we want.

Unfortunately, there's a remaining flaw, that the number of variables after the linearization is 6578065780, a little larger than 6000060000. The contest only provides 6000060000 instances, and due to some unexpected effects, we can not get unlimited access to the oracle. But after some discussion with the manager of the contest, I believe the effects are not intended and we should have been able to get all the 6578065780 instances. Thus, this should have been a valid solution.

Code

from sage.all import *
from tqdm import tqdm

q = 127
n = 20
r = 5

def enc(z):
for con, mon in z:
return (con, mon.degrees())

def dec(p, l):
cons, degs = p
for i, j in zip(degs, l):
if i != 0:
cons *= F(j)**i
return cons

F = GF(q)
Q = F[','.join([f'a{i}' for i in range(n)] + ['b'])]
aas = vector(Q.gens()[:-1])
bb = Q.gens()[-1]
T = Q[','.join([f'x{i}' for i in range(n)] + ['e'])]
g = vector(T.gens()[:-1])
ee = T.gens()[-1]
y = prod(1 + sum(g) + ee for _ in range(r)).monomials()
ff = prod(bb - aas * g - ee for _ in range(r))
pat = [ff.monomial_coefficient(z) for z in y]

ppat = [enc(p) for p in pat]
test = set(deg for _, deg in ppat)
print(len(y))

inss = []
with open('data.txt', 'r') as f:
for line in f.readlines():
_, ins = line.split(':')
ins = eval(ins)
inss.append(ins)

m = len(y)

cs = []

for i in tqdm(range(m)):
a, b = inss[i]
a.append(b)
cs.append([dec(p, a) for p in ppat])

cs = matrix(cs)
A = cs.right_kernel_matrix()

v = vector(A)
v /= v[-1]

print(''.join(chr(i) for i in v[-2 - n:-2]))

Update

I have newly come up with a way to tackle the limitation on the number of instances. More specifically, each time you start a conversation with the oracle, the errors are randomly generated like this: E = [getRandomRange(0, p) for _ in range(5)]. This means in each conversation the errors are re-generated, so we can not just concatenate the instances from two conversations to obtain a larger bunch of instances. However, this line of code turns out to be my savior. As each element is selected at random in EE independently, there is a 0.2%0.2\% chance that there are only 33 different elements. When E=3|E|=3, we only need 20242024 instances and it is super easy to solve the problem. Thus, I can just repeatedly get a bunch of instances and try to solve the problem with E=3|E|=3. This approach succeeded after 136 tries. The flag is CCTF{._h0W_5OLv3_LWE?!_:.}.

from sage.all import *
from tqdm import tqdm
import pwn

server = 'some_ip_addr'

def get_new_batch(n):
n = n // 12 + 1

conn = pwn.remote(server, 30010)

l = []

for i in range(n):
conn.sendline(b'Q')
for i in range(n):
conn.recvuntil(b'it\n')
for i in range(12):
s = conn.recvline().decode()
l.append(s)

conn.sendline(b'E')
conn.close()
return l

def enc(z):
for con, mon in z:
return (con, mon.degrees())

def dec(p, l):
cons, degs = p
for i, j in zip(degs, l):
if i != 0:
cons *= F(j)**i
return cons

q = 127
n = 20
r = 3

F = GF(q)
Q = F[','.join([f'a{i}' for i in range(n)] + ['b'])]
aas = vector(Q.gens()[:-1])
bb = Q.gens()[-1]
T = Q[','.join([f'x{i}' for i in range(n)] + ['e'])]
g = vector(T.gens()[:-1])
ee = T.gens()[-1]
y = prod(1 + sum(g) + ee for _ in range(r)).monomials()
ff = prod(bb - aas * g - ee for _ in range(r))
pat = [ff.monomial_coefficient(z) for z in y]
ppat = [enc(p) for p in pat]
test = set(deg for _, deg in ppat)

print(len(y))
m = len(y)

count = 0
while True:
print(f'Round {count}')
count += 1

inss = []
for line in get_new_batch(m):
_, ins = line.split(':')
ins = eval(ins)
inss.append(ins)

cs = []

for i in tqdm(range(m)):
a, b = inss[i]
a.append(b)
cs.append([dec(p, a) for p in ppat])


cs = matrix(cs)
print(cs.rank())
if (cs.rank() == m - 1):
A = cs.right_kernel_matrix()

v = vector(A)
v /= v[-1]

print(''.join(chr(i) for i in v[-2 - n:-2]))
break

Leave a Reply

Your email address will not be published. Required fields are marked *