An amazing CTF problem.
Challenge
This is a challenge related to learning with errors (LWE) problems. Basically, there is a secret vector . Each time an LWE instance is generated, where is a random vector, and is defined by , where is a error term. In classic LWE problems, is usually a small number, e.g. sampled from a narrow discrete gaussian distribution. But in this challenge, is sampled from , a set of random elements in . This is not a huge problem, and we have different ways to deal with this. The real trouble is, we do not even know , and what we only know is . Now, given an oracle that generates LWE instances, we need to recover .
Solution
I am quite familiar with this kind of problems. What arises in my mind is a paper: New Algorithms for Learning in Presence of Errors by Sanjeev Arora and Rong Ge. Their algorithm works perfectly for fixed, narrow error distribution. What they do is basically treat as variables, then is a polynomial in . Furthermore, we note that
is always zero. So as we gather LWE instances, we actually gather a collection of polynomial equations . Just solve these equations and we're done.
Unfortunately, solving polynomial equations is still NP-hard. What Arora and Ge do next is they use more variables, say , to represent any monomial in . For example, may equals to . By simple combinatorics, we know the number of such is , which is not too large. So now the only thing to do is to solve a linear system, which is easy. They proved that with high probability, the only solution of the linear system corresponds to the secret vector. Now it seems we have solved the problem.
But wait, we don't know ! To deal with this, I come up with a simple idea: just make variables, too. Then is now a polynomial in for and . Linearize the new polynomials, then we can solve the linear system to recover (and ). A trick is that note are interchangeable. To cancel the permutation of these variables, the polynomial I use is actually
Here the variable is not a certain error anymore, but rather a placeholder for any error term. In this way, the kernel of the system is of dimension , exactly what we want.
Unfortunately, there's a remaining flaw, that the number of variables after the linearization is , a little larger than . The contest only provides instances, and due to some unexpected effects, we can not get unlimited access to the oracle. But after some discussion with the manager of the contest, I believe the effects are not intended and we should have been able to get all the instances. Thus, this should have been a valid solution.
Code
from sage.all import *
from tqdm import tqdm
q = 127
n = 20
r = 5
def enc(z):
for con, mon in z:
return (con, mon.degrees())
def dec(p, l):
cons, degs = p
for i, j in zip(degs, l):
if i != 0:
cons *= F(j)**i
return cons
F = GF(q)
Q = F[','.join([f'a{i}' for i in range(n)] + ['b'])]
aas = vector(Q.gens()[:-1])
bb = Q.gens()[-1]
T = Q[','.join([f'x{i}' for i in range(n)] + ['e'])]
g = vector(T.gens()[:-1])
ee = T.gens()[-1]
y = prod(1 + sum(g) + ee for _ in range(r)).monomials()
ff = prod(bb - aas * g - ee for _ in range(r))
pat = [ff.monomial_coefficient(z) for z in y]
ppat = [enc(p) for p in pat]
test = set(deg for _, deg in ppat)
print(len(y))
inss = []
with open('data.txt', 'r') as f:
for line in f.readlines():
_, ins = line.split(':')
ins = eval(ins)
inss.append(ins)
m = len(y)
cs = []
for i in tqdm(range(m)):
a, b = inss[i]
a.append(b)
cs.append([dec(p, a) for p in ppat])
cs = matrix(cs)
A = cs.right_kernel_matrix()
v = vector(A)
v /= v[-1]
print(''.join(chr(i) for i in v[-2 - n:-2]))
Update
I have newly come up with a way to tackle the limitation on the number of instances. More specifically, each time you start a conversation with the oracle, the errors are randomly generated like this: E = [getRandomRange(0, p) for _ in range(5)]
. This means in each conversation the errors are re-generated, so we can not just concatenate the instances from two conversations to obtain a larger bunch of instances. However, this line of code turns out to be my savior. As each element is selected at random in independently, there is a chance that there are only different elements. When , we only need instances and it is super easy to solve the problem. Thus, I can just repeatedly get a bunch of instances and try to solve the problem with . This approach succeeded after 136 tries. The flag is CCTF{._h0W_5OLv3_LWE?!_:.}
.
from sage.all import *
from tqdm import tqdm
import pwn
server = 'some_ip_addr'
def get_new_batch(n):
n = n // 12 + 1
conn = pwn.remote(server, 30010)
l = []
for i in range(n):
conn.sendline(b'Q')
for i in range(n):
conn.recvuntil(b'it\n')
for i in range(12):
s = conn.recvline().decode()
l.append(s)
conn.sendline(b'E')
conn.close()
return l
def enc(z):
for con, mon in z:
return (con, mon.degrees())
def dec(p, l):
cons, degs = p
for i, j in zip(degs, l):
if i != 0:
cons *= F(j)**i
return cons
q = 127
n = 20
r = 3
F = GF(q)
Q = F[','.join([f'a{i}' for i in range(n)] + ['b'])]
aas = vector(Q.gens()[:-1])
bb = Q.gens()[-1]
T = Q[','.join([f'x{i}' for i in range(n)] + ['e'])]
g = vector(T.gens()[:-1])
ee = T.gens()[-1]
y = prod(1 + sum(g) + ee for _ in range(r)).monomials()
ff = prod(bb - aas * g - ee for _ in range(r))
pat = [ff.monomial_coefficient(z) for z in y]
ppat = [enc(p) for p in pat]
test = set(deg for _, deg in ppat)
print(len(y))
m = len(y)
count = 0
while True:
print(f'Round {count}')
count += 1
inss = []
for line in get_new_batch(m):
_, ins = line.split(':')
ins = eval(ins)
inss.append(ins)
cs = []
for i in tqdm(range(m)):
a, b = inss[i]
a.append(b)
cs.append([dec(p, a) for p in ppat])
cs = matrix(cs)
print(cs.rank())
if (cs.rank() == m - 1):
A = cs.right_kernel_matrix()
v = vector(A)
v /= v[-1]
print(''.join(chr(i) for i in v[-2 - n:-2]))
break