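Did you know that PDF files can be valid even if they don’t start with the “%PDF-” header? Many readers are happy as long as it shows up somewhere near the beginning. A key to what makes a PDF “valid” is the trailer at the end of the file: the “startxref” entry gives the byte offset of the cross-reference table, and the trailer’s /Root entry points to the object that should be processed first (the document catalog, which leads to Page 1). Since all of that sits almost at the end of the file, a plain, non-linearized PDF can’t really be streamed (portable my ass). It also means you can make a PDF that is also a PNG, HTML or almost whatever else, depending on which extension you give it. Such files can even look different in different PDF readers!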
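For anyone who wants to poke at this, here is a rough Python sketch of the two lookups described above: finding the “%PDF-” header when it isn’t at offset 0, and reading the “startxref” pointer near the end. The function name `locate_pdf_markers` and the 1024-byte search windows are my own assumptions (they mirror what lenient readers are commonly reported to do), not something taken from the talk or guaranteed by any particular reader.

```python
import re

# Hypothetical window size: lenient readers are commonly reported to search
# roughly the first and last 1024 bytes. Treat this as an assumption.
WINDOW = 1024


def locate_pdf_markers(data: bytes):
    """Return (header_offset, xref_offset); each is None if not found."""
    # Look for the header anywhere in the first WINDOW bytes, not only at 0.
    header_offset = data[:WINDOW].find(b"%PDF-")
    if header_offset == -1:
        header_offset = None

    # "startxref" is followed by the byte offset of the cross-reference
    # table. Take the last occurrence in the tail of the file.
    matches = re.findall(rb"startxref\s+(\d+)", data[-WINDOW:])
    xref_offset = int(matches[-1]) if matches else None

    return header_offset, xref_offset


# A toy polyglot-style blob: non-PDF bytes first, PDF markers afterwards.
prefix = b"\x89PNG\r\n\x1a\n" + b"junk" * 4
sample = prefix + b"%PDF-1.4\n% body omitted\nstartxref\n123\n%%EOF\n"
header_at, xref_at = locate_pdf_markers(sample)
```

This only locates the markers. It does not parse the cross-reference table or check that the file is actually a usable PDF.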
Identifying both is hard.
This is all explored in Funky File Formats, a 31c3 talk by Ange Albertini (and the follow-up 2024 talk Fearsome File Formats). (YouTube mirror)
Oh a reference to CCC talks, are we friends now? I think we’re friends now!
The video was seen by over 30k people… but I’m actually considering going to 39c3, so maybe we’ll see each other.