Malicious JARs and Polyglot files: “Who do you think you JAR?”

Throughout 2022, Deep Instinct observed various combinations of polyglot files with malicious JARs.

The initial technique dates back to around 2018, when it was used with signed MSI files to bypass Microsoft’s code-signing verification. A year later, in 2019, VirusTotal wrote about the MSI+JAR polyglot technique, but Microsoft decided not to fix the issue at that time. In 2020, the technique was again abused in malicious campaigns, and Microsoft assigned CVE-2020-1464 to address the issue.

Although the vulnerability was fixed, Deep Instinct observed in 2022 that the technique was still in use, including new types of polyglots that do not necessarily exploit the CVE. Instead, attackers now use the polyglot technique to confuse security solutions that don’t properly validate the JAR file format.

What is a polyglot?

A polyglot file is created by combining two or more file formats together in such a way that each format can be interpreted individually without an error.

This is a little tricky, as you can’t build a polyglot from an arbitrary combination of file formats, although there are quite a few combinations to choose from.

Why JAR?

JAR files are essentially ZIP archives. What’s special about ZIP files is that they’re identified by an end-of-central-directory (EOCD) record located at the end of the archive. This means that any “junk” we prepend to the beginning of the file is ignored, and the archive is still valid.
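To illustrate, here is a minimal Python sketch (our own illustration, not code taken from the samples). It builds a tiny ZIP containing only a manifest as a stand-in for a JAR, prepends arbitrary bytes, and opens the result with the standard zipfile module. It relies on the fact that Python’s zipfile, like many ZIP readers, locates the archive through the EOCD record and tolerates data in front of it. The manifest contents and the junk bytes are hypothetical.

```python
import io
import zipfile

def build_jar_bytes() -> bytes:
    """Build a minimal in-memory ZIP that mimics a JAR (manifest only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical manifest; a real JAR would also contain compiled classes.
        zf.writestr("META-INF/MANIFEST.MF",
                    "Manifest-Version: 1.0\r\nMain-Class: Example\r\n")
    return buf.getvalue()

jar = build_jar_bytes()
junk = b"\x00JUNK" * 1000  # arbitrary bytes placed in front of the archive
polyglot = junk + jar

# The EOCD record is still at the very end, so the reader finds the archive.
print(zipfile.is_zipfile(io.BytesIO(polyglot)))  # True
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    print(zf.namelist())  # ['META-INF/MANIFEST.MF']
```

zipfile compensates for the prefix when it computes entry offsets. This is the same behavior that lets it open self-extracting archives.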

Other file formats, by contrast, are identified by a magic header at the beginning of the file and are parsed from the start. One of these formats is MSI.

If the two are combined, with the JAR placed after the MSI, we get a file that is both a valid MSI and a valid JAR.
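The following Python sketch shows why such a file confuses header-based identification. The MAGIC table and the helper type_from_header are our own hypothetical stand-ins for what a tool like “file” does, which is to inspect the start of the file. The “MSI” here is not a real installer. It is just the OLE Compound File magic that MSI files begin with, followed by padding.

```python
import io
import zipfile

# Hypothetical magic table: signatures checked at offset 0 only.
MAGIC = {
    b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1": "OLE Compound File (e.g. MSI)",
    b"MSCF": "Microsoft Cabinet (CAB)",
    b"PK\x03\x04": "ZIP/JAR",
}

def type_from_header(data: bytes) -> str:
    """Identify a file purely by its leading magic bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "data"

def make_jar() -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n")
    return buf.getvalue()

# NOT a real MSI: only the OLE magic plus zero padding, standing in for the
# leading file of the polyglot. The JAR is appended after it.
fake_msi = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1" + b"\x00" * 504
polyglot = fake_msi + make_jar()

print(type_from_header(polyglot))                # OLE Compound File (e.g. MSI)
print(zipfile.is_zipfile(io.BytesIO(polyglot)))  # True
```

A header-only check reports the leading format, while a ZIP reader working from the end of the file still finds the JAR.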

The 2022 Campaigns

Throughout 2022, Deep Instinct observed StrRAT and Ratty samples being distributed as polyglots or as JAR files with junk prepended to the beginning.

Both RATs are known threats. For StrRAT, a configuration extractor already exists. For Ratty, we wrote a config extractor of our own, which we shared with the community on GitHub: https://github.com/deepinstinct/RattyConfigExtractor

While we couldn’t determine whether the files belong to a single threat actor or to several, the Bulgarian hosting provider “BelCloud LTD” was used by many of the samples.

Additionally, we found a few Ratty and StrRAT samples that shared the same C2 server.

For simplicity, we categorize the files by what is prepended to the JAR.

#1 MSI:

Sample 5e288df18d5f3797079c4962a447509fd4a60e9b76041d0b888bcf32f8197991

If we inspect the file with the Linux “file” command, we see that it’s identified as an MSI file:

Figure 1: Linux file identifies sample as MSI file

However, the file also contains a valid JAR. Our Ratty configuration extractor successfully extracted the C2 server information.

This sample was observed being delivered via SendGrid. While this is not unique, some of the StrRAT MSI+JAR polyglots were also sent via SendGrid.

Some of the StrRAT MSI+JAR polyglots were spread via URL shortening services such as cutt.ly and rebrand.ly. In some cases, the samples were hosted on Discord.

#2 CAB:

CAB files have a unique magic header and are interpreted from the beginning to the end, which makes them a perfect candidate for a polyglot with JAR files.
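A CAB header records the total size of the cabinet, so anything past that size is an overlay. As a rough static check, the Python sketch below reads the cbCabinet field from the CAB header (offset 8, little-endian, per Microsoft’s cabinet format) and tests whether the trailing bytes form a ZIP. It assumes the JAR was simply concatenated after the cabinet. The helper names and the toy 36-byte header are hypothetical, and a real detector would validate much more of the header.

```python
import io
import struct
import zipfile

def cab_overlay(data: bytes) -> bytes:
    """Return whatever follows the cabinet's declared size (b"" if nothing)."""
    if len(data) < 12 or not data.startswith(b"MSCF"):
        raise ValueError("not a CAB file")
    # CFHEADER: signature (4 bytes), reserved1 (4), cbCabinet (4, little-endian), ...
    (cb_cabinet,) = struct.unpack_from("<I", data, 8)
    return data[cb_cabinet:]

def cab_has_zip_overlay(data: bytes) -> bool:
    overlay = cab_overlay(data)
    # Assumption: the JAR was appended as-is, so it parses on its own.
    return bool(overlay) and zipfile.is_zipfile(io.BytesIO(overlay))

# Toy input: a bare 36-byte CAB header declaring cbCabinet = 36 (not a usable cabinet).
toy_cab = b"MSCF" + b"\x00" * 4 + struct.pack("<I", 36) + b"\x00" * 24

jar_buf = io.BytesIO()
with zipfile.ZipFile(jar_buf, "w") as zf:
    zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n")

print(cab_has_zip_overlay(toy_cab))                       # False
print(cab_has_zip_overlay(toy_cab + jar_buf.getvalue()))  # True
```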

We have observed both Ratty and StrRAT samples using CAB+JAR polyglots.

In particular, we found that the Ratty sample f620c4f59db31c7f63e8fde3016a33b3bfb3934c17874dcfae52ca01e23f14de and the StrRAT sample 2f4c6eb0a307657fb46f4a8f6850842d75c1535a0ed807cd3da6b6678102e571 both use the CAB+JAR polyglot and the same C2 server, donutz.ddns[.]net.

Figure 2: StrRAT and Ratty (CAB+JAR) linked to the same C2 server donutz.ddns[.]net

#3 HEX Trash / Fake PE:

We are not sure whether the threat actor created these files by mistake or on purpose, but these JAR files have HEX values prepended to them:

Figure 3: Sample 19154b831614211de667c2aedd6a4b5b89d4bfc1e129eb402a6300ad2e156dcf with HEX values prepended.

The prepended text is the hex representation of an executable file. These files are not polyglots, since the hex is just “junk” text, but the Linux “file” command returns “data” as the file type.

In this way, attackers can confuse and bypass security solutions that determine the file type simply by using the Linux “file” command.
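If the prepended text is indeed a hex dump of a PE, a defender could decode it and check for the “MZ” DOS signature. The Python sketch below assumes that the hex digits sit directly in front of the first ZIP local file header (PK\x03\x04), possibly broken by whitespace. That layout and the helper names are our assumptions and do not describe every sample.

```python
import binascii
import string

HEX_DIGITS = set(string.hexdigits.encode("ascii"))
WHITESPACE = b" \t\r\n"

def decode_hex_prefix(data: bytes):
    """Decode the bytes before the first ZIP local file header if they are hex text."""
    zip_start = data.find(b"PK\x03\x04")
    if zip_start <= 0:
        return None  # no ZIP header, or nothing in front of it
    prefix = bytes(b for b in data[:zip_start] if b not in WHITESPACE)
    if not prefix or len(prefix) % 2 or any(b not in HEX_DIGITS for b in prefix):
        return None  # not (clean) hex text
    return binascii.unhexlify(prefix)

def prefix_is_hex_pe(data: bytes) -> bool:
    decoded = decode_hex_prefix(data)
    # A PE file starts with the DOS "MZ" signature, i.e. hex text "4D5A".
    return decoded is not None and decoded.startswith(b"MZ")

# Hypothetical sample: a short hex-encoded PE start, then the JAR bytes.
sample = b"4D5A90000300\r\n" + b"PK\x03\x04" + b"...rest of the JAR..."
print(prefix_is_hex_pe(sample))  # True
```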

#4 Binary Junk:

Figure 4: Sample 8d801f58d10dbcd52739fa35aa862286c3fe9606411f0e5f7b8b3fd71f678cad with prepended binary data.

This variation is once again not truly a polyglot, and it is not clear whether it is a mistake or done on purpose. The prepended content seems to be part of some binary, but the beginning of that binary appears to be missing.

The result is the same: a valid JAR file that is detected as “data” by checking the file type.
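For both the hex-text and binary-junk variants, the amount of data in front of the archive can be estimated from the ZIP structures themselves. The EOCD record stores the central directory’s size and offset, and in an untouched archive the central directory ends exactly where the EOCD begins. The hypothetical helper below uses that difference. It is a sketch that ignores ZIP64 archives, and it reports 0 if the offsets were rewritten to account for the prefix.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_MIN = 22        # fixed size of the EOCD record
MAX_COMMENT = 0xFFFF # maximum archive comment length

def prepended_bytes(data: bytes) -> int:
    """Estimate how many bytes precede the ZIP archive at the end of data."""
    tail_start = max(0, len(data) - (EOCD_MIN + MAX_COMMENT))
    pos = data.rfind(EOCD_SIG, tail_start)
    if pos < 0 or pos + EOCD_MIN > len(data):
        raise ValueError("no end-of-central-directory record found")
    # signature, 4 x 16-bit counters, CD size, CD offset, comment length
    _, _, _, _, _, cd_size, cd_offset, _ = struct.unpack_from("<4s4HIIH", data, pos)
    # Recorded offsets are relative to the archive's own start, so any gap
    # between the recorded end of the central directory and the EOCD is a prefix.
    return pos - (cd_offset + cd_size)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
jar = buf.getvalue()

print(prepended_bytes(jar))               # 0
print(prepended_bytes(b"J" * 100 + jar))  # 100
```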

Is this effective? (YES!)

While Ratty and StrRAT are well-known threats, some of these JAR files with prepended content receive very low detection rates on VirusTotal:

Figure 5: Low detection rate for CAB+JAR polyglot

The VirusTotal detection rate doesn’t reflect the full capabilities of specific vendors, as some products behave differently when files are executed dynamically.

That said, we can clearly see that JAR files with prepended content can stay undetected, at least until they are executed, even for known threats.

Can this be any worse? Yes, my friends, it can…

Let’s think about this from a detection engineering perspective. If we failed to identify a JAR file because we relied on a simple file type check, such as the Linux “file” command, the easy solution would be to “scan” every file with the .jar extension. If you want, you can even add a check that the file is a valid JAR by verifying that an end-of-central-directory record is present at the end of the file.

However, this is still not sufficient. Java does not care about the file extension and will happily execute a valid JAR file with any extension. The only downside of renaming the “.jar” extension is that the JAR must then be launched from the command line (for example, with java -jar) instead of simply by double-clicking it.

We have created an HTML+JAR polyglot to demonstrate this:

Figure 6: HTML+JAR polyglot with html extension properly rendered in the browser and properly executed by JRE

This file is also available on our GitHub, along with the Ratty malware configuration extractor.
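The detection gap described above can be sketched in Python by comparing an extension-based filter with the naive content check. The content check looks for an EOCD signature in the last 65,557 bytes of each file, which is the 22-byte fixed EOCD size plus the maximum 65,535-byte comment. File names and contents are hypothetical. Note that the content check still has to open every file, and a renamed JAR can still be launched with java -jar.

```python
import tempfile
import zipfile
from pathlib import Path

EOCD_SIG = b"PK\x05\x06"
TAIL = 22 + 0xFFFF  # fixed EOCD size plus the maximum comment length

def has_eocd_tail(path: Path) -> bool:
    """Naive check: is there an EOCD signature near the end of the file?"""
    size = path.stat().st_size
    with path.open("rb") as f:
        f.seek(max(0, size - TAIL))
        return EOCD_SIG in f.read()

def scan(directory: Path):
    by_extension = sorted(p.name for p in directory.glob("*.jar"))
    by_content = sorted(p.name for p in directory.iterdir()
                        if p.is_file() and has_eocd_tail(p))
    return by_extension, by_content

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    with zipfile.ZipFile(d / "payload.jar", "w") as zf:
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n")
    # Hypothetical: the same archive behind an HTML prefix, saved as .html.
    (d / "invoice.html").write_bytes(b"<html></html>\n" + (d / "payload.jar").read_bytes())
    (d / "notes.txt").write_text("nothing to see")
    result = scan(d)

print(result)  # (['payload.jar'], ['invoice.html', 'payload.jar'])
```

The extension filter misses the renamed archive, while the content check finds it at the cost of reading the tail of every file.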

Conclusion

Proper detection of JAR files should be both static and dynamic, since it’s inefficient to scan every file for an end-of-central-directory record at its end.

Defenders should monitor both “java” and “javaw” processes. If such a process has “-jar” as an argument, the filename passed after it should be treated as a JAR file regardless of its extension or the output of the Linux “file” command. In case of infection, incident responders and the cyber community are welcome to use Deep Instinct’s Ratty malware configuration parser to extract the IOCs quickly and move on to remediation.
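Here is a minimal sketch of that rule. It assumes process-creation telemetry (for example, from an EDR or Sysmon) that already provides the command line split into arguments. The launcher set, the helper jar_argument, and the paths are hypothetical. It only handles the common “java … -jar file” form and ignores other ways of starting Java.

```python
from pathlib import PureWindowsPath
from typing import Optional, Sequence

# Assumption: launcher image names to watch.
JAVA_LAUNCHERS = {"java", "java.exe", "javaw", "javaw.exe"}

def jar_argument(argv: Sequence[str]) -> Optional[str]:
    """Return the file passed to java/javaw via -jar, whatever its extension."""
    if not argv:
        return None
    # PureWindowsPath accepts both "\" and "/" separators, so this covers either OS.
    launcher = PureWindowsPath(argv[0]).name.lower()
    if launcher not in JAVA_LAUNCHERS:
        return None
    # The argument right after -jar is the archive to run.
    # Simplification: an application argument literally named -jar would also match.
    for i in range(1, len(argv) - 1):
        if argv[i] == "-jar":
            return argv[i + 1]
    return None

print(jar_argument([r"C:\Program Files\Java\bin\javaw.exe", "-jar", r"C:\Users\victim\invoice.html"]))
print(jar_argument(["/usr/bin/java", "-Xmx256m", "-jar", "report.msi"]))  # report.msi
print(jar_argument(["notepad.exe", "-jar", "x"]))                         # None
```

Whatever this returns should be handled as a JAR, for example by running the Ratty configuration extractor on it, no matter what its extension says.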

MITRE ATT&CK:

Tactic	Technique	Description	Observable
Initial Access	T1566.002 Phishing: Spearphishing Link	Attackers use URL shortening services that lead to the payload.	Rebrand[.]ly/afjlfvp
Defense Evasion	T1036.001 Masquerading: Invalid Code Signature	Attackers combine the JAR with a signed MSI file.	85d8949119dad6215ae0a21261b037af
Defense Evasion	T1027.001 Obfuscated Files or Information: Binary Padding	Attackers prepend junk data to the beginning of the file, causing the “file” command to return a different file type.	cb17f27671c01cd27a6828faaac08239
Command and Control	T1102 Web Service	Attackers used Discord’s content delivery network (CDN) to deliver malware.	https://cdn[.]discordapp[.]com/attachments/938795529683480586/941658014962823208/Package_info[.]jar