Malware is a malicious program. It can be a standalone executable file or malicious code embedded in other files, such as macro malware embedded in Office or PDF files, or malicious scripts embedded in email or web pages. It typically contains at least two main logic parts: a replicating (spreading/infecting) part and a payload (business logic) part. Some malware contains additional logic parts, such as a hiding part to cover its traces, anti-scanning, and self-protection. The following lists the most commonly used malware detection methods and their pros and cons:
- By hash value of all or part of the binary malware file, such as MD5, SHA-1, or SHA-256. This method is fast, accurate, and direct. The problem is that it cannot detect unknown malware, and it is easy to defeat, since a small change to the file results in a different hash value.
- By signature of the malware file. A signature can be a binary string extracted from different locations in the malware file, a self-recognition string that the malware writer embedded to avoid repeated infection, a special name for an embedded mutex object, etc. This method is fast, accurate, efficient, and very widely used. The problem is that it can detect malware only after its signature is available, and it requires constant updates.
- By heuristic scanning, i.e. adding extra rules, malware metadata extraction, fuzzy logic, etc. to extend signature scanning so that one signature detects more than one malware. This method greatly increases signature detection capability, but it also introduces some false positives. In the end, it is still not enough to detect unknown (never seen before) malware.
- By malware family classification. Mathematical methods learn an individual malware file's structure, binary code patterns, etc., and once the similarity passes a certain threshold, the file is classified as a variant of an existing malware family. This method can be very efficient in detecting new malware variants, but it still relies on existing malware knowledge. It cannot detect malware that belongs to a new family.
- By malware behavior. The code is executed in a sandbox so that its observed behavior reveals whether it is malware. This method is powerful in detecting new, unknown malware and has been widely used. The challenge is that it is slow, since it usually has to wait for the code to complete its execution in the sandbox. It may also produce false positives, depending largely on the behavioral rule sets, because some utility programs behave very much like malware.
- By IOC (Indicator of Compromise) querying. All known indicators in files, URL connections, IP addresses, registry entries, etc. are listed, and the system is checked for any such indicator. If one is found, it means the system has been infected by malware. This method is simple, fast, and very easy to implement and use. The challenge is that it can only detect malware whose indicators of compromise are available to query.
- By AI/ML/DL methods. An AI model is trained on a very large number of malware files, and the learned model is then applied to detect existing or new malware variants. The training data can be the static binary code of malware files or malware behavior patterns. This method is very efficient in detecting new malware variants if the malware writer reuses existing malware techniques. However, completely new malware that has never been seen before and uses new techniques is likely to be missed.
- Some other methods, such as integrity checking, runtime behavior monitoring, whitelist checking, and honey tokens. These methods can increase the overall detection rate, but they are not mainstream for malware detection.
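
To make the trade-off between the first two methods concrete, here is a minimal Python sketch. The sample bytes, the signature name `Demo.Marker`, and both helper functions are hypothetical and exist only for illustration; real engines use far larger databases and more elaborate pattern formats. It shows that a one-byte change defeats hash matching, while a signature still matches as long as the marker bytes survive.

```python
import hashlib

# Made-up "malware" sample: a marker string sits between other bytes.
SAMPLE = b"header" + b"INFECTED_BY_DEMO" + b"payload"

# Hypothetical databases, for illustration only.
KNOWN_BAD_SHA256 = {hashlib.sha256(SAMPLE).hexdigest()}
SIGNATURES = {"Demo.Marker": b"INFECTED_BY_DEMO"}


def detect_by_hash(data: bytes) -> bool:
    """Return True if the SHA-256 of the whole file is a known-bad hash."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256


def detect_by_signature(data: bytes) -> list:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


# A trivially modified variant: one extra byte appended to the sample.
VARIANT = SAMPLE + b"\x00"
```

With these definitions, `detect_by_hash(SAMPLE)` is true but `detect_by_hash(VARIANT)` is false, while `detect_by_signature(VARIANT)` still reports `Demo.Marker`. Neither function reports anything for a file that is absent from both databases, which is the "unknown malware" gap described in the list.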
As the list above shows, each method has its pros and cons. Each newly invented method adds capability to the overall detection solution, but no single method is good enough to cover all kinds of malware. TXShield has all of the above methods built in, to reach the highest possible detection rate.
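
The idea of combining methods can be sketched as a simple pipeline that runs several independent detectors over the same input and collects every finding. This is only an illustrative sketch, not TXShield's implementation: the detector functions, the hash set, and the domain `evil.example` are assumptions, and the IOC check is reduced to a byte search within one file rather than a query across the whole system.

```python
import hashlib
from typing import Callable, List, Optional, Tuple

# A detector takes file bytes and returns a reason string, or None if clean.
Detector = Callable[[bytes], Optional[str]]

# Hypothetical detection data, for illustration only.
BAD_HASHES = {hashlib.sha256(b"known bad sample").hexdigest()}
BAD_DOMAINS = [b"evil.example"]


def by_hash(data: bytes) -> Optional[str]:
    """Exact match of the whole-file SHA-256 against known-bad hashes."""
    if hashlib.sha256(data).hexdigest() in BAD_HASHES:
        return "known hash"
    return None


def by_ioc(data: bytes) -> Optional[str]:
    """Simplified IOC check: look for a known-bad domain inside the bytes."""
    for domain in BAD_DOMAINS:
        if domain in data:
            return "IOC " + domain.decode()
    return None


def scan(data: bytes, detectors: List[Tuple[str, Detector]]) -> List[Tuple[str, str]]:
    """Run every detector and return (method name, reason) for each hit."""
    findings = []
    for name, detector in detectors:
        reason = detector(data)
        if reason is not None:
            findings.append((name, reason))
    return findings


PIPELINE = [("hash", by_hash), ("ioc", by_ioc)]
```

An input that only one detector recognizes is still flagged by the pipeline as a whole, so each added detector can only widen coverage. The cost is that false positives from every detector accumulate as well.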