
T-110.6220 Special Course in Communications Security

Spring 2008: Malware Analysis and Antivirus Technologies (5 ECTS) P V

Course Assignment: Antivirus Engine



Implement a simple antivirus engine and write a short paper explaining and evaluating your design. Deliverables for this assignment include:

Two sets of test files are provided, in the directories "Detect" and "Do-not-detect". The engine must be a "blacklisting" solution that detects the files in "Detect". All files are Win32 PE executables. The files are not malicious -- they are test files created for this assignment -- and they are all very similar. However, the files in "Detect" contain certain code or characteristics that can be used to tell them apart from the other files. The student is advised to start the assignment by reverse engineering the files to find out what separates the two sets.

You can use whatever scanning methodology or technique you choose. However, please note that since none of the test files are malicious, dynamic heuristics in particular are not a viable option here. Solutions such as string search or hash search (both combined with file structure parsing to find the optimal scanning area) are advised.
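As an illustration of combining file structure parsing with string search or hash search, the following is a minimal sketch in Python. It is not a required design. The function names (pe_sections, section_contains, section_sha256) and the choice of the ".text" section as the scanning area are assumptions made for this example; which area is actually optimal has to come from your own reverse engineering. The sketch does no bounds checking beyond the two magic values, so a truncated file raises struct.error.

```python
import hashlib
import struct


def pe_sections(data):
    """Yield (name, raw_bytes) for every section of a PE file held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew (offset 0x3C of the DOS header) points at the PE signature.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # COFF header: NumberOfSections at +6, SizeOfOptionalHeader at +20,
    # both counted from the start of the PE signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    table = pe_off + 24 + opt_size  # section table follows the optional header
    for i in range(num_sections):
        hdr = table + 40 * i  # each section header is 40 bytes long
        name = data[hdr:hdr + 8].rstrip(b"\0").decode("ascii", "replace")
        # SizeOfRawData at +16 and PointerToRawData at +20 within the header.
        raw_size, raw_ptr = struct.unpack_from("<II", data, hdr + 16)
        yield name, data[raw_ptr:raw_ptr + raw_size]


def section_contains(data, pattern, section=".text"):
    """String search restricted to one section instead of the whole file."""
    return any(name == section and pattern in raw
               for name, raw in pe_sections(data))


def section_sha256(data, section=".text"):
    """Hash search key: SHA-256 of one section's raw bytes, or None."""
    for name, raw in pe_sections(data):
        if name == section:
            return hashlib.sha256(raw).hexdigest()
    return None
```

Restricting the search to one section keeps the scanned area small and avoids matches in headers or padding. A section hash is an exact match only, so it is less generic than a short byte pattern.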

The engine must have a definition database (in practice probably just a small flat file) that includes detection signatures, algorithmic detections, or similar detection data. It must be possible to add detections for new files and to remove (or whitelist) existing detections using this database. If a student chooses to create a heuristic engine, the database need not contain signatures; instead it controls the heuristics in some way.
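One possible shape for such a flat-file database is sketched below in Python. The file format is invented for this example and is not prescribed by the assignment: a "sig NAME HEXBYTES" line adds a byte-pattern detection, an "allow SHA256" line whitelists one exact file, and "#" starts a comment. The names Database, load_database and scan, and the detection name Example.Test.A, are likewise hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Database:
    signatures: dict = field(default_factory=dict)  # detection name -> bytes
    whitelist: set = field(default_factory=set)     # SHA-256 hex digests


def load_database(text):
    """Parse the (hypothetical) flat-file format described above."""
    db = Database()
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split(None, 2)
        if fields[0] == "sig" and len(fields) == 3:
            # bytes.fromhex accepts spaces between the hex byte pairs.
            db.signatures[fields[1]] = bytes.fromhex(fields[2])
        elif fields[0] == "allow" and len(fields) == 2:
            db.whitelist.add(fields[1].lower())
        else:
            raise ValueError(f"line {lineno}: malformed record")
    return db


def scan(data, db):
    """Return the first matching detection name, or "clean"."""
    if hashlib.sha256(data).hexdigest() in db.whitelist:
        return "clean"  # whitelisted files are never reported
    for name, pattern in db.signatures.items():
        if pattern in data:
            return name
    return "clean"
```

With this format, a detection is added by appending a "sig" line and removed by deleting it or commenting it out; a single file is whitelisted by appending an "allow" line with its SHA-256 digest. The linear loop over all signatures is the simplest possible matcher and is shown only to keep the sketch short.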

There is no need to implement an unarchiver, an unpacker, or many of the other typical components of an antivirus engine. However, in addition to the engine itself you must implement some sort of simple user interface and logic for using the engine for scanning. A simple command line interface is perfectly sufficient. The user must be able to give the path of the file to scan, and the scanner should return a detection name, or "clean" if no matching detection was found.
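A command line front end of this kind can be very small. The sketch below, in Python, is one way to do it and makes several assumptions that are not part of the assignment: the built-in SIGNATURES table stands in for a real definition database, the detection name Example.Test.A is made up, and the exit codes (0 for clean, 1 for a detection, 2 for an I/O error) are just one possible convention.

```python
import argparse
import sys

# Placeholder signature table so that the sketch runs on its own; a real
# engine would load its detections from the definition database instead.
SIGNATURES = {"Example.Test.A": bytes.fromhex("deadbeef")}


def scan_file(path):
    """Return a detection name, or "clean" if no signature matches."""
    with open(path, "rb") as handle:
        data = handle.read()
    for name, pattern in SIGNATURES.items():
        if pattern in data:
            return name
    return "clean"


def main(argv=None):
    """Parse the command line, scan one file, print the verdict."""
    parser = argparse.ArgumentParser(description="Scan a single file.")
    parser.add_argument("path", help="path of the file to scan")
    args = parser.parse_args(argv)
    try:
        verdict = scan_file(args.path)
    except OSError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2
    print(f"{args.path}: {verdict}")
    return 0 if verdict == "clean" else 1
```

To use this as a script, end the file with a call such as sys.exit(main()). Returning the verdict as an exit code as well as printing it makes the scanner easy to drive from a shell loop over both test directories.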

Part of the grade is based on the report the student writes about the engine design. The report should describe the engine and also analyze how well the engine could work in real life. The test files are very small and simple, there are very few of them, and they do not use, for example, runtime packing. It is therefore possible to detect these files with much simpler mechanisms than real-life antivirus products use. Also, all files in the test set are PE executables; the engine does not need to be able to scan any other file types, and only PE scanning is considered in grading.

The report should be roughly 2-3 pages of text -- try to keep the report short. It needs to contain the following:

  1. Description of the engine architecture. The idea/concept behind the design.
  2. Pros and cons of the design and implementation
  3. How to improve the engine: Future work
  4. Evaluation of the performance (speed, memory usage, disk usage, ...) of the engine. Evaluation of the algorithms chosen.
  5. Evaluation of how prone to false positives the engine would be in real life. Why?
  6. Instructions on how to add detections to the database and how to remove them.
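
For the performance evaluation in item 4, rough numbers are easy to collect with the Python standard library. The sketch below is only one way to do it; the helper name measure and its parameters are made up for this example, and scan is assumed to be any callable that takes the file contents as bytes. Timing and memory are measured in separate passes because tracemalloc slows the traced code down.

```python
import time
import tracemalloc


def measure(scan, samples, repeats=100):
    """Return the average seconds per scan and the peak traced memory in bytes."""
    # Pass 1: wall-clock time without memory tracing enabled.
    start = time.perf_counter()
    for _ in range(repeats):
        for data in samples:
            scan(data)
    elapsed = time.perf_counter() - start

    # Pass 2: peak Python-level allocations while scanning each sample once.
    tracemalloc.start()
    for data in samples:
        scan(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "seconds_per_scan": elapsed / (repeats * len(samples)),
        "peak_bytes": peak,
    }
```

Numbers obtained this way describe only the small test set; how the engine would scale to a large database still has to be argued from the design.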

The engine can be implemented with the following programming languages:

The engine can use library calls provided by the chosen language. Also, libraries with appropriate licensing terms can be used for the following tasks (if these techniques are used):

All other code, functionality, and components need to be either implemented by the student or be a standard part of the chosen language.

The engine is demoed to the course staff in May: on 8.5.2008 or 19.5.2008 between 9:00 and 17:00. The demo takes place in Maarintalo, classroom Maari-M (Linux classroom). Book a demo time for yourself using webTopi. The engine needs to run on the system without any extra programs or environments. The student needs to submit the report (in txt or pdf format) and the source files to the course staff using Optima by Sun 4.5.2008 at 24:00. Upload all your files to the YourName/project folder in Optima.

During the demo the student has to show how the engine works and show that it detects only the correct file set. The student is also asked to demonstrate or explain some other aspects of the engine. The demo time is 15 minutes: 10 minutes for the demo itself and 5 minutes for questions from the course staff.

The student is expected to spend roughly 40 to 60 hours on the assignment. Assignments are evaluated and graded using the following criteria:

  1. Detection capability: The engine must report all files in the directory "detect" with the correct detection name. [6 points; a single miss means the student gets 0 points for this category]
  2. Lack of false positives: The engine must not report any file in the directory "do-not-detect". [6 points; a single false positive means the student gets 0 points for this category]
  3. Level of generic detections: How many detection fingerprints/records does the database contain? There are five files to detect. [Records/Points: 1..2/10, 3/8, 4/5, 5+/0]
  4. Performance considerations. Score is based on the report and the chosen architecture. Since the engine can be implemented with different languages and abstraction levels, the real performance of the engine is not measured. How would the engine perform in real-life and how could it handle hundreds of thousands of unique malware files? Does the size of the database grow too much? How is scanning speed affected by a large number of detections? [10 points]
  5. Adding detections to the database. How easy is it to add detections to the database? [5 points]
  6. Analysis. How well has the student been able to find all the pros and cons of his/her design and implementation? [10 points]
  7. Future work. How well has the student presented the roadmap of things to improve in the engine? [5 points]
  8. Overall quality of the architecture, design, and implementation. [15 points]
  9. Quality of the demo. How well did the student present the engine? Were all important aspects covered? Was the engine stable during the demo? Was the student able to get his/her engine running immediately? [5 points]

Total: 72 points