Implement a simple antivirus engine and write a short paper explaining and evaluating your design. Deliverables for this assignment include:
Two sets of test files are provided, in the directories "Detect" and "Do-not-detect". The engine needs to be a "blacklisting" solution that detects the files in "Detect". All files are Win32 PE executables. The files are not malicious -- they are test files created for this assignment -- and they are all very similar. However, the files in "Detect" possess certain code or characteristics that can be used to tell them apart from the other files. The student is advised to start the assignment by reverse engineering the files to figure out what separates the two sets.
You can use whatever scanning methodology or technique you choose. However, since none of the test files are malicious, dynamic heuristics in particular are not a viable option here. You should start the assignment by reverse engineering the files in the sample set to figure out a proper way to detect them. Solutions like string search or hash search (both combined with file structure parsing to find the optimal scanning area) are advised.
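The combination of file structure parsing with string or hash search could be sketched roughly as follows. This is only an illustration, not a required design: Python is used as an example language, the function names (pe_sections, scan_area, string_match, hash_of_area) are made up for this sketch, and the choice of ".text" as the scanning area is an assumption that your own reverse engineering may or may not support.

```python
import hashlib
import struct


def pe_sections(data):
    """Return [(section_name, raw_section_bytes), ...] for a PE image."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew (offset 0x3C in the DOS header) points to the "PE\0\0" signature.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the signature: NumberOfSections is at
    # offset 2 and SizeOfOptionalHeader at offset 16 within it.
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    table = pe_off + 24 + opt_size  # section table follows the optional header
    sections = []
    for i in range(num_sections):
        # Each section header is 40 bytes; only the first 24 are needed here.
        name, _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i)
        label = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append((label, data[raw_ptr:raw_ptr + raw_size]))
    return sections


def scan_area(data, section_name=".text"):
    """Return the raw bytes of the named section, or b"" if it is absent."""
    for label, body in pe_sections(data):
        if label == section_name:
            return body
    return b""


def string_match(data, pattern, section_name=".text"):
    """String search restricted to one section."""
    return pattern in scan_area(data, section_name)


def hash_of_area(data, section_name=".text"):
    """Hash search: SHA-256 of one section, to compare against known digests."""
    return hashlib.sha256(scan_area(data, section_name)).hexdigest()
```

Restricting the search to one section keeps the signature independent of header fields (such as timestamps) that may differ between otherwise identical files. The sketch does little validation: a truncated or malformed header raises struct.error, which a real engine would need to handle.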
The engine must have a definition database (in practice probably just a small flat file) that includes detection signatures, algorithmic detections, or similar detection data. It must be possible to add detections for new files and to remove (or whitelist) existing detections using this database. If a student chooses to create a heuristic engine, the database does not have to contain signatures but instead controls the heuristics in some way.
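As an illustration of what such a flat-file database might look like, the sketch below parses one record per line in the form "kind;name;value". The format, the record kinds ("string", "sha256", "allow") and the function names are all assumptions invented for this example; the assignment does not prescribe any of them.

```python
import hashlib


def load_definitions(text):
    """Parse database text into (signatures, whitelist).

    Assumed line format:  kind;detection name;hex value
      string;NAME;HEXBYTES   detect data containing these bytes
      sha256;NAME;DIGEST     detect data whose SHA-256 equals DIGEST
      allow;;DIGEST          never detect data with this SHA-256
    Text after '#' is a comment; blank lines are ignored.
    """
    signatures = []
    whitelist = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        kind, name, value = (field.strip() for field in line.split(";"))
        if kind == "string":
            signatures.append((name, kind, bytes.fromhex(value)))
        elif kind == "sha256":
            signatures.append((name, kind, value.lower()))
        elif kind == "allow":
            whitelist.add(value.lower())
        else:
            raise ValueError("unknown record kind: " + kind)
    return signatures, whitelist


def match(data, signatures, whitelist):
    """Return the first matching detection name, or None if nothing matches."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in whitelist:
        return None  # whitelisting overrides every signature
    for name, kind, value in signatures:
        if kind == "string" and value in data:
            return name
        if kind == "sha256" and value == digest:
            return name
    return None
```

With a layout like this, adding a detection means appending one line and removing one means deleting or commenting out a line; an "allow" record suppresses a detection for one specific file without touching the signature itself.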
There is no need to implement an unarchiver, unpacker, or many of the other typical components of an antivirus engine. However, in addition to the engine itself you must implement some sort of simple user interface and the logic for using the engine for scanning. A simple command line interface is perfectly sufficient. The user has to be able to give a path to the file to scan, and the scanner should return a detection name or "clean" if no matching detection was found.
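A command line front end of the kind described above can be very small. The following sketch is one possible shape, under stated assumptions: DEMO_PATTERNS and scan_bytes are hypothetical stand-ins for the real engine and its database, and the exit codes (0 clean, 1 detected, 2 error) are a convention chosen for this example, not a requirement.

```python
import argparse
import sys

# Hypothetical stand-in for the real engine: detection name -> byte pattern.
DEMO_PATTERNS = {"Test.Detect.A": bytes.fromhex("deadbeef")}


def scan_bytes(data):
    """Return a detection name, or "clean" if no pattern matches."""
    for name, pattern in DEMO_PATTERNS.items():
        if pattern in data:
            return name
    return "clean"


def main(argv=None):
    """Scan the file named on the command line and print the verdict."""
    parser = argparse.ArgumentParser(
        description="Scan one file and print a detection name or 'clean'.")
    parser.add_argument("path", help="path to the file to scan")
    args = parser.parse_args(argv)
    try:
        with open(args.path, "rb") as fh:
            verdict = scan_bytes(fh.read())
    except OSError as exc:
        print("error: {}".format(exc), file=sys.stderr)
        return 2  # assumed convention: 2 = file could not be read
    print(verdict)
    return 0 if verdict == "clean" else 1

# When used as a script, finish the file with:  raise SystemExit(main())
```

Returning the verdict on standard output and a matching exit code makes the scanner easy to drive from a shell loop over the "Detect" and "Do-not-detect" directories.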
Part of the grading of the assignment is based on the report the student writes about the engine design. The report should describe the engine and also analyze how well the engine could work in real life. The test files are very small and simple, there are very few of them, and they do not use techniques such as runtime packing. It is therefore possible to detect these files with much simpler mechanisms than real-life antivirus products use. Also, all files in the test set are PE executables; the engine does not need to be able to scan any other types of files, and only PE scanning is considered in grading.
The report should be roughly 2-3 pages of text -- try to keep the report short. It needs to contain the following:
The engine can be implemented with the following programming languages:
The engine can use library calls provided by the chosen language. In addition, libraries with appropriate licensing terms can be used for the following tasks (if these techniques are used):
All other code, functionality, and components need to be either implemented by the student or be a standard part of the chosen language.
The engine is demoed to course staff in May, on 8.5.2008 or 19.5.2008 between 9:00 and 17:00. The demo takes place in Maarintalo, classroom Maari-M (Linux classroom). Book a demo time for yourself using webTopi. The engine needs to run on the system without any extra programs or environments. The student needs to submit the report (in txt or pdf format) and the source files to the course staff using Optima by Sun 4.5.2008 at 24:00. Upload all your files to the YourName/project folder in Optima.
During the demo the student has to show how the engine works and show that it detects only the correct file set. The student is also asked to demonstrate or explain some other aspects of the engine. The demo time is 15 minutes: 10 minutes for the demo itself and 5 minutes for questions from the course staff.
The student is expected to spend roughly 40 to 60 hours on the assignment. Assignments are evaluated and graded using the following criteria:
Total: 72 points