Track 2: Scalable Observation infrastructure - Advanced host-based Centralized data store and software pattern identification

The objective of Track 2 is to capture this enhanced data online and hold it in memory, in a centralized data store built on a specialized data model, together with other information useful for detection analysis (for example, results from previous detection analyses, which help detect chained cyber attacks). Selected data will also be saved to disk for offline analysis (forensic investigations, software improvement). The content and structure of the specialized data model (semantics and cause-effect interrelationships) must be optimized to efficiently support the Detection mechanisms in Track 3 and the Knowledge base in Track 4.

Anomaly detection is a difficult problem: no single method can, by itself, achieve a sufficient success rate with few false positives. In an earlier project on tracing and monitoring, one of the main problems found was that most modules in the framework needed the same context and state information, and each ended up recreating it in a different way. The objective of this track is therefore to provide a global state data model and storage module for all the host information obtained or deduced from the observations. This is key to fully exploiting all the information available for anomaly detection and to enabling the different anomaly detection and analysis modules to exchange information and build on each other's strengths.

This centralized data model will represent the state of different components at different levels (table of processes, open network ports, open files, states in a finite state machine defining the pattern of a multi-step cyber attack, processing load, I/O queues). The model will be queried and updated by all the available observation and anomaly detection modules, so that each module benefits from the strengths of the others. For example, an operating system modeling module builds the table of processes and of open file and network descriptors, while an anomaly pattern checker uses the information about open descriptors to verify the context when suspicious behavior occurs (e.g. writing to a closed descriptor or attempting to connect to a closed socket). The challenge is to design an efficient data structure and access algorithms for a generic state storage with a history database that works for both a posteriori and live analysis. The history database allows interactive navigation through the trace while quickly recomputing the modeled state associated with a time position in the trace. It may also be used for specialized queries about the state history of some components, to uncover chains of state values related to a performance bottleneck or to a multi-stage attack leading to a compromised system.
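The interaction between a modeling module, an anomaly checker and a shared state history can be illustrated with a minimal Python sketch. It is not the project's implementation: the `StateHistory` class, the `os_model` and `is_suspicious_write` functions, and the event dictionary layout are all hypothetical names chosen for illustration, and the storage is a simple in-memory list of state changes per attribute, searched by bisection, rather than an optimized on-disk history structure.

```python
import bisect
from collections import defaultdict


class StateHistory:
    """Hypothetical shared store: each attribute keeps its state changes in time order."""

    def __init__(self):
        self._times = defaultdict(list)   # attribute -> sorted change timestamps
        self._values = defaultdict(list)  # attribute -> value set at each timestamp

    def update(self, attribute, timestamp, value):
        """Record that `attribute` holds `value` from `timestamp` onward."""
        times = self._times[attribute]
        if times and timestamp < times[-1]:
            raise ValueError("updates must arrive in time order for each attribute")
        times.append(timestamp)
        self._values[attribute].append(value)

    def query(self, attribute, timestamp):
        """Return the value of `attribute` at `timestamp`, or None if not yet set."""
        times = self._times.get(attribute, [])
        index = bisect.bisect_right(times, timestamp) - 1
        return self._values[attribute][index] if index >= 0 else None


def os_model(store, event):
    """Modeling module (assumed event format): tracks descriptor state."""
    key = ("fd", event["pid"], event["fd"])
    if event["name"] == "open":
        store.update(key, event["ts"], "open")
    elif event["name"] == "close":
        store.update(key, event["ts"], "closed")


def is_suspicious_write(store, event):
    """Anomaly checker: flags a write to a descriptor that is not open."""
    if event["name"] != "write":
        return False
    state = store.query(("fd", event["pid"], event["fd"]), event["ts"])
    return state != "open"


# Illustrative trace: the second write happens after the descriptor was closed.
events = [
    {"ts": 10, "name": "open", "pid": 42, "fd": 3},
    {"ts": 20, "name": "write", "pid": 42, "fd": 3},
    {"ts": 30, "name": "close", "pid": 42, "fd": 3},
    {"ts": 40, "name": "write", "pid": 42, "fd": 3},
]

store = StateHistory()
alerts = []
for event in events:
    os_model(store, event)
    if is_suspicious_write(store, event):
        alerts.append(event["ts"])
```

Because every change is kept with its timestamp, the same store answers both live checks (the state at the current event) and a posteriori navigation (for instance `store.query(("fd", 42, 3), 25)` to recover the state at an earlier position in the trace). A real history database would also need bounded memory use and efficient disk access, which this sketch does not address.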

 

Team members

Simon Delisle, École Polytechnique de Montréal, Master's student
Fabien Reumont-Locke, École Polytechnique de Montréal, Master's student
François Doray, École Polytechnique de Montréal, Master's student
Houssem Daoud, École Polytechnique de Montréal, Intern

 

Documents and presentations
