Wednesday, April 18, 2012

Tools in the Toolbox - Triage


Imagine that you are on an Incident Response team. Imagine that your local Incident Response capabilities have adequate staff and a slowly maturing process. Imagine that your company has multiple remote locations. Imagine that in those remote locations your Incident Response capabilities are not as mature and there is a resource shortage. Imagine that there is a language barrier between local and remote staff. Imagine the slow response time due to incorrect files sent for analysis. Imagine the headaches and frustration of trying to troubleshoot a security incident when you do not have physical access to the box and you have to rely on others to collect the data you need. 

This was not something I had to imagine; it is something I have been living with for a while now. It was a problem that needed to be fixed, and the tool described here has made our investigations more effective. 

This is my story: 

The initial process that we used was ineffective and time consuming. Getting correct, usable data usually required a business-day turnaround, unless multiple machines showed the same indicators. Even when multiple machines were impacted, we could still spend 4 hours waiting on the data before we could analyze it, and then have only a small segment of data to work with.

The following questions were posed to the team:
  1. How can we rapidly get consistent data across every incident from remote locations?
  2. How can we standardize the collection process across all locations, so the primary incident handler receives actionable data?
  3. Can we do this with a free toolset?
  4. Can we customize this based on our needs?
  5. Can we provide a simplified interface for data collection?


The team discussed some options for remote collection, but the way the internal network is structured limited their usability. During this time I stumbled across a post by Mike Ahrendt about his capstone project called Triage. According to Mike, the functionality of Triage is:
 The script is designed to perform basic triage commands, as well as acquire evidence automatically on the system.  I designed the script to be ran from a flash drive, but you can really run it from anywhere.  All reports and evidence will be collected in the script directory under an Incident folder with a time stamp ("mm-dd-yy Incident").  

The capabilities of the script looked promising, so I downloaded it and started working with it to see if it could meet our needs. My initial opinion was that Triage had promise, but there were some issues with applying it. The biggest concerns I had with Triage were the need to run with Administrator credentials, the inability to log actions taken, the multiple steps needed to run the program from the GUI, being prompted for the EULA of a Sysinternals tool, and the folder naming convention for an incident.

Despite these concerns, knowing about them allowed us to keep using the tool, as long as we understood how Triage behaved. I submitted a feature enhancement request to Mike covering these concerns, and he said he would look into them when he had time.

We started using Triage locally for our Malware Incident Response, to make sure that the tool was useful in what we did. We expected that using Triage would shave off some of our local response time and give us some useful data. We were pleasantly surprised that Triage actually trimmed our response time by 80% and provided more usable data than we had previously acquired and analyzed. I noticed that my first step in analysis with Triage was to open Autorun.txt to find suspicious registry entries; previously, my first step was to check all the known autorun locations in a registry dump file. On most incidents I could confirm an infection within 30 minutes, down from the 2 hours it usually took. 

With the capabilities of Triage proven by our decreased response time, we started to look at letting the remote analysts use it. Deploying the tool to remote analysts required us to document the process for using it. In testing Triage at a few remote sites, we decided on a few code improvements that would need to be implemented for wide-scale usage.

The Triage code was then enhanced to handle these improvements. The following were changed:

  • We removed the GUI from the code, because the same data was collected either way and the GUI added a couple of steps that were no longer needed.
  • We changed the naming standard for the Incident folder to the format ComputerName_Date_Incident. This allowed us to run multiple Triage analyses and store the information in a central place, without renaming every incident file as we upload it.
  • We added a command-line modification to the registry in order to bypass the EULA prompt raised by one of the Sysinternals tools.
  • Along with the registry change, I implemented a custom cmd.exe as detailed in the Malware Analyst's Cookbook.
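
To make the folder naming and EULA changes concrete, here is a minimal sketch in Python. It is an illustration only, not Triage's own code. The function names are hypothetical, the mm-dd-yy date layout is borrowed from the original Triage time stamp, and the registry key assumes the usual Sysinternals convention of an EulaAccepted value under HKCU\Software\Sysinternals\<ToolName>; the post does not say which tool or which key was actually used.

```python
import datetime
import re


def incident_folder_name(computer_name, when):
    """Build a folder name in the ComputerName_Date_Incident format."""
    # Assumption: replace anything outside letters, digits and hyphens
    # so the name is always safe to use as a folder name.
    safe_name = re.sub(r"[^A-Za-z0-9-]", "-", computer_name.strip())
    # Assumption: reuse the mm-dd-yy layout of the original Triage stamp.
    return f"{safe_name}_{when.strftime('%m-%d-%y')}_Incident"


def eula_accept_command(tool_name):
    """Return a `reg add` argument list that pre-accepts a Sysinternals EULA."""
    # Assumption: the tool stores acceptance as a DWORD named EulaAccepted
    # under its own subkey of HKCU\Software\Sysinternals.
    key = "HKCU\\Software\\Sysinternals\\" + tool_name
    return ["reg", "add", key, "/v", "EulaAccepted",
            "/t", "REG_DWORD", "/d", "1", "/f"]


if __name__ == "__main__":
    print(incident_folder_name("WKSTN-042", datetime.date(2012, 4, 18)))
    print(" ".join(eula_accept_command("Autoruns")))
```

On a Windows host the argument list could be passed to subprocess.run before the Sysinternals tool is launched. Many Sysinternals tools also accept an -accepteula switch on the command line, which may be simpler when the tool is only ever started from a script.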

As of this posting, our customization to the Triage application has allowed us to do the following:
  1. It has increased response efficiency by standardizing how remote analysts collect files. 
  2. It has decreased the number of incorrect files being sent to us for analysis.
  3. It has decreased the analysis time when looking for suspicious and unknown autoruns.
  4. It serves as a training tool for new malware analysts, showing them what to look at.
  5. It allows us to keep historic snapshots of suspicious machines that we were unable to find IOCs on.
  6. Response time has dropped to about 1 hour when getting files from remote analysts.


Right before I posted this, I gave Mike a heads up, and he got busy cleaning up the code and making some changes, so part of my wish list has already been implemented.  If you haven’t taken a look at Triage, I would highly recommend it.

Next Steps
  • Getting Robocopy to work in XP. I have the exe, just need to test it. 
  • Have Triage execute on machines when our AntiVirus Solution fires off.
  • Compress and move the Incident file off the host machine and onto a network share.
  • Modify Triage to use an INI file for variables, so that customization can happen without recompiling.
  • Create an Error tracking log.
  • Integrate with RegRipper to parse out registry settings.
Currently I am testing the next release of Triage, so expect it to be updated within the next week or so. 
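
Two of the next steps above, reading variables from an INI file and compressing and moving the Incident folder, could look roughly like the following Python sketch. It is purely illustrative and is not meant to represent Triage's own code. The [triage] section, the destination and run_regripper keys, and both function names are hypothetical.

```python
import configparser
import shutil
from pathlib import Path


def load_settings(ini_text):
    """Parse collection settings from INI text (hypothetical key names)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["triage"]
    return {
        # Where finished archives are sent, e.g. a network share.
        "destination": section.get("destination", fallback="collected"),
        # Whether a RegRipper pass should be run on the hive files.
        "run_regripper": section.getboolean("run_regripper", fallback=False),
    }


def archive_incident(incident_dir, destination):
    """Zip an Incident folder and move the archive into the destination."""
    incident_dir = Path(incident_dir)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    # The archive is created next to the folder as <folder name>.zip.
    archive = shutil.make_archive(str(incident_dir), "zip",
                                  root_dir=str(incident_dir))
    moved = shutil.move(archive, str(destination / Path(archive).name))
    return Path(moved)
```

Keeping the destination in the INI file means each remote site could point at its own share without a rebuild, which is the point of the INI item in the list.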


2 comments:

  1. Thanks for sharing your experience, but I think the cost of the effort spent creating the tools may equal the cost of buying a commercial tool that provides what you need, since creating tools is not your core business.

    What do you think? Can you share how many hours were spent and the cost?

    Thanks
    Tamer

  2. Tamer,

    To answer your questions:
    I have probably spent 10 hours total testing, tweaking and troubleshooting Triage. What I have learned I have been able to give back to Mike to include in the new releases.
    I am not sure how many hours Mike has put into this, but as it started out as a project for a college class it was probably 40+.

    We could have gone with a commercial tool, but that requires including it in the budget, filing a future expense request, and then rolling it out to all the workstations in multiple locations. So the earliest I would have seen a commercial product in my environment providing the same features as Triage would have been early 2013.

    Triage also provides the benefit of being easily customized. If I find a new tool I want to use, I add the commands and it will run. With a commercial tool, I have to wait for the feature to be added. This allows me to have a feature-rich solution with minimal expense.

    That being said, I am not aware of any other product out there that will automate all of the Sysinternals tools, collect AV logs and quarantine files, collect hive files, run RegRipper against the hive files, log all command-line interactions and then compress and move the file to a network share. If you know of a tool, I would love to look at it.
