When you use ADF tools to collect files, whether by File Properties, Hashes, or Keywords, you can choose from three methods for file identification:
- Fast Identification
- Thorough identification for files without extensions
- Thorough identification for all files
When creating your own File Capture, it is important to know which option will work best for you.
Fast Identification identifies files by file extension only and is the fastest of the three methods. A file extension is the set of characters that follows the final . (dot) in a file name. File extensions look like the examples below:
- Dog Picture.jpg
- Dog Article.doc
- Dog Article.docx
This method cannot identify files that have no file extension, and it will not correctly identify files that have the wrong file extension.
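To make the idea concrete, here is a minimal Python sketch of extension-only identification. It is an illustration, not ADF's implementation: the `EXTENSION_TYPES` table and the `identify_by_extension` function are hypothetical names invented for this example.

```python
from pathlib import PurePath

# Hypothetical lookup table for illustration; a real tool would know far more types.
EXTENSION_TYPES = {
    ".jpg": "JPEG image",
    ".jpeg": "JPEG image",
    ".doc": "Word document (legacy)",
    ".docx": "Word document",
}


def identify_by_extension(filename):
    """Return a type label based only on the extension, or None if there is no match."""
    # PurePath.suffix returns the text from the final dot onward, or "" if there is none.
    suffix = PurePath(filename).suffix.lower()
    return EXTENSION_TYPES.get(suffix)


print(identify_by_extension("Dog Picture.jpg"))   # matched by extension
print(identify_by_extension("f_00012a"))          # no extension, so no match
```

Because the file is never opened, this check is very cheap, but it has no way to notice a missing or incorrect extension.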
Sometimes files do not have file extensions. This can happen with internet cache files, or with apps that deliberately save files without an extension (to prevent user interaction outside of the application). In this case we rely on the file signature, also called the file header or magic number. A file signature is a unique sequence of identifying bytes written to a file's header. On a Windows system, a file signature is normally contained within the first 20 bytes of the file.
When we look at a file, for example a JPG image file, the device interprets it as an image file based on the file extension and displays the image to you. But if we remove the JPG file extension, how does the device know what type of file it is and what to display? The device falls back on the file header and reads the first few bytes of the file. These first bytes are specific to the file type.
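The header check described above can be sketched in Python as follows. This is a simplified illustration, not how ADF tools are implemented: the function names and the short `SIGNATURES` table are assumptions made for the example, although the byte sequences listed are the widely documented signatures for those formats. The 20-byte read length simply mirrors the figure mentioned above.

```python
# A few well-known file signatures (magic numbers); this table is illustrative, not exhaustive.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container (includes .docx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file (includes legacy .doc)"),
]


def read_header(path, length=20):
    """Read the first `length` bytes of a file in binary mode."""
    with open(path, "rb") as handle:
        return handle.read(length)


def identify_by_signature(header):
    """Return a type label if the header starts with a known signature, else None."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


# Usage with an in-memory header; read_header(path) would supply these bytes for a real file.
print(identify_by_signature(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
```

Note that a signature often identifies a container rather than a specific document type, which is why the ZIP and OLE entries are labeled broadly here.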
The next method for identifying files is thorough identification for files without extensions. With this method, we identify every file that has a file extension by that extension. However, if a file does not have a file extension, we look at the file header for that specific file and identify it from there. This retains most of the speed of Fast Identification while collecting more data, such as cache files and files specifically created without extensions. This is also referred to as "speed optimized". Any Search Profile labeled as "speed optimized" has this identification method selected.
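A rough Python sketch of this "speed optimized" logic is shown below. It is only an approximation under stated assumptions: `identify_speed_optimized`, `load_header`, and both lookup tables are hypothetical names created for this example. The header is passed in as a callable so that the file is only read when the extension is missing.

```python
from pathlib import PurePath

# Hypothetical, deliberately small lookup tables for illustration.
EXTENSION_TYPES = {".jpg": "JPEG image", ".png": "PNG image", ".pdf": "PDF document"}
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
]


def identify_speed_optimized(filename, load_header):
    """Use the extension when one exists; otherwise fall back to the file header.

    `load_header` is a zero-argument callable returning the file's first bytes.
    It is only called when the file name has no extension.
    """
    suffix = PurePath(filename).suffix.lower()
    if suffix:
        # Extension present: trust it and skip reading the file.
        return EXTENSION_TYPES.get(suffix)
    header = load_header()
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


jpeg_bytes = b"\xff\xd8\xff\xe0"
print(identify_speed_optimized("Dog Picture.jpg", lambda: jpeg_bytes))  # decided by extension
print(identify_speed_optimized("f_00012a", lambda: jpeg_bytes))         # decided by header
```

The trade-off is visible in the first branch: a file with a wrong extension is still taken at face value, because its header is never inspected.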
Our third option is thorough identification for all files. This method looks exclusively at the file header of each and every file and determines what type of file it is, regardless of the file extension. This is a longer process; however, it will surface mismatched files, that is, files where the file extension does not match the file header. An example is shown here where all of the image files have the wrong file extension, and are recorded as such in the file properties.
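As a separate illustration, mismatch detection of this kind might look like the following Python sketch. The names `identify_thorough` and `EXPECTED_EXTENSIONS`, and the small tables, are hypothetical and not taken from ADF tools. The sketch also assumes that a file with no extension at all is not counted as mismatched.

```python
from pathlib import PurePath

# Hypothetical tables for illustration only.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
]
EXPECTED_EXTENSIONS = {
    "JPEG image": {".jpg", ".jpeg"},
    "PNG image": {".png"},
    "PDF document": {".pdf"},
}


def identify_thorough(filename, header):
    """Identify a file from its header alone and flag an extension mismatch.

    Returns (detected_type, mismatched). The extension is never used to decide
    the type; it is only compared against the type found in the header.
    """
    detected = None
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            detected = label
            break
    suffix = PurePath(filename).suffix.lower()
    # Assumption: only an extension that is present and wrong counts as a mismatch.
    mismatched = (
        detected is not None
        and bool(suffix)
        and suffix not in EXPECTED_EXTENSIONS[detected]
    )
    return detected, mismatched


# A JPEG image renamed to look like a Word document is flagged as mismatched.
print(identify_thorough("Dog Article.doc", b"\xff\xd8\xff\xe0"))
```

Every file's header has to be read, which is why this approach takes longer than the other two.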
To summarize, fast identification uses file extensions only. We think of this as the fastest method, where you can get to the low-hanging fruit. Thorough file identification for files without extensions is "speed optimized": it looks at the file extension first and, if there is no file extension, falls back to the file header. Finally, thorough file identification for all files is the most comprehensive option. It relies on file header analysis only, which allows you to search for mismatched files.