Resource Requirements
The TruffleHog scanner supports concurrency by default, using a concurrency value equal to the number of CPU cores on the machine. The detection engine fully utilizes this concurrency, but only some source integrations support it. Source integrations that fetch data via APIs, such as Slack, Jira, and Confluence, may have their throughput limited on the API server side and may not saturate your CPU.

Minimum Recommended Requirements

- CPU: 4 cores or more
- Memory: 16 GB or more
- Storage: 50 GB or more in the system's temporary directory

Resource Calculator

To help you estimate the scan time for your specific setup, we've created a resource calculator spreadsheet. This spreadsheet allows you to input your data size and machine specs to get a rough idea of how long your scans will take. It provides four sections to update.

Instructions

Click the link below to access the resource calculator spreadsheet:

Resource Calculator: https://docs.google.com/spreadsheets/d/1jvqz6fgrgdhsv6htsssjwklte33wwfvnuma8modgi_u/edit#gid=0

Important: before using the calculator, make a copy of the spreadsheet for your own use.

1. Enter interval (hours): indicate the desired scan completion interval in hours.
2. Enter size of source being scanned: specify the size of the source data that will be scanned, in gigabytes (GB).

Based on the combination of these two factors, the spreadsheet automatically calculates the CPU cores and memory required to complete the scan within the specified interval.

The color of the CPU cores and memory cells represents the pain/cost of provisioning those resources, from easy to difficult. Note that this is a rough and rather subjective estimate based on factors like:

- Ease of acquiring additional compute resources
- Typical ratios of CPU cores to memory
- Scalability and expandability of existing systems
- Organizational policies/restrictions around provisioning

Green: provisioning these resources should be relatively easy and inexpensive. Red: provisioning these resources may be
more difficult and/or more expensive.

Please Note

Remember that these are rough estimates. Actual scan times may vary depending on several factors, including:

- Hardware: storage type (SSD vs. HDD), CPU architecture, cache size
- Software: operating system, background applications
- Network: network bandwidth, network latency, network congestion
- Other: data type (compressed data may be slower), scan configuration (including attachment scanning, user scanning in GitHub, etc.)

The temporary directory used for cloning repositories for scanning can be changed via the $TMPDIR environment variable on Linux and Darwin/OSX.

Dependencies

This setup requires specific tools for effective operation. Git is essential for repository management, while rpm2cpio, binutils, and cpio are necessary for extracting files from RPM and DEB package formats.

- git: for cloning repositories
- rpm2cpio: to extract content from RPM packages
- binutils: includes the "ar" tool, crucial for extracting contents from DEB files
- cpio: a versatile file archiver utility, compatible with various archive formats including RPM and DEB

Installing Dependencies on Ubuntu

To install these dependencies on an Ubuntu system, follow these steps:

1. Open a terminal.
2. Update your package lists to ensure you get the latest versions available:

$ sudo apt update

3. Install the required packages:

$ sudo apt install git rpm2cpio binutils cpio
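After installing, you can sanity-check that all four tools are on your PATH before kicking off a scan. Note that the dependency on binutils is exercised through its `ar` tool, so that is what the check looks for; this loop is just an illustrative sketch, not part of TruffleHog itself:

```shell
# Verify the extraction dependencies are available.
# "ar" is provided by binutils and is used for .deb archives.
missing=""
for tool in git rpm2cpio ar cpio; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done

if [ -z "$missing" ]; then
  echo "all dependencies present"
else
  echo "missing:$missing"
fi
```

If anything is reported missing, re-run the `sudo apt install` command above before scanning sources that contain RPM or DEB packages.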
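The temporary-directory and concurrency notes above can be combined when launching a scan. The scratch path below is hypothetical and the repository URL is only an example target; adjust both for your environment. A sketch, assuming the standard trufflehog CLI and its `--concurrency` flag:

```shell
# Point repository clones at a scratch directory with enough free space
# (the guidance above recommends 50 GB or more in the temp directory).
export TMPDIR=/tmp/trufflehog-scratch   # hypothetical path; pick a disk with room
mkdir -p "$TMPDIR"

# Guarded so the example degrades gracefully when trufflehog is absent.
# Lowering --concurrency below the core count can make sense for
# API-bound sources that won't saturate the CPU anyway.
if command -v trufflehog >/dev/null 2>&1; then
  trufflehog git https://github.com/trufflesecurity/test_keys --concurrency=4
else
  echo "trufflehog not installed; scan skipped"
fi
```

Because clones land under $TMPDIR, placing it on fast local storage (SSD rather than HDD) also addresses the hardware factors listed above.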