How we handle your data

Data flow

Overview

The FoundSecret API is the primary mechanism through which TruffleHog scanners report discovered secrets to your customer-isolated API. This document explains what data is transmitted from your environment so that you can understand the data flow and its privacy implications.

Data structure

When a secret is discovered, the scanner sends a FoundSecretRequest containing the following information.

Core secret information
- Secret type: the type of secret detected (e.g., AWS, GitHub, Slack)
- Redacted secret: a key ID when possible, otherwise a safely redacted version of the secret (maximum 20 characters) intended to help you locate it
- Verification status: whether the secret was successfully verified as live
- Cryptographic fingerprint: a secure fingerprint of the secret, used for deduplication
- Verification error: any error messages from the verification process

Source context information
- Source type: the type of source being scanned (e.g., GitHub, GitLab, S3)
- Source name: the configured name of the source
- Source metadata: contextual information about where the secret was found, including:
  - File path and line number
  - Repository information (for Git-based sources)
  - Commit hash and timestamp
  - User/author information
  - Build information (for CI/CD sources)
  - Container/image details (for container registries)

Additional metadata
- Decoder type: the type of decoder used to extract the secret
- Extra data: additional structured data about the secret (if applicable)
- False positive information: whether the secret was flagged as a potential false positive

Cryptographic fingerprinting

TruffleHog uses cryptographic fingerprints to identify and deduplicate secrets without storing the actual secret values. The system generates two types of fingerprints:
- Primary fingerprint: used for basic deduplication
- Enhanced fingerprint: provides additional uniqueness for more precise deduplication

These fingerprints ensure that identical secrets produce the same fingerprint while making it computationally infeasible to reverse-engineer the original secret.

Data privacy and security

What is not transmitted
- Raw secret values: the actual secret values are never transmitted to the API
- Full file contents: only the redacted secret and minimal surrounding context are sent
- Personal information: sensitive personal data is redacted or excluded

What is transmitted
- Redacted secrets: key IDs or safely redacted versions that help you locate the secret (e.g., "sk_live_1234...")
- Contextual information: file paths, line numbers, repository names, etc.
- Verification results: whether the secret is live and any verification errors
- Metadata: source information, timestamps, and technical details

Source-specific data examples

GitHub/GitLab sources
- Repository name and URL
- Commit hash and timestamp
- File path and line number
- Author email and username
- Branch information

S3/cloud storage sources
- Bucket name and region
- File path and name
- Upload timestamp
- Access control information

CI/CD sources
- Build number and step
- Pipeline/organization name
- Build timestamp
- Job information

Container registry sources
- Image name and tag
- Layer information
- Registry details
- Creation timestamp

Data flow process
1. Discovery: the scanner identifies a potential secret in your environment.
2. Extraction: the secret is extracted and redacted for display.
3. Fingerprinting: cryptographic fingerprints are generated for deduplication.
4. Verification: the secret is optionally verified to determine whether it is live.
5. Transmission: contextual data and metadata are sent to the API.
6. Storage: the API stores the fingerprints and metadata, not the raw secret.

API response

The API responds with information about:
- Whether this secret has been seen before
- Whether notifications should be suppressed
- Whether the secret should be analyzed further
- Any cloud credential information for analysis

Bandwidth usage

The FoundSecret API uses gRPC with Protocol Buffers and HTTP/2 compression, which keeps data transmission efficient. Typical request sizes are:
- Simple secrets: 500-800 bytes
- Secrets with rich metadata: 1-2 KB
- Complex secrets with extensive context: 2-4 KB

These estimates include all metadata, contextual information, and fingerprints. Actual bandwidth usage depends on the complexity of the source metadata and the amount of contextual information available.

Compliance and privacy

This data flow is designed to:
- Minimize data exposure: only necessary metadata is transmitted
- Enable deduplication: cryptographic fingerprints prevent duplicate alerts
- Provide context: enough information is included for investigation and remediation
- Maintain security: raw secrets are never stored or transmitted

For more information

For detailed technical specifications of the data structures, refer to the TruffleHog protocol buffer definitions (https://github.com/trufflesecurity/trufflehog/tree/main/proto) in the open-source repository.

Data retention and security

The API retains:
- Cryptographic fingerprints, for deduplication
- Source metadata, for context and investigation
- Verification status and timestamps
- Location information, for remediation

Security measures
- Encryption at rest: all stored data is encrypted at rest in the database
- mTLS communication: the API communicates with the storage layer using mutual TLS (mTLS) for secure, authenticated connections
- No raw secrets: raw secret values found in your environment are never stored in the API or database
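To make the FoundSecretRequest fields and the 20-character redaction limit more concrete, here is a minimal Python sketch. The class name mirrors the document, but the field names, the `SourceMetadata` type, the `redact` helper, and its "keep up to 12 characters, never more than half the secret" policy are hypothetical illustrations, not TruffleHog's actual protocol buffer definitions (those live in the proto directory of the open-source repository).

```python
import hashlib
from dataclasses import asdict, dataclass, field
from typing import Optional

MAX_REDACTED_LEN = 20  # limit stated in the document


def redact(secret: str, key_id: Optional[str] = None, visible: int = 12) -> str:
    """Hypothetical helper: prefer a key ID, else a short prefix plus "..."."""
    if key_id:
        return key_id[:MAX_REDACTED_LEN]
    # Never show more than half the secret, and leave room for the "..." suffix.
    shown = min(visible, len(secret) // 2, MAX_REDACTED_LEN - 3)
    return secret[:shown] + "..."


@dataclass
class SourceMetadata:
    # Hypothetical subset of the location details described above.
    file_path: str
    line: int
    repository: Optional[str] = None  # Git-based sources
    commit: Optional[str] = None


@dataclass
class FoundSecretRequest:
    # Hypothetical field names; see the proto definitions for the real schema.
    secret_type: str           # e.g. "AWS", "GitHub", "Slack"
    redacted: str              # key ID or redacted value, at most 20 characters
    verified: bool             # whether the secret was verified as live
    fingerprint: str           # deduplication key, never the raw secret
    source_type: str           # e.g. "GitHub", "S3"
    source_name: str           # configured name of the source
    metadata: SourceMetadata
    decoder_type: str = "PLAIN"             # illustrative value
    verification_error: Optional[str] = None
    extra_data: dict = field(default_factory=dict)
    is_false_positive: bool = False

    def __post_init__(self) -> None:
        if len(self.redacted) > MAX_REDACTED_LEN:
            raise ValueError("redacted value must be at most 20 characters")


# Usage: build a request for a fake Stripe-style key found in a Git repository.
raw = "sk_live_1234567890abcdef"
req = FoundSecretRequest(
    secret_type="Stripe",
    redacted=redact(raw),
    verified=True,
    fingerprint=hashlib.sha256(raw.encode()).hexdigest(),
    source_type="GitHub",
    source_name="main-org",
    metadata=SourceMetadata("config/settings.py", 42, "example/repo", "a1b2c3d"),
)
print(asdict(req))
```

Capping the visible prefix at half the secret's length keeps short secrets from being printed in full. A key ID, when one exists, is preferable because it identifies the credential without exposing any of its secret material.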
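The primary and enhanced fingerprints described under cryptographic fingerprinting can be illustrated with Python's `hashlib`. This sketch assumes SHA-256 for both; the document does not say which construction TruffleHog actually uses, and the inputs mixed into the hypothetical `enhanced_fingerprint` (the secret type plus an optional extra component) are illustrative assumptions.

```python
import hashlib


def _digest(label: bytes, *parts: str) -> str:
    """SHA-256 over a domain label followed by length-prefixed UTF-8 parts."""
    h = hashlib.sha256(label)
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))  # length prefix keeps the encoding unambiguous
        h.update(data)
    return h.hexdigest()


def primary_fingerprint(secret: str) -> str:
    # Hypothetical: depends only on the secret value.
    return _digest(b"primary-v1", secret)


def enhanced_fingerprint(secret: str, secret_type: str, extra: str = "") -> str:
    # Hypothetical: also mixes in the secret type and an optional extra
    # component (for example a paired key ID), so it distinguishes more cases.
    return _digest(b"enhanced-v1", secret, secret_type, extra)


token = "ghp_exampleExampleExample0123456789"
p1 = primary_fingerprint(token)
p2 = primary_fingerprint(token)
print(p1 == p2, len(p1))
```

Identical secrets always map to the same 64-character hex digest, which is what makes deduplication possible without keeping the secret. Length-prefixing each input means two different input tuples can never be hashed as the same byte sequence. Note that an unkeyed hash is only hard to reverse when the secret itself has high entropy, as most API keys do; a keyed HMAC is a common alternative when low-entropy values must also be protected.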
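The storage step of the data flow process and the "seen before" part of the API response amount to a lookup keyed by fingerprint. The in-memory `FingerprintStore` here is a hypothetical stand-in for the API's storage layer: it keeps only fingerprints and metadata, and its rule of suppressing notifications after the first sighting is an assumed policy, not documented behavior.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FoundSecretResponse:
    # Hypothetical subset of the response fields listed under "API response".
    seen_before: bool
    suppress_notification: bool


@dataclass
class FingerprintStore:
    """In-memory stand-in for the storage layer: fingerprints and metadata only."""
    records: Dict[str, List[dict]] = field(default_factory=dict)

    def submit(self, fingerprint: str, metadata: dict) -> FoundSecretResponse:
        history = self.records.setdefault(fingerprint, [])
        seen = len(history) > 0
        history.append(metadata)  # keep every sighting's location for remediation
        # Assumed policy: alert on the first sighting, suppress repeats.
        return FoundSecretResponse(seen_before=seen, suppress_notification=seen)


def fingerprint(secret: str) -> str:
    # Plain SHA-256 here for brevity; the real fingerprint scheme is not specified.
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


store = FingerprintStore()
fp = fingerprint("xoxb-example-token")
first = store.submit(fp, {"source": "GitHub", "file": "config.yml", "line": 12})
second = store.submit(fp, {"source": "S3", "file": "backup/config.yml", "line": 12})
print(first, second)
```

The same secret found in two different sources produces one fingerprint with two location records, so a single alert can point to every place the secret needs to be remediated.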