---
title: Data flow
slug: data-flow
docTags: 
createdAt: 2025-07-29T02:38:25.247Z
---

## Overview

The FoundSecret API is the primary mechanism through which TruffleHog scanners report discovered secrets to your customer-isolated API. This document describes the data that leaves your environment so you can assess the data flow and its privacy implications.

## Data Structure

When a secret is discovered, the scanner sends a `FoundSecretRequest` containing the following information:

### Core Secret Information

- **Secret Type**: The type of secret detected (e.g., AWS, GitHub, Slack)
- **Redacted Secret**: A key ID when possible, otherwise a safely redacted version of the secret intended to help you locate it (maximum 20 characters)
- **Verification Status**: Whether the secret was successfully verified as live
- **Cryptographic Fingerprint**: A secure fingerprint of the secret used for deduplication
- **Verification Error**: Any error messages from the verification process

### Source Context Information

- **Source Type**: The type of source being scanned (e.g., GitHub, GitLab, S3)
- **Source Name**: The configured name of the source
- **Source Metadata**: Contextual information about where the secret was found, including:
  - File path and line number
  - Repository information (for Git-based sources)
  - Commit hash and timestamp
  - User/author information
  - Build information (for CI/CD sources)
  - Container/image details (for container registries)

### Additional Metadata

- **Decoder Type**: The type of decoder used to extract the secret
- **Extra Data**: Additional structured data about the secret (if applicable)
- **False Positive Information**: Whether the secret was flagged as a potential false positive
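
The fields listed above can be pictured as a single record. The following is a minimal sketch, not the real schema: the class names `FoundSecretSketch` and `SourceMetadata`, every field name, and the example values are assumptions made for illustration. The authoritative definitions live in the Protocol Buffer files in the open-source repository.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class SourceMetadata:
    """Where the secret was found (hypothetical field names)."""
    file_path: str
    line_number: int
    repository: Optional[str] = None
    commit_hash: Optional[str] = None
    author: Optional[str] = None


@dataclass
class FoundSecretSketch:
    """Illustrative stand-in for a FoundSecretRequest; not the real schema."""
    secret_type: str
    redacted_secret: str  # key ID or redacted form, at most 20 characters
    verified: bool
    fingerprint: str
    source_type: str
    source_name: str
    source_metadata: SourceMetadata
    verification_error: str = ""
    decoder_type: str = "PLAIN"
    extra_data: dict = field(default_factory=dict)
    possible_false_positive: bool = False

    def __post_init__(self):
        # Enforce the 20-character limit described in this document.
        if len(self.redacted_secret) > 20:
            raise ValueError("redacted_secret must be at most 20 characters")


# Made-up example values; none of them come from a real scan.
example = FoundSecretSketch(
    secret_type="AWS",
    redacted_secret="AKIAEXAMPLE",
    verified=True,
    fingerprint="0" * 64,
    source_type="GitHub",
    source_name="org-wide-scan",
    source_metadata=SourceMetadata(
        file_path="config/settings.py",
        line_number=12,
        repository="https://github.com/example/repo",
    ),
)
```

Note that the record has no field for the raw secret value: only the redacted form and the fingerprint describe the secret itself.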

## Cryptographic Fingerprinting

TruffleHog uses cryptographic fingerprints to identify and deduplicate secrets without storing the actual secret values. The system generates two types of fingerprints:

1. **Primary Fingerprint**: Used for basic deduplication
2. **Enhanced Fingerprint**: Provides additional uniqueness for better deduplication

Identical secrets always produce the same fingerprint, while recovering the original secret from a fingerprint is computationally infeasible.
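
This document does not specify the fingerprint algorithm or its inputs. As a rough illustration of the idea only, the sketch below assumes a keyed SHA-256 digest (HMAC) and assumes that the enhanced fingerprint also mixes in the secret type. The function names, the key, and the choice of inputs are all hypothetical.

```python
import hashlib
import hmac


def primary_fingerprint(secret: str, key: bytes) -> str:
    """Keyed SHA-256 digest of the secret value alone (assumed scheme)."""
    return hmac.new(key, secret.encode("utf-8"), hashlib.sha256).hexdigest()


def enhanced_fingerprint(secret: str, secret_type: str, key: bytes) -> str:
    """Keyed digest over the secret type plus the secret value (assumed scheme)."""
    # A zero byte separates the two parts so ("ab", "c") and ("a", "bc") differ.
    message = secret_type.encode("utf-8") + b"\x00" + secret.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Hypothetical key and secret, for demonstration only.
demo_key = b"example-key-not-a-real-one"
fp_a = primary_fingerprint("sk_live_1234567890abcdef", demo_key)
fp_b = primary_fingerprint("sk_live_1234567890abcdef", demo_key)
fp_c = primary_fingerprint("sk_live_0000000000000000", demo_key)
```

In this sketch `fp_a` and `fp_b` are equal because the input is identical, which is what makes deduplication possible, while `fp_c` differs. The digest is a fixed-length hex string that does not contain the secret.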

## Data Privacy and Security

### What is NOT Transmitted

- **Raw Secret Values**: The actual secret values are never transmitted to the API
- **Full File Contents**: Only the redacted secret and minimal surrounding context are sent
- **Personal Information**: Sensitive personal data is redacted or excluded

### What IS Transmitted

- **Redacted Secrets**: Key IDs or safely redacted versions to help you locate the secret (e.g., `sk_live_1234...`)
- **Contextual Information**: File paths, line numbers, repository names, etc.
- **Verification Results**: Whether the secret is live and any verification errors
- **Metadata**: Source information, timestamps, and technical details
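
To make the redaction idea concrete, here is a minimal sketch that keeps a short prefix and caps the result at the 20-character limit mentioned earlier. The function name `redact`, the prefix length, and the rule about short secrets are assumptions; the scanner's actual redaction logic may differ, for example by returning a key ID instead.

```python
MAX_REDACTED_LENGTH = 20  # limit stated in this document


def redact(secret: str, visible: int = 12) -> str:
    """Keep a short prefix of the secret and replace the rest with '...'."""
    ellipsis = "..."
    # Leave room for the ellipsis inside the 20-character budget.
    visible = min(visible, MAX_REDACTED_LENGTH - len(ellipsis))
    # Assumed safeguard: never reveal more than half of a short secret.
    visible = min(visible, len(secret) // 2)
    return secret[:visible] + ellipsis


redacted = redact("sk_live_1234567890abcdef")  # -> "sk_live_1234..."
```

The prefix is usually enough to find the secret in a file or a key-management console without revealing a usable credential.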

## Source-Specific Data Examples

### GitHub/GitLab Sources

- Repository name and URL
- Commit hash and timestamp
- File path and line number
- Author email and username
- Branch information

### S3/Cloud Storage Sources

- Bucket name and region
- File path and name
- Upload timestamp
- Access control information

### CI/CD Sources

- Build number and step
- Pipeline/organization name
- Build timestamp
- Job information

### Container Registry Sources

- Image name and tag
- Layer information
- Registry details
- Creation timestamp

## Data Flow Process

1. **Discovery**: Scanner identifies a potential secret in your environment
2. **Extraction**: The secret is extracted and redacted for display
3. **Fingerprinting**: Cryptographic fingerprints are generated for deduplication
4. **Verification**: The secret is optionally verified to determine if it's live
5. **Transmission**: The redacted secret, fingerprints, verification result, and source metadata are sent to the API
6. **Storage**: The API stores the fingerprints and metadata, not the raw secret
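
Steps 2 through 5 above can be sketched as one function that turns a raw finding into the payload that leaves your environment. Everything here is an assumption made for illustration: the function name `build_payload`, the dictionary keys, the four-character prefix, and the use of plain SHA-256 for the fingerprint. It is not the scanner's implementation.

```python
import hashlib
import json
from typing import Callable, Optional


def build_payload(
    raw_secret: str,
    secret_type: str,
    location: dict,
    verify: Optional[Callable[[str], bool]] = None,
) -> dict:
    """Build the data that would be transmitted for one finding."""
    # Step 2 (Extraction): derive a redacted form for display.
    redacted = raw_secret[:4] + "..."
    # Step 3 (Fingerprinting): one-way digest used for deduplication.
    fingerprint = hashlib.sha256(raw_secret.encode("utf-8")).hexdigest()
    # Step 4 (Verification): optional, and performed before transmission.
    verified = verify(raw_secret) if verify is not None else False
    # Step 5 (Transmission): the raw secret is deliberately left out.
    return {
        "secret_type": secret_type,
        "redacted_secret": redacted,
        "fingerprint": fingerprint,
        "verified": verified,
        "source_metadata": location,
    }


# Hypothetical finding; the verifier is a stub that always reports "live".
raw = "sk_live_1234567890abcdef"
payload = build_payload(
    raw,
    "Stripe",
    {"file_path": "app/billing.py", "line_number": 42},
    verify=lambda secret: True,
)
wire_form = json.dumps(payload)
```

The serialized `wire_form` contains the prefix, the fingerprint, and the location, but the full value of `raw` does not appear anywhere in it.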

## API Response

The API responds with information about:

- Whether this secret has been seen before
- Whether notifications should be prevented
- Whether the secret should be analyzed further
- Any cloud credential information for analysis
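
As an illustration of how the first three response items might relate to the fingerprint, the sketch below uses an in-memory set of fingerprints. The class names, field names, and the policies for suppressing notifications and requesting analysis are assumptions; the real API's decision logic is not described in this document.

```python
from dataclasses import dataclass


@dataclass
class FoundSecretResponseSketch:
    """Illustrative response; field names are hypothetical."""
    seen_before: bool
    prevent_notification: bool
    analyze: bool


class InMemoryDedupStore:
    """Toy stand-in for the API's storage, keyed by fingerprint."""

    def __init__(self):
        self._seen = set()

    def record(self, fingerprint: str, verified: bool) -> FoundSecretResponseSketch:
        seen_before = fingerprint in self._seen
        self._seen.add(fingerprint)
        return FoundSecretResponseSketch(
            seen_before=seen_before,
            # Assumed policy: do not notify again for a known fingerprint.
            prevent_notification=seen_before,
            # Assumed policy: analyze only new, verified secrets.
            analyze=verified and not seen_before,
        )


store = InMemoryDedupStore()
first = store.record("abc123", verified=True)
second = store.record("abc123", verified=True)
```

Here `first` reports a new secret that should be analyzed, while `second` reports a duplicate whose notification is suppressed. The store only ever sees fingerprints, never secret values.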

## Bandwidth Usage

The FoundSecret API uses gRPC, which sends compact Protocol Buffers messages over HTTP/2 with header compression, so requests stay small. Typical request sizes are:

- **Simple secrets**: \~500-800 bytes
- **Secrets with rich metadata**: \~1-2 KB
- **Complex secrets with extensive context**: \~2-4 KB

These estimates include all metadata, contextual information, and fingerprints. The actual bandwidth usage depends on the complexity of the source metadata and the amount of contextual information available.
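
For capacity planning, the ranges above can be turned into a rough low and high estimate. This is simple arithmetic under stated assumptions: 1 KB is treated as 1,000 bytes, the category names are made up, and protocol overhead such as TLS and HTTP/2 framing is ignored.

```python
# Size ranges in bytes, taken from the list above (1 KB assumed to be 1,000 bytes).
SIZE_RANGES_BYTES = {
    "simple": (500, 800),
    "rich_metadata": (1_000, 2_000),
    "complex": (2_000, 4_000),
}


def estimate_bandwidth(counts: dict) -> tuple:
    """Return (low, high) total bytes for the given number of findings per category."""
    low = sum(SIZE_RANGES_BYTES[kind][0] * n for kind, n in counts.items())
    high = sum(SIZE_RANGES_BYTES[kind][1] * n for kind, n in counts.items())
    return low, high


# Hypothetical scan: 10,000 simple findings and 1,000 with rich metadata.
low, high = estimate_bandwidth({"simple": 10_000, "rich_metadata": 1_000})
```

For this hypothetical scan the estimate is between 6,000,000 and 10,000,000 bytes, or roughly 6 to 10 MB in total.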

## Compliance and Privacy

This data flow is designed to:

- **Minimize Data Exposure**: Only necessary metadata is transmitted
- **Enable Deduplication**: Cryptographic fingerprints prevent duplicate alerts
- **Provide Context**: Sufficient information for investigation and remediation
- **Maintain Security**: Raw secrets are never stored or transmitted

## For More Information

For detailed technical specifications of the data structures, refer to the [TruffleHog Protocol Buffer definitions](https://github.com/trufflesecurity/trufflehog/tree/main/proto) in the open-source repository.

## Data Retention and Security

The API retains:

- Cryptographic fingerprints for deduplication
- Source metadata for context and investigation
- Verification status and timestamps
- Location information for remediation

### Security Measures

- **Encryption at Rest**: All stored data is encrypted at rest in the database
- **mTLS Communication**: The API communicates with the storage layer using mutual TLS (mTLS) for secure, authenticated connections
- **No Raw Secrets**: Raw secret values found in your environment are never stored in the API or database

