Data Tokenization

Updated: December 4, 2023

The process applied to datasets to protect sensitive information is known as Data Tokenization. It is most commonly applied to data at rest. Sensitive data is replaced with non-sensitive, stand-in values known as tokens, which preserve the format of the original data.

With data tokenization, the non-sensitive tokens remain in the dataset, while the mapping between each token and the original sensitive value is stored securely outside the system, typically in a token server. When the original value is needed again, it can be looked up on the token server; this reverse lookup is called detokenization.
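To make the tokenize/detokenize round trip concrete, here is a minimal sketch of a vault-style token server, assuming an in-memory Python class as the store. A real token server is a separate, hardened, access-controlled service; the TokenVault name and its methods are illustrative, not any specific product's API.

    import secrets

    class TokenVault:
        """Minimal in-memory stand-in for a token server (illustration only)."""

        def __init__(self):
            self._token_to_value = {}  # token -> original sensitive value
            self._value_to_token = {}  # original value -> token (reuse tokens)

        def tokenize(self, value: str) -> str:
            """Replace a sensitive value with a random token in the same format."""
            if value in self._value_to_token:
                return self._value_to_token[value]
            while True:
                # Preserve the format: swap each digit for a random digit,
                # leaving separators (spaces, dashes) untouched.
                token = "".join(
                    secrets.choice("0123456789") if ch.isdigit() else ch
                    for ch in value
                )
                if token not in self._token_to_value:  # avoid rare collisions
                    break
            self._token_to_value[token] = value
            self._value_to_token[value] = token
            return token

        def detokenize(self, token: str) -> str:
            """Look up the original sensitive value for a previously issued token."""
            return self._token_to_value[token]

    vault = TokenVault()
    card = "4111 1111 1111 1111"
    token = vault.tokenize(card)            # e.g. "8302 9941 0586 2217"
    assert vault.detokenize(token) == card  # round trip recovers the original

Note that the token reveals nothing about the card number: it is generated randomly, and only the mapping held by the token server connects the two.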

Vault tokenization and vaultless tokenization are the two types of data tokenization. Tokenization techniques are most commonly used in the payment processing industry, where they help organizations comply with the Payment Card Industry Data Security Standard (PCI DSS), which requires protecting sensitive data such as credit card numbers. However, any kind of sensitive data can be protected with data tokenization.

Companies use data tokenization to meet industry security standards, reduce the risk of data misuse, and improve customer confidence. Its most common benefit is securing sensitive information: because tokens are worthless to an attacker without access to the token server, tokenization shrinks the attack surface and reduces the number of systems that need advanced security controls.

Types of data tokenization

  • Vault tokenization – tokens are generated randomly, and the mapping between each token and the original value is stored in a secure database called a token vault. Detokenization is a lookup against the vault, as illustrated in the sketch above.
  • Vaultless tokenization – tokens are derived from the original value with a keyed, reversible cryptographic transformation, typically format-preserving encryption, so no mapping table is needed. Detokenization reverses the transformation with the same key (see the sketch after this list).
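As a contrast with the vault-based sketch above, here is a toy illustration of the vaultless idea, assuming a shared secret key: each digit is shifted by an HMAC-derived keystream digit, so the token can be reversed with the same key and no lookup table. This is a teaching sketch only, not a secure cipher; production vaultless systems use vetted format-preserving encryption such as NIST FF1, and the function names and the tweak parameter here are illustrative.

    import hmac
    import hashlib

    def _keystream(key: bytes, tweak: bytes, n: int) -> list:
        """Derive n pseudo-random decimal digits from the key and a per-field tweak."""
        digits = []
        counter = 0
        while len(digits) < n:
            block = hmac.new(key, tweak + counter.to_bytes(4, "big"),
                             hashlib.sha256).digest()
            digits.extend(b % 10 for b in block)
            counter += 1
        return digits[:n]

    def tokenize(value: str, key: bytes, tweak: bytes = b"demo") -> str:
        """Shift each digit by a keystream digit (mod 10); non-digits keep the format."""
        ks = iter(_keystream(key, tweak, sum(ch.isdigit() for ch in value)))
        return "".join(
            str((int(ch) + next(ks)) % 10) if ch.isdigit() else ch
            for ch in value
        )

    def detokenize(token: str, key: bytes, tweak: bytes = b"demo") -> str:
        """Subtract the same keystream to recover the original digits."""
        ks = iter(_keystream(key, tweak, sum(ch.isdigit() for ch in token)))
        return "".join(
            str((int(ch) - next(ks)) % 10) if ch.isdigit() else ch
            for ch in token
        )

    key = b"example-secret-key"
    card = "4111 1111 1111 1111"
    tok = tokenize(card, key)
    assert detokenize(tok, key) == card  # the same key reverses the transform

The trade-off between the two approaches follows directly from these sketches: vault tokenization concentrates risk in the token server, while vaultless tokenization concentrates it in the key that drives the transformation.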
