Aaron Lord
- Aug 4, 2020

Drive security decisions with Data Classification

Updated: Aug 31, 2020

All work requires prioritization, and application security is no different. Performing a full security review for every application that development puts out will drive you mad. Considering there is, on average, only one application security professional per one hundred developers, that is an untenable amount of work. How do we prioritize our work when all new code has the potential to contain hidden vulnerabilities? It can feel like no matter how much we review and review and review, there will always be something dangerous lurking in the code.

What barometers do we use for prioritization of application security reviews? It all comes down to risk and how much we are willing to endure, because you can never lower risk to zero. Having a risk scale for what is important is key for any tech organization. And what is important at the end of the day to Information Security? Data of course.

Data is what drives almost all online business, but it is also what drives most hackers to break into networks and applications. Data is the proverbial crown jewels that every organization needs to be protecting. When you look at Defense-in-Depth models, what is always at the center of the “defense”? The data. But not all data is weighted equally when it comes to value. There should be a massive difference between the security protections that go into protecting credit card data and those protecting, say, employee email addresses.

Adding on to previous discussions (here and here) around this topic, let’s dive more deeply into data classification and how it drives prioritization and security controls.

What is data classification?

Data classification is creating categories of data types and weighing their collective risk factor. This can broadly be broken into four categories:

Critical
1. Credit Card numbers
2. Banking information
3. Encryption keys
4. User Credentials (Passwords)
5. Protected health information
6. Other Production Secrets
Highly Confidential / Internal Production
1. Customer Personally Identifiable Information
  1. Emails
  2. Addresses
  3. Phone Numbers
  4. Names
2. Geo-location Data
3. Internal Source Code
4. Product details and intellectual property
5. Company Finance Details
6. Employee Personally Identifiable Information (PII)
7. Tax information
8. Health Benefit Details
9. Employee Payroll Details
Confidential / Other Internal
1. Internal Processes
2. Internal Domain Names
3. Single Pieces of PII with no context
4. Internal Tools Data (Jira, Confluence, etc.)
5. IP Addresses and other networking data and logs
Public
1. Employee Names and Emails
2. Company Address and Phone Numbers
3. Public Domain Names
4. Public Statements

Each of these data types is categorized by its criticality rating, and this information should be available throughout your organization.
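To make the table easy to consult, it can help to keep it in a machine-readable form. The following is a minimal sketch, assuming Python; the `Classification` enum, the snake_case keys, and the `classify` helper are hypothetical names chosen for illustration, and only a handful of the entries listed above are included.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Criticality tiers, ordered so that a higher value means higher risk."""
    PUBLIC = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3
    CRITICAL = 4


# A few entries taken from the lists above. The keys are hypothetical
# identifiers; use whatever naming your organization already has.
DATA_CLASSIFICATION = {
    "credit_card_number": Classification.CRITICAL,
    "encryption_key": Classification.CRITICAL,
    "user_credentials": Classification.CRITICAL,
    "customer_pii": Classification.HIGHLY_CONFIDENTIAL,
    "geo_location": Classification.HIGHLY_CONFIDENTIAL,
    "internal_source_code": Classification.HIGHLY_CONFIDENTIAL,
    "internal_domain_name": Classification.CONFIDENTIAL,
    "ip_address": Classification.CONFIDENTIAL,
    "public_domain_name": Classification.PUBLIC,
    "public_statement": Classification.PUBLIC,
}


def classify(data_type: str) -> Classification:
    """Look up a data type; unknown types raise so a person can triage them."""
    try:
        return DATA_CLASSIFICATION[data_type]
    except KeyError:
        raise KeyError(f"unclassified data type: {data_type!r}") from None
```

Raising on unknown data types, rather than defaulting to a tier, is a design choice in this sketch: a data type nobody has classified yet is a prompt for a conversation, not something to guess at.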

How do you build one?

First, Information Security needs to identify what types of data the organization requires for its business. Not every organization needs to deal with customer payment information; some only need customer emails. Most organizations need to handle employee data and other corporate information that exists regardless of the core product. Some may even have unusual information needs such as Social Security Numbers (SSN), traveler information, geo-location data, and voice-activated data. The key is to catalog all of the data types your organization requires to function and to record them centrally.

The next step is to build your classification categories. Categories like the ones listed above are a good starting point, but they will not be the same for every organization. Some organizations may have specific requirements based on their business model and already established policies and standards. These are key to knowing exactly how to sort all of the collated data types.

Once you have your classifications and data types laid out, you need to match each data type with one data classification. Each classification can have many data types associated with it, while each data type is associated with exactly one classification (one-to-many from classification to data type, and many-to-one in the other direction). This can be a challenge if certain data types do not have an obvious risk associated with them. For example, the last four digits of a credit card can be innocuous by themselves, but when combined with any customer PII they can be used to confirm a person's identity with their account.
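The "exactly one classification per data type" rule is easy to check mechanically. Here is a minimal sketch, assuming Python and assuming the four tier names used earlier in this post; `build_table` and `group_by_tier` are hypothetical helpers, not part of any tool.

```python
from collections import defaultdict

# Tier names as used in this post; adjust to your own categories.
TIERS = ("Public", "Confidential", "Highly Confidential", "Critical")


def build_table(pairs):
    """Build a data-type -> tier table, rejecting conflicting assignments."""
    table = {}
    for data_type, tier in pairs:
        if tier not in TIERS:
            raise ValueError(f"unknown tier {tier!r} for {data_type!r}")
        if data_type in table and table[data_type] != tier:
            raise ValueError(
                f"{data_type!r} is assigned to both "
                f"{table[data_type]!r} and {tier!r}"
            )
        table[data_type] = tier
    return table


def group_by_tier(table):
    """Invert the table: each tier maps to the list of its data types."""
    grouped = defaultdict(list)
    for data_type, tier in table.items():
        grouped[tier].append(data_type)
    return dict(grouped)
```

Storing the table keyed by data type enforces the many-to-one direction by construction, and `group_by_tier` gives the one-to-many view that people usually want to read.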

In general, data that deals with finances, payments, credentials that give you access to accounts, and the encryption that protects all of those things are the most crucial to protect. Anything else can generally be in a lower tier, but that highly depends on your organization. If you do not deal with payment information but still handle customer PII, then PII would be your most crucial. It is key to know where the “risk ceiling” is for your organization.

For data types you're unsure how to classify, ask yourself two questions: ‘Is this data in bulk?’ and ‘Is it opaque, with no other context?’ If the data is in bulk, then it belongs in a higher tier, particularly if it’s all stored in one system or database. If the data is opaque or incomplete, for example shipping information that only has customer IDs, five-digit zip codes, and product IDs, then it can be in a lower tier. It is important to know how data is stored in the architecture, because how much data is in a system and how distributed the data is among other systems can affect how it’s classified.
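Those two questions can be written down as a simple adjustment rule. The sketch below assumes Python and assumes that each answer moves the tier by exactly one step, which is an illustrative simplification rather than a rule from any standard; `adjust_tier` is a hypothetical helper.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3
    CRITICAL = 4


def adjust_tier(base: Classification, *, in_bulk: bool, opaque: bool) -> Classification:
    """Nudge a starting tier using the bulk and opaque questions.

    Assumption for this sketch: bulk storage raises the tier by one step,
    opaque or context-free data lowers it by one step, and the result is
    clamped to the range of defined tiers.
    """
    level = int(base)
    if in_bulk:
        level += 1
    if opaque:
        level -= 1
    level = max(int(Classification.PUBLIC), min(int(Classification.CRITICAL), level))
    return Classification(level)
```

For instance, `adjust_tier(Classification.CONFIDENTIAL, in_bulk=True, opaque=False)` yields `Classification.HIGHLY_CONFIDENTIAL`. Treat the output as a starting point for discussion, since the final call still depends on how the data is actually stored.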

Match Requirements to Criticality

Now that you have a Data Classification table, what do you do with it? This is where the rest of your security review process comes into play. When you begin a security review for a new application or a new feature, one of the first questions to ask is, “What data types are you handling?” Each answer should now match up to a criticality rating in the Data Classification table. The criticality of the application under review will be the same as that of the highest-criticality data it handles.
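The "highest criticality wins" rule is a one-liner once the tiers are ordered. A minimal sketch, assuming Python; the table entries and the `application_criticality` helper are hypothetical, and treating an application that handles no listed data as Public is an assumption of this sketch.

```python
from enum import IntEnum
from typing import Iterable


class Classification(IntEnum):
    PUBLIC = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3
    CRITICAL = 4


# Hypothetical excerpt of a Data Classification table.
TABLE = {
    "credit_card_number": Classification.CRITICAL,
    "customer_email": Classification.HIGHLY_CONFIDENTIAL,
    "internal_domain_name": Classification.CONFIDENTIAL,
    "public_statement": Classification.PUBLIC,
}


def application_criticality(data_types: Iterable[str]) -> Classification:
    """Return the highest tier among the data types an application handles.

    Unknown data types raise KeyError so they are classified before review.
    An empty list falls back to PUBLIC (an assumption of this sketch).
    """
    return max((TABLE[d] for d in data_types), default=Classification.PUBLIC)
```

So an application that handles `customer_email` and `credit_card_number` is reviewed as Critical, even if the card number only appears in one small feature.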

To enable developers to understand their security requirements and risks without having to consult AppSec, document the security requirements that come with building an application at each criticality level. A Critical application will have more stringent guidelines when it comes to encryption, authorization, and authentication, while a Highly Confidential or Confidential application will have less stringent requirements. In some cases, if an application is only dealing with Public data, you can generally give it a quick sign-off and move on to more important security reviews.

Use the OWASP Application Security Verification Standard (ASVS) as a guide to build these security requirements from scratch. Each requirement in the ASVS has an L1 to L3 checkmark that lets you know which requirements are necessary based on the criticality of the application. L1 can be reserved for Public applications, L2 for Confidential or Highly Confidential applications, and L3 for Critical applications. The more you can communicate this to development ahead of time, the sooner they will begin thinking about these security requirements in their designs.
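The mapping from classification to ASVS level described above can be sketched in a few lines. This assumes Python and the tier names from this post; `required_asvs_level` and `requirement_applies` are hypothetical helpers, and the filter relies on ASVS levels being cumulative (a requirement marked from L1 also applies at L2 and L3).

```python
# Classification -> ASVS level, following the mapping described above.
ASVS_LEVEL = {
    "Public": 1,
    "Confidential": 2,
    "Highly Confidential": 2,
    "Critical": 3,
}


def required_asvs_level(classification: str) -> int:
    """Return the ASVS level an application of this classification targets."""
    return ASVS_LEVEL[classification]


def requirement_applies(requirement_min_level: int, classification: str) -> bool:
    """True if a requirement first marked at `requirement_min_level` applies.

    Levels are cumulative: an L2 application must meet L1 and L2
    requirements, but not those that only start at L3.
    """
    if requirement_min_level not in (1, 2, 3):
        raise ValueError("ASVS levels are 1, 2, or 3")
    return requirement_min_level <= required_asvs_level(classification)
```

With this in place, a developer building a Highly Confidential application can filter the requirement list down to everything marked L1 or L2 before the review even starts.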

Data classification is useful for other tech teams as well. DevOps and QA can tailor their pipelines or tools to run more tests for Critical apps, and Architecture teams can use it to ensure that Critical apps only run in the most protected segments of the network.

Should you just copy another one?

It can be tempting to find an example of a Data Classification guide from another organization and copy it for your own, but this is inadvisable for a couple of reasons. First, you may not capture all of the data types that your company should be concerned with; conversely, you may classify data types that your company is not concerned with. Second, the categories may not fit your organization's business model and existing policies, which makes the guide incongruous with the organization and causes unnecessary confusion. Take the time to chat with people throughout the organization to understand what data drives the company, and build relationships with key stakeholders along the way.

Conclusion

By creating a Data Classification guide for your organization, you can match the amount of work and attention each security review receives to the risk of the application under review. This allows you to prioritize appropriately and not waste time on applications that, in the grand scheme of things, do not pose much risk to the organization.

Catalog all of the data your company handles, build your relevant classifications, sort the data types to those classifications, document and communicate this to the technology group, and then consult it upon every new security review to prioritize the work required. Your sanity will thank you.

About the Author

Aaron is an Application Security Engineer with over 10 years of experience. His unorthodox career path has led to many unique insights in the security industry.