Skip to content

Create and use datasets

Last updated: November 17, 2021

Applies to:

Operations Hub Enterprise

A dataset is a collection of data from across your HubSpot account that can then be used in custom reports. A dataset can include properties for CRM objects and HubSpot assets, along with formulas to calculate your data as needed. For example, you can create a field to calculate annual recurring revenue based on the deal amount property. 

Creating multiple datasets for your teams means that report creators won’t have to select their data sources every time they need to create a report. A dataset can also be updated after creation, making it possible to update all reports using that dataset at the same time.

Below, learn how to create a dataset, how to use the dataset in the report, and the definitions for available functions within the datasets tool.

Create a dataset

To create a dataset:

  • In your HubSpot account, navigate to Reports Datasets.
  • In the upper right, click Create dataset.

You'll then be directed to the Join tab of the dataset builder.

Join

First, select the data sources to include in your dataset. Data sources are the objects, assets, and events that you want to report on. This includes all CRM objects such as contacts or custom objects, and assets such as website pages and emails, conversations, sales activities, and more. You can select up to 5 data sources per dataset.

The primary data source will be the focus of the dataset, with all other data sources relating to that primary source. To connect these data sources, HubSpot joins the data in the background using the shortest possible path. Contacts and deals, for example, are directly related and can be selected with no extra joins needed. 

However, other data sources cannot be associated directly and require additional sources to join the data together. For example, if your primary source is Deals and you want to include blog post data in the report, HubSpot can only link those sources through the Contacts and Web activities sources. These other sources will be selected automatically to join the data.

  • To select the primary data source, click the Primary data source dropdown menu, then select a data source.  

    datasets-join-tab0
  • Continue selecting additional data sources. In the right panel, you can view the relationships between the current selected data sources.
  • As you select sources, view the Preview pane at the bottom of the screen for a preview of your data. 
  • After selecting your data sources, in the upper right, click Next.

Prepare

Select the fields to include in the dataset. You can add existing HubSpot properties to the dataset, as well as custom formula fields.

Add properties

  • To add properties to the dataset, drag and drop the properties from the left sidebar into the Build your dataset section.

2021-10-28_11-28-34 (1)

  • To rename a property or view a preview of its data, click the property under Build your dataset, then enter a new name into the name field on the right. Renaming a property will update its name only in this dataset. This allows you to customize how these fields appear in the report builder, but will not impact the name of the existing property.

Add formula fields

Formula fields are specific to the dataset and can be used to calculate values based on properties in the dataset. Learn more about building formulas using flexible expressions.

  • To create a formula field, click Create formula field

    dataset-create-formula-field0
  • On the right, enter a name for the field.
  • In the Formula field, enter your formula. You can reference properties that you've added to the dataset as well as other HubSpot properties outside of the dataset, and use functions to calculate based on property and literal data. Learn more about formula syntax and definitions below.
    • To manually enter your formula, begin typing in the Formula field. By default, HubSpot will display auto-complete options as you enter text. To turn off auto-complete, clear the Show formula guidance checkbox. 

      dataset-formula-field-auto-complete0
    • To insert a property you've added to the dataset, click the Created field dropdown menu, then select the property.
    • To insert a property that isn't in the dataset, click the HubSpot property dropdown menu, then select the property.
    • To insert a function, click the Function dropdown menu, then select a function.
formula-field-options0
  • As you create formulas, the Formula field will display a validation status. When a formula is invalid, a red status indicator will display and you can click Validation to view error details.
     dataset-formula-field-validation0
  • Once you've set up your fields, click Next in the upper right.

Filter

Refine your data further by adding filters to your fields.

To add a filter:

  • From the left sidebar, click and drag fields into the Inactive filters section.
  • Click a field to view its filter options. Select a filter, then click Apply.
  • You can group filters together by clicking a field, then clicking the Group with another filter dropdown menu. Select another  active filter, then click Apply.

datasets-group-filters0

    • You can change whether the filters are grouped by and or or logic by clicking the logic selector between the filters, then selecting and or or

      datasets-and-or-filter0
    • To ungroup filters, click a filter, then click the Group with another filter dropdown menu. Select None, then click Apply.
  • After setting up your filters, click Next.

Review

Review your dataset before saving it. 

  • Under Sources, view the data sources you’ve selected. 
  • Under Fields, view the fields in the dataset. This includes:
    • Field: the name of the field.
    • Derived: whether the field is a standard HubSpot field or a custom calculated field.
    • Data type: the type of data contained in the field.
    • Input: the field’s expression.
    • Source: the source of the data (e.g., Contacts).
  • In the Preview panel, view the preview of your data. You can click View table relationships to see how the data is connected.
  • After reviewing your data, save the dataset by clicking Save.
  • In the right panel, enter a name and description for the dataset.
  • Click Apply.

You’ll then be brought to the report builder where you can create a report based on your dataset.

View and manage datasets

On the datasets dashboard, you can view and edit existing datasets.

  • In your HubSpot account, navigate to Reports > Datasets.
  • You can filter your existing datasets by using the filters at the top of the table.
  • To edit a dataset, hover over the dataset, then click Edit. You’ll then be brought to the dataset details page.
    • On the Preview tab, view a preview of the dataset’s data. 
    • On the Metadata tab, view the data sources and fields included in the dataset. This includes:
      • Field: the name of the field.
      • Derived: whether the field is a standard HubSpot field or a custom calculated field.
      • Data type: the type of data contained in the field.
      • Input: the string used to bring data into the field.
      • Source: the source of the data (e.g., Contacts).
    • On the Reports tab, view the reports that’ve been created using the dataset.
    • From the datasets details page, you can also create a new report using the dataset by clicking Build report.

Create a report using a dataset

Once a dataset has been created, you can create a report based on the dataset either from the report builder or from the datasets tool.

  • To create a report from within a dataset:
  • To create a report from the custom report builder:
    • In your HubSpot account, navigate to Reports > Reports.
    • In the upper right, click Create custom report.
    • In the left sidebar, select Custom Report Builder.
    • On the right, click the Browse datasets tab to view your existing datasets. 

      browse-datasets-tab0
    • Select the dataset you want to use, then click Next.

Reference

Syntax

Within a function, you can use data from properties and fields or literal data. Property and field data will be dynamic based on the individual data sources, while literal data is constant. For example:

  • 2021-03-05 is a literal date, which is constant.
  • [CONTACT.createdate] is a property-based date, which is dynamic for each contact record.

Functions can include both literal and property/field data, as long as the data types are compatible with the function’s required arguments. For example:

DATEDIFF("MONTH", "2021-01-01", “[CONTACT.createdate]”)

Below, learn about the syntax for literals and property/field data, and how to incorporate them into formulas.

Literal syntax

Use literals to add specific, static strings of text, numbers, true or false values, and dates in your calculations.

  • String literal: text surrounded by quotation marks. For example, “My cool string”.
  • Number literal: numbers without quotation marks. For example, 42.
  • Boolean literal: true or false.
  • Date literal: a string formatted as "YYY-MM-DD" , or a date timestamp number (e.g, 1635715904). 

Property syntax

Property references allow you to directly include values from the properties of your selected data sources. You don't need to add the property as a dataset field to reference it. 

Use the following syntax when referencing properties:

  • Reference expressions are always surrounded by square brackets ( [ and ] ).
  • Property references specify the object or event name, followed by a period and the internal property name. For example:
    • [CONTACT.lifecyclestage]
    • [COMPANY.name]
    • [e_hs_scheduled_email_v2.__hs_event_native_timestamp]

Field reference syntax

You can reference fields in a formula by surrounding the field name with square brackets. For example:

    • [Field 1]
    • [My awesome custom field]

You can reference fields in formulas as long as the formula’s operators and functions accept the field’s data type. For example, if you create a new field that contains a string, you can reference the field in a function that accepts strings:

  • If Field 1 is [DEAL.name], it contains a string value (the deal's name).
  • CONCAT([Field 1], "Q4") would be valid because it contains two string values.
  • CONCAT([DEAL.name], 2012) would not be valid because it contains both a string and a number value.

Operators

You can use operators with literal and property/field values, and operators are evaluated in the standard PEMDAS order of operations. This allows you to nest operators using parentheses. For example:

  • Adding a number with a property reference: 1 + [DEAL.amount]
  • Using parentheses to nest operations: (1 + 2) * (3 + 4)
Operator Description Example usage

+

Add numbers. Returns a number.

1 + 1
= 2

[DEAL.amount] + 100

-

Subtract numbers. Returns a number.

100 - 1
= 99

EXP(1) - EXP(1)
= 0

WEEKNUM([DEAL.closedate]) - WEEKNUM([DEAL.createdate])

*

Multiply numbers. Returns a number.

2 * 2
= 4

POW(10, 2) * -1
= -100

[DEAL.amount] * 0.5

/

Divide numbers. Returns a number.

10/ 2
= 5

[DEAL.amount] / DATEDIFF("DAY", [DEAL.createdate], [DEAL.closedate])

-

Negates a number.

-100
= -100

-[DEAL.amount]

&&

Checks if both of two boolean values are true. Returns a boolean.

true && false
= false

CONTAINS(“HubSpot”, “Hub”) && CONTAINS(“HubSpot”, “Spot”)
= true

CONTAINS(“HubSpot”, “Hub”) && CONTAINS(“HubSpot”, “CRM”)
= false

||

Check if either of two boolean values are true. Returns a boolean.

true || false
= true

CONTAINS(“HubSpot”, “Hub”) || CONTAINS(“HubSpot”, “CRM”)
= true

CONTAINS(“HubSpot”, “Sales”) && CONTAINS(“HubSpot”, “Marketing”)
= false

!

Negatives a boolean value. Returns another boolean value.

!true
= false

==

Equality operator. Returns a boolean value.

==true
= true

Numerical functions

Function Definition Arguments Example usage

ABS

Compute the absolute value of a number. Returns a number.

ABS(number)

number: the number to take the absolute value of.

ABS(-10)
= 10

ABS(10)
= 10

CEIL

Round a decimal value up to the nearest integer. Returns a number.

CEIL(number)

number: the number to take the ceiling of.

CEIL(3.14)
= 4

CEIL(EXP(1))
= 3

CEIL(LN([DEAL.amount]))

DIV0

Divide a number, but return zero when the divisor is zero. Returns a number.

DIV0(dividend, divisor)

dividend: the number to use as a dividend in the division operation.

divisor: the number to use as a divisor in the division operation, with zero resulting in a zero value overall

DIV0(5, 2)
= 2.5

DIV0(5, 0)
= 0

DIV0([DEAL.amount], DATEDIFF("DAY", [DEAL.createdate], [DEAL.closedate]))

EXP

Computer Euler's number raised to a value. Returns a number.

EXP(exponent)

exponent: the exponent to raise Euler’s number to.

EXP(1)
= 2.718281828459045

EXP(0)
= 1

FLOOR

Round a decimal value down to the nearest integer. Returns a number.

FLOOR(number)

number: the exponent to raise Euler’s number to.

FLOOR(3.14)
= 3

CEIL(EXP(1))
= 2

FLOOR(LN([DEAL.amount]))

LN

Compute the natural logarithm of a value. Returns a number.

LN(number)

number: the value to take the natural logarithm of.

LN(1)
= 0

LN(EXP(1))
= 1

LN([DEAL.amount])

LOG

Compute the logarithm of a value within a specified base. Returns a number.

LOG(base, value)

base: the base to be used in the logarithm calculation of the value.

value: the number to take the logarithm of.

LOG(10, 1)
= 0

LOG(10, 10)
= 1

LOG(10, [DEAL.amount])

POWER

Raise a base value to a specified power. Returns a number.

POWER(base, exponent)

base: the number to compute the power of.

exponent: the number to raise the base by.

POWER(2, 10)
= 1024

POWER(100, 0.5)
= 10

POWER([DEAL.hs_arr], 2)

SQRT

Take the square root of a non-negative number. Returns a number.

SQRT(number)

number: the number to take the square root of.

SQRT(100)
= 10

SQRT([DEAL.hs_arr])

WIDTH_BUCKET

Bin numerical values into buckets of equal width. Returns the number of the bucket that the value falls into.

If returned value is below the minimum, returns zero. If returned value is above the maximum, returns the number of buckets +1.

WIDTH_BUCKET(value, minValue, maxValue, bucketCount)

value: the number to compute in the bin number.

minValue: the minimum value to start binning from.

maxValue: the maximum value to bin to.

bucketCount: the desired number of equal-width buckets to bin values into between minValue and maxValue. 

WIDTH_BUCKET(25, 0, 100, 10)
= 3

WIDTH_BUCKET(95, 0, 100, 10)
= 10

WIDTH_BUCKET(-1000, 0, 100, 10)
= 0

WIDTH_BUCKET(9999, 0, 100, 10)
= 11

WIDTH_BUCKET([DEAL.amount], 0, 10000, 1000)

String functions

Function Definition Arguments Example usage

CONTAINS

Determine whether a string contains a substring. Returns a boolean value.

CONTAINS("string", "substring")

string: the string value to test.

substring: the value to check for inside the string.

CONTAINS("HubSpot", "Hub")
= true

CONTAINS("foo", "bar")
= false

CONTAINS([CONTACT.firstname], "Mike")

CONCAT

Concatenate two strings. Returns a string.

CONCAT("string1", "string2")

string1: the string value to which string2 will be appended.

string2: the string value to append to string1.

CONCAT("Hub", "Spot")
= "HubSpot"

CONCAT([CONTACT.firstname], CONCAT(" ", [CONTACT.lastname]))

LENGTH

Compute the length of a string. Returns a number.

LENGTH("string")

string: the string value to calculate the length of.

LENGTH("HubSpot")
= 7

LENGTH([FEEDBACK_SUBMISSION.hs_content])

TRIM

Remove leader and trailing whitespace from a string. Returns a string.

TRIM(" string ")

string: the string value to trim whitespace from.

TRIM(" Cats are great ")
= "Cats are great"

Date functions

Function Definition Arguments Example usage

DATE_FROM_PARTS

Create a date value from year, month, and day parts. Returns a date.

DATE_FROM_PARTS(year, month, day)

year: the year part of the desired date.

month: the month part of the desired date.

day: the day part of the desired date.

DATE_FROM_PARTS(2021, 1, 1)
= 2021-01-01

DATEDIFF

Return the number of time units between the first date value and the second for a specified time unit. Returns a number.

DATEDIFF(“datePart”, “date1”, “date2”)

datePart: the year, quarter, month, week, or day unit to use in the difference calculation. 

date1: the starting date value to subtract from date2.

date2: the ending date value that date1 will be subtracted from.

DATEDIFF("DAY", "2021-01-01", "2021-02-01")
= 31

DATEDIFF("MONTH", "2021-01-01", DATE_FROM_PARTS(2021, 2, 1))
= 1

DATEDIFF("QUARTER", [DEAL.createdate], [DEAL.closedate])

DATEPART

Extract the year, quarter, month, week, or day from a date value. Returns a number.

DATEPART(“datePart”, “date”)

datePart: the year, quarter, month, week, or day unit to extract.

date: the date value to extract a date part from.

DATEPART("DAY", "2021-03-15")
= 15

DATEPART("MONTH", DATE_FROM_PARTS(2021, 3, 15))
= 3

DATEPART("YEAR", [DEAL.createdate])

DATETRUNC

Truncate a date value to its year, quarter, month, week, or day.

DATETRUNC(“datePart”, “date”)

datePart: the year, quarter, month, week, or day unit to truncate to.

date: the date value to truncate.

DATETRUNC("YEAR", DATE_FROM_PARTS(2021, 3, 15))
= 2021-01-01

DATETRUNC("MONTH", "2021-03-15")
= 2021-03-01

DATETRUNC("DAY", [e_visited_page.__hs_event_native_timestamp])

TIMESTAMP_FROM_PARTS

Create a timestamp value from year, month, day, hour, minute, and second parts. Returns a datetime value.

TIMESTAMP_FROM_PARTS(year, month, day)

year: the year part of the desired date.

month: the month part of the desired date.

day: the day part of the desired day. 

TIMESTAMP_FROM_PARTS(2021, 1, 1)
= 2021-01-01

WEEKNUM

Calculate the week number within a year for a date. Returns a number.

WEEKNUM(“date”)

date: the datetime value with which to compute the week number within a year.

WEEKNUM("2021-03-15")
= 11

WEEKNUM(“1609459200”)
= 11

WEEKNUM(“[deal.createdate]”)

NOW

Returns the current time based on your account's timezone. Returns a datetime value.

NOW()

 

NOW()
= 1633611966314