rejustify Google Sheets add-on

Find, merge and rejustify statistical datasets in real time, from multiple sources, in just a few clicks.
For economists, researchers, data scientists and financial analysts.

Description

The purpose of the rejustify Google Sheets add-on is to provide a user-friendly and accessible tool for using the rejustify Extract-Transform-Load (ETL) data API. To this end, the add-on offers two functions, ‘Analyze’ and ‘Find’, which link directly to the two rejustify API endpoints of the same names. The add-on provides click-through access to these endpoints, which are described in detail below.


Access to the API is provided through a token. During the development phase, tokens are distributed to our subscribers and testers.
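
As a rough illustration of how the add-on talks to the API, the sketch below forwards a request from Google Apps Script to one of the two endpoints. The base URL, endpoint paths and payload shape are assumptions made for the example, not the documented interface.

    // Minimal sketch (Google Apps Script written in TypeScript). The base URL
    // and the flat JSON payload are illustrative assumptions only.
    const REJUSTIFY_BASE = 'https://api.rejustify.example'; // hypothetical base URL

    function callEndpoint(endpoint: 'analyze' | 'find', payload: object): object {
      const response = UrlFetchApp.fetch(`${REJUSTIFY_BASE}/${endpoint}`, {
        method: 'post',
        contentType: 'application/json',
        payload: JSON.stringify(payload), // the token travels inside the payload
      });
      return JSON.parse(response.getContentText());
    }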

Analyze function

The purpose of the endpoint is to recognize the structure of the data and the key properties of its dimensions, so that it can be reconciled with a new data set. The structure of the data is communicated through two main elements: direction and header. Non-empty columns/rows are analyzed based on samples of the provided values, and they are used to describe the relevant dimensions of the data. Empty columns/rows are analyzed based on the information provided in the header, and they are automatically marked as the places where the User wants to see the reconciled data.
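
For illustration, the made-up data matrix below shows the kind of layout the endpoint works with under the vertical direction and a one-row header: two non-empty dimension columns, plus one empty column to be filled with reconciled data.

    // Illustrative data matrix, read vertically with one header row (inits = 1).
    // The 'country' and 'year' columns describe the dimensions; the empty 'GDP'
    // column marks where the User wants the reconciled data to appear.
    const data: string[][] = [
      ['country', 'year', 'GDP'],
      ['Austria', '1999', ''],
      ['Austria', '2000', ''],
    ];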

Figure 1. The Analyze function workflow diagram.
The dimensions are categorized into six main classes, with corresponding subcategories describing the details behind each class, such as features, cleaning procedures and the respective format. Classes are used to determine the default matching rules: for example, a dimension of class ‘geography’ will by default be matched with a dimension of the same class whose values have the closest proximity to the original data set. The subcategories (features, cleaning rules and format) are used as arguments in the merging algorithm.
Figure 2. The main classification methodology diagram.
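
To make the classification concrete, the sketch below shows one way a recognized dimension could be described; the field names and values are assumptions for illustration and do not reflect the actual response schema.

    // Hypothetical description of one recognized dimension (illustration only).
    interface DimensionStructure {
      name: string;                      // e.g. 'country'
      class: string;                     // one of the six main classes, e.g. 'geography'
      features: Record<string, string>;  // class-specific features
      cleaner: string;                   // cleaning procedure applied before matching
      format: string;                    // expected format of the values
      empty: boolean;                    // true if the column/row is to be filled by Find
    }

    const countryDimension: DimensionStructure = {
      name: 'country',
      class: 'geography',
      features: { level: 'country' },
      cleaner: 'basic',
      format: 'name',
      empty: false,
    };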

Request elements

Token (token = (string) … )

Informs the API about the user’s access rights. For non-premium users, the functionality of the endpoint is not disabled but limited.

Direction (direction = (string) ‘vertical’, ‘horizontal’)

Informs the API whether the data set should be read by columns (vertical) or by rows (horizontal). If the number of columns is greater than the number of rows, the add-on automatically pre-selects ‘horizontal’. The default is direction = ‘vertical’.
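
The pre-selection rule above can be summarized in a few lines (a sketch of the stated heuristic, not the add-on’s actual code):

    // More columns than rows suggests horizontal data; otherwise default to vertical.
    function preselectDirection(data: string[][]): 'vertical' | 'horizontal' {
      const rows = data.length;
      const cols = data[0]?.length ?? 0;
      return cols > rows ? 'horizontal' : 'vertical';
    }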

Header (inits = (int) 1, 2, 3, 4, 5)

Informs the API how many initial rows (or columns, in horizontal data) correspond to the header description. The default is inits = 1.

Separator (sep = (char) ‘,’)

The header can also be described by single-field values separated by a given character, for instance ‘GDP, Austria,1999’. This option informs the API which separator should be used to split the initial header string into the corresponding dimensions. The default is sep = ‘,’.
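
For instance, splitting the example header string on ‘,’ yields one entry per dimension (a plain illustration of the splitting step):

    // Split the header string into its dimensions and trim stray whitespace.
    const header = 'GDP, Austria,1999';
    const dimensions = header.split(',').map((s) => s.trim());
    // -> ['GDP', 'Austria', '1999']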

Fast analysis (speed = (string) ‘fast’, ‘slow’)

Informs the API how large a sample of the original data should be taken. The larger the sample, the more precise but the slower the algorithm. Under the fast option the API samples 15% of the data; under the slow option, 50%.

Enable data learning (db_allowed = (int) 0, 1)

Equal to 1 if the user allows rejustify to store her/his activity in the database; the option is selected in the settings tab of the add-on. Equal to 0 if the user does not allow rejustify to track their data activity, in which case the learning algorithms will not learn from it. The default is db_allowed = 0.

Data (data = (matrix) [ … ] )

The data matrix as defined by the User. Empty columns/rows will be discarded.
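
Putting the request elements together, an Analyze call could look as follows; the flat JSON layout is an assumption for illustration, and the actual wire format may differ.

    // A sketch of a complete Analyze request payload (illustrative layout).
    const analyzePayload = {
      token: '<YOUR_TOKEN>',
      direction: 'vertical',   // read the data by columns
      inits: 1,                // the first row is the header
      sep: ',',                // separator for composite header strings
      speed: 'fast',           // sample 15% of the data
      db_allowed: 0,           // do not store the activity for learning
      data: [
        ['country', 'year', 'GDP'],
        ['Austria', '1999', ''],
        ['Austria', '2000', ''],
      ],
    };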

Find function

The endpoint offers a one-click tool to download, reshape and reconcile the data sets provided by the User with the relevant external resources, both public and private. In the reconciliation process, several rules guide the underlying algorithm to provide the most intuitive and user-friendly experience:

  • Throughout the process, the convention is to call the data provided by the User X, and the external resources Y. In the reconciliation process, only the external data are reshaped. The data provided by the User remain unchanged, except for the empty columns, which may be filled with the relevant values.

  • For the moment, the external data come in raw format, i.e. they are not aggregated or split, to guarantee the consistency of the variables with the original sources.

The core element of the data-finding process is the matching keys. If they are not provided by the user, the endpoint will analyze the data and provide a default key based on the observed characteristics of the data, including the class of the dimension (see the Analyze endpoint) and the proximity score between the relevant X and Y dimensions. The algorithm will only provide a match if the proximity between the X and Y values is above a given accuracy threshold, specified in the argument min_accuracy.
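
As a rough illustration, a single matching-key entry might pair one X dimension with one Y dimension for the column to be filled; the field names below are assumptions, not the documented key schema.

    // Hypothetical matching-key entry for the empty 'GDP' column (illustration only).
    const defaultKey = {
      column: 'GDP',               // column in X to be filled with external data
      idX: 'country',              // matching dimension in the User's data (X)
      idY: 'geo',                  // matching dimension in the external data (Y)
      method: 'synonym-proximity', // matching method
      accuracy: 0.93,              // kept only if accuracy >= min_accuracy
    };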

Figure 3. The Find function diagram.

Options

Token (token = (string) … )

Informs the API about the user’s access rights. For non-premium users, the functionality of the endpoint is not disabled but limited.

Accuracy threshold (min_accuracy = (double) between 0 and 1)

Informs the API about the minimum threshold used to determine which dimension matches should be deemed successful or unsuccessful. The default is min_accuracy = 0.75.

Enable data learning (db_allowed = (int) 0, 1)

Equal to 1 if the user allows rejustify to store her/his activity in the database; the option is selected in the settings tab of the add-on. Equal to 0 if the user does not allow rejustify to track their data activity, in which case the learning algorithms will not learn from it. The default is db_allowed = 0.

Data structure (structure = (matrix) [ … ] )

The data structure as reported by the Analyze endpoint, possibly modified by the User. Empty columns/rows will be discarded.

Data (data = (matrix) [ … ] )

The data matrix as defined by the User. Empty columns/rows will be discarded.

Optional – Matching keys (keys = (matrix) [ … ] )

The data keys to be used for the data reconciliation. In particular, for each column to be filled with data, the object should specify which columns from the X and Y data sets are to be paired with each other and which matching method should be used.
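
Putting the options together, a Find call could look as follows; as with the Analyze example, the flat JSON layout and the key fields are assumptions for illustration only.

    // A sketch of a complete Find request payload (illustrative layout).
    const structure = [ /* the structure returned by Analyze, possibly edited by the User */ ];

    const findPayload = {
      token: '<YOUR_TOKEN>',
      min_accuracy: 0.75,  // minimum proximity for a successful match
      db_allowed: 0,
      structure,
      data: [
        ['country', 'year', 'GDP'],
        ['Austria', '1999', ''],
        ['Austria', '2000', ''],
      ],
      // keys is optional; when omitted, the endpoint derives default matching keys
      keys: [
        { column: 'GDP', idX: 'country', idY: 'geo', method: 'synonym-proximity' },
      ],
    };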