Data parsers

From SpinetiX Support Wiki


This page is about the data parsers used within data-driven widgets. See also the Data filters page.

Introduction

The information retrieved by a data-driven widget from a data source is parsed and arranged in a table-like format, with columns and rows, using the built-in data parsers. Most data-driven widgets are dedicated to a certain data source type and are therefore locked to a specific parser. The exception is the data feed widgets (available only to Elementi X users), which allow selecting any of the data parsers detailed below from the "Data Properties" dialog.

Developers can also use the jSignage parser functions, or the getURL() function along with the low-level parsing functions.

CSV parser

"CSV Parser" tab of "Data Feed Properties" dialog in Elementi 2015

A spreadsheet file can be parsed into a table-like format, with columns and rows, using the CSV parser. Once selected, the "CSV Parser" tab is added within the "Data Feed Properties" dialog, offering the following parameters:

  • Separator
    Select the column separator used within the CSV data, from the following: comma (,), semicolon (;), pipe (|) and tab.
  • CSV fields are not quoted
    Enable this when the data fields inside the CSV file are not quoted. When unchecked, quotes are detected automatically.
  • Specify column headers
    Enable this when the spreadsheet file does not have column headers (i.e., its first row does not contain the column names), and enter the column names in the table below.
Note:
Make sure that the CSV spreadsheet file is encoded using UTF-8 for proper display of the output data; otherwise, non-Latin characters will not display correctly.
For more details about the CSV parser parameters described above, see the $.parseCSV() and parseCSV2() functions.
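To illustrate what the Separator, quoting and column-header options control, the following JavaScript sketch splits CSV text into rows. It is not the code behind $.parseCSV(); the function parseCsvSketch and its parameters are hypothetical, and it assumes that quoted fields use double quotes, with "" standing for a literal quote character.

```javascript
// Hypothetical sketch: split CSV text into row objects (not the $.parseCSV() code).
//   separator: ',', ';', '|' or '\t'
//   quoted:    false mirrors "CSV fields are not quoted" (no quote handling at all)
//   headers:   column names to use when the file has no header row (optional)
function parseCsvSketch(text, separator, quoted, headers) {
  var records = [];
  var record = [];
  var field = '';
  var inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var c = text[i];
    if (quoted && inQuotes) {
      if (c === '"' && text[i + 1] === '"') { field += '"'; i++; }  // escaped quote
      else if (c === '"') { inQuotes = false; }                       // closing quote
      else { field += c; }                                            // separators are kept
    } else if (quoted && c === '"') {
      inQuotes = true;                                                // opening quote
    } else if (c === separator) {
      record.push(field);
      field = '';
    } else if (c === '\n' || c === '\r') {
      if (c === '\r' && text[i + 1] === '\n') i++;                    // treat CRLF as one break
      record.push(field);
      records.push(record);
      record = [];
      field = '';
    } else {
      field += c;
    }
  }
  if (field !== '' || record.length > 0) { record.push(field); records.push(record); }
  // Without explicit headers, the first record provides the column names
  var names = headers || records.shift() || [];
  return records.map(function (values) {
    var row = {};
    names.forEach(function (name, j) { row[name] = values[j] !== undefined ? values[j] : ''; });
    return row;
  });
}
```

With quoting enabled, a semicolon inside a quoted field such as "Doe; John" stays part of the value instead of starting a new column.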

Date/Time parser

The Date/Time parser is available only under the Format option, and can be used to parse the input data and format it as a date. Once it is selected for a particular entry column, a "Date/Time Parser" tab is added within the "Data Feed Properties" dialog, where you can select the date format used to parse that column:

"Date/Time Parser" tab of "Data Feed Properties" dialog in Elementi 2015 X
  • Unix
    Select this when the input contains the number of seconds since Jan 1st, 1970 UTC, which is the standard Unix time representation.
  • Javascript
    Select this when the input contains the number of milliseconds since Jan 1st, 1970 UTC, as returned by the JavaScript Date.prototype.getTime() function.
  • RFC822
    Select this when the input contains a human-readable date as defined in RFC 822 (e.g., "Mon, 25 Dec 1995 13:30:00 GMT").
  • ISO-8601
    Select this when the input contains an ISO 8601 date (e.g., "2011-10-10T14:48:00").
  • Custom
    Select this when the input contains a custom date described with Unicode CLDR date-time patterns (e.g., "dd/MM/yyyy HH:mm", "MM-dd-yy h:mm a" etc.).
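The following JavaScript sketch shows one way the formats above could map to a Date object. The functions parseDateSketch and parseCustomSketch are hypothetical illustrations, not the parser's API, and the custom variant only handles the yyyy, MM, dd, HH and mm fields. Date.parse() officially covers ISO 8601 strings and the "Mon, 25 Dec 1995 13:30:00 GMT" form produced by toUTCString(); other RFC 822 variants rely on engine-specific heuristics.

```javascript
// Hypothetical sketch: how the Unix, Javascript, RFC822 and ISO-8601 options
// could turn an input string into a JavaScript Date (not the actual parser).
function parseDateSketch(input, format) {
  switch (format) {
    case 'unix':        // seconds since Jan 1st, 1970 UTC
      return new Date(parseFloat(input) * 1000);
    case 'javascript':  // milliseconds since Jan 1st, 1970 UTC
      return new Date(parseFloat(input));
    case 'rfc822':      // e.g. "Mon, 25 Dec 1995 13:30:00 GMT"
    case 'iso8601':     // e.g. "2011-10-10T14:48:00" (no offset: local time)
      return new Date(Date.parse(input));
    default:
      throw new Error('Unsupported format: ' + format);
  }
}

// Minimal "Custom" support: only the CLDR fields yyyy, MM, dd, HH and mm are
// recognized, and the result is interpreted in local time. Returns null on mismatch.
function parseCustomSketch(input, pattern) {
  var fields = [];
  var source = pattern.replace(/yyyy|MM|dd|HH|mm|[^A-Za-z]/g, function (token) {
    if (/^[A-Za-z]/.test(token)) {
      fields.push(token);
      return token === 'yyyy' ? '(\\d{4})' : '(\\d{2})';
    }
    return token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');  // escape literal characters
  });
  var m = new RegExp('^' + source + '$').exec(input);
  if (!m) return null;
  var parts = { yyyy: 1970, MM: 1, dd: 1, HH: 0, mm: 0 };
  fields.forEach(function (field, i) { parts[field] = parseInt(m[i + 1], 10); });
  return new Date(parts.yyyy, parts.MM - 1, parts.dd, parts.HH, parts.mm);
}
```

The only difference between the Unix and Javascript options is the factor of 1000 between seconds and milliseconds.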

Directory listing parser

The directory listing parser retrieves a list of the files and / or directories present within a given location, according to the specified filter. It can only be used as a main parser.

"Directory Listing Parser" tab of "Data Feed Properties" dialog in Elementi 2015

When the directory listing parser is selected, a "Directory Listing Parser" tab is displayed within the "Data Feed Properties" dialog, which allows configuring the following properties:

  • Show hidden files
    Enable this to list all files (including the ones starting with a dot).
  • Resource type
    Specify the type of resource, from: "All", "Files", "Directories".
  • Filter
    Specify the file extension filter to be applied on the result set. Either select an existing filter or enter a custom one (you can use multiple extensions, separated by semicolons).
    For instance, to list both video and audio files, the filter can look like this: *.mp4;*.avi;*.wmv;*.mpg;*.mp3;*.wav;*.wma;*.m4a;*.aac

For each item of the result set, it generates the following output columns:

  • creationdate: Creation date of the file / folder.
  • filename: Name of the file / directory (without the path).
  • getcontentlength: Size of the file in bytes or "null" for directories.
  • getetag: Unique identifier for the file (can be used to check if the file has been modified) or "null" for directories.
  • getlastmodified: Last modification date for the file / folder.
  • href: Path to the file (can be used to display the media file).
  • resourcetype: Resource type: "collection" for directories, or "null" for files.
Note:
Only the items located directly inside the given folder are retrieved; the contents of its subfolders are not listed.
For more details about the parser parameters described above, see the propFindURL() function.
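As an illustration of the "Filter" and "Show hidden files" options, the following JavaScript sketch tests a file name against a semicolon-separated list of wildcard patterns. The function matchesFilterSketch is hypothetical, and the case-insensitive comparison is an assumption, not documented behavior of the parser.

```javascript
// Hypothetical sketch of the "Filter" and "Show hidden files" semantics.
// filter: semicolon-separated patterns such as "*.mp4;*.mp3", where "*" matches
// any sequence of characters (ASSUMPTION: comparison is case-insensitive).
function matchesFilterSketch(filename, filter, showHidden) {
  if (!showHidden && filename.charAt(0) === '.') return false;  // hidden file
  if (!filter) return true;                                      // no filter: keep everything
  return filter.split(';').some(function (pattern) {
    pattern = pattern.trim();
    if (pattern === '') return false;
    // Escape regex special characters (except "*"), then turn "*" into ".*"
    var source = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
    return new RegExp('^' + source + '$', 'i').test(filename);
  });
}
```

For example, with the audio/video filter shown above, clip.MP4 would be kept while notes.txt would be discarded.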

ICS parser

A calendar data source can be parsed into a table-like format, with columns and rows, using the ICS parser. Once selected, the "ICS Parser" tab is added within the "Data Properties" dialog, offering the possibility to filter the events retrieved from the calendar using the following parameters:

"ICS Parser" tab of "Data Feed Properties" dialog in Elementi 2015
  • From
    Specify the starting date and time for the data to be displayed. No events before this date will be included in the final data set. The following options are possible:
    • Date
      Specify an absolute date / time for filtering the data.
    • Relative
      Select a date relative to the moment when the calendar data is parsed, such as: "Now", "Today", "Yesterday", "Current Month", "Last Year" etc.
    • Custom
      Specify a starting date using a custom date string.
  • To
    Specify the ending date / time for the data to be displayed. No events after this date will be included in the final data set. The following options are possible:
    • Date
      Specify an absolute end date / time.
    • Relative
      Select a date relative to the moment when the data is parsed, such as: "Now", "Today", "Tomorrow", "Current Month", "Next Year" etc.
    • Duration
      Specify a duration relative to the starting time specified under "From".
    • Custom
      Enter an ending date using a custom date string.
    • Indefinite
      Select this when the end date doesn't matter.
Note:
To insert a line break in the event description of an external ICS calendar item, use the line feed character ("\n").
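The following JavaScript sketch illustrates how a From / To window could be applied to calendar events. The function names and the event object shape ({ summary, start, end }) are hypothetical, and the overlap rule is an assumption: the ICS parser may, for instance, compare only event start times.

```javascript
// Hypothetical sketch of the From / To filter. ASSUMPTION: an event is kept when
// it overlaps the [from, to] window; the actual ICS parser may apply another rule.
// Events are plain objects { summary, start: Date, end: Date }.
function filterEventsSketch(events, from, to) {
  return events.filter(function (ev) {
    var endsAfterFrom = ev.end.getTime() > from.getTime();
    var startsBeforeTo = to === null || ev.start.getTime() < to.getTime(); // null: "Indefinite"
    return endsAfterFrom && startsBeforeTo;
  });
}

// Helper mimicking "Relative: Today" for From (midnight, local time)
function startOfDay(date) {
  return new Date(date.getFullYear(), date.getMonth(), date.getDate());
}

// Helper mimicking "Duration" for To (relative to From)
function addDuration(date, milliseconds) {
  return new Date(date.getTime() + milliseconds);
}
```

Passing null as the end of the window corresponds to the "Indefinite" option, where only the From bound applies.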

JSON parser

JSON (JavaScript Object Notation) is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs.

A JSON source can be parsed into a table-like format, with columns and rows, using the JSON parser. Once selected, a "JSON Parser" tab is added within the "Data Feed Properties" dialog, offering the following parameters:

"JSON Parser" tab of "Data Feed Properties" dialog in Elementi 2015 X
  • Path
    Enter a dot-separated path into the object to identify the location of the element containing the useful rows of data; array elements are referenced by their index. For instance, you can use:
    • main.rows to select the rows array inside the main object,
    • main.1.rows to select the rows array inside the second element of the main array.
  • Specify output
    Enable this to create a custom mapping between the JSON elements selected under "Path" and the output columns, by specifying:
    • Name -> enter the name of the new column, which can then be used as a data placeholder (e.g., [[column_name]]);
    • Path -> enter the path (as above) within the selected element.
For more details about the JSON parser, see the $.parseJSON() function.
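The following JavaScript sketch shows how a dot-separated path such as main.1.rows could be resolved, and how the "Specify output" mapping could turn each selected element into a row. The functions getPathSketch and parseJsonSketch are hypothetical and are not part of jSignage.

```javascript
// Hypothetical sketch of the "Path" option: split on dots; numeric segments
// index arrays (array["1"] is the same as array[1] in JavaScript).
function getPathSketch(obj, path) {
  if (!path) return obj;
  return path.split('.').reduce(function (value, key) {
    return value === undefined || value === null ? undefined : value[key];
  }, obj);
}

// rowsPath selects the rows; outputs is the "Specify output" table: [{ name, path }]
function parseJsonSketch(data, rowsPath, outputs) {
  var rows = getPathSketch(data, rowsPath);
  if (!Array.isArray(rows)) rows = rows === undefined ? [] : [rows];
  return rows.map(function (item) {
    if (!outputs) return item;  // no custom mapping: keep the element as-is
    var row = {};
    outputs.forEach(function (o) { row[o.name] = getPathSketch(item, o.path); });
    return row;
  });
}
```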

Customization example

See also this tutorial about how to display a JSON data feed.

Let's say that we would like to retrieve the transport data for Lausanne provided by the Swiss public transport API, which looks like this:

{
  "station": { ... },
  "stationboard": [
    {
      "stop": {
        "station": { ... }, "departure": "2015-10-16T19:11:00+0200", ...
        "platform": "6", "prognosis": { ... }, "location": { ... }
      },
      "name": "RE 3235",  ...
      "operator": "SBB", "to": "Vevey",
      "passList": [ ... ]
    },  ... 
  ]
}
Custom JSON parser configuration

For that, follow these steps:

  1. Open the "Data Feed Properties" dialog.
  2. Set the URI to http://transport.opendata.ch/v1/stationboard?station=Lausanne&limit=10.
  3. Set the "Parser" option to "JSON".
  4. Click on the "JSON Parser" tab.
  5. Set the "Path" option to "stationboard".
  6. Enable the "Specify output" option and, within the table, enter the following lines:
    1. Name: "departure" & Path: "stop.departure".
    2. Name: "name" & Path: "name".
    3. Name: "operator" & Path: "operator".
    4. Name: "to" & Path: "to".

Additionally, a string filter could be set on the "operator" column, so that only the trains are retrieved (i.e. operator equals SBB).
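Outside of Elementi, the configuration from the steps above can be mimicked in plain JavaScript, as in the sketch below. The response is abbreviated, the second entry is a hypothetical record added only to show the operator filter, and getPath is a hypothetical helper.

```javascript
// Abbreviated stationboard response: the first entry uses the values shown above;
// the second entry is hypothetical and only illustrates the operator filter.
var response = {
  stationboard: [
    { stop: { departure: '2015-10-16T19:11:00+0200', platform: '6' },
      name: 'RE 3235', operator: 'SBB', to: 'Vevey' },
    { stop: { departure: '2015-10-16T19:15:00+0200' },
      name: 'Hypothetical bus', operator: 'Other', to: 'Elsewhere' }
  ]
};

// Dot-separated path lookup (hypothetical helper, not a jSignage API)
function getPath(obj, path) {
  return path.split('.').reduce(function (value, key) {
    return value === undefined || value === null ? undefined : value[key];
  }, obj);
}

// "Path" = stationboard; the "Specify output" table from step 6
var outputs = [
  { name: 'departure', path: 'stop.departure' },
  { name: 'name', path: 'name' },
  { name: 'operator', path: 'operator' },
  { name: 'to', path: 'to' }
];
var rows = getPath(response, 'stationboard').map(function (item) {
  var row = {};
  outputs.forEach(function (o) { row[o.name] = getPath(item, o.path); });
  return row;
});

// Same effect as a string filter "operator equals SBB"
var trains = rows.filter(function (row) { return row.operator === 'SBB'; });
```

Each output row then exposes [[departure]], [[name]], [[operator]] and [[to]] as data placeholders.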

Query String parser

"Query String Parser" tab of "Data Feed Properties" dialog in Elementi 2015 X

See also Query String source type.

A query string type of data (e.g., field1=value1&field2=value2&field3=value3&...) can be parsed into a table-like format, with columns and rows, using the Query String parser. Although such data could also be parsed with the RegExp parser, the Query String parser additionally performs URL decoding in the process.

Once "Query String" is selected as main parser or formatting parser, a "Query String Parser" tab is added within the "Data Feed Properties" dialog, offering the following parameters:

  • Support tabular data
    Enable this when the query string contains indexed keys (e.g., key[0]=v0&key[1]=v1&...) that should generate rows under the "key" column, rather than one row with multiple columns (e.g., key[0], key[1]).
  • Specify output
    This option is currently not taken into account.
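The following JavaScript sketch shows the kind of processing described above, using the standard URLSearchParams class for URL decoding. The function parseQueryStringSketch is hypothetical; in particular, placing non-indexed keys on the first row in tabular mode is an assumption.

```javascript
// Hypothetical sketch of the Query String parser. URLSearchParams performs the
// URL decoding ("%C3%A9" becomes "é", "+" becomes a space).
function parseQueryStringSketch(qs, tabular) {
  var params = new URLSearchParams(qs);
  if (!tabular) {
    // One row, one column per key
    var single = {};
    params.forEach(function (value, key) { single[key] = value; });
    return [single];
  }
  // "Support tabular data": key[0], key[1], ... become rows under a "key" column
  var rows = [];
  params.forEach(function (value, key) {
    var m = /^(.+)\[(\d+)\]$/.exec(key);
    var name = m ? m[1] : key;
    var index = m ? parseInt(m[2], 10) : 0;  // ASSUMPTION: plain keys go to row 0
    rows[index] = rows[index] || {};
    rows[index][name] = value;
  });
  return rows;
}
```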

RegExp text parser

The built-in text parser uses regular expressions (RegExp) to parse the text file content into rows and columns of data - the default behavior is to put each line of the input text file into a new row having a single column called "title".

When the RegExp parser is selected (for instance, by a text file widget), a "RegExp Parser" tab is displayed within the "Data Feed Properties" dialog, which allows configuring the following properties:

"RegExp Parser" tab of "Data Feed Properties" dialog in Elementi 2015 X
  • Split
    Enable this to enter a string or a regular expression used as the separator for splitting the input text into rows. The separator itself is not included in the output data.
    Default value (when enabled): [\r\n]+
    This is a regular expression that splits the original text using the end-of-line characters, thus each line from the text file becomes a row in the results set.
    Examples:
    • Use "\s+" to split the text input at every run of whitespace characters (e.g., every word is put on a new row).
    • Use ";" to split the text input into rows after each semicolon character.
  • Match
    Enter a regular expression to be used to generate the output data; the matched text is added into the output data, while the rest is discarded.
    Default value: (.*)
    This is a regular expression that will map each input line to a new row (because in JavaScript, ".*" matches all the characters except the new-line (\n) character).
    Examples:
    • Use ".{10}" to have each match of 10 characters (excluding the new-line character) assigned to a row of data.
    • Use "[\s\S]{50}" to have each match of 50 characters (including the new-line character) assigned to a row of data.
  • Specify output
    Allows creating different output columns based on the match groups (i.e., pairs of round brackets) used within the regular expression entered under the "Match" option.
    When enabled, specify the following in the mapping table:
    • Name -> enter the name of the new column, which can then be used as a data placeholder (e.g., [[column_name]]);
    • Match # -> enter the index of the numbered capturing group, starting from 1 (index 0 refers to the entire match).
Notes:
  • The RegExp engine applies the match expression repeatedly and, each time it succeeds, the match creates a new row of data.
  • If the regular expression is empty, then the entire content is put on a single row of output data.
  • If the regular expression contains match groups (i.e., pairs of round brackets), then each match group can be assigned to a new column under the "Specify output" option.
  • If the "Specify output" option is not enabled, then the results are put in a single column named "title".
  • Character encoding is inferred from the MIME type returned by the web server, or from the byte order mark if there is one at the beginning of the file. The default, and recommended, encoding for proper formatting of the output data is UTF-8.
For more details about this parser, see the $.parseTXT(), RegExp.exec(), and String.split() functions.
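The interaction between the Split, Match and Specify output options can be sketched in JavaScript as follows. The function parseTextSketch is hypothetical, not the $.parseTXT() implementation; skipping empty matches is an assumption, made so that the default "(.*)" expression yields one row per line.

```javascript
// Hypothetical sketch of the Split / Match / Specify output semantics.
//   split:   separator (string or RegExp), or null when "Split" is disabled
//   match:   regular expression source, e.g. "(.*)"; empty means no matching
//   outputs: [{ name, group }] mapping, or null to use the single "title" column
function parseTextSketch(text, split, match, outputs) {
  if (!match) return [{ title: text }];      // empty expression: one row, whole content
  var chunks = split ? text.split(split) : [text];
  var rows = [];
  chunks.forEach(function (chunk) {
    var re = new RegExp(match, 'g');         // applied repeatedly to each chunk
    var m;
    while ((m = re.exec(chunk)) !== null) {
      if (m[0] === '') { re.lastIndex++; continue; }  // ASSUMPTION: empty matches are skipped
      var row = {};
      if (outputs) {
        outputs.forEach(function (o) { row[o.name] = m[o.group]; });
      } else {
        row.title = m[0];
      }
      rows.push(row);
    }
  });
  return rows;
}
```

For instance, with Split disabled and Match set to ".{4}", the text "abcdefghij" produces the rows "abcd" and "efgh"; the trailing "ij" is discarded because it does not match.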

Customization example

Let's say that we would like to customize a text file widget so that the first word of each line is displayed using a different styling (e.g., color, font etc.). This means that we need to split the input text into lines (rows) and use a matching expression that separates the first word from the rest. For that, follow these steps:

Custom RegExp configuration
  1. Open the "Data Feed Properties" dialog and click on the "RegExp Parser" tab.
  2. Enable the "Split" option (leave the default value).
  3. Change the "Match" expression to "(\S*)\s(.*)". You can also use "(\w*)\W(.*)".
  4. Enable the "Specify output" option.
  5. Add three entries to the table:
    • "sentence" (as match #0, containing the entire line),
    • "first_word" (as match #1, containing the first word),
    • "next_words" (as match #2, containing the rest of the words).
  6. Click on the "OK" button to save the changes.
  7. Next, modify the "Text template" property of the widget to incorporate the data placeholders specified above:
    1. Click on the "Edit Text" button (it opens the "Edit Text" dialog) to edit the text template.
    2. Change the text content to something like this:
      [[first_word]] [[next_words]]
    3. Select only "[[first_word]]" and change its font style to bold and its color to blue.
    4. Click on the "OK" button to apply the changes.
  8. Click on the "Save" button on the toolbar.
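To check the expressions before entering them in the dialog, the configuration above can be tried in plain JavaScript, as in this sketch. The sample text and the fillTemplate helper are hypothetical; fillTemplate only approximates how [[placeholders]] are replaced in the text template.

```javascript
// Hypothetical sample text; the configuration mirrors the steps above:
// Split = [\r\n]+ (default), Match = (\S*)\s(.*), groups 0, 1 and 2 as columns.
var text = 'Lausanne lies on Lake Geneva\nVevey is nearby';
var match = /(\S*)\s(.*)/;  // one match per line is enough here, since (.*) consumes the rest
var rows = [];
text.split(/[\r\n]+/).forEach(function (line) {
  var m = match.exec(line);
  if (m) rows.push({ sentence: m[0], first_word: m[1], next_words: m[2] });
});

// Naive [[placeholder]] substitution, only to show how columns feed the text template
function fillTemplate(template, row) {
  return template.replace(/\[\[(\w+)\]\]/g, function (all, name) {
    return row[name] !== undefined ? row[name] : all;
  });
}
```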

RSS parser

"Source" tab of "Data Properties" dialog in Elementi 2016 X

An RSS feed can be parsed into a table-like format, with columns and rows, using the RSS parser.

The RSS parser automatically detects whether the data is formatted according to RSS, RSS 2.0 or Atom. It also cleans up malformed XML documents, fixes incorrect character entities and encoding errors, strips HTML tags from fields such as title and description, and extracts the URI of the media attached to the news items.

Once selected, the "RSS Parser" tab is added within the "Data Feed Properties" dialog, offering the following parameter:

  • Channel
    Enable this when the RSS feed contains multiple channels, and specify which channel to use. If not set, the first channel is used.
For more details about the RSS parser, see the $.parseRSS() and parseRSS() functions.
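As a rough idea of the cleanup mentioned above, the following JavaScript sketch strips tags and decodes a few common character entities from a field value. The function cleanFieldSketch is hypothetical and regex-based, so it is far less robust than a real HTML/XML parser.

```javascript
// Hypothetical, naive illustration of cleaning a title/description field.
// Tags are removed first, so encoded markup such as "&lt;b&gt;" survives as text.
function cleanFieldSketch(html) {
  var text = html.replace(/<[^>]*>/g, '');  // drop HTML tags
  var entities = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };
  text = text.replace(/&(amp|lt|gt|quot|#39);/g, function (entity) {
    return entities[entity];                // decode a few common entities
  });
  return text.replace(/\s+/g, ' ').trim();  // collapse whitespace
}
```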

Script parser

The Script parser can be used whenever none of the other parsers can be applied to the input data and / or additional transformations need to be performed.

Once selected, a "Script Parser" tab is added within the "Data Feed Properties" dialog, offering the following parameters:

"Script Parser" tab of "Data Feed Properties" dialog in Elementi 2015 X
  • External file
    Select this option if the code of the parsing function is present within an external file.
    • URI
      Enter the name of the JavaScript file containing the custom parser. If the JS file is not in the same place as the widget, include the path as well.
      Default value: "custom.js".
    • Function
      Specify the name of the function, implemented in the JS file specified above, which needs to be used to process the input data.
      Default value: "parse".
  • Inline function
    Enable this option and enter the parsing function code directly in the dialog.

Parsing function

The parsing function is called differently depending on how the Script parser is used:

  • as main parser → the function is executed once and the raw data is passed as parameter.
  • as formatting parser on a field → the function is executed for each row of the resulting data set, and the value of that field is passed as parameter.
    Note: If another field value is needed, the entire row of data can be retrieved from the arguments[1] object.
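For example, a formatting function attached to a hypothetical price column could read a hypothetical currency column from the same row through arguments[1], as in this JavaScript sketch:

```javascript
// Formatting parser on a hypothetical "price" column. The current field value is
// the first argument; the whole row (with a hypothetical "currency" column) is
// read from arguments[1].
function parse(data) {
  var row = arguments[1] || {};
  var currency = row.currency || '';
  return (parseFloat(data).toFixed(2) + ' ' + currency).trim();
}
```

Because a string is returned, the value of the price column is replaced, e.g. 3.5 becomes "3.50 CHF" when the row's currency is CHF.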

The value returned by the parsing function can be one of the following types:

  • String
    The string value (e.g., 'foo') is assigned to the title column when used as the main parser, or replaces the value of the input column when formatting an existing column.
  • Plain Object
    When returning an object (e.g., { col1: 'foo', col2: 'bar', col3: 5, ... }), each value (e.g., 'foo', 'bar' etc.) is assigned to the column matching the key (e.g., col1, col2 etc.). If these columns don't exist in the result set, they are created.
    Returning an empty object (i.e. { }) means that there's no change to perform.
  • Array
    When returning an array (e.g., [ v1, v2, ... ]), each value of the array generates a new row of data. The values in the array can be of type String (e.g., [ 'foo', 'bar' ]) or Plain Object (e.g., [ { col1: 'foo', col2: 'bar', col3: 5 } , { col1: 'baz', col2: 'qux', col3: 15 } ]).
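The three return types can be summarized with the following JavaScript sketch, which converts the value returned by a main parsing function into rows. The function toRowsSketch is hypothetical; treating a returned plain object as a single row is an assumption for the main-parser case.

```javascript
// Hypothetical sketch: turning the value returned by a main parsing function
// into rows, following the String / Plain Object / Array cases described above.
function toRowsSketch(result) {
  function toRow(value) {
    // A string goes to the "title" column; an object's keys become columns
    return typeof value === 'string' ? { title: value } : Object.assign({}, value);
  }
  if (Array.isArray(result)) return result.map(toRow);  // one row per array element
  return [toRow(result)];                                // ASSUMPTION: single row
}
```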

Example 1

Check the input data to see if this is a fruit or a vegetable, and create a new column called type within the result set.

function parse( data ) {
  if ( data=="Bananas" || data=="Oranges" )
    return { type: "Fruits"};
  if ( data=="Carrots" || data=="Potatoes" )
    return { type: "Vegetable"};
  return { type: "I cannot eat this" };
}
Note:
This kind of structure can be used to replace the switch template used in HMD. See also the tutorial on how to create a switch widget.

Example 2

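Split the input data into lines (at the new-line characters) and group every three consecutive lines into one row with three columns.

function parse( data ) {
  var result = [];
  // Split the input into lines (handles both LF and CRLF line endings)
  var lines = data.split( /\r?\n/ );
  // Group every three consecutive lines into one row with columns l1, l2, l3
  for ( var i = 0; i < lines.length; i += 3 )
    result.push( { l1: lines[i], l2: lines[i+1] || '', l3: lines[i+2] || '' } );
  return result;
}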

Example 3

Convert the input data from a UNIX / JavaScript timestamp into a date.

function dateFromUnixTimestamp( data ) {
    return new Date( data * 1000 );
}

function dateFromJSTimestamp( data ) {
    return new Date( parseFloat( data ) );
}
Note:
Other custom parsing functions returning a date can be found in the dateParser.js file.

XML parser

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures such as those used in web services.

XML Parser tab - default configuration
XML Parser tab - using selectors

An XML source can be parsed into a table-like format, with columns and rows, using the XML parser. Once selected, an "XML Parser" tab is added within the "Data Properties" dialog, offering the following parameters:

  • Selector
    Enter a non-empty CSS3 selector to identify the data rows. For instance, you can use:
    • item to select all the elements of type "item" (i.e. <item>);
    • row > title to select the "title" elements that are direct children of a "row" element;
  • Keep XML markup in columns
    Enable this to preserve the XML markup within the CDATA content inside the retrieved data.
  • Specify output
    Enable this to create a custom mapping between the XML elements selected under "Selector" and the output columns, by specifying:
    • Name -> enter the name of the new column, which can then be used as data placeholder (e.g., [[column_name]]);
    • Selector -> enter the selector (same syntax as above) that retrieves the child element.
    • Attribute -> enter the attribute that gives the output data value. It can be empty.

Customization example

Let's say that we would like to retrieve the currency data from the daily exchange rate XML file provided by the European Central Bank:

<gesmes:Envelope ... >
	...
	<Cube>
		<Cube time='2015-10-14'>
			<Cube currency='USD' rate='1.1410'/>
			<Cube currency='JPY' rate='136.48'/>
			...
		</Cube>
	</Cube>
</gesmes:Envelope>
Custom XML parser configuration

For that, follow these steps:

  1. Open the "Data Feed Properties" dialog.
  2. Set the URI to http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml.
  3. Set the "Parser" option to "XML".
  4. Click on the "XML Parser" tab.
  5. Set the "Selector" option to "Cube > Cube > Cube".
  6. Enable the "Specify output" option and, within the table, enter the following two lines:
    • Name: "currency" & Attribute: "currency".
    • Name: "rate" & Attribute: "rate".

The custom XML parser configuration ensures that the third-level Cube elements (the ones containing the currency data) are selected, and that two columns are created containing the currency and rate values retrieved from the attributes of each selected element.

For more details about the XML parser, see the $.parseXML() and parseXML() functions.
This page was last modified on 16 March 2022, at 17:58.