HTTP Retriever Data Source
The HTTP Retriever data source allows Mango to collect data from any web page or HTTP endpoint accessible on the internet or an intranet. It works similarly to a web browser: Mango makes an HTTP request to a configured URL at each poll interval, receives the response, and then uses regular expressions to extract specific values from the response content. This is a polling data source that supports both HTTP and HTTPS connections.
This data source is commonly used to integrate Mango with REST APIs, web-based sensor gateways, weather services, equipment web interfaces, and any system that exposes data over HTTP. It is particularly useful when no dedicated protocol module exists for a device but the device provides a web interface or API.
Overview
| Property | Value |
|---|---|
| Module | mangoAutomation-HttpRetriever |
| Protocol | HTTP/HTTPS |
| Direction | Polling |
| Typical Use | Scraping data from web pages and REST APIs |
Prerequisites
- An HTTP or HTTPS endpoint that returns data in a parseable text format (HTML, JSON, XML, plain text, CSV, etc.).
- The URL of the endpoint, including any required query string parameters.
- Knowledge of the response format so that appropriate regular expressions can be crafted to extract values.
- If the endpoint requires authentication, the necessary credentials or API keys.
- For HTTPS endpoints: understanding of any TLS/SSL certificate requirements (self-signed certificates, custom CAs, etc.).
Configuration
Data Source Settings
| Setting | Description |
|---|---|
| Name | A descriptive name for the data source. |
| Update period | How often Mango sends an HTTP request to the URL and updates point values. |
| URL | The full URL of the resource to retrieve, including the scheme (http:// or https://), hostname, path, and any query parameters. |
| Quantize | When checked, delays the data source startup so that polls align to even time boundaries (e.g., polling every 10 seconds occurs at :00, :10, :20, etc.). When unchecked, polling begins immediately. |
| Timeout | Milliseconds to wait for the web server to return a response before considering the request failed. |
| Retries | Number of times to retry a failed request. If all retries are exhausted, a data source exception event is raised. |
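
The way the quantize, timeout, and retries settings interact can be sketched in a few lines of Python. This is an illustration only, not Mango's implementation: the function names `seconds_until_aligned` and `fetch_with_retries` are hypothetical, and the sketch assumes a poll consists of one initial attempt plus up to `retries` further attempts.

```python
import urllib.request


def seconds_until_aligned(now_s: float, period_s: int) -> float:
    """Delay before the first poll so that polls land on multiples of the period.

    Mirrors the Quantize idea: with a 10 s period and a start at :07,
    the first poll waits 3 s and runs at :10.
    """
    remainder = now_s % period_s
    return 0.0 if remainder == 0 else period_s - remainder


def fetch_with_retries(url: str, timeout_ms: int, retries: int,
                       opener=urllib.request.urlopen) -> str:
    """Fetch the URL, retrying on failure (assumed: 1 attempt + `retries` retries)."""
    last_error = None
    for _attempt in range(1 + retries):
        try:
            # The timeout setting is in milliseconds; urlopen expects seconds.
            with opener(url, timeout=timeout_ms / 1000.0) as response:
                return response.read().decode("utf-8", errors="replace")
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            last_error = exc
    # Stands in for the data source exception event raised after the last retry.
    raise RuntimeError(f"all {1 + retries} attempts failed: {last_error}")
```

The `opener` parameter exists only so the retry logic can be exercised without a network connection.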
TLS/SSL Settings (HTTPS)
When the URL uses HTTPS, additional TLS/SSL settings are available:
| Setting | Description |
|---|---|
| Verify server certificate | When checked, the HTTP client verifies that the server's certificate was signed by a trusted root CA and is not expired. |
| Verify hostname | When checked, the HTTP client verifies that the server's certificate matches the hostname in the URL. |
| Use PKI CA certificate | When checked, uses the CA certificate from Mango's built-in PKI services as the trusted root. Configure PKI services in the Mango system settings. |
| Trusted certificate | A PEM-formatted X.509 certificate to use as the trusted root for verifying the server. If left blank and PKI is not enabled, the HTTP client trusts all public root CA certificates from the Java trust store. |
These settings allow Mango to connect to endpoints that use self-signed certificates or private CAs by supplying the appropriate trusted certificate. In production environments, keep both certificate verification and hostname verification enabled.
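To make the effect of these options concrete, the sketch below maps three of them onto Python's standard `ssl` module. It is an analogy under stated assumptions, not Mango's Java HTTP client: the function `build_tls_context` and its parameter names are hypothetical, and the PKI option is left out.

```python
import ssl
from typing import Optional


def build_tls_context(verify_certificate: bool, verify_hostname: bool,
                      trusted_pem: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context from the three settings (hypothetical helper)."""
    # When cadata is given, only that PEM certificate is trusted. When it is
    # None, the platform's default root CA certificates are loaded instead.
    context = ssl.create_default_context(cadata=trusted_pem)
    if not verify_certificate:
        # Python requires hostname checking to be switched off before
        # certificate verification can be disabled.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    else:
        context.check_hostname = verify_hostname
    return context
```

Note that in this Python analogy hostname verification cannot be active while certificate verification is off; whether Mango's client behaves the same way is not stated here.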
Data Point Configuration
Each data point extracts a value from the HTTP response content using regular expressions.
| Setting | Description |
|---|---|
| Data type | The Mango data type for this point (Binary, Numeric, Alphanumeric, Multistate). The extracted text is automatically converted to this type. |
| Value RegEx | A regular expression applied to the response content to locate the point's value. Use capture groups (parentheses) to isolate the desired portion. |
| Value capture group | The regex group number to extract. Group 0 is the entire match; group 1 is the first set of parentheses, etc. |
| Ignore if missing | When checked, a missing value on the page does not raise a data source exception event -- the point is simply not updated. When unchecked, a missing value is treated as an error. |
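
The relationship between the value regex, the capture group number, and the ignore-if-missing option can be sketched as follows. Mango evaluates Java regular expressions; Python's `re` module is used here only as an approximation for trying patterns offline. The names `extract_value` and `MissingValueError` are hypothetical.

```python
import re
from typing import Optional


class MissingValueError(Exception):
    """Stands in for the data source exception event (hypothetical name)."""


def extract_value(content: str, value_regex: str, capture_group: int,
                  ignore_if_missing: bool) -> Optional[str]:
    """Return the text of the requested capture group, or None if skipped."""
    match = re.search(value_regex, content)
    if match is None:
        if ignore_if_missing:
            return None  # the point is simply not updated
        raise MissingValueError(f"no match for {value_regex!r}")
    # Group 0 is the whole match; group 1 is the first parenthesized part.
    return match.group(capture_group)


page = "Level: 42 cm"
whole = extract_value(page, r"Level: (\d+) (\w+)", 0, False)   # entire match
number = extract_value(page, r"Level: (\d+) (\w+)", 1, False)  # first group
unit = extract_value(page, r"Level: (\d+) (\w+)", 2, False)    # second group
```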
Data Type-Specific Settings
| Data Type | Additional Setting | Description |
|---|---|---|
| Binary | Binary 0 value | The text value that maps to false (0). If the extracted value matches this text, the point is set to 0; otherwise it is set to 1. |
| Numeric | Number format | A Java DecimalFormat pattern that describes how the extracted text should be parsed as a number. Useful for values with locale-specific formatting (e.g., commas as decimal separators). |
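
The two conversions above can be illustrated with a short sketch. The binary rule follows the table directly. The numeric helper is only a rough Python stand-in for what a Java DecimalFormat pattern and its separator symbols achieve; it is not equivalent to DecimalFormat, and both function names are hypothetical.

```python
def to_binary(extracted: str, binary_zero_value: str) -> int:
    """Map the extracted text to 0 if it equals the zero value, else 1."""
    # The comparison is exact, so "off" and "OFF" are different values.
    return 0 if extracted == binary_zero_value else 1


def parse_decimal(extracted: str, grouping: str = ",", decimal: str = ".") -> float:
    """Parse a number written with explicit grouping and decimal separators."""
    # Drop grouping separators first, then normalize the decimal separator.
    normalized = extracted.strip().replace(grouping, "").replace(decimal, ".")
    return float(normalized)
```

For example, `parse_decimal("1.234,56", grouping=".", decimal=",")` treats the period as a thousands separator and the comma as the decimal separator.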
Time Override
| Setting | Description |
|---|---|
| Time RegEx | A regular expression to locate the value's timestamp in the response content. If defined, a time format must also be provided. |
| Time format | A Java SimpleDateFormat pattern describing how the extracted time text should be parsed. |
| Time capture group | The regex group number to extract for the timestamp. |
When a time override is configured, the extracted timestamp is used as the point value's time instead of the poll time. This is useful when the HTTP response includes timestamps indicating when values were actually measured.
Testing Regular Expressions
Click the validate value button on the data point configuration to test your regular expression against the current content at the configured URL. This is useful for verifying that the regex matches the expected portion of the response before enabling the data source.
Common Patterns
Extracting Values from a REST API
Many modern devices and services expose JSON REST APIs. To extract a numeric value from a JSON response like {"temperature": 72.5, "humidity": 45.2}:
- Value RegEx: `"temperature"\s*:\s*([0-9.]+)`
- Value capture group: 1
- Data type: Numeric
This captures the numeric value after the "temperature" key. Create a separate data point for each value to extract.
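The pattern can be tried offline before it is entered in Mango. The sketch below uses Python's `re` module as an approximation of Java regex behavior; the helper `extract_number` and the `SIGNED_PATTERN` variant are illustrative additions, the latter for endpoints that can report negative readings, which `[0-9.]+` alone does not match.

```python
import re

response = '{"temperature": 72.5, "humidity": 45.2}'


def extract_number(content: str, key: str):
    """Apply the key-specific pattern and return group 1 as a float."""
    pattern = r'"' + re.escape(key) + r'"\s*:\s*([0-9.]+)'
    match = re.search(pattern, content)
    return float(match.group(1)) if match else None


# One data point per value: each point has its own regex.
temperature = extract_number(response, "temperature")
humidity = extract_number(response, "humidity")

# Variant that also accepts a leading minus sign (an assumption about the
# data; not needed if the endpoint never returns negative values).
SIGNED_PATTERN = r'"temperature"\s*:\s*(-?[0-9]+(?:\.[0-9]+)?)'
```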
Scraping an HTML Web Page
For equipment that provides a web interface but no API, the HTTP Retriever can parse HTML. For example, to extract a value displayed as <span id="temp">72.5</span>:
- Value RegEx: `<span id="temp">([^<]+)</span>`
- Value capture group: 1
- Data type: Numeric
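HTML markup tends to vary more than JSON (extra attributes, surrounding whitespace), so it can help to compare a strict pattern with a more tolerant one. The sketch below is an offline approximation in Python; the sample markup and the `TOLERANT` pattern are hypothetical illustrations, not taken from any particular device.

```python
import re

STRICT = r'<span id="temp">([^<]+)</span>'
# Allows other attributes on the tag and whitespace around the value.
TOLERANT = r'<span\b[^>]*\bid="temp"[^>]*>\s*([^<]+?)\s*</span>'

plain_page = '<p>Supply: <span id="temp">72.5</span></p>'
styled_page = '<p>Supply: <span class="v" id="temp" > 72.5 </span></p>'


def first_group(pattern: str, content: str):
    """Return capture group 1, or None when the pattern does not match."""
    match = re.search(pattern, content)
    return match.group(1) if match else None


strict_plain = first_group(STRICT, plain_page)
strict_styled = first_group(STRICT, styled_page)      # markup changed: no match
tolerant_styled = first_group(TOLERANT, styled_page)  # value without padding
```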
Monitoring a Weather API
Configure the URL to point to a public weather API endpoint. Set the update period to match the API's refresh rate (often 5-15 minutes for free tiers). Extract temperature, humidity, wind speed, and other values using separate data points, each with its own regex tailored to the response format.
Polling with Authentication
For endpoints that require authentication:
- Basic/Digest auth: Include credentials in the URL: `https://user:password@host/path`
- API key: Include the key as a query parameter in the URL: `https://api.example.com/data?key=YOUR_KEY`
- Bearer token or custom headers: The HTTP Retriever data source supports static headers for including Authorization headers or other custom headers with each request.
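The values these options put on the wire can be computed ahead of time. The sketch below is illustrative: the helper names are hypothetical, and percent-encoding credentials that contain reserved characters is general URL practice rather than something stated for Mango specifically.

```python
import base64
from urllib.parse import quote


def basic_auth_header(user: str, password: str) -> str:
    """Value of an Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def bearer_header(token: str) -> str:
    """Value of an Authorization header carrying a bearer token."""
    return f"Bearer {token}"


def url_with_credentials(scheme: str, user: str, password: str,
                         host_and_path: str) -> str:
    """Embed credentials in a URL, percent-encoding reserved characters."""
    # Characters such as '@' or '/' in a password would otherwise be read
    # as URL delimiters.
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host_and_path}"
```

For instance, `basic_auth_header("user", "password")` yields `Basic dXNlcjpwYXNzd29yZA==`, which could be entered as a static Authorization header value.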
Using Time Overrides
When the HTTP response includes a timestamp for each value, use the time regex to extract it. For example, if the response contains "timestamp": "2026-02-16T14:30:00Z":
- Time RegEx: `"timestamp"\s*:\s*"([^"]+)"`
- Time capture group: 1
- Time format: `yyyy-MM-dd'T'HH:mm:ss'Z'`
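This records the point value with the server's timestamp rather than the Mango poll time. Note that quoting 'Z' in the time format makes SimpleDateFormat treat it as a literal character rather than as a UTC designator, so the parsed time is interpreted in whatever time zone the parser uses; verify that the resulting timestamps are not offset from the expected values.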
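The extraction and parsing steps can be checked offline. The sketch below uses Python's `strptime` in place of Java's SimpleDateFormat, so the format string differs from the one entered in Mango, and it attaches UTC explicitly on the assumption that the trailing Z in the response means UTC. The sample response and the function name `extract_timestamp` are hypothetical.

```python
import re
from datetime import datetime, timezone

response = '{"value": 72.5, "timestamp": "2026-02-16T14:30:00Z"}'
TIME_REGEX = r'"timestamp"\s*:\s*"([^"]+)"'


def extract_timestamp(content: str):
    """Locate the timestamp text with the time regex and parse it as UTC."""
    match = re.search(TIME_REGEX, content)
    if match is None:
        return None
    # The trailing Z is matched as a literal here, so the result is naive
    # and the time zone must be attached explicitly.
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


measured_at = extract_timestamp(response)
```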
Troubleshooting
Connection Failures
- URL incorrect -- verify the URL is accessible from the Mango server. Test by opening it in a browser on the same machine, or use `curl` from the command line.
- Timeout too short -- if the remote server is slow, increase the timeout value. Some APIs take several seconds to respond.
- DNS resolution failure -- ensure the Mango server can resolve the hostname in the URL. Check DNS configuration.
- Firewall blocking -- verify that outbound HTTP/HTTPS traffic (ports 80/443) is allowed from the Mango server.
- SSL/TLS errors -- for HTTPS endpoints with self-signed certificates, provide the server's certificate (or its issuing CA certificate) in the trusted certificate field, or temporarily disable certificate verification for testing only.
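When reproducing a failure from the Mango server with a script, it helps to sort the error into one of the categories above. The sketch below does this for exceptions raised by Python's `urllib`; it is a diagnostic aid under the assumption that the script runs on the same machine as Mango, and the function name and category labels are hypothetical.

```python
import socket
import ssl
import urllib.error


def classify_failure(exc: BaseException) -> str:
    """Map a urllib/socket exception to a rough troubleshooting category."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"http status {exc.code}"  # server reachable, request rejected
    # urllib wraps low-level errors in URLError; unwrap to the root cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, ssl.SSLError):
        return "tls"       # certificate or handshake problem
    if isinstance(exc, socket.gaierror):
        return "dns"       # hostname could not be resolved
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"   # slow server, or traffic silently dropped
    if isinstance(exc, ConnectionRefusedError):
        return "refused"   # wrong port, service down, or firewall reject
    return "other"
```

A typical use is to wrap `urllib.request.urlopen(url, timeout=...)` in a `try` block and pass the caught exception to `classify_failure`.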
No Values Extracted
- Regex does not match -- use the validate value button to test the regex against the actual response. Common issues include unexpected whitespace, HTML entities, or response format changes.
- Wrong capture group -- verify the capture group number matches the parenthesized portion of the regex that contains the desired value.
- Response format changed -- if the remote endpoint was updated, the response format may have changed. Check the raw response content and update the regex.
- Encoding issues -- non-ASCII characters in the response may affect regex matching. Ensure the regex accounts for the response encoding.
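Several of these causes can be told apart with a small offline check against a saved copy of the response. The sketch below is a hypothetical helper written with Python's `re` module (Mango itself uses Java regular expressions); it reports a nonexistent capture group, a plain non-match, and the case where the pattern only matches once HTML entities are decoded.

```python
import html
import re


def diagnose(content: str, value_regex: str, capture_group: int) -> str:
    """Describe why a value regex does or does not yield a value."""
    compiled = re.compile(value_regex)
    if capture_group > compiled.groups:
        return (f"pattern defines {compiled.groups} group(s); "
                f"group {capture_group} does not exist")
    match = compiled.search(content)
    if match is None:
        # The page may contain entities such as &quot; where the pattern
        # expects the literal character.
        if compiled.search(html.unescape(content)):
            return "no match in raw content, but matches after decoding HTML entities"
        return "no match"
    return f"matched: {match.group(capture_group)!r}"
```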
Wrong or Unexpected Values
- Data type mismatch -- if the extracted text cannot be converted to the configured data type, the point will not update. Ensure the regex extracts only the value portion (e.g., "72.5" not "Temperature: 72.5").
- Number format needed -- if numeric values use locale-specific formatting (e.g., "1.234,56" with comma as decimal separator), configure the number format accordingly.
- Binary 0 value mismatch -- for binary points, verify the binary 0 value matches exactly what the regex extracts (case-sensitive).
Performance Concerns
- Poll period vs. API rate limits -- many APIs enforce rate limits. If the update period is too short, the API may reject requests or return errors. Set the update period so that the request rate stays within the API's limit.
- Large response pages -- if the HTTP response is very large (e.g., a full web page), the regex evaluation may be slow. Consider using a more targeted URL or API endpoint that returns only the needed data.
- Retry storms -- if the endpoint is down and retries are set high, each poll cycle may take a long time. Keep retries to 1-3 for most scenarios.
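The retry concern comes down to simple arithmetic. The sketch below estimates the worst-case duration of one poll, assuming every attempt waits for the full timeout and that a poll makes one initial attempt plus the configured retries; both assumptions and the function names are illustrative rather than taken from Mango's implementation.

```python
def worst_case_poll_ms(timeout_ms: int, retries: int) -> int:
    """Upper bound on one poll when every attempt times out."""
    return timeout_ms * (1 + retries)


def overruns_update_period(timeout_ms: int, retries: int,
                           update_period_ms: int) -> bool:
    """True if a fully failing poll would outlast the update period."""
    return worst_case_poll_ms(timeout_ms, retries) > update_period_ms
```

With a 5000 ms timeout and 3 retries, a failing poll can take up to 20000 ms, which would overrun a 10-second update period.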
Related Pages
- Data Sources Overview — General data source and data point concepts
- HTTP Receiver Data Source — Receive pushed data from external systems via HTTP POST
- HTTP Image Data Source — Retrieve and store images from HTTP endpoints
- Data Source Performance — Tuning poll intervals for HTTP-based data collection