@robinpath/html
0.1.1 | Node.js | Public
HTML
Parse, extract, escape, and manipulate HTML content using regex-based processing
Package: @robinpath/html | Category: Utility | Type: Utility
Authentication
No authentication required. All functions are available immediately.
Use Cases
Use the html module when you need to:
- Remove all HTML tags from a string, returning plain text -- use html.stripTags
- Extract the text content of all matching tags by tag name -- use html.extractText
- Extract all links (href and text) from anchor tags -- use html.extractLinks
- Extract all image sources and alt text from img tags -- use html.extractImages
- Extract attribute values from all matching tags -- use html.getAttribute
Quick Reference
| Function | Description | Returns |
|---|---|---|
| stripTags | Remove all HTML tags from a string, returning plain text | Plain text with all HTML tags removed |
| extractText | Extract the text content of all matching tags by tag name | Array of text contents from all matching tags |
| extractLinks | Extract all links (href and text) from anchor tags | Array of objects with href and text properties |
| extractImages | Extract all image sources and alt text from img tags | Array of objects with src and alt properties |
| getAttribute | Extract attribute values from all matching tags | Array of attribute values from matching tags |
| escape | HTML-escape special characters (&, <, >, ", ') | The escaped string, safe for use in HTML |
| unescape | Reverse HTML escaping, converting the entities for &, <, >, ", and ' back to characters | The unescaped string with HTML entities converted back |
| extractMeta | Extract meta tag name-content pairs from HTML | Object mapping meta tag names to their content values |
| getTitle | Extract the text content of the <title> tag | The title text, or null if no <title> tag is found |
| extractTables | Extract HTML tables as arrays of rows and cells | Array of tables, each as an array of rows, each row as an array of cell strings |
| wrap | Wrap text in an HTML tag with optional attributes | The HTML string with text wrapped in the specified tag |
| minify | Minify HTML by removing extra whitespace and newlines between tags | The minified HTML string |
Functions
stripTags
Remove all HTML tags from a string, returning plain text
Module: html | Returns: string -- Plain text with all HTML tags removed
html.stripTags "<p>Hello <b>world</b></p>"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string to strip tags from |
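The module describes itself as regex-based. For illustration only, here is a minimal TypeScript sketch of regex-based tag stripping. stripTagsSketch is a hypothetical name, and the package's real behaviour may differ, for example in whether it removes comments or decodes entities.

```typescript
// Hypothetical sketch of regex-based tag stripping; not the package's source.
function stripTagsSketch(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // drop HTML comments first (assumption)
    .replace(/<[^>]*>/g, "");        // then drop anything shaped like a tag
}

console.log(stripTagsSketch("<p>Hello <b>world</b></p>")); // Hello world
```

A pattern like <[^>]*> cannot tell a > inside a quoted attribute value from the end of a tag. That is the usual trade-off of regex-based HTML processing.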
extractText
Extract the text content of all matching tags by tag name
Module: html | Returns: array -- Array of text contents from all matching tags
html.extractText "<p>One</p><p>Two</p>" "p"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string to search |
| tagName | string | Yes | The tag name to match (e.g. "p", "h1", "span") |
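The sketch below shows one way a regex-based extractText could work: match each opening/closing pair for the tag, then strip inner markup. This is an assumption for illustration, not the package's implementation. extractTextSketch is hypothetical, and trimming the results is this sketch's own choice.

```typescript
// Hypothetical sketch: collect the inner text of every <tagName>...</tagName> pair.
function extractTextSketch(html: string, tagName: string): string[] {
  // Escape the tag name so it is matched literally inside the pattern.
  const tag = tagName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // \b stops "p" from matching "<pre>"; the non-greedy body does not handle nesting.
  const re = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  const results: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    // Remove any inner markup so only text remains, then trim.
    results.push((m[1] ?? "").replace(/<[^>]*>/g, "").trim());
  }
  return results;
}

console.log(JSON.stringify(extractTextSketch("<p>One</p><p>Two</p>", "p"))); // ["One","Two"]
```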
extractLinks
Extract all links (href and text) from anchor tags
Module: html | Returns: array -- Array of objects with href and text properties
html.extractLinks "<a href=\"https://example.com\">Example</a>"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string to extract links from |
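To show the shape of the returned objects, here is a minimal TypeScript sketch of regex-based link extraction. The names LinkSketch and extractLinksSketch are hypothetical. Two behaviours are assumptions, not documented package behaviour: anchors without an href are kept with an empty href, and inner markup is removed from the link text.

```typescript
interface LinkSketch {
  href: string;
  text: string;
}

// Hypothetical sketch: find <a ...>text</a> pairs, then read href from the opening tag.
function extractLinksSketch(html: string): LinkSketch[] {
  const anchorRe = /<a\b([^>]*)>([\s\S]*?)<\/a>/gi;
  const links: LinkSketch[] = [];
  let m: RegExpExecArray | null;
  while ((m = anchorRe.exec(html)) !== null) {
    const attrs = m[1] ?? "";
    // Accept double- or single-quoted href values.
    const hrefMatch = /\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(attrs);
    const href = hrefMatch ? (hrefMatch[1] ?? hrefMatch[2] ?? "") : "";
    // Strip inner tags such as <b> from the visible link text.
    const text = (m[2] ?? "").replace(/<[^>]*>/g, "").trim();
    links.push({ href, text });
  }
  return links;
}

console.log(JSON.stringify(extractLinksSketch('<a href="https://example.com">Example</a>')));
// [{"href":"https://example.com","text":"Example"}]
```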
extractImages
Extract all image sources and alt text from img tags
Module: html | Returns: array -- Array of objects with src and alt properties
html.extractImages "<img src=\"photo.jpg\" alt=\"A photo\">"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string to extract images from |
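Because img is a void element, a regex approach only needs to match the opening tag and then read src and alt from it. The following TypeScript sketch illustrates that idea under stated assumptions: ImageSketch, readImgAttr and extractImagesSketch are hypothetical names, missing attributes become empty strings, and only quoted values are read.

```typescript
interface ImageSketch {
  src: string;
  alt: string;
}

// Hypothetical helper: read one quoted attribute value from a tag's source text.
function readImgAttr(tag: string, name: string): string {
  const re = new RegExp(`\\b${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i");
  const m = re.exec(tag);
  return m ? (m[1] ?? m[2] ?? "") : "";
}

// Hypothetical sketch: match every <img ...> tag and pull out src and alt.
function extractImagesSketch(html: string): ImageSketch[] {
  const imgRe = /<img\b[^>]*>/gi;
  const images: ImageSketch[] = [];
  let m: RegExpExecArray | null;
  while ((m = imgRe.exec(html)) !== null) {
    const tag = m[0];
    images.push({ src: readImgAttr(tag, "src"), alt: readImgAttr(tag, "alt") });
  }
  return images;
}

console.log(JSON.stringify(extractImagesSketch('<img src="photo.jpg" alt="A photo">')));
// [{"src":"photo.jpg","alt":"A photo"}]
```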
getAttribute
Extract attribute values from all matching tags
Module: html | Returns: array -- Array of attribute values from matching tags
html.getAttribute "<div class=\"a\"></div><div class=\"b\"></div>" "div" "class"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string to search |
| tagName | string | Yes | The tag name to match (e.g. "div", "input") |
| attributeName | string | Yes | The attribute name to extract (e.g. "class", "id") |
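Here is a minimal TypeScript sketch of how getAttribute's two-stage matching could work: first find opening tags by name, then look up one attribute inside each. getAttributeSketch is hypothetical. Skipping tags that lack the attribute, and accepting unquoted values, are assumptions rather than documented behaviour of the package.

```typescript
// Hypothetical sketch: collect one attribute's value from every opening <tagName ...> tag.
function getAttributeSketch(html: string, tagName: string, attributeName: string): string[] {
  const escapeRe = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const tagRe = new RegExp(`<${escapeRe(tagName)}\\b[^>]*>`, "gi");
  // Require whitespace before the name so "data-class" is not mistaken for "class".
  const attrRe = new RegExp(
    `(?:^|\\s)${escapeRe(attributeName)}\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s>]+))`,
    "i",
  );
  const values: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(html)) !== null) {
    const a = attrRe.exec(m[0]);
    // Tags without the attribute are skipped rather than yielding an empty value.
    if (a) values.push(a[1] ?? a[2] ?? a[3] ?? "");
  }
  return values;
}

console.log(JSON.stringify(getAttributeSketch('<div class="a"></div><div class="b"></div>', "div", "class")));
// ["a","b"]
```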
escape
HTML-escape special characters (&, <, >, ", ')
Module: html | Returns: string -- The escaped string safe for use in HTML
html.escape "<script>alert(1)</script>"
| Parameter | Type | Required | Description |
|---|---|---|---|
| string | string | Yes | The string to HTML-escape |
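A common way to escape the five characters listed above is a single replace over a character class. Each character is then visited once, and an & produced by an earlier substitution is never escaped again. The TypeScript sketch below assumes the single quote maps to &#39;. The package might use &#x27; or &apos; instead, and escapeSketch is a hypothetical name.

```typescript
// Hypothetical sketch: entity for each character the escape function is documented to handle.
const ESCAPE_MAP: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;", // assumption: numeric entity for the single quote
};

function escapeSketch(s: string): string {
  // One pass with a character class, so output entities are never re-escaped.
  return s.replace(/[&<>"']/g, (ch) => ESCAPE_MAP[ch] ?? ch);
}

console.log(escapeSketch("<script>alert(1)</script>"));
// &lt;script&gt;alert(1)&lt;/script&gt;
```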
unescape
Reverse HTML escaping, converting the entities for &, <, >, ", and ' back to characters
Module: html | Returns: string -- The unescaped string with HTML entities converted back
html.unescape "&lt;p&gt;Hello&lt;/p&gt;"
| Parameter | Type | Required | Description |
|---|---|---|---|
| string | string | Yes | The HTML-escaped string to unescape |
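Unescaping is easiest to get right in a single pass. Replacing the entities one at a time can double-decode input such as &amp;lt;. In this minimal TypeScript sketch, the set of accepted single-quote forms (&#39;, &#x27;, &apos;) is an assumption, and unescapeSketch is a hypothetical name.

```typescript
// Hypothetical sketch: entity -> character for the five escaped characters.
const ENTITY_MAP: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",  // assumed single-quote forms
  "&#x27;": "'",
  "&apos;": "'",
};

function unescapeSketch(s: string): string {
  // A single pass: text produced by one replacement is never re-scanned,
  // so "&amp;lt;" correctly becomes "&lt;" rather than "<".
  return s.replace(/&(?:amp|lt|gt|quot|#39|#x27|apos);/g, (e) => ENTITY_MAP[e] ?? e);
}

console.log(unescapeSketch("&lt;p&gt;Hello&lt;/p&gt;")); // <p>Hello</p>
```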
extractMeta
Extract meta tag name-content pairs from HTML
Module: html | Returns: object -- Object mapping meta tag names to their content values
html.extractMeta "<meta name=\"description\" content=\"A page\">"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string to extract meta tags from |
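The TypeScript sketch below illustrates one regex-based approach to extractMeta. It matches each meta tag and reads name and content independently, so attribute order does not matter. Only name/content pairs are collected, matching the description above. Tags such as <meta charset> are skipped, and a repeated name keeps its last value. All of this is this sketch's own behaviour, not a confirmed detail of the package. readMetaAttr and extractMetaSketch are hypothetical.

```typescript
// Hypothetical helper: read a quoted attribute value from a tag, or undefined if absent.
function readMetaAttr(tag: string, name: string): string | undefined {
  const m = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i").exec(tag);
  return m ? (m[1] ?? m[2]) : undefined;
}

// Hypothetical sketch: build a name -> content map from <meta name="..." content="..."> tags.
function extractMetaSketch(html: string): Record<string, string> {
  const result: Record<string, string> = {};
  const metaRe = /<meta\b[^>]*>/gi;
  let m: RegExpExecArray | null;
  while ((m = metaRe.exec(html)) !== null) {
    const key = readMetaAttr(m[0], "name");
    const content = readMetaAttr(m[0], "content");
    // Only tags carrying both attributes contribute an entry.
    if (key !== undefined && content !== undefined) result[key] = content;
  }
  return result;
}

console.log(JSON.stringify(extractMetaSketch('<meta name="description" content="A page">')));
// {"description":"A page"}
```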
getTitle
Extract the text content of the <title> tag
Module: html | Returns: string -- The title text, or null if no <title> tag is found
html.getTitle "<html><head><title>My Page</title></head></html>"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string to extract the title from |
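As documented, getTitle returns null when there is no <title> tag. A minimal TypeScript sketch of that contract might look like this. getTitleSketch is hypothetical, and it is an assumption that only the first title is used and that its text is trimmed.

```typescript
// Hypothetical sketch: text of the first <title>...</title>, or null when absent.
function getTitleSketch(html: string): string | null {
  const m = /<title\b[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return m ? (m[1] ?? "").trim() : null;
}

console.log(getTitleSketch("<html><head><title>My Page</title></head></html>")); // My Page
console.log(getTitleSketch("<p>No title here</p>")); // null
```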
extractTables
Extract HTML tables as arrays of rows and cells
Module: html | Returns: array -- Array of tables, each as an array of rows, each row as an array of cell strings
html.extractTables "<table><tr><td>A</td><td>B</td></tr></table>"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string containing table(s) |
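The nested return type (tables, then rows, then cells) maps naturally onto three levels of non-greedy regex matching. This TypeScript sketch is an illustration under assumptions, not the package's code. extractTablesSketch is hypothetical, th and td cells are treated alike, and nested tables are not supported.

```typescript
// Hypothetical sketch: tables -> rows -> cells, using non-greedy matches at each level.
function extractTablesSketch(html: string): string[][][] {
  const tables: string[][][] = [];
  const tableRe = /<table\b[^>]*>([\s\S]*?)<\/table>/gi;
  let t: RegExpExecArray | null;
  while ((t = tableRe.exec(html)) !== null) {
    const tableBody = t[1] ?? "";
    const rows: string[][] = [];
    const rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
    let r: RegExpExecArray | null;
    while ((r = rowRe.exec(tableBody)) !== null) {
      const rowBody = r[1] ?? "";
      const cells: string[] = [];
      // Header (th) and data (td) cells are treated alike; inner markup is stripped.
      const cellRe = /<t[hd]\b[^>]*>([\s\S]*?)<\/t[hd]>/gi;
      let c: RegExpExecArray | null;
      while ((c = cellRe.exec(rowBody)) !== null) {
        cells.push((c[1] ?? "").replace(/<[^>]*>/g, "").trim());
      }
      rows.push(cells);
    }
    tables.push(rows);
  }
  return tables;
}

console.log(JSON.stringify(extractTablesSketch("<table><tr><td>A</td><td>B</td></tr></table>")));
// [[["A","B"]]]
```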
wrap
Wrap text in an HTML tag with optional attributes
Module: html | Returns: string -- The HTML string with text wrapped in the specified tag
html.wrap "Hello" "p" {"class": "greeting"}
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text content to wrap |
| tagName | string | Yes | The tag name to wrap with (e.g. "div", "span") |
| attributes | object | No | Optional object of attribute key-value pairs |
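Here is a minimal TypeScript sketch of how wrap could assemble its output from text, a tag name and an optional attributes object. wrapSketch and escapeAttrValue are hypothetical names. Two points are this sketch's own choices: it escapes attribute values, and it inserts the text unchanged. The documentation above does not say how the package handles either.

```typescript
// Hypothetical helper: make an attribute value safe inside double quotes.
function escapeAttrValue(v: string): string {
  return v.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Hypothetical sketch: build <tag key="value" ...>text</tag>.
function wrapSketch(text: string, tagName: string, attributes: Record<string, string> = {}): string {
  const attrs = Object.keys(attributes)
    .map((k) => ` ${k}="${escapeAttrValue(attributes[k] ?? "")}"`)
    .join("");
  // Assumption: text is inserted as given; escape it first if it is untrusted.
  return `<${tagName}${attrs}>${text}</${tagName}>`;
}

console.log(wrapSketch("Hello", "p", { class: "greeting" })); // <p class="greeting">Hello</p>
```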
minify
Minify HTML by removing extra whitespace and newlines between tags
Module: html | Returns: string -- The minified HTML string
html.minify "<div>\n <p> Hello </p>\n</div>"
| Parameter | Type | Required | Description |
|---|---|---|---|
| htmlString | string | Yes | The HTML string to minify |
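The description above says minify removes extra whitespace and the newlines between tags. A regex-based TypeScript sketch of those two steps follows. minifySketch is hypothetical, and the package's exact rules may differ, for example in whether whitespace inside text is trimmed.

```typescript
// Hypothetical sketch: drop whitespace between tags, then collapse the remaining runs.
// Caution: this would also alter whitespace-sensitive content such as <pre> and <textarea>.
function minifySketch(html: string): string {
  return html
    .replace(/>\s+</g, "><") // whitespace (including newlines) between adjacent tags
    .replace(/\s+/g, " ")    // every remaining whitespace run becomes one space
    .trim();
}

console.log(minifySketch("<div>\n <p> Hello </p>\n</div>")); // <div><p> Hello </p></div>
```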
Error Handling
All functions throw on failure. Common errors:
| Error | Cause |
|---|---|
| (standard errors) | Invalid or missing parameters; check the arguments passed to the function |
@desc "Strip tags and validate result"
do
set $result as html.stripTags "<p>Hello <b>world</b></p>"
if $result != null
print "Success"
else
print "No result"
end
enddo
Recipes
1. Get and iterate attribute values
Retrieve the class attribute of every matching div and loop through the values.
@desc "Get attribute and iterate results"
do
set $result as html.getAttribute "<div class=\"a\"></div><div class=\"b\"></div>" "div" "class"
each $item in $result
print $item
end
enddo
2. Multi-step HTML workflow
Run several html operations in sequence.
@desc "Strip tags, extract text, and more"
do
set $r_stripTags as html.stripTags "<p>Hello <b>world</b></p>"
set $r_extractText as html.extractText "<p>One</p><p>Two</p>" "p"
set $r_extractLinks as html.extractLinks "<a href=\"https://example.com\">Example</a>"
print "All operations complete"
enddo
3. Safe stripTags with validation
Check results before proceeding.
@desc "Strip tags and validate result"
do
set $result as html.stripTags "<p>Hello <b>world</b></p>"
if $result != null
print "Success: " + $result
else
print "Operation returned no data"
end
enddo
Related Modules
- json -- JSON module for complementary functionality
Versions (1)
| Version | Tag | Published |
|---|---|---|
| 0.1.1 | latest | 1 month ago |
$ robinpath add @robinpath/html
