@robinpath/html

0.1.1 | Node.js | Public

Parse, extract, escape, and manipulate HTML content using regex-based processing

HTML

Package: @robinpath/html | Category: Utility | Type: Utility

Authentication

No authentication required. All functions are available immediately.

Use Cases

Use the html module when you need to:

  • Remove all HTML tags from a string, returning plain text -- use html.stripTags
  • Extract the text content of all matching tags by tag name -- use html.extractText
  • Extract all links (href and text) from anchor tags -- use html.extractLinks
  • Extract all image sources and alt text from img tags -- use html.extractImages
  • Extract attribute values from all matching tags -- use html.getAttribute

Quick Reference

Function | Description | Returns
stripTags | Remove all HTML tags from a string, returning plain text | Plain text with all HTML tags removed
extractText | Extract the text content of all matching tags by tag name | Array of text contents from all matching tags
extractLinks | Extract all links (href and text) from anchor tags | Array of objects with href and text properties
extractImages | Extract all image sources and alt text from img tags | Array of objects with src and alt properties
getAttribute | Extract attribute values from all matching tags | Array of attribute values from matching tags
escape | HTML-escape special characters (&, <, >, ", ') | The escaped string safe for use in HTML
unescape | Reverse HTML escaping, converting the entities for &, <, >, ", and ' back to the characters | The unescaped string with HTML entities converted back
extractMeta | Extract meta tag name-content pairs from HTML | Object mapping meta tag names to their content values
getTitle | Extract the text content of the <title> tag | The title text, or null if no <title> tag is found
extractTables | Extract HTML tables as arrays of rows and cells | Array of tables, each as an array of rows, each row as an array of cell strings
wrap | Wrap text in an HTML tag with optional attributes | The HTML string with text wrapped in the specified tag
minify | Minify HTML by removing extra whitespace and newlines between tags | The minified HTML string

Functions

stripTags

Remove all HTML tags from a string, returning plain text

Module: html | Returns: string -- Plain text with all HTML tags removed

html.stripTags "<p>Hello <b>world</b></p>"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string to strip tags from
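
To make "regex-based" tag stripping concrete, here is a minimal TypeScript sketch of the general technique. The function name stripTagsSketch and the exact pattern are assumptions for illustration; this is not the module's source, and the module's handling of edge cases may differ.

```typescript
// Hypothetical stand-in for html.stripTags; not the module's actual source.
function stripTagsSketch(html: string): string {
  // Remove anything that looks like a tag: "<", any run of non-">" characters, then ">".
  return html.replace(/<[^>]*>/g, "");
}

console.log(stripTagsSketch("<p>Hello <b>world</b></p>")); // Hello world
```

A pattern this simple treats the first ">" as the end of a tag, so a ">" inside an attribute value leaves residue behind. That trade-off is typical of regex-based processing compared with a full HTML parser.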

extractText

Extract the text content of all matching tags by tag name

Module: html | Returns: array -- Array of text contents from all matching tags

html.extractText "<p>One</p><p>Two</p>" "p"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string to search
tagName | string | Yes | The tag name to match (e.g. "p", "h1", "span")

extractLinks

Extract all links (href and text) from anchor tags

Module: html | Returns: array -- Array of objects with href and text properties

html.extractLinks "<a href=\"https://example.com\">Example</a>"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string to extract links from
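
The return shape described above (an array of objects with href and text) can be illustrated with a short TypeScript sketch. The names LinkSketch and extractLinksSketch, and the regular expression itself, are assumptions made for this example rather than the module's implementation.

```typescript
// Hypothetical result shape mirroring the documented href/text properties.
interface LinkSketch {
  href: string;
  text: string;
}

// Hypothetical stand-in for html.extractLinks; not the module's actual source.
function extractLinksSketch(html: string): LinkSketch[] {
  const found: LinkSketch[] = [];
  // Match <a ... href="..." ...>inner</a>, accepting double- or single-quoted href values.
  const anchor = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a>/gi;
  let m: RegExpExecArray | null;
  while ((m = anchor.exec(html)) !== null) {
    // Strip nested tags from the link body so only its text remains.
    const text = (m[3] ?? "").replace(/<[^>]*>/g, "").trim();
    found.push({ href: m[2] ?? "", text });
  }
  return found;
}

console.log(JSON.stringify(extractLinksSketch('<a href="https://example.com">Example</a>')));
// [{"href":"https://example.com","text":"Example"}]
```

Anchors without a quoted href are skipped by this sketch; whether the module does the same is not stated in the documentation.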

extractImages

Extract all image sources and alt text from img tags

Module: html | Returns: array -- Array of objects with src and alt properties

html.extractImages "<img src=\"photo.jpg\" alt=\"A photo\">"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string to extract images from

getAttribute

Extract attribute values from all matching tags

Module: html | Returns: array -- Array of attribute values from matching tags

html.getAttribute "<div class=\"a\"></div><div class=\"b\"></div>" "div" "class"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string to search
tagName | string | Yes | The tag name to match (e.g. "div", "input")
attributeName | string | Yes | The attribute name to extract (e.g. "class", "id")
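
Because tagName and attributeName arrive as parameters, a regex-based approach has to build its patterns at run time. The TypeScript sketch below shows one way that could work. The names getAttributeSketch and lit are hypothetical, and the module may construct its patterns differently.

```typescript
// Hypothetical stand-in for html.getAttribute; not the module's actual source.
function getAttributeSketch(html: string, tagName: string, attributeName: string): string[] {
  // Escape regex metacharacters so the supplied names are matched literally.
  const lit = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Opening tags of the requested element, e.g. <div ...>.
  const tagRe = new RegExp(`<${lit(tagName)}\\b[^>]*>`, "gi");
  // The requested attribute with a double- or single-quoted value.
  const attrRe = new RegExp(`\\s${lit(attributeName)}\\s*=\\s*(["'])(.*?)\\1`, "i");
  const values: string[] = [];
  let tag: RegExpExecArray | null;
  while ((tag = tagRe.exec(html)) !== null) {
    const hit = attrRe.exec(tag[0] ?? "");
    // Tags that lack the attribute contribute nothing to the result.
    if (hit !== null) values.push(hit[2] ?? "");
  }
  return values;
}

console.log(getAttributeSketch('<div class="a"></div><div class="b"></div>', "div", "class").join(","));
// a,b
```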

escape

HTML-escape special characters (&, <, >, ", ')

Module: html | Returns: string -- The escaped string safe for use in HTML

html.escape "<script>alert(1)</script>"

Parameter | Type | Required | Description
string | string | Yes | The string to HTML-escape

unescape

Reverse HTML escaping, converting the entities for &, <, >, ", and ' back to the characters

Module: html | Returns: string -- The unescaped string with HTML entities converted back

html.unescape "&lt;p&gt;Hello&lt;/p&gt;"

Parameter | Type | Required | Description
string | string | Yes | The HTML-escaped string to unescape
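
The order in which characters are replaced matters for both escape and unescape, which a small TypeScript sketch can show. The entity table here is an assumption: in particular, the documentation does not say which entity the module emits for the apostrophe, so &#39; is only a guess. The names ESCAPE_PAIRS, escapeHtmlSketch, and unescapeHtmlSketch are hypothetical.

```typescript
// Assumed entity table; the module may use a different entity for the apostrophe.
const ESCAPE_PAIRS: Array<[string, string]> = [
  ["&", "&amp;"],
  ["<", "&lt;"],
  [">", "&gt;"],
  ['"', "&quot;"],
  ["'", "&#39;"],
];

// Hypothetical stand-in for html.escape.
function escapeHtmlSketch(text: string): string {
  let out = text;
  // "&" comes first so ampersands introduced by later replacements are not escaped again.
  for (const [ch, entity] of ESCAPE_PAIRS) {
    out = out.split(ch).join(entity);
  }
  return out;
}

// Hypothetical stand-in for html.unescape.
function unescapeHtmlSketch(text: string): string {
  let out = text;
  // Reverse order: "&amp;" is decoded last so "&amp;lt;" becomes "&lt;" rather than "<".
  for (const [ch, entity] of ESCAPE_PAIRS.slice().reverse()) {
    out = out.split(entity).join(ch);
  }
  return out;
}

console.log(escapeHtmlSketch("<script>alert(1)</script>")); // &lt;script&gt;alert(1)&lt;/script&gt;
console.log(unescapeHtmlSketch("&lt;p&gt;Hello&lt;/p&gt;")); // <p>Hello</p>
```

With this ordering, unescaping the escaped form of a string returns the original string.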

extractMeta

Extract meta tag name-content pairs from HTML

Module: html | Returns: object -- Object mapping meta tag names to their content values

html.extractMeta "<meta name=\"description\" content=\"A page\">"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string to extract meta tags from

getTitle

Extract the text content of the <title> tag

Module: html | Returns: string -- The title text, or null if no <title> tag is found

html.getTitle "<html><head><title>My Page</title></head></html>"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string to extract the title from

extractTables

Extract HTML tables as arrays of rows and cells

Module: html | Returns: array -- Array of tables, each as an array of rows, each row as an array of cell strings

html.extractTables "<table><tr><td>A</td><td>B</td></tr></table>"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string containing table(s)
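
The nested return value (tables, then rows, then cell strings) is easier to picture with an example. The TypeScript sketch below shows one regex-based way to produce that shape. The function name extractTablesSketch is hypothetical, and treating th cells like td cells is an assumption, since the documentation only mentions cells.

```typescript
// Hypothetical stand-in for html.extractTables; not the module's actual source.
function extractTablesSketch(html: string): string[][][] {
  const tables: string[][][] = [];
  const tableRe = /<table\b[^>]*>([\s\S]*?)<\/table>/gi;
  let t: RegExpExecArray | null;
  while ((t = tableRe.exec(html)) !== null) {
    const tableBody = t[1] ?? "";
    const rows: string[][] = [];
    const rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
    let r: RegExpExecArray | null;
    while ((r = rowRe.exec(tableBody)) !== null) {
      const rowBody = r[1] ?? "";
      const cells: string[] = [];
      // Assumption: header cells (<th>) are collected the same way as data cells (<td>).
      const cellRe = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
      let c: RegExpExecArray | null;
      while ((c = cellRe.exec(rowBody)) !== null) {
        // Drop markup inside the cell and trim surrounding whitespace.
        cells.push((c[1] ?? "").replace(/<[^>]*>/g, "").trim());
      }
      rows.push(cells);
    }
    tables.push(rows);
  }
  return tables;
}

console.log(JSON.stringify(extractTablesSketch("<table><tr><td>A</td><td>B</td></tr></table>")));
// [[["A","B"]]]
```

Lazy matching stops at the first closing tag, so this sketch does not handle tables nested inside cells.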

wrap

Wrap text in an HTML tag with optional attributes

Module: html | Returns: string -- The HTML string with text wrapped in the specified tag

html.wrap "Hello" "p" {"class": "greeting"}

Parameter | Type | Required | Description
text | string | Yes | The text content to wrap
tagName | string | Yes | The tag name to wrap with (e.g. "div", "span")
attributes | object | No | Optional object of attribute key-value pairs
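
As an illustration of how the optional attributes object could be turned into markup, here is a minimal TypeScript sketch. The name wrapSketch is hypothetical. The sketch inserts text and attribute values verbatim; the documentation does not say whether the module escapes them.

```typescript
// Hypothetical stand-in for html.wrap; not the module's actual source.
function wrapSketch(text: string, tagName: string, attributes: Record<string, string> = {}): string {
  // Render each key-value pair as key="value", preceded by a space.
  const attrText = Object.keys(attributes)
    .map((key) => ` ${key}="${attributes[key]}"`)
    .join("");
  return `<${tagName}${attrText}>${text}</${tagName}>`;
}

console.log(wrapSketch("Hello", "p", { class: "greeting" })); // <p class="greeting">Hello</p>
console.log(wrapSketch("Hello", "span")); // <span>Hello</span>
```

If the text can contain user input, escaping it before wrapping is the safer choice under the assumption that no escaping happens during the wrap step.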

minify

Minify HTML by removing extra whitespace and newlines between tags

Module: html | Returns: string -- The minified HTML string

html.minify "<div>\n  <p> Hello </p>\n</div>"

Parameter | Type | Required | Description
htmlString | string | Yes | The HTML string to minify
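
The description above mentions removing whitespace and newlines between tags. A TypeScript sketch of that idea follows. The name minifySketch and both patterns are assumptions, so the module's exact output (for example, whether spaces inside a text node are kept) may differ.

```typescript
// Hypothetical stand-in for html.minify; not the module's actual source.
function minifySketch(html: string): string {
  return html
    .replace(/>\s+</g, "><") // drop whitespace that sits between two adjacent tags
    .replace(/\s{2,}/g, " ") // collapse any remaining whitespace run to one space
    .trim();
}

console.log(minifySketch("<div>\n  <p> Hello </p>\n</div>")); // <div><p> Hello </p></div>
```

A whitespace rule like this is applied everywhere, including inside elements such as pre where whitespace is significant, so it suits markup where layout whitespace carries no meaning.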

Error Handling

All functions throw on failure. Common errors:

Error | Cause
(standard errors) | Check function parameters

The example below checks the result for null before using it:

@desc "Strip tags and validate result"
do
  set $result as html.stripTags "<p>Hello <b>world</b></p>"
  if $result != null
    print "Success"
  else
    print "No result"
  end
enddo

Recipes

1. Get attribute values and iterate

Extract the class attribute from every div and loop through the values.

@desc "Get attribute and iterate results"
do
  set $result as html.getAttribute "<div class=\"a\"></div><div class=\"b\"></div>" "div" "class"
  each $item in $result
    print $item
  end
enddo

2. Multi-step HTML workflow

Run several html operations in one block. Each call below works on its own input string.

@desc "Strip tags, extract text, and more"
do
  set $r_stripTags as html.stripTags "<p>Hello <b>world</b></p>"
  set $r_extractText as html.extractText "<p>One</p><p>Two</p>" "p"
  set $r_extractLinks as html.extractLinks "<a href=\"https://example.com\">Example</a>"
  print "All operations complete"
enddo

3. Safe stripTags with validation

Check results before proceeding.

@desc "Strip tags and validate result"
do
  set $result as html.stripTags "<p>Hello <b>world</b></p>"
  if $result != null
    print "Success: " + $result
  else
    print "Operation returned no data"
  end
enddo

Related Modules

  • json -- JSON module for complementary functionality

Versions (1)

Version | Tag | Published
0.1.1 | latest | 1 month ago

Install
$ robinpath add @robinpath/html

Collaborators

Dumitru Balaban
@dumitru

Version: 0.1.1
License: MIT
Unpacked Size: 5.7 KB
Versions: 1
Weekly Downloads: 26
Total Downloads: 26
Stars: 0
Last Publish: 1 month ago
Created: 1 month ago

Category

utilities