Class: Raif::Utils::HtmlFragmentProcessor
- Inherits: Object
- Inheritance chain: Object, Raif::Utils::HtmlFragmentProcessor
- Defined in: lib/raif/utils/html_fragment_processor.rb
Overview
Utility class for processing HTML fragments with various cleaning and transformation operations.
This class provides methods for sanitizing HTML content, converting markdown links to HTML, processing existing HTML links (adding target="_blank", stripping tracking parameters), and removing tracking parameters from URLs.
Constant Summary
- TRACKING_PARAMS = %w[utm_source utm_medium utm_campaign utm_term utm_content utm_id]
  List of common tracking parameters to remove from URLs
Class Method Summary
- .clean_html_fragment(html, allowed_tags: nil, allowed_attributes: nil) ⇒ String
  Cleans and sanitizes an HTML fragment by removing empty text nodes and dangerous content.
- .convert_markdown_links_to_html(text) ⇒ String
  Converts markdown-style links to HTML anchor tags with target="_blank" and rel="noopener".
- .process_links(html, add_target_blank:, strip_tracking_parameters:) ⇒ String
  Processes existing HTML links by optionally adding target="_blank" and stripping tracking parameters.
- .strip_tracking_parameters(url) ⇒ String
  Removes tracking parameters (UTM parameters) from a URL.
Class Method Details
.clean_html_fragment(html, allowed_tags: nil, allowed_attributes: nil) ⇒ String
Cleans and sanitizes an HTML fragment by removing empty text nodes and dangerous content.
# File 'lib/raif/utils/html_fragment_processor.rb', line 34

def clean_html_fragment(html, allowed_tags: nil, allowed_attributes: nil)
  fragment = html.is_a?(Nokogiri::HTML::DocumentFragment) ? html : Nokogiri::HTML.fragment(html)

  fragment.traverse do |node|
    if node.text? && node.text.strip.empty?
      node.remove
    end
  end

  allowed_tags = allowed_tags.presence || Rails::HTML5::SafeListSanitizer.allowed_tags
  allowed_attributes = allowed_attributes.presence || Rails::HTML5::SafeListSanitizer.allowed_attributes

  ActionController::Base.helpers.sanitize(fragment.to_html, tags: allowed_tags, attributes: allowed_attributes).strip
end
.convert_markdown_links_to_html(text) ⇒ String
Converts markdown-style links to HTML anchor tags with target="_blank" and rel="noopener".
Converts [text](url) format to <a href="url" target="_blank" rel="noopener">text</a>. Also strips tracking parameters from the URLs.
# File 'lib/raif/utils/html_fragment_processor.rb', line 64

def convert_markdown_links_to_html(text)
  # Convert markdown links [text](url) to HTML links <a href="url" target="_blank" rel="noopener">text</a>
  text.gsub(/\[([^\]]*)\]\(([^)]+)\)/) do |_match|
    text = ::Regexp.last_match(1)
    url = ::Regexp.last_match(2)
    clean_url = strip_tracking_parameters(url)
    %(<a href="#{CGI.escapeHTML(clean_url)}" target="_blank" rel="noopener">#{CGI.escapeHTML(text)}</a>)
  end
end
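The regex-and-escape step above can be tried in isolation. The following is a minimal sketch using only the standard library; `MARKDOWN_LINK` and `markdown_links_to_html` are hypothetical names invented for illustration, not part of Raif. It omits the tracking-parameter stripping that the real method also performs.

```ruby
require "cgi"

# Hypothetical constant: the same pattern used in the method above.
# Group 1 captures the link text, group 2 captures the URL.
MARKDOWN_LINK = /\[([^\]]*)\]\(([^)]+)\)/

# Hypothetical standalone helper (not part of Raif). It mirrors the
# gsub-and-escape logic but does NOT strip tracking parameters.
def markdown_links_to_html(input)
  input.gsub(MARKDOWN_LINK) do
    label = Regexp.last_match(1)
    url = Regexp.last_match(2)
    # Escape both values so quotes or angle brackets cannot break out of the markup.
    %(<a href="#{CGI.escapeHTML(url)}" target="_blank" rel="noopener">#{CGI.escapeHTML(label)}</a>)
  end
end

puts markdown_links_to_html("See [the docs](https://example.com/docs?a=1&b=2).")
# prints: See <a href="https://example.com/docs?a=1&amp;b=2" target="_blank" rel="noopener">the docs</a>.
```

Because both the URL and the link text pass through `CGI.escapeHTML`, an ampersand in the query string is emitted as `&amp;`, and markup inside the link text is rendered inert rather than interpreted.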
.process_links(html, add_target_blank:, strip_tracking_parameters:) ⇒ String
Processes existing HTML links by optionally adding target="_blank" and stripping tracking parameters.
This method provides fine-grained control over link processing with configurable options for both target="_blank" addition and tracking parameter removal.
# File 'lib/raif/utils/html_fragment_processor.rb', line 99

def process_links(html, add_target_blank:, strip_tracking_parameters:)
  fragment = html.is_a?(Nokogiri::HTML::DocumentFragment) ? html : Nokogiri::HTML.fragment(html)

  fragment.css("a").each do |link|
    if add_target_blank
      link["target"] = "_blank"
      link["rel"] = "noopener"
    end

    if strip_tracking_parameters
      link["href"] = strip_tracking_parameters(link["href"])
    end
  end

  fragment.to_html
end
.strip_tracking_parameters(url) ⇒ String
Removes tracking parameters (UTM parameters) from a URL.
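Preserves all non-tracking query parameters; parameter names are matched against TRACKING_PARAMS case-insensitively. Only URLs that have a scheme or start with "/" or "#" are processed. Other relative URLs, and URLs that fail to parse, are returned unchanged.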
# File 'lib/raif/utils/html_fragment_processor.rb', line 139

def strip_tracking_parameters(url)
  return url unless url.include?("?")

  begin
    uri = URI.parse(url)
    return url unless uri.query

    # Only process URLs that have a valid scheme and host, or are relative URLs
    unless uri.scheme || url.start_with?("/", "#")
      return url
    end

    # Parse query parameters and filter out tracking ones
    params = URI.decode_www_form(uri.query)
    clean_params = params.reject { |param, _| TRACKING_PARAMS.include?(param.downcase) }

    # Rebuild the URL
    uri.query = if clean_params.empty?
      nil
    else
      URI.encode_www_form(clean_params)
    end

    uri.to_s
  rescue URI::InvalidURIError
    # If URL parsing fails, return the original URL
    url
  end
end
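To see how the method treats different URL shapes, here is a standalone sketch that reproduces the logic above using only the standard library. `strip_tracking` is a hypothetical name, not part of Raif, and the constant is copied from the Constant Summary so that the example runs without the gem.

```ruby
require "uri"

# Copy of the list from the Constant Summary, so the sketch runs without Raif.
TRACKING_PARAMS = %w[utm_source utm_medium utm_campaign utm_term utm_content utm_id].freeze

# Hypothetical standalone version of the method above (not part of Raif).
def strip_tracking(url)
  return url unless url.include?("?")

  uri = URI.parse(url)
  return url unless uri.query
  # Skip anything that is neither absolute nor starting with "/" or "#".
  return url unless uri.scheme || url.start_with?("/", "#")

  # Keep every parameter whose name is not a tracking parameter (case-insensitive).
  kept = URI.decode_www_form(uri.query).reject { |name, _| TRACKING_PARAMS.include?(name.downcase) }
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
rescue URI::InvalidURIError
  # Unparseable input is returned as-is.
  url
end

puts strip_tracking("https://example.com/post?id=7&utm_source=newsletter")
# prints: https://example.com/post?id=7
puts strip_tracking("/pricing?UTM_Campaign=spring")
# prints: /pricing
puts strip_tracking("https://example.com/a b?utm_source=x")
# prints: https://example.com/a b?utm_source=x
```

The third call shows the fallback path: the space makes `URI.parse` raise `URI::InvalidURIError`, so the input comes back untouched. A relative URL without a leading slash, such as `pricing?utm_source=x`, is also returned unchanged because it fails the scheme-or-prefix check.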