Home Parse 'ul' and 'ol' tags
Reply: 0

Parse 'ul' and 'ol' tags

user4031
1#
user4031 Published in June 19, 2018, 2:21 pm

I have to handle deep nesting of ul, ol, and li tags. I need to give the same view as we are giving in the browser. I want to achieve the following example in a pdf file:

 text = "
<body>
    <ol>
        <li>One</li>
        <li>Two

            <ol>
                <li>Inner One</li>
                <li>inner Two

                    <ul>
                        <li>hey

                            <ol>
                                <li>hiiiiiiiii</li>
                                <li>why</li>
                                <li>hiiiiiiiii</li>
                            </ol>
                        </li>
                        <li>aniket </li>
                    </li>
                </ul>
                <li>sup </li>
                <li>there </li>
            </ol>
            <li>hey </li>
            <li>Three</li>
        </li>
    </ol>
    <ol>
        <li>Introduction</li>
        <ol>
            <li>Introduction</li>
        </ol>
        <li>Description</li>
        <li>Observation</li>
        <li>Results</li>
        <li>Summary</li>
    </ol>
    <ul>
        <li>Introduction</li>
        <li>Description

            <ul>
                <li>Observation

                    <ul>
                        <li>Results

                            <ul>
                                <li>Summary</li>
                            </ul>
                        </li>
                    </ul>
                </li>
            </ul>
        </li>
        <li>Overview</li>
    </ul>
</body>"

I have to use prawn for my task. But prawn doesn't support HTML tags. So, I came up with a solution using nokogiri:. I am parsing and later removing the tags with gsub. The below solution I have written for a part of the above content but the problem is ul and ol can vary.

     RULES = {
  ol: {
    1 => ->(index) { "#{index + 1}. " },
    2 => ->(index) { "#{}" },
    3 => ->(index) { "#{}" },
    4 => ->(index) { "#{}" }
  },
  ul: {
    1 => ->(_) { "\u2022 " },
    2 => ->(_) { "" },
    3 => ->(_) { "" },
    4 => ->(_) { "" },
  }
}

def ol_rule(group, deepness: 1)
  group.search('> li').each_with_index do |item, i|
    prefix = RULES[:ol][deepness].call(i)
    item.prepend_child(prefix)
    descend(item, deepness + 1)
  end
end

def ul_rule(group, deepness: 1)
  group.search('> li').each_with_index do |item, i|
    prefix = RULES[:ul][deepness].call(i)
    item.prepend_child(prefix)
    descend(item, deepness + 1)
  end
end

def descend(item, deepness)
  item.search('> ol').each do |ol|
    ol_rule(ol, deepness: deepness)
  end
  item.search('> ul').each do |ul|
    ul_rule(ul, deepness: deepness)
  end
end

doc = Nokogiri::HTML.fragment(text)

doc.search('ol').each do |group|
  ol_rule(group, deepness: 1)
end

doc.search('ul').each do |group|
  ul_rule(group, deepness: 1)
end


  puts doc.inner_text


1. One
2. Two

1. Inner One
2. inner Two

• hey

1. hiiiiiiiii
2. why
3. hiiiiiiiii


• aniket 


3. sup 
4. there 

3. hey 
4. Three



1. Introduction

1. Introduction

2. Description
3. Observation
4. Results
5. Summary



• Introduction
• Description

• Observation

• Results

• Summary






• Overview

Problem

1) What I want to achieve is how to handle space when working with ul and ol tags
2) How to handle deep nesting when li come inside ul or li come inside ol

You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.321378 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO