HTML parsing using gem Nokogiri in Rails

parsing html is easy using Nokogiri, to do follow the steps

    Step 1 : Open the terminal and go to your rails application folder.

  • $ cd yourRailsApp
    add  the following line to your Gemfile
    gem  ‘nokogiri’
    First install the dependencies

    $  sudo apt-get install libxml2 libxml2-dev libxslt1-dev
    

    For Fedora do

    $ sudo yum install libxml2-devel libxslt-devel
    

    For CentOS:

    $ sudo yum install -y gcc ruby-devel libxml2 libxml2-devel libxslt libxslt-devel
    

    do bundle install
    /yourRailsApp $ bundle install
    Your nokogiri installation is completed. Parsing HTML documents look like
    doc = Nokogiri::HTML(html_document) and parsing XML documents look like
    doc = Nokogiri::XML(xml_document)
    Nokogiri converts HTML and XML documents into a tree data structure. Nokogiri extracts data using this tree. Xpath and CSS are small languages used for tree traversal. Xpath is a language that was written to traverse an XML tree struture, but we can use it with HTML tree as well.

  • Lets try an example take the folowing html code

    Step 2 : In views/form.erb add

  • Add a text field in your form
    <%= text_field_tag(“my[url]”, @user.url) %> and enter the sample code in this text field.
  • Step 3 : In controller add

  • @user.url = params[“my”][“url”]
    call the function
    @user.get_value_from_url
    write the function in the model
    def get_value_from_url
    node = Nokogiri::XML(self.url)
    node.xpath(‘//param[@name=”pic”]’).each do |node|
    @result = node.attribute(“value”).to_s
    end
    @result
    end

Then you get the value : “http://www.my_site.com/v/xedFamzsaaxo7hc0?fs=1&amp;hl=en_US&#8221; in the result. Look its so easy to use nokogiri. Its Wonderful !!

You can also view Aaron Patterson’s (creator of Nokogiri) blog from here.

Unknown's avatar

Author: Abhilash

Hi, I’m Abhilash! A seasoned web developer with 15 years of experience specializing in Ruby and Ruby on Rails. Since 2010, I’ve built scalable, robust web applications and worked with frameworks like Angular, Sinatra, Laravel, Node.js, Vue and React. Passionate about clean, maintainable code and continuous learning, I share insights, tutorials, and experiences here. Let’s explore the ever-evolving world of web development together!

Leave a comment