HTML parsing using gem Nokogiri in Rails

Parsing HTML is easy with Nokogiri. To get started, follow the steps below.

    Step 1: Open the terminal and go to your Rails application folder.

  • $ cd yourRailsApp
    Add the following line to your Gemfile:
    gem 'nokogiri'
    First install the dependencies. On Debian or Ubuntu:

    $  sudo apt-get install libxml2 libxml2-dev libxslt1-dev
    

    On Fedora:

    $ sudo yum install libxml2-devel libxslt-devel
    

    For CentOS:

    $ sudo yum install -y gcc ruby-devel libxml2 libxml2-devel libxslt libxslt-devel
    

    Then run bundle install:
    /yourRailsApp $ bundle install
    Your Nokogiri installation is now complete. Parsing an HTML document looks like
    doc = Nokogiri::HTML(html_document) and parsing an XML document looks like
    doc = Nokogiri::XML(xml_document)
    Nokogiri converts HTML and XML documents into a tree data structure and extracts data by traversing that tree. XPath and CSS selectors are small languages for tree traversal. XPath was designed to traverse an XML tree structure, but it works on an HTML tree as well.

  • Let's try an example using a piece of sample HTML code.

    Step 2: In views/form.erb

  • Add a text field to your form:
    <%= text_field_tag("my[url]", @user.url) %> and enter the sample code in this text field.
  • Step 3: In the controller add

  • @user.url = params["my"]["url"]
    Call the function:
    @user.get_value_from_url
    Write the function in the model:
    def get_value_from_url
      doc = Nokogiri::XML(self.url)
      # Keeps the value of the last <param name="pic"> element found
      doc.xpath('//param[@name="pic"]').each do |node|
        @result = node.attribute("value").to_s
      end
      @result
    end

Then you get the value "http://www.my_site.com/v/xedFamzsaaxo7hc0?fs=1&hl=en_US" in the result. See how easy Nokogiri is to use. It's wonderful!

You can also read the blog of Aaron Patterson, the creator of Nokogiri.

Author: Abhilash

Hey! This is Abhilash, a Ruby developer for years, specialising in web development and working as a freelance Ruby developer. I have experience in other frameworks such as Angular, Sinatra and Laravel. I have mainly worked on the Ruby on Rails platform since 2010. This blog is about Ruby, Ruby on Rails and other subjects I have had a chance to work on. You can contact me here: abhilash.amballur@gmail.com Abhilash AK
