Ruby 1.9.1 character encoding, Regular Expressions and Ruby on Rails

on November 21st, 2009 by admin

As many of you already know, a major change in Ruby 1.9 was the introduction of character encodings. It is a fantastic feature, but it causes problems when migrating existing applications.

Over the last few days I have been trying to get Ruby on Rails 2.3.4 to work on Ruby 1.9.1. I have reached the point where my legacy Rails application starts, although I have not yet had a chance to test it extensively.

One hurdle was the tmail gem, which is used by the Rails component ActionMailer. Even installing the latest tmail, 1.2.3.1, did not help, so I had to figure out what was wrong. The error messages pointed to invalid character sequences in regular expressions.

This article is here to help you understand what is happening in Ruby 1.9.1.

In 1.9.1 every string has an encoding, and the application also has input and output encodings and a default internal encoding. Suppose that you run your application in UTF-8. UTF-8 is a multi-byte encoding, but regular expressions often need to operate at the byte level.
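
To make the difference between characters and bytes concrete, here is a minimal sketch. The sample string is my own choice for illustration, and the \u escape is used so that the literal is UTF-8 regardless of the source file's encoding.

```ruby
s = "\u00e9"                  # "é": one character, two bytes in UTF-8

enc_name   = s.encoding.name  # the encoding tag carried by the string
char_count = s.length         # counts characters
byte_count = s.bytesize       # counts bytes

puts enc_name
puts char_count
puts byte_count
puts s.bytes.map { |b| "%02x" % b }.join(" ")

# Process-wide defaults; default_internal is nil unless it has been configured
puts Encoding.default_external.inspect
puts Encoding.default_internal.inspect
```

A regular expression built from such a string works on characters, so a byte-oriented pattern needs a string that is explicitly tagged as bytes.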

This one is simple and gives no error, because every byte in it is below 0x80 and is therefore valid UTF-8:

s = "\x00-\x1f"
puts s.encoding
r = /#{s}/
puts r.to_s

However, as soon as you use a byte range that goes above 0x7f (I am intentionally not saying character range), you get the error “invalid multibyte character”. Rightfully so, because the program thinks in UTF-8, and a lone byte such as \xff is not valid UTF-8.

s = "\x00-\x1f\x7f-\xff"
puts s.encoding
r = /#{s}/   # the error is raised here, not when the string is created
puts r.to_s
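
If you want to see the failure without crashing the script, something like the following sketch should work. It assumes the source file is UTF-8 (the default since Ruby 2.0; on 1.9 you would need an `# encoding: utf-8` magic comment on the first line). It rescues both RegexpError and ArgumentError, because I have not verified which of the two every Ruby version raises for an interpolated pattern.

```ruby
s = "\x00-\x1f\x7f-\xff"

# valid_encoding? reports whether the bytes are well-formed in s.encoding
valid = s.valid_encoding?
puts "#{s.encoding}: valid=#{valid}"

begin
  r = /#{s}/
  compiled = true
rescue RegexpError, ArgumentError => e
  compiled = false
  puts "#{e.class}: #{e.message}"
end
```

Checking valid_encoding? first is a cheap way to find the offending string before it ever reaches a regular expression.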

A very simple solution comes to our rescue: use the Ruby 1.9 method force_encoding to tell the program ‘this string is meant to be a byte string’.

# force_encoding only relabels the string; the bytes themselves are not changed
s = "\x00-\x1f\x7f-\xff".force_encoding("ASCII-8BIT")
puts s.encoding
r = /#{s}/
puts r.to_s
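
Here is a small sketch to check this on your own machine. It compares the bytes before and after force_encoding and then uses the relabelled string inside a character class. The brackets and the sample inputs are my additions for illustration and are not taken from tmail. Both inputs are tagged as binary too, so that the regexp and the string agree on the encoding.

```ruby
raw = "\x00-\x1f\x7f-\xff"
bin = raw.dup.force_encoding("ASCII-8BIT")   # dup, so raw keeps its own label

same_bytes = (raw.bytes.to_a == bin.bytes.to_a)  # only the encoding tag differs
puts same_bytes

unsafe = /[#{bin}]/   # byte-level character class

# The raw bytes of "café" in UTF-8, tagged as binary
input = "caf\xc3\xa9".dup.force_encoding("ASCII-8BIT")
plain = "plain text".dup.force_encoding("ASCII-8BIT")

hit   = (unsafe =~ input)   # index of the first byte that falls in the class
clean = (unsafe =~ plain)   # nil when no byte falls in the class
puts hit.inspect
puts clean.inspect
```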

So, to apply this to Ruby on Rails 2.3.4, do the following:

  • install the gem tmail 1.2.3.1
  • if the utils.rb file gives you errors, add force_encoding("ASCII-8BIT") to lines 112-115
  • if scanner_r.rb gives you errors, add force_encoding("ASCII-8BIT") to lines 44-46, 49 and 51-53
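
If the patched files also have to keep working on Ruby 1.8, where strings have no force_encoding method, a guard along these lines may help. This is only a sketch: as_binary is a hypothetical helper of mine, not part of tmail, and the string below is the example from this article rather than a line copied from the gem.

```ruby
# Hypothetical helper: tag a string as raw bytes on Ruby 1.9+,
# and return it untouched on 1.8, where strings carry no encoding.
def as_binary(str)
  return str unless str.respond_to?(:force_encoding)
  str.dup.force_encoding("ASCII-8BIT")   # dup, so the caller's string is not modified
end

source = "\x00-\x1f\x7f-\xff"
control_bytes = as_binary(source)

puts control_bytes.encoding
puts source.encoding   # unchanged, because the helper works on a copy
```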

Tags: programming, ruby