Ruby 1.9.1 character encoding, Regular Expressions and Ruby on Rails
on November 21st, 2009 by admin
As many of you already know, a major change in Ruby 1.9 was the introduction of character encodings. It is a fantastic feature, but it causes problems when migrating applications.
Over the past few days I have been working on getting Ruby on Rails 2.3.4 to run on Ruby 1.9.1, and I have reached the point where my legacy Rails application starts, although I have not had a chance to test it extensively yet.
One hurdle was the tmail gem, which is used by the Rails component ActionMailer. Even installing the latest tmail 1.2.3.1 did not help, so I had to figure out what was wrong. The error messages pointed to invalid character sequences in the regular expressions.
This article is here to help you understand what is happening in Ruby 1.9.1.
In 1.9.1 every string has an encoding, and the application also has input and output encodings and a default internal encoding. Suppose that you run your application in UTF-8. UTF-8 is a multibyte encoding, but regular expressions often need to operate at the byte level.
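To make this concrete, here is a minimal sketch of how to inspect the encoding of a string and the process-wide defaults. The variable names are only for illustration, and the values printed for the defaults depend on your environment.

```ruby
# Every string carries an encoding tag alongside its bytes.
# The \u escape yields a UTF-8 string regardless of the source encoding.
utf8 = "caf\u00e9"

puts utf8.encoding   # the encoding tag of this string
puts utf8.length     # 4, counted in characters
puts utf8.bytesize   # 5, counted in bytes (the last character takes two)

# Process-wide defaults used for I/O and transcoding.
puts Encoding.default_external
puts Encoding.default_internal.inspect   # nil unless one has been configured
```

The gap between length and bytesize is exactly where byte-oriented regular expressions and character-oriented strings start to disagree.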
This one is simple, and it gives no error:
s = "\x00-\x1f"
puts s.encoding
r = /#{s}/
puts r.to_s
However, as soon as you use a byte range that reaches above \x7f (I am intentionally not calling it a character range), you get the error "invalid multibyte character". Rightfully so, because the program thinks in UTF-8, and a lone \xff byte is not a valid UTF-8 sequence.
s = "\x00-\x1f\x7f-\xff"
puts s.encoding
r = /#{s}/
puts r.to_s
A very simple solution comes to our rescue: use the Ruby 1.9 method force_encoding to tell the program "this string is meant to be a byte string".
s = "\x00-\x1f\x7f-\xff".force_encoding("ASCII-8BIT")
puts s.encoding
r = /#{s}/
puts r.to_s
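In real code these ranges usually sit inside a character class. The following sketch shows the byte string at work; the names and sample strings are made up for illustration. Note that the subject strings are tagged as ASCII-8BIT too, because a binary pattern cannot be matched against a UTF-8 string that contains non-ASCII characters.

```ruby
# Control characters plus every byte with the high bit set.
ranges = "\x00-\x1f\x7f-\xff".force_encoding("ASCII-8BIT")
unsafe = /[#{ranges}]/   # inside [...] the ranges work byte by byte

tabbed    = "tab\there".force_encoding("ASCII-8BIT")
high_byte = "caf\xe9".force_encoding("ASCII-8BIT")
plain     = "plain text".force_encoding("ASCII-8BIT")

p unsafe =~ tabbed      # 3, the tab is \x09
p unsafe =~ high_byte   # 3, \xe9 falls in \x7f-\xff
p unsafe =~ plain       # nil, nothing outside printable ASCII
```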
To apply this to Ruby on Rails 2.3.4, do the following:
- install the gem tmail 1.2.3.1
- if the utils.rb file gives you errors, then add force_encoding("ASCII-8BIT") to lines 112-115
- if the scanner_r.rb file gives you errors, then add force_encoding("ASCII-8BIT") to lines 44-46, 49, 51-53.
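If the patched files still have to load on Ruby 1.8, where String#force_encoding does not exist, one option is to wrap the call in a small guard. This is only a sketch: as_bytes is a hypothetical helper of my own, and the pattern below is illustrative, not a line copied from tmail.

```ruby
# Hypothetical helper: tag a copy of the string as raw bytes on Ruby 1.9+,
# and return the string unchanged on 1.8, which has no force_encoding.
def as_bytes(str)
  if str.respond_to?(:force_encoding)
    str.dup.force_encoding("ASCII-8BIT")
  else
    str
  end
end

# Illustrative pattern only, built the same way as the examples above.
control = as_bytes("\x00-\x1f\x7f-\xff")
pattern = /[#{control}]/

p pattern =~ as_bytes("a\x01b")   # 1
```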