Merge branch 'master' into rorcz
[girocco.git] / Girocco / Email / Obfuscate.pm
blobb0a8d8a2dac5d7b492b558abaf8746aeb81961ca
1 package HTML::Email::Obfuscate;
3 =pod
5 =head1 NAME
7 HTML::Email::Obfuscate - Obfuscated HTML email addresses that look normal
9 =head1 DESCRIPTION
11 I<"Don't put emails directly on the page, they will be scraped">
13 Stuff that, I'm sick of looking at C<bob at smith dot com>. Why can't we
14 just write emails in a way that looks normal to people, but is very, very
15 difficult to scrape off. Most email scrapers only use very very simple
16 parsing methods. And it isn't as if it is hard to just do.
18 # Before we search for email addresses...
19 $page =~ s/\s+at\s+/@/g;
20 $page =~ s/\s+dot\s+/./g;
22 This is an arms war dammit, and I want nukes!
24 =head2 About this Module
26 This module was written during OSDC/YAPC.AU to demonstrate how quick and
27 easy it is to write a basic module and put it on CPAN. The code was
28 written in about 40 minutes, the documentation was added during a break
29 period before drinks and dinner, and the packing and test files were
30 added during the python keynote (significant whitespace... ew...).
32 =head2 How this works
34 This module starts by applying a fairly basic set of character escapes to
35 avoid the most basic scrapers, and then layers more and more crap on
36 randomly, so that any scraper will need to implement more and more of a
37 full web browser, while keeping the email looking "normal" to anyone
38 browsing.
40 I've only scraped the surface of what we can achieve, and I'll leave it to
41 others to submit patches to improve it from here on.
43 =head2 Using HTML::Email::Obfuscate
45 This is a pretty simple module.
47 First, create an obfuscator object. This is just a simple object that holds
48 some preferences about how extreme you want to be about the obfuscation.
50 # Create a default obfuscation object
51 my $Email = HTML::Email::Obfuscate->new;
53 Now to turn a normal email string into an obfuscated and fully escaped HTML
54 one, just provide it to the escape_html method.
56 # Obfuscate my email address
57 my $html = $Email->escape_html( 'cpan@ali.as' );
59 And we get something like this
61 ***Example here once I get a chance to run it***
63 The defaults are fairly insane, so for people that just want veeeery simple
64 escaping, we'll provide a lite version.
66 # Create a "lite" obfuscator
67 my $Email = HTML::Email::Obfuscate->new( lite => 1 );
69 # Access the lite escape method directly, regardless of the
70 # obfuscator's constructor params.
71 my $html = $Email->escape_html_lite( 'cpan@ali.as' );
73 For the more serious people, we can also add some more extreme measures
74 that are probably not going to be compatible with everything, such as
75 JavaScript. :/
77 # Allow the obfuscator to use JavaScript
78 my $Email = HTML::Email::Obfuscator->new( javascript => 1 );
80 Best not to use that unless you have a JavaScript-capable browser.
82 I think that just about covers it, and my 7 minute lightning talk is
83 probably almost up.
85 =head1 METHODS
87 =cut
89 use 5.005;
90 use strict;
91 use HTML::Entities ();
93 use vars qw{$VERSION @WRAP_METHOD};
94 BEGIN {
95 $VERSION = '1.01';
97 # The list of modifier methods
98 @WRAP_METHOD = qw{
99 _random_modifier_span
100 _random_modifier_comment
101 _random_modifier_javascript
109 #####################################################################
110 # Constructor
112 =pod
114 =head2 new $param => $value [, ... ]
116 The C<new> constructor creates a new obfuscation object, which use can
117 then use to obfuscate as many email addresses as you like, at whatever
118 severity you want it to be done.
120 It takes two optional parameters.
122 If you set the C<'javascript'> param, the obfuscator will add JavaScript
123 obfuscation (possibly, and randomly) to the mix of obfuscation routines.
125 If you set the C<'lite'> param, the obfuscator will only use the most
126 basic form of escaping, which will only fool scanner that don't do
127 HTML entity decoding. Setting 'lite' implies that JavaScript should not
128 be used, even if you explicitly try to turn it on.
130 Returns a new C<HTML::Email::Obfuscate> object.
132 =cut
134 sub new {
135 my $class = shift;
136 my %args = ref $_[0] eq 'HASH' ? %{shift()} : @_;
137 %args = map { lc $_ } %args;
139 # Create the defailt HTML generation object
140 my $self = bless {
141 lite => '',
142 javascript => '',
143 }, $class;
145 # Flag control
146 $self->{javascript} = 1 if $args{javascript};
147 $self->{javascript} = '' if $args{lite};
148 $self->{lite} = 1 if $args{lite};
150 $self;
153 =pod
155 =head2 escape_html_lite $email
157 On an otherwise normal obfuscator, the C<escape_html_lite> method provides
158 direct access to the lite method for obfuscating emails.
160 Returns a HTML string, or C<undef> if passed no params, or and undefined
161 param.
163 =cut
165 sub escape_html_lite {
166 my $either = shift;
167 my $email = defined $_[0] ? shift : return undef;
168 my $self = ref($either) ? $either : $either->new(@_) or return undef;
170 # Just escape @ and add a single HTML comment
171 $email =~ s/\@/<!-- \@ -->&#x40;/sg;
173 $email;
176 =pod
178 =head2 escape_html $email
180 The C<escape_html> method obfuscates an email according to the params
181 provided to the constructor.
183 Returns a HTML string, or C<undef> if passed no params, or and undefined
184 param.
186 =cut
188 sub escape_html {
189 my $either = shift;
190 my $email = defined $_[0] ? shift : return undef;
191 my $self = ref $either ? $either : $either->new(@_) or return undef;
193 # Split into a set of characters
194 my @chars = split //, $email;
196 foreach my $char ( @chars ) {
197 # Escape individual characters
198 $char = $self->_escape_char($char);
200 # Randomly wrap 20% of characters
201 next unless rand(1) < 0.1;
202 $char = $self->_random_modifier($char);
205 # Join and return
206 join '', @chars;
209 sub _escape_char {
210 my $self = shift;
211 my $char = shift;
213 # Handle various characters
214 return '<!-- @ -->&#x40;' if $char eq '@';
215 return '<b>&#x2e;</b>' if $char eq '.';
217 # Force the numberic escape of 20% of the characters.
218 # Allow the remaining 80% to escape by the normal rules.
219 return (rand(1) < 0.2)
220 ? HTML::Entities::encode_numeric($char, '^ ')
221 : HTML::Entities::encode_numeric($char);
224 sub _random_modifier {
225 my $self = shift;
227 # Which wrap style do we want to use?
228 my $max = $self->{javascript} ? 2 : 1;
229 my $method = $WRAP_METHOD[int(rand($max))];
230 $self->$method(shift);
233 sub _random_modifier_span {
234 "<span>$_[1]</span>";
237 sub _random_modifier_comment {
238 (rand > 0.5) ? "<!-- @ -->$_[1]" : "$_[1]<!-- @ -->";
241 sub _random_modifier_javascript {
242 my $self = shift;
243 my $html = shift;
244 $html =~ s/'/&quot;/g;
245 qq~<script language="JavaScript">document.write('$html')</script>~;
250 =pod
252 =head1 TO DO
254 OK, other than compile testing, I admit that I haven't really done
255 anything significant in the way of testing. I mean, there was B<SUCH>
256 an interesting python talk on, and how on earth do you test something
257 that has randomised output. :/
259 So yeah, it would be nice to write some better tests.
261 =head1 SUPPORT
263 Bugs should be reported via the CPAN bug tracker at
265 L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTML-Email-Obfuscate>
267 For other issues, or commercial enhancement or support, contact the author.
269 =head1 AUTHORS
271 Adam Kennedy E<lt>adamk@cpan.orgE<gt>
273 Thank you to Phase N (L<http://phase-n.com/>) for permitting
274 the open sourcing and release of this distribution.
276 =head1 COPYRIGHT
278 Copyright 2004 - 2006 Adam Kennedy.
280 This program is free software; you can redistribute
281 it and/or modify it under the same terms as Perl itself.
283 The full text of the license can be found in the
284 LICENSE file included with this module.
286 =cut