package HTML::Email::Obfuscate;

=head1 NAME

HTML::Email::Obfuscate - Obfuscated HTML email addresses that look normal

=head1 DESCRIPTION

I<"Don't put emails directly on the page, they will be scraped">

Stuff that, I'm sick of looking at C<bob at smith dot com>. Why can't we
just write emails in a way that looks normal to people, but is very, very
difficult to scrape off? Most email scrapers only use very, very simple
parsing methods. And it isn't as if the "at" and "dot" trick is hard for
a scraper to undo.

  # Before we search for email addresses...
  $page =~ s/\s+at\s+/@/g;
  $page =~ s/\s+dot\s+/./g;

This is an arms war dammit, and I want nukes!

=head2 About this Module

This module was written during OSDC/YAPC.AU to demonstrate how quick and
easy it is to write a basic module and put it on CPAN. The code was
written in about 40 minutes, the documentation was added during a break
period before drinks and dinner, and the packaging and test files were
added during the Python keynote (significant whitespace... ew...).

This module starts by applying a fairly basic set of character escapes to
avoid the most basic scrapers, and then layers more and more crap on
randomly, so that any scraper will need to implement more and more of a
full web browser, while keeping the email looking "normal" to anyone
viewing the page.

I've only scraped the surface of what we can achieve, and I'll leave it to
others to submit patches to improve it from here on.

=head2 Using HTML::Email::Obfuscate

This is a pretty simple module.

First, create an obfuscator object. This is just a simple object that holds
some preferences about how extreme you want to be about the obfuscation.

  # Create a default obfuscation object
  my $Email = HTML::Email::Obfuscate->new;

Now to turn a normal email string into an obfuscated and fully escaped HTML
one, just provide it to the C<escape_html> method.

  # Obfuscate my email address
  my $html = $Email->escape_html( 'cpan@ali.as' );

And we get something like this

  ***Example here once I get a chance to run it***
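
Until a real sample is pasted in, the general shape of the output can be illustrated with a small stand-alone sketch. This is not the module's code and not its actual output: C<sketch_escape_char> and C<sketch_obfuscate> are hypothetical names, the randomness is left out so the result is repeatable, and every character is forced to a numeric entity with plain C<sprintf> rather than going through HTML::Entities.

```perl
use strict;
use warnings;

# Hypothetical, deterministic stand-in for the module's per-character
# escaping. Assumption: every character becomes a numeric entity, and
# '@' and '.' get some extra markup wrapped around them.
sub sketch_escape_char {
    my ($char) = @_;
    return '<!-- @ -->&#64;' if $char eq '@';
    return '<b>&#46;</b>'    if $char eq '.';
    return sprintf '&#%d;', ord $char;
}

# Escape an address one character at a time and glue the pieces together
sub sketch_obfuscate {
    my ($email) = @_;
    return join '', map { sketch_escape_char($_) } split //, $email;
}

print sketch_obfuscate('cpan@ali.as'), "\n";
```

A browser decodes the entities and ignores the comment, so the reader still sees a normal address, while a scraper looking for a literal C<name@host> pattern in the page source does not.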

The defaults are fairly insane, so for people that just want veeeery simple
escaping, we'll provide a lite version.

  # Create a "lite" obfuscator
  my $Email = HTML::Email::Obfuscate->new( lite => 1 );

  # Access the lite escape method directly, regardless of the
  # obfuscator's constructor params.
  my $html = $Email->escape_html_lite( 'cpan@ali.as' );

For the more serious people, we can also add some more extreme measures
that are probably not going to be compatible with everything, such as
JavaScript.

  # Allow the obfuscator to use JavaScript
  my $Email = HTML::Email::Obfuscate->new( javascript => 1 );

Best not to use that unless you have a JavaScript-capable browser.

I think that just about covers it, and my 7 minute lightning talk is

=cut

use HTML::Entities ();

use vars qw{$VERSION @WRAP_METHOD};

# The list of modifier methods
@WRAP_METHOD = qw{
	_random_modifier_span
	_random_modifier_comment
	_random_modifier_javascript
};

#####################################################################

=head2 new $param => $value [, ... ]

The C<new> constructor creates a new obfuscation object, which you can
then use to obfuscate as many email addresses as you like, at whatever
severity you want it to be done.

It takes two optional parameters.

If you set the C<'javascript'> param, the obfuscator will add JavaScript
obfuscation (possibly, and randomly) to the mix of obfuscation routines.

If you set the C<'lite'> param, the obfuscator will only use the most
basic form of escaping, which will only fool scanners that don't do
HTML entity decoding. Setting C<'lite'> implies that JavaScript should not
be used, even if you explicitly try to turn it on.

Returns a new C<HTML::Email::Obfuscate> object.
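
The way the two params interact can be sketched on its own, without the module. The helper name C<sketch_options> is hypothetical and only mirrors the rules described above: params may arrive as a hash reference or a flat list, param names are compared case-insensitively, and C<lite> always wins over C<javascript>.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the documented constructor rules.
# It is a sketch of the option handling only, not the module's new().
sub sketch_options {
    # Accept either a single hash reference or a flat key/value list
    my %args = ( @_ == 1 && ref( $_[0] ) eq 'HASH' ) ? %{ $_[0] } : @_;

    # Lowercase the param names, leaving the values untouched
    %args = map { ( lc($_) => $args{$_} ) } keys %args;

    my %opt = ( javascript => '', lite => '' );
    $opt{javascript} = 1 if $args{javascript};
    if ( $args{lite} ) {
        $opt{lite}       = 1;
        $opt{javascript} = '';    # lite switches JavaScript off again
    }
    return \%opt;
}

my $opt = sketch_options( LITE => 1, javascript => 1 );
printf "lite=%s javascript=%s\n", $opt->{lite}, $opt->{javascript} || 0;
```

Asking for both C<lite> and C<javascript> therefore leaves JavaScript disabled, which matches the description of the C<'lite'> param.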

=cut

sub new {
	my $class = shift;
	my %args  = ref $_[0] eq 'HASH' ? %{shift()} : @_;

	# Lowercase the param names only, leaving the values untouched
	%args = map { ( lc($_) => $args{$_} ) } keys %args;

	# Create the default HTML generation object
	my $self = bless {}, $class;

	$self->{javascript} = 1  if $args{javascript};
	$self->{javascript} = '' if $args{lite};
	$self->{lite}       = 1  if $args{lite};

	return $self;
}

=head2 escape_html_lite $email

On an otherwise normal obfuscator, the C<escape_html_lite> method provides
direct access to the lite method for obfuscating emails.

Returns an HTML string, or C<undef> if passed no params or an undefined
param.

=cut

sub escape_html_lite {
	my $either = shift;
	my $email  = defined $_[0] ? shift : return undef;
	my $self   = ref($either) ? $either : $either->new(@_) or return undef;

	# Just escape @ and add a single HTML comment
	$email =~ s/\@/<!-- \@ -->&#64;/sg;

	return $email;
}

=head2 escape_html $email

The C<escape_html> method obfuscates an email according to the params
provided to the constructor.

Returns an HTML string, or C<undef> if passed no params or an undefined
param.

=cut

sub escape_html {
	my $either = shift;
	my $email  = defined $_[0] ? shift : return undef;
	my $self   = ref $either ? $either : $either->new(@_) or return undef;

	# Split into a set of characters
	my @chars = split //, $email;

	foreach my $char ( @chars ) {
		# Escape individual characters
		$char = $self->_escape_char($char);

		# Randomly wrap 10% of characters
		next unless rand(1) < 0.1;
		$char = $self->_random_modifier($char);
	}

	return join '', @chars;
}

sub _escape_char {
	my ($self, $char) = @_;

	# Handle various characters
	return '<!-- @ -->@' if $char eq '@';
	return '<b>.</b>'    if $char eq '.';

	# Force the numeric escape of 20% of the characters.
	# Allow the remaining 80% to escape by the normal rules.
	return (rand(1) < 0.2)
		? HTML::Entities::encode_numeric($char, '^ ')
		: HTML::Entities::encode_numeric($char);
}
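
sub _random_modifier {
	my $self = shift;

	# Which wrap style do we want to use?
	# The JavaScript wrapper is the last entry in @WRAP_METHOD, so it is
	# only included in the range when JavaScript has been enabled.
	my $max    = $self->{javascript} ? scalar(@WRAP_METHOD) : scalar(@WRAP_METHOD) - 1;
	my $method = $WRAP_METHOD[int(rand($max))];
	$self->$method(shift);
}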

sub _random_modifier_span {
	"<span>$_[1]</span>";
}

sub _random_modifier_comment {
	(rand() > 0.5) ? "<!-- @ -->$_[1]" : "$_[1]<!-- @ -->";
}

sub _random_modifier_javascript {
	my ($self, $html) = @_;

	# Swap single quotes so they cannot end the JavaScript string early
	$html =~ s/'/"/g;
	qq~<script language="JavaScript">document.write('$html')</script>~;
}

1;

=head1 TO DO

OK, other than compile testing, I admit that I haven't really done
anything significant in the way of testing. I mean, there was B<SUCH>
an interesting Python talk on, and how on earth do you test something
that has randomised output? :/

So yeah, it would be nice to write some better tests.
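
One possible approach, offered as a sketch rather than as this distribution's test suite, is to stop checking the exact string and check what survives rendering instead: strip comments and tags, decode the numeric entities, and compare the result with the original address. Seeding C<srand> makes any failure repeatable. Both C<random_obfuscate> (a toy randomised escaper standing in for the module) and C<visible_text> are hypothetical names invented for this example.

```perl
use strict;
use warnings;

# Toy randomised obfuscator, standing in for the real module
sub random_obfuscate {
    my ($email) = @_;
    my $html = '';
    for my $char ( split //, $email ) {
        # Encode roughly half the characters, then sometimes wrap them
        my $out = rand() < 0.5 ? sprintf( '&#%d;', ord $char ) : $char;
        $out = "<span>$out</span>" if rand() < 0.2;
        $out = "<!-- x -->$out"    if rand() < 0.2;
        $html .= $out;
    }
    return $html;
}

# Reduce HTML to the text a browser would display: drop comments and
# tags, then decode numeric entities.
sub visible_text {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;
    $html =~ s/<[^>]+>//g;
    $html =~ s/&#(\d+);/chr $1/ge;
    return $html;
}

for my $seed ( 1 .. 50 ) {
    srand($seed);    # a fixed seed makes any failure reproducible
    my $html = random_obfuscate('cpan@ali.as');
    die "seed $seed broke the address: $html"
        unless visible_text($html) eq 'cpan@ali.as';
}
print "ok\n";
```

The same round-trip check would not cover JavaScript output, which would need an actual browser or JavaScript engine to evaluate.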

=head1 SUPPORT

Bugs should be reported via the CPAN bug tracker at

L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTML-Email-Obfuscate>

For other issues, or commercial enhancement or support, contact the author.

=head1 AUTHOR

Adam Kennedy E<lt>adamk@cpan.orgE<gt>

Thank you to Phase N (L<http://phase-n.com/>) for permitting
the open sourcing and release of this distribution.

=head1 COPYRIGHT

Copyright 2004 - 2006 Adam Kennedy.

This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the
LICENSE file included with this module.

=cut