=head1 NAME

perlmod - Perl modules (packages and symbol tables)

=head1 DESCRIPTION

=head2 Packages

Perl provides a mechanism for alternative namespaces to protect
packages from stomping on each other's variables. In fact, there's
really no such thing as a global variable in Perl. The package
statement declares the compilation unit as being in the given
namespace. The scope of the package declaration is from the
declaration itself through the end of the enclosing block, C<eval>,
or file, whichever comes first (the same scope as the my() and
local() operators). Unqualified dynamic identifiers will be in
this namespace, except for those few identifiers that, if unqualified,
default to the main package instead of the current one, as described
below. A package statement affects only dynamic variables--including
those you've used local() on--but I<not> lexical variables created
with my(). Typically it would be the first declaration in a file
included by the C<do>, C<require>, or C<use> operators. You can
switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that
block. You can refer to variables and filehandles in other packages
by prefixing the identifier with the package name and a double
colon: C<$Package::Variable>. If the package name is null, the
C<main> package is assumed. That is, C<$::sail> is equivalent to
C<$main::sail>.

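The scoping rules above can be sketched as follows. This is only an
illustration: the package name C<Harbor>, the lexical C<$note>, and the
values stored are made up for the example.

```perl
use strict;
use warnings;

our $sail = 'main';              # lives in the symbol table as $main::sail
my  $note = 'lexical';           # my() variables belong to no package

my @seen;
{
    package Harbor;              # in effect until the closing brace
    our $sail = 'harbor';        # a different variable: $Harbor::sail

    push @seen, $sail;           # unqualified name, current package
    push @seen, $::sail;         # null package name means main
    push @seen, $main::sail;     # same variable, spelled out
    push @seen, $note;           # lexicals ignore the package statement
}
push @seen, $sail;               # back in package main
push @seen, $Harbor::sail;       # still reachable when fully qualified

print join(', ', @seen), "\n";   # harbor, main, main, lexical, main, harbor
```

The block form keeps the C<package> declaration from leaking into the
rest of the file, which is why the unqualified C<$sail> after the closing
brace is the C<main> one again.
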
The old package delimiter was a single quote, but double colon is now the
preferred delimiter, in part because it's more readable to humans, and
in part because it's more readable to B<emacs> macros. It also makes C++
programmers feel like they know what's going on--as opposed to using the
single quote as separator, which was there to make Ada programmers feel
like they knew what's going on. Because the old-fashioned syntax is still
supported for backwards compatibility, if you try to use a string like
C<"This is $owner's house">, you'll be accessing C<$owner::s>; that is,
the $s variable in package C<owner>, which is probably not what you meant.
Use braces to disambiguate, as in C<"This is ${owner}'s house">.

Packages may themselves contain package separators, as in
C<$OUTER::INNER::var>. This implies nothing about the order of
name lookups, however. There are no relative packages: all symbols
are either local to the current package, or must be fully qualified
from the outer package name down. For instance, there is nowhere
within package C<OUTER> that C<$INNER::var> refers to
C<$OUTER::INNER::var>. It would treat package C<INNER> as a totally
separate global package.

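A short sketch of the "no relative packages" rule, reusing the C<OUTER>
and C<INNER> names from the paragraph above; the stored strings are
invented for the example.

```perl
use strict;
use warnings;

{ package INNER;        our $var = 'top-level INNER'; }
{ package OUTER::INNER; our $var = 'nested INNER';    }

my ($short, $full);
{
    package OUTER;

    # $INNER::var is NOT resolved relative to the current package OUTER;
    # it names the variable in the separate top-level package INNER.
    $short = $INNER::var;
    $full  = $OUTER::INNER::var;   # must be qualified from the top down
}

print "$short\n";   # top-level INNER
print "$full\n";    # nested INNER
```
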
Only identifiers starting with letters (or underscore) are stored
in a package's symbol table. All other symbols are kept in package
C<main>, including all punctuation variables, like $_. In addition,
when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV,
ARGVOUT, ENV, INC, and SIG are forced to be in package C<main>,
even when used for purposes other than their built-in ones. If you
have a package called C<m>, C<s>, or C<y>, then you can't use the
qualified form of an identifier because it would instead be interpreted
as a pattern match, a substitution, or a transliteration.

Variables beginning with underscore used to be forced into package
main, but we decided it was more useful for package writers to be able
to use leading underscore to indicate private variables and method names.
$_ is still global though. See also L<perlvar/"Technical Note on the
Syntax of Variable Names">.

C<eval>ed strings are compiled in the package in which the eval() was
compiled. (Assignments to C<$SIG{}>, however, assume the signal
handler specified is in the C<main> package. Qualify the signal handler
name if you wish to have a signal handler in a package.) For an
example, examine F<perldb.pl> in the Perl library. It initially switches
to the C<DB> package so that the debugger doesn't interfere with variables
in the program you are trying to debug. At various points, however, it
temporarily switches back to the C<main> package to evaluate various
expressions in the context of the C<main> package (or wherever you came
from). See L<perldebug>.

The special symbol C<__PACKAGE__> contains the current package, but cannot
(easily) be used to construct variables.

See L<perlsub> for other scoping issues related to my() and local(),
and L<perlref> regarding closures.

=head2 Symbol Tables

The symbol table for a package happens to be stored in the hash of that
name with two colons appended. The main symbol table's name is thus
C<%main::>, or C<%::> for short. Likewise, the symbol table for the nested
package mentioned earlier is named C<%OUTER::INNER::>.

The value in each entry of the hash is what you are referring to when you
use the C<*name> typeglob notation. In fact, the following have the same
effect, though the first is more efficient because it does the symbol
table lookups at compile time:

    local *main::foo    = *main::bar;
    local $main::{foo}  = $main::{bar};

You can use this to print out all the variables in a package, for
instance. The standard but antiquated F<dumpvar.pl> library and
the CPAN module Devel::Symdump make use of this.

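As a rough sketch of walking a symbol table, the code below lists what
is stored under each name in a package. The package C<Demo>, its
contents, and the helper C<describe_package> are hypothetical; this is
not how F<dumpvar.pl> or Devel::Symdump are implemented.

```perl
use strict;
use warnings;

{
    package Demo;                  # hypothetical package to inspect
    our $x = 1;
    our @y = (1, 2);
    our %z = (k => 'v');
    sub w { return 'w' }
}

# Map each name in the package's symbol table to the kinds of
# thing stored under that name.
sub describe_package {
    my ($package) = @_;
    my %report;
    no strict 'refs';              # symbolic lookups are the point here
    for my $name (sort keys %{"${package}::"}) {
        next if $name =~ /::\z/;   # skip nested packages' symbol tables
        my $glob = \*{"${package}::$name"};
        my @kinds;
        push @kinds, 'scalar' if defined ${ *{$glob}{SCALAR} };
        push @kinds, 'array'  if *{$glob}{ARRAY};
        push @kinds, 'hash'   if *{$glob}{HASH};
        push @kinds, 'code'   if *{$glob}{CODE};
        $report{$name} = \@kinds;
    }
    return \%report;
}

my $report = describe_package('Demo');
print "$_: @{ $report->{$_} }\n" for sort keys %$report;
```

The lookup goes through the name (C<*{"Demo::$name"}>) rather than the
raw hash value, on the assumption that some perls may store something
other than a full typeglob in the hash entry.
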
Assignment to a typeglob performs an aliasing operation, i.e.,

    *dick = *richard;

causes variables, subroutines, formats, and file and directory handles
accessible via the identifier C<richard> also to be accessible via the
identifier C<dick>. If you want to alias only a particular variable or
subroutine, assign a reference instead:

    *dick = \$richard;

Which makes $richard and $dick the same variable, but leaves
@richard and @dick as separate arrays. Tricky, eh?

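The difference between aliasing one slot and aliasing the whole glob can
be checked with a few lines. The C<our> declarations are an addition so
that the example runs under C<use strict>; the values are made up.

```perl
use strict;
use warnings;

our $richard = 'original';
our @richard = (1, 2);
our ($dick, @dick);              # declared so strict accepts the bare names

*dick = \$richard;               # alias only the scalar slot

$dick = 'changed';               # writes through to $richard
push @dick, 'only in @dick';     # @richard is untouched

print "\$richard = $richard\n";          # changed
print "\@richard = (@richard)\n";        # (1 2)
print "\@dick    = (@dick)\n";           # (only in @dick)
print "same scalar\n" if \$dick == \$richard;
```
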
This mechanism may be used to pass and return cheap references
into or from subroutines if you don't want to copy the whole
thing. It only works when assigning to dynamic variables, not
lexicals.

    %some_hash = ();                    # can't be my()
    *some_hash = fn( \%another_hash );
    sub fn {
        local *hashsym = shift;
        # now use %hashsym normally, and you
        # will affect the caller's %another_hash
        my %nhash = (); # do what you want
        return \%nhash;
    }

On return, the reference will overwrite the hash slot in the
symbol table specified by the *some_hash typeglob. This
is a somewhat tricky way of passing around references cheaply
when you don't want to have to remember to dereference variables
explicitly.

Another use of symbol tables is for making "constant" scalars.

    *PI = \3.14159265358979;

Now you cannot alter $PI, which is probably a good thing all in all.
This isn't the same as a constant subroutine, which is subject to
optimization at compile time; C<$PI> is not. A constant subroutine is one
prototyped to take no arguments and to return a constant expression.
See L<perlsub> for details on these. The C<use constant> pragma is a
convenient shorthand for these.

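A small sketch contrasting the read-only scalar with constant
subroutines. The names C<TWO_PI> and C<E> are invented for the example,
and the C<eval> is only there to show the run-time error without
stopping the program.

```perl
use strict;
use warnings;

our $PI;
*PI = \3.14159265358979;          # $PI now aliases a read-only value

# Assigning to $PI dies at run time; eval traps the error.
my $changed = eval { $PI = 3; 1 };
my $error   = $@;
print "assignment failed: $error" unless $changed;

# A constant subroutine: empty prototype, constant return value.
sub TWO_PI () { 6.28318530717958 }

# The constant pragma generates the same kind of subroutine.
use constant E => 2.71828182845905;

printf "%.5f %.5f %.5f\n", $PI, TWO_PI, E;
```
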
You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and
package the *foo symbol table entry comes from. This may be useful
in a subroutine that gets passed typeglobs as arguments:

    sub identify_typeglob {
        my $glob = shift;
        print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
    }
    identify_typeglob *foo;
    identify_typeglob *bar::baz;

This prints

    You gave me main::foo
    You gave me bar::baz

The C<*foo{THING}> notation can also be used to obtain references to the
individual elements of *foo; see L<perlref>.

Subroutine definitions (and declarations, for that matter) need
not necessarily be situated in the package whose symbol table they
occupy. You can define a subroutine outside its package by
explicitly qualifying the name of the subroutine:

    package main;
    sub Some_package::foo { ... }   # &foo defined in Some_package

This is just a shorthand for a typeglob assignment at compile time:

    BEGIN { *Some_package::foo = sub { ... } }

and is I<not> the same as writing:

    {
        package Some_package;
        sub foo { ... }
    }

In the first two versions, the body of the subroutine is
lexically in the main package, I<not> in Some_package. So
something like this:

    package main;

    $Some_package::name = "fred";
    $main::name = "barney";

    sub Some_package::foo {
        print "in ", __PACKAGE__, ": \$name is '$name'\n";
    }

    Some_package::foo();

prints:

    in main: $name is 'barney'

rather than:

    in Some_package: $name is 'fred'

This also has implications for the use of the SUPER:: qualifier
(see L<perlobj>).

=head2 Package Constructors and Destructors

Four special subroutines act as package constructors and destructors.
These are the C<BEGIN>, C<CHECK>, C<INIT>, and C<END> routines. The
C<sub> is optional for these routines.

A C<BEGIN> subroutine is executed as soon as possible, that is, the moment
it is completely defined, even before the rest of the containing file
is parsed. You may have multiple C<BEGIN> blocks within a file--they
will execute in order of definition. Because a C<BEGIN> block executes
immediately, it can pull in definitions of subroutines and such from other
files in time to be visible to the rest of the file. Once a C<BEGIN>
has run, it is immediately undefined and any code it used is returned to
Perl's memory pool. This means you can't ever explicitly call a C<BEGIN>.

An C<END> subroutine is executed as late as possible, that is, after
perl has finished running the program and just before the interpreter
exits, even if it is exiting as a result of a die() function.
(But not if it's polymorphing into another program via C<exec>, or
being blown out of the water by a signal--you have to trap that yourself
(if you can).) You may have multiple C<END> blocks within a file--they
will execute in reverse order of definition; that is: last in, first
out (LIFO). C<END> blocks are not executed when you run perl with the
C<-c> switch.

Inside an C<END> subroutine, C<$?> contains the value that the program is
going to pass to C<exit()>. You can modify C<$?> to change the exit
value of the program. Beware of changing C<$?> by accident (e.g. by
running something via C<system>).

Similar to C<BEGIN> blocks, C<INIT> blocks are run just before the
Perl runtime begins execution, in "first in, first out" (FIFO) order.
For example, the code generators documented in L<perlcc> make use of
C<INIT> blocks to initialize and resolve pointers to XSUBs.

Similar to C<END> blocks, C<CHECK> blocks are run just after the
Perl compile phase ends and before the run time begins, in
LIFO order. C<CHECK> blocks are again useful in the Perl compiler
suite to save the compiled state of the program.

When you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and
C<END> work just as they do in B<awk>, as a degenerate case.
Both C<BEGIN> and C<CHECK> blocks are run when you use the B<-c> switch
for a compile-only syntax check, although your main code is not.

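The ordering rules above can be observed with a short script. This is a
sketch that assumes the code is run as an ordinary program file; code
compiled later, for example through a string C<eval>, is too late for
C<CHECK> and C<INIT> blocks. The labels are arbitrary.

```perl
use strict;
use warnings;

our @order;

# Defined first, so it runs last (END blocks are LIFO) and can report.
END   { print join(', ', @order, 'END a'), "\n" }
END   { push @order, 'END b' }

INIT  { push @order, 'INIT a' }     # FIFO, just before run time
INIT  { push @order, 'INIT b' }

CHECK { push @order, 'CHECK a' }    # LIFO, just after compilation
CHECK { push @order, 'CHECK b' }

BEGIN { push @order, 'BEGIN a' }    # run as soon as each is parsed
BEGIN { push @order, 'BEGIN b' }

push @order, 'main';                # ordinary run-time code
```

Although the blocks are written in the opposite order, the recorded
sequence is BEGIN a, BEGIN b, CHECK b, CHECK a, INIT a, INIT b, main,
END b, END a.
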
=head2 Perl Classes

There is no special class syntax in Perl, but a package may act
as a class if it provides subroutines to act as methods. Such a
package may also derive some of its methods from another class (package)
by listing the other package name(s) in its global @ISA array (which
must be a package global, not a lexical).

For more on this, see L<perltoot> and L<perlobj>.

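A minimal sketch of a package acting as a class and a second package
inheriting from it through @ISA. The class names C<Animal> and C<Dog>
and their methods are hypothetical.

```perl
use strict;
use warnings;

{
    package Animal;                       # hypothetical base class
    sub new {
        my ($class, %args) = @_;
        return bless { name => $args{name} }, $class;
    }
    sub name  { $_[0]{name} }
    sub speak { my $self = shift; return $self->name . ' makes a sound' }
}

{
    package Dog;                          # hypothetical derived class
    our @ISA = ('Animal');                # package global, not a lexical
    sub speak {
        my $self = shift;
        return $self->SUPER::speak() . ': Woof';
    }
}

my $dog = Dog->new(name => 'Rex');        # new() is found via @ISA
print $dog->speak, "\n";                  # Rex makes a sound: Woof
```
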
=head2 Perl Modules

A module is just a set of related functions in a library file, i.e.,
a Perl package with the same name as the file. It is specifically designed
to be reusable by other modules or programs. It may do this by
providing a mechanism for exporting some of its symbols into the
symbol table of any package using it. Or it may function as a class
definition and make its semantics available implicitly through
method calls on the class and its objects, without explicitly
exporting anything. Or it can do a little of both.

For example, to start a traditional, non-OO module called Some::Module,
create a file called F<Some/Module.pm> and start with this template:

    package Some::Module;  # assumes Some/Module.pm

    use strict;
    use warnings;

    BEGIN {
        use Exporter   ();
        our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);

        # set the version for version checking
        $VERSION     = 1.00;
        # if using RCS/CVS, this may be preferred
        $VERSION = do { my @r = (q$Revision: 1.1.1.2 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r }; # must be all one line, for MakeMaker

        @ISA         = qw(Exporter);
        @EXPORT      = qw(&func1 &func2);
        %EXPORT_TAGS = ( );     # eg: TAG => [ qw!name1 name2! ],

        # your exported package globals go here,
        # as well as any optionally exported functions
        @EXPORT_OK   = qw($Var1 %Hashit &func3);
    }
    our @EXPORT_OK;

    # exported package globals go here
    # (declared with our so that they compile under use strict)
    our $Var1;
    our %Hashit;

    # non-exported package globals go here
    our @more;
    our $stuff;

    # initialize package globals, first exported ones
    $Var1   = '';
    %Hashit = ();

    # then the others (which are still accessible as $Some::Module::stuff)
    $stuff  = '';
    @more   = ();

    # all file-scoped lexicals must be created before
    # the functions below that use them.

    # file-private lexicals go here
    my $priv_var    = '';
    my %secret_hash = ();

    # here's a file-private function as a closure,
    # callable as &$priv_func;  it cannot be prototyped.
    my $priv_func = sub {
        # stuff goes here.
    };

    # make all your functions, whether exported or not;
    # remember to put something interesting in the {} stubs
    sub func1      {}    # no prototype
    sub func2()    {}    # proto'd void
    sub func3($$)  {}    # proto'd to 2 scalars

    # this one isn't exported, but could be called!
    sub func4(\%)  {}    # proto'd to 1 hash ref

    END { }       # module clean-up code here (global destructor)

    ## YOUR CODE GOES HERE

    1;  # don't forget to return a true value from the file

Then go on to declare and use your variables in functions without
any qualifications. See L<Exporter> and L<perlmodlib> for
details on mechanics and style issues in module creation.

Perl modules are included into your program by saying

    use Module;

or

    use Module LIST;

This is exactly equivalent to

    BEGIN { require Module; import Module; }

or

    BEGIN { require Module; import Module LIST; }

As a special case

    use Module ();

is exactly equivalent to

    BEGIN { require Module; }

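The equivalences above can be tried with modules from the standard
distribution. This sketch uses List::Util and Scalar::Util purely as
convenient examples, and writes the import as a method call,
C<< List::Util->import(...) >>, instead of the indirect object form
C<import Module LIST>.

```perl
use strict;
use warnings;

# Roughly what "use List::Util qw(sum max);" does:
BEGIN {
    require List::Util;
    List::Util->import(qw(sum max));
}

# Because the import ran at compile time, sum and max already parse
# as list operators here, without parentheses.
my $total   = sum 1, 2, 3, 4;
my $largest = max 1, 2, 3, 4;

# "use Module ()" loads the module but calls no import method,
# so its functions must be called by their qualified names.
use Scalar::Util ();
my $is_number = Scalar::Util::looks_like_number('42');

print "$total $largest\n";          # 10 4
```
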
All Perl module files have the extension F<.pm>. The C<use> operator
assumes this so you don't have to spell out "F<Module.pm>" in quotes.
This also helps to differentiate new modules from old F<.pl> and
F<.ph> files. Module names are also capitalized unless they're
functioning as pragmas; pragmas are in effect compiler directives,
and are sometimes called "pragmatic modules" (or even "pragmata"
if you're a classicist).

The two statements:

    require SomeModule;
    require "SomeModule.pm";

differ from each other in two ways. In the first case, any double
colons in the module name, such as C<Some::Module>, are translated
into your system's directory separator, usually "/". The second
case does not, and would have to be specified literally. The other
difference is that seeing the first C<require> clues in the compiler
that uses of indirect object notation involving "SomeModule", as
in C<$ob = purge SomeModule>, are method calls, not function calls.
(Yes, this really can make a difference.)

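Because the string form of C<require> does no translation, code that
knows a module name only at run time has to build the file name itself.
A minimal sketch, in which C<load_module> is a hypothetical helper and
File::Spec is just a module assumed to be installed:

```perl
use strict;
use warnings;

# Hypothetical helper: load a module whose name is known only at run time.
sub load_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # File::Spec -> File/Spec.pm
    require $file;                            # string form: no translation
    return $file;
}

my $file = load_module('File::Spec');
print "$file loaded from $INC{$file}\n";
```

The C<%INC> hash records each loaded file under the same
F<File/Spec.pm>-style key, which is why the helper returns that string.
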
Because the C<use> statement implies a C<BEGIN> block, the importing
of semantics happens as soon as the C<use> statement is compiled,
before the rest of the file is compiled. This is how it is able
to function as a pragma mechanism, and also how modules are able to
declare subroutines that are then visible as list or unary operators for
the rest of the current file. This will not work if you use C<require>
instead of C<use>. With C<require> you can get into this problem:

    require Cwd;                # make Cwd:: accessible
    $here = Cwd::getcwd();

    use Cwd;                    # import names from Cwd::
    $here = getcwd();

    require Cwd;                # make Cwd:: accessible
    $here = getcwd();           # oops! no main::getcwd()

In general, C<use Module ()> is recommended over C<require Module>,
because it determines module availability at compile time, not in the
middle of your program's execution. An exception would be if two modules
each tried to C<use> each other, and each also called a function from
that other module. In that case, it's easy to use C<require>s instead.

Perl packages may be nested inside other package names, so we can have
package names containing C<::>. But if we used that package name
directly as a filename it would make for unwieldy or impossible
filenames on some systems. Therefore, if a module's name is, say,
C<Text::Soundex>, then its definition is actually found in the library
file F<Text/Soundex.pm>.

Perl modules always have a F<.pm> file, but there may also be
dynamically linked executables (often ending in F<.so>) or autoloaded
subroutine definitions (often ending in F<.al>) associated with the
module. If so, these will be entirely transparent to the user of
the module. It is the responsibility of the F<.pm> file to load
(or arrange to autoload) any additional functionality. For example,
although the POSIX module happens to do both dynamic loading and
autoloading, the user can say just C<use POSIX> to get it all.

=head1 SEE ALSO

See L<perlmodlib> for general style issues related to building Perl
modules and classes, as well as descriptions of the standard library
and CPAN, L<Exporter> for how Perl's standard import/export mechanism
works, L<perltoot> and L<perltootc> for an in-depth tutorial on
creating classes, L<perlobj> for a hard-core reference document on
objects, L<perlsub> for an explanation of functions and scoping,
and L<perlxstut> and L<perlguts> for more information on writing
extension modules.