
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>My Learning Website</title>
<link href="/styles/styles.css" rel="stylesheet" type="text/css">	
<link href="/programming/styles/styles.css" rel="stylesheet" type="text/css">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
</head>
<body>
	
<div class="banner">
	<h1 class="courselink">Perl 5 Essential Training</h1>
	<h2 class="lecturer">LinkedIn Learning : Bill Weinman</h2>
	<h2 class="episodetitle">File IO</h2>	
</div>
	
<article>
	<h2 class="sectiontitle">Understanding Streams and Files</h2>
	<p>We tend to see a file as a block of data on some kind of media such as a hard drive or an SD card.  An operating system would tend to view a file as a stream of data and this allows the system to buffer the data and to provide it to client processes in manageable chunks.</p>
	<p>One consequence of this is that really anything that provides a stream of data can be thought of as being a file.  This includes a number of things that we might not normally consider to be a file and these are shown in figure 124.</p>
	<img src="images/image5.png" alt="Different things that are treated as files">
	<p class="caption">Figure 124 - different things that the operating system will treat as a file</p>
	<p>So, Perl handles these things in ways that are similar or even indistinguishable.  From the perspective of code, this means that they all work in pretty much the same way.  It is also important to remember that in this course, and particularly in this chapter, the term file will generally refer to one of these things as an input stream.</p>
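	<p>As a minimal sketch of this idea, the code below reads from a string held in memory using exactly the same loop we would use for a file on disk (the open function is covered in the next section).  The variable names and the sample text are assumptions made up for this illustration and are not part of the course files.</p>
	<pre class="inset">
#!/usr/bin/perl
use 5.28.0;
use warnings;

# Assumption: $text stands in for any source of data; it is not a course file.
my $text = "first line\nsecond line\n";

# Opening a reference to a scalar gives a file handle that reads from memory.
open(my $fh, '&lt;', \$text) or die "Cannot open in-memory file: $!";

my @lines;
while (my $line = &lt;$fh&gt;) {
    chomp $line;
    push @lines, $line;
}
close $fh;

# STDOUT is itself a file handle, so print can be pointed at it explicitly.
print STDOUT "Read ", scalar(@lines), " lines\n";</pre>
	<p>The loop neither knows nor cares that the data is coming from memory rather than from a disk, which is the point of treating everything as a stream.</p>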
	<h2 class="sectiontitle">Using File Handles</h2>
	<p>A file handle is pretty similar to a reference and it points to a file.  This is demonstrated in figure 125.</p>
	<pre class="inset">
		1.	#!/usr/bin/perl
		2.	# filehandle.pl by Bill Weinman &lt;http://bw.org/contact/&gt;
		3.	 
		4.	use 5.28.0;
		5.	use warnings;
		6.	 
		7.	my $filename = 'lines.txt';
		8.	 
		9.	open(my $fh, '&lt;', $filename) or die "Cannot open file: $!";
		10.	print while &lt;$fh&gt;;
		11.	close $fh;</pre>
	<p class="caption">Figure 125 - filehandle.pl, a simple demonstration of the use of a file handle</p>
	<p>On line 7, we have a scalar variable which represents a filename.  On line 9, we use the built-in open function to open the file and this takes three arguments as follows:</p>
	<pre class="inset">
		$fh			This is the file handle
		'&lt;'		This is the mode for opening the file, the less-than sign (the left bracket) is used to open the file to read and we would use the greater-than sign (the right bracket), '&gt;', if we wanted to open it to write to it
		$filename	This is the name of the file we want to open</pre>
	<p>Note that die is another built-in function that will print an error message and exit the program if the file can’t be opened.  So, the result is that we will either open the file and continue to execute the code or the program will terminate with the error message.</p>
	<p>To demonstrate the latter first, if we change the name of the filename on line 7 to, say, lines2.txt which doesn’t exist, when we run the code, we see the following output.</p>
	<p class="inset">Cannot open file: No such file or directory at filehandle.pl line 9.</p>
	<p>Here, we have the string from line 9 output, including the system error message which is represented by $! in the code.</p>
	<p>Assuming the file exists and there are no errors, the while loop on line 10 outputs each line of the file in turn.  Note that it is using the file handle to do that so the file handle is acting as a reference to the file.  The angle brackets around the file handle are Perl’s line-reading operator.  In a while condition like this one, each read returns the next line of the file and stores it in the default variable, $_, which print then outputs.  The loop ends when there are no more lines to read.</p>
	<p>On line 11, we use the file handle again, this time with the built-in function, close, which closes the file.</p>
	<p>We can combine the last two parameters of the open function into a single string so this would look like</p>
	<p class="inset">open(my $fh, "&lt; $filename") or die "Cannot open file: $!";</p>
	<p>Of course, we need to use double quotes here so that string interpolation replaces the variable name with the actual name of the file.  This might be a tricky error to spot, but if you use single quotes, the filename will be interpreted as the literal text $filename rather than lines.txt and we will get the error message again</p>
	<p class="inset">Cannot open file: No such file or directory at filehandle.pl line 9.</p>
	<p>If we use double quotes, it works correctly and we see exactly the same output that we saw before.</p>
	<p>Another interesting point is that each line is output using the print function.  You may recall that the difference between print and say is that say tags a new line character on the end of the string so each line of output is displayed on a new line.</p>
	<p>The print function doesn’t do that.  However, since we are reading these lines from a file, each line of the file has its own new line character, so we don’t need to provide this and so print gives us the desired neat output</p>
	<pre class="inset">
		01: This is line 1
		02: This is line 2
		03: This is line 3
		04: This is line 4
		05: This is line 5
		06: This is line 6
		07: This is line 7
		08: This is line 8
		09: This is line 9
		10: This is line 10</pre>
	<p class="caption">Figure 126 - the output we get when printing a file line by line with the print function</p>
	<p>This also means that if we had used say instead of print, we would get an extra new line character for each line and so the output would look like this</p>
	<pre class="inset">
		01: This is line 1

		02: This is line 2

		03: This is line 3

		04: This is line 4

		05: This is line 5

		06: This is line 6

		07: This is line 7

		08: This is line 8

		09: This is line 9

		10: This is line 10</pre>
	<p class="caption">Figure 127  - the output we get when printing a file line by line with the say function</p>
	<p>So, we can use print and we will get the neat sort of output we normally use say to achieve.  Normally, of course, the difference between the two is that we might use print when we don’t want a line feed at the end of every line we send to output.  When we are reading from a text file, we can still use print in that way, but it takes a little extra work.</p>
	<p>First of all, we will change the while loop on line 10 to</p>
	<p class="inset">while (my $line = &lt;$fh&gt; )  {</p>
	<p>Here, we want to do some processing on each individual line of text so each line that is read is assigned to the variable, $line.  Also, note that since we are doing a little bit more inside the while loop, the body of the loop is enclosed in curly braces.</p>
	<p>Inside the loop, we are going to remove the linefeed character at the end of every line and we do that with the chomp function.</p>
	<p class="inset">chomp $line;</p>
	<p>The chomp function removes the line ending from the end of a string or, more precisely, whatever the input record separator ($/) is set to, which is a new line character by default.  If there is no such character at the end of the line, it will not remove anything.</p>
	<p>Line endings differ between platforms.  Linux files (that is, text files at least) normally have a single new line character at the end of a line.  Windows uses an older system whereby there are two characters at the end of each line, a carriage return character followed by a line feed character.  On Windows, Perl translates that pair into a single new line character when it reads a file in text mode, so chomp removes the whole line ending.  If a file with Windows line endings is read on Linux, however, chomp removes only the line feed and leaves the carriage return at the end of the string.</p>
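	<p>The following is a small sketch, not taken from the course, of what chomp does with a line that ends in a carriage return and a line feed when no translation has taken place (for example, a Windows text file read on Linux).  With the default settings, chomp removes only the trailing new line character, so a substitution is one way to remove both.  The sample string and variable names are assumptions for illustration.</p>
	<pre class="inset">
#!/usr/bin/perl
use 5.28.0;
use warnings;

# Assumption: a line as it would arrive on Linux from a file saved on Windows.
my $line = "01: This is line 1\r\n";

my $chomped = $line;
chomp $chomped;                            # removes only the trailing "\n"
my $has_cr = $chomped =~ /\r\z/ ? 1 : 0;   # is a carriage return still there?

my $clean = $line;
$clean =~ s/\r?\n\z//;                     # removes "\n" or "\r\n"

say "Carriage return left after chomp: $has_cr";
say "Cleaned line: $clean";</pre>
	<p>Here $has_cr is 1, showing that the carriage return survives chomp, whereas $clean holds the text with no line ending at all.</p>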
	<p>Now, having removed the new line character, we can use say for the output</p>
	<p class="inset">say $line;</p>
	<p>which will give us the same output that we saw in figure 126.  If we use the print function in this scenario, our output will be</p>
	<p class="inset">01: This is line 102: This is line 203: This is line 304: This is line 405: This is line 506: This is line 607: This is line 708: This is line 809: This is line 910: This is line 10</p>
	<p>Going back to line 9</p>
	<p class="inset">open(my $fh, "&lt; $filename") or die "Cannot open file: $!";</p>
	<p>I mentioned the fact that the left bracket is used to open a file for read.  The line</p>
	<p class="inset">open(my $fh, "&gt; $filename") or die "Cannot open file: $!";</p>
	<p>would open a file for the purpose of writing to it.  This will create the file if it doesn’t already exist but it is important to remember that if the file does already exist, its contents are discarded as soon as it is opened.</p>
	<p class="inset">open(my $fh, "&gt;&gt; $filename") or die "Cannot open file: $!";</p>
	<p>This is similar to the single right bracket so, again, the file is being opened to write, but in this case, if the file already exists, any data we write to it is appended to the end of the file.</p>
	<p>If we add a plus sign to the left bracket</p>
	<p class="inset">open(my $fh, "+&lt; $filename") or die "Cannot open file: $!";</p>
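	<p>this will open an existing file to read and write without emptying it first.  Reading and writing both start at the beginning of the file, so anything we write replaces the existing data at that position rather than being added to the end.  The file must already exist, otherwise open will fail.</p>
	<p>Similarly, if we add the plus sign to the right bracket</p>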
	<p class="inset">open(my $fh, "+&gt; $filename") or die "Cannot open file: $!";</p>
	<p>this also opens a file for read and write, but the file is created if it doesn’t exist and emptied if it does, so any data previously in the file is lost as soon as it is opened.</p>
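	<p>To make the differences between these modes concrete, here is a hedged sketch that applies each one in turn to a scratch file and records the contents afterwards.  It uses the three-argument form of open that we saw in figure 125.  The slurp helper, the file name modes.txt and the use of a temporary directory are all assumptions made for this illustration.</p>
	<pre class="inset">
#!/usr/bin/perl
use 5.28.0;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a scratch file in a temporary directory, so no real data is touched.
my $dir  = tempdir(CLEANUP =&gt; 1);
my $file = "$dir/modes.txt";

# Hypothetical helper: return the whole file as one string.
sub slurp {
    my ($path) = @_;
    open(my $in, '&lt;', $path) or die "Cannot open $path: $!";
    local $/;                          # no record separator: read everything
    my $data = &lt;$in&gt;;
    close $in;
    return $data // '';
}

open(my $fh, '&gt;', $file) or die "Cannot open file: $!";
print $fh "AAAA\n";                    # creates the file and writes to it
close $fh;

open($fh, '&gt;&gt;', $file) or die "Cannot open file: $!";
print $fh "BBBB\n";                    # added after the existing data
close $fh;
my $after_append = slurp($file);

open($fh, '+&lt;', $file) or die "Cannot open file: $!";
print $fh "xx";                        # replaces the first two characters in place
close $fh;
my $after_update = slurp($file);

open($fh, '+&gt;', $file) or die "Cannot open file: $!";
close $fh;                             # the file was emptied when it was opened
my $after_truncate = slurp($file);

say "After append:   ", length($after_append), " characters";
say "After update:   ", length($after_update), " characters";
say "After truncate: ", length($after_truncate), " characters";</pre>
	<p>After the append, the file holds both lines.  After the update with the plus and left bracket, it is the same length but starts with xx.  After opening with the plus and right bracket, it is empty.</p>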
	<p>In figure 128, we have another version of filehandle.pl which is similar to the version we created to use the chomp function</p>
	<pre class="inset">
		1.	#!/usr/bin/perl
		2.	# filehandle.pl by Bill Weinman &lt;http://bw.org/contact/&gt;
		3.	 
		4.	use 5.28.0;
		5.	use warnings;
		6.	 
		7.	my $filename = 'lines.txt';
		8.	 
		9.	open(FH, "&lt; $filename") or die "Cannot open file: $!";
		10.	while ( my $line = &lt;FH&gt; ) {
		11.	        chomp $line;
		12.	        say $line;
		13.	}
		14.	close FH;</pre>
	<p class="caption">Figure 128 - filehandle.pl using a different type of variable for the file handle</p>
	<p>This version of the code is not too different from previous versions, but we are using a bare word, FH, for the file handle rather than a scalar variable.</p>
	<p>The most interesting thing about this version is that the file handle is a bare word and we haven’t used the keyword, my.  You might recall that my declares a variable that only exists within the enclosing scope.  A bare word file handle like FH is a global name, which is why we don’t use my with it.</p>
	<p>This is an older Perl construct, mentioned here because you may come across it in older code.  It works perfectly well, but a global file handle can cause problems (for instance, two parts of a program might use the same name without realising it) so it is better to create your file handle with a scoped variable, which is what we have been doing.</p>
	<p>I’ll change these back to a scoped variable and now we will open a second file to write to.  Before we do that, we will store the name of the output file in a variable with</p>
	<p class="inset">my $outfile = 'newfile.txt';</p>
	<p>We want to write to this file so we will open it with the right bracket and we will call its file handle $fh2 (we also change the name of the existing file handle from $fh to $fh1).</p>
	<p class="inset">open(my $fh2, "&gt; $outfile") or die "Cannot open output file: $!";</p>
	<p>Notice that we have given this one a slightly different error message if the file can’t be opened so if there is a problem, we will be able to see which of the two files we have the problem with.</p>
	<p>It may be worth noting that it is less likely that we would see a problem here.  If we try to open a file to read and it doesn’t exist, that’s probably the most common reason for not being able to open the file.  If we are opening it to write, it doesn’t matter and in fact, may be preferable, if the file doesn’t already exist so this won’t give an error.  In fact, aside from something like a disk or a directory not being accessible (for example if we try to open a file for write purposes in a directory where we don’t have write permissions), it is difficult to think of a straightforward or common scenario where this would give an error.</p>
	<p>The syntax inside the loop here might look a little confusing and this is something that you can do with either say or print</p>
	<p class="inset">print $fh2 $line;</p>
	<p>Here, we are telling print to send its output to file handle 2 and we are also telling it what to print (in this case $line).  So, in essence, we are passing two arguments to the function, but we are not separating them with a comma.</p>
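	<p>The sketch below shows three equivalent ways of sending output to a file handle.  Wrapping the file handle in curly braces is optional, but it can make it more obvious that the first item is the destination and not part of the data.  To keep the example self-contained, the output goes to a string in memory rather than a file, and the variable names are assumptions for illustration.</p>
	<pre class="inset">
#!/usr/bin/perl
use 5.28.0;
use warnings;

# Assumption: output goes to an in-memory buffer so no file is left behind.
my $buffer = '';
open(my $out, '&gt;', \$buffer) or die "Cannot open buffer: $!";

my $line = "01: This is line 1";

print $out $line, "\n";      # file handle, no comma, then the list to print
print {$out} $line, "\n";    # braces make the file handle stand out
say   $out $line;            # say works the same way and adds the new line

close $out;

# The buffer now holds the same line three times.
print $buffer;</pre>
	<p>Notice that the comma only ever appears between the items being printed, never between the file handle and the first item.</p>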
	<p>We can go ahead and run this and the output will go to the file we are writing to rather than the screen.  Since nothing is being sent to the screen, it can be reassuring to see a confirmation that the file did execute so we can add the line</p>
	<p class="inset">say 'Done!';</p>
	<p>This line of code isn’t reading or writing to the files so we can do this before or after the files are closed.  The completed version of this code, filehandle2.pl, is shown in figure 129.</p>
	<pre class="inset">
		1.	#!/usr/bin/perl
		2.	# filehandle.pl by Bill Weinman &lt;http://bw.org/contact/&gt;
		3.	 
		4.	use 5.28.0;
		5.	use warnings;
		6.	 
		7.	my $filename = 'lines.txt';
		8.	my $outfile = 'newfile.txt';
		9.	 
		10.	open(my $fh1, "&lt; $filename") or die "Cannot open file: $!";
		11.	open(my $fh2, "&gt; $outfile") or die "Cannot open output file: $!";
		12.	while ( my $line = &lt;$fh1&gt; ) {
		13.	        print $fh2 $line;
		14.	}
		15.	say "Done!";
		16.	 
		17.	close $fh1;
		18.	close $fh2;</pre>
	<p class="caption">Figure 129 - filehandle2.pl which demonstrates both reading from and writing to a file</p>
	<p>As a point of interest, I noticed that the course video mentions that both say and print can use the syntax shown on line 13 where we don’t have a comma between the arguments.  However, if we put a comma in, the output we get (bearing in mind that only line 15 should output anything to the screen) is</p>
	<pre class="inset">
		GLOB(0x2219a20)01: This is line 1
		GLOB(0x2219a20)02: This is line 2
		GLOB(0x2219a20)03: This is line 3
		GLOB(0x2219a20)04: This is line 4
		GLOB(0x2219a20)05: This is line 5
		GLOB(0x2219a20)06: This is line 6
		GLOB(0x2219a20)07: This is line 7
		GLOB(0x2219a20)08: This is line 8
		GLOB(0x2219a20)09: This is line 9
		GLOB(0x2219a20)10: This is line 10
		Done!</pre>
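	<p>GLOB(0x2219a20) is what a file handle looks like when it is converted to a string.  With the comma in place, print no longer treats $fh2 as the destination.  Instead, it treats $fh2 and $line as a list of two items to be printed and sends both of them to the screen, so nothing is written to the file.  In fact, if we check newfile.txt after running the code with</p>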
	<p class="inset">print $fh2, $line;</p>
	<p>we will see that it is empty and this would be true whether it already contained some text or not.  This is because we have opened it to write something to it using just a single right bracket which means that we will overwrite anything in the file.  We have then sent the output to the screen rather than the file, so we are not writing anything to it, therefore it is an empty file.</p>
	<p>This is something to be careful of because you might expect (as I did) that if you do open a file in this way, but don’t write anything to it, you will not overwrite the file but this is obviously not true.  To confirm this, I have corrected the code so that the output is written to it and then confirmed that newfile.txt has the same contents as lines.txt.</p>
	<p>I then commented out the print statement in the while loop so that the code just opens the file, prints ‘Done!’ and then closes the file.  I then looked at newfile.txt again and it was empty.  So this is something you may need to be careful with if you are working with files that already have some data in them.  Simply opening the file for write is enough to clear out the current contents.</p>
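	<p>One way to guard against this, shown here as a rough sketch only, is to check whether the file exists before opening it for write.  The helper name open_new_file and the scratch file are assumptions made for this example.  Another option, if the existing data should be kept, is to open the file with the double right bracket so that new data is appended.</p>
	<pre class="inset">
#!/usr/bin/perl
use 5.28.0;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a scratch file in a temporary directory stands in for newfile.txt.
my $dir     = tempdir(CLEANUP =&gt; 1);
my $outfile = "$dir/newfile.txt";

# Hypothetical helper: open a file for write only if it does not exist yet.
sub open_new_file {
    my ($path) = @_;
    return if -e $path;                # leave existing files alone
    open(my $fh, '&gt;', $path) or die "Cannot open output file: $!";
    return $fh;
}

# First call: the file does not exist, so it is created and written to.
my $fh = open_new_file($outfile) or die "File already exists";
print $fh "existing data\n";
close $fh;

# Second call: the file exists, so nothing is opened and nothing is lost.
my $second = open_new_file($outfile);
my $kept   = (-s $outfile) ? 1 : 0;    # -s is true if the file has content

# A plain open with '&gt;' empties the file even though we write nothing.
open($fh, '&gt;', $outfile) or die "Cannot open output file: $!";
close $fh;
my $is_empty = (-z $outfile) ? 1 : 0;  # -z is true if the file is empty

say "Data kept by the guarded open: $kept";
say "File empty after a plain open: $is_empty";</pre>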
	<p>In general, working with files using the file handles is pretty simple.  Perl also has an object-oriented interface for accessing files and we will look at that next.</p>
	<h2 class="sectiontitle">Using the OO Interface for Files</h2>
	<p>The object-oriented interface is another common way of handling files in Perl and it uses the IO::File module to do that.  This is the first time we have seen a module in our code and we will cover them in chapter 14.  For the time being, the code in figure 130 demonstrates this.</p>
	<pre class="inset">
		1.	#!/usr/bin/perl
		2.	# iofile.pl by Bill Weinman &lt;http://bw.org/contact/&gt;
		3.	 
		4.	use 5.28.0;
		5.	use warnings;
		6.	use IO::File;
		7.	 
		8.	my $filename = 'lines.txt';
		9.	 
		10.	my $file = IO::File-&gt;new("&lt; $filename") or die "Cannot open file: $!";
		11.	print while &lt;$file&gt;;
		12.	$file-&gt;close();</pre>
	<p class="caption">Figure 130 - iofile.pl which demonstrates file handling via the object-oriented interface</p>
	<p>The code here opens the file, lines.txt, outputs each line to the screen, one at a time, and then closes the file.  So, it does exactly the same thing that the code in figure 125 does, but it creates the file handle as an object and closes it with a method call rather than using the built-in open and close functions.</p>
	<p>The first thing to note is line 6.</p>
	<p class="inset">use IO::File;</p>
	<p>This is how we import a module in Perl.</p>
	<p>As before, we have created a variable to hold the filename on line 8.  On line 10, we are calling the constructor method, new, to create an IO::File object and assigning it to the variable, $file.  We are passing a string to the constructor which contains the access mode (in this example, read) and the name of the file.  Note that the file is opened when the object is created so there is no need to explicitly open it.  Actually, although we have a close statement on line 12, this is also not strictly needed because the file object is a scoped variable, so when it goes out of scope, the object is discarded and the file is closed automatically.</p>
	<p>We can use this type of syntax to copy a file as well and this code is shown in figure 131.</p>
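	<p>The IO::File constructor will also accept the mode as a separate second argument, which avoids building a combined string.  The sketch below uses that form and relies on scope to close the input file.  The scratch file and variable names are assumptions for illustration, and getlines (a method inherited from IO::Handle) is used to read all of the lines in one go.</p>
	<pre class="inset">
#!/usr/bin/perl
use 5.28.0;
use warnings;
use IO::File;
use File::Temp qw(tempdir);

# Assumption: a scratch file in a temporary directory stands in for lines.txt.
my $dir      = tempdir(CLEANUP =&gt; 1);
my $filename = "$dir/lines.txt";

# The mode is passed as a second argument instead of being part of the string.
my $out = IO::File-&gt;new($filename, '&gt;') or die "Cannot open output file: $!";
$out-&gt;print("01: This is line 1\n", "02: This is line 2\n");
$out-&gt;close;

my @lines;
{
    my $in = IO::File-&gt;new($filename, '&lt;') or die "Cannot open file: $!";
    @lines = $in-&gt;getlines;            # a list of all the lines in the file
}   # $in goes out of scope here, which closes the file

print @lines;</pre>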
	<pre class="inset">
		1.	#!/usr/bin/perl
		2.	# iofile.pl by Bill Weinman &lt;http://bw.org/contact/&gt;
		3.	 
		4.	use 5.28.0;
		5.	use warnings;
		6.	use IO::File;
		7.	 
		8.	my $filename = 'lines.txt';
		9.	my $outfile = 'newfile.txt';
		10.	 
		11.	my $file1 = IO::File-&gt;new("&lt; $filename") or die "Cannot open file: $!";
		12.	my $file2 = IO::File-&gt;new("&gt; $outfile") or die "Cannot open output file: $!";
		13.	 
		14.	while ( my $line = $file1-&gt;getline() )  {
		15.	        $file2-&gt;print($line);
		16.	}
		17.	 
		18.	say "Done!";</pre>
	<p class="caption">Figure 131 - iofile2.pl which uses the object-oriented interface to copy a file</p>
	<p>This is similar to the code shown in figure 129 and it does exactly the same thing.  We declare two variables to hold our filenames (lines 8 and 9).  The first file is opened for read on line 11 and the second for write on line 12, in both cases, using object-oriented syntax.</p>
	<p>The while loop is similar to the loop we saw in figure 129 and it does the same thing using object-oriented syntax.  In the condition on line 14, we have a file object ($file1) and we are using its getline method to retrieve lines one at a time, assigning each one to the $line variable, and the loop continues while there are lines to be read from the file.</p>
	<p>Inside the loop, we have the second file object ($file2) and we use the file’s print method to write the line to the file.</p>
	<p>Note that we haven’t taken the step of closing the files.  This is unnecessary since it happens automatically when the variable pointing to the file is no longer in scope.  We also don’t get any output from the while loop, so we conclude the program by outputting Done! when the loop has terminated.</p>
	<p>The code in figure 131 is probably a bit more verbose than the code in figure 129.  The trade-off is that the object-oriented code in figure 131 is more explicit.  For instance, it is clearer from scanning the code (visually) that the file is being read in line by line and that it is being written, line by line, to the second file.</p>
	<p>There is documentation relating to IO::File on Perldoc at <a href="https://perldoc.perl.org/IO::File">https://perldoc.perl.org/IO::File</a>. There’s not a huge amount of info here; we have details of the constructor and the open method but not much else.</p>
	<p>However, under Description, we can see that IO::File inherits from IO::Handle and IO::Seekable and the documentation for these can be found at <a href="https://perldoc.perl.org/IO::Handle">https://perldoc.perl.org/IO::Handle</a> and <a href="https://perldoc.perl.org/IO::Seekable">https://perldoc.perl.org/IO::Seekable</a>.</p>
	<p>Under the documentation for IO::Handle we have details of the getline and print methods amongst others.  The reason for this kind of separation is that IO::File is used for files in the conventional sense, that is files on a disk drive for example, but as was mentioned previously, Perl treats a lot of different things as though they were files so the methods used (such as print and getline) apply to more than just this type of file.</p>
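	<p>As a brief sketch of how the inherited methods fit together, the code below uses tell and seek (which come from IO::Seekable) alongside print, getline and eof (which come from IO::Handle) on a single IO::File object.  The scratch file and variable names are assumptions for illustration.</p>
	<pre class="inset">
#!/usr/bin/perl
use 5.28.0;
use warnings;
use IO::File;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempdir);

# Assumption: a scratch file in a temporary directory.
my $dir  = tempdir(CLEANUP =&gt; 1);
my $path = "$dir/lines.txt";

# Open for read and write, creating (or emptying) the file.
my $file = IO::File-&gt;new($path, '+&gt;') or die "Cannot open file: $!";
$file-&gt;print("first\nsecond\n");       # IO::Handle

my $end = $file-&gt;tell;                 # IO::Seekable: current position
$file-&gt;seek(0, SEEK_SET);              # IO::Seekable: back to the start

my $first = $file-&gt;getline;            # IO::Handle: read one line
chomp $first;
my $at_eof = $file-&gt;eof ? 1 : 0;       # IO::Handle: anything left to read?

say "First line: $first";
say "At end of file: $at_eof";
$file-&gt;close;</pre>
	<p>After reading the first line, there is still a second line waiting to be read, so $at_eof is 0.</p>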
	<h2 class="sectiontitle">Working with Binary Files</h2>
	<p>Linux, in common with other Unix-based operating systems, does not make a distinction between binary and text-based files, unlike Windows, for example, which does.</p>
	<p>The code shown in figure 132 copies an image file (binary).  We will run this on Windows to see what happens when we do that on a system that makes a distinction between binary and non-binary files and also on Linux to see what happens on a system that doesn’t.</p>
	<pre class="inset">
		1.	#!/usr/bin/perl
		2.	# copyfile.pl by Bill Weinman &lt;http://bw.org/contact/&gt;
		3.	 
		4.	use 5.18.0;
		5.	use warnings;
		6.	use IO::File;
		7.	 
		8.	my $fn1 = 'train-station.jpg';
		9.	my $fn2 = 'copy.jpg';
		10.	 
		11.	my $file1 = IO::File-&gt;new("&lt; $fn1") or die "Cannot open file: $!";
		12.	my $file2 = IO::File-&gt;new("&gt; $fn2") or die "Cannot open output file: $!";
		13.	 
		14.	my $buffer;
		15.	while (my $len = $file1-&gt;read($buffer, 102400)) {
		16.	    $file2-&gt;print($buffer);
		17.	}
		18.	 
		19.	say "Done.";</pre>
	<p class="caption">Figure 132 - copyfile.pl which demonstrates copying a binary file.</p>
	<p>Here, we have two file objects, one for an input file (train-station.jpg) and one for an output file (copy.jpg).  On line 11, train-station.jpg is opened for read and on line 12, copy.jpg is opened for write.</p>
	<p>The while loop looks a little strange, but essentially, each call to read copies up to 102,400 bytes from the input file into $buffer and returns the number of bytes it actually read.  That block of data is then written to the output file.  When there is nothing left to read, read returns 0, which is false, so the loop ends.</p>
	<p>We can open the original file (train-station.jpg) which, unsurprisingly, is a photo of a train station.  It is worth noting that the size of this file is 356,020 bytes.  When we run the program, the size of the copied file is 357,375 bytes so it is a little larger.</p>
	<p>If we try to open copy.jpg, we can see that this file appears to be corrupted, see figure 133.</p>
	<img src="images/image6.png" alt="Trying to view the copied file">
	<p class="caption">Figure 133 - the result of trying to view the copy we made (on Windows) of train-station.jpg</p>
	<p>Compare this with the copied file (shown in figure 134) if we run this on Linux.</p>
	<img src="images/image7.jpg" alt="Photo of a train station in Bangkok">
	<p class="caption">Figure 134 - copy.jpg which has been copied on a Linux system</p>
	<p>Actually, the image shown in figure 134 is the original file to save me the trouble of copying a file from my Raspberry Pi to my Windows machine, but when you view copy.jpg on the Linux system, it looks exactly the same.</p>
	<p>The obvious question is, why is there a difference?  There is a clue in the larger file size of copy.jpg on the Windows system (on a Linux-based system, copy.jpg is an exact copy of train-station.jpg so the file size is obviously the same).</p>
	<p>Recall that Windows makes a distinction between binary and text files.  In Windows, each line in a text file is terminated by a carriage return followed by a line feed.  Because we didn’t ask for binary mode, the output file was written in text mode, so every line feed byte in the image data was written as a carriage return and a line feed.  Each of these adds a byte and this is why we end up with a slightly larger file size.</p>
	<p>This is also, of course, corrupting the file so this is why we see an error when we try to view it.</p>
	<p>We can fix this in Windows by enabling binary mode on the files before we copy them (we do this after the files have been opened).  Going back to the code shown in figure 132, this means that we will be placing these commands between lines 12 and 14, giving us a revised version of the code which we can see in figure 135.</p>
	<pre class="inset">
		1.	#!/usr/bin/perl
		2.	# copyfile.pl by Bill Weinman &lt;http://bw.org/contact/&gt;
		3.	 
		4.	use 5.18.0;
		5.	use warnings;
		6.	use IO::File;
		7.	 
		8.	my $fn1 = 'train-station.jpg';
		9.	my $fn2 = 'copy.jpg';
		10.	 
		11.	my $file1 = IO::File-&gt;new("&lt; $fn1") or die "Cannot open file: $!";
		12.	my $file2 = IO::File-&gt;new("&gt; $fn2") or die "Cannot open output file: $!";
		13.	 
		14.	$file1-&gt;binmode;
		15.	$file2-&gt;binmode;
		16.	 
		17.	my $buffer;
		18.	while (my $len = $file1-&gt;read($buffer, 102400)) {
		19.	    $file2-&gt;print($buffer);
		20.	}
		21.	 
		22.	say "Done.";</pre>
	<p class="caption">Figure 135 - the revised version of copyfile.pl which uses binary mode to allow our image file to be correctly copied in Windows</p>
	<p>When we run this, we can check the file sizes again and we will see that copy.jpg is the same size as train-station.jpg.  We can also open it and it opens correctly, showing the image of the train station.</p>
	<p>So, when writing code like this, we need to remember to switch on binary mode if we want the code to work correctly on a Windows system.  However, if we are running the code on a system that doesn’t make that distinction between binary and text files (in other words, Unix, Linux or macOS and so on), switching on binary mode has no effect.  As such, it is probably better to include it so that if the code is ported over to a Windows system (or, indeed, any system where this distinction is being made) it is still going to work.</p>
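	<p>An alternative to calling binmode after opening, sketched below with the built-in open function rather than IO::File, is to request binary mode as part of the mode string by adding :raw.  This version also checks the value returned by read so that an error (an undefined value) can be told apart from the end of the file (zero).  The file names, the made-up test data and the small buffer size are assumptions for illustration.</p>
	<pre class="inset">
#!/usr/bin/perl
use 5.28.0;
use warnings;
use File::Temp qw(tempdir);

# Assumption: scratch files in a temporary directory stand in for the images.
my $dir = tempdir(CLEANUP =&gt; 1);
my ($fn1, $fn2) = ("$dir/source.bin", "$dir/copy.bin");

# Assumption: made-up binary data that includes line feed bytes.
my $payload = join '', map { chr($_) } 0 .. 255;
open(my $make, '&gt;:raw', $fn1) or die "Cannot create test file: $!";
print $make $payload;
close $make or die "Cannot close test file: $!";

# Adding :raw to the mode opens the file in binary mode from the start.
open(my $in,  '&lt;:raw', $fn1) or die "Cannot open file: $!";
open(my $out, '&gt;:raw', $fn2) or die "Cannot open output file: $!";

my $buffer;
while (1) {
    my $len = read($in, $buffer, 64);
    die "Read error: $!" unless defined $len;   # undef signals an error
    last if $len == 0;                          # 0 signals the end of the file
    print $out $buffer;
}
close $in;
close $out or die "Cannot close output file: $!";

my $same_size = (-s $fn1) == (-s $fn2) ? 1 : 0;
say "Copy is the same size as the original: $same_size";</pre>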
</article>
	
<div class="btngroup">
	<button class="button" onclick="window.location.href='references.html';">
			Previous Chapter - References and Structures
	</button>
	<button class="button" onclick="window.location.href='builtin.html';">
			Next Chapter - Built-In Functions
	</button>
	<button class="button" onclick="window.location.href='perl5essentialtraining.html'">
			Course Contents 
	</button>
	<button class="button" onclick="window.location.href='/programming/programming.html'">
			Programming Page
	</button>
	<button class="button" onclick="window.location.href='/index.html'">
			Home
	</button>
</div>
	
</body>
</html>