Monday, 11 December 2017

linux - How to count lines in a document?

wc -l does not count lines.


Yes, this answer may be a bit late to the party, but I haven't seen anyone document a more robust solution in the answers yet.


Contrary to popular belief, POSIX does not require files to end with a newline character at all. Yes, the definition of POSIX 3.206 Line (https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206) is as follows:



A sequence of zero or more non- <newline> characters plus a terminating <newline> character.



However, what many people are not aware of is that POSIX also defines POSIX 3.195 Incomplete Line (https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195) as:



A sequence of one or more non- <newline> characters at the end of the file.



Hence, files without a trailing LF are perfectly POSIX-compliant.


If you choose not to support both EOF types, your program is not POSIX-compliant.
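
To make "supporting both EOF types" concrete, here is a minimal sketch of a read loop that treats an unterminated last line the same way as a terminated one. The function name count_started_lines and the sample file are hypothetical and only serve the illustration; this is one common shell idiom, not the only way to do it.

```shell
#!/bin/sh
# count_started_lines is a hypothetical helper, not a standard utility.
# It counts every line that was started, whether or not it was terminated.
count_started_lines() {
    n=0
    # read returns non-zero at end-of-file, but it still stores any
    # unterminated text in $line; the -n test keeps that last piece.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

# Sample input whose last line has no trailing newline.
sample=$(mktemp)
printf 'first\nsecond' > "$sample"
count_started_lines "$sample"   # prints 2
rm -f "$sample"
```

Without the `|| [ -n "$line" ]` part, the loop would silently drop the final, unterminated line.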


As an example, let's have a look at the following file.


1 This is the first line.
2 This is the second line.

No matter the EOF, I'm sure you would agree that there are two lines. You figured that out by looking at how many lines have been started, not by looking at how many lines have been terminated. In other words, as per POSIX, these two files both have the same number of lines:


1 This is the first line.\n
2 This is the second line.\n

1 This is the first line.\n
2 This is the second line.

The man page is relatively clear about wc counting newlines, with a newline just being a 0x0a character:


NAME
       wc - print newline, word, and byte counts for each file

Hence, wc doesn't even attempt to count what you might call a "line". Using wc to count lines can very well lead to miscounts, depending on the EOF of your input file.
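
The miscount is easy to reproduce. The sketch below writes the same two lines to two temporary files, once with and once without a trailing newline, and runs wc -l on each. The file names terminated.txt and unterminated.txt are made up for the illustration.

```shell
#!/bin/sh
# Build two files with identical text; only the final newline differs.
# The file names are hypothetical and exist only for this demonstration.
tmpdir=$(mktemp -d)
printf 'This is the first line.\nThis is the second line.\n' > "$tmpdir/terminated.txt"
printf 'This is the first line.\nThis is the second line.'   > "$tmpdir/unterminated.txt"

# Reading from stdin makes wc print only the number, without a file name.
# $(( ... )) strips the padding that some wc implementations add.
terminated_newlines=$(( $(wc -l < "$tmpdir/terminated.txt") ))
unterminated_newlines=$(( $(wc -l < "$tmpdir/unterminated.txt") ))

echo "terminated:   $terminated_newlines"     # prints 2
echo "unterminated: $unterminated_newlines"   # prints 1

rm -r "$tmpdir"
```

Both files contain two started lines, yet wc -l reports 1 for the second file because it contains only one 0x0a byte.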


POSIX-compliant solution


You can use grep to count lines the same way you did in the example above, by counting how many lines have been started. This solution is both more robust and more precise, and it supports all the different flavors of what a line in your file could be:


$ grep -c ^ FILE
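
As a quick check, the sketch below runs the grep command on two temporary files that differ only in their final newline. The file names are hypothetical, and the pattern is quoted as '^' because a few shells treat a bare ^ specially; the result is what common grep implementations such as GNU and BSD grep report.

```shell
#!/bin/sh
# Two files with identical text; only the final newline differs.
# The file names are hypothetical and exist only for this demonstration.
tmpdir=$(mktemp -d)
printf 'This is the first line.\nThis is the second line.\n' > "$tmpdir/terminated.txt"
printf 'This is the first line.\nThis is the second line.'   > "$tmpdir/unterminated.txt"

# '^' matches the start of every line, so -c counts started lines.
terminated_lines=$(grep -c '^' "$tmpdir/terminated.txt")
unterminated_lines=$(grep -c '^' "$tmpdir/unterminated.txt")

echo "terminated:   $terminated_lines"     # prints 2
echo "unterminated: $unterminated_lines"   # prints 2

rm -r "$tmpdir"
```

Both files report 2, which matches the count you arrived at by reading them.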
