[RndTbl] PHP undefined vars / array indices

Adam Thompson athompso at athompso.net
Thu Jan 13 16:31:55 CST 2022


I don’t really have a horse in this race, but I think John made one factual error.  It’s minor and doesn’t change his point, but:
    if(!i++) { ... }
is perfectly valid in C, where 0==false.  It’s equivalent to (I think…):
    if(i==0) { i++; ... } else { i++; }
IIRC, C has the comma operator, albeit thankfully rarely used, but I don’t think it helps here:
    if(i==0, i++) { ... }
throws away the result of i==0 and tests the old value of i instead, which is the opposite condition.  I could be wrong, I haven’t attempted to write C code in 25+ years.  It’s not something you would want to see, but you certainly could write it.

FWIW, I agree with both sides in the original debate: PHP’s ultra-weak typing and initialization lay pervasive traps for programmers, whether they be incompetent, lazy, tired, or merely distracted.  (And based on my lived experience, the overwhelming majority of programmers – nay, *people* in general – are NOT “highly competent”: they’re “good enough”.)  But at the same time, this is a seemingly-gratuitous language change that reminds me VERY strongly of systemd all of a sudden being shoved down everyone’s throats, even though systemd introduces some good features that were previously lacking.  And list($a,$b,$c)=… is a useful notation in PHP.

Perhaps this is like Perl 6, where it’s nearly a completely different language from the previous version, sharing the name and basic syntax?  But at least there’s still a team patching security holes in Perl 5.  I have no confidence that would happen with PHP.
-Adam

From: Roundtable <roundtable-bounces at muug.ca> On Behalf Of John Lange
Sent: Thursday, January 13, 2022 2:18 PM
To: Continuation of Round Table discussion <roundtable at muug.ca>
Subject: Re: [RndTbl] PHP undefined vars / array indices

As a former professional PHP programmer and current hobbyist programmer (not in PHP though), I agree with Trevor. (disclaimer: I did not go back and re-read all the PHP threads on this topic).

PHP made a fundamental change to the way the language works which breaks backwards compatibility and has not provided any concrete evidence that supports the published justification for this change. "style" and "best practice" really are just opinions.

I also don't see any reason why PHP could not have defined a global "use_strict= true/false" parameter similar to the approach that Perl took years back. The default could even be "true", if they want to emphasize the importance of it.

However, I do agree that it's not good programming practice. Consider this example:

function Foo() {
 while ( $i < 5 ) {
  if (!$i++) {}
 }
 // ... (a whole bunch more lines of code go here) ..

 while ( $i < 5 ) { // inadvertently using the same variable because $i is your favorite 'counter' and you forgot you already used it
  if (!$i++) {  } // This line never runs
 }
}

On a side note, you would only ever see "!$i++" syntax in non-declarative languages like PHP. It makes no sense otherwise since integers can never be false. Aside from that, my personal preference is to code for readability and I find the statement is hard to interpret. So I prefer:

if ($i == false) {
  $i++;
  // ... other stuff
}

But nevertheless the point is I agree that PHP should not have broken backward compatibility. By doing so it will force many sites to remain on PHP 7.x thereby opening up the very real possibility that a 7.x security vulnerability will get exploited and cause mass-grief (log4j anyone?).

John


On Thu, Jan 13, 2022 at 3:04 AM Trevor Cordes <trevor at tecnopolis.ca> wrote:
On 2022-01-10 J. King wrote:
> I sympathize with your plight as you're dealing with an older code-

Thank you for your constructive and well-reasoned reply.  It is nice to
have another person to discuss this important issue with.  (I apologize
in advance for the length of this reply.)

I think it's helpful to dissect the issue into three parts: the
risk posed by the original status quo, the pain of change, and the
freedom of the programmer.  In advance, I will readily grant that the
uninitialized value (UV) problem is more often a possible "mistake" than
the uninitialized array index (UAI) one.  I would also guess it is much
rarer.  At least that is what my attempts to "fix" my code have
revealed.  But everything I discuss applies pretty much equally to both.
And the PHP8 RFC, though it separated the issues, ended up lumping them
into the same result: turning them both into warnings.  Thus it follows
that the arguments used to vilify one (UV) must closely or somewhat
apply to the other (UAI).

I've been using PHP (in production) since 1999 and v3.  I'm nearly
positive UV and UAIs were not even a notice back then.  In fact, with
the (yes, evil) register_globals they got rid of a long time ago (more
below), the whole idea of knowing if a var was initialized or not was
impossible (without jumping through the isset hoops which almost nobody
did, especially since ?? didn't yet exist).

So let's establish that PHP from v1 through v3 not only didn't care, it
kind of mandated the programmer not check for uninitialized variables
(UV).  In other words, it not only said it was "ok", it said "this is
the way it's done".  I can go into any of my ancient PHP3 books (from
various publishers) and guarantee you I will not find one example
that bothers to initialize a variable when there was no reason to.
(See example #0 at bottom.)

And that's ok, because many other loose, untyped, scripting languages
that were popular around 1999 were exactly the same way (e.g. perl).  In
fact, I would posit that they were designed this way!  It wasn't an
oversight or laziness: it was the desire to have the programming
language do more of the work for you, and to reduce the code line count
(vs C) required to get a job done (hence why the scripting languages
were considered "rapid" and (often) "prototype").

Were the developers of PHP in 1999 ignorant?  Or were they trying to
create a language suited for a purpose in a way that was similar to its
peers and required "less work" than the heavy alternatives?

So what's really the problem here?  All I hear from the "pro" side is
"times have changed", "it's not good practice", "it's not robust",
"legacy", "possible logic errors", "doing things wrong".  And they are
saying the original makers of PHP, all the book authors from back then,
and a generation of programmers who used it that way were/are wrong and
must change (certainly by the time the "warning" is turned into a
runtime error).  OK, someone could have that opinion, I get it,
especially someone using PHP for less than 5-10 years.  But before they
make me change 20k+ actual lines of code to make it uglier and less
readable (IMO) they better have some really good, concrete arguments.
The onus should be on the ones making the sweeping changes.

As an aside, I have a B.Sc in CompSci (around the Java era, though I
rarely used Java) and took every major related course, learning about 10
languages.  Not once was I told I had to "initialize my variables",
and certainly not check for UAI.  Not once did I lose a mark for not
doing so.  If the language being used required it, then you did it.  If
it didn't, then all that mattered was having a correct, readable
program.  Maybe that has changed and all CompSci teachers now mandate
initializing all variables in all programs in all languages.  However,
that kind of proves that this choice is rather arbitrary and more a
function of "style".

I looked for, but do not seem to have access to, the internal PHP
discussions generated by that RFC, so I cannot know what arguments
people were using on either side.  I know what my main argument is:
prove to me it's better, or safer, or any of the other condescending
descriptors used in favor of banning UVs/UAIs.  I have not been able to
find a single concrete example of how UVs actually achieve this level of
devilish behavior that will bring down the entire internet if left as
notices instead of warnings.  I would love to see some sample code a
real, half-competent programmer would use that could result in a
security hole.  I can assure you that not one single program I've
written in 30 years of script-language usage suffers from a bug caused
by this.  To bastardize Jerry Maguire: Show Me The Bugs!

I would guess/hope a competent programmer would think like I do (and the
original creators of PHP clearly did), and at every point they are going
to use a variable (that they aren't staring at a previous use of on the
same code page/scope) they say to themselves "this variable might be
unset".  The base assumption is *always* the variable is unset.
Further, if you stick with the general paradigm that a value of 0 in
your program is a negative indication (as is false, null, etc.) then
unset is just as good!  That is the precise reason one uses an
untyped/loose-typed language!  The language does the work of following
its rules to massage $x into the form needed for if(!$x) or ++$x!
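For illustration only, here is a minimal sketch of the coercion rules being relied on.  It uses an explicit null as a stand-in for an unset variable (an assumption made so the snippet runs without any notice or warning); the variable name is arbitrary.

```php
<?php
// Explicit null stands in for a variable that was never assigned.
$x = null;

var_dump(!$x);      // null is falsy, so this prints bool(true)
var_dump($x == 0);  // loose comparison treats null like 0: bool(true)

$x++;               // incrementing null yields the integer 1
var_dump($x);       // int(1)
```

The same coercions apply to a genuinely unset variable; the only difference is the diagnostic PHP emits on the first read.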

I've been told often in situations like this that "not all programmers
are competent" and so need such "hand holding" to protect themselves.
I can understand that point, but why should it be *forced* on all of
the competent programmers?  Why are we *forcing* the lowest common
denominator?  Why aren't we allowing people and projects that have such
people or needs to turn on a configuration option (or even make it the
default)?  Why don't we give those who don't need their hands held the
option to stick with the multi-decade status quo?  Perl did.  Even
insane-for-backwards-compatibility python did not institute a change
between v2 and v3 that required changes to 5-20% of a codebase!  At the
bottom of the "I know better", "hold their hands" slope lies Logo. That
is not useful to me.

I have a few personal philosophies.  One is "always forward, never
back", especially when it comes to computers.  I still use many perl
scripts I wrote in 1992+ and haven't modified since.  I use many
others that required a line or two changed once in a while when the OS
upgraded; in perl, php, python, js, etc.  Same reason I use
Linux/XFCE/sawfish instead of Windows or GNOME.  If I'm going to spend
a day coding or configuring, it's going to be to get something new done
(i.e. make a new program) that will move me forward and build on my
past efforts, not fighting with some arbitrary change foisted upon me
that breaks everything in a horrific manner just to get me back to
where I was yesterday!  Those types of OSs and languages and software I
expelled from my life ages ago.  When PHP makes UV/UAI a runtime error,
it will be the first time PHP has broken my rule in 23 years.
(Perl has never broken this rule in 30+ years.)

Register_globals makes for an interesting comparison.  It was on by
default, encouraged, and used everywhere until, what, PHP5 (or 4)?
That feature has plainly obvious and trivial examples as to why it can
be a risk and a security hole.  In fact, the thing that made it a risk
is precisely that it broke the promise that a var you didn't initialize
was going to be unset.  You had no clue what any random $identifier was
going to contain, because it was externally controlled.  The fact that
the feature was killed like a decade ago proves it was a massively
bigger risk than UV/UAI is now.

So what about the fix/mitigation/pain aspect?  Well, that was easy/quick
relative to the mess UV/UAI causes.  Just look at the handful of
get/post (and cookie) vars your page expects (something that can be
grepped!) and change them to $_GET[], et al, or "init" them yourself at
the top with $expected=$_GET['expected'].  A handful of lines to edit
with a very low probability of introducing bugs.  I remember when I had
to do this to every script I ever wrote and it was quick and easy and
painless.  So the overall equation of minimizing risk vs the pain of
the solution was extremely favorable.  Now apply that same calculation
to UV/UAI.  I challenge you to illustrate for me that the two variables
of the equation are similar to the register_globals situation.
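As a rough sketch of the register_globals-style cleanup described above: the helper below is hypothetical (the name readExpected and its parameters are not from PHP or from this thread) and simply copies the handful of expected request parameters into one array, with a default for anything the client did not send.

```php
<?php
// Hypothetical helper: pick out only the parameters the page expects,
// falling back to a default when one is missing.
function readExpected(array $query, array $defaults): array
{
    $vars = [];
    foreach ($defaults as $name => $default) {
        // ?? avoids the undefined-key diagnostic when $name was not sent
        $vars[$name] = $query[$name] ?? $default;
    }
    return $vars;
}

// On a real page $query would be $_GET; a literal keeps the sketch runnable.
$vars = readExpected(
    ['expected' => '42', 'injected' => 'x'],
    ['expected' => '', 'page' => '1']
);
print_r($vars); // only 'expected' and 'page' survive; 'injected' is ignored
```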

Likewise, I was appalled, and then relieved when the PHP devs said they
wanted to get rid of <? ?> short tags but decided against it.  It's
very similar to the UV/UAI issue because it's a solution in search of a
problem, something that has been used since day 1, actively encouraged,
and the risk/pain equation is horrifically lopsided.  And at least that
problem can be grepped!  UAI really cannot.  That <? ?> came as close
as it did to being deprecated/errored indicates that there is a real
problem with the PHP leadership/voters' mindset where many really don't
care at all about existing codebases or valuable programmer time.

> I disagree with your characterization of the 36 who voted in favour of
> an error exception as tyrants, though I do agree it would have been a
> step too far.
[...]
> notices as no big deal, and you're probably setting yourself up for
> more tears by not adapting now.

You agree it's a "step too far" but then hint in the second part what
we all know is coming: one day (soon) PHP devs will change UV into a
full blown error.  I bet many are itching to make UAI one too!  Give it
a few extra years.

However, I won't be crying, because I can, and will, tweak a few lines
of PHP source code and compile my own (rpm tools on RH-based systems
make this exceptionally easy).  I've decided UV/UAI is insanity,
especially if no one can explain how this is even a small security hole
for *me*, and the path of least pain with equal gain is to maintain my
own version, perhaps called "sane-php" or "freedom-php".

I know for a fact I won't be the only one.  UV/UAI and <? ?> nonsense is
precisely what prompts people to flee a project, or fork.  The reason
there's not an uproar yet is that some estimates have PHP8 usage at
only around 1% at the moment.  Just wait until RHEL ships it by default
and all the LTS Deb/Ubuntus with PHP7 go EOL.  Heck, even Fedora had to
delay PHP8 2 or 3 releases vs planned because it caused so much grief.

One more thought: this UV/UAI "fix" could actually cause more security
holes and less attention to notices/warnings overall because, as of
now, the easy fix is to disable logging of warnings.  If your site gets
reams of traffic, you'll practically be forced to go ~E_WARNING,
otherwise your log disks will fill up and you'll kill your SSD
lifespan.  That will cause "real" warnings (which I do want to see!) to
go completely unnoticed.  How is that a better result?  I won't be the
only one... It'll make dev a disaster because on a dev box, which will
have display_errors=on, I won't see any of the egregious warnings
in-page, making bug-free development that much harder.  If the goal is
to get more attention to warnings/notices, UV/UAI might actually do the
opposite.
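One possible middle ground, sketched here under assumptions: rather than masking all of E_WARNING, a custom handler could swallow only the UV/UAI warnings and let every other warning through.  The function name isUndefinedAccess is made up for this sketch, and matching on the message prefixes "Undefined variable" and "Undefined array key" assumes the PHP 8 wording; str_starts_with() needs PHP 8.0 or later.

```php
<?php
// Assumption: PHP 8 words these warnings as "Undefined variable $x"
// and "Undefined array key ...".
function isUndefinedAccess(string $message): bool
{
    return str_starts_with($message, 'Undefined variable')
        || str_starts_with($message, 'Undefined array key');
}

set_error_handler(
    function (int $errno, string $errstr): bool {
        // true  = handled here, so nothing is logged or displayed
        // false = fall through to PHP's normal handling
        return isUndefinedAccess($errstr);
    },
    E_WARNING // only warnings are routed through this handler
);
```

Whether string matching on messages is acceptable in production is a judgment call; it is shown only to make the trade-off concrete.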

> php > error_reporting(\E_ALL);
> php > $v = ["ook", "eek"];
> php > [$ook, $eek, $ack] = $v;
> PHP Warning:  Undefined array key 2 in php shell code on line 1
>
> That notices have historically been hidden does not excuse you from
> checking your inputs and/or correcting them where needed.

What if the function returns 2 values in some cases, and 3 in others?
Yes, that might not be a great design choice, but it's a legitimate one
in that it can be logically correct.  You are punishing the caller with
a warning for a design choice inside the function (which they may not
have written).  And you've given them no way to work around it without
a whole whack of extra, ugly lines of ?? or isset on a temporary result array
variable.  All that pain for what gain?  Show me the security hole.
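For reference, one possible workaround for the 2-or-3 values case, as a sketch only: array_pad() can fill the missing slots before destructuring.  The function lookup() below is hypothetical and merely stands in for a function that sometimes returns two values and sometimes three.

```php
<?php
// Hypothetical function that returns two values in some cases, three in others.
function lookup(bool $withExtra): array
{
    return $withExtra ? ["ook", "eek", "ack"] : ["ook", "eek"];
}

// array_pad() fills the missing slots with null, so the destructuring
// never touches an undefined key.
[$ook, $eek, $ack] = array_pad(lookup(false), 3, null);

var_dump($ack); // NULL
```

This only works cleanly for list-style arrays with sequential integer keys; string-keyed results would still need ?? or isset.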

> I'll grant you that dealing with undefined array indices used to be
> kind of awkward, such that you had to pepper your code with a lot of
> isset() or array_key_exists() (depending on whether null is an
> expected value for an array member), but that's nevertheless what you
> had to do to have robust logic.

Why?  As above, prove it's "robust" vs the alternative.  The RFC and
random forum/bug discussions I've seen about this do nothing but bandy
about the condescending descriptors with nary a shred of evidence of
said "unrobustness".

> > No tyranny.  Freedom and choice.  Perl, like PHP, has never been
> > strictly typed or declared.  Perl will never force you to use strict.
> > Never.  So why is PHP?
>
> In my opinion you proceed from a false assumption. As I stated above,
> accessing an undefined variable has always been an error (at least
> since PHP 4, if not earlier), albeit a putatively mild one. It is now
> merely less-mild.

What is the false assumption?  It is irrelevant that they made it a
"notice" in PHP2, 3, 4 or 5, and a warning in 8.  The main problem,
which you don't address, is the freedom to choose.  Why can't we have
that freedom?  Why do the 36 get to choose for the world and why do
they seem so uniquely oblivious to existing codebases?  I always
thought python was bad, but at least they admit that, by design, they
will screw you over every major release and always have!

We are not talking about a change that allows the PHP language
developers to rip out thousands of C lines of cruft or make the language
perform 30% faster.  We are talking about, I'm guessing, 1 to 10 lines
of C code changed and 0% speedup.  All for what?

> Again, I sympathize since you're dealing with an old code-base. I
> nevertheless believe you've been doing things wrong by treating
> notices as no big deal, and you're probably setting yourself up for
> more tears by not adapting now.

I always code perl with -w on, but strict off.  That's the equivalent
of seeing all notices/warnings in PHP.  In the olden days of PHP I'd log
E_ALL because only "sane" things were warnings... kind of like perl.  I
don't know where along the way I ~E_NOTICEd, but over the years PHP has
made enough non-problems into notices that, yes, I completely ignore
notices now.  Unlike in perl, they don't seem to ever match up with an
"oopsy" moment: they simply moan about reams of style choices that I
disagree with.

I trusted the PHP devs to make good decisions as to what they were
going to warn or deprecate, and I really have agreed with their choices
on everything... until now.  Now it feels like they've jumped on the
"change for change's sake" bandwagon like some other FLOSS projects; or
worse, the "our style is right and you'll like it" attitude.  But at
least most of those other projects, if forced to still use them,
provided some gravy!  There is zero upside *for me* for the PHP UV/UAI
changes.

Since I have no logic error or security hole in my program because of
UV/UAI, the only "doing things wrong" I've done is to trust that the PHP
braintrust will keep making sane choices and/or maintain my freedom to
choose.

I will end with a lame appeal to authority, but one that is meaningful
to me as a longtime fan:  Rasmus Lerdorf, inventor of PHP, UV vote: "keep
notice", UAI vote: "keep notice".  Rasmus gets it.  (Oh ya, and Linus
would never allow a change like this in the Linux Kernel without
massively obvious and good reasons.  And...)


APPENDIX:

Example #0:

function Foo() {
 while ( ... ) {
  if (!$i++) {}
 }
}

Is a perfectly reasonable/concise, if lazy, paradigm for detecting the
first time in a loop and keeping a count of iterations as a bonus.
This is now a warning in PHP8.  Is anything at all gained by putting
"$i=0" before the while, maybe at the top of the page or the start of a
function?  If I wanted to be forced to do that, I'd be using a language
that wants me to.  And even if one thinks $i=0 should be required, it's
not universal, as perl happily accepts that without warning even with
warnings on!  perl sees that and says "the programmer has his big boy
boots on and does not need coddling".  Even with strict on, as long as
you declared it with my($i) (even without initializing it!) perl
doesn't complain.
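For comparison, a small sketch of the same first-pass idiom with the explicit initialization that PHP 8 wants.  The function name labelItems and its output format are invented for this illustration.

```php
<?php
// Hypothetical helper: tags the first item differently and counts iterations.
function labelItems(array $items): array
{
    $i = 0;   // the one extra line that silences the PHP 8 warning
    $out = [];
    foreach ($items as $item) {
        // !$i++ is true only on the first pass (old value 0); $i then counts up
        $out[] = (!$i++ ? 'first: ' : 'next: ') . $item;
    }
    $out[] = "count: $i";
    return $out;
}

print_r(labelItems(['a', 'b', 'c']));
```

Behavior is identical to the uninitialized version; the only change is the added assignment.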



--
John Lange