[RndTbl] PHP undefined vars / array indices

J. King jking at jkingweb.ca
Mon Jan 10 12:43:24 CST 2022


On Sat, 2022-01-08 at 03:34 -0600, Trevor Cordes wrote:
> Any PHP coders out there?

Hi, Trevor. PHP coder out here. A subscriber friend of mine pointed me
to your message, and was kind enough to send it to me unmodified so as
not to break threading (hopefully it works).

> So PHP really did it in 8.0: They really made accessing undefined
> vars
> *and* array indices a full warning rather than a notice.  I had
> written in
> past newsletters that I hoped this didn't mean what it sounded like
> it
> meant.
> 
> But it does.
> 
> They really did it.
> 
> https://wiki.php.net/rfc/engine_warnings
> 
> I finally had to face the music with the upgraded MUUG server which
> now
> uses PHP8.  It was coming for me in Fedora 25 anyhow (maybe 4 more
> months
> to postpone that change).  The MUUG server is about 80% code that
> Allan
> Pollard wrote (thanks Allan!) and 20% my own code.  Neither wrote
> according to PHP8's magic new requirements.
> 
> Anyone here who does PHP and hasn't treated it as a typed / declared
> language, because it is not one, is in for a great deal of pain when
> they
> are forced to upgrade to PHP 8.  I'm not sure I've ever come across
> existing PHP code where the dev treated it as a language where you
> must
> pre-declare everything.  Of course, such people must exist, as
> exhibited
> by the voting record for this PHP RFC.  Perhaps they are people who
> only
> recently took up PHP and/or are coming from stricter languages.

I sympathize with your plight as you're dealing with an older code-
base, but I sympathize only so much. Notices are not and have never
been spurious messages which are safe to ignore: they are indications
of possible logic errors, and a well-disciplined PHP programmer ought
to be using error_reporting(\E_ALL) when testing and addressing each
and every notice (and deprecation warning!) they encounter. Moreover,
recommended configuration today is to print no error messages
whatsoever (they should all go to a log file) when deploying to the
public, so the change really shouldn't have any effect on a host which
is properly configured; you'll just have more warning to look at in
logs instead of notices.

In my view it was a mistake for PHP deployments in the PHP 4 shared-
host days to typically suppress notices; you had to go out of your way
to even know whether you were writing robust code, and it's given the
impression that a lot of coding behaviour which has always been
considered suspect is actually perfectly fine.

> And you just know that while, at the moment, this change is limited
> to
> making it a warning, the tyrants are itching to make this a full
> explosion
> error. You can see it in the voting ("Change undefined variable
> severity
> to?": vote 36 to 28).  Luckily for us PHP requires a 2/3 majority to
> pass
> an RFC.

I disagree with your characterization of the 36 who voted in favour of
an error exception as tyrants, though I do agree it would have been a
step too far. The elimination of simple-majority votes on RFCs is,
funnily enough, perhaps one of the best changes to PHP in recent
memory: it holds its stewards to a higher standard of feature design,
which means we've tended to get better features since.

I should point out (since I don't believe you did) that the RFC had
separate votes for "undefined variable" and "undefined array index",
and throwing an exception was never a voting option for the latter.
Thus I suspect it was recognized that the former would be easier to
adapt to, and the latter impractical. It was presumably also a
recognition that a missing array index is more likely to be a data
quality problem, whereas reading an undefined variable is (almost)
always going to be a logic error on the part of the programmer.

> To "fix" these errors in code, you can in most cases use a lot of
> tricks,
> like massive use of ?? everywhere.  But that makes the code look
> really
> ugly and I start to question why?  And if you use list() to capture
> results of a function which can return variable type of variable
> length
> arrays, you're really in trouble as there's no current syntax to
> combine
> ?? or isset() with that.

Your example with list() is a good one. At its core this is a run-of-
the-mill input validation concern, though. Example:

[jking at odin ~]$ php -a
Interactive shell

php > error_reporting(\E_ALL);
php > $v = ["ook", "eek"];
php > [$ook, $eek, $ack] = $v;
PHP Warning:  Undefined array key 2 in php shell code on line 1
php > [$ook, $eek, $ack] = array_pad($v, 3, null);
php > 

That notices have historically been hidden does not excuse you from
checking your inputs and/or correcting them where needed.

> Sure, you can go in and turn your PHP into C by pre-declaring every
> variable, if you can spot them all in complex functions or global
> scope
> code.  But that only helps you with the "undefined variables" thing,
> not
> the "undefined array index" thing.  The latter is far more insidious
> because your index can be a variable itself!  So for that case you
> must
> always use isset() or ?? or some other trick.
> 
> I could go into more examples of what I had to do to the MUUG code if
> anyone is interested.

I'll grant you that dealing with undefined array indices used to be
kind of awkward, such that you had to pepper your code with a lot of
isset() or array_key_exists() (depending on whether null is an expected
value for an array member), but that's nevertheless what you had to do
to have robust logic. The ?? operator is there to make correct code
more concise, not just to make PHP shut up, and I was very happy about
its introduction since I could eliminate a lot of wordy branches.

> No tyrrany.  Freedom and choice.  Perl, like PHP has never been
> strictly
> typed or declared.  Perl will never force you to use strict.  Never. 
> So
> why is PHP?

In my opinion you proceed from a false assumption. As I stated above,
accessing an undefined variable has always been an error (at least
since PHP 4, if not earlier), albeit a putatively mild one. It is now
merely less-mild.

PHP is not strictly typed (though it does have a strict-type mode since
PHP 7.0), but it has always preferred that one declares one's
variables, and that preference has now been made stronger with PHP 8.0.

> I have some projects with over 100k lines of PHP code.  The simple
> fact
> that "fixing" this will make my code look extremely ugly, harder to
> read,
> less maintainable, and probably more prone to bugs is all the reason
> I
> need to decide that I will never "fix" my code.  Not to mention it
> will be
> probably 20k lines of code that need to be "fixed", and it's a
> pointless
> pursuit because in the end I'll never find them all.  (grep -P '\['
> ... ya
> right.)
> 
> So it will be log & ignore, or not log at all (though I still will
> need to
> see legit warnings somehow); or git clone php and change 2 lines of
> code
> and maintain my own version forever.  In fact, it may be time to do a
> fork
> of PHP: sane-php or something.

Again, I sympathize since you're dealing with an old code-base. I
nevertheless believe you've been doing things wrong by treating notices
as no big deal, and you're probably setting yourself up for more tears
by not adapting now.

> Lastly, I'm willing to debate the philosophical points, especially
> using
> the RFC's wording as a basis; however, I'm not sure anyone is
> interested
> in taking the other side (please, hardcore PHP programmers only). 

I present as my bona fides the following:

- A news aggregator which implements four different client-server sync
protocols:
<https://thearsse.com/>
- An HTML parser which actually conforms to the specification:
<https://packagist.org/packages/mensbeam/html-parser>
- A character decoder which decodes bytes in various encodings into
code points or UTF-8 characters:
<https://packagist.org/packages/mensbeam/intl>
- A Content-Type header-field parser:
<https://packagist.org/packages/mensbeam/mimesniff>


The latter two support the HTML parser, and the HTML parser will
eventually support the news aggregator. 

It may interest you to know that the character decoder does blindly
access beyond the end of a string (itself a warning since time
immemorial), but suppresses the error at the call site (see
<https://github.com/mensbeam/intl/blob/07d26e3f45c3a3167eb6389572419d3bda7ff5e1/lib/Encoding/AbstractEncoding.php#L75
>). This is something I should probably change since it can be a
problem if a user of the library implements a custom error handler.

Further, the tokenizer of the HTML parser is one giant loop (see
<https://github.com/mensbeam/HTML-Parser/blob/37f0fa8647ead67e5e9f356efe91df8825094ee8/lib/Parser/Tokenizer.php#L245
>), and not-yet-emitted tokens can be created in one iteration of the
loop and manipulated in others. In the tokenizer an access of an
uninitialized variable is unquestionably an error: a branch which
manipulates a token should never have been encountered before the
branch which initializes the token. Not having an indication of this
(be it a notice or warning) would have made debugging harder, not
easier.

The news aggregator has required me to write tons of code to sanitize
input, including a function to give me more reasonable type juggling
(see
<https://github.com/mensbeam/arsse/blob/b6605080096918c8d6f8378aa6c31db0dbc2b5bb/lib/Misc/ValueInfo.php#L62
>) and a function for each sync protocol which makes sure the input is
never missing any expected associative array keys and always has the
correct type (see
<https://github.com/mensbeam/arsse/blob/b6605080096918c8d6f8378aa6c31db0dbc2b5bb/lib/REST/TinyTinyRSS/API.php#L15
> for the Tiny Tiny RSS implementation)

All of this is validated by tons and tons of automated tests which
cover every single line of code (unless impractical or impossible to
cover with PHPUnit). None of it should ever emit a notice (though one
or two have slipped through over the years), and this discipline has
unquestionably made my code better.

-- 
J. King <jking at jkingweb.ca>


More information about the Roundtable mailing list