Elmar Klausmeier's Blog

UNIX Process Substitution

Sat, 26 Oct 2024 17:00:00 +0200

Process substitution is available in ksh, bash, and zsh. More examples can be found here: Process Substitution.

Process substitution uses /dev/fd/ files to send the results of the process(es) within parentheses to another process.

Piping the output of one program to the input of another program is a very powerful way to run multiple programs at once without any auxiliary temporary files.

1. Sequential flow. Simple case:

program1 | program2 | program3 > file1

flowchart LR
    A(program1)
    B(program2)
    C(program3)
    U@{ shape: flag, label: "file1" }
    A --> B --> C --> U

2. Read from multiple programs. This is a quite common case.

program1 <(program2) <(program3)

flowchart TD
    A(program1)
    B(program2)
    C(program3)
    B --> A
    C --> A

It looks very similar to the usual command line:

program1 file1 file2
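To make the /dev/fd mechanism concrete, here is a minimal Python sketch of what the shell roughly does for `program1 <(program2)`. It assumes a Unix-like system that provides /dev/fd and the `echo` and `cat` commands; the function name `read_via_dev_fd` is made up for this illustration and is not part of any shell or library.

```python
import os
import subprocess


def read_via_dev_fd(producer_cmd, consumer_cmd):
    """Run consumer_cmd with one extra argument /dev/fd/N that is fed by producer_cmd.

    Hypothetical helper: mimics `consumer <(producer)` using an anonymous pipe.
    """
    read_fd, write_fd = os.pipe()
    # The producer writes its stdout into the pipe.
    producer = subprocess.Popen(producer_cmd, stdout=write_fd)
    # The parent must close its copy of the write end, otherwise the consumer never sees EOF.
    os.close(write_fd)
    try:
        # The consumer receives a file name, exactly like `program1 file1`.
        result = subprocess.run(
            consumer_cmd + [f"/dev/fd/{read_fd}"],
            pass_fds=(read_fd,),  # keep the read end open in the child under the same number
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        os.close(read_fd)
        producer.wait()
    return result.stdout


if __name__ == "__main__":
    # Roughly equivalent to: cat <(echo hello)
    print(read_via_dev_fd(["echo", "hello"], ["cat"]), end="")
```

The consumer never learns that its argument is a pipe; it simply opens the path it was given, which is why the command line looks like one with ordinary files.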

3. Pipe to multiple programs. One program, here program2, writes to multiple files given as command-line arguments, i.e., it has multiple outputs.

program1 | program2 >(program4 > file2) >(program5 > file3) \
         | program3 > file1

flowchart LR
    A(program1)
    B(program2)
    C(program3)
    D(program4)
    E(program5)
    U@{ shape: flag, label: "file1" }
    V@{ shape: flag, label: "file2" }
    W@{ shape: flag, label: "file3" }
    A --> B --> C --> U
    B --> D --> V
    B --> E --> W

Here is a real-world example for generating the statistics for this web-server:

time blogconcatlog 57 | pv | tee /tmp/a2 \
         | accesslogFilter -o >(blogstatcnt > /srv/http/statcnt.html) \
         | tee /tmp/a1 | blogurlcnt -m70 > /srv/http/urlstat2-m100.html

Vodafone Internet Outage #2

Mon, 14 Oct 2024 16:23:00 +0200

Today, 14-Oct-2024, starting at 16:23 (CEST), the internet connection provided by Vodafone was unavailable.

At 20:51 the internet was available again. So for roughly four and a half hours I had no internet and the blog went dark. Luckily, the IP address stayed the same, so I had no delay in DNS.

The DOCSIS signal-to-noise ratio is now good again.

1. Contact support. I tried to call the Vodafone hotline but gave up after waiting on hold for more than 25 minutes. According to their own website, they have a major outage in my region. They say they need three days to fix it. This would mean: my homepage, i.e., this blog, is unavailable for three days.

I used 5G to connect to their website and requested an SMS notification for when the internet is available again. I never received any SMS from them.

2. Google fonts. I have now learned the hard way that not only is my blog unavailable from outside my home, but inside my home the blog also suffers a severe and annoying delay. Reason: I use Google fonts, and as Google is not reachable for me, everything is lagging. This makes me think about:

That way there might be a better buffering of my content.

3. History of outages. I blogged about previous internet outages:

In 2022 the Vodafone internet router was defective.
No internet on 08-Jan-2024.

4. Uptime monitors. The uptime monitors correctly diagnosed the problem.

Hetrix
- Down: Noticed at: 2024-10-14 15:31:00 (UTC+01:00)
- Up: Noticed at: 2024-10-14 20:00:42 (UTC+01:00)
BetterUptime
- Down: Started at: 14 Oct 2024 at 04:46pm CEST
- Up: Resolved at: 14 Oct 2024 at 10:15pm CEST (automatically)
UptimeRobot
- Down: Incident started at 2024-10-14 17:06:17
- Up: Resolved at 2024-10-14 20:05:22
ping-failure-test every 30 minutes

Mon Oct 14 04:39:20 PM CEST 2024
Mon Oct 14 05:09:06 PM CEST 2024
Mon Oct 14 05:39:06 PM CEST 2024
Mon Oct 14 06:09:06 PM CEST 2024
Mon Oct 14 06:39:06 PM CEST 2024
Mon Oct 14 07:09:06 PM CEST 2024
Mon Oct 14 07:39:06 PM CEST 2024
Mon Oct 14 08:09:06 PM CEST 2024
Mon Oct 14 08:39:06 PM CEST 2024

Obviously, Hetrix is the most detailed monitor.

Hosting Static Content with DigitalOcean

Sun, 13 Oct 2024 16:00:00 +0200

I wrote about hosting static sites on various platforms:

Hosting Static Content with surge.sh
Hosting Static Content with now.sh, now.sh renamed themselves to vercel.app
Hosting Static Content with netlify.app
Hosting Static Content with Cloudflare
Hosting Static Content with Neocities
Hosting Static Content with GitLab
Hosting Static Content with GitHub

DigitalOcean offers what they call "App Platform". It is a "Platform-as-a-Service (PaaS)". They have "apps" for React, Django, Go, Jekyll, Next.js, Python, Ruby, Hugo, TypeScript, Hexo, PHP, Node.js, Gatsby, and some others.

Here, I confine myself to the "Sample App for Static HTML Assets". I only show how to use GitHub as the source for static files. GitLab would work much the same way.

1. Steps to set up. Here are the steps to create a static site on DigitalOcean.

Go to "App Platform"
Click "Create App"
Select one of the repositories, GitHub in this case

Provide access to your GitHub account, if you haven't done so
Choose proper GitHub repo

Whenever something changes in the GitHub repo, or you change the settings in DigitalOcean, a new build is initiated.

Unfortunately, the name used for DNS, here eklausmeier-5257s.ondigitalocean.app, cannot be changed. It is partly autogenerated.

2. Build configuration. The default build configuration for static sites is:

alerts:
- rule: DEPLOYMENT_FAILED
- rule: DOMAIN_FAILED
features:
- buildpack-stack=ubuntu-22
ingress:
  rules:
  - component:
      name: eklausme-github-io
    match:
      path:
        prefix: /
name: eklausmeier
region: fra
static_sites:
- environment_slug: html
  github:
    branch: master
    deploy_on_push: true
    repo: eklausme/eklausme.github.io
  name: eklausme-github-io
  source_dir: /

3. Limitations. DigitalOcean does have some annoying limitations. My blog cannot be fully hosted on the DigitalOcean app. If I remove all images then DigitalOcean can host it. So it seems there is some undocumented limit on the number of files.

In the free version you can only have up to three apps.

On The Stability Of The Solar System

Tue, 08 Oct 2024 20:20:00 +0200

Our solar system is not stable when considering time ranges of several gigayears. These are the results of simulations done by Laskar and Gastineau.

1. Nomenclature

Notation for ellipses, see the figure below:

center M
semi-minor axis b
semi-major axis a
(linear) eccentricity e, numerical eccentricity $\varepsilon$

$$ \pmatrix{x(t)\cr y(t)} = \pmatrix{a \cos(t)\cr b \sin(t)}, \qquad \varepsilon = {e\over a} = {\sqrt{a^2-b^2}\over a} = \sqrt{1 - \left({b\over a}\right)^2} $$

The numerical eccentricity describes by "how much" the shape of the ellipse differs from a circle: a value of zero means it is a circle, and the larger the value, the more deformed the ellipse.

Below are the numerical eccentricities of the planets in our solar system. The lengths of the semi-major axis a and the semi-minor axis b are given in AU.

Nr.	Planet	Eccentricity	a	b	#moons
1	Mercury	0.206	0.38700	0.37870	0
2	Venus	0.007	0.72300	0.72298	0
3	Earth	0.017	1.00000	0.99986	1
4	Mars	0.093	1.52400	1.51740	2
5	Jupiter	0.049	5.20440	5.19820	95
6	Saturn	0.057	9.58250	9.56730	146
7	Uranus	0.046	19.21840	19.19770	28
8	Neptune	0.010	30.11000	30.10870	16
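As a quick plausibility check, the eccentricity column can be recomputed from a and b with the formula $\varepsilon = \sqrt{1 - (b/a)^2}$. The following is a minimal Python sketch; the function name and the selection of three planets are just for illustration. Because a and b are rounded in the table, the recomputed values only agree to about three decimals.

```python
import math


def numerical_eccentricity(a: float, b: float) -> float:
    """Return sqrt(1 - (b/a)^2) for semi-major axis a and semi-minor axis b."""
    if not (a >= b > 0):
        raise ValueError("expected a >= b > 0")
    return math.sqrt(1.0 - (b / a) ** 2)


# (a, b) in AU, copied from the table above.
orbits = {
    "Mercury": (0.38700, 0.37870),
    "Earth": (1.00000, 0.99986),
    "Mars": (1.52400, 1.51740),
}

if __name__ == "__main__":
    for planet, (a, b) in orbits.items():
        print(f"{planet}: {numerical_eccentricity(a, b):.3f}")
```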

2. The Jovian Planets

Below text is from The Jovian Planets:

The four Jovian planets — Jupiter, Saturn, Uranus, and Neptune — are also called "giant planets". The Jovian planets occupy orbits in the outer solar system at distances ranging from 5 (Jupiter) to 30 (Neptune) times the Earth’s distance from the Sun. ...

Unlike the terrestrial planets that make up our inner solar system — Mercury, Venus, Earth, and Mars — the Jovian planets do not have solid surfaces. Instead, they are composed primarily of hydrogen and helium, with traces of methane, ammonia, water, and other gases in their atmospheres. These gases make up a deep atmosphere and become tightly compressed around relatively tiny cores of rock. At great depths within Jupiter, for example, the hydrogen gas is compacted so tightly that it exists in a rare metallic form.

3. Jovian problem

Below text is from N-body simulations: the performance of eleven integrators by P.W. Sharp.

The Jovian problem has the Sun, Jupiter, Saturn, Uranus and Neptune interacting through classical Newtonian gravitational forces. Let ${\bf r}_i$ denote the position of the i-th body, where the bodies are ordered Sun to Neptune and the coordinate system is three-dimensional Cartesian with the origin at the barycenter of the bodies. G is the gravitational constant, $m_i$ is the i-th mass. The differential equation is

$$ \ddot{\bf r}_i(t) = \sum_{j=1\atop j\ne i}^5 { G m_j \left({\bf r}_j(t) - {\bf r}_i(t)\right) \over \left\Vert {\bf r}_j(t) - {\bf r}_i(t) \right\Vert^3 }, \qquad i=1,\ldots,5. $$

Except for the omission of Pluto and a change in the coordinate system, the above equation is problem C5 from Nonstiff DETEST.

This problem becomes particularly demanding when the integration interval is long, e.g., ten million years.
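The right-hand side of this differential equation is short to write down. Below is a minimal Python sketch of Newtonian gravity, where each body is attracted along the difference vector divided by the cube of the distance. The function name, the default G = 1 and the unit masses in the test case are assumptions for illustration and are not taken from Sharp's paper.

```python
import math


def accelerations(masses, positions, G=1.0):
    """Return the acceleration of every body caused by all other bodies.

    masses:    list of n masses
    positions: list of n 3-tuples (x, y, z)
    """
    acc = []
    for i, r_i in enumerate(positions):
        a_i = [0.0, 0.0, 0.0]
        for j, r_j in enumerate(positions):
            if j == i:
                continue  # a body does not attract itself
            diff = [r_j[k] - r_i[k] for k in range(3)]
            dist = math.sqrt(sum(d * d for d in diff))
            factor = G * masses[j] / dist ** 3
            for k in range(3):
                a_i[k] += factor * diff[k]
        acc.append(a_i)
    return acc
```

An integrator would call this function at every step, which is why the cost per step grows quadratically with the number of bodies.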

A simple test for correctness is to use the total energy:

$$ E = {1\over2} \left[ \sum_{i=1}^5 \left( m_i \dot{\bf r}_i^2 - \sum_{j=1\atop j\ne i}^5 { G m_i m_j \over \left\Vert {\bf r}_j - {\bf r}_i \right\Vert } \right) \right]. $$

The total energy E must be constant over all time t. It is not a very accurate measure of correctness.
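The energy check can be sketched in a few lines of Python. The kinetic part uses the velocities of the bodies and the potential part sums over all ordered pairs, which is why everything is multiplied by one half. The function name and the default G = 1 are assumptions for illustration only.

```python
import math


def total_energy(masses, positions, velocities, G=1.0):
    """Return kinetic plus potential energy of a system of point masses."""
    # Sum of m_i * |v_i|^2 over all bodies (the factor 1/2 is applied at the end).
    kinetic = sum(m * sum(v * v for v in vel) for m, vel in zip(masses, velocities))
    # Sum of G m_i m_j / |r_j - r_i| over all ordered pairs i != j.
    potential = 0.0
    for i, r_i in enumerate(positions):
        for j, r_j in enumerate(positions):
            if j != i:
                potential += G * masses[i] * masses[j] / math.dist(r_i, r_j)
    return 0.5 * (kinetic - potential)
```

In a simulation one would evaluate this at the start and after each block of steps and watch the relative drift; a small drift is necessary but not sufficient for an accurate orbit.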

Above paper gives more involved differential equations for:

Nine planet problem
Spin Axis problem
DE102 problem

4. Evolution of planetary orbits

Below text is from Laskar: Stability of the solar system.

For all external planets, the maximum eccentricity is almost constant. That reflects the fact that these trajectories are very close to regular and quasiperiodic trajectories; possible instabilities are insensitive with the scale of the drawing.

For Venus and the Earth, one observes moderated variations, but still significant. The maximum eccentricity of the Earth reached through chaotic diffusion reaches about 0.08, whereas its current variations are approximately 0.06. It is about the same for Venus.

It should however be noted that to arrive at this possible collision between Mercury and Venus, the model was used beyond its rigorous field of validity, which does not includes the vicinity of collisions. In addition, the solution was carefully chosen, so in any case, it is surely not a very probable one, and the majority of the solutions of close initial conditions will not lead to this possible collision.

Concerning the system of the outer planets, the things are appreciably different, because the direct gravitational short period perturbations are more significant. The recent numerical simulations show that particles placed among the outer planets do not remain beyond a few hundreds of million years, apart for some particular zones of stability or beyond Neptune, in the Kuiper belt, where objects explicitly were found.

Finally, these observations also make it possible to have an idea of the general aspect of a planetary system around a star. Indeed, if the process of formation planetary from planetesimals is correct, it becomes possible that the planetary systems will always be in a state of marginal stability, like our own Solar system. At the end of the phase of formation of the system, a large number of bodies can remain, but in this case the system is strongly unstable, which will led to a collision or an ejection. After this event, the system becomes more stable, with constantly, a time of stability comparable with its age.

5. Instability of solar system after 3 Gyr

Mogavero, Hoang and Laskar use the Hamiltonian below:

$$ \hat H = - \sum_{i=1}^8 \left[ \sum_{\ell=1}^{i-1} \left\langle { G m_i m_\ell \over \left\Vert {\bf r}_i - {\bf r}_\ell \right\Vert } \right\rangle + { 3 G^2 m_0^2 m_i \over c^2 a_i^2 \sqrt{1-\varepsilon_i^2} } \right] . $$

Notation:

The $a_i$ are the semi-major axes.
$m_0$ and $m_i$ are the masses of Sun and the planets.
$\varepsilon_i$ are the eccentricities of the planets.
The ${\bf r}_i$ are the heliocentric positions of the planets.
c is the speed of light.
The bracket operator represents the averaging over the mean longitudes resulting from the elimination of non-resonant Fourier harmonics of the N-body Hamiltonian.

See Mogavero+Laskar: Long-term dynamics of the inner planets in the Solar System.

The text Stability of the solar system shows below results for the eccentricity of Mercury after 1-5 Gyr.

Thanks to relativity, the eccentricity of Mercury stays far lower than when relativity is ignored. Nevertheless, after around 1 Gyr the solar system can become destabilized by Mercury crashing into Venus. Statistically speaking, this happens in about 1% of cases.

Beyond this spectacular aspect, these results also validated the methods of semi-analytical averaging developed for more than 20 years and which had allowed to show the possibility of collision between Mercury and Venus (Laskar, 1994). These results also answer to the question raised more than 300 years ago by Newton, by showing that collisions among planets or ejections are actually possible within the life expectancy of the Sun, that is, in less than 5 Gyr. The main surprise that comes from the numerical simulations of the recent years is that the probability for this catastrophic events to occur is relatively high, of the order of 1%, and thus not just a mathematical curiosity with extremely low probability values. At the same time, 99% of the trajectories will behave in a similar way as in the recent past millions of years, which is coherent with our common understanding that the Solar System has not much evolved in the past 4 Gyr. What is more surprising is that if we consider a pure Newtonian world, starting with the present initial conditions, the probability of collisions within 5 Gyr grows to 60%, which can thus be considered as an additional indirect confirmation of general relativity.

Also see:

Long-term instability of the inner Solar system: numerical experiments
Timescales of Chaos in the Inner Solar System: Lyapunov Spectrum and Quasi-integrals of Motion by Federico Mogavero, Nam H. Hoang,and Jacques Laskar

XHProf with NGINX

Tue, 01 Oct 2024 18:00:00 +0200

XHProf is a PHP profiler. I had written on XHProf here:

XHProf was originally written by Facebook engineers. Even though it severely slows down your PHP program during execution, it is nevertheless a valuable tool for understanding where your PHP program spends its time.

Once XHProf has generated its data after running your PHP program, you want to see the reports on the data. One easy approach is to use the web-server built into PHP:

php -S 0:8000 -t /usr/share/webapps/xhprof/xhprof_html

This will start a web-server at port 8000. In the browser type something like below into the URL field:

http://localhost:8000/index.php?run=66fbf551c7dca&source=saaze

This should show you the report of your run.

This post is about using NGINX to see the report, instead of the builtin web-server of PHP itself. The naive approach by just using the default does not work.

1. Changing directory for XHProf output. By default XHProf uses sys_get_temp_dir() to find the temporary directory. On Arch Linux this is /tmp. That needs to be changed. First, create /srv/http/tmp. Second, edit php.ini.

extension=xhprof
xhprof.output_dir = /srv/http/tmp

2. Symbolic link. XHProf contains a number of PHP programs and JavaScript files. Create a symbolic link under the web-server root:

ln -s /usr/share/webapps/xhprof/xhprof_html xhprof

3. Activate PHP in NGINX. That has probably already been done. Nevertheless, it is repeated here.

location ~ \.php$ {
    try_files $fastcgi_script_name =404;

    # default fastcgi_params
    include fastcgi_params;

    # fastcgi settings
    fastcgi_pass                        unix:/run/php-fpm/php-fpm.sock;
    #fastcgi_index                      index.php;
    fastcgi_buffers                     8 16k;
    fastcgi_buffer_size         32k;

    # fastcgi params
    fastcgi_param DOCUMENT_ROOT $realpath_root;
    fastcgi_param SCRIPT_FILENAME       $realpath_root$fastcgi_script_name;

    fastcgi_cache_bypass $no_cache;
    fastcgi_no_cache $no_cache;
}

The location section is the third in the hierarchy:

# http
## server
### location
### ...
## server
## ...

4. Result. Your reports are now here:

http://localhost/xhprof/index.php?run=66fbf551c7dca&source=saaze

It looks like this:

XHProf: Hierarchical Profiler Report
No XHProf runs specified in the URL.

Existing runs:

66fbf551c7dca.saaze.xhprof 2024-10-01 15:12:49
66fbc637d92df.saaze.xhprof 2024-10-01 11:51:51
66fbb7b47132d.saaze.xhprof 2024-10-01 10:49:56

5. AUR package. I maintain an AUR package: XHProf.

Direct st-connectivity with few paths is in quantum logspace

Tue, 03 Sep 2024 22:30:00 +0200

Authors: Roman Edenhofer and Simon Apers

Université Paris Cité, CNRS, IRIF, Paris, France

Abstract
1. Introduction and summary
2. Space-bounded computation
- 2.1 Turing machines
- 2.2 Complete (promise) problems
3. Fewness language in $\mathsf{BQL}$
- 3.1 Unambiguity and fewness
- 3.2 Proof of theorem 1.4
4. Counting few paths in quantum logspace
- 4.1 Effective pseudoinverse
Acknowledgements
References
Appendix: Effective pseudoinversion in quantum logspace

Abstract

We present a $\mathsf{BQSPACE}(O(\log n))$-procedure to count $st$-paths on directed graphs for which we are promised that there are at most polynomially many paths starting in $s$ and polynomially many paths ending in $t$. For comparison, the best known classical upper bound in this case just to decide $st$-connectivity is $\mathsf{DSPACE}(O(\log^2 n/ \log \log n))$. The result establishes a new relationship between $\mathsf{BQL}$ and unambiguity and fewness subclasses of $\mathsf{NL}$. Further, some preprocessing in our approach also allows us to verify whether there are at most polynomially many paths between any two nodes in $\mathsf{BQSPACE}(O(\log n))$. This yields the first natural candidate for a language problem separating $\mathsf{BQL}$ from $\mathsf{L}$ and $\mathsf{BPL}$. Until now, all candidates separating these classes were promise problems.

1. Introduction and summary

Graph connectivity is a central problem in computational complexity theory, and it is of particular importance in the space-bounded setting. Given a graph $G$ and two vertices $s$ and $t$, the task is to decide whether there is a path from $s$ to $t$. For undirected graphs the problem is denoted as $\mathrm{USTCON}$. Aleliunas, Karp, Lipton, Lovász and Rackoff [AKL+79] showed that doing a random walk for a polynomial number of steps can solve it in randomized logspace, $\mathsf{RL}$. After a long line of work, Reingold [Rei05] was able to derandomize the result and showed that the problem is already contained in deterministic logspace, $\def\L{\mathsf{L}}\L$, and is in fact complete for that class. In the directed graph setting the problem is denoted as $\mathrm{STCON}$, and it is complete for non-deterministic logspace, $\mathsf{NL}$. The best known deterministic algorithm for $\mathrm{STCON}$ in terms of space complexity is due to Savitch [Sav70] and runs in space $O(\log^2 n)$. We have the following well-known inclusions

$$ \mathsf{L} \subseteq \mathsf{RL} \subseteq \mathsf{NL} \subseteq \mathsf{DET} \subseteq \mathsf{L}^2 $$

where $\mathsf{DET}$, introduced by Cook [Coo85], is the class of languages that are $\mathsf{NC}^1$ Turing reducible to the computation of the determinant of an integer matrix and $\mathsf{L}^2$ is the class of languages decidable in deterministic space $O(\log^2 n)$. We refer to [AB06] for further details on reductions and basic complexity classes.

The most studied quantum space-bounded complexity class is bounded error quantum logspace, denoted $\mathsf{BQL}$. This is the class of languages decided in $\mathsf{BQSPACE}(O(\log n))$, i.e., decided by a quantum Turing machine with error $1/3$ running in space $O(\log n)$ and time $2^{O(\log n)}$. The quantum Turing machine has unrestricted access to the input on a classical input tape and can write on a uni-directional output tape which does not count towards the space complexity. The class $\mathsf{BQL}$ lies in between $\mathsf{RL}$ and $\mathsf{DET}$. In fact, it was recently shown by Fefferman and Remscrim [FR21] that certain restricted versions of the standard $\mathsf{DET}$-complete matrix problems are complete for $\mathsf{prBQL}$, where $\mathsf{prBQL}$ is the promise version of $\mathsf{BQL}$, i.e. the class of promise problems decided in $\mathsf{BQSPACE}(O(\log n))$. This extended the earlier work of Ta-Shma [Ta-13], who showed how to invert well-conditioned matrices in $\mathsf{BQSPACE}(O(\log n))$ building on the original idea of Harrow, Hassidim and Lloyd [HHL09]. We restate two of Ta-Shma's main results:

(Compare Theorem 5.2 in [Ta-13]) Given a matrix $M \in \mathbb{C}^{n \times n}$ with $\|M\|_2 \leq \mathop{\rm poly}(n)$, we can output all of its singular values and their respective multiplicities up to $1/\mathop{\rm poly}(n)$ additive accuracy in $\mathsf{BQSPACE}(O(\log n))$. In particular, we can determine $\dim(\ker(M))$ if all non-zero singular values have inverse polynomial distance from zero.
(Compare Theorem 1.1 in [Ta-13]) Given two indices $s,t\in [n]$ and a matrix $M\in\mathbb{C}^{n\times n}$ which is poly-conditioned, by this we mean that its singular values satisfy \begin{align*} \mathop{\rm poly}(n)\geq \sigma_1(M) \geq \cdots \geq \sigma_n(M) \geq 1/\mathop{\rm poly}(n), \end{align*} we can estimate $M^{-1}(s,t)$ up to $1/\mathop{\rm poly}(n)$ additive accuracy in $\mathsf{BQSPACE}(O(\log n))$.

The first result above directly implies a $\mathsf{BQL}$-procedure for deciding $\mathrm{USTCON}$ (alternatively, this follows from the containment $\mathsf{RL}\subseteq \mathsf{BQL}$). To see this, note that (i) $\mathrm{USTCON}$ can be reduced to counting the number of connected components, (ii) the dimension of the kernel of the random walk Laplacian $I-P$ is equal to the number of connected components, and (iii) for undirected graphs the spectral gap of $I-P$, that is its smallest non-zero eigenvalue, is inverse polynomially bounded from zero. Here $I$ is the identity and $P$ is the transition matrix of a random walk. This ties to the fact that a random walk on an undirected graph takes polynomial time to traverse the graph (see, e.g., Chapter 12 in [LPW08]). Unfortunately, for directed graphs the situation is more complicated. Importantly, the smallest non-zero singular value of the random walk Laplacian can be inverse exponentially small, and similarly the time it takes a random walk to find connected nodes can be exponential. Hence, it is not obvious how Ta-Shma's results should be of any help in this setting.

Counting few paths

Somewhat surprisingly, we show that by analyzing a different matrix which we call the counting Laplacian $L = I-A$ we can use Ta-Shma's second result to solve instances of $\mathrm{STCON}$ in $\mathsf{BQSPACE}(O(\log n))$ that seem hard classically. Here $A$ denotes the adjacency matrix of a graph. While the random walk Laplacian is spectrally well-behaved if a random walk is efficient, we find that the counting Laplacian is spectrally well-behaved if the underlying graph is acyclic and contains only few paths. We remark that the number of paths in a graph can be totally unrelated to the success probability of a random walk finding specific nodes. Even if there exist very few paths, it can happen that a random walk has an extreme bias to only pick paths we are not interested in. Consider for instance the following graph:

The number of paths between any two nodes is at most one. Nonetheless, a random walk starting at node $1$ only has probability $1/2^{n-1}$ to reach node $2n$.

Remark: There does exist a folklore algorithm running in $\mathsf{DSPACE}(O(\log n))$ for solving $\mathrm{STCON}$ on this graph, and directed trees more generally, as mentioned in, e.g., [kannan2008stcon]. The algorithm builds on a divide-and-conquer strategy that is specifically tailored to these graphs.

Given a directed graph $G=(V,E)$ with nodes $i,j\in V$ let us denote by $N(i,j)$ the number of paths from $i$ to $j$. By a path from $i$ to $j$ we mean a sequence of edges $(e_1,\ldots,e_k)$ which joins a sequence of nodes $(v_0,\ldots,v_k)$ such that $e_i = (v_{i-1},v_i)$ for all $i\in [k]$.

Remark: Some authors call this a walk and disallow a path to visit any vertex more than once. We follow [AL98,kannan2008stcon] in our definition. By convention we also count the empty sequence as a path from any node to itself such that $N(i,i)\geq 1$ for all $i\in V$.

As our first result, and as a primer for the rest of the paper, we show that we can count the number of $st$-paths on directed graphs for which there are at most polynomially many paths between any two nodes in $\mathsf{BQSPACE}(O(\log n))$. In particular, this allows us to decide $\mathrm{STCON}$ on such graphs.

Theorem 1.1. Fix a polynomial $p:\mathbb{N} \rightarrow \mathbb{N}$. Let $G$ be a directed graph with $|V(G)|=n$ nodes such that

$\forall i,j \in V(G): N(i,j)\leq p(n)$.

There is an algorithm running in $\mathsf{BQSPACE}(O(\log n))$ that, given access to the adjacency matrix $A$ of $G$ and $s,t\in V(G)$, returns the number of paths from $s$ to $t$.

Proof: Note that $A^k(i,j)$ is equal to the number of paths of length $k$ from node $i$ to node $j$. Since that number is finite for all pairs of nodes by assumption, we find that the graph is acyclic. In other words, $A$ is nilpotent, i.e., $A^{n}=0$. As a consequence we obtain that the inverse of the counting Laplacian exists and is equal to

$$ L^{-1} = (I-A)^{-1} = I + A + A^2 + \cdots + A^{n-1}. $$

Clearly, its entries simply count the total number of paths from $i$ to $j$, i.e. $L^{-1}(i,j) = N(i,j)$. By assumption, we thus have

$$\|L^{-1}\|_{\max} = \max_{i,j\in [n]} |L^{-1}(i,j)| \leq p(n).$$

Now observe that the $\max$-norm of a matrix can be used to bound its spectral norm. More precisely, we have that

$$\|L^{-1}\|_2 \leq n \cdot \|L^{-1}\|_{\max}.$$

Using also that $\|L\|_{\max} = 1$, we find bounds for the largest and smallest singular values of $L$,

$$ n \geq \sigma_1(L) \text{ and } \sigma_n(L) = \frac{1}{\|L^{-1}\|_2} \geq \frac{1}{n\cdot\|L^{-1}\|_{\max}} \geq \frac{1}{n\cdot p(n)}. $$

Thus, the counting Laplacian is poly-conditioned and we can apply Ta-Shma's matrix inversion algorithm, the second restated result above, to approximate entry $L^{-1}(s,t) = N(s,t)$ up to additive error $1/3$ in $\mathsf{BQSPACE}(O(\log n))$. Rounding to the closest integer gives the number of paths from $s$ to $t$. ☐
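The identity $L^{-1} = I + A + \cdots + A^{n-1}$ behind this proof is easy to check classically on a small example. The following Python sketch is only an illustration of that identity: it uses exact integer arithmetic and polynomial space, so it is not the quantum logspace procedure of the theorem. The function names and the four-node "diamond" graph are assumptions made up for this example, and nodes are numbered from 0.

```python
def mat_mul(X, Y):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_paths(A):
    """Return N = I + A + ... + A^(n-1) for the adjacency matrix A of an acyclic graph.

    Entry N[i][j] is the number of paths from i to j, counting the empty path from i to i.
    """
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = identity
    for _ in range(n - 1):
        power = mat_mul(power, A)  # power is now the next power of A
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical example: edges 0->1, 0->2, 1->3, 2->3, so there are two paths from 0 to 3.
diamond = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    for row in count_paths(diamond):
        print(row)
```

Multiplying the result by $I - A$ gives the identity matrix again, which is exactly the statement that the path counts are the entries of $L^{-1}$.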

The proof of theorem 1.1 uses a reduction of summing matrix powers to matrix inversion. The idea for this is well-known, and appeared in this context in early work by Cook, and more recently in, for instance, [FR21]. However, to the best of our knowledge, no study of the implications for $\mathrm{STCON}$ has been made yet.

In section 3 we push the algorithmic idea of theorem 1.1 further by doing a more fine-grained analysis of the counting Laplacian's singular values and singular vectors. We show that we can already count $st$-paths in $\mathsf{BQSPACE}(O(\log n))$ on directed graphs for which only the number of paths starting from $s$ and the number of paths ending in $t$ are polynomially bounded. We state the formal result here.

Theorem 1.2. Fix a polynomial $p: \mathbb{N} \rightarrow \mathbb{N}$. Let $G$ be a directed graph with $|V(G)|=n$ nodes such that

$\forall j \in V(G) : N(s,j) \leq p(n)$ and $N(j,t) \leq p(n)$.

There is an algorithm running in $\mathsf{BQSPACE}(O(\log n))$ that, given access to the adjacency matrix $A$ of $G$ and $s,t\in V(G)$, returns the number of paths from $s$ to $t$.

Classically (deterministically or randomly), the best known space bound just to decide $\mathrm{STCON}$ on graphs promised to satisfy a polynomial bound on the number of paths between any two nodes as in theorem 1.1 or only on the number of paths starting from $s$ and on the number of paths ending in $t$ as in theorem 1.2 is $\mathsf{DSPACE}(O(\log^2(n)/\log\log(n)))$ [AL98,GSTV11]. Alternatively, as noticed in [Lan97,BJLR91] we can also solve such $\mathrm{STCON}$ instances simultaneously in deterministic polynomial time and space $O(\log^2 n)$. We elaborate on these bounds in the next section.

Unambiguity, Fewness and a language in $\mathsf{BQL}$ (maybe) not in $\mathsf{BPL}$

Previous works studied restrictions on the path count in connection to "unambiguity" of configuration graphs of space-bounded Turing machines. The notion of unambiguity was introduced to interpolate between determinism and non-determinism, and gives rise to complexity classes between $\L$ and $\mathsf{NL}$. We formally introduce these classes in section 2. As a direct consequence of theorem 1.1 we obtain:

Corollary 1.3. $\mathsf{StrongFewL} \subseteq \mathsf{BQL}$

Where $\mathsf{StrongFewL}$ denotes the class of languages which are decidable by an $\mathsf{NL}$-Turing machine whose configuration graphs for all inputs are strongly-few, that is graphs for which there are at most polynomially many paths between any two nodes. Since $\mathsf{StrongFewL}$ is not known to lie in $\mathsf{BPL}$, this already implies the existence of a language in $\mathsf{BQL}$ not known to lie in $\mathsf{BPL}$. However, we can obtain an explicit language by doing some preprocessing in the algorithm of theorem 1.1: we can actually check in $\mathsf{BQSPACE}(O(\log n))$ whether a given graph is strongly-few. Unfortunately, we see no way to do the same for the weaker promise of theorem 1.2.

We obtain the subsequent language containment (notably, the containment of this language in $\mathsf{StrongFewL}$ is not known).

Theorem 1.4. The language \begin{align*} \mathrm{STCON}_{\mathrm{sf}} = \{ \langle G,s,t,1^k\rangle \text{ }|\text{ }\forall i,j\in V(G): N(i,j)\leq k \text{ and } N(s,t) \geq 1 \} \end{align*} is contained in $\mathsf{BQL}$.
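To make the membership condition of $\mathrm{STCON}_{\mathrm{sf}}$ concrete, here is a brute-force classical check in Python. It is a sketch for illustration only: it stores whole matrices of path counts and is therefore far from logspace. A graph with a cycle is rejected, because a node on a cycle has infinitely many paths to itself. The function names and the 0-based node numbering are assumptions of this example.

```python
def mat_mul(X, Y):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def in_stcon_sf(A, s, t, k):
    """Return True iff N(i,j) <= k for all node pairs and N(s,t) >= 1."""
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]  # running sum I + A + ... + A^(n-1)
    power = identity
    for _ in range(n - 1):
        power = mat_mul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    # power is now A^(n-1); a non-zero A^n means a path of length n, hence a cycle.
    if any(x != 0 for row in mat_mul(power, A) for x in row):
        return False
    if any(x > k for row in total for x in row):
        return False
    return total[s][t] >= 1
```

For the graph with edges 0->1, 0->2, 1->3, 2->3 there are two paths from 0 to 3, so the instance with k = 2 is accepted while the one with k = 1 is rejected.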

These seem to be the first examples of languages in $\mathsf{BQL}$ not known to lie in $\L$ or $\mathsf{BPL}$. As far as we are aware, before this work only promise problems were known that lie in $\mathsf{prBQL}$ but which are potentially not contained in $\mathsf{prBPL}$ [Ta-13,FL18,FR21,GLW23].

We now summarize briefly what is known classically. A language similar to $\mathrm{STCON}_{\mathrm{sf}}$, namely

$$ \mathrm{STCON}_{\mathrm{ru}} = \{ \langle G,s,t\rangle \text{ }|\text{ } \forall j \in V(G): N(s,j) \leq 1 \text{ and } N(s,t) = 1 \} $$

has been studied before. It was first introduced by Lange [Lan97] who showed that it is complete for $\mathsf{ReachUL}$, which is the class of languages decidable by an $\mathsf{NL}$-Turing machine whose configuration graphs for all inputs are reach-unambiguous, meaning that there is a unique computation path from the start configuration to any reachable configuration.

Remark: Note that the $\mathsf{ReachUL}$-hardness of $\mathrm{STCON}_{\mathrm{ru}}$ is trivial but the completeness is not. This is because the uniqueness in the definition of the complexity class is used as a restriction on the machine, while it is used as an acceptance criterion in the definition of the language.

Further, it was noticed in [Lan97, BJLR91] that $\mathsf{ReachUL}$ is contained in $\mathsf{SC}^2$, which is the class of languages decidable simultaneously in deterministic polynomial time and space $O(\log^2 n)$. Additionally, Allender and Lange [AL98] found that $\mathrm{STCON}$ on graphs promised to be reach-unambiguous is solvable in deterministic space $O(\log^2(n)/ \log\log(n))$ implying

$$ \mathsf{ReachUL} \subseteq \mathsf{DSPACE}(O(\log^2(n)/ \log \log(n))). $$

In particular, these results put $\mathrm{STCON}_{\mathrm{ru}}$ in $\mathsf{SC}^2$ and in $\mathsf{DSPACE}(O(\log^2(n)/ \log\log(n)))$. Also, more recently Garvin, Stolee, Tewari and Vinodchandran [GSTV11] proved that $\mathsf{ReachUL}$ is equal to $\mathsf{ReachFewL}$, where $\mathsf{ReachFewL}$ is the class of languages decidable by an $\mathsf{NL}$-Turing machine whose configuration graphs for all inputs are reach-few, i.e., they have at most polynomially many computation paths from its start configuration to any reachable configuration. We thus have \begin{align*} \mathsf{StrongFewL} \subseteq \mathsf{ReachFewL} = \mathsf{ReachUL} \subseteq \mathsf{SC}^2, \quad \mathsf{DSPACE}(O(\log^2(n)/ \log \log(n))). \end{align*} Notably, while [GSTV11] implies a procedure to solve $\mathrm{STCON}$ on graphs promised to be reach-few in $\mathsf{DSPACE}(O(\log^2(n) / \log \log(n)))$ (in particular, this includes the graphs from theorem 1.1 and 1.2), this does not allow to verify whether a general graph satisfies this promise. Indeed, none of the upper bounds for $\mathrm{STCON}_{\mathrm{ru}}$ directly carry over to $\mathrm{STCON}_{\mathrm{sf}}$, for which the best known classical upper bound to date seems to be $\mathsf{DET}$.

Conclusion and open questions

Summarizing, we show that in quantum logspace we can count $st$-paths on graphs for which even solving $st$-connectivity is not known to be possible in classical (deterministic or randomized) logspace. Further, we obtain the first (non-promise) language in $\mathsf{BQL}$ not known to lie in $\mathsf{BPL}$. Our work also yields a number of open questions.

An obvious first question is whether $\mathrm{STCON}_{\mathrm{sf}}$ really separates $\mathsf{BQL}$ and $\mathsf{BPL}$. A first step towards answering this might be tackling the related question, whether we can carry over some of the known upper bounds for $\mathrm{STCON}_{\mathrm{ru}}$ to $\mathrm{STCON}_{\mathrm{sf}}$. That is, is $\mathrm{STCON}_{\mathrm{sf}}$ also contained in $\mathsf{SC}^2$ or in $\mathsf{DSPACE}(O(\log^2(n)/\log\log(n)))$? Note that the known $\mathsf{prBQL}$-complete promise problems are not known to satisfy these upper bounds [Ta-13,FL18,FR21,GLW23].

Further, we wonder whether it is possible to improve the $O(\log^2(n)/ \log \log(n))$ space bound by Allender and Lange. In fact, Allender recently asked this in an article [All23] in which he reflects on some open problems he encountered throughout his career. Allender suspects that for the restricted case of strongly-unambiguous graphs, also called mangroves, for which the number of paths between any two nodes is bounded by one, there should exist an algorithm deciding $\mathrm{STCON}$ running in deterministic space $O(\log n)$. He even offers a \$1000 reward for any improvement of their space bound, already for this restricted case. A dequantization of our results would thus yield some good pocket money.

Remark: Even if the dequantization were randomized, i.e. in $\mathsf{BPSPACE}(O(\log n))$, it would still imply a $\mathsf{DSPACE}(O(\log^{3/2} n))$ bound by the inclusion $\mathsf{BPSPACE}(O(\log n)) \subseteq \mathsf{DSPACE}(O(\log^{3/2} n))$ due to Saks and Zhou [SZ99].

Another natural question is whether in $\mathsf{BQSPACE}(O(\log n))$ we can solve $\mathrm{STCON}$ on graphs promised to be reach-few, where only the number of paths from $s$ is polynomially bounded. This relaxes our promise in theorem 1.2. Unfortunately, our current approach of using an effective pseudoinverse seems to require our stronger promise. More generally, we feel that a better understanding of the singular values and vectors of directed graph matrices will yield further insights into the utility of quantum space-bounded computation for solving problems on directed graphs.

Finally, the link between poly-conditionedness and bounds on the path count raises the question of whether some variation of $\mathrm{STCON}$ could be proven complete for $\mathsf{BQL}$.

2. Space-bounded computation

In section 2.1 we introduce the Turing machine model of space-bounded computation and define the most important language classes that appear in this work. We mainly follow the definitions from Ta-Shma [Ta-13] and refer to Section 2.2 in [FR21] for a discussion on the equivalence of the quantum Turing machine model and the quantum circuit model. In section 2.2 we mention some complete problems of promise versions of the defined complexity classes.

2.1 Turing machines

A deterministic space-bounded Turing machine (DTM) acts according to a transition function $\delta$ on three semi-infinite tapes: a read-only tape where the input is stored, a read-and-write work tape and a uni-directional write-only tape for the output. The TM's computation time is defined as the number of transition steps it performs on an input, and its computation space is the number of used cells on the work tape, i.e. we do not count the number of cells on the input or output tape towards its computation space. DTMs with space-bound $s(n)$ for inputs of length $n$ give rise to $\mathsf{DSPACE}(s(n))$. We define $\mathsf{L}$ as the class of languages decided in $\mathsf{DSPACE}(O(\log n))$ and $\mathsf{L}^2$ as the class of languages decided in $\mathsf{DSPACE}(O(\log^2 n))$.

A non-deterministic Turing machine (NTM) is similar to a DTM except that it has two transition functions $\delta_0$ and $\delta_1$. At each step in time the machine non-deterministically chooses to apply either one of the two. It is said to accept an input if there is a sequence of these choices so that it reaches an accepting configuration and it is said to reject the input if there is no such sequence of choices. We obtain $\mathsf{NSPACE}(s(n))$. Further, $\mathsf{NL}$ is the class of languages decided in $\mathsf{NSPACE}(O(\log n))$.

A probabilistic space-bounded Turing machine (PTM) is again similar to a DTM but with the additional ability to toss random coins. This can be conveniently formulated by a fourth tape that is uni-directional, read-only and initialized with uniformly random bits at the start of the computation. It does not count towards the space. A language is said to be decided in $\mathsf{BPSPACE}_{a,b}(s(n))$ if there is a PTM running in space $s(n)$ and time $2^{O(s(n))}$ deciding it with completeness error $a\in[0,1]$ and soundness error $b\in[0,1]$, that is every input in the language is accepted with probability at least $a$ and every input not in the language is accepted with probability at most $b$.

Remark: The time bound does not follow from the space-bound and is equivalent to demanding that the TM absolutely halts for all possible assignments of the random coins tape.

Also, we write $\mathsf{BPSPACE}(s(n))$ for $\mathsf{BPSPACE}_{\frac{2}{3},\frac{1}{3}}(s(n))$ and $\mathsf{RSPACE}(s(n))$ for $\mathsf{BPSPACE}_{\frac{1}{2},0}(s(n))$. Further, $\mathsf{BPL}$ is the class of languages decided in $\mathsf{BPSPACE}(O(\log n))$ and $\mathsf{RL}$ is the class of languages decided in $\mathsf{RSPACE}(O(\log n))$.

A quantum space-bounded Turing machine (QTM) is a DTM with a fourth tape for quantum operations instead of a random coins tape. The transition function $\delta$ is still classically described. The tape cells of the quantum tape are qubits and initialized in state $\ket{0}$ at the start of the computation. There are two tape heads moving in both directions on the quantum tape. At each step during the computation, the machine can either perform a projective measurement in the standard basis or apply a gate from some universal gate set, say $\{\mathrm{HAD},\mathrm{T},\mathrm{CNOT}\}$, to the qubits below its tape heads. The used cells on the quantum tape count towards the computation space. As before, we say a language is decided in $\mathsf{BQSPACE}_{a,b}(s(n))$ if there is a QTM running in space $s(n)$ and time $2^{O(s(n))}$ deciding it with completeness error $a$ and soundness error $b$. Also, we mean $\mathsf{BQSPACE}_{\frac{2}{3},\frac{1}{3}}(s(n))$ by $\mathsf{BQSPACE}(s(n))$ if not mentioned otherwise. Finally, $\mathsf{BQL}$ is the class of languages decided in $\mathsf{BQSPACE}(O(\log n))$. The particular choice of the universal gate set does not affect the resulting complexity class due to the space-efficient version of the Solovay-Kitaev theorem of van Melkebeek and Watson [MW10]. The same applies to disallowing intermediate measurements, thanks to the space-efficient "deferred measurement principle" proven by Fefferman and Remscrim [FR21] building on the earlier work of Fefferman and Lin [FL18] (see also [GRZ21,GR21] for an alternative time- and space-efficient version of this principle). Also, the chosen success probability of $2/3$ can be amplified by sequentially repeating computations.
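The amplification mentioned in the last sentence can be made concrete with a small numerical sketch. It assumes independent runs combined by a majority vote, which is one standard way to amplify and not necessarily the exact procedure intended here; the function name `majority_error` is hypothetical.

```python
from math import comb

def majority_error(p_correct: float, repetitions: int) -> float:
    """Probability that a majority vote over independent runs is wrong.

    Each run is correct with probability p_correct. The number of
    repetitions must be odd so that ties cannot occur.
    """
    if repetitions % 2 == 0:
        raise ValueError("use an odd number of repetitions to avoid ties")
    q = 1.0 - p_correct
    # The vote fails when more than half of the runs are wrong.
    return sum(
        comb(repetitions, wrong) * q**wrong * p_correct ** (repetitions - wrong)
        for wrong in range(repetitions // 2 + 1, repetitions + 1)
    )

# Error of the majority vote for a single-run success probability of 2/3.
for r in (1, 5, 25, 101):
    print(r, majority_error(2 / 3, r))
```

Since the runs are executed one after another, only a counter of the accepting runs has to be stored, which takes $O(\log r)$ bits for $r$ repetitions.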

2.2 Complete (promise) problems

In this work we only consider language classes. In particular, we present the first candidate to distinguish $\mathsf{BQL}$ and $\mathsf{BPL}$. Note that candidates for distinguishing the promise versions of these classes, $\mathsf{prBQL}$ and $\mathsf{prBPL}$, are well-known. As mentioned in the introduction, Fefferman and Remscrim [FR21] showed that well-conditioned promise versions of all the standard $\mathsf{DET}$-complete matrix problems are complete for $\mathsf{prBQL}$. Let us highlight that matrix powering is one of these problems. Restricted to stochastic matrices it is easily seen to be complete for $\mathsf{prBPL}$, and restricted to matrices for which the largest singular values of its powers grow at most polynomially, it is complete for $\mathsf{prBQL}$. Indeed, this seems to indicate that powering the adjacency matrix of a graph, which is what our approach in theorem 1.1 essentially boils down to, rather than powering the corresponding random walk matrix, truly exploits a quantum advantage. A similar distinction of $\mathsf{prBQL}$ and $\mathsf{prBPL}$ is expected from the approximation of the spectral gap of matrices. Doron, Sarid and Ta-Shma [DST17] showed that a promise decision version of this problem is $\mathsf{prBPL}$-complete for stochastic matrices while it is $\mathsf{prBQL}$-complete for general Hermitian matrices. More recently, Le Gall, Liu and Wang [GLW23] presented another group of $\mathsf{prBQL}$-complete problems based on state testing.

3. Fewness language in $\mathsf{BQL}$

In section 3.1 we introduce the notions of unambiguity and fewness in space-bounded computation. In section 3.2 we prove theorem 1.4 presenting a language in $\mathsf{BQL}$ not known to lie in $\mathsf{BPL}$.

3.1 Unambiguity and fewness

The computation of a Turing machine can be viewed as a directed graph on configurations, and certain restrictions on the Turing machine translate to natural restrictions on the corresponding configuration graph. The notions of "unambiguity" and "fewness" of a Turing machine relate to the following graph-theoretic notions.

Definition 3.1. Let $G=(V,E)$ be a directed graph and let $k$ be an integer. Then $G$ is called

$k$-unambiguous with respect to nodes $s,t\in V$ if $N(s,t) \leq k$,
$k$-reach-unambiguous with respect to node $s\in V$ if for all $j \in V,$ $N(s,j) \leq k$,
$k$-strongly unambiguous if for all $i,j \in V,$ $N(i,j) \leq k$.

In the case of $k=1$, we simply say $G$ is unambiguous, reach-unambiguous or strongly unambiguous, respectively. Furthermore, a family of directed graphs $\{G_x\}_{x\in X}$ is called few-unambiguous with respect to nodes $s_x, t_x \in V(G_x)$, reach-few with respect to nodes $s_x \in V(G_x)$ or strongly-few if there exists a polynomial $p:\mathbb{N} \rightarrow \mathbb{N}$ such that each of the graphs $G_x$ from the family with $|V(G_x)|=n$ nodes is $p(n)$-unambiguous with respect to $s_x$ and $t_x$, $p(n)$-reach-unambiguous with respect to $s_x$ or $p(n)$-strongly unambiguous, respectively.
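The three notions of Definition 3.1 can be checked by brute force on small acyclic graphs. The following is only a classical sketch for building intuition: the helper names are hypothetical, the graph is assumed to be a DAG on nodes `0..n-1`, and the empty path is counted, so that $N(i,i)=1$.

```python
from functools import lru_cache

def path_counts(n, edges):
    """Return N[i][j], the number of directed paths from i to j in a DAG.

    Assumes the graph is acyclic; otherwise the recursion does not terminate.
    The empty path is counted, so N[i][i] == 1.
    """
    succ = {i: [] for i in range(n)}
    for u, v in edges:
        succ[u].append(v)

    @lru_cache(maxsize=None)
    def count(i, j):
        if i == j:
            return 1
        # Every path from i to j starts with an edge to some successor w.
        return sum(count(w, j) for w in succ[i])

    return [[count(i, j) for j in range(n)] for i in range(n)]

def is_k_unambiguous(N, s, t, k):
    return N[s][t] <= k

def is_k_reach_unambiguous(N, s, k):
    return all(x <= k for x in N[s])

def is_k_strongly_unambiguous(N, k):
    return all(x <= k for row in N for x in row)

# A "diamond" 0 -> {1, 2} -> 3 followed by the edge 3 -> 4.
N = path_counts(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
print(is_k_unambiguous(N, 0, 1, 1))        # one path from 0 to 1
print(is_k_reach_unambiguous(N, 0, 1))     # two paths from 0 to 3
print(is_k_strongly_unambiguous(N, 2))     # no pair has more than two paths
```

In this example the graph is unambiguous with respect to nodes 0 and 1, not reach-unambiguous with respect to node 0, and 2-strongly unambiguous.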

Consider the following examples of the above definition due to Lange [Lan97]:

The left graph is unambiguous with respect to nodes $1$ and $6$ but not reach-unambiguous with respect to node $1$, the middle one is reach-unambiguous with respect to node $1$ but not strongly unambiguous and the right one is strongly unambiguous.

These notions of unambiguity and fewness naturally give rise to six complexity classes between $\mathsf{L}$ and $\mathsf{NL}$. We follow the original paper of Buntrock, Jenner, Lange and Rossmanith [BJLR91] and define

strongly-unambiguous logspace, $\mathsf{StrongUL}$,
strongly-few logspace, $\mathsf{StrongFewL}$,
reach-unambiguous logspace, $\mathsf{ReachUL}$,
reach-few logspace, $\mathsf{ReachFewL}$,
unambiguous logspace, $\mathsf{UL}$, and
few logspace, $\mathsf{FewL}$

as the classes of languages that are decidable by an $\mathsf{NL}$-Turing machine $M$ with unique accepting configuration whose family of configuration graphs for inputs $x\in\Sigma^*$, $\{G_{M,x}\}_{x\in\Sigma^*}$, satisfies the corresponding unambiguity or fewness restriction with respect to its starting and its accepting configuration.

Recall that theorem 1.1 implies the inclusion:

Corollary 1.3. $\mathsf{StrongFewL} \subseteq \mathsf{BQL}$

Putting this together with the trivial containments and the previously mentioned results in [BJLR91,Lan97,AL98,GSTV11], we obtain the following inclusion diagram to be read from left to right:

3.2 Proof of theorem 1.4

Observe that theorem 1.1 only shows that we can decide $st$-connectivity on directed graphs that are promised to be strongly-few. We now show that we can also check whether this promise holds in $\mathsf{BQSPACE}(O(\log n))$, i.e. we obtain a language containment. We restate the result mentioned in the introduction:

Theorem 1.4. The language \begin{align*} \mathrm{STCON}_{\mathrm{sf}} = \{ \langle G,s,t,1^k\rangle \text{ }|\text{ }\forall i,j\in V(G): N(i,j)\leq k \text{ and } N(s,t) \geq 1 \} \end{align*} is contained in $\mathsf{BQL}$.

The idea for proving the above theorem is to use Ta-Shma's spectrum approximation procedure to estimate the smallest singular value of the counting Laplacian $L$. In case the smallest singular value is below some threshold, this gives us a lower bound for $\|L^{-1}\|_{\max}$, the maximum number of paths, so that we can correctly reject graphs with too many paths. In case it is higher than the threshold, we know that $L$ is poly-conditioned and we can proceed as in theorem 1.1 to compute the number of paths between any pair of nodes exactly. Note that for this approach to work, the graph needs to be acyclic so that the counting Laplacian is invertible. We ensure this by first mapping the input graph to a layered graph that is guaranteed to be acyclic, similarly to [GSTV11]. We make the following definition.

Definition 3.2. Let $G=(V,E)$ be a directed graph with $|V|=n$ vertices. We define the layered graph $\mathrm{lay}(G)$ on the vertex set $V':=V\times\{0,1,\dots,n\}$ with two types of edges:

For all edges $(i,j)\in E$ and all $l\leq n-2$ add an edge from $(i,l)$ to $(j,l+1)$ in $\mathrm{lay}(G)$,
for all $i \in V$ and all $l\leq n-1$ add an edge from $(i,l)$ to $(i,n)$ in $\mathrm{lay}(G)$.

It is easy to see that the paths in the first $n$ layers of $\mathrm{lay}(G)$ directly correspond to paths of length less than $n$ in $G$. The last layer in $\mathrm{lay}(G)$ just serves as a catch basin for all paths of different lengths. Let us now prove the above theorem.
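The construction of Definition 3.2 can be sketched in a few lines. The function name `layered_graph` and the encoding of vertices as pairs `(i, l)` are assumptions made for this illustration only.

```python
def layered_graph(n, edges):
    """Edge list of lay(G) on vertices (i, l), i in 0..n-1, l in 0..n."""
    lay = []
    # Type 1: copy every edge (i, j) of G between consecutive layers
    # l -> l + 1 for all l <= n - 2.
    for i, j in edges:
        for l in range(n - 1):
            lay.append(((i, l), (j, l + 1)))
    # Type 2: connect every vertex in layers 0..n-1 to its copy in layer n.
    for i in range(n):
        for l in range(n):
            lay.append(((i, l), (i, n)))
    return lay

# A directed 3-cycle: G itself is cyclic, but lay(G) is not.
lay = layered_graph(3, [(0, 1), (1, 2), (2, 0)])
print(len(lay))
print(all(u[1] < v[1] for u, v in lay))
```

Every edge of $\mathrm{lay}(G)$ leads to a strictly higher layer, which is why the layered graph is acyclic even when $G$ contains cycles.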

Proof of theorem 1.4. Let $\braket{G,s,t,1^k}$ be a given input graph instance with $|V(G)|=n$ nodes. Without loss of generality assume that $k\leq p(n)$ for some fixed polynomial $p:\mathbb{N}\rightarrow\mathbb{N}$. We first construct $\mathrm{lay}(G)$. Note that this is possible in $\mathsf{AC}^0$. We consider the counting Laplacian $L$ of $\mathrm{lay}(G)$ and run Ta-Shma's spectrum approximation algorithm (compare Theorem 5.2 in [Ta-13]) to approximate its singular values with error $\varepsilon=1/6$ and accuracy $\delta = 1/(2n\cdot k)$. If we obtain a singular value smaller than $\delta$, then we have with probability at least $5/6$,

$$ 2 \delta = \frac{1}{n\cdot k} \gt \sigma_n(L) = \frac{1}{\|L^{-1}\|_2} \geq \frac{1}{n\cdot \|L^{-1}\|_{\max}} = \frac{1}{n\cdot \max_{i,j\in V(\mathrm{lay}(G))} N(i,j)} $$

which implies $\max_{i,j \in V(G)} N(i,j) \geq \max_{i,j\in V(\mathrm{lay}(G))} N(i,j) > k$. In this case we reject the input. Otherwise, if we do not obtain a singular value smaller than $\delta$, then we know with the same probability that the counting Laplacian of $\mathrm{lay}(G)$ is poly-conditioned. Thus, we can proceed as in theorem 1.1 and run Ta-Shma's matrix inversion algorithm to determine all entries of $L^{-1}$ with total error $\varepsilon'=1/6$ and accuracy $1/3$. For all $i\in V(G)$ we check in this way whether $L^{-1}((i,0),(i,n)) \geq 2$. If this is the case for some $i\in V(G)$, then there is a cycle in $G$ containing $i$, i.e. $N(i,i)=\infty$ in $G$, and we reject the input. Otherwise, we further check whether all entries of $L^{-1}$ are upper bounded by $k$ and if $L^{-1}((s,0),(t,n)) \geq 1$. The former implies $N(i,j)\leq k$ for all $i,j\in V(G)$, and the latter implies $N(s,t)\geq 1$ in $G.$ If both conditions are satisfied, we accept the input. Otherwise, we reject it. The total error probability of the algorithm is no higher than $\varepsilon + \varepsilon' = 1/3$. ☐

4. Counting few paths in quantum logspace

In this section we show how to push the algorithmic idea behind theorem 1.1 to obtain theorem 1.2, which shows how to count $st$-paths on graphs with a polynomial bound on the number of paths leaving $s$, and on the number of paths arriving in $t$, in $\mathsf{BQSPACE}(O(\log n))$. For this, we need the notion of an effective pseudoinverse, which we introduce in section 4.1. The proof of theorem 1.2 is in section 4.2.

4.1 Effective pseudoinverse

A close look at Ta-Shma's matrix inversion algorithm shows that it can be easily altered to also handle ill-conditioned matrices as input and only invert them on their well-conditioned part. In order to appropriately state this observation we make the following definition.

Definition 4.1 Effective pseudoinverse. Let $M$ be an $n\times n$ matrix with singular value decomposition $M=\sum_{j=1}^n \sigma_j \ket{u_j} \bra{v_j}$. For $\zeta>0$ we define the $\zeta$-effective pseudoinverse of $M$ as the matrix

$$ M_\zeta^+ := \sum_{\sigma_j \geq \zeta} \sigma_j^{-1} \ket{v_j} \bra{u_j}. $$
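As a small illustration of the definition (the matrix and the threshold are chosen purely for this example), take the diagonal matrix $M=\mathrm{diag}(2, 1/10)$, whose singular vectors are the standard basis vectors, and $\zeta = 1/2$. Then

$$ M^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & 10 \end{bmatrix}, \qquad M_{1/2}^+ = \begin{bmatrix} 1/2 & 0 \\ 0 & 0 \end{bmatrix}. $$

The entry $(1,1)$ is reproduced exactly, whereas the entry $(2,2)$, which belongs to the dropped singular value $1/10 < \zeta$, is off by $10$.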

Note that in the definition we essentially drop the largest terms of the actual inverse $M^{-1}=\sum_{j=1}^n \sigma_j^{-1} \ket{v_j} \bra{u_j}$. While this introduces a significant error when approximating $M^{-1}$ as a whole, we find that it can still give a good approximation for some relevant entries. We now state the refined version of Ta-Shma's matrix inversion algorithm which allows us to compute effective pseudoinverses of general matrices.

Theorem 4.2. Fix $\varepsilon(n), \zeta(n), \delta(n) > 0$ and $Z(n) \geq 1$. Let $M$ be an $n \times n$ matrix such that $Z \geq \sigma_1(M) \geq \cdots \geq \sigma_n(M)$. There is an algorithm running in $\mathsf{BQSPACE}(O(\log \frac{nZ}{\epsilon\delta}))$ that given $M$ and two indices $s,t\in [n]$ outputs an $\varepsilon$-additive approximation of the entry $M_{\widetilde{\zeta}}^+(s,t)$, where $\widetilde{\zeta}$ is a random value that is fixed at the beginning of the computation and is $\delta$-close to $\zeta$.

While it is a simple modification of [Ta-13], for completeness we provide a proof of this theorem in the appendix. The fact that we cannot control the exact threshold $\zeta$ of which singular values should be ignored during the effective pseudoinversion is a consequence of the ambiguity of quantum phase estimation, yet its uncertainty is limited by using Ta-Shma's consistent phase estimation procedure.

4.2 Proof of theorem 1.2

We now have the necessary tools to prove our final result which we recall here:

Theorem 1.2. Fix a polynomial $p: \mathbb{N} \rightarrow \mathbb{N}$. Let $G$ be a directed graph with $|V(G)|=n$ nodes such that

$\forall j \in V(G) : N(s,j) \leq p(n)$ and $N(j,t) \leq p(n)$.

Then the number of $st$-paths $N(s,t)$ can be computed in $\mathsf{BQSPACE}(O(\log n))$.

The idea for the proof is to approximate the effective pseudoinverse entry of the counting Laplacian $L_{\tilde{\zeta}}^+(s,t)$ for some small enough $\tilde{\zeta} = 1/\mathop{\rm poly}(n)$ instead of the actual inverse entry $L^{-1}(s,t)$. While ignoring the smallest singular values during the effective pseudoinversion normally leaves out the largest terms, we find that our path bounds imply low overlap of $\braket{s|v_j}$ and $\braket{u_j|t}$ for small singular values $\sigma_j$ such that entry $L^+_{\tilde{\zeta}}(s,t)$ is close to $L^{-1}(s,t)$.

Proof: We again consider the counting Laplacian and its singular value decomposition $L=I-A = \sum_{j=1}^n \sigma_j \ket{u_j}\bra{v_j}$. Its inverse is $L^{-1} = \sum_{j=1}^n \sigma_j^{-1} \ket{v_j}\bra{u_j}$. Now consider the vectors $(L^{-1})^T \ket{s}$ and $L^{-1}\ket{t}$. They contain as entries the number of paths starting in $s$ and the number of paths ending in $t$, respectively. Hence, we find for their squared $l_2$-norms:

$$ \begin{align*} \left\|(L^{-1})^T \ket{s}\right\|_2^2 &= \left\| \sum_{j=1}^n \sigma_j^{-1} \ket{u_j} \braket{v_j|s} \right\|_2^2 = \sum_{j=1}^n \sigma_j^{-2} \left|\braket{v_j|s}\right|^2 \leq n \cdot p(n)^2 \qquad \text{ and similarly}\\ \left\|L^{-1}\ket{t}\right\|_2^2 &= \left\| \sum_{j=1}^n \sigma_j^{-1} \ket{v_j}\braket{u_j|t} \right\|_2^2 = \sum_{j=1}^n \sigma_j^{-2} \left|\braket{u_j|t}\right|^2 \leq n \cdot p(n)^2 \end{align*} $$

where the last equalities of both lines follow because the $\{\ket{u_j}\}_{j\in[n]}$ and $\{\ket{v_j}\}_{j\in[n]}$ are orthonormal bases. As a consequence we obtain for each $j \in [n]$

$$ \left|\braket{s|v_j}\right| \leq \sigma_j \sqrt{n} \cdot p(n) \text{ and } \left|\braket{u_j|t}\right| \leq \sigma_j \sqrt{n} \cdot p(n). $$

Combining the two we find as a bound for the $j$-th term of the singular value decomposition of $L^{-1}$,

$$ \sigma_j^{-1} \left|\braket{s|v_j} \braket{u_j|t}\right| \leq \sigma_j n \cdot p(n)^2. $$

This allows us to estimate the error for computing an effective pseudoinverse entry $L_{\widetilde{\zeta}}^+(s,t)=\sum_{\sigma_j \geq \widetilde{\zeta}} \sigma_j^{-1} \braket{s|v_j} \braket{u_j|t}$ instead of the actual inverse entry $L^{-1}(s,t) = N(s,t)$. In fact, for $\widetilde{\zeta}\leq1/(5n^2\cdot p(n)^2)$ we have

$$ \left| L^{-1}(s,t) - L_{\widetilde{\zeta}}^+(s,t) \right| = \left| \sum_{\sigma_j \lt \widetilde{\zeta}}^n \sigma_j^{-1} \braket{s|v_j} \braket{u_j|t} \right| \leq \sum_{\sigma_j \lt \widetilde{\zeta}}^n \sigma_j^{-1} \left| \braket{s|v_j} \braket{u_j|t} \right| \lt \widetilde{\zeta} n^2 \cdot p(n)^2 \leq 1/5. $$

Choosing $Z=n$, $\varepsilon=1/5$ and $\delta = \zeta = 1/(10n^2\cdot p(n)^2)$ in theorem 4.2 ensures $\widetilde{\zeta} \leq 1/(5n^2 \cdot p(n)^2)$ and yields an additive $2/5$ approximation of $L^{-1}(s,t)$ within the desired space complexity. Rounding to the closest integer gives the number of paths from $s$ to $t$. ☐
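The classical facts used in this proof can be checked on a toy instance. The sketch below is not the quantum algorithm: it inverts the counting Laplacian $L = I - A$ exactly via the finite Neumann series, which is valid because $A$ is nilpotent for an acyclic graph, and then checks the norm bound and the final rounding step. The function name and the example graph are hypothetical.

```python
def counting_laplacian_inverse(n, edges):
    """Exact inverse of L = I - A for a DAG with adjacency matrix A.

    For an acyclic graph A^n = 0, so L^{-1} = I + A + ... + A^(n-1),
    and entry (s, t) is the number of paths from s to t.
    """
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] += 1
    inverse = [[int(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in inverse]  # current power A^k, starting at A^0
    for _ in range(n - 1):
        power = [
            [sum(power[i][m] * A[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
        for i in range(n):
            for j in range(n):
                inverse[i][j] += power[i][j]
    return inverse

# Two diamonds in a row: 0 -> {1, 2} -> 3 -> {4, 5} -> 6.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 6), (5, 6)]
n, s, t, p = 7, 0, 6, 4            # p bounds N(s, j) and N(j, t) here
inv = counting_laplacian_inverse(n, edges)
print(inv[s][t])                                   # number of st-paths
print(sum(x * x for x in inv[s]) <= n * p * p)     # squared norm bound for row s
# Any estimate within 2/5 of the true value rounds to the correct count.
print(round(inv[s][t] + 0.39), round(inv[s][t] - 0.39))
```

Row $s$ of $L^{-1}$ lists the path counts $N(s,j)$ and column $t$ lists $N(j,t)$, which is exactly where the bound $n\cdot p(n)^2$ on the squared norms comes from.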

A natural open question is whether the simultaneous polynomial bound on (i) the number of paths starting from $s$ and on (ii) the number of paths ending in $t$ is really necessary for a $\mathsf{BQSPACE}(O(\log n))$ procedure. Unfortunately, at least with our approach above, this seems to be the case. Note that (i) implies low overlap of $\ket{s}$ with the left singular vectors $\ket{v_j}$ of $L^{-1}$ and (ii) implies low overlap of $\ket{t}$ with the right singular vectors $\ket{u_j}$ of $L^{-1}$. It turns out that if only one of the two is small, then the contribution of $\sigma_j^{-1} \braket{s|v_j} \braket{u_j|t}$ to $L^{-1}(s,t)$ can still be significant for very small $\sigma_j$ but will be ignored in the pseudoinversion.

Acknowledgements

We thank François Le Gall for discussions about the differences between language and promise classes. He made us aware that our results can be seen as first evidence that the language classes $\mathsf{BQL}$ and $\mathsf{BPL}$ are distinct. We further acknowledge useful discussions with Klaus-Jörn Lange and we thank Lior Eldar, Troy Lee and Ronald de Wolf for valuable comments on an earlier draft. RE was supported by the Program QuanTEdu-France (ANR-22-CMAS-0001). SA was partially supported by French projects EPIQ (ANR-22-PETQ-0007), QUDATA (ANR18-CE47-0010), QUOPS (ANR-22-CE47-0003-01) and HQI (ANR-22-PNCQ-0002), and EU project QOPT (QuantERA ERA-NET Cofund 2022-25).

References

[AB06] S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2006.
[AKL+79] Romas Aleliunas, Richard M. Karp, Richard J. Lipton, László Lovász, and Charles Rackoff. Random walks, universal traversal sequences, and the complexity of maze problems. In 20th Annual Symposium on Foundations of Computer Science, pages 218--223, 1979.
[AL98] Eric Allender and Klaus-Jörn Lange. RUSPACE($\log n$) $\subseteq$ DSPACE($\log^2 n/\log \log n$). Theory of Computing Systems, 31, 1998.
[All23] Eric Allender. Guest column: Parting thoughts and parting shots. SIGACT News, 54(1):63--81, 2023.
[BJLR91] Gerhard Buntrock, Birgit Jenner, Klaus-Jörn Lange, and Peter Rossmanith. Unambiguity and fewness for logarithmic space. In Fundamentals of Computation Theory, pages 168--179. Springer, 1991.
[Coo85] Stephen A. Cook. A taxonomy of problems with fast parallel algorithms. Information and Control, 64(1):2--22, 1985.
[DST17] Dean Doron, Amir Sarid, and Amnon Ta-Shma. On approximating the eigenvalues of stochastic matrices in probabilistic logspace. Computational Complexity, 26:393--420, 2017.
[FL18] Bill Fefferman and Cedric Yen-Yu Lin. A complete characterization of unitary quantum space. In 9th Innovations in Theoretical Computer Science Conference, volume 94 of Leibniz International Proceedings in Informatics, pages 4:1--4:21, 2018.
[FR21] Bill Fefferman and Zachary Remscrim. Eliminating intermediate measurements in space-bounded quantum computation. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, 2021.
[GLW23] François Le Gall, Yupan Liu, and Qisheng Wang. Space-bounded quantum state testing via space-efficient quantum singular value transformation. arXiv preprint arXiv:2308.05079, 2023.
[GR21] Uma Girish and Ran Raz. Eliminating intermediate measurements using pseudorandom generators. In 13th Innovations in Theoretical Computer Science Conference, volume 215, pages 76:1--76:18, 2022.
[GRZ21] Uma Girish, Ran Raz, and Wei Zhan. Quantum logspace algorithm for powering matrices with bounded norm. In 48th International Colloquium on Automata, Languages, and Programming, volume 198, pages 73:1--73:20, 2021.
[GSTV11] Brady Garvin, Derrick Stolee, Raghunath Tewari, and N. V. Vinodchandran. ReachFewL = ReachUL. In Computing and Combinatorics, pages 252--258. Springer, 2011.
[HHL09] Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters, 103(15), 2009.
[kannan2008stcon] Sampath Kannan, Sanjeev Khanna, and Sudeepa Roy. STCON in directed unique-path graphs. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, 2008.
[Lan97] Klaus-Jörn Lange. An unambiguous class possessing a complete set. In STACS 97, pages 339--350. Springer, 1997.
[LPW08] D.A. Levin, Y. Peres, and E.L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008.
[MW10] Dieter van Melkebeek and Thomas Watson. Time-space efficient simulations of quantum computations. Theory of Computing, 8(1):1--51, 2012.
[Rei08] Omer Reingold. Undirected connectivity in log-space. Journal of the ACM, 55(4):1--24, 2008.
[Sav70] Walter J. Savitch. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177--192, 1970.
[SZ99] Michael Saks and Shiyu Zhou. $\mathrm{BP}_\mathrm{H}\mathrm{SPACE}(S) \subseteq \mathrm{DSPACE}(S^{3/2})$. Journal of computer and system sciences, 58(2):376--403, 1999.
[Ta-13] Amnon Ta-Shma. Inverting well conditioned matrices in quantum logspace. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pages 881--890, 2013.

Appendix: Effective pseudoinversion in quantum logspace

The algorithm for the effective pseudoinversion is essentially identical to Ta-Shma's matrix inversion procedure except for a small change in the rotation step to appropriately treat ill-conditioned matrices. For completeness we describe it here but refer to [Ta-13] for the detailed complexity and error analysis.

Proof sketch of theorem 4.2. We make the following two simplifying assumptions:

Without loss of generality we assume that $Z=1$. Otherwise, we simply choose $Z = n\cdot \|M\|_{\max}$ as an upper bound for $\sigma_1(M)$ and rescale the matrix with factor $1/Z$. Approximating the entries of a $\widetilde{(\zeta/Z)}$-effective pseudoinverse of this rescaled matrix up to accuracy $\varepsilon \cdot Z$ gives an $\varepsilon$ approximation of the entries of a $\tilde{\zeta}$-effective pseudoinverse of $M$ within the same space complexity.
Furthermore, we assume the matrix $M$ to be Hermitian. Otherwise we use the well-known reduction to the Hermitian case

$$ \begin{align*} H = H(M) := \begin{bmatrix} 0 & M^{\dagger} \\ M & 0 \end{bmatrix} \qquad \text{ with inverse} \qquad H^{-1} = \begin{bmatrix} 0 & M^{-1} \\ (M^{\dagger})^{-1} & 0 \end{bmatrix}. \end{align*} $$

It is easily verified that the eigenpairs of $H$ are given by

$$ \left\{ \pm\sigma_j, \: \frac{1}{\sqrt{2}} \left(\ket{0}\ket{v_j} \pm \ket{1} \ket{u_j}\right) \right\}_{j \in [n]} $$

such that

$$ M^+_{\widetilde{\zeta}}(s,t) = H^+_{\widetilde{\zeta}}(s,t+n). $$
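The block structure of this reduction can be verified on a tiny real matrix, where $M^{\dagger}$ is simply the transpose. The helper names below are hypothetical, indices are 0-based, and the example matrix is the counting Laplacian of a single edge.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def dilation(M):
    """H(M) = [[0, M^T], [M, 0]] for a real square matrix M."""
    n = len(M)
    Mt = transpose(M)
    top = [[0] * n + Mt[i] for i in range(n)]
    bottom = [list(M[i]) + [0] * n for i in range(n)]
    return top + bottom

M = [[1, -1], [0, 1]]        # counting Laplacian I - A of the single edge 0 -> 1
M_inv = [[1, 1], [0, 1]]     # its inverse, checked below
n = len(M)

H = dilation(M)
# The claimed inverse [[0, M^{-1}], [(M^T)^{-1}, 0]] is the dilation of (M^{-1})^T.
H_inv = dilation(transpose(M_inv))

identity = [[int(i == j) for j in range(2 * n)] for i in range(2 * n)]
print(matmul(M, M_inv) == [[1, 0], [0, 1]])
print(matmul(H, H_inv) == identity)
print(all(H_inv[s][t + n] == M_inv[s][t] for s in range(n) for t in range(n)))
```

The last check mirrors the index shift in the identity above: the entry $(s,t)$ of the inverse of $M$ is found at position $(s,t+n)$ of the inverse of $H$.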

Now, let the spectral decomposition of the matrix be given by $M=H=\sum_{j=1}^n \lambda_j \ket{h_j}\bra{h_j}$ and let $\ket{t} = \sum_{j=1}^n \beta_j \ket{h_j}$. The algorithm works on four registers: An input register $I$, an estimation register $E$, a shift register $S$ and an ancillary register $A$ of dimension at least three.

1. We start by preparing the initial state

$$ \begin{equation*} \ket{t}_I\ket{0}_E\ket{0}_S\ket{\text{initial}}_A = \sum_{j=1}^n \beta_j \ket{h_j}_I\ket{0}_E\ket{0}_S\ket{\text{initial}}_A. \end{equation*} $$

2. We then apply the consistent phase estimation procedure (compare Section 5.2 in [Ta-13]) to the input, estimation and shift register and obtain a state close to

$$ \begin{align*} \sum_{j=1}^n \beta_j \ket{h_j}_I \ket{0}_E \ket{s(j)}_S \end{align*} $$

where $s(j)$ is the $j$-th section number, that is a fixed classical value depending on $h_j$ only, from which we can recover a $\delta$-approximation $\widetilde{\lambda}_j = \widetilde{\lambda}(s(j))$ of the eigenvalue $\lambda_j$.

3. We next approximately apply a unitary map acting on the shift and ancillary register partially described by

$$ \begin{align*} \ket{s}_S \ket{\text{initial}}_A \mapsto \begin{cases} \ket{s}_S \left( \frac{\zeta}{\widetilde{\lambda}(s)} \ket{\text{well}}_A + \sqrt{1 - \left( \frac{\zeta}{\widetilde{\lambda}(s)} \right)^2} \ket{\text{nothing}}_A \right) & \text{ if } \left|\widetilde{\lambda}(s)\right| \geq \zeta,\\ \ket{s}_S \ket{\text{ill}}_A & \text{ if } \left|\widetilde{\lambda}(s)\right| \lt \zeta. \end{cases} \end{align*} $$

4. We reverse the consistent phase estimation and are left with a state close to

$$ \begin{equation*} \sum_{\left|\widetilde{\lambda}_j\right|\geq \zeta}^n \beta_j \ket{h_j}_I \left( \frac{\zeta}{\widetilde{\lambda}_j} \ket{\text{well}}_A + \sqrt{1-\left( \frac{\zeta}{\widetilde{\lambda}_j}\right)^2} \ket{\text{nothing}}_A\right) + \sum_{\left|\widetilde{\lambda}_j \right| \lt \zeta}^n \beta_j \ket{h_j}_I \ket{\text{ill}}_A. \end{equation*} $$

Note that the approximations of the eigenvalues are monotone in the sense that $\lambda_i \leq \lambda_j$ implies $\widetilde{\lambda}_i \leq \widetilde{\lambda}_j$. From this we get the existence of $\widetilde{\zeta}_+ > 0$ and $\widetilde{\zeta}_- < 0$, both in absolute value $\delta$-close to $\zeta$, such that

$$ \left\{ j\in[n]:|\widetilde{\lambda}_j|\geq\zeta \right\} = \left\{ j\in[n]:\lambda_j\geq\tilde{\zeta}_{+} \text{ or }\lambda_j\leq\tilde{\zeta}_{-} \right\}. $$

Choosing and combining symmetric shifts for the positive and negative eigenvalues during the consistent phase estimation allows us to assume $|\widetilde{\zeta}_+| = |\widetilde{\zeta}_-|$. Denoting this value by $\tilde{\zeta}$ we find

$$ \left\{ j\in[n]:|\widetilde{\lambda}_j|\geq\zeta \right\} = \left\{ j\in[n]:|\lambda_j|\geq\tilde{\zeta} \right\}. $$

5. Finally, we measure the ancillary register. If the measurement outcome is $\ket{\text{well}}_A$, this leaves us with a state close to the normalized desired one

$$ \frac{1}{\big\|M_{\widetilde{\zeta}}^+\ket{t}\big\|_2} M_{\widetilde{\zeta}}^+\ket{t}. $$

In fact, estimating the success-probability of this measurement outcome gives a good approximation of

$$ \begin{equation*} \zeta^2 \sum_{|\lambda_j|\geq\widetilde{\zeta}} \beta_j^2 \lambda_j^{-2} = \zeta^2 \Big\|M_{\widetilde{\zeta}}^+\ket{t}\Big\|_2^2 \end{equation*} $$

from which we recover $\Big\|M_{\widetilde{\zeta}}^+\ket{t}\Big\|_2$.
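The arithmetic of this last step can be illustrated classically. The sketch assumes that the eigenvalues $\lambda_j$ and the amplitudes $\beta_j$ are known exactly, which the algorithm of course does not have access to; the function names and the numbers are made up for the example.

```python
from math import sqrt

def well_outcome_probability(eigenvalues, betas, zeta, zeta_tilde):
    """zeta^2 * sum over |lambda_j| >= zeta_tilde of beta_j^2 / lambda_j^2."""
    return zeta**2 * sum(
        abs(b) ** 2 / lam**2
        for lam, b in zip(eigenvalues, betas)
        if abs(lam) >= zeta_tilde
    )

def recovered_norm(probability, zeta):
    """Norm of the effective pseudoinverse applied to |t>, from the probability."""
    return sqrt(probability) / zeta

eigenvalues = [1.0, -0.5, 0.01]   # the last eigenvalue lies below the threshold
betas = [0.6, 0.64, 0.48]         # amplitudes of |t>, squares sum to one
zeta = zeta_tilde = 0.1

p = well_outcome_probability(eigenvalues, betas, zeta, zeta_tilde)
print(p)
print(recovered_norm(p, zeta) ** 2)   # equals 0.6^2 / 1^2 + 0.64^2 / 0.5^2
```

The prefactor $\zeta^2$ keeps every amplitude $\zeta/\widetilde{\lambda}_j$ at most one in absolute value, so the outcome probability stays valid, and dividing it out again recovers the norm.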

Repeating the steps above sufficiently many times lets us create multiple copies of the desired state, assuming $\Big\|M_{\widetilde{\zeta}}^+\ket{t}\Big\|_2$ is non-negligible. We then use Ta-Shma's space-efficient tomography procedure (Theorem 6.1 in [Ta-13]) to approximately learn the state, and in particular the entry

$$ \frac{1}{\big\|M_{\widetilde{\zeta}}^+\ket{t}\big\|_2} \bra{s} M_{\widetilde{\zeta}}^+ \ket{t}. $$

☐

This document is in arXiv: 2408.12473.

Replacing SSHGuard with 20 Lines of Perl Code

Sun, 25 Aug 2024 15:10:00 +0200

SSHGuard is software to block unwanted SSH login attempts. SSHGuard has a remarkable architecture: it consists of a set of independent programs doing parsing, block-indication, and actual blocking. I wrote about this in Analysis And Usage of SSHGuard. SSHGuard 2.4.3 is more than 100k lines of C and shell code:

     17      42     438 ./Makefile.am
    123     403    3037 ./sshguard.in
    708    2765   22860 ./Makefile.in
 130646  622873 7655483 ./parser/attack_scanner.c
    401    1202   11249 ./parser/attack_parser.y
    584    2862   22115 ./parser/tests.txt
   2061    9087   79252 ./parser/attack_parser.c
      2       5      47 ./parser/test-sshg-parser
     36     201    1219 ./parser/parser.h
     56     178    1965 ./parser/attack.c
   1035    4286   36122 ./parser/Makefile.in
    146     366    3757 ./parser/parser.c
     21      42     418 ./parser/Makefile.am
    392    1959   22246 ./parser/attack_scanner.l
    284    1276   11808 ./parser/attack_parser.h
     25      62     425 ./fw/sshg-fw-ipset.sh
    688    2706   23910 ./fw/Makefile.in
     51     202    1296 ./fw/fw.h
     38      88     574 ./fw/sshg-fw-iptables.sh
     30      70    1089 ./fw/Makefile.am
     36      88     586 ./fw/sshg-fw-ipfilter.sh
     23      56     302 ./fw/sshg-fw-pf.sh
     27      70     486 ./fw/sshg-fw-ipfw.sh
     32      72     612 ./fw/sshg-fw.in
     23      52     363 ./fw/sshg-fw-null.sh
    384    1162   10702 ./fw/hosts.c
     33      80     920 ./fw/sshg-fw-firewalld.sh
     56     175    1147 ./fw/sshg-fw-nft-sets.sh
     19      51     353 ./sshg-logtail
    132     465    4383 ./blocker/sshguard_options.c
     47     272    1673 ./blocker/sshguard_blacklist.h
    137     605    3830 ./blocker/sshguard_whitelist.h
     26     146     924 ./blocker/sshguard_log.h
    129     607    3823 ./blocker/fnv.h
    137     354    3929 ./blocker/blocklist.c
    415    1580   14356 ./blocker/sshguard_whitelist.c
     31     116    1097 ./blocker/attack.c
    152     502    5019 ./blocker/sshguard_blacklist.c
    145     670    3972 ./blocker/hash_32a.c
    678    2598   25839 ./blocker/Makefile.in
     45     285    1906 ./blocker/sshguard_options.h
    302    1199   10354 ./blocker/blocker.c
     13      19     236 ./blocker/blocklist.h
     21      39     432 ./blocker/Makefile.am
     30      73     646 ./common/sandbox.c
     51     237    2163 ./common/address.h
     41     104    1242 ./common/service_names.c
    996    4757   31742 ./common/simclist.h
    167     698    4909 ./common/config.h.in
     16      31     310 ./common/sandbox.h
   1512    5612   47636 ./common/simclist.c
     77     441    3410 ./common/attack.h
 143277  673891 8088612 total

The binary size is ca. 4MB for sshg-parser:

$ ls -l /usr/lib/sshguard
total 4928
-rwxr-xr-x   1 root root   34912 Jul 29 16:06 sshg-blocker*
-rwxr-xr-x   1 root root    1532 Jul 29 16:06 sshg-fw-firewalld*
-rwxr-xr-x   1 root root   18448 Jul 29 16:06 sshg-fw-hosts*
-rwxr-xr-x   1 root root    1198 Jul 29 16:06 sshg-fw-ipfilter*
-rwxr-xr-x   1 root root    1098 Jul 29 16:06 sshg-fw-ipfw*
-rwxr-xr-x   1 root root    1181 Apr 17  2022 sshg-fw-ipset*
-rwxr-xr-x   1 root root    1186 Jul 29 16:06 sshg-fw-iptables*
-rwxr-xr-x   1 root root    1759 Jul 29 16:06 sshg-fw-nft-sets*
-rwxr-xr-x   1 root root     975 Jul 29 16:06 sshg-fw-null*
-rwxr-xr-x   1 root root     914 Jul 29 16:06 sshg-fw-pf*
-rwxr-xr-x   1 root root     353 Jul 29 16:06 sshg-logtail*
-rwxr-xr-x   1 root root 4630632 Jul 29 16:06 sshg-parser*

The main annoyance with SSHGuard was that sometimes it did not stop properly when stopped via systemd. When powering down a machine this is especially annoying, as it increases overall downtime.

As SSHGuard has this quite clean architecture, where parsing and block-indication are clearly separated, it is easy to find out what it actually tries to block. Looking at the source code of the Flex rules in parser/attack_scanner.l helps in addition.

I have now written Simplified SSHGuard in less than 20 lines of Perl.

In Arch Linux you can use ssshguard.

1. Firewall preparations

Firewall setup is similar to SSHGuard.

# Generated by iptables-save v1.8.6 on Sun Dec 20 13:29:18 2020
*raw
:PREROUTING ACCEPT [207:14278]
:OUTPUT ACCEPT [180:113502]
COMMIT

# Empty iptables rule file
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i eth0 -p tcp --dport 22 -m set --match-set reisbauerHigh src -j DROP
-A INPUT -i eth0 -p tcp --dport 22 -m set --match-set reisbauerLow src -j DROP
COMMIT

The sets in ipset are defined in file /etc/ipset.conf and are:

create -exist reisbauerHigh hash:net family inet hashsize 65536 maxelem 65536 counters
create -exist reisbauerLow hash:net family inet hashsize 65536 maxelem 65536 counters

The set reisbauerLow is not needed. However, sometimes it is convenient to have an already defined set, to which you can swap:

ipset swap reisbauerHigh reisbauerLow

Once you power down your machine, all firewall rules and all ipset sets are lost. On rebooting the machine you initialize iptables and ipset again. However, the actual set content is lost. Therefore you might run a cron job which periodically saves the reisbauerHigh set to reisbauerLow, filters for the ten most used IP addresses, and stores them in /etc/ipset.conf. For example:

$ ipset save reisbauerLow | tail -n +2 | sort -rnk7 | cut -d' ' -f1-3 | head
add reisbauerLow 180.101.88.244
add reisbauerLow 170.64.232.196
add reisbauerLow 170.64.204.232
add reisbauerLow 170.64.202.190
add reisbauerLow 5.135.90.165
add reisbauerLow 170.64.133.48
add reisbauerLow 61.177.172.136
add reisbauerLow 139.59.4.108
add reisbauerLow 159.223.225.209
add reisbauerLow 103.164.8.158

The above command prints the ten most offending IP addresses, which can be appended to /etc/ipset.conf via cron. See Periodic seeding.

2. Perl code

Similar to SSHGuard, the simplified version reads its input from journalctl. Certain output lines of journalctl then trigger the blocking via ipset. The logic is that every unsuccessful login attempt results in an entry in an ipset, which then permanently bans that IP address from any further login attempts. I.e., the SSH daemon no longer even sees these attempts. You are blocked "forever", unless:

You reboot, because ipset's are then reset
You specifically unblock via ipset del reisbauerHigh
You flush ipset: ipset flush reisbauerHigh

All triggering keywords or phrases are highlighted below.

#!/bin/perl -W
# Simplified version of SSHGuard with just Perl and ipset

use strict;
my ($ip, %B);
my %whiteList = ( '192.168.0' => 1 );

open(F,'-|','/usr/bin/journalctl -afb -p info -n1 -t sshd -t sshd-session -o cat') || die("Cannot read from journalctl");

while (<F>) {
    if (/Failed password for (|invalid user )(\s*\w*) from (\d+\.\d+\.\d+\.\d+)/) { $ip = $3; }
    elsif (/authentication failure; .+rhost=(\d+\.\d+\.\d+\.\d+)/) { $ip = $1; }
    elsif (/Disconnected from (\d+\.\d+\.\d+\.\d+) port \d+ \[preauth\]/) { $ip = $1; }
    elsif (/Unable to negotiate with (\d+\.\d+\.\d+\.\d+)/) { $ip = $1; }
    elsif (/(Connection closed by|Disconnected from) (\d+\.\d+\.\d+\.\d+) port \d+ \[preauth\]/) { $ip = $2; }
    elsif (/Unable to negotiate with (\d+\.\d+\.\d+\.\d+) port \d+/) { $ip = $1; }
    else { next; }

    #print "Blocking $ip\n";
    next if (defined($B{$ip}));	# already blocked
    next if (defined($whiteList{ substr($ip,0,rindex($ip,'.')) }));	# in white-list

    $B{$ip} = 1;
    `ipset -quiet add -exist reisbauerHigh $ip/32 `;
}

close(F) || die("Cannot close pipe to journalctl");

Wait, isn't that more than 20 lines of code? Yes, but if you remove comments and empty lines, drop the close(), which is not strictly needed, then you come out at below 20 lines of source code, including configuration.

The whiteList hash variable contains all those class C networks which you do not want to block, even if wrong passwords are entered multiple times. Each key of %whiteList consists of the first three octets of the network. For example:

my %whiteList = ( '10.0.0' => 1, '192.168.0' => 1, '192.168.178' => 1 );
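The matching and white-list logic of the Perl script can be tried offline against a few log lines. Below is a minimal Python sketch of that logic, not the script itself: the function name offending_ip is made up for illustration, the duplicate Perl patterns are merged into one each, and the log lines in the usage note are hypothetical samples using documentation IP addresses.

```python
import re

IPV4 = r"(\d+\.\d+\.\d+\.\d+)"

# Same patterns as in the Perl script; the integer is the regex group holding the IP.
PATTERNS = [
    (re.compile(r"Failed password for (|invalid user )(\s*\w*) from " + IPV4), 3),
    (re.compile(r"authentication failure; .+rhost=" + IPV4), 1),
    (re.compile(r"(Connection closed by|Disconnected from) " + IPV4
                + r" port \d+ \[preauth\]"), 2),
    (re.compile(r"Unable to negotiate with " + IPV4), 1),
]

# First three octets of each network which must never be blocked (as in %whiteList)
WHITELIST = {"192.168.0"}


def offending_ip(line):
    """Return the IP address which would be blocked for this log line, else None."""
    for pattern, group in PATTERNS:
        match = pattern.search(line)
        if match:
            ip = match.group(group)
            prefix = ip.rsplit(".", 1)[0]  # same as substr($ip,0,rindex($ip,'.'))
            return None if prefix in WHITELIST else ip
    return None
```

For example, offending_ip("Failed password for invalid user admin from 203.0.113.7 port 51234 ssh2") returns the address, while the same line with an address from 192.168.0.x returns None because of the white-list.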

3. Starting and stopping

Starting and stopping via systemd is exactly the same as for SSHGuard. The systemd unit file is stored here:

/etc/systemd/system/multi-user.target.wants/sshguard.service

The systemd script is as below:

[Unit]
Description=Simplified SSHGuard - blocks brute-force login attempts
After=iptables.service
After=ip6tables.service
After=libvirtd.service
After=firewalld.service
After=nftables.service

[Service]
ExecStart=/usr/sbin/ssshguard
Restart=always

[Install]
WantedBy=multi-user.target

4. Periodic seeding

The Perl script below can be run every few hours to save the current set of IP addresses and store them in /etc/ipset.conf.

#!/bin/perl -W
# The top most IP addresses from reisbauerLow+High are retained in reisbauerLow,
# or more exactly, every entry which blocked more than 99 packets.
# This program must be run as root: ipset command needs this privilege
#
# Command line argument:
# 	-m	minimum number of packets blocked so far, default is 100


use strict;

use Getopt::Std;
my %opts = ('m' => 100);
getopts('m:',\%opts);
my $minBlock = defined($opts{'m'}) ? $opts{'m'} : 100;
my @F;

open(F,'-|','/bin/ipset save -sorted') || die("Cannot read from ipset");

print "create -exist reisbauerHigh hash:net family inet hashsize 65536 maxelem 65536 counters\n";
print "create -exist reisbauerLow  hash:net family inet hashsize 65536 maxelem 65536 counters\n";

while (<F>) {
    next if (! /^add reisbauer/);
    chomp;

    @F = split(/ /);
    if ($minBlock > 0) {
        next if ($#F < 6);
        next if ($F[4] < $minBlock);
    }
    printf("add -exist reisbauerLow %s\n",$F[2]);
}

close(F) || die("Cannot close pipe to ipset");
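The filtering step of the seeding script can be checked without root privileges by feeding it saved output. The following Python sketch mirrors the loop above under the assumption that each entry has the layout "add SET IP packets N bytes M", which is what the field indexes in the Perl code rely on. The function name seed_lines and the sample text in the usage note are made up for illustration.

```python
def seed_lines(save_output, min_packets=100, target="reisbauerLow"):
    """Return 'add -exist' lines for all entries with at least min_packets blocked packets."""
    result = []
    for line in save_output.splitlines():
        if not line.startswith("add reisbauer"):
            continue  # skip 'create' lines and entries of other sets
        fields = line.split(" ")
        if min_packets > 0:
            # assumed layout: add <set> <ip> packets <n> bytes <m>
            if len(fields) < 7:
                continue
            if int(fields[4]) < min_packets:
                continue
        result.append(f"add -exist {target} {fields[2]}")
    return result


# Hypothetical excerpt of 'ipset save -sorted' output
SAMPLE = """create reisbauerHigh hash:net family inet hashsize 65536 maxelem 65536 counters
add reisbauerHigh 198.51.100.23 packets 3 bytes 180
add reisbauerHigh 203.0.113.7 packets 250 bytes 15000"""
```

With the sample above, seed_lines(SAMPLE) keeps only the entry for 203.0.113.7, while seed_lines(SAMPLE, min_packets=0) keeps both entries.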

Slashdot Effect on a Single Page

Wed, 14 Aug 2024 20:40:00 +0200

1. The post Hosting Static Content with GitLab was mentioned on Hacker News and this site experienced a heavy increase in accesses. Neither PHP, nor NGINX, nor the ISP experienced any problems. Nevertheless, I had never before seen such a stark rise in accesses to this website with real readers.

The Slashdot effect usually refers to the situation where the web-server in question goes down or is overloaded. In my case, the web-server could handle all of that without any hiccups.

2. Normally, I have roughly 1,000 monthly visitors to my blog. One can clearly spot the spike, where visits go up to more than 4,000. Below charts show:

yearly visits
monthly visits
weekly visits

Above numbers are based on hefty filtering in the web-server log. Below are the numbers, which I filter out of the access.log file from NGINX. access.log had 3,101,246 lines.

Filtering	number of suppressed lines	%
class B addresses	1,527,803	49.3
class C addresses	136,109	4.4
raw IP addresses	32,566	1.1
bots identifying as bots	863,515	27.8
empty user-agent	0	0
short user-agent (<=3)	40,925	1.3
no transmitted bytes	280	0
all lowercase user-agent	8,581	0.3
HTTP code	182,837	5.9
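The percentages in the table can be reproduced from the raw counts. The short Python sketch below does this and also computes how many lines are left after filtering. The latter assumes that the filters are applied one after another, so that no line is counted in two rows; this assumption is mine and is not stated above.

```python
TOTAL_LINES = 3_101_246  # lines in access.log

# Number of suppressed lines per filter, copied from the table above
suppressed = {
    "class B addresses": 1_527_803,
    "class C addresses": 136_109,
    "raw IP addresses": 32_566,
    "bots identifying as bots": 863_515,
    "empty user-agent": 0,
    "short user-agent (<=3)": 40_925,
    "no transmitted bytes": 280,
    "all lowercase user-agent": 8_581,
    "HTTP code": 182_837,
}


def percent(count, total=TOTAL_LINES):
    """Share of count in total, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {name: percent(count) for name, count in suppressed.items()}

# Lines surviving all filters, assuming the rows are disjoint
remaining = TOTAL_LINES - sum(suppressed.values())
```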

But in July this jumped to almost 5,000 monthly visits. Initially I thought this was another bot just reading my blog.

One can clearly see that there is a spike, then a sharp decline, after the initial "hype" is over.

Above two graphs show the filtered results, i.e., filtering out bots, and other accesses, which are not "real" visitors.

3. Below graph shows the monthly 30,000 to 90,000 unfiltered visits to my web-server. One can clearly see the spike going up to 174,000 visits.

Below are the unfiltered statistics for the GitLab post:

Added 16-Aug-2024: I update my web-server on a regular basis. This time I noticed that php-fpm recommended changing the max_children parameter. The warning below occurred well after the initial boom described above.

Aug 02 16:20:20 ryzen php-fpm[958]: [WARNING] [pool www] server reached pm.max_children setting (20), consider raising it
Aug 02 16:20:22 ryzen sshd[979]: drop connection #0 from [170.64.236.223]:58996 on [192.168.0.20]:22 penalty: failed authentication
Aug 02 16:20:26 ryzen sshd[979]: drop connection #0 from [170.64.236.223]:36880 on [192.168.0.20]:22 penalty: failed authentication
Aug 02 16:20:26 ryzen php-fpm[958]: [WARNING] [pool www] server reached pm.max_children setting (20), consider raising it
Aug 02 16:20:29 ryzen sshd[979]: drop connection #0 from [170.64.236.223]:42996 on [192.168.0.20]:22 penalty: failed authentication
Aug 02 16:20:32 ryzen sshd[979]: drop connection #0 from [170.64.236.223]:49112 on [192.168.0.20]:22 penalty: failed authentication
Aug 02 16:20:35 ryzen php-fpm[958]: [WARNING] [pool www] server reached pm.max_children setting (20), consider raising it

Recursive Generation of Runge-Kutta Formulas

Tue, 13 Aug 2024 15:15:00 +0200

1. Notation
2. Example Runge-Kutta methods
3. Local discretization error
4. Order condition
5. Power series expansion for global error
6. Cauchy product formula
7. Recursive calculation of the order condition

Below text is based on the results in

Bibliography:

Peter Albrecht.
Michael E. Hosea and LinkedIn.
rktec.c computes the coefficients of Albrecht's expansion of the local truncation error of Runge-Kutta formulas.

1. Notation

Consider the ordinary differential equation with initial condition:

$$ \dot y(t) = f(t,y(t)), \qquad y(t_0) = y_0 \in\mathbb{R}. $$

Nomenclature and assumptions:

Let $h$ be our step-size, $t_j = t_0 + jh$, with $j\in\mathbb{N}$.
Let $p\in\mathbb{N}$ be the order of our Runge-Kutta method, see below.
The constants $c_i$ are between zero and one, i.e., $0\le c_i\le 1$.
Let $Y_{j+c_i} = y(t_j+c_ih)$ be the exact solution of above initial value problem at point $t=t_j + c_i h$
Let $y_{j+c_i}$ be the approximation according to below Runge-Kutta formula.
We assume $y(t)$ is $p+2$ times continuously differentiable.

Runge-Kutta methods are written in their nonlinear form

$$ k_i := f\left(t_j+c_ih, y_j + h \sum_{\ell=1}^s a_{i\ell}k_\ell\right), \qquad i=1,\ldots,s; $$

with

$$ y_{j+1} = y_j + h \sum_{\ell=1}^s b_\ell k_\ell, \qquad j=0,1,\ldots,m-1. $$

$s$ is the number of internal stages of the Runge-Kutta method. $s$ is usually in the range of 2 to 9. The higher $s$ is, the more work you have to do for each step $j$.

Substituting $k_i$ with $y_{j+c_i}$ we obtain

$$ y_{j+c_i} := y_j + h \sum_{\ell=1}^s a_{i\ell} k_\ell. $$

We have

$$ k_i = f(t_j+c_ih,y_{j+c_i}), $$

and thus get the $(s+1)$-stage linear representation

$$ \eqalign{ y_{j+c_i} &= y_j + h \sum_{\ell=1}^s a_{i\ell} f(t_j+c_\ell h,y_{j+c_\ell}), \qquad i=1,\ldots,s,\cr y_{j+1} &= y_j + h \sum_{\ell=1}^s b_\ell f(t_j+c_\ell h,y_{j+c_\ell}), \qquad j=0,\ldots,m-1. } $$

In matrix notation this is

$$ \eqalign{ y_{j+c} &= y_j e + h A f(t_{j+c},y_{j+c}), \qquad\hbox{"internal" stages}\cr y_{j+1} &= y_j + h b^T f(t_{j+c},y_{j+c}), \qquad\hbox{"last" stage.} } $$

Using the matrix

$$ E = \pmatrix{ 0 & \ldots & 0 & 1\cr \vdots & \ddots & \vdots & \vdots\cr 0 & \ldots & 0 & 1 } \in \mathbb{R}^{s\times(s+1)}. $$

we could write the whole process as $\tilde y_{j+1} = E \tilde y_j + \cdots$.

Here we use $c$ as vector and multiindex simultaneously:

$$ y_{j+c} := \pmatrix{ y_{j+c_1}\cr \vdots\cr y_{j+c_s} }, \quad e := \pmatrix{ 1\cr \vdots\cr 1 }, \quad f(t_{j+c},y_{j+c}) := \pmatrix{ f(t_j+c_1h,y_{j+c_1})\cr \vdots\cr f(t_j+c_sh,y_{j+c_s}) }, $$

and

$$ A = \pmatrix{ a_{11} & \ldots & a_{1s}\cr \vdots & \ddots & \vdots\cr a_{s1} & \ldots & a_{ss} }, \qquad b := \pmatrix{ b_1\cr \vdots\cr b_s }, \qquad c = \pmatrix{ c_1\cr \vdots\cr c_s } $$

This corresponds to the classical Runge-Kutta Butcher tableau:

$$ \begin{array}{c|c} c & A\\ \hline & b^T \end{array} $$

1. Definition. The above method is called an $s$-stage Runge-Kutta method.

It is an explicit method, if $a_{ij}=0$ for $j\ge i$, i.e., $A$ is a strictly lower triangular matrix.
It is implicit otherwise.
It is called semi-implicit, or SIRK, if $a_{ij}=0$ for $j\gt i$ but $a_{ii}\ne0$ for at least one index $i$.
It is called diagonally implicit, or DIRK, if $a_{ii}\ne0$ and $a_{ij}=0,$ for $j\gt i$.
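The definition translates directly into a small check on the matrix $A$. The Python sketch below is only an illustration of the four cases; the function name classify and the example matrices in the usage note are made up. Since every DIRK method also satisfies the SIRK condition, the function returns the more specific name.

```python
def classify(A):
    """Classify a Runge-Kutta method by the shape of its s x s matrix A."""
    s = len(A)
    # a_ij = 0 for all j > i: nothing above the diagonal
    upper_zero = all(A[i][j] == 0 for i in range(s) for j in range(i + 1, s))
    diag_nonzero = [A[i][i] != 0 for i in range(s)]

    if upper_zero and not any(diag_nonzero):
        return "explicit"        # a_ij = 0 for j >= i
    if upper_zero and all(diag_nonzero):
        return "DIRK"            # all diagonal elements nonzero
    if upper_zero:
        return "SIRK"            # at least one diagonal element nonzero
    return "implicit"            # entries above the diagonal
```

For example, classify([[0, 0], [0.5, 0]]) gives "explicit", classify([[0.5, 0], [0.5, 0.5]]) gives "DIRK", classify([[0, 0], [0.5, 0.5]]) gives "SIRK", and any full matrix, such as the one of the implicit Gauß methods, gives "implicit".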

Classification of Runge-Kutta methods: they are either explicit or implicit; the implicit methods include the semi-implicit and the diagonally implicit ones.

In the following we use componentwise multiplications for the vector $c$:

$$ c^2 = \pmatrix{c_1^2\cr \vdots\cr c_s^2}, \qquad\ldots\qquad, c^\ell = \pmatrix{c_1^\ell\cr \vdots\cr c_s^\ell}. $$

2. Example Runge-Kutta methods

See the book Peter Albrecht, "Die numerische Behandlung gewöhnlicher Differentialgleichungen: Eine Einführung unter besonderer Berücksichtigung zyklischer Verfahren", 1979.

The classical Runge-Kutta method of order 4 with 4 stages.

$$ \begin{array}{c|cccc} {1\over2} & {1\over2}\\ {1\over2} & 0 & {1\over2}\\ 1 & 0 & 0 & 1\\ \hline & {1\over6} & {1\over3} & {1\over3} & {1\over6} \end{array} $$
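How such a tableau is turned into an integration step can be sketched in a few lines of Python. The code below applies the classical tableau to the test problem $\dot y = y$, $y(0)=1$, which is my choice for illustration. Note that the tableau above omits the first stage with $c_1=0$; the code includes it. The function names rk_step and integrate are made up.

```python
import math

# Classical Runge-Kutta method, including the first stage c_1 = 0
C = [0.0, 0.5, 0.5, 1.0]
A = [[0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
B = [1 / 6, 1 / 3, 1 / 3, 1 / 6]


def rk_step(f, t, y, h, A=A, b=B, c=C):
    """One step of an explicit Runge-Kutta method given by the tableau (c, A, b)."""
    k = []
    for i in range(len(c)):
        # explicit method: stage i only needs k_1, ..., k_{i-1}
        y_stage = y + h * sum(A[i][l] * k[l] for l in range(i))
        k.append(f(t + c[i] * h, y_stage))
    return y + h * sum(b_l * k_l for b_l, k_l in zip(b, k))


def integrate(f, t0, y0, h, m):
    """Apply m steps of size h, starting at (t0, y0)."""
    y = y0
    for j in range(m):
        y = rk_step(f, t0 + j * h, y, h)
    return y


# Test problem: y' = y, y(0) = 1, exact solution y(1) = e
y_end = integrate(lambda t, y: y, 0.0, 1.0, 0.1, 10)
error = abs(y_end - math.e)
```

With step-size $h=0.1$ the error at $t=1$ is already below $10^{-5}$, as expected for a method of order 4.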

Kutta's method or 3/8-method of order 4 with 4 stages.

$$ \begin{array}{c|cccc} {1\over3} & {1\over3}\\ {2\over3} & -{1\over3} & 1\\ 1 & 1 & -1 & 1\\ \hline & {1\over8} & {3\over8} & {3\over8} & {1\over8} \end{array} $$

Gill's method of order 4 with 4 stages.

$$ \begin{array}{c|cccc} {1\over2} & {1\over2}\\ {1\over2} & {\sqrt{2}-1\over2} & {2-\sqrt{2}\over2}\\ 1 & 0 & -{\sqrt{2}\over2} & 1+{\sqrt{2}\over2}\\ \hline & {1\over6} & {2-\sqrt{2}\over6} & {2+\sqrt{2}\over6} & {1\over6} \end{array} $$

Above examples show that order $p$ could be obtained with $s=p$ stages. Butcher showed that for $p\ge5$ this is no longer possible.

Butcher method of order 5 with 6 stages.

$$ \begin{array}{c|cccccc} {1\over4} & {1\over4}\\ {1\over4} & {1\over8} & {1\over8}\\ {1\over2} & 0 & -{1\over2} & 1\\ {3\over4} & {3\over16} & 0 & 0 & {9\over16}\\ 1 & -{3\over7} & {2\over7} & {12\over7} & -{12\over7} & {8\over7}\\ \hline & {7\over90} & 0 & {32\over90} & {12\over90} & {32\over90} & {7\over90} \end{array} $$

Runge-Kutta-Fehlberg method of order 4 with 5 internal stages. Also called RKF45. The embedded 5-th order method is only used for step-size control.

$$ \begin{array}{c|cccccc} {1\over4} & {1\over4}\\ {3\over8} & {3\over32} & {9\over32}\\ {12\over13} & {1932\over2197} & -{7200\over2197} & {7296\over2197}\\ 1 & {439\over216} & -8 & {3680\over513} & -{845\over4104}\\ {1\over2} & -{8\over27} & 2 & -{3544\over2565} & {1859\over4104} & -{11\over40}\\ \hline p=5 & {16\over135} & 0 & {6656\over12825} & {28561\over56430} & -{9\over50} & {2\over55}\\ p=4 & {25\over216} & 0 & {1408\over2565} & {2197\over4104} & -{1\over5} & 0 \end{array} $$

Dormand-Prince method of order 4 with 5 internal stages, called DOPRI45.

$$ \begin{array}{c|ccccccc} {1\over5} & {1\over5}\\ {3\over10} & {3\over40} & {9\over40}\\ {4\over5} & {44\over45} & -{56\over15} & {32\over9}\\ {8\over9} & {19372\over6561} & -{25360\over2187} & {64448\over6561} & -{212\over729}\\ 1 & {9017\over3168} & -{355\over33} & {46732\over5247} & {49\over176} & -{5103\over18656}\\ 1 & {35\over384} & 0 & {500\over1113} & {125\over192} & -{2187\over6784} & {11\over84}\\ \hline p=5 & {35\over384} & 0 & {500\over1113} & {125\over192} & -{2187\over6784} & {11\over84}\\ p=4 & {5179\over57600} & 0 & {7571\over16695} & {393\over640} & -{92097\over339200} & {187\over 2100} & {1\over40} \end{array} $$

Implicit Gauß-method of order 4 with 2 internal stages.

$$ \begin{array}{c|cc} {3-\sqrt{3}\over6} & {1\over4} & {3-2\sqrt{3}\over12}\\ {3+\sqrt{3}\over6} & {3+2\sqrt{3}\over12} & {1\over4}\\ \hline & {1\over2} & {1\over2} \end{array} $$

Implicit Gauß-method of order 6 with 3 internal stages.

$$ \begin{array}{c|ccc} {5-\sqrt{15}\over10} & {5\over36} & {10-3\sqrt{15}\over45} & {25-6\sqrt{15}\over180}\\ {1\over2} & {10+3\sqrt{15}\over72} & {2\over9} & {10-3\sqrt{15}\over72}\\ {5+\sqrt{15}\over10} & {25+6\sqrt{15}\over180} & {10+3\sqrt{15}\over45} & {5\over36}\\ \hline & {5\over18} & {4\over9} & {5\over18} \end{array} $$

Erwin Fehlberg (1911-1990), John C. Butcher (born 1933), Carl Runge (1856-1927), Martin Wilhelm Kutta (1867-1944).

3. Local discretization error

The local discretization error is when you insert the exact solution into the numerical formula and look at the error that ensues.

There are two local discretization errors $d_{j+c}\in\mathbb{R}^s$ and $\hat d_{j+1}\in\mathbb{R}$: one for the "internal" stages, and one for the "last" stage.

$$ \eqalign{ Y_{j+c} &= Y_j e + h A f(t_{j+c},Y_{j+c}) + h d_{j+c},\cr Y_{j+1} &= Y_j + h b^T f(t_{j+c},Y_{j+c}) + h \hat d_{j+1}, \qquad j=0,\ldots,m-1. }\tag{*} $$

2. Definition. The $d_{j+c}$ and $\hat d_{j+1}$ are called local discretization errors.

Using

$$ Y_j^{(i)} = {d^i y(t_j)\over dt^i}, \qquad Y_j^{(1)} = f(t_j,y(t_j)), $$

and by Taylor expansion at $t_j$ for $Y_j$ we get for index $i=1$:

$$ \eqalign{ Y_{j+c_1} &= y(t_j+c_1h) = Y_j + {1\over1!}Y_j^{(1)} c_1h + {1\over2!}Y_j^{(2)}(c_1h)^2 + \cdots\cr f(t_{j+c_1},y(t_{j+c_1})) &= \dot y(t_j+c_1h) = Y_j^{(1)} + {1\over1!}Y_j^{(2)}c_1h + {1\over2!}Y_j^{(3)}(c_1h)^2 + \cdots } $$

The other indexes $i=2,\ldots,s$ are similar. We thus get for all the stages

$$ \eqalign{ d_{j+c} &= \pmatrix{ \gamma_{11} Y_j^{(1)} + \gamma_{12} Y_j^{(2)} h + \gamma_{13} Y_j^{(3)} h^2 + \cdots\cr \qquad\vdots \qquad\qquad\qquad \ddots\cr \gamma_{s1} Y_j^{(1)} + \gamma_{s2} Y_j^{(2)} h + \gamma_{s3} Y_j^{(3)} h^2 + \cdots } = \sum_{\ell=1}^{p+1} \gamma_\ell Y_j^{(\ell)} h^{\ell-1} + {\cal O}(h^{p+1}),\cr \hat d_{j+1} &= \sum_{\ell=1}^{p+1} \hat\gamma_\ell Y_j^{(\ell)} h^{\ell-1} + {\cal O}(h^{p+1}). } $$

Using

$$ \Gamma = \left( \gamma_1, \ldots, \gamma_{p+1} \right) = \pmatrix{ \gamma_{11} & \ldots & \gamma_{1,p+1}\cr \vdots & \ddots & \vdots\cr \gamma_{s1} & \ldots & \gamma_{s,p+1} } \in \mathbb{R}^{s\times(p+1)}, $$

the "error vectors" $\gamma_\ell\in\mathbb{R}^s$ and the error factor $\hat\gamma_\ell\in\mathbb{R}$ are

$$ \eqalign{ \gamma_\ell &= {1\over\ell!} c^\ell - {1\over(\ell-1)!} A c^{\ell-1}, \qquad \ell=1,\ldots,p+1;\cr \hat\gamma_\ell &= {1\over\ell!} - {1\over(\ell-1)!} b^T c^{\ell-1}, \qquad c^\ell := \left(c_1^\ell,\cdots,c_s^\ell\right)^T } $$

The "internal" stages are consistent iff $\gamma_\ell=0\in\mathbb{R}^s$ for $\ell=1,\ldots,p$. Now comes the kicker:

the last stage of the method may furnish approximations of order $p$ even if $\gamma_\ell=0$ does not hold.
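This effect can be seen already for the classical Runge-Kutta method of order 4. The Python sketch below evaluates the formulas for $\gamma_\ell$ and $\hat\gamma_\ell$ from above in exact rational arithmetic. The tableau includes the first stage $c_1=0$; the function names gamma and gamma_hat are made up for illustration.

```python
from fractions import Fraction as Fr
from math import factorial

# Classical Runge-Kutta method of order 4, including the first stage c_1 = 0
c = [Fr(0), Fr(1, 2), Fr(1, 2), Fr(1)]
A = [[Fr(0), Fr(0), Fr(0), Fr(0)],
     [Fr(1, 2), Fr(0), Fr(0), Fr(0)],
     [Fr(0), Fr(1, 2), Fr(0), Fr(0)],
     [Fr(0), Fr(0), Fr(1), Fr(0)]]
b = [Fr(1, 6), Fr(1, 3), Fr(1, 3), Fr(1, 6)]
s = len(c)


def gamma(l):
    """Error vector gamma_l = c^l / l! - A c^(l-1) / (l-1)! of the internal stages."""
    return [c[i] ** l / factorial(l)
            - sum(A[i][k] * c[k] ** (l - 1) for k in range(s)) / factorial(l - 1)
            for i in range(s)]


def gamma_hat(l):
    """Error factor gamma_hat_l = 1 / l! - b^T c^(l-1) / (l-1)! of the last stage."""
    return (Fr(1, factorial(l))
            - sum(b[k] * c[k] ** (l - 1) for k in range(s)) / factorial(l - 1))


last_stage = [gamma_hat(l) for l in range(1, 6)]    # zero for l = 1..4, nonzero for l = 5
internal_2 = gamma(2)                                # not the zero vector
b_dot_gamma_2 = sum(b[k] * internal_2[k] for k in range(s))
```

Here $\hat\gamma_\ell=0$ for $\ell=1,\ldots,4$, although $\gamma_2 = (0, {1\over8}, -{1\over8}, 0)^T \ne 0$. The internal stages are therefore not consistent of order 4, yet the last stage is. Note that $b^T\gamma_2=0$, which is an instance of the orthogonality relations derived in the following sections.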

4. Order condition

Define the global error for $y_j$:

$$ q_{j+c} := Y_{j+c} - y_{j+c} \in\mathbb{R}^s, \qquad \hat q_{j+1} := Y_{j+1} - y_{j+1} \in\mathbb{R} $$

and for $f(t,y)$:

$$ Q_{j+c} := f(t_{j+c},Y_{j+c}) - f(t_{j+c},y_{j+c}) \in\mathbb{R}^s. $$

Theorem. (General order condition.) Assume that the local discretization error $\hat d_j$ is ${\cal O}(h^p)$, $\forall j$.

(a) The Runge-Kutta method then converges with order $p$, i.e., $\hat q_j = \cal O(h^p)$, iff

$$ b^T Q_{j+c} = {\cal O}(h^p), \quad\forall j. $$

(b) This happens iff for the global error $q_{j+c}$ of the internal stages the following holds:

$$ \eqalign{ q_{j+c} &= {\cal O}(h^p) + h d_{j+c} + h A Q_{j+c},\cr h d_{j+c} &= \sum_{i=2}^{p+1} \gamma_i Y^{(i)}_j h^i + {\cal O}(h^{p+2}). } $$

Proof: Use the fact that

$$ \hat d_{j+1} = {\cal O}(h^p), \qquad h \sum_{\ell=0}^j \hat d_{\ell+1} = {\cal O}(h^p). $$

Then

$$ \eqalign{ q_{j+c} &= \hat q_j e + h d_{j+c} + h A Q_{j+c},\cr \hat q_0 &= 0,\cr \hat q_{j+1} &= \hat q_j + h \hat d_{j+1} + h b^T Q_{j+c}\cr &= h \sum_{\ell=0}^j \hat d_{\ell+1} + h \sum_{\ell=0}^j b^T Q_{\ell+c}. } $$

☐

Above theorem gives the general order condition for Runge-Kutta methods. However, in this form it is not practical to find the parameters $c,$ $b,$ and $A$.

Further outline:

By one-dimensional Taylor expansion the $Q_{j+c}$ are expressed by $q_{j+c}$.

5. Power series expansion for global error

Let

$$ g_\ell(t) := {(-1)^{\ell+1}\over\ell!} {\partial^\ell\over\partial y^\ell}f(t,y(t)), \qquad D := \mathop{\rm diag}(c_1,\ldots,c_s) \in\mathbb{R}^{s\times s}. $$

Further

$$ G_\ell(t) := \mathop{\rm diag}\left( g_\ell(t_j+c_1h),\ldots,g_\ell(t_j+c_sh)\right). $$

By Taylor expansion at $t_j$:

$$ G_\ell(t) = g_\ell(t_j) + h D g_\ell'(t_j) + {1\over2!}h^2 D^2 g_\ell''(t_j) + \cdots \tag{G} $$

Theorem. With the above definitions the $Q_{j+c}$ can be expressed by powers of $q_{j+c}$:

$$ Q_{j+c} = G_1(t_j) q_{j+c} + G_2(t_j) q_{j+c}^2 + G_3(t_j) q_{j+c}^3 + \cdots. $$

Proof: The $i$-th component of $Q_{j+c}$ is

$$ Q_{j+c_i} = f(t_j+c_ih,Y_{j+c_i}) - f(t_j+c_ih,y_{j+c_i}) \in\mathbb{R}. $$

Taylor expansion at $y=Y_{j+c_i}$ gives

$$ -f(t_j+c_ih,y_{j+c_i}) = -f(t_j+c_ih,Y_{j+c_i}) + \sum_{\ell=1}^p g_\ell(t_j+c_ih)(Y_{j+c_i}-y_{j+c_i})^\ell + \cdots. $$

Hence

$$ Q_{j+c_i} = \sum_{\ell=1}^p g_\ell(t_j+c_ih) q_{j+c_i}^\ell + \cdots. $$

☐

The following two nonlinear expressions vanish:

$$ \pmatrix{0\cr 0} = \pmatrix{u\cr v} := \pmatrix{ q_{j+c} - h d_{j+c} - h A Q_{j+c} - {\cal O}(h^p)\cr Q_{j+c} - G_1(t_j) q_{j+c} - G_2(t_j) q_{j+c}^2 - G_3(t_j) q_{j+c}^3 - \cdots } \in \mathbb{R}^{2s}. $$

The right side is analytic at $h=0$, $Q_{j+c}=0$, and $q_{j+c}=0$. The implicit function theorem says:

the solution $Q_{j+c}$ exists and is unique,
the solution is itself analytical.

Its Taylor expansion is

$$ \pmatrix{q_{j+c}(h)\cr Q_{j+c}(h)} = \pmatrix{ r_2(t_j) h^2 + r_3(t_j) h^3 + \cdots + r_{p-1}(t_j) h^{p-1} + {\cal O}(h^p)\cr w_2(t_j) h^2 + w_3(t_j) h^3 + \cdots + w_{p-1}(t_j) h^{p-1} + {\cal O}(h^p) } \tag{T} $$

One can see that the terms with $h^0$ and $h^1$ are equal to zero.

The general order condition $b^T Q_{j+c} = {\cal O}(h^p)$ reduces to the orthogonality relations below:

$$ b^T w_i(t_j) = 0 \in\mathbb{R}, \qquad i=2,\ldots,p-1. $$

6. Cauchy product formula

In below recap we are interested in the formula, not the convergence conditions, therefore skip these conditions.

Theorem. (Cauchy product formula for finitely many power series.) Multiply $n$ power series:

$$ \prod_{k=1}^n \left( \sum_{\nu\ge0} a_{k\nu} z_k^\nu \right) = \sum_{|\alpha|\ge0} a_\alpha z^\alpha, $$

with the multiindex notation:

$$ \eqalign{ \alpha &= (\alpha_1,\ldots,\alpha_n)\cr |\alpha| &= \alpha_1 + \cdots + \alpha_n\cr a_\alpha &= a_{1\alpha_1} a_{2\alpha_2} \cdots a_{n\alpha_n}\cr z &= (z_1,\ldots,z_n)\cr z^\alpha &= z_1^{\alpha_1} \cdots z_n^{\alpha_n} } $$

Similarly, when the power series start at some index $\beta_i$.

Theorem. (Cauchy product formula for finitely many power series.) Multiply $n$ power series:

$$ \prod_{k=1}^n \left( \sum_{\nu\ge\beta_k} a_{k\nu} z_k^\nu \right) = \sum_{|\alpha|\ge0} a_{\alpha+\beta} z^{\alpha+\beta}, $$

with the multiindex notation:

$$ \eqalign{ \alpha &= (\alpha_1,\ldots,\alpha_n)\cr \beta &= (\beta_1,\ldots,\beta_n)\cr \alpha+\beta &= (\alpha_1+\beta_1,\ldots,\alpha_n+\beta_n)\cr |\alpha| &= \alpha_1 + \cdots + \alpha_n\cr a_\alpha &= a_{1\alpha_1} a_{2\alpha_2} \cdots a_{n\alpha_n}\cr z &= (z_1,\ldots,z_n)\cr z^\alpha &= z_1^{\alpha_1} \cdots z_n^{\alpha_n} } $$

Theorem. (Cauchy product formula for infinitely many power series.) Multiply infinitely many power series:

$$ \prod_{k\ge1} \left( \sum_{\nu\ge\nu_0} a_{k\nu} z_k^\nu \right) = \sum_{|\alpha|\ge\nu_0} a_\alpha z^{\alpha+\nu_0}, $$

with the infinite multiindex notation:

$$ \eqalign{ \alpha &= (\alpha_1,\alpha_2,\ldots)\cr |\alpha| &= \alpha_1 + \alpha_2 + \cdots\cr a_\alpha &= a_{1\alpha_1} a_{2\alpha_2} \cdots\cr z &= (z_1,z_2,\ldots)\cr z^\alpha &= z_1^{\alpha_1} z_2^{\alpha_2} \cdots } $$

For the special case $z_1=z_2=\cdots$

$$ z^{\alpha+\nu_0} = z^{\alpha_1+\nu_0} z^{\alpha_2+\nu_0} \cdots $$

you can substitute the right power of $z$ with $z^{|\alpha|+\nu_0}$, therefore

$$ \prod_{k\ge1} \left( \sum_{\nu\ge0} a_{k\nu} z^\nu \right) = \sum_{|\alpha|\ge0} a_\alpha z^{|\alpha|} $$
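For finitely many series in a single variable $z$ this formula is easy to check numerically. The Python sketch below multiplies three truncated power series once by repeated convolution and once by summing $a_\alpha$ over all multiindices with $|\alpha|=m$. The coefficients are arbitrary integers chosen for illustration, and the function names are made up.

```python
from itertools import product


def multiply(series, order):
    """Coefficients of z^0..z^order of the product, by repeated convolution."""
    result = [1] + [0] * order
    for a in series:
        new = [0] * (order + 1)
        for i, r_i in enumerate(result):
            for j, a_j in enumerate(a[:order + 1 - i]):
                new[i + j] += r_i * a_j
        result = new
    return result


def multiindex_coefficient(series, m):
    """Sum of a_{1,alpha_1} * ... * a_{n,alpha_n} over all alpha with |alpha| = m."""
    total = 0
    for alpha in product(range(m + 1), repeat=len(series)):
        if sum(alpha) == m:
            term = 1
            for k, alpha_k in enumerate(alpha):
                term *= series[k][alpha_k]
            total += term
    return total


# Row k holds the coefficients a_{k,0}, a_{k,1}, a_{k,2}, a_{k,3} (arbitrary values)
series = [[1, 2, 3, 4], [2, 0, 1, 5], [1, 1, 1, 1]]

by_convolution = multiply(series, 3)
by_multiindex = [multiindex_coefficient(series, m) for m in range(4)]
```

Both lists agree, i.e., the coefficient of $z^m$ in the product is the sum of all $a_\alpha$ with $|\alpha|=m$.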

7. Recursive calculation of the order condition

Theorem. (Recursion 0.) The $r_i$ and $w_i$ from (T) can be obtained from the recursion formula below for $i=2,3,\ldots,p$:

$$ r_i(t_j) = \gamma_i Y_j^{(i)} + A w_{i-1}(t_j), \qquad w_1(t_j) := 0 \in\mathbb{R}^s, $$

and using the infinite multiindex $\alpha=(\alpha_1,\alpha_2,\ldots)$

$$ \eqalign{ w_i(t_j) &= \sum_{\ell=0}^{i-2} D^\ell {1\over\ell!} g_1^{(\ell)}(t_j) r_{i-\ell}(t_j)\cr &+ \sum_{\ell=0}^{i-4} D^\ell \sum_{\substack{\alpha_1,\alpha_2\ge2\\ \alpha_1+\alpha_2+\ell=i}} {1\over\ell!} g_2^{(\ell)}(t_j) r_{\alpha_1}(t_j) r_{\alpha_2}(t_j)\cr &+ \sum_{\ell=0}^{i-6} D^\ell \sum_{\substack{\alpha_1,\alpha_2,\alpha_3\ge2\\ \alpha_1+\alpha_2+\alpha_3+\ell=i}} {1\over\ell!} g_3^{(\ell)}(t_j) r_{\alpha_1}(t_j) r_{\alpha_2}(t_j) r_{\alpha_3}(t_j)\cr &+ \cdots\cr &= \sum_{k\ge1} \sum_{\ell=0}^{i-2k} D^\ell \sum_{\substack{\alpha\ge2\\ |\alpha|+\ell=i}} {1\over\ell!} g_k^{(\ell)}(t_j) r_\alpha(t_j) . } $$

Proof: The original proof by Albrecht is using induction. This proof is direct and uses the Cauchy product.

We omit $t_j$ in the following. According to the theorem in section 5 and (T) we have

$$ Q_{j+c} = G_1 \left(\sum_\ell r_\ell h^\ell\right) + G_2 \left(\sum_\ell r_\ell h^\ell\right)^2 + G_3 \left(\sum_\ell r_\ell h^\ell\right)^3 + \cdots $$

Now use (G) and Cauchy product formula:

$$ \eqalign{ Q_{j+c} &= \sum_{\nu=1}^p G_\nu q_{j+c}^\nu + {\cal O}(h^p)\cr &= \sum_{\nu=1}^p G_\nu \sum_{|\alpha|\ge0} r_{\alpha_\nu+2} h^{|\alpha|+2} + {\cal O}(h^p)\cr &= \sum_{\nu=1}^p \sum_{|\alpha|\ge0} \sum_{\ell\ge0} {1\over\ell!} D^\ell g_\nu^{(\ell)} r_{\alpha_\nu+2} h^{|\alpha|+\ell+2} + {\cal O}(h^p) } $$

Now group by common powers of $h$ and you get the formula for $w_i$. ☐

Recursion 0 is the basis of all further considerations. With the Ansatz

$$ \eqalign{ w_i(t_j) &= \sum_{\ell=1}^{m_i} \rho_{i\ell} \alpha_{i\ell} e_{i\ell}(t_j),\cr r_i(t_j) &= \sum_{\ell=0}^{m_i-1} \sigma_{i\ell} \beta_{i\ell} e_{i-1,\ell}(t_j), \qquad e_{i-1,0} := Y^{(i)}. } $$

Hosea (1995) in his program rktec.c computes these $\rho_{i\ell}$ and $\sigma_{i\ell}$.

Main result: The condition $b^T w_i(t_j)=0$ must be satisfied for any $f$, and thus for various $e_{i\ell}$. This is the case if

$$ b^T \alpha_{i\ell} = 0, \qquad i=2,\ldots,p, \quad \ell=1,\ldots,m_i. $$

Thus the order condition becomes independent of the $e_{i\ell}$, i.e., independent of the initial value problem at hand.

Recursion 1:

$$ \eqalign{ R_i^\alpha := &\{\gamma_i\} \cup \left\{Aw \mid w\in W_{i-1}^\alpha\right\}, \qquad W_1^\alpha:= \emptyset, \quad i=2,3,\ldots\cr W_i^\alpha := &\{D^\ell r_1 \mid r_1\in R_{i-\ell}^\alpha, \quad\ell=0,\ldots,i-2\}\cr & \cup \left\{D^\ell r_1\cdot r_2 \mid r_1\in R_{\alpha_1}^\alpha, \quad r_2\in R_{\alpha_2}^\alpha, \quad\ell=0,\ldots,i-4, \quad \alpha_1+\alpha_2+\ell=i \right\}\cr & \cup \left\{D^\ell r_1\cdot r_2\cdot r_3 \mid r_1\in R_{\alpha_1}^\alpha, \: r_2\in R_{\alpha_2}^\alpha, \: r_3\in R_{\alpha_3}^\alpha, \: \ell=0,\ldots,i-6, \: |\alpha|+\ell=i\right\}\cr & \cup \quad \cdots } $$

Theorem. (Main result.) Let the $\alpha_{i\ell}\in\mathbb{R}^s$ be obtained from Recursion 1. The Runge-Kutta method then converges with order $p\ge3$ if the last stage has order of consistency $p$, i.e.,

$$ b^T c^{\ell-1} = {1\over\ell}, \qquad\ell=1,\ldots,p, $$

and if

$$ b^T \alpha_{i\ell} = 0, \qquad i=2,\ldots,p-1, \quad \ell=1,\ldots,m_i. $$

Hosea (1995) notes that

Through order sixteen, for instance, ... implies a total of 1,296,666 terms (including the quadrature error coefficients) while there are "only" 376,464 distinct order conditions.

Hosting Static Content with GitHub

Tue, 16 Jul 2024 19:30:00 +0200

I wrote about hosting static sites on various platforms:

Hosting Static Content with surge.sh
Hosting Static Content with now.sh, now.sh renamed themselves to vercel.app
Hosting Static Content with netlify.app
Hosting Static Content with Cloudflare
Hosting Static Content with Neocities
Hosting Static Content with GitLab

GitHub lets every user host static web pages, similar to GitLab. Here are the steps to do that.

1. Create the repository eklausme.github.io. Change eklausme to your GitHub username.

2. In menu "Settings", check that the default branch is set to master, or whatever you use.

3. Create a file static.yml in the directory .github/workflows, which controls the CI process and is as below:

# Simple workflow for deploying static content to GitHub Pages
name: Deploy static content to Pages

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["master"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Single deploy job since we're just deploying
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Pages
        uses: actions/configure-pages@v5
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          # Upload entire repository
          path: '.'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

This CI was taken from the workflow offered when you choose "New Workflow" and then "Pages" at the bottom.

4. Contrary to GitLab, where you put your static content in a specific directory, in GitHub you place all static content under the root directory. Add the content as usual.

git add .
git commit -m"..."
git push

Once you push to your Git repository, a job governed by the above static.yml is started and produces the intended website. This website is then accessible at

https://eklausme.github.io/

5. This blog uses Simplified Saaze. I use the command below to generate all my static HTML pages:

php saaze -mortb /tmp/build

Then add your images, PDF, JS, CSS, redirects, etc.

6. Limitations. Unlike GitLab, GitHub has some "hidden" limitations. Either the number of files or the total size of all files is limited on GitHub. I had to delete my /img directory, because otherwise GitHub would not host the site. I would therefore recommend GitLab over GitHub.
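Since the exact limit is not known here, it can help to look at the number of files and their total size before pushing. The following is a minimal Python sketch for that; the function name `site_stats` and the example path are assumptions made for illustration, and the script neither knows nor checks any GitHub limit.

```python
import os

def site_stats(root):
    """Return (number_of_files, total_size_in_bytes) for all files below root."""
    count, total = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skips broken symbolic links
                count += 1
                total += os.path.getsize(path)
    return count, total

# Example usage (the path is an assumption):
# n, size = site_stats("/tmp/build")
# print(f"{n} files, {size / 1e6:.1f} MB")
```

Running it once with and once without the image directory shows how much that directory contributes.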

Hosting Static Content with GitLab

Sun, 14 Jul 2024 23:00:00 +0200

I wrote about hosting static sites on various platforms:

Hosting Static Content with surge.sh
Hosting Static Content with now.sh; now.sh renamed itself to vercel.app
Hosting Static Content with netlify.app
Hosting Static Content with Cloudflare
Hosting Static Content with Neocities

GitLab lets every user host static web pages. Here are the steps to do that.

1. Create a public repository, e.g., with the name p. GitLab also calls it a project. I chose a short repository name, as this name will be part of the URL.

2. In menu "Deploy", go to "Pages":

Then deselect unique domain:

3. Create a file .gitlab-ci.yml at the top level (not below public). It controls the CI process and looks as follows:

image: busybox
pages:
  stage: deploy
  script:
  - echo 'Nothing to do...'
  artifacts:
    paths:
    - public
    expire_in: 1 day
  rules:
    - if: $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH

4. GitLab requires all static content under directory public. Therefore, add your content to the public directory, commit, and push in Git as usual:

git add .
git commit -m"..."
git push

Once you push to your Git repository, a job governed by the above .gitlab-ci.yml is started and produces the intended website. This website is then accessible at

https://eklausme.gitlab.io/p

As the repository name is part of the URL I made it short.

5. This blog uses Simplified Saaze. I use the command below to generate all my static HTML pages with /p as prefix, or "relative base":

time RBASE=/p php saaze -mortb /tmp/build

The time command is only there because I am obsessed with speed.

Then add your images, PDF, JS, CSS, redirects, etc. to the /tmp/build directory.

Added 28-Jul-2024: This post was referenced on Hacker News, and my web-server experienced a light Slashdot effect, i.e., 3,000 real visitors, and 24,000 accesses in total from bots, aggregators, etc.

See Slashdot Effect on a Single Page.

The Error Behavior of Composite Linear Multistep Formulas

Mon, 01 Jul 2024 22:15:00 +0200

1. Consistency, Order of Consistency, and Error Constants
2. The Application of Linear Multistep Methods to DAEs
3. Several Characterizations of the Order of Consistency
4. The First Dahlquist Barrier
5. The Second Dahlquist Barrier
6. Annulled Dominance and Total Annulment
7. The $n$-Dimensional Exterior Product of $n-1$ Vectors
8. Exterior Product and Error Constants
9. Rules of Calculation for Error Constants

For composite methods, i.e., methods with more than one stage, each stage by itself first of all has its own error constant in the conventional sense. Nevertheless, the cyclic alternation of the implicit and the explicit Euler method, for example, shows that the behavior of the individual stages does not necessarily reflect the overall behavior of the cycle. The implicit Euler method considered on its own has order of convergence 1, and likewise the explicit Euler method considered on its own has order of convergence 1. The composite method, however, already has order of convergence 2. It is natural to ask whether even larger jumps in the order of convergence are possible. Furthermore, one would like an explanation for this jump in the order of convergence.
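The jump in order can be observed numerically. The following Python sketch is an illustration added under stated assumptions, not part of the original derivation: it uses the linear test equation $\dot y=-y$, $y(0)=1$ on $[0,1]$, and it estimates the order from the errors at step sizes $h$ and $h/2$. The function names are chosen for this example only.

```python
import math

def euler_explicit(y, h):
    # y' = -y:  y_{n+1} = y_n + h * (-y_n)
    return y * (1.0 - h)

def euler_implicit(y, h):
    # y' = -y:  y_{n+1} = y_n + h * (-y_{n+1})
    return y / (1.0 + h)

def global_error(stages, n):
    """Integrate y' = -y, y(0) = 1 up to t = 1 with n steps, cycling through stages."""
    h, y = 1.0 / n, 1.0
    for i in range(n):
        y = stages[i % len(stages)](y, h)
    return abs(y - math.exp(-1.0))

def observed_order(stages, n=200):
    """Estimate the order p from error(h) / error(h/2), which behaves like 2**p."""
    return math.log2(global_error(stages, n) / global_error(stages, 2 * n))

print(observed_order([euler_explicit]))                  # close to 1
print(observed_order([euler_implicit]))                  # close to 1
print(observed_order([euler_explicit, euler_implicit]))  # close to 2
```

For this particular test equation, one explicit step followed by one implicit step multiplies the solution by $(1-h)/(1+h)$, which is exactly one trapezoidal step of size $2h$; this explains the observed order 2 in this special case.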

However, one does not want to think in terms of such discontinuous transitions. For the classical methods, such as linear multistep formulas or Runge-Kutta methods, it is known that a very accurate method of order $p$ behaves similarly to a very inaccurate method of the next higher order $p+1$. One therefore rather expects a smooth transition between methods. The prime example of this, incidentally, is the $\vartheta$-method

$$ y_{n+1}-y_n = h(\vartheta f_{n+1}+(1-\vartheta)f_n),\qquad n=0,1,\ldots $$

For $\vartheta=1/2$ one obtains the implicit trapezoidal rule of order 2, and in all other cases only methods of order 1, in particular an explicit method for $\vartheta=0$. It turns out that the primary dominant local error gives a first indication of the error behavior. If the primary dominant local error vanishes, i.e., if the case of annulled dominance is present, then the secondary dominant local error gives a further picture of the error behavior.
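The smooth transition for the $\vartheta$-method can be made explicit by a Taylor expansion. The following short computation is added for illustration; it inserts a sufficiently smooth exact solution $y$ into the method and expands around $t$:

$$ y(t+h)-y(t)-h\bigl(\vartheta\dot y(t+h)+(1-\vartheta)\dot y(t)\bigr) = \left({1\over2}-\vartheta\right)h^2\ddot y(t) + \left({1\over6}-{\vartheta\over2}\right)h^3 y^{(3)}(t) + {\cal O}(h^4). $$

The $h^2$ term vanishes exactly for $\vartheta=1/2$, where the leading term becomes $-{1\over12}h^3y^{(3)}(t)$. For $\vartheta$ close to $1/2$ the $h^2$ coefficient is small but nonzero, so a method of order 1 then behaves almost like a method of order 2.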

First, to clarify the notation, the notion of consistency is fixed. Then a number of mutually equivalent descriptions of high order of consistency are given. These descriptions can be applied directly to the computation of new methods. A number of error constants are compared with one another and their common features are made clear.

1. Consistency, Order of Consistency, and Error Constants

Let

$$ \alpha_0y_n+\alpha_1y_{n+1}+\cdots+\alpha_ky_{n+k} = h\left(\beta_0f_n+\beta_1f_{n+1}+\cdots+\beta_kf_{n+k}\right), \qquad\alpha_k\ne0, $$

be a linear $k$-step method. Certain restrictions must be imposed on the coefficients of the linear $k$-step method so that the solutions computed by the $k$-step method actually have something to do with the solution of the differential equation. One distinguishes two kinds of conditions: consistency conditions on the one hand and stability conditions on the other. It turns out later that the stability condition is the more restrictive one. The consistency conditions lead to linear systems of equations. The stability conditions lead to nonlinear systems of equations and inequalities.

The consistency conditions are:

$$ C_{p,k}\cdot\pmatrix{\alpha\cr \beta\cr} = \pmatrix{A{\mskip 3mu}|{\mskip 3mu}B\cr} \pmatrix{\alpha\cr \beta\cr} = 0. $$

The consistency matrix $C_{p,k}$ for linear $k$-step methods of order of consistency $p$, together with the coefficient vector of the method and the corresponding condition, reads

$$ \left( \begin{array}{cccccc|cccccc} 1& 1& 1& 1& \ldots& 1& 0& 0& 0& 0& \ldots& 0\cr 0& 1& 2& 3& \ldots& k& -1& -1& -1& -1& \ldots& -1\cr 0& 1^2& 2^2& 3^2& \ldots& k^2& 0& -2\cdot1& -2\cdot2& -2\cdot3& \ldots& -2\cdot k\cr 0& 1^3& 2^3& 3^3& \ldots& k^3& 0& -3\cdot1^2& -3\cdot2^2& -3\cdot3^2& \ldots& -3\cdot k^2\cr \vdots & \vdots & \vdots & \vdots & \ddots&\vdots & \vdots & \vdots & \vdots & \vdots & \ddots&\vdots\cr 0& 1^p& 2^p& 3^p& \ldots& k^p& 0& -p1^{p-1}& -p2^{p-1} & -p3^{p-1}& \ldots& -pk^{p-1}\cr \end{array} \right) \cdot \pmatrix{\alpha_0\cr \alpha_1\cr \vdots\cr \alpha_{k-1}\cr \alpha_k\cr \beta_0\cr \beta_1\cr \vdots\cr \beta_{k-1}\cr \beta_k\cr} = \pmatrix{0\cr 0\cr 0\cr 0\cr \vdots\cr 0\cr}. $$

For example, for a particular choice of $p$ and $k$, the consistency matrix $C_{p,k}$ reads as follows:

1. Example: Consistency matrix $C_{p+1,k}$ for $p=3$ and $k=3$:

$$ C_{4,3} = \pmatrix{ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0\cr 0 & 1 & 2 & 3 & -1 & -1 & -1 & -1\cr 0 & 1 & 4 & 9 & 0 & -2 & -4 & -6\cr 0 & 1 & 8 & 27 & 0 & -3 & -12 & -27\cr 0 & 1 & 16 & 81 & 0 & -4 & -32 & -108\cr } $$

and for $p=5$, $k=5$ the matrix $C_{p+1,k}$ accordingly reads

$$ C_{6,5} = \pmatrix{ 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0\cr 0 & 1 & 2 & 3 & 4 & 5 & -1 & -1 & -1 & -1 & -1 & -1\cr 0 & 1 & 4 & 9 & 16 & 25 & 0 & -2 & -4 & -6 & -8 & -10\cr 0 & 1 & 8 & 27 & 64 & 125 & 0 & -3 & -12 & -27 & -48 & -75\cr 0 & 1 & 16 & 81 & 256 & 625 & 0 & -4 & -32 & -108 & -256 & -500\cr 0 & 1 & 32 & 243 & 1024 & 3125 & 0 & -5 & -80 & -405 & -1280 & -3125\cr 0 & 1 & 64 & 729 & 4096 & 15625 & 0 & -6 & -192 & -1458 & -6144 & -18750\cr } $$
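Such matrices are tedious to write down by hand, so a small generator is useful. The following Python sketch builds $C_{p,k}$ from the general pattern given above: row $q$ contains $i^q$ in the $\alpha$ block and $-q\,i^{q-1}$ in the $\beta$ block, for $i=0,\ldots,k$. The function name `consistency_matrix` is an assumption made for this example.

```python
def consistency_matrix(p, k):
    """Return C_{p,k} as a list of p+1 integer rows with 2k+2 columns each."""
    rows = []
    for q in range(p + 1):
        alpha_part = [i**q for i in range(k + 1)]  # Python evaluates 0**0 as 1
        if q == 0:
            beta_part = [0] * (k + 1)              # first row has a zero beta block
        else:
            beta_part = [-q * i**(q - 1) for i in range(k + 1)]
        rows.append(alpha_part + beta_part)
    return rows

# Print C_{4,3}, i.e., the case p = 3, k = 3 extended by one row.
for row in consistency_matrix(4, 3):
    print(row)
```

Because all entries are integers, the result can be compared entry by entry with a matrix that was typed in manually.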

2. The coefficient vector of a linear $k$-step method with order of consistency at least $p$ must therefore lie in the kernel of the matrix $C_{p,k}\in\mathbb{Z}^{(p+1)\times(2k+2)}$, i.e.,

$$ \pmatrix{\alpha\cr \beta\cr}\in\ker C_{p,k}. $$

If one extends the matrix $C_{p,k}$ in the obvious way by one further row at the bottom, thus obtaining the matrix $C_{p+1,k}\in\mathbb{Z}^{(p+2)\times(2k+2)}$, and if the matrix-vector product is then no longer the zero vector, then the method has exact order of consistency $p$, and the nonzero component of the result is an unscaled error factor $c_{p+1}$. When the lack of scaling is to be emphasized, one also writes $\lambda c_{p+1}$, with $\lambda\in\{\alpha_k,\beta_k,\sigma(1),\ldots\}$. One thus has

$$ C_{p+1,k}\pmatrix{\alpha\cr\beta\cr}=\pmatrix{0\cr\vdots\cr0\cr c_{p+1}\cr}. $$

The value $c_{p+1}$ is additionally divided by $(p+1)!$, for reasons that become apparent later and that have to do with a Taylor expansion.

Common scaling quantities are $\alpha_k$ or $\sum_{i=0}^k \beta_i$. If one scales with the latter sum, the resulting factor is also called Henrici's error constant. It arises most naturally in the treatment of differential-algebraic equations that are solved with methods of the kind used for ordinary differential equations.

Bibliographical: Henrici, Peter Karl Eugen (1923--1987).
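The order of consistency and the error factor can be computed mechanically from the rows of the consistency matrix. The following Python sketch does this in exact rational arithmetic; the function names and the three example methods (trapezoidal rule, 2-step Adams-Bashforth, BDF2) are choices made for this illustration. Coefficients are listed as $\alpha_0,\ldots,\alpha_k$ and $\beta_0,\ldots,\beta_k$.

```python
from fractions import Fraction as Fr
from math import factorial

def order_and_error_factor(alpha, beta):
    """Return (p, c) with exact consistency order p and c = c_{p+1} / (p+1)!."""
    k = len(alpha) - 1
    for q in range(2 * k + 3):
        # Row q of the consistency matrix applied to (alpha, beta).
        s = sum(a * Fr(i) ** q for i, a in enumerate(alpha))
        if q > 0:
            s -= q * sum(b * Fr(i) ** (q - 1) for i, b in enumerate(beta))
        if s != 0:
            return q - 1, s / factorial(q)
    raise ValueError("all conditions vanish; coefficient vector is zero?")

def henrici_constant(alpha, beta):
    """Error factor scaled by sigma(1) = sum of the beta_i."""
    p, c = order_and_error_factor(alpha, beta)
    return p, c / sum(beta)

trapezoidal = ([-1, 1], [Fr(1, 2), Fr(1, 2)])
adams_bashforth2 = ([0, -1, 1], [Fr(-1, 2), Fr(3, 2), 0])
bdf2 = ([Fr(1, 2), -2, Fr(3, 2)], [0, 0, 1])

print(henrici_constant(*trapezoidal))       # order 2, constant -1/12
print(henrici_constant(*adams_bashforth2))  # order 2, constant 5/12
print(henrici_constant(*bdf2))              # order 2, constant -1/3
```

Scaling the BDF2 value by $\alpha_k=3/2$ instead of $\sigma(1)=1$ gives $-2/9$, which shows how the choice of scaling changes the number while the order stays the same.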

3. The minus signs in the consistency matrix $C_{p,k}\in\mathbb{Z}^{(p+1)\times(k+1)(m+1)}$ (so far $m=1$) no longer appear if, instead of

$$ \sum \alpha_i y_{n+i} = h \sum \beta_i \dot y_{n+i} $$

one simply writes

$$ \sum \left(\alpha_i y_{n+i} + h\beta_i \dot y_{n+i} + h^2\gamma_i \ddot y_{n+i}\right) = 0 $$

(above for $m=2$).

4. For discretizations used to solve equations of the form $F(t,y,\dot y)=0$, here in particular linear multistep methods, one is led directly to interpret the equation

$$ {1\over h}\sum_{i=0}^k\alpha_iy_i = \sum_{i=0}^k\beta_i\dot y_i $$

as follows. The left sum represents an approximation to the derivative $\dot y$ at an intermediate point that is of no further interest here. The right sum is a weighted mean of the values $\dot y_i$ exactly when the coefficients $\beta_i$ sum to one. The latter can always be achieved by premultiplying the above equation by the reciprocal of the sum $\sum_{i=0}^k\beta_i$. Because of the evident linearity of the consistency conditions, the corresponding sum then also appears in the error factor $c_{p+1}$.

In the same way,

$$ \sum_{i=0}^{k-1} \beta_i \dot y_{n+i} + \sum_{i=0}^k \alpha_i y_{n+i} = \dot y_{n+k} $$

can be interpreted as an approximation at precisely the point $t_{n+k}$.

5. There are several further ways to derive Henrici's error constant.

In particular, it can be derived from considerations concerning the influence of the local errors on the global error. These are the considerations found in the papers by Skeel (1976) and Albrecht (1985). In the latter paper these considerations are carried out in the most general form, taking multiple eigenvalues $\mu=1$ into account. Similar considerations are carried out in the dissertation of Tischer (1983). Among others, a presentation and derivation of Henrici's error constant can be found in the book by Hairer/Wanner/Nørsett (1987) and of course in the book by Henrici (1962) himself. Henrici's error constant was not originally invented by Henrici. It also appears in the work of numerous other authors, e.g., Hull/Newberry (1961).

Bibliographical: Henrici, Peter Karl Eugen (1923--1987), Hairer, Ernst (*1949), Wanner, Gerhard (*1942), Nørsett, Syvert Paul, Thomas E. Hull, A.C.R. Newberry, Robert David Skeel, Peter Albrecht.

Peter E. Tischer: "The Cyclic Use of Linear Multistep Formulas for the Solution of Stiff Differential Equations", Ph.D. Thesis, Department of Computer Science, Monash University, Clayton, Victoria, Australia, August 1983, x+180 pages.

For linear methods that satisfy the strong root condition, the error constant $C_A$ is given by

$$ C_A = {v{\mskip 3mu}\gamma\over vA_1w}, \qquad \left\{\eqalign{v{\mskip 3mu}(A_0+A_1)&=0,\quad v\ne0\cr (A_0+A_1)w&=0.\quad w\ne 0\cr}\right. $$

Here $v$ is thus the left eigenvector of the matrix polynomial $A_1\mu+A_0$ for the eigenvalue $\mu=1$, and $w$ is the right eigenvector of the same matrix polynomial for the same eigenvalue. To obtain a unique constant, the latter vector $w$ is additionally normalized, e.g., by the norm

$$ \left\|w\right\| = {1\over s}\sqrt{w_1^2+\cdots+w_s^2}. $$

The strong root criterion guarantees that $vA_1w$ does not vanish. More precisely: the fact that $\mu=1$ is the only eigenvalue, multiplicity included, guarantees that it does not vanish.

2. The Application of Linear Multistep Methods to DAEs

1. To solve differential-algebraic equations of the form

$$ \eqalign{ F(t,\dot x, x, y) &= 0,\cr G(t,x,y) &= 0,\cr } $$

with one-leg methods, one computes

$$ \eqalign{ F(\tau_n,{1\over h_n}\rho_nx_n, \sigma_nx_n,\sigma_ny_n) &= 0,\cr G(t_n,x_n,y_n) &= 0.\cr } $$

Here the differential-algebraic equation is assumed to contain, by means of $G$, equations that are uniquely identifiable as purely algebraic constraints. The one-leg method for ordinary differential equations of the form $\dot x=f(t,x)$ is given by

$$ {1\over h_n}\sum_{\nu=0}^k \alpha_{\nu,n}x_{n-k+\nu} - f\left(\tau_n,{\mskip 3mu}\sum_{\nu=0}^k \beta_{\nu,n}x_{n-k+\nu}\right)=0, $$

and

$$ \tau_n=\sum_{\nu=0}^k \beta_{\nu,n} t_{n-k+\nu} \qquad\hbox{where}\quad \sum_{\nu=0}^k\beta_{\nu,n}=1,\enspace\forall n. $$

The operators $\rho_n$ and $\sigma_n$ are, as usual,

$$ \rho_n z_n = \sum_{\nu=0}^k \alpha_{\nu,n} z_{n-k+\nu},\qquad \sigma_n z_n = \sum_{\nu=0}^k \beta_{\nu,n} z_{n-k+\nu}. $$

Another possible discretization of the pair of equations $F(t,\dot x,x,y)=G(t,x,y)=0$ is

$$ \eqalign{ F(\tau_n,{1\over h_n}\rho_nx_n, \sigma_nx_n,\sigma_ny_n) &= 0,\cr G(\tau_n,\sigma_nx_n,\sigma_ny_n) &= 0,\cr } $$

which is particularly suitable when, for a differential-algebraic equation, the separation between the “pure” differential equation and the purely algebraic constraints is not clearly apparent. This is the case, for example, for problem P9.

2. For linear multistep methods of the form

$$ {1\over h_n}\left(\sum_{\nu=0}^k \alpha_{\nu,n}x_{n-k+\nu}\right) - \sum_{\nu=0}^k \beta_{\nu,n}\dot x_{n-k+\nu} = 0, \qquad n=k,k+1,\ldots, $$

always with the normalization $\sum_{\nu=0}^k\beta_{\nu,n}=1$, one obtains a discretization of the equation $F(t,\dot x,x,y)=G(t,x,y)=0$ as follows. One temporarily adds an additional variable $w:=\dot x$ to the system $F=G=0$ and then, after discretization and taking $w_n:=\dot x_n$ into account, obtains the enlarged system

$$ \eqalign{ F(t_n,w_n,x_n,y_n) &= 0,\cr G(t_n,x_n,y_n) &= 0,\cr {1\over h_n}\rho_nx_n - \sigma_nw_n &= 0,\cr } $$

with the unknowns $x_n$, $y_n$, and $w_n$. The last equation, however, is very easily solved for $w_n$. One obtains

$$ w_n = {1\over\beta_{k,n}}\left({1\over h_n}\rho_nx_n-\sigma'_nw_n\right), $$

with

$$ \sigma'_n w_n = \sum_{\nu=0}^{k-1} \beta_{\nu,n}w_{n-k+\nu}. $$

The operator $\sigma'_n$ thus acts only on previously computed values. Substituting the $w_n$ so obtained leads to

$$ \eqalign{ F(t_n,{1\over\beta_{k,n}}\left({1\over h_n}\rho_nx_n-\sigma'_nw_n\right), x_n,y_n) &= 0,\cr G(t_n,x_n,y_n) &= 0,\cr } $$

for the determination of $(x_n,{\mskip 3mu}y_n)$.
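For the simplest case, BDF1 (implicit Euler) with $k=1$, $\alpha_0=-1$, $\alpha_1=1$, $\beta_0=0$, $\beta_1=1$, the substitution reduces to $w_n=(x_n-x_{n-1})/h$. The following Python sketch applies it to a small example; the DAE $\dot x+y=0$, $y-x=0$ with $x(0)=y(0)=1$, the finite-difference Newton iteration, and all function names are assumptions chosen for illustration only.

```python
import math

def F(t, xdot, x, y):
    return xdot + y   # differential part of the hypothetical example

def G(t, x, y):
    return y - x      # purely algebraic constraint of the hypothetical example

def residual(t, h, x_prev, x, y):
    w = (x - x_prev) / h  # BDF1: w_n = rho_n x_n / h, the sigma' term is zero
    return F(t, w, x, y), G(t, x, y)

def solve_step(t, h, x_prev, x, y, tol=1e-10, eps=1e-7, maxit=20):
    """Newton iteration with a finite-difference Jacobian for the 2x2 system."""
    for _ in range(maxit):
        r1, r2 = residual(t, h, x_prev, x, y)
        if abs(r1) + abs(r2) < tol:
            break
        a1, a2 = residual(t, h, x_prev, x + eps, y)
        b1, b2 = residual(t, h, x_prev, x, y + eps)
        j11, j21 = (a1 - r1) / eps, (a2 - r2) / eps   # derivatives w.r.t. x
        j12, j22 = (b1 - r1) / eps, (b2 - r2) / eps   # derivatives w.r.t. y
        det = j11 * j22 - j12 * j21
        # Cramer's rule for J * (dx, dy) = -(r1, r2)
        dx = (-r1 * j22 + r2 * j12) / det
        dy = (-j11 * r2 + j21 * r1) / det
        x, y = x + dx, y + dy
    return x, y

def integrate(n, t_end=1.0):
    h, x, y = t_end / n, 1.0, 1.0   # consistent initial values
    for i in range(n):
        x, y = solve_step((i + 1) * h, h, x, x, y)
    return x, y

x, y = integrate(1000)
print(x, y, math.exp(-1.0))  # x and y approximate exp(-1)
```

Even though this example is linear, the structure is the general one: in every step a system in the unknowns $(x_n,y_n)$ is solved, and $w_n$ never appears as a separate unknown.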

3. In principle, explicit operators $(\rho_n,\sigma_n)$ are also conceivable. Correspondingly one then obtains

$$ \eqalign{ F (t_{n-1}, {1\over\beta_{k-1,n}}\left({1\over h_n}\rho_nx_n- \sigma''_nw_n\right),x_{n-1}, y_{n-1}) &= 0,\cr G (t_{n-1}, x_{n-1}, y_{n-1}) &= 0,\cr } $$

where $\sigma''_n$ contains one term fewer still. However, the advantage of such discretizations, namely easier solvability or even the avoidance of nonlinearities, is no longer present in the case of differential-algebraic equations. In any case, a system of equations that is nonlinear as a rule has to be solved in every time step.

On this discretization Liniger (1979) writes:

although to the author's knowledge, thus far such methods have not been used practically …

The use of linear multistep methods and one-leg methods, and the derivation of discretizations for differential-algebraic equations, is investigated in Liniger (1979). In particular, questions concerning the order of consistency of the discretization are treated there. Stability investigations are more difficult for differential-algebraic equations and are therefore not treated there either. A convergence proof for the BDF can be found in Petzold/Lötstedt (1986a). A good comprehensive presentation is given by Griepentrog/März (1986).

Bibliographical: Werner Liniger (1927--2017), Linda Ruth Petzold (*1954), Eberhard Griepentrog (1933--2023), Roswitha März (*1940), Per Lötstedt.

3. Several Characterizations of the Order of Consistency

At this point a number of equivalent characterizations are given which guarantee that a linear multistep method of the form

$$ \sum_{i=0}^k \alpha_iy_{n+i} = h\sum_{i=0}^k \beta_if_{n+i}, $$

has at least order of consistency $p$. Under certain circumstances the method has an even higher order.

1. Let

$$ L(y,t_0,h) = \sum_{i=0}^k \left(\alpha_i y(t_0+ih) - h\beta_i\dot y(t_0+ih)\right). $$

and

$$ \rho(\mu) := \sum_{i=0}^k \alpha_i\mu^i,\qquad\hbox{and furthermore}\qquad \sigma(\mu) := \sum_{i=0}^k \beta_i\mu^i. $$

2. Theorem: The following 9 statements are pairwise equivalent.

(1) $C_{p,k}{\alpha\choose\beta}= {\bf0}$, i.e., $(\alpha,\beta)^\top\in\ker C_{p,k}$.

(2) $\sum_{i=0}^k \alpha_i i^q = q\sum_{i=0}^k \beta_i i^{q-1}$, for $q=0,1,\ldots,p$.

(3) $\rho(e^h)-h\sigma(e^h)={\cal O}(h^{p+1})$, as $h\to 0$.

(4) $\zeta=1$ is a zero of multiplicity at least $p$ of the function

$$ \zeta\mapsto{\rho(\zeta)\over\ln\zeta}-\sigma(\zeta), $$

i.e., $\rho(\zeta)/\ln\zeta=\sigma(\zeta)+{\cal O}\bigl((\zeta-1)^p\bigr)$, as $\zeta\to 1$.

(5) $L(f,t,h)={\cal O}(h^{p+1}),\quad\forall f\in C^{p+2}(G,\mathbb{R})$.

(6) The monomials up to degree $p$ lie in the kernel of the $h=1$ section of $L$, i.e., $L(t^i,t_0,1)=0$, for $i=0,1,\ldots,p$.

(7) $L(f,t_0,h)={\cal O}(h^{p+1})$, for the special function $t\mapsto f(t)=e^t$.

(8) $y(t_0+kh)-y_k={\cal O}(h^{p+1})$, if the starting values are chosen as exact values.

(9) $L(y,t_0,h)=c_{p+1}h^{p+1}y^{(p+1)}(t_0)+{\cal O}(h^{p+2})$ with $c_{p+1}=\sum_{i=0}^k\bigl(\alpha_ii^{p+1}-(p+1)\beta_ii^p\bigr)/(p+1)!$.

This list could be continued. Some of the items are merely reformulations of other items. Nevertheless, it is occasionally useful to have one of the possible formulas at hand in the form cited above. The condition of consistency and the order of consistency are independent of the expansion point of the Taylor expansion. For the practical computation of multistep formulas, which at first need not necessarily be stable, one often chooses (1); for the proof of the Dahlquist barrier, (3) and (4) are used. Individual properties among the statements above often also carry separate names in the literature. For the proofs we refer, for example, to Hairer/Wanner/Nørsett (1987) or also Werner/Arndt (1986).

If, as in (8), one chooses the starting values exactly, i.e., $y_0=y(t_0)$, $y_1=y(t_0+h)$, and so on up to $y_{k-1}=y(t_0+(k-1)h)$, then the resulting error between the value $y_k$ now determined by the formula and the exact value $y(t_0+kh)$ satisfies the relation
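Statement (3) lends itself to a quick numerical plausibility check. The following Python sketch evaluates $\rho(e^h)-h\sigma(e^h)$ for two step sizes and estimates the exponent of the leading power of $h$; the function names and the two example methods (2-step Adams-Bashforth of order 2, Milne-Simpson of order 4) are assumptions chosen for this illustration, and floating-point arithmetic limits how small $h$ can sensibly be taken.

```python
import math

def defect(alpha, beta, h):
    """rho(e^h) - h * sigma(e^h); list index i is the power of mu."""
    mu = math.exp(h)
    rho = sum(a * mu**i for i, a in enumerate(alpha))
    sigma = sum(b * mu**i for i, b in enumerate(beta))
    return rho - h * sigma

def observed_exponent(alpha, beta, h=0.05):
    """Estimate m in defect ~ C * h**m from the values at h and h/2."""
    return math.log2(abs(defect(alpha, beta, h) / defect(alpha, beta, h / 2)))

adams_bashforth2 = ([0.0, -1.0, 1.0], [-0.5, 1.5, 0.0])
milne_simpson = ([-1.0, 0.0, 1.0], [1 / 3, 4 / 3, 1 / 3])

print(observed_exponent(*adams_bashforth2))  # close to 3 = p + 1
print(observed_exponent(*milne_simpson))     # close to 5 = p + 1
```

The estimate is only approximate because higher powers of $h$ still contribute at $h=0.05$; an exact check would use criterion (1) or (2) in rational arithmetic instead.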

$$ y(t_0+kh)-y_k = {1\over\alpha_k}W^{-1}L(y,t_0,h). $$

Here $W=I-h\gamma J$, $\gamma=\beta_k/\alpha_k$, and $J$ is the Jacobian matrix of $f$, whose rows are evaluated at points on the line segment between $y(t_k)$ and $y_k$. In particular, $J=f_y(y_k)$ does not necessarily hold, although it does hold approximately. In the case of stiff differential equations and differential-algebraic equations, $W^{-1}L(y,t_0,h)$, or an approximation of it, is in fact recommended for use as an error constant; cf. Petzold (1982) and the literature cited there. In the case of explicit formulas, i.e., $\beta_k=0$, one of course has $W=I$.

4. The First Dahlquist Barrier

You know, I am a multistep-man … and don't tell anybody, but the first program I wrote for the first Swedish computer was a Runge-Kutta code …
(G. Dahlquist, 1982, after some glasses of wine; printed with permission), Hairer/Wanner/Nørsett (1987)

The remarkable thing about Hopf's problem is that an algebraic question that is easy to formulate finds a simple algebraic answer, while nontrivial methods of topology are nevertheless required for its solution; here the “topological thorn in the flesh of algebra” appears for the first time, which to this day is felt to be so painful by many algebraists.
R. Remmert, M. Koecher (1983)

Bibliographical: Hairer, Ernst (*1949), Wanner, Gerhard (*1942), Nørsett, Syvert Paul, Remmert, Reinhold (1930--2016), Koecher, Max (1924--1990).

The kernel of the consistency matrix $C_{p,k}$ has a wealth of very special properties, which are collected here. Given the highly structured form of this matrix, some of these properties are not surprising. In particular, some symmetry properties of the matrix carry over to the kernel. If one additionally requires the root condition to be satisfied as a side condition, far-reaching consequences follow, here the two Dahlquist barriers. These two barriers are also one of the reasons for the increased interest in composite methods.

1. Definition: A linear $k$-step method is called symmetric if

$$ \alpha_i=-\alpha_{k-i}\qquad\hbox{and}\qquad\beta_i=\beta_{k-i} \qquad\hbox{for all}\quad i=0,\ldots,k. $$

2. Example: The Milne-Simpson method $y_{n+1}-y_{n-1}=h\cdot(f_{n+1}+4f_n+f_{n-1})/3$ is symmetric, whereas all Adams methods, implicit or explicit, are not symmetric. The zeros of the characteristic polynomial $\rho$ of the Milne-Simpson method lie at $(+1)$ and at $(-1)$.

For symmetric linear multistep methods, $\rho(\mu)=-\mu^k\rho(1/\mu)$ holds by the definition above. Along with $\mu$, $1/\mu$ is therefore also a zero of $\rho$.

For a stable symmetric linear multistep method, all roots therefore lie on the unit circle and are simple. One is rather less interested in such methods, since for step sizes $h\ne0$ these roots of modulus $1$ move outward from the unit circle. However, for step sizes $h\ne0$ this behavior depends strongly on the predictor. In combination as Picard predictor-corrector methods or as stages of cyclic methods, however, symmetric linear multistep methods have no disadvantages compared with other methods.

A method of order of consistency $p$, applied to a differential equation whose solution is a polynomial of degree $p$, yields the exact solution if rounding errors are completely neglected. If the method is in addition stable, then the method converges.

3. Theorem: The following holds.

There exists no linear $k$-step method of order of consistency $2k+1$.
There is exactly one explicit $k$-step method of order of consistency $2k-1$. Hence $\beta_k=0$ automatically implies $p\le2k-1$.
There is exactly one implicit linear $k$-step method of order of consistency $2k$.
This uniquely determined implicit linear $k$-step method of maximal order of consistency $2k$ is symmetric.
Symmetric linear multistep methods always have an even order of consistency. That is, once order of consistency $2\nu-1$ has been established for a symmetric multistep method, the method automatically has the next higher order of consistency $2\nu$.
For given polynomial degrees $k_\alpha$ and $k_\beta$, with $k_\beta\le k_\alpha$, one can find polynomials $\rho$ and $\sigma$ with $\deg\rho=k_\alpha$ and $\deg\sigma=k_\beta$ that define a linear $k=k_\alpha$-step method with order of consistency $k_\alpha+k_\beta$.
The rank of the consistency matrix $C_{p,k}$ is maximal for all $p$ and $k$. Thus

$$ \mathop{\rm rank} C_{p,k}=\min(p+1,2k+2),\qquad\forall p,k\in\mathbb{N}. $$

Recall that $C_{p,k}\in\mathbb{Z}^{(p+1)\times(2k+2)}$; by the dimension theorem for linear systems of equations it follows that $\dim\ker C_{p,k}=(2k+2)-\min(p+1,2k+2)$.
The leading coefficient $\alpha_k$ can be regarded as a normalization factor, and thus there is exactly one $(2k-p)$-parameter family of linear $k$-step methods, if $p\ge k$. By the first Dahlquist barrier, however, only a small part of this family is convergent as a single-stage method.
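The rank and the kernel dimension of $C_{p,k}$ can be checked for small $p$ and $k$ by exact Gaussian elimination. The following Python sketch does this with rational arithmetic; the function names are assumptions made for this example, and the matrix is rebuilt here from the pattern $i^q$ and $-q\,i^{q-1}$ so that the block is self-contained.

```python
from fractions import Fraction

def consistency_matrix(p, k):
    """C_{p,k}: row q holds i**q (alpha block) and -q*i**(q-1) (beta block), i = 0..k."""
    rows = []
    for q in range(p + 1):
        alpha_part = [i**q for i in range(k + 1)]
        beta_part = [0 if q == 0 else -q * i**(q - 1) for i in range(k + 1)]
        rows.append(alpha_part + beta_part)
    return rows

def rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            if f != 0:
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

for p, k in [(2, 1), (4, 2), (5, 2), (6, 3)]:
    r = rank(consistency_matrix(p, k))
    print(f"p={p}, k={k}: rank {r}, dim ker {2 * k + 2 - r}")
```

For $(p,k)=(4,2)$ the kernel is one-dimensional, which matches the statement that there is exactly one implicit $k$-step method of order $2k$ (here the Milne-Simpson method, up to scaling); for $(p,k)=(5,2)$ the kernel is trivial, matching the nonexistence of a method of order $2k+1$.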

It turns out that the maximal attainable order of consistency of a linear multistep method of the form $\sum_{i=0}^k (\alpha_i y_{n+i} - h\beta_i f_{n+i})=0$ is limited by the root condition on $\rho$. If the characteristic polynomial $\rho$ is stable, this is at the same time an obstacle to the maximal attainable order of consistency; that is, order of consistency and stability are at odds with each other, or, in yet another, more geometric formulation:

The set of all those vectors

$$ \left(\alpha_0,\ldots,\alpha_\kappa,\beta_0,\ldots,\beta_\kappa\right) \in\mathbb{R}^{2\kappa+2}, $$

which lead to a stable polynomial $\rho(\mu)=\sum_{i=0}^\kappa \alpha_i\mu^i$, describable by inequalities of Routh-Hurwitz type (Routh, E.J.; Hurwitz, Adolf, 1859--1919), forms a nonlinear cone in $\mathbb{R}^{2\kappa+2}$. The linear submanifold described by $C_{p,\kappa}{\alpha\choose\beta}=0$ does not intersect the nonlinear cone at all for $p>\kappa+2$ and touches it for $p=\kappa+1+{1\over2}\left[1+(-1)^\kappa\right]$. The following theorem holds.

4. Theorem: (First Dahlquist barrier) Let the linear $k$-step method be consistent at least of order $1$, and let the characteristic polynomial $\rho$ satisfy the root condition.

(a) Then the order of consistency, and with it the order of convergence $p$, is bounded above as follows

$$ p\le\cases{ k+2, & if $k$ is even;\cr k+1, & if $k$ is odd;\cr k, & if $\beta_k/\alpha_k\le0$, in particular if the method is explicit.\cr } $$

(b) Furthermore: stable linear multistep methods of maximal order of consistency $k+2$ (so $k$ is even) are symmetric. For such a linear $k$-step method with even $k$, $\rho$ thus has, as remarked above, the roots $(+1)$, $(-1)$, and $(k-2)/2$ pairs of distinct unimodular roots. For every $\rho$ with this property of the roots there exists exactly one $\sigma$ such that $(\rho,\sigma)$ is a linear $k$-step method of maximal order of consistency $k+2$.

5. Because of the great interest attached to this theorem, the very elegant complex-analytic proof can be found in several ($\ge8$) books. We refer here only to the two books already cited several times, Werner/Arndt (1986) and Hairer/Wanner/Nørsett (1987), as well as Werner/Schaback (1972). Cf. also the books by Gear (1971c), Henrici (1962), and Stetter (1973) (incomplete proof).

Moreover, the theorem has been extended so that it remains valid in a corresponding form for an even larger class of methods; cf. Jeltsch/Nevanlinna (1986). A brief survey of order bounds can be found in the conference paper by Wanner (1987). G. Dahlquist proved this theorem in 1956.

Bibliographical: Jeltsch, Rolf, Nevanlinna, Olavi, Hairer, Ernst (*1949), Wanner, Gerhard (*1942), Nørsett, Syvert Paul, Dahlquist, Germund, Stetter, Hans Jörg (*1930), Schaback, Robert (*1945), Werner, Helmut (1931--1985), Arndt, Herbert, Gear, Charles William (1935--2022), Henrici, Peter Karl Eugen (1923--1987).

6. Consequences of this first Dahlquist barrier are:

6.1. There is no stable linear $3$-step method of order of consistency 6. There are, however, several linear cyclic multistep formulas of order of convergence 6 with only 3 starting values, namely the methods DH2 and DH3 (Donelson III, John; Hansen, Eldon). However, the solution values are always obtained in “packs of three”. The last stage of these two three-stage cycles by itself uses only 3 starting values, two of which come from the current cycle; in total, however, one then has 6 solution values on an equidistant grid available.

6.2. The Adams-Moulton methods

$$ y_{n+1}-y_n=h\sum_{i=0}^\kappa \beta_i f_{n+1+i-\kappa} $$

have order of consistency $\kappa$ and order of convergence $\kappa+1$, which for odd $\kappa$ cannot be surpassed by any other convergent linear single-stage multistep method.

As an example of the generalizations, Reimer's theorem from 1968 is stated here without proof. While the first Dahlquist barrier held only for linear multistep methods, Reimer's theorem, 12 years after the first Dahlquist barrier, generalized the result to linear methods with arbitrarily many derivatives.

Bibliographical: Reimer, Manfred.

7. Theorem: (Reimer's theorem on the maximal degree of stable difference formulas.) Let the linear difference form

$$ L(h,y,\ldots,y^{(m)}) := \sum_{\nu=0}^\kappa \sum_{\mu=0}^m h^\mu \alpha_\nu^{(\mu)} y_\nu^{(\mu)}, \qquad\hbox{with}\quad\kappa\ge1, m\ge1 $$

with the normalization $\alpha_\kappa^{(0)}=1$ have degree $p$ and be stable. Then

$$ p \le N+{1\over2}\left[1+(-1)^{N+1}\right], \qquad\hbox{with}\quad N=(\kappa+1)m. $$

The bound is attained for every pair $(\kappa,m)\in\mathbb{N}^2$. Furthermore, the maximal possible degree $p=N+1$ is attained exactly when $N=(\kappa+1)m$ is odd and the method is symmetric.

Proof: see Reimer (1982). ☐
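That Reimer's bound contains the first Dahlquist barrier as the special case $m=1$ can be checked by evaluating both formulas. The following Python sketch does so; the function names are assumptions made for this illustration, and `dahlquist_bound` covers only the first two cases of the barrier ($k$ even or odd), not the case $\beta_k/\alpha_k\le0$.

```python
def reimer_bound(kappa, m):
    """N + (1 + (-1)**(N+1)) / 2 with N = (kappa + 1) * m."""
    n = (kappa + 1) * m
    return n + (1 + (-1) ** (n + 1)) // 2

def dahlquist_bound(k):
    """First Dahlquist barrier: k + 2 for even k, k + 1 for odd k."""
    return k + 2 if k % 2 == 0 else k + 1

# For m = 1 both bounds agree.
for k in range(1, 9):
    assert reimer_bound(k, 1) == dahlquist_bound(k)

print([reimer_bound(k, 1) for k in range(1, 7)])  # [2, 4, 4, 6, 6, 8]
print([reimer_bound(k, 2) for k in range(1, 5)])  # [4, 6, 8, 10]
```

For $m=2$ the number $N=2(\kappa+1)$ is always even, so the extra unit of order is never available in that case.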

8. The BDF$i$, which appear frequently here, have the following stability behavior. The BDF$i$ is a linear $i$-step method of order of consistency $i$ and is defined by the formulas

$$ \sum_{\nu=1}^i {1\over\nu}\nabla^\nu y_{n+1} = hf_{n+1},\qquad n=0,\ldots . $$

9. Theorem: The BDF$i$ are stable for $i\in\{1,\ldots,6\}$ and unstable for all $i\ge7$.

The function-theoretic proof can be found in the book by Hairer/Wanner/Nørsett (1987). That the BDF$i$ are unstable for all $i\ge7$ was first proved in 1972 by C.W. Cryer. Three years later an alternative proof by D.M. Creedon and J.J.H. Miller appeared (both in BIT). The theorem does not hold for composite methods in which the BDF$i$ appear as stages, as shown for example by the seventh-order cyclic method of Tendler (1973). See also Tendler/Bickart/Picel (1978).
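For small $i$ the statement of the theorem can be checked by machine. The sketch below is not the function-theoretic proof cited above. It builds $\rho(\zeta)=\sum_{\nu=1}^i{1\over\nu}\zeta^{i-\nu}(\zeta-1)^\nu$ in exact rational arithmetic, divides out the principal root $\zeta=1$, and applies the Schur-Cohn recursion to test whether all remaining roots lie strictly inside the unit disk. All function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def bdf_rho(i):
    """Coefficients (lowest degree first) of
    rho(z) = sum_{v=1}^{i} (1/v) * z^(i-v) * (z-1)^v."""
    coeffs = [Fraction(0)] * (i + 1)
    for v in range(1, i + 1):
        for j in range(v + 1):  # (z-1)^v = sum_j C(v,j) z^j (-1)^(v-j)
            coeffs[i - v + j] += Fraction(comb(v, j) * (-1) ** (v - j), v)
    return coeffs


def deflate_root_one(coeffs):
    """Divide by (z - 1) with Horner's scheme; the remainder rho(1) must be 0."""
    high_first = coeffs[::-1]
    quotient = [high_first[0]]
    for c in high_first[1:]:
        quotient.append(c + quotient[-1])
    if quotient[-1] != 0:
        raise ValueError("rho(1) != 0, the formula is not consistent")
    return quotient[-2::-1]  # drop the remainder, return lowest degree first


def all_roots_inside_unit_disk(coeffs):
    """Schur-Cohn recursion for a real polynomial (lowest degree first)."""
    p = list(coeffs)
    while len(p) > 1:
        a0, an = p[0], p[-1]
        if abs(a0) >= abs(an):
            return False
        n = len(p) - 1
        # (an * p(z) - a0 * z^n p(1/z)) / z has degree n - 1
        p = [an * p[k + 1] - a0 * p[n - 1 - k] for k in range(n)]
    return p[0] != 0


def bdf_is_stable(i):
    return all_roots_inside_unit_disk(deflate_root_one(bdf_rho(i)))


for i in range(1, 9):
    print(i, bdf_is_stable(i))
```

Exact fractions are used so that the verdict for $i=7$, where the offending roots lie only slightly outside the unit circle, does not depend on rounding.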

Bibliographic note: Colin Walker Cryer, Theodore A. Bickart (1936--2023), obituary.

Joel Marvin Tendler: "A Stiffly Stable Integration Process Using Cyclic Composite Methods", Ph.D. Diss., Syracuse University, Syracuse, New York, 26.Feb.1973, viii+iv+172 pages.

5. The second Dahlquist barrier

1. The first Dahlquist barrier limits the highest attainable convergence order, but not too drastically. The second Dahlquist barrier, by contrast, restricts the variety of $A$-stable linear multistep methods very severely: one fundamentally cannot get beyond order 2. The following holds.

2. Theorem: Second Dahlquist barrier.

(1) A linear $A$-stable multistep method is always implicit and has at most consistency order 2, and hence at most convergence order 2.

(2) Among all linear $A$-stable multistep methods of consistency order 2, the trapezoidal rule $y_{n+1}=y_n+h\cdot(f_{n+1}+f_n)/2$ (the one-step $\vartheta$-method with $\vartheta=1/2$) has the smallest possible error constant.

The BDF2 is the only linear $2$-step method that is $A_\infty^0$-stable.
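The difference between the trapezoidal rule and the BDF2 at infinity can be made visible numerically. The sketch below applies both methods to the test equation $\dot y=\lambda y$ with $H=h\lambda$ and looks at the roots of the resulting characteristic equations, $(1-H/2)\mu=1+H/2$ and $(3/2-H)\mu^2-2\mu+1/2=0$. It samples a few points only and is not a proof of $A$-stability; the function names are illustrative.

```python
import cmath


def trapezoid_root(H):
    """Amplification factor of the trapezoidal rule for y' = lambda*y, H = h*lambda."""
    return (1 + H / 2) / (1 - H / 2)


def bdf2_roots(H):
    """Both roots of (3/2 - H) mu^2 - 2 mu + 1/2 = 0."""
    a, b, c = 1.5 - H, -2.0, 0.5
    d = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))


def max_modulus(roots):
    return max(abs(r) for r in roots)


# Sample points in the open left half-plane.
samples = [complex(-x, y) for x in (0.1, 1.0, 10.0, 1e3) for y in (0.0, 1.0, 50.0)]
print(all(abs(trapezoid_root(H)) < 1 for H in samples))
print(all(max_modulus(bdf2_roots(H)) < 1 for H in samples))

# Behaviour for H -> -infinity: modulus near 1 versus modulus near 0.
print(abs(trapezoid_root(-1e9)), max_modulus(bdf2_roots(-1e9)))
```

Both methods damp at every sampled point, but for very large $|H|$ the amplification factor of the trapezoidal rule approaches modulus 1, whereas the roots of the BDF2 tend to 0.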

For the function-theoretic proof we refer to the original paper by Dahlquist (1963); Germund G. Dahlquist (1925--2005). The proof rests essentially on the following two facts and on the sign behaviour of certain terms.

3. Lemma: A $k$-step method is $A$-stable if and only if $\rho(\mu)/\sigma(\mu)$ maps the exterior of the unit circle into the complex left half-plane, i.e. $\mathop{\rm Re}\nolimits \left[\rho(\mu)/\sigma(\mu)\right]<0$ for $\left|\mu\right|>1$.

4. Theorem: Riesz-Herglotz theorem.

Assumptions: Let the function $\varphi(z)$ be holomorphic for $\mathop{\rm Re}\nolimits z>0$. Furthermore, let $\mathop{\rm Re}\nolimits \varphi(z)\ge0$ hold for all arguments in the right half-plane, i.e.

$$ \mathop{\rm Re}\nolimits z\gt 0 {\mskip 5mu}\Longrightarrow{\mskip 5mu} \mathop{\rm Re}\nolimits \varphi(z)\ge0. $$

Moreover, let $\varphi$ satisfy on the positive real axis the boundedness condition $\sup\left\{\left|x\varphi(x)\right|: 0\lt x\lt\infty\right\}\lt\infty$.

Claim: For all arguments in the right half-plane, $\varphi$ has the representation

$$ \varphi(z) = \int_{-\infty}^\infty {d\omega(t)\over z-it}, \qquad\hbox{for}\quad\mathop{\rm Re}\nolimits z\gt 0, $$

where $\omega(t)$ is bounded and non-decreasing.

Bibliographic note: Gustav Ferdinand Maria Herglotz (1881--1953), Friedrich Riesz (1880--1956).

It is this restrictive limit on the convergence order that gives notions such as $A[\alpha]$-stability, $S[\delta]$-stability, etc. their justification in the first place. If there were $A$-stable linear $k$-step methods of arbitrarily high order, these would be the ideal methods for solving stiff differential equations, provided that other properties, such as the step number $k$, the error constants, and so on, did not deteriorate considerably. For example, an $A_\infty^0$-stable linear 4-step method with error constants of about $1\over10$ would be an ideal method for solving stiff differential equations. The second Dahlquist barrier says that such a method cannot exist in principle.

Alternative lines of proof are indicated in Wanner (1987), see below, though not proved rigorously. The trapezoidal rule is not $A_\infty^0$-stable. The dissertation Tischer (1983) and Tischer/Sacks-Davis (1983) give $A_\infty^0$- and $S_\infty^0$-stable cyclic two-stage methods with convergence orders $p=2,3,4$ that require $k=2,3,4$ starting values. However, for all stages of the cyclic formulas of Tischer (1983) and Tischer/Sacks-Davis (1983) the equilibration measures are comparatively high.

Bibliographic note: Peter E. Tischer, Ron Sacks-Davis.

Wanner, Gerhard (*1942): "Order Stars and Stability", in "The State of the Art in Numerical Analysis", Proceedings of the joint IMA/SIAM conference held at the University of Birmingham, 14--18 April 1986, edited by A. Iserles and M.J.R. Powell, Clarendon Press, Oxford, 1987, 451--471.

That an explicit linear multistep method cannot be $A$-stable is very easy to see. Nor can any combination of explicit linear multistep methods with equidistant step size $h$ be $A$-stable. A combination of implicit and explicit methods, however, can very well be even $A_\infty^0$-stable.

Additional light on the relation between consistency order and stability properties is shed by the theorem of Jeltsch/Nevanlinna (1982).

Bibliographic note: Rolf Jeltsch, Olavi Nevanlinna.

5. Theorem: see Jeltsch/Nevanlinna (1982). It holds that

$$ \forall k\in\mathbb{N}:\forall\alpha\in[0^\circ,90^\circ):\quad \hbox{there exists at least one $A[\alpha]$-stable method} $$

and

$$ \forall k\in\mathbb{N}:\forall\delta\gt 0:\quad\hbox{there exists at least one $S[\delta]$-stable method}. $$

Furthermore, there are functional relations between error constants and the highest attainable order.

6. Theorem: see Jeltsch/Nevanlinna (1986).

Assumption: Let $c_{p+1}$ be the error factor of

$$ (L_hy)(t) := \sum_{i=0}^k \sum_{j=0}^m \alpha_{ij} h^j y^{(j)}(t+ih), \qquad c_{p+1}={1\over y^{(p+1)}(0)} (L_1y)(0). $$

Let the roots of $\rho_0(\zeta)$ be $\zeta_1=1,\zeta_2,\ldots,\zeta_k$, which lie in the unit disk:

$$ \left|\zeta_i\right|\le R,\qquad 0\le R\le1. $$

Claim: (1) If the formula is explicit and of order $p=mk$, then

$$ c_{p+1} \ge {1\over(m+k)!} 2^{1-k} \sum_{j=0}^{k-1} {k-1\choose j} \left(1-R\over1+R\right)^j f_{jkm}. $$

(2) In the implicit case with order $p=(k+1)m$,

$$ (-1)^m c_{p+1} \ge {(-1)^m\over\left[(k+1)m\right]!} 2^{1-k} \sum_{j=0}^{k-1} {k-1\choose j} \left(1-R\over1+R\right)^j e_{jkm}. $$

(3) In both cases (1) and (2), equality is possible for the formula of maximal order with the characteristic polynomial

$$ \rho_0(\zeta) = (\zeta-1)(\zeta+R)^{k-1}, $$

which is stable for $R<1$, or for $R=1$ and $k\le2$.

The quantities $f_{jkm}$ and $e_{jkm}$ are given by

$$ f_{jkm} := \sum_{i=0}^{k-1} T_{ij} W_{i,k-1,m}, \qquad W_{jkm} := \int_j^{j+1} \prod_{\nu=0}^k (t-\nu) dt, $$

and

$$ e_{jkm} := \sum_{i=0}^{k-1} T_{ij} W_{i,k+1,m}, \qquad T_{ij} := \sum_{\mu=\max(0,i-j)}^{\min(n-j,i)} {j\choose i-\mu} {n-j\choose\mu} (-1)^{j-i+\mu}, $$

for $j=0,\ldots,k-1$.

Proof: Jeltsch/Nevanlinna (1986). ☐

By the theorem of Jeltsch/Nevanlinna (1982) there are thus “almost $A$-stable” linear multistep methods with an arbitrary number of starting values. Nevertheless, by the second Dahlquist barrier these methods are not $A$-stable, let alone $A_\infty^0$- or $S_\infty^0$-stable. Rather, the moduli of the roots move ever closer to one as $H\to\infty$, and the error constants grow with increasing $k$. For example, Gupta (1985) uses $k$-step methods with the properties listed in the table.

Bibliographic note: Gopal K. Gupta.

p	$\alpha$	$\delta$	$c_{p+1}$	$\mu_\infty$	k
1	90.00	0.0	0.5	0.0	1
2	90.00	0.0	0.083	1.0	1
3	86.46	0.075	0.242	0.32	3
4	80.13	0.282	0.374	0.43	5
5	73.58	0.606	0.529	0.567	7
6	67.77	1.218	0.724	0.878	9
7	65.53	1.376	1.886	0.898	12
8	64.96	1.149	7.686	0.790	16
9	62.78	2.086	16.737	0.989	19
10	63.74	1.223	133.955	0.878	26

Here $p$ denotes the convergence order, $k$ the number of starting values, $c_{p+1}$ the error factor, $\alpha$ the Widlund angle, and $\delta$ the corresponding value for $S[\delta]$-stability. $\mu_\infty$ is the modulus of the root of largest modulus at $\infty$.

It turns out that the program DSTIFF, which uses these formulas, often needs twice as many function evaluations and twice the computing time of the program LSODE, which is based on the BDF$i$ with $i\in\{1,\ldots,5\}$. Note, however, that implementations of formulas and heuristics were compared here. The comparison made by Gupta (1985) is therefore not a final verdict on formulas but a verdict on programs. Skilful programming and well-thought-out heuristics are of an importance that should not be underestimated.

Owing to a modified strategy for the corrector iteration in DSTIFF, the number of $LU$ decompositions (= Jacobian evaluations) is slightly lower than for LSODE. Only for problem B5 does DSTIFF, as expected, need significantly fewer steps than LSODE. This problem places particular demands on the stability properties. If the location of the eigenvalues of the constant Jacobian were known in advance, one could choose a suitable maximum order in LSODE from the outset, and the two programs would then perform similarly again. For B5 the ratio of the step counts of the two programs is roughly $1:4$ (DSTIFF:LSODE). This ratio does not, however, carry over to the computing time on the same scale: the computing time increased by a much smaller amount.

The BDF$i$ occupy a somewhat special position because $\sigma(\mu)=\mu^i$.

7. Remark: The BDF$i$ are the only linear $i$-step methods of consistency order $i$, for $i=1,\ldots,6$, that are $A_\infty^0[\alpha]$- or $S_\infty^0[\delta]$-stable.

There are further linear multistep methods ($\ne$BDF$i$) that are $A_\infty^0[\alpha]$- or $S_\infty^0[\delta]$-stable, but then consistency order $i$ can no longer be achieved with $i$ starting values. For multistage methods the remark no longer holds, as shown for example by the cyclic formulas of Tendler (1973); see also Tendler/Bickart/Picel (1978).

6. Annulled dominance and total annullation

As usual, let “$\cong$” denote equality up to ${\cal O}(h^{p+2})$. Each stage of a composite method is now expanded into a Taylor section (Taylor, Brook (1685--1731)), which gives

$$ \gamma \cong \pmatrix{ c_{11}\dot yh+\cdots+c_{1,p+1}y^{(p+1)}h^{p+1}\cr c_{21}\dot yh+\cdots+c_{2,p+1}y^{(p+1)}h^{p+1}\cr \vdots \qquad \cdots \qquad \vdots\cr c_{s1}\dot yh+\cdots+c_{s,p+1}y^{(p+1)}h^{p+1}\cr } = \underbrace{\pmatrix{ c_{11} & \ldots & c_{1,p+1}\cr c_{21} & \ldots & c_{2,p+1}\cr \vdots & \ddots & \vdots\cr c_{s1} & \ldots & c_{s,p+1}\cr}}_{\in\mathbb{C}^{s\times (p+1)}}{\mskip 3mu} \underbrace{\pmatrix{\dot yh\cr \ddot yh^2\cr \vdots\cr y^{(p+1)}h^{p+1}\cr}} _{\in\mathbb{C}^{p+1}} $$

$\gamma$ can be split into a sum

$$ \gamma \cong \pmatrix{c_{11}\cr\vdots\cr c_{s1}\cr}\dot yh +\pmatrix{c_{12}\cr\vdots\cr c_{s2}\cr}\ddot yh^2 +\cdots+\pmatrix{c_{1,p+1}\cr\vdots\cr c_{s,p+1}\cr}y^{(p+1)}h^{p+1} $$

and each summand is checked individually for annulled dominance or total annullation. For methods in which all stages have the same consistency order, $c_{ij}=0$ for $j\le p$ and $i=1,\ldots,s$, and the above equation reduces to

$$ \gamma \cong \pmatrix{ c_{1,p+1}y^{(p+1)}h^{p+1}\cr \vdots\cr c_{s,p+1}y^{(p+1)}h^{p+1}\cr} = \pmatrix{c_{1,p+1}\cr \vdots\cr c_{s,p+1}\cr} y^{(p+1)}h^{p+1}. $$

1. Example: Let the explicit Euler method be used as predictor, followed by two iterations with the BDF2.

Step	Formula	Method
Predictor	$y^0_{n+1}=y_n+z_n$	explicit Euler method
Corrector	$y_{n-1}-4y_n+3y^1_{n+1}=2z^0_{n+1}$	BDF2
Corrector	$y_{n-1}-4y_n+3y_{n+1}=2z^1_{n+1}$	BDF2

With $u_n=(y^0_n,{\mskip 3mu}y^1_n,{\mskip 3mu}y_n)$ the six matrices are

$$ A_0=\pmatrix{0&0&0\cr 0&0&1\cr 0&0&1\cr},\quad A_1=\pmatrix{0&0&-1\cr 0&0&-4\cr 0&0&-4\cr},\quad A_2=\pmatrix{1&0&0\cr 0&3&0\cr 0&0&3\cr},\quad B_0={\bf 0},\quad B_1=\pmatrix{0&0&1\cr 0&0&0\cr 0&0&0\cr},\quad B_2=\pmatrix{0&0&0\cr 2&0&0\cr 0&2&0\cr}. $$

For the matrix polynomial $\rho(\mu)=A_0+A_1\mu+A_2\mu^2$ one obtains

$$ \rho(\mu) = \pmatrix{ \mu^2 & 0 & -\mu\cr 0 & 3\mu^2 & 1-4\mu\cr 0 & 0 & 3\mu^2-4\mu+1\cr} $$

Striking are the upper triangular form and the distribution of the $\mu^\kappa$ terms on the diagonal. Furthermore, the characteristic polynomial of the corrector appears in the lower right corner. This holds quite generally. The zero theorem (Nullensatz) for predictor-corrector methods reads:

2. Theorem: Let $\rho_c$ be the characteristic polynomial of the corrector. For the matrix polynomial of a $P(EC)^i\{E\}$ method, with

$$ \rho(\mu) = \sum_{\nu=0}^\kappa A_\nu\mu^\nu,\qquad \deg\rho = \kappa,\qquad A_\nu\in\mathbb{C}^{(i+1)\times(i+1)},\quad\nu=0,\ldots,\kappa, $$

one has for all $\mu\in\mathbb{C}$ the representation

$$ \rho(\mu) = \pmatrix{ \alpha_{11}\mu^\kappa & 0 & \ldots & *\cr 0 & \alpha_{22}\mu^\kappa & \ldots & *\cr \vdots & \vdots & \ddots & \vdots\cr 0 & 0 & \ldots & \rho_c(\mu)\cr} \in \mathbb{C}^{(i+1)\times(i+1)}. $$

Proof: (Zero theorem) We have

$$ u_{n+\nu} = \pmatrix{y^0_{n+\nu}\cr \vdots\cr y^{i-1}_{n+\nu}\cr y_{n+\nu}\cr}, \qquad \nu=-\kappa+1,\ldots,1. $$

The first $i$ components of the vectors $u_{n+\nu}$ do not occur in the last corrector stage. The matrices $A_0,\ldots,A_{\kappa-1}$ carry entries only in the last column, so they are all of the form

$$ A_0\sim A_1\sim\ldots A_{\kappa-1}\sim\pmatrix{ 0 & \ldots & 0 & *\cr \vdots & \ddots & \vdots & \vdots\cr 0 & \ldots & 0 & *\cr} $$

whereas $A_\kappa$ is diagonal, i.e.

$$ A_\kappa = \pmatrix{ \alpha_{11} & & \llap{0}\cr & \ddots & \cr \rlap{0} & \ldots & \alpha_\kappa\cr}, $$

where $\alpha_{11}=\alpha_\kappa^P$ is the leading coefficient of the predictor and $\alpha_{22}=\cdots=\alpha_{i+1,i+1}=\alpha_\kappa$ is the leading coefficient of the corrector. Summing the $A_\nu\mu^\nu$ yields the claim. ☐

Only $y^*_{n+1}$, the solution of the implicit corrector equation at time $t_{n+1}$, is sought. The other, past values have already been found. The intermediate values of past iterations are not used again, unlike in semi-iterative methods. If one did use them, the spectrum could of course change as well, because the upper triangular form would then change too. It is also assumed here that the same corrector is used in every iteration and that only a single predictor is used. Off-step-point methods, for example, may use several predictors, as in Filippi/Kraska (1973).

Bibliographic note: Siegfried Filippi (1929--2022), Ernst Kraska (1932--2021), obituary notice.

3. Corollaries: (1) The characteristic polynomial $Q(\mu,0)=\det\rho(\mu)$ of both the $P(EC)^iE$ and the $P(EC)^i$ method has at least $\kappa i$ zeros in its spectrum, and the remaining $\kappa$ eigenvalues coincide with the roots of the characteristic polynomial $\rho_c$ of the corrector.

(2) In particular, a $P(EC)^iE$ or $P(EC)^i$ method is stable if and only if the corrector is stable. The stability of the predictor formula is entirely irrelevant for the $D$-stability of the $P(EC)^iE$ or $P(EC)^i$ method. The predictor does, of course, influence the shape of the stability region.

(3) The left eigenvectors $v_m$, $m=1,\ldots$, belonging to the nonzero eigenvalues, hence with $m\le\kappa$, are all of the form

$$ v_m = (0,{\mskip 3mu}\ldots,{\mskip 3mu}0,{\mskip 3mu}1)\in\mathbb{C}^{1\times(i+1)},\qquad m\le\kappa. $$
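Corollaries (1) and (3) can be checked on Example 1 (explicit Euler predictor, BDF2 iterated twice, so $\kappa=2$, $i=2$). The following sketch uses the matrices $A_0,A_1,A_2$ from that example and verifies $\det\rho(\mu)=3\mu^4\rho_c(\mu)$ at seven distinct points, which is enough to identify two polynomials of degree 6. It is meant as an illustration only; the helper names are hypothetical.

```python
# Matrices A_0, A_1, A_2 of Example 1, with u_n = (y^0_n, y^1_n, y_n).
A0 = [[0, 0, 0], [0, 0, 1], [0, 0, 1]]
A1 = [[0, 0, -1], [0, 0, -4], [0, 0, -4]]
A2 = [[1, 0, 0], [0, 3, 0], [0, 0, 3]]


def rho(mu):
    """Matrix polynomial rho(mu) = A_0 + A_1 mu + A_2 mu^2."""
    return [[A0[r][c] + A1[r][c] * mu + A2[r][c] * mu ** 2 for c in range(3)]
            for r in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix by the rule of Sarrus / cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def rho_c(mu):
    """Characteristic polynomial of the BDF2 corrector."""
    return 3 * mu ** 2 - 4 * mu + 1


# Seven integer sample points determine a polynomial of degree 6 uniquely.
factorisation_holds = all(det3(rho(mu)) == 3 * mu ** 4 * rho_c(mu)
                          for mu in range(-3, 4))

# v = (0, 0, 1) is a left eigenvector for mu = 1: v * rho(1) is the last row.
last_row_at_one = rho(1)[2]

print(factorisation_holds, last_row_at_one)
```

The factor $\mu^4$ accounts for the $\kappa i=4$ zeros in the spectrum, and the remaining two eigenvalues are the roots $1$ and $1/3$ of the corrector polynomial.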

Corollary (2) can also be seen in another way. If the two functions $\varphi$ and $\psi$ are Lipschitz continuous, then so is the composite function $\varphi\circ\psi$. That is, if

$$ \left|\varphi(x)-\varphi(y)\right|\le K\left|x-y\right|\qquad\hbox{and}\qquad \left|\psi(u)-\psi(v)\right|\le L\left|u-v\right|, $$

then

$$ \left|\varphi(\psi(u))-\varphi(\psi(v))\right| \le K\left|\psi(u)-\psi(v)\right| \le KL\left|u-v\right|. $$

If one now regards a Picard predictor-corrector method not as a single multistage, usually linear method, but as a nested, usually nonlinear method, one likewise sees at once that for the stability properties with respect to $h\to0$ only the corrector matters and the predictor is immaterial.

4. Let the predictor $\hat z_{n+\kappa}$ be given by the matrix difference equation

$$ \hat A_0z_n+\cdots+\hat A_{\kappa-1}z_{n+\kappa-1}+\hat A_\kappa\hat z_{n+\kappa} = h\varphi(z_n,\ldots,z_{n+\kappa-1}),\qquad \hat A_\kappa=I $$

and let the corrector value be the solution of the matrix difference equation

$$ A_0z_n + \cdots + A_{\kappa-1} z_{n+\kappa-1} + A_\kappa z_{n+\kappa} = h \psi(z_n,\ldots,z_{n+\kappa-1},z_{n+\kappa}), \qquad A_\kappa = I. $$

Picard iteration now consists in replacing the value $z_{n+\kappa}$ in the function $\psi$ by the iterate of the previous iteration; that is, $z_{n+\kappa}$ in $\psi$ is replaced by $\hat z_{n+\kappa}$. Direct substitution of the defining equation for the predictor value $\hat z_{n+\kappa}$ into the function $\psi$ then gives

$$ \sum_{i=0}^\kappa A_i z_{n+i} = h\vartheta(z_n,\ldots,z_{n+\kappa-1}), $$

where the function $\vartheta$ is Lipschitz continuous; with the usual theorems, given consistency and stability, one then obtains convergence. Nothing needs to be known about how $\vartheta$ came about, except that $\vartheta$ is Lipschitz continuous in its arguments. In particular, zero-stability now obviously depends only on the matrices $A_i$. If one iterates more than once, i.e. Picard-$P(EC)^i\{E\}$ with $i>1$, only the nesting depth increases; the principle stays the same.

It now follows immediately from corollary (3) that the error vectors of the predictor and of the intermediate iterates are filtered out completely by the left eigenvectors $v_m$. That is, the nonzero eigenvalues see only the error factor of the corrector, and the remaining error factors are totally annulled by the zeros in the spectrum. There are, in general, parallels between Runge-Kutta methods and predictor-corrector methods. Here predictor-corrector method stands for both $P(EC)^iE$ and $P(EC)^i$ methods, in short $P(EC)^i\{E\}$ methods. This effect of total annullation can be traced particularly clearly in an example.

5. Example: Let the following predictor-corrector method be used

Step	Formula	Method
Predictor	$y^0_{n+1}=y_n+z_n$	explicit Euler method
Corrector	$y_{n-1}-4y_n+3y_{n+1}=2z^0_{n+1}$	BDF2

With $u_n=(y^0_n,{\mskip 3mu}y_n)$ one obtains the six matrices

$$ A_0=\pmatrix{0&0\cr 0&1\cr},\quad A_1=\pmatrix{0&-1\cr 0&-4\cr},\quad A_2=\pmatrix{1&0\cr 0&3\cr},\qquad B_0={\bf 0},\quad B_1=\pmatrix{0&1\cr 0&0\cr},\quad B_2=\pmatrix{0&0\cr 2&0\cr} . $$

As the solution of the difference equation

$$ A_2u_{n+2}+A_1u_{n+1}+A_0u_n=\Gamma\hat Z,\qquad n=0,1,\ldots $$

one obtains, after multiplying through by $A_2^{-1}$, for the dominant term

$$ \pmatrix{1&*&*&1\cr 0&*&*&1\cr} \pmatrix{0&&&\cr &0&&\cr &&1/3&\cr &&&1\cr} \pmatrix{*&*\cr *&*\cr 0&1\cr 0&1\cr} \pmatrix{1/2&*\cr 0&-2/3\cr} \hat Z. $$

The leading error terms of the predictor are $h^2/2\ddot y+{\cal O}(h^3)$, and for the corrector they are $-2h^3/3y^{III}+{\cal O}(h^4)$.

The error vectors of the predictor are positioned exactly so that they fall onto these zeros; that is, the error vectors are perpendicular to the Jordan vectors that do not belong to zero. The low consistency orders contributed by the predictor are therefore totally annulled (total annullation). This behaviour is entirely analogous to that of Runge-Kutta methods, where the stages of low consistency order are damped out completely by the zeros in the Jordan matrix

$$ J = \pmatrix{ 0 & & & \llap0\cr & \ddots & & \cr & & 0 & \cr \rlap0 & & & 1\cr} $$

and therefore have no effect at all. This holds at least in the asymptotic case $h\to 0$, where only the $A_\nu$ are decisive and the matrices $B_\nu$ play no role.

6. Example: Runge-Kutta method with 4 stages in total.

$$ A = \pmatrix{ 0 & 0 & 0 & 1\cr 0 & 0 & 0 & 1\cr 0 & 0 & 0 & 1\cr 0 & 0 & 0 & 1\cr}, \quad X = \pmatrix{ 1 & 0 & 0 & 1\cr 0 & 1 & 0 & 1\cr 0 & 0 & 1 & 1\cr 0 & 0 & 0 & 1\cr}, \quad J = \pmatrix{ 0 & & & \llap0\cr & 0 & & \cr & & 0 & \cr \rlap0 & & & 1\cr}, \quad Y = \pmatrix{ 1 & -1 & 0 & 0\cr 0 & 1 & -1 & 0\cr 0 & 0 & 1 & -1\cr 0 & 0 & 0 & 1\cr} $$

The consistency order can increase by one unit per stage. The matrix $C$ therefore has the form

$$ C = \pmatrix{ c_1 & * & * & *\cr 0 & c_2 & * & *\cr 0 & 0 & c_3 & *\cr 0 & 0 & 0 & c_4\cr} $$

7. Example: The similarity between Runge-Kutta methods and $P(EC)^i\{E\}$ methods that becomes apparent here goes so far that some $P(EC)^i\{E\}$ methods are completely equivalent to certain explicit Runge-Kutta methods. For example, the improved Euler method with the parameter tableau

$$ \begin{array}{c|cc} 1 && 1 & \cr \hline && {1\over2} & {1\over2}\cr \end{array} \qquad\qquad \eqalign{k_0 &= f(t_0,y_0)\cr k_1 &= f(t_0+h,y_0+hk_0)\cr \hline y_1 &= y_0+{h\over2}\left(k_0+k_1\right)\cr } $$

is completely identical to the implicit trapezoidal method with the explicit Euler method used as predictor:

$$ y_{n+1} = y_n + {h\over2}\left\{f_n+f_{n+1}\right\} \overset{PECE}\longrightarrow y_n + {h\over2}\left\{f_n+f(t_{n+1},y_n+hf_n)\right\}. $$

The two methods are often analysed entirely separately. The consistency order 2 of the improved Euler method is often shown directly by Taylor expansion. For the predictor-corrector method one applies the consistency theorems, proves stability of the corrector, and finally shows with the theorem of Liniger (1971) that the consistency order of the corrector is retained if one iterates sufficiently often.
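The equivalence of the two formulations can be observed directly in a few lines. The sketch below advances the same initial value problem once with the improved Euler method in Runge-Kutta form and once as a PECE scheme (explicit Euler predictor, trapezoidal corrector). The right-hand side $f(t,y)=-2y+t$, the step size, and the function names are arbitrary illustrative choices.

```python
import math


def f(t, y):
    """Arbitrary test right-hand side (an assumption for this sketch)."""
    return -2.0 * y + t


def heun_step(t, y, h):
    """Improved Euler method written as a two-stage Runge-Kutta method."""
    k0 = f(t, y)
    k1 = f(t + h, y + h * k0)
    return y + 0.5 * h * (k0 + k1)


def pece_step(t, y, h):
    """Explicit Euler predictor, one trapezoidal corrector application."""
    f_n = f(t, y)                         # E: evaluation at the accepted value
    y_pred = y + h * f_n                  # P: explicit Euler
    f_pred = f(t + h, y_pred)             # E: evaluation at the predicted value
    return y + 0.5 * h * (f_n + f_pred)   # C: trapezoidal rule with f_pred


def integrate(step, y0=1.0, h=0.1, n=10):
    t, y, values = 0.0, y0, []
    for _ in range(n):
        y = step(t, y, h)
        t += h
        values.append(y)
    return values


heun_values = integrate(heun_step)
pece_values = integrate(pece_step)
print(all(math.isclose(a, b, rel_tol=0.0, abs_tol=1e-15)
          for a, b in zip(heun_values, pece_values)))
```

Both routines perform the same arithmetic, so the trajectories agree to rounding; the difference lies only in how the method is read and analysed.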

8. Important for the convergence behaviour is the spectral structure of the three matrices $X$, $J$ and $Y$:

$$ \eqalign{ X: & \qquad XJ^\nu Y\gamma=\ldots,\cr J: & \qquad J^\nu Y\gamma=\ldots,\cr Y: & \qquad Y\gamma=(\ldots 0\ldots)^\top.\cr } $$

Let $v_1,\ldots,v_r$ be the left eigenvectors of the matrix polynomial $\rho$ for the eigenvalue $\lambda=1$; then the condition of annulled dominance reads

$$ v_i{\mskip 3mu}\rho(1)=0,\quad v_i{\mskip 3mu}\gamma=0,\quad v_i\ne{\bf0}^\top,\qquad i=1,\ldots,r. $$

For the case $r=1$ one thus obtains the geometric condition that the columns of the matrix $(\rho(1),{\mskip 3mu}\gamma)$ must lie in the orthogonal complement of $v$, hence

$$ (\rho(1),{\mskip 3mu}\gamma)\in v^\bot. $$

If the left kernel of $\rho(1)$ is no longer 1-dimensional but $r$-dimensional, the condition must read

$$ \def\spn{\mathop{\rm span}} \left(\rho(1),\gamma\right)\in\left(\spn(v_1,\ldots,v_r)\right)^\bot $$

For eigenvalues $\left|\mu\right|=1$ the analogous condition holds:

$$ \left(\rho(\mu),\gamma\right)\in\left(\spn(v_1,\ldots,v_r)\right)^\bot, $$

where $r$ is now the multiplicity of the eigenvalue with $\left|\mu\right|=1$ and, correspondingly, $v_1,\ldots,v_r$ are the left eigenvectors for this eigenvalue. Algebraic multiplicity (= multiplicity of the root of the characteristic polynomial) and geometric multiplicity (= dimension of the invariant subspace) must of course be equal for dominant eigenvalues. From the representation of the solution of the matrix difference equation already given above, namely in the form

$$ u_{m+1} = XT^{m+1}c + X\sum_{i=0}^m T^{m-i}Yf_i, \qquad m=0,1,\ldots, $$

the appearance of the left eigenvectors is immediately evident. The condition of annulled dominance is a continuous invariant, since for simple eigenvalues the eigenvectors depend continuously on small perturbations. For multiple eigenvalues this need not hold.

7. The $n$-dimensional exterior product of $n-1$ vectors

See also On Differential Forms.

1. Since the determinant is linear in each column,

$$ x\mapsto\det\left(a_1,\ldots,a_{n-1},x\right) $$

represents, for fixed $a_1,\ldots,a_{n-1}\in\mathbb{R}^n$, a linear form on $\mathbb{R}^n$. By the Riesz representation theorem:

Let $H$ be a Hilbert space and let $f:H\to\mathbb{C}$ be a continuous linear functional; then

$$ \dot\exists b\in H:\enspace\forall x\in H:\quad f(x)=\langle b,x\rangle$$

and moreover $|b|=|f|$.

Hence there is exactly one vector $b\in\mathbb{R}^n$ such that the linear form can be written as a scalar product:

$$ \dot\exists b:\enspace\forall x:\quad \det\left(a_1,\ldots,a_{n-1},x\right)=\langle b,x\rangle \qquad(a_i\hbox{ fixed}). \tag{*} $$

Bibliographic note: Riesz, Friedrich (1880--1956).

2. This vector $b$, uniquely determined implicitly by the scalar product, is called the vector product (also cross product or exterior product), and one writes

$$ b =: a_1\wedge\cdots\wedge a_{n-1} =: \bigwedge_{i=1}^{n-1} a_i, $$

or also

$$ a_1\times\cdots\times a_{n-1}=\mathop{\times}_{i=1}^{n-1} a_i. $$

Thus

$$ \det\left(a_1,\ldots,a_{n-1},x\right) = \left\langle\bigwedge_{i=1}^{n-1}a_i,x\right\rangle.\tag{**} $$

3. From this one reads off

$$ \eqalign{ \bigwedge_{i=1}^{n-1}a_i=0\quad &\Longleftrightarrow\quad a_i\hbox{ linearly dependent},\cr a_1\wedge\cdots\wedge a_i\wedge\cdots\wedge a_k\wedge\cdots\wedge a_{n-1}&= -\left(a_1\wedge\cdots\wedge a_k\wedge\cdots\wedge a_i\wedge\cdots\wedge a_{n-1}\right),\cr a_1\wedge\cdots\wedge(a_i+\hat a_i)\wedge\cdots\wedge a_{n-1} &= \left(a_1\wedge\cdots\wedge a_i\wedge\cdots\wedge a_{n-1}\right) + \left(a_1\wedge\cdots\wedge \hat a_i\wedge\cdots\wedge a_{n-1}\right),\cr a_1\wedge\cdots\wedge\lambda a_i\wedge\cdots\wedge a_{n-1} &= \lambda\left(a_1\wedge\cdots\wedge a_i\wedge\cdots\wedge a_{n-1}\right),\cr \left\langle\bigwedge_{i=1}^{n-1}a_i,a_k\right\rangle &= 0, \quad k=1,\ldots,n-1.\cr } $$

The last equation says that the exterior product is perpendicular to each individual “factor”. One can now also easily verify the Jacobi and Grassmann identities. The above equations also hold for $n=2$, where $\bigwedge_{i=1}^1a_i=a_1$.

Bibliographic note: Grassmann, Hermann (1809--1877), Jacobi, Carl Gustav (1804--1851).

The exterior product introduced above is a special exterior product. There are further exterior products. For these the range is no longer necessarily $\mathbb{C}^n$ but $\mathbb{C}^{n\choose m}$ for an $m$-fold product. For $m=n-1$ one of course obtains exactly the product given above, up to proportionality.

4. The components of the vector $b$ in $(*)$ are obtained by successively inserting the $n$ unit vectors $e_i$:

$$ b_i = \left\langle b,e_i\right\rangle = \det\left(a_1,\ldots,a_{n-1},e_i\right), \qquad i=1,\ldots,n. $$

For the norm of the exterior product,

$$ \left|\bigwedge_{i=1}^{n-1}a_i\right|^2 = \left|\matrix{ \langle a_1,a_1\rangle & \ldots & \langle a_1,a_{n-1}\rangle\cr \vdots & \ddots & \vdots\cr \langle a_{n-1},a_1\rangle & \ldots & \langle a_{n-1},a_{n-1}\rangle\cr }\right|, $$

because of the theorem on the Gram determinant (Gram, Jorgen Pedersen (1850--1916))

$$ \det(a_1,\ldots,a_n){\mskip 3mu}\det(b_1,\ldots,b_n) = \left|\matrix{ \langle a_1,b_1\rangle & \ldots & \langle a_1,b_n\rangle\cr \vdots & \ddots & \vdots\cr \langle a_n,b_1\rangle & \ldots & \langle a_n,b_n\rangle\cr }\right|. $$

5. The vector product can also be defined in the following form:

$$ x\mapsto\det(a_1,\ldots,a_{i-1},x,a_{i+1},\ldots,a_n) $$

is a linear form for fixed $a_i$, and so on. The $a_1,\ldots,a_{i-1},x,a_{i+1},\ldots,a_n$ form a right-handed system.

6. Example: $n=3$. We seek the components of the vector product $a\times b$, with $a=(\alpha_1,{\mskip 3mu}\alpha_2,{\mskip 3mu}\alpha_3)$ and $b=(\beta_1,{\mskip 3mu}\beta_2,{\mskip 3mu}\beta_3)$. Three determinants have to be computed,

$$ a\times b = \pmatrix{ \left|\matrix{\alpha_1&\beta_1&1\cr \alpha_2&\beta_2&0\cr \alpha_3&\beta_3&0\cr}\right|\\[0.5em] \left|\matrix{\alpha_1&\beta_1&0\cr \alpha_2&\beta_2&1\cr \alpha_3&\beta_3&0\cr}\right|\\[0.5em] \left|\matrix{\alpha_1&\beta_1&0\cr \alpha_2&\beta_2&0\cr \alpha_3&\beta_3&1\cr}\right|\cr} = \pmatrix{ \alpha_2\beta_3-\alpha_3\beta_2\cr \alpha_3\beta_1-\alpha_1\beta_3\cr \alpha_1\beta_2-\alpha_2\beta_1\cr}. $$

7. Example: $n=2$. We seek the components of the vector $a^\bot$ perpendicular to $a$, with $a={\alpha_1\choose\alpha_2}$. The exterior product yields exactly such a vector. In this case the product has only one factor. Here $n=2$ determinants have to be computed, namely

$$ a^\bot = \pmatrix{ \left|\matrix{\alpha_1 & 1\cr \alpha_2 & 0\cr}\right| \\[0.5em] \left|\matrix{\alpha_1 & 0\cr \alpha_2 & 1\cr}\right| \cr} = \pmatrix{-\alpha_2\cr \alpha_1\cr}. $$
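The component formula $b_i=\det(a_1,\ldots,a_{n-1},e_i)$ translates directly into code. The sketch below computes the exterior product of $n-1$ vectors in $\mathbb{R}^n$ by Laplace expansion, which is only sensible for small $n$. The names `det` and `generalized_cross` are illustrative, and the numeric vectors are arbitrary test data.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def generalized_cross(vectors):
    """Exterior product of n-1 vectors in R^n: b_i = det(a_1, ..., a_{n-1}, e_i)."""
    n = len(vectors) + 1
    if any(len(a) != n for a in vectors):
        raise ValueError("need n-1 vectors of length n")
    result = []
    for i in range(n):
        # Columns are a_1, ..., a_{n-1}, e_i; rows are built one by one.
        rows = [[a[r] for a in vectors] + [1 if r == i else 0] for r in range(n)]
        result.append(det(rows))
    return result


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


a, b = [1, 2, 3], [4, 5, 6]
c = generalized_cross([a, b])
print(c, dot(c, a), dot(c, b))          # n = 3: classical cross product
print(generalized_cross([[3, 4]]))      # n = 2: perpendicular vector
```

For $n=3$ this reproduces the classical cross product, for $n=2$ the perpendicular vector $(-\alpha_2,\alpha_1)$. The squared norm of `c` also equals the Gram determinant of `a` and `b`, here $14\cdot77-32^2=54$.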

An introduction to the vector product can be found, for example, in the books by Walter (1986) or Koecher (1985). The detailed account by Gröbner (1966) deserves special mention.

Bibliographic note: Rolf Walter (1937--2022), Max Koecher (1924--1990), Wolfgang Gröbner (1899--1980).

8. Exterior product and error constants

For Henrici's error constant

$$ C := {v{\mskip 3mu}\gamma\over v{\mskip 3mu}\rho'(1){\mskip 3mu}w},\qquad\left\{\eqalign{ v{\mskip 3mu}\rho(1)&=0, \quad v\ne0,\cr \rho(1){\mskip 3mu}w&=0, \quad w\ne0,\cr }\right. $$

one now obtains the following result. Since the exterior product $\bigwedge_{i=1}^{n-1}a_i$ is perpendicular to $a_i$ for $i=1,\ldots,n-1$, this product is a left eigenvector of $\rho(1)$ if the columns of the matrix $\rho(1)$ are denoted by $a_i$ and one column vector, say $a_n$, is deleted. Apart from renumberings, this covers all cases. Let the remaining matrix be

$$ \widehat{\rho(1)}\in\mathbb{R}^{n\times(n-1)}. $$

If $\bigwedge_{i=1}^{n-1}a_i=0$, then $\rho(1)$ has a multiple eigenvalue $\mu=1$; one thus also obtains a simple criterion, under the assumption of strong stability. This is because the vector product vanishes exactly when the factors are linearly dependent, see Gröbner (1966). Since $\bigwedge_{i=1}^{n-1}a_i$ is often computed via determinants, however, this criterion is not always convenient in practical computation. How strong stability was established is not even taken into account here. Because of $(**)$ the error constant becomes

$$ C = {\det(\widehat{\rho(1)},\gamma)\over \det(\widehat{\rho(1)},\rho'(1){\mskip 3mu}w)}. $$

That is, the volume of the body spanned by the linearly independent column vectors of $\rho(1)$ and the vector $\gamma$ is the numerator of Henrici's error constant.

If the numerator of Henrici's error constant vanishes, annulled dominance holds. Albrecht (1979) calls this the order condition. It can therefore be checked by computing $\det\left(\widehat{\rho(1)},\gamma\right)$. This test for annulled dominance can of course also be derived without the detour via the cross product, as follows. From

$$ v\ne0,\qquad v{\mskip 3mu}\rho(1)=0,\qquad v{\mskip 3mu}\gamma=0 $$

it follows that the composite matrix $\left(\rho(1),\gamma\right)$ cannot have maximal rank, hence

$$ \mathop{\rm rank}\left(\rho(1),\gamma\right) = \kappa-n_1,\qquad \hbox{hence}\qquad \det\left(\widehat{\rho(1)},\gamma\right)=0. $$

Here $n_1$ was the multiplicity of the eigenvalue $\mu=1$. For eigenvalues $\left|\mu\right|=1$ one has in general $\mathop{\rm rank}\left(\rho(\mu),\gamma\right)=\kappa-n_\mu$, where $n_\mu$ is the multiplicity of the eigenvalue $\left|\mu\right|=1$. If $n_\mu\ge1$, then $\det\left(\widehat{\rho(1)},\gamma\right)=0$.

This exhausts all conceivable cases, apart from renumberings.

1. Example: Two-stage cyclic linear multistep method with two starting values. Let the first stage, with error factor $\gamma_1$, be

$$ \alpha_0^1y_{2n}+\alpha_1^1y_{2n+1}+\alpha_2^1y_{2n+2}=\ldots $$

and let the second stage, with error factor $\gamma_2$, be

$$ \alpha_0^2y_{2n}+\alpha_1^2y_{2n+1}+\alpha_2^2y_{2n+2}+\alpha_3^2y_{2n+3}=\ldots{\mskip 3mu}. $$

Then

$$ \rho(1) = \pmatrix{ \alpha_0^1+\alpha_2^1 & \alpha_1^1\cr \alpha_0^2+\alpha_2^2 & \alpha_1^2+\alpha_3^2\cr}. $$

2. Condition of annihilated dominance for two-stage methods. For two-stage methods one has

$$ \def\sumalf#1#2#3{\displaystyle\sum_{i\mathbin\%#3={#1}}\alpha_i^{#2}} \rho(1) = \pmatrix{ \sumalf012 & \sumalf112\cr \sumalf022 & \sumalf122\cr} =: \pmatrix{ \alpha_g^1 & \alpha_u^1\cr \alpha_g^2 & \alpha_u^2\cr} $$

and let the error vector be $\gamma=(\gamma_1,{\mskip 3mu}\gamma_2)$. By consistency, $\rho(1){\mskip 3mu}{1\choose 1}={0\choose0}$, so

$$ \alpha_u^1+\alpha_g^1=0, \qquad \alpha_g^2+\alpha_u^2=0. $$

If the matrix $\rho(1)$ is not the zero matrix, one obtains as left eigenvector of $\rho(1)$

$$ \pmatrix{\alpha_g^1\cr \alpha_g^2\cr}^\bot = \pmatrix{-\alpha_g^2\cr \alpha_g^1\cr} $$

and hence as condition for annihilated dominance

$$ \alpha_g^1\gamma_2 = \alpha_g^2\gamma_1. $$
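The two-stage condition can be checked mechanically. The following is a minimal sketch, assuming hypothetical values for the coefficient sums $\alpha_g^1$, $\alpha_g^2$ and the error factors $\gamma_1$, $\gamma_2$; the numbers and the function names are illustrative and not taken from any particular method.

```python
from fractions import Fraction

def left_null_vector(alpha_g1, alpha_g2):
    """Left eigenvector v of rho(1) = [[a1, -a1], [a2, -a2]], i.e. v * rho(1) = 0."""
    return (-alpha_g2, alpha_g1)

def has_annihilated_dominance(alpha_g1, alpha_g2, gamma1, gamma2):
    """True iff v * gamma == 0, i.e. alpha_g^1 * gamma_2 == alpha_g^2 * gamma_1."""
    v = left_null_vector(alpha_g1, alpha_g2)
    return v[0] * gamma1 + v[1] * gamma2 == 0

# Hypothetical values, chosen only for illustration.
a1, a2 = Fraction(2), Fraction(3)
print(has_annihilated_dominance(a1, a2, Fraction(4), Fraction(6)))  # condition holds
print(has_annihilated_dominance(a1, a2, Fraction(1), Fraction(1)))  # condition fails
```

Exact rational arithmetic is used so that the test for zero is not disturbed by rounding.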

If now $(\alpha_g^1,{\mskip 3mu}\alpha_g^2)=(0,{\mskip 3mu}0)$, then the matrix $\rho(1)$ would be the zero matrix, and annihilated dominance would require the error vector $\gamma$ to be orthogonal both to $1\choose 0$ and to $0\choose 1$. Then $\gamma_1=\gamma_2=0$, so the condition would be empty. The order of consistency in the modified sense would already be one higher than the order of convergence actually attained.

3. Example: If one uses as first stage the Milne-Simpson method of order 4, with

$$ 3y_{n+1} - 3y_{n-1} = z_{n+1} + 4z_n + z_{n-1} $$

and as second stage an arbitrary third-order method, then the condition of annihilated dominance is automatically satisfied, because

$$ \alpha_g^1 = 0, \qquad \gamma_1 = 0. $$

The two-stage method formed in this way then converges with order 4 overall.

4. Condition of annihilated dominance for three-stage methods. Let the error vector of the method be denoted by $\gamma=(\gamma_1,{\mskip 3mu}\gamma_2,{\mskip 3mu}\gamma_3)$ and let the matrix $\rho(1)$ be given by

$$ \rho(1) = \pmatrix{ \sumalf013 & \sumalf113 & \sumalf213\cr \sumalf023 & \sumalf123 & \sumalf223\cr \sumalf033 & \sumalf133 & \sumalf233\cr} =: \pmatrix{ * & m_1 & n_1\cr * & m_2 & n_2\cr * & m_3 & n_3\cr} $$

where

$$ \eqalign{ m_i: &\quad\hbox{sum of the coefficients $\alpha_j^i$ of stage $i$ with $j=3k+1$,}\cr n_i: &\quad\hbox{sum of the coefficients $\alpha_j^i$ of stage $i$ with $j=3k+2$.}\cr } $$

The condition for annihilated dominance now reads

$$ \gamma_1(m_2n_3-m_3n_2)+\gamma_2(m_3n_1-m_1n_3)+\gamma_3(m_1n_2-m_2n_1)=0, $$

provided that $\rho(1)$ has rank 2.
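The three-stage condition is the scalar triple product of $m$, $n$ and $\gamma$. A minimal sketch, assuming hypothetical column sums `m` and `n` (not taken from a concrete method): the cross product $m\times n$ is orthogonal to both columns, and since consistency makes the first column equal to $-(m+n)$, it is a left null vector of the whole matrix.

```python
def cross(m, n):
    """Cross product of two vectors in R^3."""
    return (m[1] * n[2] - m[2] * n[1],
            m[2] * n[0] - m[0] * n[2],
            m[0] * n[1] - m[1] * n[0])

def dominance_residual(m, n, gamma):
    """gamma_1(m2 n3 - m3 n2) + gamma_2(m3 n1 - m1 n3) + gamma_3(m1 n2 - m2 n1)."""
    return sum(c * g for c, g in zip(cross(m, n), gamma))

# Hypothetical column sums of rho(1); the first column is -(m + n) by consistency.
m = (1, 0, 2)
n = (0, 1, 1)
v = cross(m, n)

gamma_in_span = tuple(mi + 2 * ni for mi, ni in zip(m, n))  # gamma = m + 2n
print(dominance_residual(m, n, gamma_in_span))  # vanishes: annihilated dominance
print(dominance_residual(m, n, (1, 0, 0)))      # does not vanish
```

The residual vanishes exactly when $\gamma$ lies in the span of the two columns, which is the rank statement made for the composite matrix $(\rho(1),\gamma)$.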

The generalization to the $r$-stage case immediately gives

$$ \rho(1) = \pmatrix{ \sumalf01r & \ldots & \sumalf{r-1}1r\cr \vdots & \ddots & \vdots\cr \sumalf0rr & \ldots & \sumalf{r-1}rr\cr } $$

9. Calculation Rules for Error Constants

1. Henrici, Peter Karl Eugen (1923--1987). Here a class of error constants is presented more generally, and the relations between them are shown. If the underlying method is $z_{n+1}=Az_n+h\varphi_n$ with error vector $\tilde\gamma$, an error constant is defined by

$$ C_A = {\tilde v{\mskip 3mu}\tilde\gamma\over\tilde vw}, \qquad\left\{ \eqalign{\tilde vA&=\tilde v,\quad\tilde v\ne0,\cr Aw&=w,\quad w\ne0.\cr}\right. $$

Now, slightly more generally, let $Lz_{n+1} + Uz_n = h\varphi_n$. Then

$$ C_A = {v{\mskip 3mu}\gamma\over vLw}, \qquad\left\{ \eqalign{v(L+U)&=0,\quad v\ne0\cr (L+U)w&=0,\quad w\ne0.\cr}\right. $$

This holds because $A=-L^{-1}U$, hence

$$ A-I=-(L^{-1}U+I)=-L^{-1}(L+U). $$

From $\tilde v(A-I)=0$ it follows that

$$ \tilde vL^{-1}(L+U) = (vL) L^{-1}(L+U)=0, $$

and thus $v$ is a left eigenvector of $L\mu+U$ for the eigenvalue $\mu=1$, where $\tilde v=vL$ and $\tilde\gamma=L^{-1}\gamma$, so that $\tilde v\tilde\gamma=v\gamma$ and $\tilde vw=vLw$. It is of course assumed that the matrix polynomial $L\mu+U$ can be made monic, i.e. that $L$ is invertible. By the zero-stability of the method this is of course the case. One also sees that the denominator cannot vanish, since the matrix $A$ belongs to class M, see Ortega (1972).

Bibliographical note: Ortega, James McDonough.

The above error constant generalizes analogously to multiple eigenvalues $\mu=1$.

Since, however, the cyclic methods in the dissertation of Tendler (1973) or in Tendler/Bickart/Picel (1978), in the dissertation of Tischer (1983) and in Tischer/Sacks-Davis (1983), and finally all cyclic methods of Donelson/Hansen (1971) have only a simple eigenvalue at $\mu=1$, this case is not pursued further here.

Bibliographical note: Peter E. Tischer, Ron Sacks-Davis, Donelson III, John (1941--2010), Hansen, Eldon Robert (*1927).

Joel Marvin Tendler: "A Stiffly Stable Integration Process Using Cyclic Composite Methods", Ph.D. Diss., Syracuse University, Syracuse, New York, 26.Feb.1973, viii+iv+172 pages.

In the following, two properties of Henrici's error constant are shown. First, Henrici's error constant is independent of a scaling of the method; second, one may restrict oneself to the case of a linearization of the matrix polynomial. For the practical computation of left eigenvectors and further quantities it is of course more convenient to represent the method as a matrix polynomial of the lowest possible dimension. Elsewhere, in turn, it is more appropriate to consider the linearization, so as to deal with a single matrix only. It is therefore useful to have equivalent descriptions available for both representations.

Next it is shown that Henrici's error constant is independent of a scaling of the stages. Each stage may be multiplied by an arbitrary factor $(\ne\!0)$; the whole cycle may even be subjected to a nonsingular scaling. Furthermore, one sees from this that Henrici's error constant is independent of the order of the stages. For the reversed order of the stages, for example, one simply chooses the Hankel matrix $D=(\delta_{i,s+1-j})_{i,j=1}^s\in\mathbb{C}^{s\times s}$. Interchanging stages within a cycle can of course very well change the number of starting values, so under certain circumstances even the number of matrices in the matrix polynomial can change. This case is nevertheless covered, since the “old” method can be padded with zero matrices.

2. Theorem: Let $D\in\hbox{GL}(\mathbb{C},s)$ and let $\hat\rho(\mu)=D\rho(\mu)$ be the scaled polynomial. Let the method belonging to the matrix polynomial $\rho$ have the error constant

$$ C={v{\mskip 3mu}\gamma\over v{\mskip 3mu}\rho'(1){\mskip 3mu}w}\qquad\left\{\eqalign{ v{\mskip 3mu}\rho(1)&=0,\quad v\ne0,\cr \rho(1){\mskip 3mu}w&=0,\quad w\ne0.\cr}\right. $$

Here the vector $\gamma$ contains the error factors of the stages. Let $\hat C$ be Henrici's error constant of the scaled method. Then

$$ C = \hat C. $$

Proof: One has $\hat\rho(1)=D\rho(1)$, $\hat\gamma=D\gamma$, and $\hat v:=vD^{-1}$ is a left eigenvector of $\hat\rho(1)$. The derivative of the matrix polynomial $\rho$ at 1 scales accordingly, i.e. $\hat\rho'(1)=D\rho'(1)$. The right eigenvector $w$ is at the same time a right eigenvector of $\hat\rho(1)$, so $\hat w=w$. Now

$$ \hat C = {\hat v\hat\gamma\over\hat v{\mskip 3mu}\hat\rho'(1){\mskip 3mu}\hat w} = {vD^{-1}D\gamma\over vD^{-1}D\rho'(1){\mskip 3mu}w} = C. $$

☐
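The invariance under scaling can also be checked numerically. The sketch below is only an illustration under stated assumptions: the matrices `rho1` and `drho1`, the error vector `gamma` and the scaling `D` are hypothetical sample values, and all helper names are made up for this example. Exact fractions are used so that equality can be tested directly.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def vecmat(v, X):
    return [sum(v[i] * X[i][j] for i in range(len(v))) for j in range(len(X[0]))]

def matvec(X, w):
    return [sum(X[i][j] * w[j] for j in range(len(w))) for i in range(len(X))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inverse2(D):
    """Inverse of a 2x2 matrix via the adjugate."""
    det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
    return [[D[1][1] / det, -D[0][1] / det], [-D[1][0] / det, D[0][0] / det]]

def henrici_constant(v, gamma, drho1, w):
    """C = v*gamma / (v * rho'(1) * w)."""
    return dot(v, gamma) / dot(v, matvec(drho1, w))

# Hypothetical two-stage data: rho(1) has rank 1, v and w are its null vectors.
rho1 = [[F(4), F(-4)], [F(-4), F(4)]]
drho1 = [[F(3), F(0)], [F(-4), F(3)]]
gamma = [F(1), F(2)]
v, w = [F(1), F(1)], [F(1), F(1)]
D = [[F(2), F(1)], [F(0), F(3)]]          # any nonsingular scaling

C = henrici_constant(v, gamma, drho1, w)

# Scaled method: rho_hat = D rho, gamma_hat = D gamma, v_hat = v D^{-1}, w_hat = w.
v_hat = vecmat(v, inverse2(D))
rho1_hat = matmul(D, rho1)
C_hat = henrici_constant(v_hat, matvec(D, gamma), matmul(D, drho1), w)

print(C, C_hat)
```

Both constants coincide because the factor $D^{-1}D$ cancels in numerator and denominator, exactly as in the proof.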

3. Now, more generally, instead of the matrix polynomial $\rho(\mu)=L\mu+U$, consider the matrix polynomial

$$ \rho(\mu) := A_0+A_1\mu+\cdots+A_\kappa\mu^\kappa. $$

Then

$$ C_H = {v{\mskip 3mu}\gamma\over v{\mskip 3mu}\rho'(1){\mskip 3mu}w}, $$

is an error constant. Here $v$ and $w$ are the left and right eigenvectors, respectively, of the matrix polynomial $\rho(\mu)$ for the dominant eigenvalue $\mu=1$, that is,

$$ v{\mskip 3mu}\rho(1)=0,\quad v\ne0\qquad\quad\hbox{and}\qquad\quad \rho(1){\mskip 3mu}w=0,\quad w\ne0. $$

By choosing a particular norm and normalizing the vector $w$ accordingly, one can then also use the definite article.

In a natural and obvious way this generalizes the classical error constant of Henrici. Here too the denominator cannot vanish since, as shown below, this constant is equivalent to the constant given above. If the coefficients $A_i$ of the polynomial are not of the form $\alpha_i\otimes I$, then $\rho'(1)=\sigma(1)$ does not necessarily hold, as the following example shows.

4. Example: Carrying out BDF2 cyclically twice in succession leads to the matrix with the blocks

$$ \pmatrix{A_0, &A_1 &| &B_0, &B_1 \cr} $$

and the entries

$$ \left( \begin{array}{cccc|cccc} 1 & -4 & 3 & 0 & 0 & 0 & 2 & 0\cr 0 & 1 & -4 & 3 & 0 & 0 & 0 & 2\cr \end{array} \right). $$

Here $\rho(\mu)=A_0+A_1\mu$ and $\sigma(\mu)=B_1\mu$. Obviously $\rho'(1)=\sigma(1)$ does not hold now, since $\rho'(1)\equiv A_1\ne\sigma(1)\equiv B_1$, and this although every stage has the same order of consistency; indeed, all stages are identical.
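The blocks of this example can be written down and compared directly. The following sketch reads $A_0$, $A_1$, $B_0$, $B_1$ off the $2\times8$ array as consecutive $2\times2$ blocks (this reading of the array is an assumption of the sketch) and checks consistency together with $\rho'(1)\ne\sigma(1)$.

```python
# 2x2 blocks read off the array (A0, A1 | B0, B1) of the twice-repeated BDF2.
A0 = [[1, -4], [0, 1]]
A1 = [[3, 0], [-4, 3]]
B0 = [[0, 0], [0, 0]]
B1 = [[2, 0], [0, 2]]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matvec(X, w):
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

rho_at_1 = add(A0, A1)       # rho(1)   = A0 + A1
drho_at_1 = A1               # rho'(mu) = A1 for rho(mu) = A0 + A1*mu
sigma_at_1 = add(B0, B1)     # sigma(1) = B0 + B1

w = [1, 1]
print(matvec(rho_at_1, w))       # consistency: rho(1) w = 0
print(drho_at_1 == sigma_at_1)   # False: rho'(1) differs from sigma(1)
```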

It is now shown that all error constants given are equivalent. To make the considerations below more transparent, some properties of the companion matrix, among other things, are shown by means of a simple example.

5. Example: Let the polynomial

$$ \def\aa{\alpha_0} \def\ab{\alpha_1} \def\ac{\alpha_2} \rho(\mu)=\aa+\ab\mu+\ac\mu^2+\mu^3 $$

be given, and let $\rho(1)=0$. Let the coefficients of this polynomial come from an arbitrary ring, not necessarily commutative, where 1 denotes the identity element. Let the companion matrix of $\rho$ be

$$ C_1 = \pmatrix{ 0 & 1 & 0\cr 0 & 0 & 1\cr -\aa & -\ab & -\ac\cr}, \qquad\hbox{so}\qquad I-C_1 = \pmatrix{ 1 & -1 & 0\cr 0 & 1 & -1\cr \aa & \ab & 1+\ac\cr}. $$

Now

$$ v = (\ab+\ac+1, \ac+1, 1) $$

is a left eigenvector of the matrix polynomial $I\mu-C_1$ for $\mu=1$, because

$$ v(I-C_1) = \bigl( (\ab+\ac+1)+\aa,{\mskip 3mu}-(\ab+\ac+1)+(\ac+1)+\ab,{\mskip 3mu}-(\ac+1)+(\ac+1)\bigr) = (0,{\mskip 3mu}0,{\mskip 3mu}0), $$

since $\aa+\ab+\ac+1=0$ because of $\rho(1)=0$. It is also important to note that the sum of the components of the vector $v$ is exactly the derivative of the polynomial at 1, i.e.

$$ v\pmatrix{1\cr 1\cr 1\cr} = (\ab+\ac+1)+(\ac+1)+1 = 3+2\ac+1\ab = \rho'(1). $$

Because $\rho(1)=0$, $w^\top=(1,1,1)$ is of course a right eigenvector of the matrix $C_1$. The left eigenvector of $C_1\in\mathbb{C}^{s\times s}$ can of course also be obtained via the exterior product. One deletes a column from the matrix $(I-C_1)$, replaces this column successively, $s$ times, by the $i$-th unit vector, for $i=1,\ldots,s$, and computes the $s$ determinants, i.e. the components of the exterior product.
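A small numerical instance of this example, as a sketch: the coefficients below are hypothetical and chosen only so that $\rho(1)=0$ holds; scalar (commutative) coefficients are assumed for simplicity.

```python
# Hypothetical coefficients with alpha_0 + alpha_1 + alpha_2 + 1 = 0, i.e. rho(1) = 0.
a0, a1, a2 = 2, -5, 2

C1 = [[0, 1, 0],
      [0, 0, 1],
      [-a0, -a1, -a2]]
I_minus_C1 = [[1, -1, 0],
              [0, 1, -1],
              [a0, a1, 1 + a2]]

v = [a1 + a2 + 1, a2 + 1, 1]     # claimed left eigenvector of I*mu - C1 at mu = 1
w = [1, 1, 1]                    # claimed right eigenvector of C1

left_residual = [sum(v[i] * I_minus_C1[i][j] for i in range(3)) for j in range(3)]
C1_w = [sum(C1[i][j] * w[j] for j in range(3)) for i in range(3)]
drho_at_1 = 3 + 2 * a2 + a1      # rho'(1) for rho = a0 + a1*mu + a2*mu^2 + mu^3

print(left_residual)             # all components vanish
print(C1_w == w)                 # True: C1 w = w
print(sum(v) == drho_at_1)       # True: component sum of v equals rho'(1)
```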

Of interest in this context is the following relation between right eigenvectors and the companion matrix, see Schäfke/Schmidt (1973), p. 94.

Bibliographical note: Schäfke, Friedrich Wilhelm (1922--2010), Schmidt, Dieter (*1941).

6. Theorem: Let $C_1$ be the companion matrix of the monic polynomial $\rho$ of degree $n$. If $0\ne\mu\in\mathbb{C}$ is a zero of $\rho$ of exact order $r$, then the vector-valued function $w_0\colon\mathbb{C}\to\mathbb{C}^n$ defined by

$$ w_0(\mu) := \pmatrix{1\cr \mu\cr \vdots\cr \mu^{n-1}\cr} $$

with

$$ w_i(\mu) := {1\over i!}w_0^{(i)}(\mu), \qquad i=0,1,\ldots,r-1 $$

yields a system of right Jordan vectors for the eigenvalue $\mu$ of $C_1$, for which one has

$$ (C_1-\mu I)w_i = \cases{ 0, &for $i=0$,\cr w_{i-1}, &for $i=1,2,\ldots,r-1$.\cr } $$

Proof: One starts from the identity $(\lambda I-C_1)w_0(\lambda)=\rho(\lambda)e_n$, where $e_n=(0,\ldots,0,1)^\top\in\mathbb{C}^n$. Differentiating $i$ times yields

$$ \left(\lambda I-C_1\right) w_0^{(i)}(\lambda) = \rho^{(i)}(\lambda) e_n - i w_0^{(i-1)}(\lambda) $$

Substituting $\lambda=\mu$ for $i=0,1,\ldots,r-1$ and using $\rho^{(i)}(\mu)=0$ for these $i$ (algebraic multiplicity of $\mu$) shows that the $r$ linearly independent vectors $w_i$ are principal vectors of $C_1$ for $\mu$. ☐
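The Jordan chain can be verified on a concrete polynomial. The sketch below assumes the hypothetical example $\rho(\mu)=(\mu-2)^2(\mu+1)=\mu^3-3\mu^2+4$, which has the double zero $\mu=2$; the function names are illustrative. It uses ${1\over i!}{d^i\over d\mu^i}\mu^p={p\choose i}\mu^{p-i}$.

```python
from math import comb

def companion(alpha):
    """First companion matrix of mu^n + alpha[n-1]*mu^(n-1) + ... + alpha[0]."""
    n = len(alpha)
    C = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    C.append([-a for a in alpha])
    return C

def jordan_vector(mu, n, i):
    """w_i(mu) = (1/i!) * i-th derivative of (1, mu, ..., mu^(n-1))."""
    return [comb(p, i) * mu ** (p - i) if p >= i else 0 for p in range(n)]

def apply_shifted(C, mu, w):
    """Compute (C - mu*I) w."""
    n = len(w)
    return [sum(C[i][j] * w[j] for j in range(n)) - mu * w[i] for i in range(n)]

# Hypothetical example: rho(mu) = (mu - 2)^2 (mu + 1) = mu^3 - 3 mu^2 + 0 mu + 4.
alpha = [4, 0, -3]
C1 = companion(alpha)
mu = 2
w0 = jordan_vector(mu, 3, 0)
w1 = jordan_vector(mu, 3, 1)

print(apply_shifted(C1, mu, w0))         # zero vector: w0 is an eigenvector
print(apply_shifted(C1, mu, w1) == w0)   # True: (C1 - mu I) w1 = w0
```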

To show a certain equivalence of the error constants, one proceeds as follows. Under certain restrictions on the left and right eigenvectors one can in fact achieve equality. Proportionality, at least, is always guaranteed.

7. Theorem: Assumptions: Let

$$ \rho(\mu) = I\mu^\kappa + A_{\kappa-1}\mu^{\kappa-1} + \cdots + A_1\mu + A_0 \in \mathbb{C}^{\ell\times\ell}, $$

($\ell{\buildrel\land\over=}$ number of stages), and further let $C_1\in\mathbb{C}^{\ell\kappa\times\ell\kappa}$ be the first companion matrix of $\rho(\mu)$, i.e.

$$ C_1 = \pmatrix{ 0 & I & 0 & \ldots & 0\cr 0 & 0 & I & \ldots & 0\cr \vdots & \vdots & \vdots & \ddots & \vdots\cr & & & \ldots & I\cr -A_0 & -A_1 & & \ldots & -A_{\kappa-1}\cr } . $$

Let $v$ and $w$ be arbitrary but fixed left and right eigenvectors of $\rho(\mu)$ for $\mu=1$ and

$$ \eqalignno{ v_c &:= v{\mskip 3mu}(A_1+A_2+\cdots+I, A_2+\cdots+I, \ldots, I) \in \mathbb{C}^{\kappa\ell}, \cr w_c &:= \pmatrix{I\cr \vdots\cr I\cr} w \in \mathbb{C}^{\kappa\ell}, \qquad I=I_\ell\in\mathbb{C}^{\ell\times\ell} . \cr } $$

Let $\gamma\in\mathbb{C}^\ell$ be entirely arbitrary and $\gamma_c = (0,\ldots,0,\gamma)^\top \in \mathbb{C}^{\kappa\ell}$, ($\gamma,\gamma_c{\buildrel\land\over=}$ error vectors).

Claims: (1) $v_c$ and $w_c$ are left and right eigenvectors of $\rho_c(\mu) := I\mu - C_1$ for $\mu=1$.

(2) One has

$$ {v{\mskip 3mu}\gamma\over v{\mskip 3mu}\rho'(1){\mskip 3mu}w} = {v_c{\mskip 3mu}\gamma_c\over v_c{\mskip 3mu}w_c} = {v_c{\mskip 3mu}\gamma_c\over v_c{\mskip 3mu}\rho_c'(1){\mskip 3mu}w_c} . $$

Proof: (1): One quickly sees that indeed $v_c\rho_c(1)=0$ and $\rho_c(1)w_c=0$, with

$$ \rho_c(1) = I - C_1 = \pmatrix{ I & -I & 0 & \ldots & 0\cr 0 & I & -I & \ldots & 0\cr \vdots & \vdots & \vdots & \ddots & \vdots\cr & & & \ldots & -I\cr A_0 & A_1 & & \ldots & I+A_{\kappa-1}\cr } . $$

(2): It is shown that the numerators and the denominators agree, respectively. For the numerators this is immediately clear. For the denominators one easily verifies that

$$ v_c{\mskip 3mu}\rho_c'(1){\mskip 3mu}w_c = v_c{\mskip 3mu}I_{\kappa\ell\times\kappa\ell}{\mskip 3mu}w_c = v_c{\mskip 3mu}w_c = v{\mskip 3mu}\rho'(1){\mskip 3mu}w . $$

☐

The considerations carried out here hold analogously in arbitrary, not necessarily commutative rings. To this end one replaces $\mathbb{C}^\ell$ by $\mathbb{R}$. The above theorem justifies, in a certain sense, the following

8. Definition: The linear form $\gamma\mapsto v\gamma/v\rho'(1)w$ is called the Henrici linear form, with $v$, $w$ as above. In particular, for an error vector (a specific vector of $\mathbb{C}^\ell$) its value is then called the Henrici error constant.

Of course it is not claimed that $v\gamma/v\rho'(1)w = v_c\gamma_c/v_c\rho_c'(1)w_c$ for arbitrary left and right eigenvectors $v$, $w$, resp. $v_c$, $w_c$. This is seen immediately if one stretches or shrinks one of the vectors arbitrarily. For the numerator, certain subspace properties of $\gamma_c$, namely $\gamma_c=(0,\ldots,0,*)^\top$, were significant.

The further generalizations then lead directly to the notions of annihilated dominance and total annihilation. To show further the connection with the classical error constant of Henrici, the following fact is pointed out.

If one and the same method is repeated $m$ times, the error constant given above is multiplied by $m$. This result follows at once when one recognizes that

$$ (1,\ldots,1) \in \mathbb{C}^{1\times m} $$

is a left eigenvector of $\rho(1)$. Then the numerator is $(1,\ldots,1){\mskip 3mu}\gamma$, and since all components of $\gamma$ are of course equal, one immediately obtains the desired result, once one also knows that the denominator stays the same whatever the dimension. Here again, as already remarked above, these results hold in arbitrary rings, not necessarily commutative.
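For $m=2$ the statement can be illustrated with BDF2. The sketch assumes a hypothetical error factor `g` of a single BDF2 step in the normalization $y_n-4y_{n+1}+3y_{n+2}=2hf_{n+2}$; only the ratio of the two constants matters here, so the actual value of `g` is irrelevant.

```python
from fractions import Fraction as F

g = F(1)   # hypothetical error factor of one BDF2 step (its value does not matter)

# Single BDF2: rho(mu) = 1 - 4 mu + 3 mu^2, hence rho'(1) = -4 + 2*3.
C_single = g / (-4 + 2 * 3)

# Two cyclic BDF2 stages: rho(mu) = A0 + A1*mu, rho'(1) = A1, v = w = (1, 1).
A1 = [[3, 0], [-4, 3]]
v, w = [1, 1], [1, 1]
gamma = [g, g]             # both stages are identical, so both error factors agree

numerator = sum(vi * gi for vi, gi in zip(v, gamma))
A1_w = [sum(a * wj for a, wj in zip(row, w)) for row in A1]
denominator = sum(vi * x for vi, x in zip(v, A1_w))
C_cyclic = numerator / denominator

print(C_cyclic == 2 * C_single)   # True: the constant doubles for m = 2
```

The denominator equals 2 in both cases, so only the numerator picks up the factor $m$.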

Stability Functionals and Semistability Functionals

Tue, 18 Jun 2024 16:30:00 +0200

1. Semistability functionals in matrix representation
2. Remarks on Spijker's stability functional

1. Semistability functionals in matrix representation

With the exception of Boolean algebra, no theory in mathematics is used more universally than linear algebra. There is hardly any theory that is more elementary, despite the fact that generations of professors and textbook authors have obscured the simplicity of this theory by highly inappropriate calculations with matrices.

Jean Alexandre Dieudonné (1960)

One can prove, cf. e.g. the book of Albrecht (1979) or the paper of Skeel (1976), that the norm $\|\mathbb{P}\delta\|$ is a stability functional for methods of the general form

$$ z_{n+1} = Sz_n + h\varphi_n, \qquad z_i\in\mathbb{R}^s,\quad\hbox{and}\quad S\in\mathbb{R}^{s\times s}. $$

One thus has error estimates of the form

$$ c_1\|\mathbb{P}\delta\| \le \|Z-z\| \le c_2\|\mathbb{P}\delta\|. $$

Here $Z=(Z_1,\ldots,Z_n)$ is the corresponding vector of exact solutions and $z=(z_1,\ldots,z_n)$ the approximation to it obtained by the above scheme.

This two-sided estimate is of particular importance in that it makes immediately clear that the computed approximation cannot move arbitrarily far from the exact solution if the quantity $\delta$ is kept “small”. It is of course important that the left constant $c_1$ must not vanish, i.e. $c_1\ne0$, and that the right constant $c_2$ is not too “large”, i.e. $c_2<\infty$. Furthermore, neither of the constants $c_1$ and $c_2$ may itself depend on the quantity $\delta$.

The above scheme $z_{n+1}=Sz_n+h\varphi_n$ is quite general. Here it fully suffices to consider the scheme

$$ \sum_{i=0}^\kappa A_iz_{n+i} = h\sum_{i=0}^\kappa B_iF_{n+i}, \qquad n=0,1,\ldots, $$

with the matrices $A_i,B_i\in\mathbb{R}^{s\times s}$ and corresponding vectors $z_{n+i},F_{n+i}\in\mathbb{R}^s$.

It is immediately obvious that the above scheme is of course contained in the scheme $z_{n+1}=Sz_n+h\varphi_n$. Naturally the control function also depends on the time $t_n$, possibly on further quantities as well. All these parameters are suppressed in the notation.

The general stability functional $\psi$, for which

$$ c_1\psi(\delta_n) \le \|Y_n-y_n\| \le c_2\psi(\delta_n), $$

need not be a norm.

So the conditions of definiteness and homogeneity and the triangle inequality need not be satisfied. The functional $\psi$ of course depends on numerous quantities, so it is a function of several variables. In the notation that follows, however, these dependencies are not all listed separately. Taking the arguments into account, one would have to write

$$ \psi = \psi(n,h,A_i,\delta), $$

where the matrices $A_i$ depend on the coefficients $\alpha_{ij}$ of the method and the vector $\delta$ depends on the matrix $S$ and the matrices $B_i$.

For the following discrete version of the control equation

$$ z_{n+1}=Az_n+u_n $$

the stability behaviour is now considered further. Here $z_n$ is the previous state, $z_{n+1}$ the successor state, and $u_n$ the control. Only the internal state is of interest now, not the output of the above system. Often the output is of the form $x_n=Bz_n$.

One now obtains successively

$$ \eqalign { z_1 &= Az_0+u_0, \cr z_2 &= Az_1+u_1 = A^2z_0+Au_0+u_1, \cr z_3 &= Az_2+u_2 = A^3z_0+A^2u_0+Au_1+u_2, \cr \vdots & \qquad \qquad \vdots \qquad \qquad \qquad \qquad \ddots\cr z_k &= Az_{k-1}+u_{k-1} = A^kz_0+A^{k-1}u_0+\cdots+Au_{k-2}+A^0u_{k-1}. } $$

Written in matrix-vector notation, this gives

$$ \pmatrix{z_1\cr z_2\cr z_3\cr \vdots\cr z_k\cr} = \left( \begin{array}{c|ccccc} A & A^0 & & & & \cr A^2 & A^1 & A^0 & & 0 & \cr A^3 & A^2 & A^1 & A^0 & & \cr \vdots & \vdots & \vdots & \vdots & \ddots & \cr A^k & A^{k-1} & A^{k-2} & A^{k-3} & \ldots & A^0\cr \end{array} \right) \pmatrix{z_0\cr u_0\cr u_1\cr \vdots\cr u_{k-1}\cr} = \pmatrix{A\cr A^2\cr A^3\cr \vdots\cr A^k\cr} z_0 + \mathbb{P} \pmatrix{u_0\cr u_1\cr \vdots\cr u_{k-1}\cr}, $$

where the block Toeplitz triangular matrix $\mathbb{P}$ appears, with

$$ \mathbb{P} = \pmatrix{ A^0 & & & & \cr A^1 & A^0 & & 0 & \cr A^2 & A^1 & A^0 & & \cr \vdots & \vdots & \vdots & \ddots & \cr A^{k-1} & A^{k-2} & A^{k-3} & \ldots & A^0\cr } . $$

Otto Toeplitz (1881--1940).

It is important to note that this block triangular matrix $\mathbb{P}$ depends on the iteration step $k$; in particular, the dimension of the matrix grows with increasing $k$; one has $\mathbb{P}\in\mathbb{R}^{ks\times ks}$.
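The representation $Z = Tz_0 + \mathbb{P}u$ can be compared with the plain recursion. The following sketch assumes a hypothetical $2\times2$ matrix `A`, initial state `z0` and controls `us`; the helper names are made up for this illustration. It evaluates block row $i$ as $z_i = A^iz_0+\sum_{j<i}A^{i-1-j}u_j$ without ever forming $\mathbb{P}$ explicitly.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(X, v):
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def simulate(A, z0, us):
    """Run the recursion z_{n+1} = A z_n + u_n and return [z_1, ..., z_k]."""
    zs, z = [], z0
    for u in us:
        z = vadd(matvec(A, z), u)
        zs.append(z)
    return zs

def via_block_rows(A, z0, us):
    """Evaluate z_i = A^i z0 + sum_{j<i} A^(i-1-j) u_j, i.e. block row i of (T | P)."""
    k, s = len(us), len(z0)
    powers = [[[int(i == j) for j in range(s)] for i in range(s)]]   # A^0 = I
    for _ in range(k):
        powers.append(matmul(A, powers[-1]))
    zs = []
    for i in range(1, k + 1):
        z = matvec(powers[i], z0)                 # contribution of T
        for j in range(i):                        # contribution of P
            z = vadd(z, matvec(powers[i - 1 - j], us[j]))
        zs.append(z)
    return zs

# Hypothetical data, chosen only for illustration.
A = [[0, 1], [-1, 0]]
z0 = [1, 2]
us = [[1, 0], [0, 1], [2, -1]]
print(simulate(A, z0, us) == via_block_rows(A, z0, us))   # True
```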

One sees how the old controls $u_0,u_1,\ldots,u_{k-1}$ continue to act, namely through the matrix powers

$$ A^k, A^{k-1},\ldots, A, A^0. $$

The considerations hold analogously if the matrix $A$ itself is allowed to depend on the index $n$. This means that the state transition of the system can change every time. One then has the control equation

$$ z_{n+1} = A_nz_n+u_n. $$

Here one obtains, exactly as above, successively and starting from the initial state $z_0$:

$$ \eqalign{ z_1 &= A_0z_0+u_0, \cr z_2 &= A_1A_0z_0+A_1u_0+u_1, \cr z_3 &= A_2A_1A_0z_0+A_2A_1u_0+A_2u_1+u_2, \cr \vdots & \qquad \vdots \qquad \qquad \qquad \ddots \cr z_k &= A_{k-1}z_{k-1}+u_{k-1} = A_{k-1}\cdots A_0z_0+A_{k-1}\cdots A_1u_0 +\ldots+A_{k-1}u_{k-2}+u_{k-1}.\cr } $$

Again in matrix-vector notation this gives

$$ \pmatrix{z_1\cr z_2\cr z_3\cr \vdots\cr z_k\cr} = \left( \begin{array}{c|ccccc} A_0 & I & & & & \cr A_1A_0 & A_1 & I & & 0 & \cr A_2A_1A_0 & A_2A_1 & A_2 & I & & \cr \vdots & \vdots & \vdots & \vdots & \ddots & \cr A_{k-1}\cdots A_0 & A_{k-1}\cdots A_1 & A_{k-1}\cdots A_2 & A_{k-1}\cdots A_3 & \ldots & I\cr \end{array} \right) \pmatrix{z_0\cr u_0\cr u_1\cr \vdots\cr u_{k-1}\cr}. $$

Here too one sees the influence of past controls $u_0,u_1,\ldots,u_{k-1}$ on the new state $z_k$, now as matrix products (in contrast to the matrix powers), namely

$$ (A_{k-1}\cdots A_0), (A_{k-1}\cdots A_1), \ldots, (A_{k-1}), I. $$

If one now interprets the controls $u_i$ as perturbations $\delta_i$ of the method $z_{n+1}=Az_n+h\varphi_n$ already given above, i.e. if one examines the modified control equation

$$ \tilde z_{n+1} = A\tilde z_n + h\tilde\varphi_n + \delta_n, $$

one sees how these perturbations accumulate and “add up”. Decisive here is again the block Toeplitz triangular matrix $\mathbb{P}$, with

$$ \mathbb{P} = \pmatrix{ A^0 & & & & \cr A^1 & A^0 & & 0 & \cr A^2 & A^1 & A^0 & & \cr \vdots & \vdots & \vdots & \ddots & \cr A^{k-1} & A^{k-2} & A^{k-3} & \ldots & A^0\cr } , $$

or, for the more general case, the block triangular matrix $\mathbb{P}$, not necessarily a Toeplitz matrix, has the form

$$ \mathbb{P} = \pmatrix{ I & & & & \cr A_1 & I & & 0 & \cr A_2A_1 & A_2 & I & & \cr \vdots & \vdots & \vdots & \ddots & \cr A_{k-1}\cdots A_1 & A_{k-1}\cdots A_2 & A_{k-1}\cdots A_3 & \ldots & I\cr }, $$

both of which depend on the iteration step $k$, so as above $\mathbb{P}\in\mathbb{R}^{ks\times ks}$. Because of $-Az_n+z_{n+1}={}*$, ($n=0,1,\ldots$), the matrix $\mathbb{P}$ is obviously the inverse of the matrix

$$ \mathbb{P}^{-1} = \pmatrix{ I & & &\llap0\cr -A & I & & \cr & -A & I & \cr &\ddots&\ddots& \cr 0 & & -A & I\cr} \qquad\hbox{resp.}\qquad \mathbb{P}^{-1} = \pmatrix{ I & & &\llap0\cr -A_1 & I & & \cr & -A_2 & I & \cr &\ddots &\ddots & \cr 0 & & -A_{k-1} & I\cr} $$
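That the bidiagonal matrix really inverts $\mathbb{P}$ can be checked in the scalar case $s=1$. The sketch below assumes hypothetical scalars $a_1,\ldots,a_{k-1}$ in place of the blocks $A_1,\ldots,A_{k-1}$; for genuine blocks the order of the factors in the products would have to be respected.

```python
def build_P(a):
    """a = [a_1, ..., a_{k-1}]; entry (i, j) with i >= j is a_i * a_{i-1} * ... * a_{j+1}."""
    k = len(a) + 1
    P = [[0] * k for _ in range(k)]
    for j in range(k):
        P[j][j] = 1
        for i in range(j + 1, k):
            P[i][j] = a[i - 1] * P[i - 1][j]    # a[i - 1] holds a_i
    return P

def build_P_inverse(a):
    """Lower bidiagonal matrix with ones on the diagonal and -a_i below it."""
    k = len(a) + 1
    Q = [[int(i == j) for j in range(k)] for i in range(k)]
    for i in range(1, k):
        Q[i][i - 1] = -a[i - 1]
    return Q

def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

a = [2, -3, 5]                       # hypothetical scalars a_1, a_2, a_3, so k = 4
P, Q = build_P(a), build_P_inverse(a)
identity = [[int(i == j) for j in range(4)] for i in range(4)]
print(matmul(Q, P) == identity and matmul(P, Q) == identity)   # True
```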

The cumulative effect of the controls, resp. of the perturbations, now depends on $\mathbb{P}\delta$, with

$$ \delta = \pmatrix{\delta_0\cr \vdots\cr \delta_{k-1}\cr}. $$

If one considered the linear homogeneous system of equations $\mathbb{P}\delta=0$ and solved for the $\delta_i$, one would obtain the result that $\delta_0,\ldots,\delta_{k-1}$ is exactly the Jordan chain of length $k$ with respect to $\lambda_0$ for the matrix polynomial $L(\lambda)$, apart from the subtlety that one may possibly strike out the first $i$ zero vectors $\delta_0,\ldots,\delta_{i-1}$. Bibliographical note: Keldysh, M.V., Jordan, Camille (1838--1921).

Here the matrix polynomial $L(\lambda)$ is given by

$$ L(\lambda) = \sum_{i=0}^{k-1} A^i(\lambda-\lambda_0)^i , $$

see Gohberg/Lancaster/Rodman (1982). The authors are Gohberg, Izrael' TSudikovich, Lancaster, Peter and Rodman, Leiba.

For the more general case in which a new matrix $A_n$ is considered in every state, i.e. $z_{n+1}=A_nz_n+u_n$, the matrix polynomial $L(\lambda)$ becomes

$$ L(\lambda) = \sum_{i=0}^{k-1}{\mskip 3mu} (\prod_{j=0}^i A_j) \cdot(\lambda-\lambda_0)^i. $$

One sees at once that the spectrum $\sigma(\mathbb{P})$ always satisfies $\sigma(\mathbb{P})=\{1\}$, independently of the matrices $A_0,\ldots,A_k$.

In particular, the block triangular matrix $\mathbb{P}$ is invertible and hence $\left\Vert\mathbb{P}{}\cdot{}\right\Vert$, for fixed $k$, is a norm, since quite generally, for every vector norm $\left\Vert\cdot\right\Vert$ and an arbitrary invertible matrix $P$, $\left\Vert Px\right\Vert$ is again a norm. Invertibility enters for definiteness, and linearity is needed for homogeneity and the triangle inequality.

The further result that, for the associated matrix norm $\left\Vert A\right\Vert$, $\left\Vert PAP^{-1}\right\Vert$ is the matrix norm associated with $\left\Vert Px\right\Vert$ can easily be proved. This result is nevertheless not used further here. Thus one has obtained without effort the statement that the stability functional $\psi(\delta)=\Vert\mathbb{P}\delta\Vert$ is indeed a norm.

If one now wants to arrive at an estimate for $\Vert z_{n+1}-\tilde z_{n+1}\Vert$ and notes that an explicit representation of the solutions is available, one first obtains for $z_{n+1}=Az_n+u_n$ the representation

$$ Z_{n+1} = \pmatrix{A\cr \vdots\cr A^{n+1}\cr} z_0 + \mathbb{P}\pmatrix{u_0\cr \vdots\cr u_n\cr} =: Tz_0+\mathbb{P} u. $$

For the perturbed system $\tilde z_{n+1} = A\tilde z_n+v_n$ with the “modified” controls $v_i$ one obtains the representation

$$ \tilde Z_{n+1} = \pmatrix{A\cr \vdots\cr A^{n+1}\cr} \tilde z_0 + \mathbb{P}\pmatrix{v_0\cr \vdots\cr v_n\cr} =: T\tilde z_0+\mathbb{P} v. $$

Here again the individual vectors $z_i$, resp. $\tilde z_i$, are combined into a larger vector $Z_{n+1}$, resp. $\tilde Z_{n+1}$. That is,

$$ Z_{n+1} = \pmatrix{z_1\cr \vdots\cr z_{n+1}\cr} \qquad\hbox{and}\qquad \tilde Z_{n+1} = \pmatrix{\tilde z_1\cr \vdots\cr \tilde z_{n+1}\cr}. $$

The difference of the two representations given above now leads directly to

$$ \|Z_{n+1}-\tilde Z_{n+1}\| = \left\|T(z_0-\tilde z_0)+\mathbb{P}(u-v)\right\| \le \left\|T\right\|{\mskip 3mu} \left\|z_0-\tilde z_0\right\| + \left\|\mathbb{P}\right\|{\mskip 3mu} \left\|u-v\right\|. $$

Since the matrices $T$ and $\mathbb{P}$ depend on the iteration step $k$, restrictions must be imposed on the components of these two matrices. The requirement is now imposed on the matrix powers $A^i$ or on the products $A_k\cdots A_{k-i}$ that their norms be bounded for all $i$ and all $k$. That is, it shall hold that

$$ \|A^i\| \le \hbox{const}, \quad\forall i\in\mathbb{N}_0; $$

or more generally

$$ \left\|A_k\cdots A_{k-i}\right\| \le \hbox{const}, \quad\forall i\lt k,\forall k. $$

In view of the structure of the block triangular matrix $\mathbb{P}$ given above, these two requirements are plainly sensible restrictions, since the above matrix powers, resp. matrix products, make up the components of the block triangular matrix $\mathbb{P}$. The first condition then leads at once to the corresponding condition on the eigenvalues of the matrix $A$. The second condition is more delicate.

It now turns out that these two requirements suffice, so that the norms of $\mathbb{P}$ and $T$ do not grow too strongly despite increasing $k$. Note carefully that the norms change with increasing $k$. The otherwise rather trivial statement that the norm of a fixed, arbitrary matrix is always bounded does not apply here.

Rather, one has $\left\|\mathbb{P}\right\|={\cal O}(k)$, or in another formulation,

$$ {1\over k}\left\|\mathbb{P}\right\| \le \hbox{const}, \qquad\forall k\in\mathbb{N}_0. $$

The proof is carried out only for the maximum norm $\left\|\cdot\right\|_\infty$. Strictly speaking one would always have to write supremum norm, but this subtlety is disregarded from now on. Since $\|A^j\|\le c$ for all $j\in\mathbb{N}_0$, one obtains

$$ \left\|\mathbb{P}\right\|_\infty \le\max_{i=0}^{k-1}{\mskip 3mu} \sum_{j=0}^{k-1} c = kc = {\cal O}(k). $$
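The linear growth bound can be observed in the scalar case $s=1$, where $\mathbb{P}$ has the entries $a^{i-j}$ for $i\ge j$. A minimal sketch, assuming hypothetical scalars `a` with $|a|\le1$ (so that $c=1$); the function name is illustrative.

```python
from fractions import Fraction

def inf_norm_P(a, k):
    """Maximum absolute row sum of the k x k lower triangular matrix with entries a^(i-j)."""
    return max(sum(abs(a) ** (i - j) for j in range(i + 1)) for i in range(k))

# |a| = 1: the powers stay bounded by c = 1 and the bound k*c is attained.
print([inf_norm_P(-1, k) for k in (1, 5, 10)])

# |a| < 1: the row sums are partial geometric sums and stay below 1/(1 - |a|).
print(inf_norm_P(Fraction(1, 2), 50) < 2)
```

The first case shows that ${\cal O}(k)$ cannot be improved in general when only boundedness of the powers is assumed.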

For the 1-norm $\left\Vert\cdot\right\Vert_1$ this result follows quite analogously. This is due to the special form of the matrix $\mathbb{P}$. For the general case the considerations are similar.

The fact that $\left\Vert T\right\Vert = {\cal O}(k)$ is immediately obvious both for the maximum norm $\left\Vert\cdot\right\Vert_\infty$ and for the 1-norm $\left\Vert\cdot\right\Vert_1$.

If one now considers again the two methods $z_{n+1}=Sz_n+h\varphi_n$ and $\tilde z_{n+1}=S\tilde z_n+h\tilde\varphi_n+\delta_n$, one immediately obtains, in the usual vector notation for the $\varphi_i$, $\delta_i$ and $z_i$,

$$ Z_{n+1}-\tilde Z_{n+1} = T(z_0-\tilde z_0) + h\mathbb{P}(\varphi-\tilde\varphi) - \mathbb{P}\delta $$

and then with the standard estimate

$$ \|Z_{n+1}-\tilde Z_{n+1}\| \le \left\|T\right\| {\mskip 3mu} \left\|z_0-\tilde z_0\right\| + \left|h\right| {\mskip 3mu} \left\|\mathbb{P}\right\| {\mskip 3mu} \left\|\varphi-\tilde\varphi\right\| + \left\|\mathbb{P}\right\| {\mskip 3mu} \left\|\delta\right\|. \tag{*} $$

If one now assumes only boundedness of the functions $\varphi_n$, and hence of $\varphi$, one immediately obtains the result that the two states cannot move arbitrarily far apart, provided only that $k \left\|\delta\right\| \le \hbox{const}$ and

$$ \left\|z_0-\tilde z_0\right\| = {\cal O}(\frac{1}{k}). $$

Diese beiden Bedingungen sind auch tatsächlich häufig gegeben. Die Beschränktheit des mittleren Summanden in der obigen Abschätzung $(*)$ ist wegen des Vorfaktors von $h$ offensichtlich, da dieser dann das ${\cal O}(k)$-Wachstum der Norm $\left|\mathbb{P}\right|$ auffängt. Die Beschränktheit von der Funktion $\varphi$ ist z.B. dann gegeben, wenn man weiß, daß diese Funktion Lipschitz-stetig ist. Auf einer kompakten Definitionsmenge — sagen wir $[a,b]\times K\times J\subset\!\subset\mathbb{R}^{2s+1}$ — folgt dann sofort die Beschränktheit von $\varphi$.

A more careful analysis must of course consider the perturbed equation $\tilde z_{n+1}=S\tilde z_n+\tilde h\tilde\varphi_n+\delta_n$, because a perturbation also changes the step size sequence $h_0,h_1,\ldots{\mskip 3mu}$. Factoring out the step size $h$ presupposes equal step sizes in both methods. One can try to shift this difference into the function $\tilde\varphi$; the resulting estimates then require somewhat more care. Note that only boundedness of $\|Z_{n+1}-\tilde Z_{n+1}\|$ follows here. The standard estimate above does not yield the stronger result that the norm difference $\|Z_{n+1}-\tilde Z_{n+1}\|$ becomes small when the norm of $\delta$ is made sufficiently small.

Man beachte, daß hier nur ein Semistabilitätsfunktional vorliegt mit der obigen Abschätzung $(*)$. Das Stabilitätsfunktional $\psi(\delta) = \left|\mathbb{P}\delta\right|$ geht hier additiv ein.

Bei einer Abschätzung, wie sie z.B. bei Skeel (1976), oder in dem Buche von Albrecht (1979), und auch in dem Buche von Hairer/Wanner/Nørsett (1987) beschrieben wird, geht dieses Funktional direkt multiplikativ in die Abschätzung der Form $(*)$ ein. Man erhält dann natürlich weitergehende Resultate. Allerdings wachsen die Faktoren vor dem Funktional exponentiell mit der Länge des Integrationsintervalles und ebenso exponentiell in der Lipschitzkonstanten. Insbesondere läßt sich auch der Abstand zweier Zustände verkleinern, falls man die Störung hinreichend stark verkleinert.

Bibliographisch: Hairer, Ernst (*1949), Wanner, Gerhard (*1942), Nørsett, Syvert Paul.

2. Bemerkungen zum Spijkerschen Stabilitätsfunktional

Nebenläufig sei auf die völlige Analogie der Lösungen von diskreter und kontinuierlicher Zustandsgleichung hingewiesen.

Das lineare und inhomogene Differentialgleichungs-Anfangswertproblem

$$ \dot x = A(t)x+u(t), \qquad x(t_0)=x_0, $$

hat die eindeutig bestimmte Lösung

$$ x(t) = H(t)x_0+\int_{t_0}^t H(t)H^{-1}(\tau)u(\tau)d\tau, $$

wobei $H=H(t)$ das (eindeutig bestimmte) Fundamentalsystem der homogenen Gleichung $\dot x=A(t)x$ ist, mit $H(t_0)=I$. Die Spezialisierung auf die lineare und inhomogene Anfangswertaufgabe mit konstanten Koeffizienten

$$ \dot x=Ax+u, \qquad x(t_0)=x_0, $$

hat demnach die (eindeutig bestimmte) Lösung, die sogar auf der gesamten reellen Achse existiert, falls die Inhomogenität $u(t)$ ebenso existiert,

$$ x(t) = e^{A(t-t_0)}x_0 + \int_{t_0}^t e^{A(t-\tau)}u(\tau)d\tau. $$

Für die diskrete Gleichung $z_{n+1}=Az_n+u_n$ erhält man nach Vorgabe des Anfangszustandes $z_0$ die eindeutig bestimmte Lösung

$$ z_n = A^nz_0+\sum_{\nu=0}^{n-1} A^{n-1-\nu}u_\nu. $$
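The closed-form solution of the discrete equation can be checked against the recursion itself. The following is a minimal sketch in plain Python; the matrix `A`, the inputs `u(nu)`, the initial state `z0`, and all helper names are made-up illustration values and do not come from the text.

```python
def mat_vec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Matrix-matrix product for nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def mat_pow(A, k):
    """A**k by repeated multiplication; k = 0 gives the identity."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = mat_mul(P, A)
    return P


def solve_by_recursion(A, z0, u, n):
    """Apply z_{nu+1} = A z_nu + u_nu exactly n times."""
    z = list(z0)
    for nu in range(n):
        z = [a + b for a, b in zip(mat_vec(A, z), u(nu))]
    return z


def solve_by_formula(A, z0, u, n):
    """Evaluate z_n = A^n z_0 + sum_{nu=0}^{n-1} A^{n-1-nu} u_nu."""
    z = mat_vec(mat_pow(A, n), z0)
    for nu in range(n):
        term = mat_vec(mat_pow(A, n - 1 - nu), u(nu))
        z = [a + b for a, b in zip(z, term)]
    return z


# Hypothetical integer data, so that both results can be compared exactly.
A = [[1, 1], [0, 2]]
z0 = [1, 0]
u = lambda nu: [nu, 1]

print(solve_by_recursion(A, z0, u, 6) == solve_by_formula(A, z0, u, 6))
```

Integer data is used on purpose: both computations are then exact, and equality can be tested without a tolerance.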

Zwischen den beiden Problemen $\dot x=Ax+Bz$ und $x_{k+1}=Sx_k+Rz_k$ kann man durch den Homomorphismus

$$ S=e^A,\qquad R=\left(\int_0^1 e^{A\tau}d\tau\right)B $$

stets vermitteln.

Eine weitere Analogie hat man wie folgt. Gilt

$$ \eqalign{ \left|y(t_0) - \tilde y(t_0)\right| &\le\rho,\cr \left|\dot{\tilde y}(t) - f(\tilde y(t))\right| &\le\delta(t),\cr |f(\tilde y(t)) - f(y(t))| &\le\ell(t) \left|\tilde y(t) - y(t)\right|,\cr } $$

so erhält man die Abschätzung

$$ |\tilde y(t) - y(t)| \le e^{L(t)}{\mskip 3mu}\biggl(\rho + \int_{t_0}^t e^{-L(s)}{\mskip 3mu}\delta(s) ds\biggr), $$

mit

$$ L(s) = \int_{t_0}^s \ell(\tau) d\tau. $$

Der notationellen Einfachheit halber sei angenommen, daß $t\ge t_0$ ist — dies erspart Betragszeichen.

Spezialisiert man auf konstante $\delta(t)$ und $\ell(t)$, also $\delta(t)\equiv\delta$ und $\ell(t)\equiv L$, so erhält man die bekannte Abschätzung

$$ |\tilde y(t) - y(t)| \le \bigl(\rho + (t-t_0)\delta\bigr) e^{L\left|t-t_0\right|}, $$

which states how far the solutions of one and the same differential equation can drift apart when the initial values differ. In the worst case one must expect exponential growth; the inequality is sharp.
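As a worked step, here is one way to obtain the constant-coefficient bound from the general estimate. The sketch assumes $t\ge t_0$ and a constant $L>0$, and writes $L(s)=L\cdot(s-t_0)$ for the integrated Lipschitz function. Inserting $\delta(s)\equiv\delta$ gives

$$ e^{L(t-t_0)}\biggl(\rho + \delta\int_{t_0}^t e^{-L(s-t_0)} ds\biggr) = \rho{\mskip 3mu}e^{L(t-t_0)} + {\delta\over L}\bigl(e^{L(t-t_0)}-1\bigr). $$

Since $e^x-1\le x{\mskip 3mu}e^x$ for $x\ge0$, taking $x=L(t-t_0)$ yields

$$ {\delta\over L}\bigl(e^{L(t-t_0)}-1\bigr) \le (t-t_0){\mskip 3mu}\delta{\mskip 3mu}e^{L(t-t_0)}, $$

and the two terms together are bounded by $\bigl(\rho+(t-t_0)\delta\bigr)e^{L(t-t_0)}$.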

Spezialisiert man lediglich $\ell(t)\equiv L$, so ergibt sich

$$ |\tilde y(t) - y(t)| \le \biggl(\rho + \int_{t_0}^t \delta(\tau) d\tau \biggr) e^{L|t-t_0|}. $$

Die letzte Abschätzung weist schon formal auf den engen Zusammenhang zum Spijkerschen Stabilitätsfunktional hin. Direkter wird dieser Zusammenhang im Falle der folgenden Überlegungen.

Hat man

$$ \eqalign{ \dot{\tilde y} &= f(\tilde y) + d_2(t),\cr \tilde y(t_0) &= y_0 + d_1,\cr } $$

mit den beiden Defekten $d_1, d_2(t)\in\mathbb{R}$, so erhält man

$$ |\tilde y(t) - y(t)| \le e^{L\left|t-t_0\right|}{\mskip 3mu} \max_{\tau\in[t_0,t]}{\mskip 3mu} \biggl|d_1 + \int_{t_0}^\tau d_2(s) ds\biggr|. $$

Man beachte, daß sich die letzte Defektabschätzung nicht durch Spezialisierung aus der obigen allgemeinen Abschätzung herleiten lässt. Dennoch sind die Beweise für beide Aussagen natürlich ähnlich. Ebenso ist gut zwischen $d_2(t)$ aus dem Banachraum $\mathbb{R}$ und der nicht-negativen skalaren Größe $\delta(t)$ zu unterscheiden; Banach, Stefan (1892--1945). Entsprechend sind die Integrale zu verstehen.

Das Spijkersche Stabilitätsfunktional lautet hier

$$ \psi_{\hbox{Sp}}(\delta) = \max_{n=0}^N{\mskip 3mu} \biggl|\sum_{i=0}^n \delta_i\biggr|, $$

in Abweichung der Notation von Albrecht (1979), wegen der veränderten Schreibweise der $\delta_i$. Die zum Stabilitätsfunktional gehörende Matrix $\mathbb{P}$ ist natürlich

$$ \mathbb{P} = \pmatrix{ I & & & & \cr I & I & & 0 & \cr I & I & I & & \cr \vdots & \vdots & \vdots & \ddots & \cr I & I & I & \ldots & I\cr } . $$
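The Spijker functional is the largest norm among the partial sums of the perturbations, which is what multiplying by the lower triangular matrix of identity blocks computes. A minimal sketch for scalar perturbations follows; the function names and the sample data are illustrative assumptions and do not come from Albrecht (1979).

```python
from itertools import accumulate


def spijker_functional(deltas):
    """max over n of |delta_0 + ... + delta_n| for scalar perturbations."""
    return max(abs(s) for s in accumulate(deltas))


def sum_of_norms(deltas):
    """Plain sum of |delta_i|, shown only for comparison."""
    return sum(abs(d) for d in deltas)


# Hypothetical perturbations with alternating sign: they cancel in the
# partial sums, so the functional stays small while the sum of norms grows.
alternating = [1, -1] * 50
same_sign = [1] * 100

print(spijker_functional(alternating), sum_of_norms(alternating))
print(spijker_functional(same_sign), sum_of_norms(same_sign))
```

The two data sets show that the functional accounts for cancellation: for perturbations of equal sign it coincides with the sum of the norms, for alternating signs it is far smaller.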

Zu den Abschätzungen vergleiche man die Bücher von Schäfke/Schmidt (1973) und Hairer/Wanner/Nørsett (1987). Dort findet man auch Hinweise auf weiterführende Literatur und schwächere Voraussetzungen bei den Behauptungen.

Bibliographisch: Schäfke, Friedrich Wilhelm (1922--2010), Schmidt, Dieter (*1941).

Divergenz der Korrektoriteration: Theorie und Experimente

Mon, 17 Jun 2024 20:00:00 +0200

1. Das modifizierte Newton-Verfahren und Spezialisierungen
2. Die Divergenzsätze von Hughes Hallett
3. Die Experimente von Byrne/Hindmarsh/Jackson/Brown

One of the central components of a program based on methods with implicit stages is the resolution of the implicitness by an iterative method for solving systems of equations. These systems are usually nonlinear. For stiff differential equations, with linear multistep formulas as well as with implicit Runge-Kutta methods, the effort spent on this is enormous and accounts for the dominant share of the total computing time. However, one is not so much interested in the exact solution of the many nonlinear systems that arise as in a speedy integration of the differential equation. The systems of equations are merely a means to that end. It turns out that the Newton-Kantorovich iteration employed does not always converge. This divergence often goes unnoticed and shows up only in larger global errors. Since the switching decision in the program TENDLER, as in other programs, is made during the corrector iteration, this point deserves particular attention in programs capable of switching.

First, the Newton iteration is related to well-known iterative methods for linear systems, and vice versa. The results of Hughes Hallett (1984) then give a simple answer to the question of whether certain modifications of the Newton iteration can converge at all. The experience of Byrne/Hindmarsh/Jackson/Brown (1977) shows on practical examples that the corrector iteration does have a substantial influence on the accuracy of the method as well, not only on its efficiency.

1. Das modifizierte Newton-Verfahren und Spezialisierungen

Here the most important types of iteration for solving linear systems of equations are briefly written down. They will become important later, in an attempt to explain conspicuous behavior of the corrector iteration. The divergence theorems of Hughes Hallett (1984) are one possible starting point for explaining the very large global error of the two programs GEAR and EPISODE when a diagonal approximation of the Jacobian is chosen. A characteristic of the explanation is that it switches back and forth between the two possible interpretations, modified Newton method and iterative method for linear systems.

Das gedämpfte modifizierte Newton-Verfahren lautet

$$ x^{\nu+1} = x^\nu - \omega W^{-1}\varphi(x^\nu,x^{(\nu+1)}), \qquad\nu=1,2,\ldots $$

zur Lösung des Nullstellenproblems $f(x)=0$, $x={\mskip 5mu}?$ Hierbei ist die Matrix $W$ in der Regel eine Näherung für die Jacobimatrix $J=f'(x^\nu)$. Insbesondere hängt die Matrix $W$ häufig auch von den vorhergehenden Iterierten $x^1, x^2, \ldots{\mskip 3mu}$ ab, also $W=W(x^1,\ldots,x^\nu)$. Dies muß jedoch nicht unbedingt so sein. Um die Schreibweise handlich und übersichtlich zu halten, seien im weiteren gewisse Abhängigkeiten in der Schreibweise nicht gesondert erwähnt. Den Wert $f(x^\nu)$ bezeichnet man als das Residuum und die Differenz zweier aufeinander folgender Iterierten $x^{\nu+1}-x^\nu$ wird Pseudoresiduum genannt.

Die beiden Begriffe Residuum und Pseudoresiduum werden auch für die Beträge, bzw. Normen dieser Größen benutzt, also $|f(x^\nu)|$ und $|x^{\nu+1}-x^\nu|$. Wird also von genügend kleinem Residuum gesprochen, so ist damit natürlich die entsprechende Norm gemeint.

Die beiden wichtigsten Spezialfälle dieser recht allgemeinen Iterationsklasse sollen jetzt kurz angegeben werden. Die Auffassung als Spezialisierung muß nicht immer vorgenommen werden. In anderem Zusammenhang kann es durchaus günstiger sein, anders vorzugehen.

Consider, in general, splittings of the matrix $A$, which is assumed to be invertible: $A=L+D+R$, with strictly lower triangular matrix $L$, diagonal matrix $D$, and strictly upper triangular matrix $R$. Assume further that none of the diagonal elements of $D$ vanishes, i.e. $\det D\ne0$. Another, more general splitting of $A$ is $A=M-N$ with invertible matrix $M$. Given is the root-finding problem $Ax-b=0$, $x={\mskip 5mu}?$

Zwei typische Vertreter für Iterationsarten zur Lösung obiger Gleichung sind nun die Jacobi-Überrelaxation (JOR-Verfahren) und die sukzessive Überrelaxation (SOR-Verfahren).

Die erstere der beiden Iterationen findet sich auch unter anderen Benennungen.

Die Iterationsvorschrift für das JOR-Verfahren lautet nun

$$ x^{\nu+1} = x^\nu - \omega D^{-1}(Ax^\nu - b), \qquad\nu=1,2,\ldots $$

mit der Iterationsmatrix $J_\omega = D^{-1}(D-\omega A) = I - \omega D^{-1}A$. Hierbei ist $J_\omega$ die Matrix, die bei der Schreibweise auftritt

$$ x^{\nu+1} = J_\omega x^\nu + \omega D^{-1}b. $$

The further important specialization of the damping or amplification factor $\omega$, namely $\omega=1$, yields the ordinary Jacobi iteration. Jacobi, Carl Gustav (1804--1851).

Die Iterationsvorschrift für das SOR-Verfahren lautet

$$ x^{\nu+1} = x^\nu - \omega D^{-1}\left(Lx^{\nu+1} + (D+R)x^\nu - b\right), \qquad\nu=1,2,\ldots $$

mit der Iterationsmatrix

$$ R_\omega = (D+\omega L)^{-1}\bigl((1-\omega)D-\omega R\bigr) = (I+\omega D^{-1}L)^{-1}\bigl((1-\omega)I-\omega D^{-1}R\bigr). $$

Die weitere Spezialisierung $\omega=1$, liefert das Gauß-Seidel Iterationsverfahren. Gauß, Carl Friedrich (1777--1855), von Seidel, Philipp Ludwig (1821--1896).

Nebenläufig sei erwähnt, daß die obige Form auch diejenige Gestalt hat, die sich für die Programmierung am besten eignet, wobei man die Vorkonditionierung mit der Inversen der Diagonalmatrix $D$ natürlich am Anfang der Rechnung und nur einmal durchführt. Wichtig ist zu vermerken, daß man bei der obigen Iterationsvorschrift für die einzelnen Komponenten von unteren Indices zu höheren Indices durchläuft, also $x_i^\nu, i=1,\ldots,n$. Die umgekehrte Reihenfolge für den Durchlaufsinn der Indices ist natürlich ebenso möglich; hier vertauschen sich dann lediglich die Rollen der Matrizen $L$ und $R$.
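The two iteration rules can be written down componentwise. The sketch below is a minimal plain-Python illustration, not taken from any of the programs mentioned; the test matrix, the right-hand side, and the function names are assumptions chosen so that the exact solution is $(1,1,1)$.

```python
def jor_step(A, b, x, omega):
    """One JOR sweep: x_new = x - omega * D^{-1} (A x - b)."""
    n = len(b)
    return [
        x[i] - omega * (sum(A[i][j] * x[j] for j in range(n)) - b[i]) / A[i][i]
        for i in range(n)
    ]


def sor_step(A, b, x, omega):
    """One SOR sweep, running from lower to higher indices.

    Components j < i already hold the new iterate, components j >= i
    still hold the old one, i.e. the residual is L x_new + (D + R) x - b.
    """
    n = len(b)
    x_new = list(x)
    for i in range(n):
        residual = sum(A[i][j] * x_new[j] for j in range(n)) - b[i]
        x_new[i] = x[i] - omega * residual / A[i][i]
    return x_new


def iterate(step, A, b, omega, sweeps=50):
    """Run a fixed number of sweeps from the zero vector."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = step(A, b, x, omega)
    return x


# Hypothetical strictly diagonally dominant system with solution (1, 1, 1).
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]

print(iterate(jor_step, A, b, omega=1.0))   # Jacobi iteration
print(iterate(sor_step, A, b, omega=1.0))   # Gauss-Seidel iteration
```

With $\omega=1$ the two functions reduce to the Jacobi and the Gauss-Seidel iteration. Traversing the indices in `sor_step` in reverse order would exchange the roles of $L$ and $R$.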

Auch noch auf eine andere Art und Weise läßt sich das Gauß-Seidel Iterationsverfahren als modifiziertes Newton-Kantorovich Verfahren interpretieren. Mit der Zerlegung

$$ (L+D)x^{\nu+1} = b - Rx^\nu $$

erhält man

$$ \eqalign{ x^{\nu+1} &= (L+D)^{-1}(b-Rx^\nu) = x^\nu - (L+D)^{-1}\bigl((L+D)x^\nu + Rx^\nu - b\bigr)\cr &= x^\nu - (L+D)^{-1}(Ax^\nu - b).\cr } $$

The matrix $W=L+D$ can now be interpreted as an approximation to the Jacobian $J$ of the function $f$ with $f(x)=Ax-b$. This reveals an asymmetry: the subdiagonal part of $J$ is taken into account completely, whereas the superdiagonal part, here $R$, does not appear at all. This is not surprising in itself, but should be noted. If the components $x_i^{\nu+1}$ of the vector $x^{\nu+1}$ are traversed in the opposite order, one has $W=D+R$ instead, as already mentioned above.

Die letzte Bemerkung gilt selbstverständlich ganz allgemein. Mit der oben angegebenen Zerlegung $A=M-N$ im Kopfe, ist

$$ x = x - M^{-1}(Ax-b) = x - M^{-1}\bigl((M-N)x - b\bigr) = M^{-1}Nx + M^{-1}b, $$

also $Mx=Nx+b$. Für die Zerlegungsmatrizen $M$ und $N$ der wichtigsten Iterationsarten erhält man

Verfahren	$M$-Matrix	$N$-Matrix
JOR	$M={1\over\omega}D$	$N={1\over\omega}D-A$
SOR	$M=L+{1\over\omega}D$	$N={1\over\omega}\bigl(-\omega R+(1-\omega)D\bigr)$

Both the JOR method and the SOR method can be viewed as convergence-accelerated methods. In both cases only the simplest conceivable convergence acceleration from two iterates is chosen, namely the weighted mean of the last two iterates, thus

$$ x^{\nu+1}\gets x^\nu + \omega(x^{\nu+1} - x^\nu) = \omega x^{\nu+1} + (1-\omega)x^\nu. $$

Dies ist natürlich nichts anderes als das Verfahren von Euler-Knopp. Euler, Leonhard (1707--1783), Knopp, Konrad Hermann Theodor (1882--1957).

Allgemein arbeitet man mit der unendlichen Block-Dreiecksmatrix

$$ \mathbb{P} = \pmatrix{ B_{11} & & & 0\cr B_{21} & B_{22}\cr B_{31} & B_{32} & B_{33}\cr \vdots & \vdots & \vdots & \ddots\cr }, $$

wobei die Matrizen $B_{ij}\in\mathbb{R}^{n\times n}$ sind und $B_{ij}=\bf 0$, falls $j>i$. Blockimplizite Verfahren werden hier nicht weiter betrachtet. Natürlich summieren sich die Zeilensummen von $\mathbb{P}$ stets auf eins auf, man hat also das Erfülltsein der Konsistenzbedingung

$$ \sum_{j=1}^i B_{ij} = I, \qquad\hbox{für alle } i=1,2,\ldots $$

Man rechnet dann $\mathbb{P}\bf x$, mit $x=(x^0, x^1,\ldots)$.

Diese Art der Konvergenzbeschleunigung hat nun eine Reihe von Bezeichnungen. Im folgenden werde der Bezeichnungsweise von Albrecht/Klein (1984) gefolgt. Autoren Prof. Dr. Peter Albrecht und Prof. Dr. Marcelo Pinheiro Klein. Die zuletzt angeführte allgemeine Art der linearen Konvergenzbeschleunigung heiße $\mathbb{P}$-Extrapolation. Ist $G$ die grundlegende Iterationsmatrix, also $x^{\nu+1}=Gx^\nu+c$, so heiße das beschleunigte Verfahren

$$ x^{\nu+1}\gets Bx^{\nu+1} + (I-B)x^\nu = \bigl(BG+(I-B)\bigr)x^\nu + Bc $$

entsprechend $B$-Extrapolation. Reduziert sich die Matrix $B\in\mathbb{R}^{n\times n}$ zu einem Skalar $\gamma\in\mathbb{R}$, so werde einfach von $\gamma$-Extrapolation gesprochen.

When we speak of "accelerated" here, this merely indicates the change in the iteration rule; it does not automatically mean that the new method really converges faster for every choice of the extrapolation parameters. Achieving this by a suitable choice of the parameters is precisely one of the tasks, though not the only one. It can be quite sensible to forgo faster convergence in exchange for a larger region of convergence, and thus to obtain, for a considerably larger class of problems, the guarantee that the iteration converges at all. On these matters cf. the paper by Varga/Eiermann/Niethammer (1987) and the corresponding monographs, for example Varga (1962) or [Birkhoff/Lynch (1984)](https://www.amazon.com/Numerical-Solution-Elliptic-Problems-Mathematics-6/dp/0898711975). About the authors: Richard Steven Varga (1928--2022), Michael Eiermann, Wilhelm Niethammer (1933--2023), Garrett Birkhoff (1911--1996), Robert E. Lynch (*1932).

2. Die Divergenzsätze von Hughes Hallett

Es ist nun von Wichtigkeit in Erfahrung zu bringen, unter welchen Umständen man durch $B$-Extrapolation überhaupt Konvergenz erzielen kann. Aussagen hierzu liefert u.a. Hughes Hallett (1984), Andrew Jonathan Hughes Hallett (1947--2019). Eine einfache Strukturaussage macht zuerst

1. Lemma: $A$-Extrapolation der $B$-Extrapolation (direkt hintereinander) liefert eine $AB$-Extrapolation. Insbesondere für kommutierende Matrizen $A$ und $B$ sind $AB$- und $BA$-Extrapolation dasselbe.

Von generellem Interesse ist das folgende Lemma, welches aber auch bei einem späteren Satz die Schlüsselrolle spielen wird.

2. Lemma: (1) The $\gamma$-extrapolation converges for certain $\gamma>0$ if and only if the real parts of all eigenvalues of the iteration matrix $G$ are less than one, i.e. $\mathop{\rm Re}\nolimits \lambda_i<1$ for all $i=1,\ldots,n$, where the $\lambda_i$ are taken from the spectrum $\sigma(G)$. (2) Analogously, the $\gamma$-extrapolation converges for certain $\gamma<0$ if and only if $\mathop{\rm Re}\nolimits \lambda_i>1$ for all $i=1,\ldots,n$.
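The lemma can be illustrated on the level of eigenvalues: an eigenvalue $\lambda$ of $G$ becomes $1+\gamma(\lambda-1)$ under $\gamma$-extrapolation. The following sketch uses a made-up spectrum; the function names and the sample eigenvalues are assumptions for illustration only.

```python
def extrapolated_eigenvalue(lam, gamma):
    """Eigenvalue of gamma*G + (1 - gamma)*I belonging to eigenvalue lam of G."""
    return 1 + gamma * (lam - 1)


def converges(spectrum, gamma):
    """True if the extrapolated iteration matrix has spectral radius below one."""
    return max(abs(extrapolated_eigenvalue(lam, gamma)) for lam in spectrum) < 1


def largest_admissible_gamma(spectrum):
    """Supremum of the gamma > 0 that give convergence, assuming all Re(lam) < 1.

    |1 + gamma*(lam - 1)| < 1 holds exactly for
    0 < gamma < 2*(1 - Re(lam)) / |lam - 1|**2.
    """
    return min(2 * (1 - lam.real) / abs(lam - 1) ** 2 for lam in spectrum)


# Hypothetical spectrum: the plain iteration diverges (|lam| > 1),
# but every real part is below one.
spectrum = [-2.0, 0.5 + 3.0j]
bound = largest_admissible_gamma(spectrum)

print(converges(spectrum, 1.0))           # plain iteration, gamma = 1
print(converges(spectrum, 0.9 * bound))   # sufficiently small gamma > 0
print(converges([1.5], -1.0))             # part (2): Re(lam) > 1, gamma < 0
```

The example shows why only the real parts matter: a sufficiently small positive $\gamma$ pulls every eigenvalue with real part below one into the unit disk, however large its modulus.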

Eines der wesentlichen Ergebnisse von Hughes Hallett ist nun der folgende Satz.

3. Proposition: Für eine beliebige Iterationsmatrix $G$ läßt sich nicht immer eine reelle Diagonalmatrix $R$ finden, so, daß die $R$-Extrapolation konvergent ist.

The proof uses the lemma stated above. It concentrates on showing that the eigenvalues of $R(G-I)$ cannot always lie in the left half-plane $\mathbb{C}^{{}-{}}$. This in turn suggests using well-known theorems from stability theory, such as the theorems of Liénard-Chipart and Hurwitz. Hurwitz, Adolf (1859--1919), Liénard, Alfred-Marie (1869--1958), Chipart, A.H.

Proof (of the diagonal extrapolation theorem of Hughes Hallett): Let $G$ be the basic iteration matrix. The $R$-extrapolation has the iteration matrix $RG+I-R = I+R(G-I)$, so the $\gamma$-extrapolation of the $R$-extrapolation has the iteration matrix $I+\gamma R(G-I)$. By the lemma, the eigenvalues of $R(G-I)$ must therefore lie in the left complex half-plane $\mathop{\rm Re}\nolimits z<0$ (with $R\gets\gamma R$). The characteristic polynomial

$$ \left|R(G-I)-\lambda I\right| = a_0\lambda^n + a_1\lambda^{n-1} + \cdots + a_{n-1}\lambda + a_n $$

hat die Koeffizienten

$$ \eqalign{ a_1 &= - \sum_{1\le i\le n} \left(R(G-I)\right)_i^i = - \sum_{1\le i\le n} r_i (G-I)_i^i, \cr a_2 &= + \sum_{i_1\lt i_2} \left(R(G-I)\right)_{i_1i_2}^{i_1i_2} = + \sum_{i_1\lt i_2} r_{i_1} r_{i_2} (G-I)_{i_1i_2}^{i_1i_2}, \cr a_3 &= - \sum_{i_1\lt i_2\lt i_3} r_{i_1} r_{i_2} r_{i_3} (G-I)_{i_1i_2i_3}^{i_1i_2i_3}, \cr \vdots & \qquad\qquad\vdots\cr a_n &= (-1)^n r_1 r_2 \ldots r_n \left|G-I\right|. } $$

According to the stability criterion of Liénard/Chipart, however, alternating positivity of suitable $a_i$ is required. This can be violated if $(G-I)_i^i=0$ $\forall i$, or $(G-I)_{i_1i_2}^{i_1i_2}=0$ $\forall i_1\lt i_2$ …

Es ist nun leicht möglich auf den nichtlinearen Fall zu verallgemeinern. Benutzt man das Verfahren $x^{\nu+1}=g(x^\nu)$ so ist ganz entsprechend wie im linearen Falle, die $R$-Extrapolation

$$ x^{\nu+1} = Rg(x^\nu) + (I-R)x^\nu. $$

Hughes Hallett (1984) zeigt nun ganz analog:

4. Satz: (1) Für die oben angegebene $R$-Extrapolation ist es nicht immer möglich eine reelle Diagonalmatrix $R$ zu finden, der Gestalt, daß das resultierende Verfahren dann konvergent ist.

(2) Dennoch ist es grundsätzlich möglich eine Nichtdiagonalmatrix zu finden, so, daß das Verfahren für einen Startwert, der nahe genug an der gewünschten Lösung liegt, konvergiert.

(3) Weiterhin ist es möglich eine Folge von Matrizen zu bestimmen, derart, daß das Verfahren für beliebige Funktionen $g({}\cdot{},{}\cdot{})$ konvergiert.

Der Beweis wird durch eine Linearisierung auf den vorhergehenden Satz zurückgespielt.
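A small example makes the diagonal extrapolation result for the linear case tangible. The matrix below is a made-up illustration and not an example from Hughes Hallett (1984): for $G-I=\pmatrix{0&1\cr 1&0\cr}$ all diagonal entries of $G-I$ vanish, so $R(G-I)$ has trace zero for every diagonal $R=\mathop{\rm diag}(r_1,r_2)$ and eigenvalues $\pm\sqrt{r_1r_2}$. The sketch scans a grid of real $r_1$, $r_2$ and computes the spectral radius of $I+R(G-I)$.

```python
import cmath


def spectral_radius(r1, r2):
    """Spectral radius of I + R(G - I) for G - I = [[0, 1], [1, 0]], R = diag(r1, r2)."""
    root = cmath.sqrt(r1 * r2)      # eigenvalues of R(G - I) are +root and -root
    return max(abs(1 + root), abs(1 - root))


# Scan real diagonal entries in [-5, 5]; the grid is an arbitrary choice.
grid = [k / 4 for k in range(-20, 21)]
smallest = min(spectral_radius(r1, r2) for r1 in grid for r2 in grid)

print(smallest >= 1.0)
```

If $r_1r_2>0$, one eigenvalue of the iteration matrix is $1+\sqrt{r_1r_2}>1$; if $r_1r_2<0$, both eigenvalues are $1\pm i\sqrt{|r_1r_2|}$ with modulus above one. The spectral radius therefore never drops below one, and no real diagonal $R$ gives a convergent iteration for this $G$.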

Es sind nun diese Ergebnisse mit den Vorbereitungen von oben, die ein zusätzliches Licht auf das Konvergenzversagen des modifizierten Newton-Verfahrens werfen. Dies komplementiert auch die Beobachtungen von Shampine (1980), welcher nicht generelle Konvergenzunmöglichkeiten in Erwägung zog, sondern seine Überlegungen auf den Konvergenztest fokussierte. Lawrence F. Shampine.

Shampine (1980) cites as experimental support the paper by Byrne/Hindmarsh/Jackson/Brown (1977), which makes clear that the modified Newton method in the two programs GEAR and EPISODE no longer works satisfactorily when the directional derivative of the function $f$ is used as an approximation to the diagonal elements of the Jacobian $J$. Authors: Byrne, George D., Hindmarsh, Alan C. (*1942), Jackson, Kenneth R., Brown, H. Gordon. [Shampine (1980)](https://epubs.siam.org/doi/epdf/10.1137/0901005) writes:

Regardless of the Jacobian approximation, if the convergence test is reliable, the codes should deliver a good solution to the problem. Of course the efficiency is affected, but the accuracy of the results should not be. … it appears that the convergence test is unreliable, and that the potential unreliability can sometimes be exhibited as the result of a very poor approximate Jacobian.

Note that the programs usually iterate at most three times, so that only three tests for detecting divergence are carried out. By comparison, the program SYMPOL for solving polynomial nonlinear systems iterates up to 25 times. The program BRENTM for solving general nonlinear systems iterates at most 50 times. The program COLNEW iterates no more than 40 times per grid.

3. Die Experimente von Byrne/Hindmarsh/Jackson/Brown

Das an anderer Stelle angegebene zweidimensionale Differentialgleichungsproblem P3 von Byrne/Hindmarsh/Jackson/Brown (1977), welches seinen Ursprung in der chemischen Kinetik hat, wurde nun von Byrne, Hindmarsh, Jackson und Brown mit den verschiedensten Parametereinstellungen der beiden Programme GEAR und EPISODE ausgetestet. Die dabei gewonnenen Erfahrungen und Ergebnisse sind für die weitere Analyse sehr wertvoll. Daher soll dieses Beispiel kurz näher untersucht und interpretiert werden. Die folgenden Daten wurden gemessen bei dem Problem P3.

$\varepsilon$	code	$T$	$\Vert y-Y\Vert$	nst	nfe/nst	nfe	nje	nst/nje	$J$-CPU	%$T$	$f$-CPU	%$T$
$10^{-3}$	E21	2.33	1.89	4428	1.8	7979	859	5.2	0.0904	4	0.389	17
"	E22	2.37	3.70	4418	1.8	7884	845	5.2	0.159	7	0.384	16
"	E23	1.92	1120	4337	1.7	7350	893	4.9	0	0	0.358	19

"	G13	2.74	2520	11681	1.1	13087	1362	8.6	0	0	0.638	23
"	G21	2.25	1.67	5619	1.6	9145	751	7.5	0.0790	4	0.446	20
"	G22	2.30	1.44	5573	1.7	9390	754	7.4	0.142	6	0.458	20
"	G23	1.45	11400	4532	1.5	6578	662	6.8	0	0	0.321	22

$10^{-6}$	E21	5.18	9.22	10507	1.6	16594	1263	8.3	0.133	3	0.809	16
"	E22	5.33	22.2	10610	1.6	16903	1337	7.9	0.251	5	0.824	15
"	E23	5.67	2190	13075	1.6	21099	2715	4.8	0	0	1.03	18

"	G21	5.75	5.32	14992	1.5	22919	1579	9.5	0.166	3	1.12	19
"	G22	5.95	4.68	14984	1.6	23229	1571	9.5	0.295	5	1.13	19
"	G23	4.77	3360	15187	1.5	22304	2129	7.1	0	0	1.09	23

$10^{-9}$	E21	16.9	74.2	31282	1.3	41622	2465	12.7	0.259	2	2.03	12
"	E22	17.1	50.3	31413	1.3	41794	2532	12.4	0.476	3	2.04	12
"	E23	18.7	4310	39078	1.4	55623	6304	6.2	0	0	2.71	14

"	G21	16.7	24.4	41058	1.4	56423	3581	11.5	0.377	2	2.75	16
"	G22	17.4	26.8	40963	1.4	56513	3613	11.3	0.679	4	2.76	16
"	G23	14.4	3840	42199	1.4	56990	4358	9.7	0	0	2.78	19

Here $T$ denotes the total computing time on an IBM 370/195 in double precision. The entry code indicates which program was used with which parameter setting; the last two digits give the value of mf. For example, E21 means the program EPISODE with the parameter setting mf=21. The entry $\|y-Y\|$, framed on both sides, denotes the global error committed. nst gives the number of steps needed to complete the integration. nfe gives the number of function evaluations, and the ratio nfe/nst states how many function evaluations were made per step. nje is the number of Jacobian evaluations. The ratio nst/nje shows over how many steps on average the Jacobian was kept constant. $J$-CPU is the computing time spent executing the subroutine pset and all routines that it calls. The column %$T$ immediately following gives the percentage of the total computing time. $f$-CPU is the total computing time needed for function evaluations, excluding the time for the numerical approximation of the Jacobian. The last column, immediately to its right, again gives the percentage of the total computing time.

The input parameter mf always consists of two digits. The first digit is always from $\{1,2\}$, where 1 selects the Adams method and 2 the BDF. The second digit is always from $\{0,1,2,3,4,5\}$. Only the digits 1, 2 and 3 matter for the table above. For completeness, however, all digits are briefly explained. Note that the last two digits 4 and 5 were not yet available in the two programs GEAR and EPISODE, but only in the later version LSODE and the corresponding modifications of that program. Nevertheless, since these different parameter settings come up briefly again elsewhere, namely in the description of LSODA and LSODAR, all settings are listed here.

0: Fixpunktiteration. Es wird keine Jacobimatrix ausgewertet und vom Benutzer muß auch keine Routine hierfür bereitgestellt werden.

1: Modifiziertes Newton Verfahren mit einer vollen Jacobimatrix. Diese Matrix muß vom Benutzer durch ein separat programmiertes Unterprogramm dem Programm zur Verfügung gestellt werden. Ob diese Matrix tatsächlich die passende Jacobimatrix $J$ zur Funktion $f$ ist, wird nicht überprüft.

2: Wie 1, jedoch wird die Jacobimatrix intern durch numerische Differentiation berechnet. Diese Einstellung ist für den Benutzer wesentlich bequemer und einfacher. Bei großen Differentialgleichungen ist diese Einstellung jedoch i.a. nicht so effizient, wie diejenige bei 1.

3: Diagonal approximation of the Jacobian. This parameter setting is very economical in storage. The diagonal approximation is not really a diagonal *approximation*; rather, only a weighted directional derivative of the function $f$ is used for the Newton iteration.

4: Modifiziertes Newton-Verfahren mit einer vom Benutzer bereitgestellten Bandmatrix. Diese Matrix muß wie bei der Parameter-Einstellung 1 durch ein getrenntes Unterprogramm bereitgestellt werden, und wie bei 1 wird diese Matrix nicht überprüft.

5: Wie 4, nur wird hier die Bandmatrix durch numerische Differentiation ermittelt. Dies ist erneut für den Benutzer die bequemste und einfachste Art die Jacobimatrix zu spezifizieren.
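The two-digit convention of mf can be summarized in a small lookup. The helper below is purely illustrative: `decode_mf` and the two dictionaries are hypothetical names and are not part of GEAR, EPISODE, or LSODE; the descriptions merely paraphrase the list above.

```python
# First digit of mf: the integration method.
METHODS = {1: "Adams", 2: "BDF"}

# Second digit of mf: how the corrector equation is solved.
ITERATIONS = {
    0: "fixed-point iteration, no Jacobian",
    1: "modified Newton, full user-supplied Jacobian",
    2: "modified Newton, full Jacobian by numerical differentiation",
    3: "diagonal approximation (weighted directional derivative)",
    4: "modified Newton, banded user-supplied Jacobian",
    5: "modified Newton, banded Jacobian by numerical differentiation",
}


def decode_mf(mf):
    """Split a two-digit mf value into (method, iteration type)."""
    method_digit, iteration_digit = divmod(mf, 10)
    if method_digit not in METHODS or iteration_digit not in ITERATIONS:
        raise ValueError(f"invalid mf value: {mf}")
    return METHODS[method_digit], ITERATIONS[iteration_digit]


print(decode_mf(23))
print(decode_mf(13))
```

In the table, E23, G23 and G13 are therefore the runs with the diagonal approximation, the first two with BDF and the last with the Adams method.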

The table above is interpreted as follows. Averaged over all parameter settings, GEAR performs about 1.5 function evaluations per step (standard deviation below 0.2), and for EPISODE the value is 1.6 (standard deviation likewise below 0.2). This means that both programs work in $PEC$ mode to a large extent. It also means that very often a convergence rate cannot be formed, so that the convergence test in both programs uses the convergence rate from a previous step.
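The averages quoted for the function evaluations per step can be recomputed from the nfe/nst column of the table. The sketch below copies those tabulated values; using the sample standard deviation is an assumption, since the text does not say which variant was meant.

```python
from statistics import mean, stdev

# nfe/nst column of the table, in row order.
gear = [1.1, 1.6, 1.7, 1.5, 1.5, 1.6, 1.5, 1.4, 1.4, 1.4]    # G13 ... G23
episode = [1.8, 1.8, 1.7, 1.6, 1.6, 1.6, 1.3, 1.3, 1.4]      # E21 ... E23

for name, ratios in (("GEAR", gear), ("EPISODE", episode)):
    print(name, round(mean(ratios), 1), stdev(ratios) < 0.2)
```

Both programs stay well below two function evaluations per step, which is the basis for the remark about the $PEC$ mode.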

Der Rechenzeitanteil der Funktionsauswertungen an der Gesamtrechenzeit liegt bei dem Programm GEAR bei durchschnittlich 20% (Standardabweichung unter 3%) und für EPISODE bei durchschnittlich 15% (Standardabweichung ebenso unter 3%). Diese Zahlen sind für ein Mehrschrittverfahren mit variabler Schrittweite und variabler Ordnung nicht ungewöhnlich. Für das Programm TENDLER ergeben sich Zahlen ähnlicher Größenordnung.

For the strategies governing re-evaluation of the Jacobian $J$ the following picture emerges. The program GEAR waits about 9 steps on average before the Jacobian is re-evaluated (standard deviation below 2 steps), whereas the program EPISODE waits about 8 steps on average (standard deviation: about 3 steps). As expected, a tighter accuracy requirement $\varepsilon$ leads to a longer wait before a re-evaluation, since the chosen step sizes are naturally smaller and the iteration matrix $W=I-h\gamma J$ therefore changes less. The recomputation of the Jacobian $J$ is controlled decisively by the Hindmarsh test, less so by convergence failure.

Striking in the data of the table are the framed global errors $\|y-Y\|$. With the input control parameter mf=23, both programs, GEAR as well as EPISODE, produce conspicuously and unusually large errors. The same remarks apply to the choice mf=13 for the program GEAR. Byrne, Hindmarsh, Jackson and Brown remark on this:

mf=23 does not control the error well for this problem for either code.

In the light of the remarks prepared above this is no longer so surprising. These are the reasons why an iteration with diagonal approximation of the Jacobian was not included in the program TENDLER. In particular, switching does not alternate between modified Newton-Kantorovich iteration and Newton-Jacobi iteration, but only between modified Newton-Kantorovich iteration and Picard iteration.

Denkbar wäre, die Diagonalelemente von $J$ weiterzubenutzen, bei einem Wechsel der Iterationsart, also Verwendung des modifizierten Newton-Jacobi Verfahrens, siehe Ortega (1972). Lawrence Shampine (1982) schlägt dies ebenso vor. Er bemerkt hierzu:

We propose using simple iteration until the first Jacobian is formed and thereafter using Jacobi-iteration.

Autoren: Lawrence F. Shampine, James M. Ortega (*1932).

Nevertheless, the $n$ divisions, $n$ multiplications and $n$ additions and the additional storage locations have to be considered. Unlike the program LSODE, the program LSODA no longer offers the user the option of replacing the Jacobian by a diagonal approximation. For functions that are cheap to evaluate it is more advantageous to use Picard iteration than older information from the diagonal elements of the Jacobian, or in the end even only older information from more or less good approximations to these diagonal elements. The program of Suleiman/Hall (1985) makes special use of the diagonal elements of the Jacobian $J$ insofar as the trace of the Jacobian is also used for switching. The program TENDLER does not use this information. In the case of diagonal dominance the modified Newton-Jacobi method obviously has an immediate advantage.

It is the additional storage requirement and the uncertainty of the benefit that led to the decision not to use the modified Newton-Jacobi iteration. In terms of storage, cyclic linear methods are *inferior* to one-stage linear methods in the terms linear in the dimension of the differential equation, since a suitable difference table has to be kept for every stage. For example, the linear part of the storage for the three programs LSODA, STINT and TENDLER is:

TENDLER	STINT	LSODA
$37n$ or $42n$	$41n$	$11n$

The figures given are the storage requirements when the maximum order is limited to 5. The reason is that the programs LSODE, LSODA and LSODAR can use at most the BDF5 in stiff mode. The BDF6 is not used in these programs because of its small Widlund angle. The BDF$i$ ($i>6$) are not $D$-stable. The cyclic methods of J.M. Tendler, in contrast, can be used up to order 7. The program TENDLER can easily be modified so that only $37n$ storage cells are used. In that case, however, duplicate computations have to be carried out under certain circumstances during step-size and order control. In the program STINT these duplicate computations have to be carried out in any case.

Convergence Results for Fixed Step Sizes

Tue, 11 Jun 2024 13:40:00 +0200

1. Introduction and basic notions
2. The lemmas of Gronwall
3. Notation and representation theorem for difference equations
4. Stability functionals for fixed step sizes
5. Projector stability functionals
6. Non-equidistant grids
7. The eigenvalues of certain tridiagonal matrices
8. Methods for parabolic equations

What follows is a fairly general convergence proof for multistage methods, assuming however that a fixed step size is used. Within the multistage process the grid need not be equidistant, as e.g. with Runge-Kutta methods. A somewhat longer route is taken here. First, stability functionals are presented in broad form and several equivalent representations are given. The proofs for these stability functionals contain the actual convergence proofs, but stability functionals are more general. They directly yield stability inequalities for the differences of two solutions of difference equations, i.e., they directly give statements about how the solutions of two difference equations drift apart as a function of perturbations. Only linear difference equations are considered, but the inhomogeneity may be arbitrary as long as it satisfies a Lipschitz condition.

Before the actual considerations on stability functionals, some fundamental problems are illuminated by simple preliminary considerations. Then follow the very important statements of Gronwall. The discrete lemma of Gronwall plays a decisive role in the main theorem on stability functionals. In many places the discrete lemma of Gronwall is hidden inside convergence proofs, and there usually only in a very specialized form. Only after that are the stability functionals treated and various equivalences proved.

$ \def\diag{\mathop{\rm diag}} \def\col{\mathop{\rm col}} \def\row{\mathop{\rm row}} \def\dcol{\mathop{\rm col\vphantom {dg}}} \def\drow{\mathop{\rm row\vphantom {dg}}} \def\rank{\mathop{\rm rank}} \def\grad{\mathop{\rm grad}} \def\adj#1{#1^*} \def\iadj#1{#1^*} \def\tr{\mathop{\rm tr}} \def\mapright#1{\mathop{\longrightarrow}\limits^{#1}} \def\fracstrut{} $

1. Introduction and basic notions

Let $\mathfrak{B}$ be a Banach space and $h\in\mathbb{R}$ the step size. The class of methods of the form

$$ u_{n+1} = Su_n+h\varphi(u_n),\qquad n=0,1,\ldots,N, \qquad u_i\in\mathfrak{B}, $$

does not cover block-implicit methods, or implicit methods at all, at least not in an immediately obvious way. Here

$$ N := \left|b-a\over h\right|, \qquad\hbox{i.e.}\qquad \mathopen|h\mathclose| = {\mathopen|b-a\mathclose|\over N}. $$

Not subsumed, therefore, are methods of the form

$$ A_1u_{n+1} + A_0u_n = h\cdot(B_1F_{n+1} + B_0F_n),\qquad n=0,1,\ldots,N. $$

with respect to the purely formal notation.

Methods of the slightly more general form

$$ u_{n+1} = Su_n+h\varphi(u_{n+1},u_n),\qquad n=0,1,\ldots,N,\tag{*} $$

cover block-implicit methods and ordinary implicit methods. The recurrence for $u_{n+1}$ written above is an implicit equation for $u_{n+1}$. Certain assumptions must therefore be imposed on $\varphi$ to guarantee unique solvability of the implicit difference equation. Since the function to be integrated almost always satisfies a Lipschitz condition, it is natural to require the same of the inhomogeneity of the difference equation. So let

$$ \eqalign{ \mathopen|\varphi(\hat u_{n+1},\hat u_n) - \varphi(u_{n+1},u_n)\mathclose| &{}\le K_1 \mathopen|\hat u_{n+1}-u_{n+1}\mathclose|,\cr \mathopen|\varphi(\hat u_{n+1},\hat u_n) - \varphi(u_{n+1},u_n)\mathclose| &{}\le K_2 \mathopen|\hat u_n-u_n\mathclose|.\cr } $$

1. Theorem: For sufficiently small step sizes $\mathopen|h\mathclose|$, the difference equation $(*)$ has a unique solution. This uniquely determined solution can be computed by Picard iteration.

Proof: Because of the assumed Lipschitz continuity of $\varphi$ in the component $u_{n+1}$, the fixed-point equation for $u_{n+1}$ has, by the Banach fixed-point theorem, a uniquely determined solution, which can be obtained by fixed-point iteration, provided that

$$ \mathopen|h K_1\mathclose| \lt 1, \qquad\hbox{for suitable $h$}. $$

☐
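As an illustration of the theorem, the following is a minimal sketch of Picard iteration for one step of $(*)$ in the scalar case. The function name `picard_step`, the tolerance, and the test problem (implicit Euler for $u'=-\sin u$, so $S=1$ and $K_1=1$) are assumptions chosen for this example and do not come from the text.

```python
import math

def picard_step(u_n, h, S, phi, tol=1e-12, max_iter=200):
    """Solve u = S*u_n + h*phi(u, u_n) for u by fixed-point (Picard) iteration."""
    u = u_n  # starting guess: the previous value
    for _ in range(max_iter):
        u_next = S * u_n + h * phi(u, u_n)
        if abs(u_next - u) <= tol:
            return u_next
        u = u_next
    raise RuntimeError("Picard iteration did not converge; is |h|*K_1 < 1?")

# Hypothetical test problem: implicit Euler for u' = -sin(u).
# Here S = 1, phi(u_new, u_old) = -sin(u_new), Lipschitz constant K_1 = 1.
phi = lambda u_new, u_old: -math.sin(u_new)
h = 0.5                      # |h| * K_1 = 0.5 < 1, so the map is a contraction
u1 = picard_step(1.0, h, 1.0, phi)
residual = u1 - (1.0 + h * phi(u1, 1.0))
print(u1, residual)
```

With $\mathopen|h\mathclose| K_1 = 0.5$ the error roughly halves in every sweep. For $\mathopen|h\mathclose| K_1 \ge 1$ the iteration is no longer guaranteed to converge.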

2. Remark: If $\mathopen|h\mathclose|$ is not sufficiently small, the equation may indeed have several solutions or none.

Consider the two methods of the form

$$ \eqalign{ \hat u_{n+\ell}+A_{\ell-1}\hat u_{n+\ell-1}+\cdots+A_0\hat u_n &{}= h{\mskip 3mu}\varphi(\hat u_{n+\ell},\hat u_{n+\ell-1},\ldots,\hat u_n)+r_n,\cr u_{n+\ell}+A_{\ell-1}u_{n+\ell-1}+\cdots+A_0u_n &{}= h{\mskip 3mu}\varphi(u_{n+\ell},u_{n+\ell-1},\ldots,u_n).\cr } $$

The first method can be regarded as a perturbed method, whereas the second is the actual method for computing the numerical solution. Let

$$ P_1 := \pmatrix{I&0&\ldots&0},\qquad R_1 := \pmatrix{0\cr \vdots\cr 0\cr I\cr}, $$

and $\delta_{n+\ell} := \hat u_{n+\ell}-u_{n+\ell}$, and in addition

$$ \hat\delta_{n+\ell} := \varphi(\hat u_{n+\ell},\ldots,\hat u_n) - \varphi(u_{n+\ell},\ldots,u_n). $$

Thus $\hat\delta_{n+\ell}$ is the difference of the corresponding values of the function $\varphi(\cdot)$ when all arguments differ.

Further, let

$$ \def\bov#1#2{\overline{\bf #1}_{#2}} \def\bopo{\bov P1} \def\boro{\bov R1} \def\bfR{{\bf R}} \def\bovy#1{\bov Y{\!#1}} \def\ovbf#1{\overline{\bf #1}} U := \pmatrix{u_0\cr \vdots\cr u_N\cr},\qquad \hat U := \pmatrix{\hat u_0\cr \vdots\cr \hat u_N\cr},\qquad \bfR := \pmatrix{r_0\cr \vdots\cr r_{N+\ell-1}\cr}. $$

The individual $u_i$ and $\hat u_i$ are elements of the vector space $\mathfrak{B}$, which is not necessarily finite-dimensional. Here the $A_i:\mathfrak{B}\to\mathfrak{B}$ are continuous linear operators between Banach spaces. For linear operators, continuity at one point is well known to be equivalent to global continuity, and this in turn to boundedness. $\mathfrak{B}$ is either a real or a complex Banach space. The vector space properties are needed because of linearity, the norm for the functionals that follow, and completeness is needed when applying the Banach fixed-point theorem. Examples are $\mathfrak{B}=\mathbb{C}^k$ and $\mathfrak{B}=\mathbb{R}^k$, with $k\ge1$. Occasionally the theorems also hold in not necessarily commutative rings $\mathfrak{B}$. In what follows, $\mathfrak{B}$ is always chosen to be $\mathbb{C}^k$. In the general case the sets $\mathbb{C}^{k\ell\times k\ell}$ would have to be replaced accordingly by $\mathfrak{B}^{\ell\times\ell}$, and other sets likewise.

Note that the estimate is now independent of $h$, but that it holds exclusively for certain very strongly restricted step sizes $h$. Without a restriction on the step size $h$ the theorem is not correct. The independence from the $B_\nu$, in

$$ \sum_{\nu=0}^\ell A_\nu u_{n+\nu} = h{\mskip 3mu}\sum_{\nu=0}^\ell B_\nu{\mskip 3mu}F_{n+\nu}, \qquad n=0,1,\ldots,N, $$

requires a corresponding restriction on the step size $h$. For practical use, a correspondingly large stability region is required in addition. The size of the stability region depends decisively on the $B_\nu$ and on the type of iteration used to solve the implicit equations in each time step. The theorem likewise loses its validity for “long” integration intervals, where $\mathopen|b-a\mathclose|$ becomes arbitrarily large. The theorem shows that for a finite integration interval $\mathopen|b-a\mathclose|$ the $\mathopen|\hat U-U\mathclose|$-norm is equivalent to the $\left| \bopo [C_1]^{-1} \boro \bfR\right|$-norm. For infinitely long integration intervals these norms are no longer necessarily equivalent.

2. The lemmas of Gronwall

Gronwall's lemma (Thomas Gronwall, 1877--1932) for the continuous case reads

1. Theorem: (Gronwall's lemma) Let $h$, $w$ and $k$ be continuous, real-valued functions on the interval $[a,b]$. (All that is needed is $(\int_a^x f)'=f(x)$, so slightly weaker conditions would suffice.) Suppose the following estimate holds on this interval:

$$ h(x)\le w(x)+\int_a^x k(t){\mskip 3mu}h(t){\mskip 3mu}dt,\qquad\forall x\in[a,b]. $$

Let the integral on the right-hand side of the inequality always be non-negative, which can be ensured, for example, for non-negative functions $k$, $w$, $h$ on the interval $[a,b]$. Then on the entire interval the function $h$ satisfies the estimate

$$ h(x)\le w(x)+\int_a^x \exp\left(\int_t^x k(\tau){\mskip 3mu}d\tau\right)k(t){\mskip 3mu}w(t){\mskip 3mu}dt, \qquad\forall x\in[a,b]. $$

Proof: see Helmut Werner and Herbert Arndt in Werner/Arndt (1986). Let

$$ H(x) := \int_a^x k(t){\mskip 3mu}h(t){\mskip 3mu}dt. $$

With this, by the continuity of $k$ and $h$,

$$ H'(x) = k(x){\mskip 3mu}h(x), \qquad H(a)=0, \qquad \forall x\in[a,b]. $$

From this differential equation, using the assumed inequality for the function $h$ together with $k(x)\ge0$, it follows that

$$ H'(x) = k(x){\mskip 3mu}h(x) \le k(x)\cdot\left(w(x)+H(x)\right), $$

i.e., the linear differential inequality

$$ H'(x) - k(x){\mskip 3mu}H(x) \le k(x){\mskip 3mu}w(x),\qquad H(a)=0.\tag{*} $$

Multiplication by

$$ e^{-K(x)}\gt 0,\qquad K(x) := \int_a^x k(t){\mskip 3mu}dt, $$

leads to

$$ e^{-K(x)}\left[H'(x) - k(x){\mskip 3mu}H(x)\right] = \left(e^{-K(x)}\cdot H(x)\right)' \buildrel{\displaystyle{(*)\atop\downarrow}}\over\le e^{-K(x)}\cdot k(x){\mskip 3mu}w(x). $$

Integration from $a$ to $x$ yields, because of the equality in the middle (the integral is a monotone functional),

$$ e^{-K(x)}H(x) - e^{-K(a)}H(a) \le \int_a^x e^{-K(t)}\cdot k(t){\mskip 3mu}w(t){\mskip 3mu}dt, $$

and hence, because of $H(a)=0$,

$$ H(x)\le\int_a^x e^{K(x)-K(t)}\cdot k(t){\mskip 3mu}w(t){\mskip 3mu}dt $$

and, by the assumption $h(x)\le w(x)+H(x)$, immediately

$$ h(x)\le w(x)+\int_a^x\left(\exp\int_t^x k(\tau){\mskip 3mu}d\tau\right)\cdot k(t){\mskip 3mu}w(t){\mskip 3mu}dt. $$

☐

2. Corollary: If $h(x)\le w+k\int_a^x h(t){\mskip 3mu}dt$ with fixed, non-negative constants $w$ and $k$, then we have the estimate

$$ h(x)\le w{\mskip 3mu}e^{k\cdot(x-a)},\qquad\forall x\in[a,b]. $$
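The corollary follows from the lemma by evaluating the integral for constant $w$ and $k$. As a short worked step (written for $k>0$; the case $k=0$ is immediate):

$$ h(x)\le w+\int_a^x e^{k(x-t)}{\mskip 3mu}k{\mskip 3mu}w{\mskip 3mu}dt = w + w\left[-e^{k(x-t)}\right]_{t=a}^{t=x} = w + w\left(e^{k(x-a)}-1\right) = w{\mskip 3mu}e^{k(x-a)}. $$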

A complete analogue of the continuous Gronwall lemma is the discrete Gronwall lemma, which likewise indicates exponential growth when a function appears in a suitable way on both sides of an inequality. We have the

3. Theorem: (Discrete Gronwall lemma) Let $(m+1)$ non-negative numbers $0\le\eta_0\le\eta_1\le\ldots\le\eta_m$ be given. Further let $\delta\ge0$, $h_j\ge0$ and $x_{j+1}=x_j+h_j$. Suppose that

$$ \varepsilon_0\le\eta_0\qquad\hbox{and}\qquad \varepsilon_{j+1}\le \eta_{j+1} + \delta\sum_{\nu=0}^j h_\nu\varepsilon_\nu, \qquad j=0,\ldots,m-1. $$

Then

$$ \varepsilon_j\le \eta_j{\mskip 3mu}e^{\delta\cdot(x_j-x_0)},\qquad j=0,\ldots,m. $$

Proof: see again Helmut Werner and Herbert Arndt in Werner/Arndt (1986). The case $\delta=0$ is simple, because $e^0=1$. Now let $\delta>0$. The base case $j=0$ is clear, again because $e^0=1$. The actual proof now reduces to the induction step from $j$ to $j+1$, where $\delta>0$ may be assumed. Here

$$ \eqalign{ \varepsilon_{j+1} &{}\le\eta_{j+1}+\delta\sum_{\nu=0}^j h_\nu{\mskip 3mu}\varepsilon_\nu\cr &{}\le\eta_{j+1}+\delta\sum_{\nu=0}^j h_\nu{\mskip 3mu}\eta_\nu{\mskip 3mu}e^{\delta\cdot(x_\nu-x_0)}\cr &{}\le\eta_{j+1}\cdot\left(1+\delta\sum_{\nu=0}^j h_\nu{\mskip 3mu}e^{\delta\cdot(x_\nu-x_0)}\right)\cr &{}\le\eta_{j+1} {\mskip 5mu} e^{\delta\cdot(x_{j+1}-x_0)}.\cr } $$

The sum in parentheses is estimated by (lower sum of a strictly monotonically increasing function)

$$ \sum_{\nu=0}^j h_\nu{\mskip 3mu}e^{\delta\cdot(x_\nu-x_0)} \le \int_{x_0}^{x_{j+1}} e^{\delta\cdot(t-x_0)}{\mskip 3mu}dt = {1\over\delta} \left( e^{\delta(x_{j+1}-x_0)}-1 \right). $$

☐
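The bound can be checked numerically on a worst-case sequence. The sketch below builds $\varepsilon_j$ with equality in the hypothesis, in the form used in the induction step of the proof ($\eta_{j+1}$ on the right-hand side), and compares it with $\eta_j e^{\delta(x_j-x_0)}$. The helper name `gronwall_worst_case` and all numerical values are assumptions made for this example.

```python
import math

def gronwall_worst_case(eta, h, delta):
    """Build eps with eps_0 = eta_0 and equality in the hypothesis:
    eps_{j+1} = eta_{j+1} + delta * sum_{nu<=j} h_nu * eps_nu."""
    eps = [eta[0]]
    acc = 0.0                        # running value of sum h_nu * eps_nu
    for j in range(len(h)):
        acc += h[j] * eps[j]
        eps.append(eta[j + 1] + delta * acc)
    return eps

m = 50
h = [0.02 + 0.01 * (j % 3) for j in range(m)]    # non-uniform steps h_j >= 0
eta = [1.0 + 0.1 * j for j in range(m + 1)]      # non-decreasing eta_j
delta = 2.0

x = [0.0]
for hj in h:
    x.append(x[-1] + hj)                          # x_{j+1} = x_j + h_j

eps = gronwall_worst_case(eta, h, delta)
bound = [eta[j] * math.exp(delta * (x[j] - x[0])) for j in range(m + 1)]
ok = all(e <= b + 1e-12 for e, b in zip(eps, bound))
print(ok)
```

Because the sequence is built with equality, any other sequence satisfying the hypothesis with the same $\eta_j$, $h_j$ and $\delta$ stays below it.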

3. Notation and representation theorem for difference equations

See also Matrixpolynome (matrix polynomials).

On the multiple Lipschitz constants of $\varphi$:

$$ \mathopen|\varphi(u_\ell,\ldots,\hat u_i,\ldots,u_0) - \varphi(u_\ell,\ldots,u_i,\ldots,u_0)\mathclose| \le K_i \cdot \mathopen|\hat u_i-u_i\mathclose|,\qquad i=0,\ldots,\ell. $$

Applying the triangle inequality $(\ell+1)$ times yields

$$ \mathopen|\varphi(\hat u_\ell,\ldots,\hat u_0)-\varphi(u_\ell,\ldots,u_0)\mathclose| \le \sum_{i=0}^\ell K_i \mathopen|\hat u_i-u_i\mathclose| = \langle \pmatrix{K_0\cr \vdots\cr K_\ell\cr}, \pmatrix{\mathopen|\hat u_0-u_0\mathclose|\cr \vdots\cr \mathopen|\hat u_\ell-u_\ell\mathclose|\cr} \rangle $$

In the notation for $\varphi$, numerous arguments that are of no further interest here are omitted from now on for simplicity and clarity. We have $\varphi(u_\ell,\ldots,u_0) = \varphi(t_\ell,h_\ell,u_\ell,\ldots,u_0)$.

1. Example: Consider the method of the form

$$ A_\ell u_{n+\ell}+\cdots+A_0u_n = h(B_\ell F_{n+\ell}+\cdots+B_0F_n), \qquad F_{n+i} := \pmatrix{f_{N\ell+*}\cr \vdots\cr f_{N\ell+*}\cr}, $$

where the $f_k$ are approximations to $f(t_k,y(t_k))$. Let the function $f$ of the differential equation be Lipschitz continuous with Lipschitz constant $L$, i.e.,

$$ \left|f(t,\hat y)-f(t,y)\right| \le L \mathopen|\hat y-y\mathclose|. $$

Then the Lipschitz constants $K_i$ above are related to the Lipschitz constant of the differential equation by

$$ K_i = \left|B_i\right|\cdot L,\qquad\hbox{or, where applicable,}\qquad K_i = \left|A^{-1}_\ell B_i\right|\cdot L. $$

2. Definition: Let $T$ be an arbitrary matrix of size $k\times k$. The bidiagonal operator $[T]$ for the matrix $T$, of size $(N+1)k\times(N+1)k$, is defined as follows

$$ \left[T\right] := \pmatrix{ I & & & \cr -T & I & & \cr &\ddots&\ddots&\cr & & -T & I\cr}, \qquad \left[T\right]^{-1} = \pmatrix{ I & & & \cr T & I & & \cr \vdots & \vdots & \ddots & \cr T^N & T^{N-1} & \ldots & I\cr}. $$

To the right stands the inverse, which always exists for an arbitrary matrix $T$.
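The definition and the stated inverse can be checked numerically. The following is a minimal sketch with NumPy. The helper names `bidiag_op` and `bidiag_op_inv` are assumptions for this example, and the test matrix is deliberately singular to illustrate that $[T]$ is invertible regardless of $T$.

```python
import numpy as np

def bidiag_op(T, N):
    """Bidiagonal operator [T]: identity blocks on the diagonal, -T on the subdiagonal."""
    k = T.shape[0]
    M = np.eye((N + 1) * k)
    for i in range(1, N + 1):
        M[i * k:(i + 1) * k, (i - 1) * k:i * k] = -T
    return M

def bidiag_op_inv(T, N):
    """Claimed inverse: block (i, j) equals T^(i-j) for i >= j, zero above the diagonal."""
    k = T.shape[0]
    M = np.zeros(((N + 1) * k, (N + 1) * k))
    for i in range(N + 1):
        for j in range(i + 1):
            M[i * k:(i + 1) * k, j * k:(j + 1) * k] = np.linalg.matrix_power(T, i - j)
    return M

T = np.array([[1.0, 2.0], [2.0, 4.0]])   # singular matrix (det = 0)
N = 4
product = bidiag_op(T, N) @ bidiag_op_inv(T, N)
err = np.abs(product - np.eye((N + 1) * 2)).max()
print(err)
```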

The special operators $[\cdot]$ and $[\cdot]^{-1}$ appear repeatedly in what follows. Given their frequency, it would be more convenient to swap the roles of $[\cdot]$ and $[\cdot]^{-1}$, but this would conflict with the notation of Skeel (1976), Robert David Skeel.

3. Theorem: (Properties of $\mathop{\rm col}$, $\mathop{\rm row}$, $\mathop{\rm diag}$, $[\cdot]$) We have

(1) $\mathop{\rm col} A_\nu B_\nu = \mathop{\rm diag} A_\nu{\mskip 5mu}\mathop{\rm col} B_\nu$.
(2) $\mathop{\rm col} A_\nu B = \left(\mathop{\rm col} A_\nu\right) B$; right distributivity of the $\mathop{\rm col}$ operator.
(3) $\mathop{\rm row} A_\nu B_\nu = \mathop{\rm row} A_\nu{\mskip 5mu}\mathop{\rm diag} B_\nu$.
(4) $\mathop{\rm row} AB_\nu = A{\mskip 3mu} \mathop{\rm row} B_\nu$; left distributivity of the $\mathop{\rm row}$ operator.
(5) $\mathop{\rm diag} A_\nu B_\nu = \mathop{\rm diag} A_\nu{\mskip 5mu}\mathop{\rm diag} B_\nu$; multiplicative distributivity of the $\mathop{\rm diag}$ operator.
(6) $\left[S^{-1}TS\right] = \mathop{\rm diag} S^{-1}{\mskip 3mu} \left[T\right]{\mskip 3mu} \mathop{\rm diag} S$.
(7) $\left[S^{-1}TS\right]^{-1} = \mathop{\rm diag} S^{-1}{\mskip 5mu}\left[T\right]^{-1} \mathop{\rm diag} S$.

Proof: For (1):

$$ \mathop{\rm col}_{\nu=0}^n A_\nu B_\nu = \pmatrix{A_0B_0\cr \vdots\cr A_nB_n\cr} = \pmatrix{A_0&&\cr &\ddots&\cr &&A_n\cr}\pmatrix{B_0\cr \vdots\cr B_n\cr}. $$

For (3):

$$ \mathop{\rm row}_{\nu=0}^n A_\nu B_\nu = (A_0B_0{\mskip 3mu}\ldots{\mskip 3mu}A_nB_n) = (A_0{\mskip 3mu}\ldots{\mskip 3mu}A_n)\pmatrix{B_0&&\cr &\ddots&\cr &&B_n\cr}. $$

For (5):

$$ \mathop{\rm diag}_{\nu=0}^N A_\nu B_\nu = \pmatrix{A_0B_0&&\cr &\ddots&\cr &&A_nB_n\cr} = \pmatrix{A_0&&\cr &\ddots&\cr &&A_n\cr} \pmatrix{B_0&&\cr &\ddots&\cr &&B_n\cr}. $$

For (6): Note the definition of $[T]$ and then use

$$ \pmatrix{ A_{11} & \ldots & A_{1n}\cr \vdots & \ddots & \vdots\cr A_{m1} & \ldots & A_{mn}\cr} \pmatrix{S_1&&\cr &\ddots&\cr &&S_n\cr} = \pmatrix{ A_{11}S_1 & \ldots & A_{1n}S_n\cr \vdots & \ddots & \vdots\cr A_{m1}S_1 & \ldots & A_{mn}S_n\cr}, $$

and, respectively,

$$ \pmatrix{S_1&&\cr &\ddots&\cr &&S_m\cr} \pmatrix{ A_{11} & \ldots & A_{1n}\cr \vdots & \ddots & \vdots\cr A_{m1} & \ldots & A_{mn}\cr} = \pmatrix{ S_1A_{11} & \ldots & S_1A_{1n}\cr \vdots & \ddots & \vdots\cr S_mA_{m1} & \ldots & S_mA_{mn}\cr}. $$

For (7): Follows from (6) because $(AB)^{-1}=B^{-1}A^{-1}$, where $[T]$ is invertible for arbitrary matrices $T$. For $T=\bf 0$, $[T]$ is the identity matrix of size $(n+1)k\times(n+1)k$. ☐
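The two similarity properties of the bidiagonal operator (the last two identities of the theorem) lend themselves to a quick numerical check. The sketch below uses small hand-picked matrices. The helper names `bidiag_op` and `rep_diag` as well as the concrete $S$ and $T$ are assumptions for this example.

```python
import numpy as np

def bidiag_op(T, N):
    """Bidiagonal operator [T]: identity blocks on the diagonal, -T on the subdiagonal."""
    k = T.shape[0]
    M = np.eye((N + 1) * k)
    for i in range(1, N + 1):
        M[i * k:(i + 1) * k, (i - 1) * k:i * k] = -T
    return M

def rep_diag(S, N):
    """diag_{nu=0}^{N} S: block diagonal matrix with N+1 copies of S."""
    return np.kron(np.eye(N + 1), S)

N = 3
T = np.array([[0.0, 1.0], [-2.0, -3.0]])
S = np.array([[2.0, 1.0], [1.0, 1.0]])    # det = 1, so S is invertible
S_inv = np.linalg.inv(S)

# [S^-1 T S] = diag(S^-1) [T] diag(S)
lhs = bidiag_op(S_inv @ T @ S, N)
rhs = rep_diag(S_inv, N) @ bidiag_op(T, N) @ rep_diag(S, N)
similarity_ok = np.allclose(lhs, rhs)

# [S^-1 T S]^-1 = diag(S^-1) [T]^-1 diag(S)
lhs_inv = np.linalg.inv(lhs)
rhs_inv = rep_diag(S_inv, N) @ np.linalg.inv(bidiag_op(T, N)) @ rep_diag(S, N)
inverse_ok = np.allclose(lhs_inv, rhs_inv)
print(similarity_ok, inverse_ok)
```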

4. Examples: We have

$$ \dcol_{i=0}^{\ell-1} (XT^i) = \left(\mathop{\rm diag}_{i=0}^{\ell-1} X\right) \left(\dcol_{i=0}^{\ell-1} T^i\right)\qquad\hbox{and}\qquad \drow_{i=0}^{\ell-1} (T^iY) = \left(\drow_{i=0}^{\ell-1} T^i\right) \left(\mathop{\rm diag}_{i=0}^{\ell-1} Y\right). $$

In general,

$$ \mathop{\rm diag}_{i\in U} \mathop{\rm diag}_{k\in V} A_k \ne \mathop{\rm diag}_{k\in V} \mathop{\rm diag}_{i\in U} A_i. $$

Next follows the representation of the difference of the solutions of two difference equations. This theorem repeatedly plays an important role in the main theorems that follow.

5. Theorem: (Representation theorem) Assumptions: Let $\hat u_n$ and $u_n$ be the solutions of the two difference equations

$$ \left. \eqalign{ \hat u_{n+\ell}+A_{\ell-1}\hat u_{n+\ell-1}+\cdots+A_0\hat u_n &= h{\mskip 3mu}\varphi(\hat u_{n+\ell},\ldots,\hat u_n)+r_{n+\ell}\cr u_{n+\ell}+A_{\ell-1}u_{n+\ell-1}+\cdots+A_0u_n &= h{\mskip 3mu}\varphi(u_{n+\ell},\ldots,u_n)\cr } \right\} \qquad n=0,1,\ldots,N. $$

The “perturbations” $r_{n+\ell}$ correspond to the value $\hat u_{n+\ell}$. For brevity, set

$$ \left.\eqalign{ \delta_{n+\ell} &:= \hat u_{n+\ell} - u_{n+\ell}, \cr \hat\delta_{n+\ell} &:= \varphi(\hat u_{n+\ell},\ldots,\hat u_n) - \varphi(u_{n+\ell},\ldots,u_n) \cr }\right\} \qquad n=0,\ldots,N. $$

Let the difference equation for $\hat u_n$ have the starting values $\hat u_i := u_i + r_i$, for $i=0,\ldots,\ell-1$. Let $\delta_i := r_i$, for $i=0,\ldots,\ell-1$, and $r_\nu := \delta_\nu := \hat\delta_\nu := 0$, for $\nu>N$.

Claim:

$$ \eqalign{ \delta_n &= P_1 C_1^n \pmatrix{\delta_0\cr \vdots\cr \delta_{\ell-1}\cr} + P_1 \sum_{\nu=0}^{n-\ell} C_1^{n-1-\nu} R_1 \left( r_{\nu+\ell} + h \hat\delta_{\nu+\ell} \right) \cr &= P_1 C_1^n \pmatrix{\delta_0\cr \vdots\cr \delta_{\ell-1}\cr} + P_1 \sum_{\nu=0}^{n-\ell} C_1^{n-1-\nu} R_1 r_{\nu+\ell} + h P_1 \sum_{\nu=0}^{n-\ell} C_1^{n-1-\nu} R_1 \hat\delta_{\nu+\ell} \cr } $$

Proof: Follows from the general theorem on the solution of inhomogeneous linear matrix difference equations. The general solution of the difference equation

$$ x_{n+\ell}+A_{\ell-1}x_{n+\ell-1}+\cdots+A_0x_n = y_n, \qquad n=0,1,\ldots,N $$

reads, with $z_0$ denoting the stacked starting values $x_0,\ldots,x_{\ell-1}$,

$$ x_n = P_1 C_1^n z_0 + P_1 \sum_{\nu=0}^{n-1} C_1^{n-1-\nu} R_1 y_\nu. $$

With the above abbreviations for $\delta_n$ and $\hat\delta_n$, a difference equation for $\delta_n$ results:

$$ \delta_{n+\ell}+A_{\ell-1}\delta_{n+\ell-1}+\cdots+A_0\delta_n = h \hat\delta_{n+\ell} + r_{n+\ell}. $$

This equation has the solution representation

$$ \delta_n = P_1 C_1^n \pmatrix{\delta_0\cr \vdots\cr \delta_{\ell-1}\cr} + P_1 \sum_{\nu=0}^{n-1} C_1^{n-1-\nu} R_1 \left(h\hat\delta_{\nu+\ell} + r_{\nu+\ell}\right) . $$

In the sum, the last $(\ell-1)$ terms vanish because $P_1 C_1^i R_1=0$ for $i=0,\ldots,\ell-2$. That $P_1 C_1^{\ell-1} R_1 = I$ is not needed yet. For $\nu>n-\ell$ we have $n-1-\nu \le \ell-2$. Hence exactly the claimed equation follows, as stated above. ☐
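The solution formula used in the proof can be compared against the direct recursion. The sketch below does this for random data with $k=2$, $\ell=3$, already using the truncated sum up to $\nu=n-\ell$. The helper `companion`, the scaling of the random blocks and the seed are assumptions made for this example.

```python
import numpy as np

def companion(A):
    """First companion matrix C_1 for the blocks A = [A_0, ..., A_{l-1}] (each k x k)."""
    l, k = len(A), A[0].shape[0]
    C = np.zeros((k * l, k * l))
    C[: k * (l - 1), k:] = np.eye(k * (l - 1))    # identity blocks on the superdiagonal
    for i, Ai in enumerate(A):
        C[k * (l - 1):, i * k:(i + 1) * k] = -Ai   # last block row: -A_0 ... -A_{l-1}
    return C

rng = np.random.default_rng(1)
k, l, N = 2, 3, 8
A = [0.3 * rng.standard_normal((k, k)) for _ in range(l)]
y = [rng.standard_normal(k) for _ in range(N)]
x = [rng.standard_normal(k) for _ in range(l)]     # starting values x_0 .. x_{l-1}

# Direct recursion: x_{n+l} = y_n - (A_{l-1} x_{n+l-1} + ... + A_0 x_n)
for n in range(N):
    x.append(y[n] - sum(A[i] @ x[n + i] for i in range(l)))

C = companion(A)
P1 = np.hstack([np.eye(k), np.zeros((k, k * (l - 1)))])
R1 = np.vstack([np.zeros((k * (l - 1), k)), np.eye(k)])
z0 = np.concatenate(x[:l])
power = np.linalg.matrix_power

def closed_form(n):
    """x_n = P1 C^n z0 + P1 sum_{nu=0}^{n-l} C^(n-1-nu) R1 y_nu (terms nu > n-l vanish)."""
    val = P1 @ power(C, n) @ z0
    for nu in range(max(n - l + 1, 0)):
        val = val + P1 @ power(C, n - 1 - nu) @ R1 @ y[nu]
    return val

max_err = max(np.abs(closed_form(n) - x[n]).max() for n in range(N + l))
print(max_err)
```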

The following inequality does not give the best possible estimate for $\ell\ge2$, but it remains easy to handle and is needed later in the proof of the main theorem.

6. Lemma: (Estimation lemma) Assumption: Let $\varphi(\cdot)$ be Lipschitz continuous in each component with the Lipschitz constants $K_i$. Let the values $\delta_{\nu+\ell}$ and $\hat\delta_{\nu+\ell}$ be defined as above.

Claim:

$$ \eqalign{ \sum_{\nu=0}^{n-\ell} \mathopen| \hat\delta_{\nu+\ell} \mathclose| &\le K_\ell\mathopen|\delta_n\mathclose| + \left(\sum_{i=0}^\ell K_i\right) \left(\sum_{\nu=0}^{n-1} \mathopen|\delta_\nu\mathclose|\right)\cr & \le \left(\sum_{i=0}^\ell K_i\right) \left(\sum_{\nu=0}^n \mathopen|\delta_\nu\mathclose|\right)\cr & \le (\ell+1)\cdot\left( \max_{i=0}^\ell K_i\right)\sum_{\nu=0}^n \mathopen|\delta_\nu\mathclose|.\cr } $$

Proof: For $\nu=0,\ldots,n-1$ we have

$$ \eqalign{ \mathopen| \hat\delta_{\nu+\ell} \mathclose| &= \left| \varphi(\hat u_{\nu+\ell},\ldots,\hat u_\nu) - \varphi(u_{\nu+\ell},\ldots,u_\nu) \right|\cr &\le K_0 \mathopen|\delta_\nu\mathclose| + K_1 \mathopen|\delta_{\nu+1}\mathclose| + \cdots + K_\ell \mathopen|\delta_{\nu+\ell}\mathclose|.\cr } $$

Now, in a double use of notation that should not lead to misunderstandings, set for simplicity $\delta_\nu \gets \mathopen|\delta_\nu\mathclose|$ and $\hat\delta_\nu \gets \mathopen|\hat\delta_\nu\mathclose|$, i.e., the absolute value bars are simply omitted. With this,

$$ \eqalign{ \sum_{\nu=0}^{n-\ell} \hat\delta_{\nu+\ell} &\le \left(K_0\delta_0+\cdots+K_\ell\delta_\ell\right)+ \left(K_0\delta_1+\cdots+K_\ell\delta_{\ell+1}\right)+\cdots+ \left(K_0\delta_{n-\ell}+\cdots+K_\ell\delta_n\right)\cr &= K_0\left(\delta_0+\cdots+\delta_{n-\ell}\right)\cr & \qquad+K_1\left(\delta_1+\cdots+\delta_{n-\ell+1} \right)\cr & \qquad\qquad+\qquad\cdots\cr & \qquad\qquad\qquad+K_\ell\left(\delta_\ell+\cdots+\delta_n\right)\cr } $$

Summing and estimating along the columns of the above scheme immediately gives the first estimate, if the very last column with $K_\ell$ and $\delta_n$ is treated separately. The remaining claimed inequalities follow immediately from the first. ☐

For a convenient notation of the stability functionals appearing in the following main theorem, the following abbreviations are introduced. Recall

$$ P_1 := \pmatrix{I&0&\ldots&0\cr} \in\mathbb{C}^{k\times k\ell},\qquad R_1 := \pmatrix{0\cr \vdots\cr 0\cr I\cr} \in\mathbb{C}^{k\ell\times k}. $$

and the first companion matrix is $(I=I_{k\times k})$

$$ C_1 := \pmatrix{ 0 & I & 0 & \ldots & 0\cr 0 & 0 & I & \ldots & 0\cr \vdots & \vdots & \vdots & \ddots & \vdots\cr & & & \ldots & I\cr -A_0 & -A_1 & & \ldots & -A_{\ell-1}\cr} \in\mathbb{C}^{k\ell\times k\ell}. $$

Furthermore, let

$$ \bopo := \mathop{\rm diag}_{\nu=0}^N P_1 = \pmatrix{ I & 0 & \ldots & 0 &&&&&&&&&\cr & & & & I & 0 & \ldots & 0 &&&&&\cr & & & & & & & & \ddots &&&&\cr & & & & & & & & & I & 0 & \ldots & 0\cr} \in\mathbb{C}^{(N+1)k\times(N+1)k\ell}, $$

and

$$ \boro := \mathop{\rm diag}\left(I_{k\ell\times k\ell},{\mskip 3mu} \mathop{\rm diag}_{\nu=1}^N R_1\right) = \pmatrix{ I_{k\ell\times k\ell} &&&\cr & 0 &&\cr & \vdots &&\cr & 0 &&\cr & I &&\cr & & \ddots &\cr & && 0\cr & && \vdots\cr & && 0\cr & && I\cr} \in\mathbb{C}^{(N+1)k\ell\times(N+\ell)k}, $$

where

$$ \bfR := \mathop{\rm col}_{\nu=0}^{N+\ell-1} r_\nu = \pmatrix{r_0\cr \vdots\cr r_{N+\ell-1}\cr} \in\mathbb{C}^{(N+\ell)k}. $$

For the product we have: $[C_1]^{-1} \boro \in \mathbb{C}^{(N+1)k\ell \times (N+\ell)k}$. Let $(X,T,Y)$ be an arbitrary standard triple. Further, let

$$ \ovbf X := \mathop{\rm diag}_{\nu=0}^N X = \pmatrix{ X&&&\cr &X&&\cr &&\ddots&\cr &&&X\cr} $$

and

$$ \ovbf Y := \mathop{\rm diag}\left[\left(\dcol_{i=0}^{\ell-1} XT^i\right)^{-1}, \mathop{\rm diag}_{\nu=1}^N Y\right] = \pmatrix{ \left(\mathop{\rm col}_{i=0}^{\ell-1} XT^i\right)^{-1} &&&\cr &Y&&\cr &&\ddots&\cr &&&Y\cr}. $$

By the biorthogonality relation,

$$ \left( \mathop{\rm col}_{i=0}^{\ell-1} XT^i \right)^{-1} = \left( \mathop{\rm row}_{i=0}^{\ell-1} T^iY \right) B, $$

with the block Hankel matrix $B$ given by

$$ B = \pmatrix{ A_1 & \ldots & A_\ell\cr \vdots & \unicode{x22F0} & \cr A_\ell & & \cr }, \qquad A_\ell = I. $$
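The biorthogonality relation can be checked numerically for a concrete standard triple. The sketch below uses the first companion triple $(P_1,C_1,R_1)$ and, as an assumption made for this example, a second triple obtained from it by a similarity transformation with a matrix $S$ (that is, $X=P_1S$, $T=S^{-1}C_1S$, $Y=S^{-1}R_1$). The helper names and the random data are hypothetical.

```python
import numpy as np

def companion(A):
    """First companion matrix C_1 for the blocks A = [A_0, ..., A_{l-1}] (each k x k)."""
    l, k = len(A), A[0].shape[0]
    C = np.zeros((k * l, k * l))
    C[: k * (l - 1), k:] = np.eye(k * (l - 1))
    for i, Ai in enumerate(A):
        C[k * (l - 1):, i * k:(i + 1) * k] = -Ai
    return C

def hankel_B(A_full, k, l):
    """Block Hankel matrix with block (i, j) = A_{i+j+1} if i+j+1 <= l, else 0 (A_l = I)."""
    B = np.zeros((k * l, k * l))
    for i in range(l):
        for j in range(l - i):
            B[i * k:(i + 1) * k, j * k:(j + 1) * k] = A_full[i + j + 1]
    return B

rng = np.random.default_rng(2)
k, l = 2, 3
A = [rng.standard_normal((k, k)) for _ in range(l)]   # A_0 .. A_{l-1}
A_full = A + [np.eye(k)]                               # append A_l = I
C1 = companion(A)
P1 = np.hstack([np.eye(k), np.zeros((k, k * (l - 1)))])
R1 = np.vstack([np.zeros((k * (l - 1), k)), np.eye(k)])

# Similar triple (assumed construction for this example)
S = np.eye(k * l) + 0.1 * rng.standard_normal((k * l, k * l))
S_inv = np.linalg.inv(S)
X, T, Y = P1 @ S, S_inv @ C1 @ S, S_inv @ R1

power = np.linalg.matrix_power
col = np.vstack([X @ power(T, i) for i in range(l)])
row = np.hstack([power(T, i) @ Y for i in range(l)])
B = hankel_B(A_full, k, l)
err = np.abs(col @ (row @ B) - np.eye(k * l)).max()    # (col)^{-1} = row * B
print(err)
```

For the companion triple itself, $\mathop{\rm col}_{i=0}^{\ell-1} P_1C_1^i$ is the identity, so the relation reduces to $\left(\mathop{\rm row}_{i=0}^{\ell-1} C_1^iR_1\right) B = I$.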

The special treatment of the block matrix in the first “diagonal element” of $\boro$ and $\ovbf Y$ originates in the solution representation of a difference equation for matrix polynomials, of the form

$$ x_n = XJ^n\left(\mathop{\rm col}_{i=0}^{\ell-1} XJ^i\right)^{-1}\pmatrix{y_0\cr \vdots\cr y_{\ell-1}\cr} + X \sum_{\nu=0}^{n-1} J^{n-1-\nu} Y y_{\nu+\ell}. $$

For the case $\ell=1$, i.e. $\rho(\mu)=I\mu-A$, $P_1$ and $R_1$ reduce to identity matrices of size $k\times k$. The biorthogonality relation shrinks to $X=Y^{-1}$ or $X^{-1}=Y$.

4. Stability functionals for fixed step sizes

Cf. Peter Albrecht, "Die numerische Behandlung gewöhnlicher Differentialgleichungen: Eine Einführung unter besonderer Berücksichtigung zyklischer Verfahren", 1979, as well as Peter Albrecht, 1985.

First, for clarity, a part of the proof of the main theorem below is brought forward. Later this lemma will be extended. There are further equivalences between stability functionals.

1. Lemma: Assumption: Let $C_1^i := {\bf0} \in \mathbb{C}^{k\ell\times k\ell}$, if $i<0$.

Claim: The shortened stability functional is norm-equivalent to the original stability functional that generates it, i.e.,

$$ \left| [C_1]^{-1} \boro \bfR \right| \sim \left| \bopo [C_1]^{-1} \boro \bfR \right|. $$

Proof: The proof is split into two parts. The two stability functionals are estimated against each other. The estimate $\left|\bopo [C_1]^{-1} \boro \bfR\right| \le \left|\bopo\right| \left|[C_1]^{-1} \boro \bfR\right|$ is clear, where the row-sum norm of $\bopo$ is independent of $N$. The other direction of the estimate takes the behavior of the companion matrix $C_1$ into account more closely. Compare also the two examples below, which illustrate the “quasi-nilpotent” character of the powers of the matrices $C_1$. One uses

$$ \left| C_1^n z_0 \right| \le \left| C_1^{\ell-1} \right| {\mskip 3mu} \left| C_1^{n-\ell+1} z_0 \right| = \left| C_1^{\ell-1} \right| {\mskip 3mu} \max_{i=0}^{\ell-1} \left| P_1 C_1^{n+i-\ell+1} z_0 \right|, $$

because

$$ \left| C_1^n z_0 \right| = \max_{i=0}^{\ell-1} \left| P_1 C_1^{n+i} z_0 \right|. $$

This last identity is rooted in the just mentioned “quasi-nilpotent” property of the companion matrix $C_1$. Pulling out $C_1^{\ell-1}$ is admissible because in the $\sup$-norm of $\left| \bopo [C_1]^{-1} \boro \bfR \right|$ the supremum is still taken over all rows. No value is lost in forming the supremum. Finally,

$$ \left| C_1^{n-1-\nu} R_1 r_{\nu+\ell} \right| \le \left| C_1^{\ell-1} \right| {\mskip 3mu} \left| C_1^{n-\ell-\nu} R_1 r_{\nu+\ell} \right| = \left| C_1^{\ell-1} \right| {\mskip 3mu} \max_{i=0}^{\ell-1} \left| P_1 C_1^{n-\ell-\nu+i} R_1 r_{\nu+\ell} \right|, $$

because $r_\nu := 0$ for $\nu>N$. ☐
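The identity $\left| C_1^n z_0 \right| = \max_{i=0}^{\ell-1} \left| P_1 C_1^{n+i} z_0 \right|$ used in the proof can be checked numerically if the norm on the stacked vector is the maximum norm. The sketch below does this for random data. The choice of the maximum norm, the helper `companion` and the random blocks are assumptions made for this example.

```python
import numpy as np

def companion(A):
    """First companion matrix C_1 for the blocks A = [A_0, ..., A_{l-1}] (each k x k)."""
    l, k = len(A), A[0].shape[0]
    C = np.zeros((k * l, k * l))
    C[: k * (l - 1), k:] = np.eye(k * (l - 1))    # identity blocks on the superdiagonal
    for i, Ai in enumerate(A):
        C[k * (l - 1):, i * k:(i + 1) * k] = -Ai   # last block row: -A_0 ... -A_{l-1}
    return C

rng = np.random.default_rng(3)
k, l, n = 2, 3, 5
A = [0.5 * rng.standard_normal((k, k)) for _ in range(l)]
C = companion(A)
P1 = np.hstack([np.eye(k), np.zeros((k, k * (l - 1)))])
z0 = rng.standard_normal(k * l)

power = np.linalg.matrix_power
# Block i of C^n z0 equals P1 C^(n+i) z0, so the max norms coincide.
lhs = np.abs(power(C, n) @ z0).max()
rhs = max(np.abs(P1 @ power(C, n + i) @ z0).max() for i in range(l))
identity_holds = bool(np.isclose(lhs, rhs))
print(lhs, rhs, identity_holds)
```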

2. Example: Let $\ell=2$ and $N:=n:=3$. We have $\rho(\mu)=I\mu^2+A_1\mu+A_0\in\mathbb{C}^{k\times k}$, and the powers $C_1^\nu$ of the first companion matrix $C_1$, for $\nu=1,\ldots,N$, are:

$$ C_1 = \pmatrix{ 0 & I\cr -A_0 & -A_1\cr},\quad C_1^2 = \pmatrix{ -A_0 & -A_1\cr A_1A_0 & -A_0+A_1^2\cr},\quad C_1^3 = \pmatrix{ A_1A_0 & -A_0+A_1^2\cr A_0^2-A_1^2A_0 & A_0A_1+A_1A_0-A_1^3\cr}. $$

Recall

$$ P_1 = \pmatrix{I&0\cr} \in\mathbb{C}^{k\times 2k},\qquad R_1 = \pmatrix{0\cr I\cr}\in\mathbb{C}^{2k\times k}. $$

The matrices $\bopo$ and $\boro$ have the form

$$ \bopo = \pmatrix{ I&0 && &&\cr && I&0 &&\cr && && I&0\cr}\in\mathbb{C}^{3k\times6k},\qquad \boro = \pmatrix{ I&&&\cr &I&&\cr &&0&\cr &&I&\cr &&&0\cr &&&I\cr}\in\mathbb{C}^{6k\times4k}. $$

One computes

$$ [C_1]^{-1} \boro = \pmatrix{ I & \cr C_1 & R_1 &\cr C_1^2 & C_1R_1 & R_1 &\cr C_1^3 & C_1^2R_1 & C_1R_1 & R_1\cr } \in \mathbb{C}^{8k\times5k} $$

$$ \begin{pmatrix} \matrix{I&0\cr 0&I\cr} &\\[1em] \matrix{0&I\cr -A_0&-A_1\cr} & \matrix{0\cr I\cr} &\\[1em] \matrix{-A_0&-A_1\cr A_1A_0&-A_0+A_1^2\cr} & \matrix{I\cr -A_1\cr} & \matrix{0\cr I\cr} &\\[1em] \matrix{A_1A_0&-A_0+A_1^2\cr A_0^2-A_1^2A_0&A_0A_1+A_1A_0-A_1^3\cr}& \matrix{-A_1\cr -A_0+A_1^2\cr} & \matrix{I\cr -A_1\cr} & \matrix{0\cr I\cr} \cr \end{pmatrix} $$

A further demonstration shows how very quickly the overlined matrices “grow large”.

3. Example: Now let $\ell=3$ and $N=3$. We have $\rho(\mu)=I\mu^3+A_2\mu^2+A_1\mu+A_0 \in \mathbb{C}^{k\times k}$. Now one computes $C_1^\nu$ for $\nu=1,\ldots,N$:

$$ C_1 = \pmatrix{ 0 & I & 0\cr 0 & 0 & I\cr -A_0 & -A_1 & -A_2\cr},\qquad C_1^2 = \pmatrix{ 0 & 0 & I\cr -A_0 & -A_1 & -A_2\cr A_2A_0 & -A_0+A_2A_1 & -A_1+A_2^2\cr} $$

and

$$ C_1^3 = \pmatrix{ -A_0 & -A_1 & -A_2\cr A_2A_0 & -A_0+A_2A_1 & -A_1+A_2^2\cr A_1A_0-A_2^2A_0 & A_2A_0+A_1^2-A_2^2A_1 & -A_0+A_1A_2+A_2A_1-A_2^3\cr}. $$

Further, $\bopo\in\mathbb{C}^{4k\times4k\ell}$ and $\boro\in\mathbb{C}^{4k\ell\times6k}$ with

$$ \bopo = \pmatrix{ I&0&0 &&& &&& &&&\cr &&& I&0&0 &&& &&&\cr &&& &&& I&0&0 &&&\cr &&& &&& &&& I&0&0\cr},\qquad \boro = \pmatrix{ I && &&&\cr & I & &&&\cr && I &&&\cr &&& 0 &&\cr &&& 0 &&\cr &&& I &&\cr &&& & 0 &\cr &&& & 0 &\cr &&& & I &\cr &&& && 0\cr &&& && 0\cr &&& && I\cr}. $$

Now $[C_1]^{-1} \boro \in \mathbb{C}^{12k\times6k}$ with

$$ [C_1]^{-1} \boro = \pmatrix{ I &&&\cr C_1 & R_1 &&\cr C_1^2 & C_1R_1 & R_1 &\cr C_1^3 & C_1^2R_1 & C_1R_1 & R_1 } \in \mathbb{C}^{4k\ell\times6k}, $$

that is,

$$ \begin{pmatrix} \matrix{I&&\cr &I&\cr &&I\cr} &&&\\[1em] \matrix{0 & I & 0\cr 0 & 0 & I\cr -A_0 & -A_1 & -A_2\cr} & \matrix{0\cr 0\cr I\cr}&&\\[1em] \matrix{ 0 & 0 & I\cr -A_0 & -A_1 & -A_2\cr A_2A_0 & -A_0+A_2A_1 & -A_1+A_2^2\cr} & \matrix{0\cr I\cr -A_2\cr} & \matrix{0\cr 0\cr I\cr} &\\[1em] \matrix{ -A_0 & -A_1 & -A_2\cr A_2A_0 & -A_0+A_2A_1 & -A_1+A_2^2\cr A_1A_0-A_2^2A_0 & A_2A_0+A_1^2-A_2^2A_1 & -A_0+A_1A_2+A_2A_1-A_2^3\cr} & \matrix{I\cr -A_2\cr -A_1+A_2^2\cr} & \matrix{0\cr I\cr -A_2\cr} & \matrix{0\cr 0\cr I\cr}\cr \end{pmatrix} $$

The underlying scheme here is

$$ \begin{matrix} \matrix{1&1&1\cr 2&2&2\cr 3&3&3\cr}\\[1em] \matrix{2&2&2\cr 3&3&3\cr 4&4&4\cr}\\[1em] \matrix{\vdots & \vdots & \vdots\cr}\cr \end{matrix} $$
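One way to read this scheme is as a shift property: block row $i$ of $C_1^{m+1}$ coincides with block row $i+1$ of $C_1^m$, so only the last block row is new in each power. The sketch below checks this reading numerically for $\ell=3$, together with $P_1C_1^iR_1=0$ for $i\le\ell-2$ and $P_1C_1^{\ell-1}R_1=I$. The helper names and the random blocks are assumptions made for this example.

```python
import numpy as np

def companion(A):
    """First companion matrix C_1 for the blocks A = [A_0, ..., A_{l-1}] (each k x k)."""
    l, k = len(A), A[0].shape[0]
    C = np.zeros((k * l, k * l))
    C[: k * (l - 1), k:] = np.eye(k * (l - 1))
    for i, Ai in enumerate(A):
        C[k * (l - 1):, i * k:(i + 1) * k] = -Ai
    return C

rng = np.random.default_rng(4)
k, l = 2, 3
A = [rng.standard_normal((k, k)) for _ in range(l)]
C = companion(A)
P1 = np.hstack([np.eye(k), np.zeros((k, k * (l - 1)))])
R1 = np.vstack([np.zeros((k * (l - 1), k)), np.eye(k)])
power = np.linalg.matrix_power

def block_row(M, i):
    """Return block row i (k rows) of a matrix with k x k blocks."""
    return M[i * k:(i + 1) * k, :]

# Shift property: block row i of C^(m+1) equals block row i+1 of C^m.
shift_ok = all(
    np.allclose(block_row(power(C, m + 1), i), block_row(power(C, m), i + 1))
    for m in range(1, 4)
    for i in range(l - 1)
)

# "Quasi-nilpotent" behavior seen through P1 and R1.
vanish_ok = all(np.allclose(P1 @ power(C, i) @ R1, 0.0) for i in range(l - 1))
unit_ok = np.allclose(P1 @ power(C, l - 1) @ R1, np.eye(k))
print(shift_ok, vanish_ok, unit_ok)
```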

Now follows the announced main theorem, from which a corresponding convergence theorem for very general discretization methods can easily be derived. Furthermore, the theorem reveals several connections between different stability functionals. In certain situations each of the functionals has its specific advantages and disadvantages, and it pays to have several representations, or equivalent functionals, available. In particular, each of the representations should be cultivated so that they inform one another. Later, two further representations will be added, which again simplify certain investigations.

4. Hauptsatz: Voraussetzungen: $(P_1,C_1,R_1)$ sei das erste Begleiter-Tripel zum Matrixpolynom

$$ \rho(\mu) := I\mu^\ell+A_{\ell-1}\mu^{\ell-1}+\cdots+A_1\mu+A_0, $$

of degree $\ell\ge1$. Let the function $\varphi$ be Lipschitz continuous in each component with Lipschitz constants $K_i$, that is,

$$ \left|\varphi(u_\ell,\ldots,\hat u_i,\ldots,u_0)- \varphi(u_\ell,\ldots,u_i,\ldots,u_0)\right| \le K_i\cdot\left|\hat u_i-u_i\right|,\quad\hbox{for}\quad i=0,\ldots,\ell. $$

Let the powers of the matrix $C_1$ be bounded by the upper bound $D$, that is, $\left|C_1^\nu\right|\le D$, $\forall\nu\in\mathbb{N}$. Let $\xi$ and $\hat\xi$ be defined by

$$ \xi := \left|P_1\right| D \left|R_1\right| K_\ell,\qquad \hat\xi := \left|P_1\right| D \left|R_1\right| \left(\sum_{i=0}^\ell K_i\right). $$

The quantity $\hat\xi$ is a function of several variables: $\hat\xi=\hat\xi(P_1,D,R_1,K_0,\ldots,K_\ell)$. Let $\mathopen|b-a\mathclose|\ne0$. Let the step size $h$ be chosen such that, first, $\mathopen|b-a\mathclose| / \mathopen|h\mathclose|$ is a natural number and, second,

$$ \mathopen|h\mathclose| \lt \cases {1/\xi,&if $\xi\gt 0$;\cr \infty,&if $\xi=0$,\cr} $$

and let $N$ be defined implicitly by $N\mathopen|h\mathclose| = \mathopen|b-a\mathclose|$.

Claims: (1) Both (possibly) implicit difference equations

$$ \eqalign{ \hat u_{n+\ell}+A_{\ell-1}\hat u_{n+\ell-1}+\cdots+A_0\hat u_n &= h{\mskip 3mu}\varphi(\hat u_{n+\ell},\ldots,\hat u_n)+r_{n+\ell}\cr u_{n+\ell}+A_{\ell-1}u_{n+\ell-1}+\cdots+A_0u_n &= h{\mskip 3mu}\varphi(u_{n+\ell},\ldots,u_n)\cr } $$

have, for every $n$, a uniquely determined solution $\hat u_{n+\ell}$ and $u_{n+\ell}$, respectively, which can be computed by Picard iteration.

(2) For the maximal deviation in norm, $\left|\hat u_n-u_n\right|$, the following two-sided estimate in terms of the additive perturbations $r_n$ holds:

$$ c_1 \left| \bopo [C_1]^{-1} \boro \bfR \right| \le \left| \hat U-U\right| \le c_2 \left| \bopo [C_1]^{-1} \boro \bfR \right| \le c_3 N \left| \bfR \right| . $$

(3) The positive constants $c_i$, $i=1,2,3$, are given by

$$ c_1 = {1\over1+\hat\xi\mathopen|b-a\mathclose|},\qquad c_2 = {1\over1-\mathopen|h\mathclose|\xi} \exp{\hat\xi\mathopen|b-a\mathclose| \over 1-\mathopen|h\mathclose|\xi},\qquad c_3 = c_2 \left|P_1\right| D \left|R_1\right|. $$

(4) The estimate in (2) is independent of the choice of the standard triple, i.e.,

$$ \bov X1 [T_1]^{-1} \bovy1 \bfR = \bov X2 [T_2]^{-1} \bovy2 \bfR, $$

for any two standard triples $(X_1,T_1,Y_1)$ and $(X_2,T_2,Y_2)$ of the matrix polynomial $\rho$.

(5) The truncated functional $\left|[C_1]^{-1} \boro \bfR \right|$ is likewise a stability functional and is equivalent to the untruncated functional, independently of $N$, i.e.,

$$ \left| \bopo [C_1]^{-1} \boro \bfR \right| \sim \left| [C_1]^{-1} \boro \bfR \right|. $$

(6) Truncated stability functionals are equivalent to one another under a change of the standard triple, but are no longer necessarily equal. We have

$$ \left| [T_1]^{-1} \bovy1 \bfR \right| \sim \left| [T_2]^{-1} \bovy2 \bfR \right|. $$

Proof: As before, we abbreviate

$$ \delta_{n+\ell} := \hat u_{n+\ell} - u_{n+\ell},\qquad \hat\delta_{n+\ell} := \varphi(\hat u_{n+\ell},\ldots,\hat u_n) - \varphi(u_{n+\ell},\ldots,u_n). $$

On (1): For every $n$, both difference equations are Lipschitz continuous equations of Kepler type in $\hat u_{n+\ell}$ and $u_{n+\ell}$, respectively. The fixed-point equations for $\hat F$ and $F$, with

$$ \hat u_{n+\ell} = \hat F(\hat u_{n+\ell}) := h\varphi(\hat u_{n+\ell},\ldots{\mskip 5mu})+\hat\psi, \qquad\hbox{resp.}\qquad u_{n+\ell} = F(u_{n+\ell}) := h\varphi(u_{n+\ell},\ldots{\mskip 5mu})+\psi, $$

are contractive if $\mathopen|h\mathclose| K_\ell \lt 1$. The restriction on $h$ assumed above, namely $\mathopen|h\mathclose|\xi\lt 1$, ensures that this sufficient condition for contraction is satisfied. On a suitable complete subspace, existence and uniqueness of a fixed point can then be deduced.

On (2): a) By the lemma on the representation of the difference of the solutions of two difference equations (see the representation theorem), rearranging immediately yields the chain of estimates

$$ \eqalignno{ \left|P_1 C_1^n \pmatrix{r_0\cr \vdots\cr r_{\ell-1}\cr} + P_1 \sum_{\nu=0}^{n-1} C_1^{n-1-\nu} R_1 r_{\nu+\ell}\right| &\le \mathopen|\delta_n\mathclose| + \mathopen|h\mathclose| \left|P_1\right| D \left|R_1\right| \sum_{\nu=0}^{n-\ell} \left|\hat\delta_{\nu+\ell}\right| \cr &\le \mathopen|\delta_n\mathclose| + \mathopen|h\mathclose| \left|P_1\right| D \left|R_1\right| \left(\sum_{i=0}^\ell K_i\right) \sum_{\nu=0}^{n-1} \mathopen|\delta_\nu\mathclose| \cr &\le \mathopen|\delta_n\mathclose| + \mathopen|b-a\mathclose| \left|P_1\right| D \left|R_1\right| \left(\sum_{i=0}^\ell K_i\right) \sup_{\nu=0}^{n-1} \mathopen|\delta_\nu\mathclose| \cr &\le \sup_{\nu=0}^n \mathopen|\delta_\nu\mathclose| + \mathopen|b-a\mathclose| \left|P_1\right| D \left|R_1\right| \left(\sum_{i=0}^\ell K_i\right) \sup_{\nu=0}^n \mathopen|\delta_\nu\mathclose| \cr &= \left( 1+\hat\xi\mathopen|b-a\mathclose| \right) \sup_{\nu=0}^n \mathopen|\delta_\nu\mathclose| . \cr } $$

Here the estimate

$$ \sum_{\nu=0}^{n-\ell} \left|\hat\delta_{\nu+\ell}\right| \le K_\ell \mathopen|\delta_n\mathclose| + \left(\sum_{i=0}^{\ell-1} K_i\right) \left(\sum_{\nu=0}^{n-1} \mathopen|\delta_\nu\mathclose|\right) $$

from the estimation theorem was used, as well as the equation $N\mathopen|h\mathclose| = \mathopen|b-a\mathclose|$ and finally the estimate $\sum_{\nu=0}^{n-1} \mathopen|\delta_\nu\mathclose| \le N\sup_{\nu=0}^{n-1} \mathopen|\delta_\nu\mathclose|$. Multiplying through by

$$ { 1 \over 1 + \hat\xi \mathopen|b-a\mathclose| } $$

yields the first inequality of claim (2), with the constant $c_1$ arising accordingly.

b) Again by the theorem on the representation of the difference of the solutions of two difference equations (see the representation theorem), passing to norms immediately gives

$$ \eqalign{ \mathopen|\delta_n\mathclose| &\le \mathopen|h\mathclose|{\mskip 3mu}\left|P_1\right| D \left|R_1\right| \sum_{\nu=0}^{n-\ell} \left|\hat\delta_{\nu+\ell}\right| + \left| P_1 C_1^n \pmatrix{r_0\cr \vdots\cr r_{\ell-1}\cr} + P_1 \sum_{\nu=0}^{n-1} C_1^{n-1-\nu} R_1 r_{\nu+\ell} \right| \cr &\le \mathopen|h\mathclose| {\mskip 3mu} \underbrace{ \left|P_1\right| D \left|R_1\right| \left( \sum_{i=0}^\ell K_i \right) }_{\displaystyle{{}=\hat\xi}} \sum_{\nu=0}^{n-1} \mathopen|\delta_\nu\mathclose| + \mathopen|h\mathclose| {\mskip 3mu} \underbrace{ \left|P_1\right| D \left|R_1\right| K_\ell } _{\displaystyle{{}=\xi}} \mathopen|\delta_n\mathclose| + \left| \bopo [C_1]^{-1} \boro \bfR \right|, \cr } $$

where the estimation theorem was used again; rearranging gives

$$ \left(1-\mathopen|h\mathclose|\xi\right) \mathopen|\delta_n\mathclose| \le \mathopen|h\mathclose|\hat\xi \sum_{\nu=0}^{n-1} \mathopen|\delta_\nu\mathclose| + \left| \bopo [C_1]^{-1} \boro \bfR \right|. $$

By the assumption on $h$, namely $\mathopen|h\mathclose|\lt 1/\xi$, we have $1-\mathopen|h\mathclose|\xi\gt 0$. Applying the discrete Gronwall lemma with the following substitutions, in the notation used there,

$$ \varepsilon_{j+1} \gets \mathopen|\delta_n\mathclose|,\qquad \eta_j \gets {\left|\bopo[C_1]^{-1}\boro\bfR\right| \over1-\mathopen|h\mathclose|\xi}, \qquad \delta \gets {\hat\xi \over 1-\mathopen|h\mathclose|\xi},\qquad h_\nu \gets \mathopen|h\mathclose|, $$

one now obtains the estimate

$$ \mathopen|\delta_n\mathclose| \le {\left|\bopo[C_1]^{-1}\boro\bfR\right| \over 1-\mathopen|h\mathclose|\xi} \exp {\hat\xi\mathopen|b-a\mathclose| \over 1-\mathopen|h\mathclose|\xi}. $$

This representation also shows how the constant $c_2$ arises. The constant $c_3$ follows immediately from standard estimates.

On (4): The standard triple $(X_1,T_1,Y_1)$ is similar to the standard triple $(X_2,T_2,Y_2)$ if and only if

$$ X_2=X_1S,\qquad T_2=S^{-1}T_1S,\qquad Y_2=S^{-1}Y_1, $$

or

$$ X_1 = X_2 S^{-1}, \qquad T_1 = S T_2 S^{-1}, \qquad Y_1 = S Y_2. $$

Now

$$ \eqalignno{ \bov X1 [T_1]^{-1} \bovy1 \bfR &= \left(\mathop{\rm diag}_{\nu=0}^N X_1\right) [T_1]^{-1} \mathop{\rm diag}\left[ \left(\drow_{i=0}^{\ell-1} T_1^iY_1\right) B, {\mskip 3mu} \mathop{\rm diag}_{\nu=1}^N Y_1 \right] \bfR \cr &= \left(\mathop{\rm diag}_{\nu=0}^N(X_2S^{-1})\right) [ST_2S^{-1}]^{-1} \mathop{\rm diag}\left\{ \drow_{i=0}^{\ell-1}\left[ \left(ST_2S^{-1}\right)^i SY_2\right] B, {\mskip 3mu} \mathop{\rm diag}_{\nu=1}^N (SY_2) \right\} \bfR \cr &= \left(\mathop{\rm diag}_{\nu=0}^N X_2\right) [T_2]^{-1} \mathop{\rm diag}\left[ \left(\drow_{i=0}^{\ell-1} T_2^iY_2\right) B, {\mskip 3mu} \mathop{\rm diag}_{\nu=1}^N Y_2 \right] \bfR \cr &= \bov X2 [T_2]^{-1} \bovy2 \bfR. \cr } $$

Here the computational rules for the operators $\mathop{\rm diag}$, $\mathop{\rm row}$ and $[\cdot]$ were used.

On (5): This was proved in the preceding lemma.

On (6): With the same notation as in the proof of (4), one computes

$$ \eqalign{ [T_1]^{-1} \bovy1 \bfR &= [ST_2S^{-1}]^{-1} \mathop{\rm diag}\left\{ \drow_{i=0}^{\ell-1}\left[ \left(ST_2S^{-1}\right)^i SY_2\right] B, {\mskip 3mu} \mathop{\rm diag}_{\nu=1}^N SY_2 \right\} \bfR \cr &= \left(\mathop{\rm diag}_{\nu=0}^N S\right) [T_2]^{-1} \mathop{\rm diag}\left[ \drow_{i=0}^{\ell-1} \left(T_2^iY_2\right) B,{\mskip 3mu} \mathop{\rm diag}_{\nu=1}^N Y_2 \right] \bfR \cr &= \left(\mathop{\rm diag}_{\nu=0}^N S\right) [T_2]^{-1} \bovy2 \bfR . \cr } $$

Multiplying from the left by $\mathop{\rm diag}_{\nu=0}^N S^{-1}$ immediately gives

$$ [T_2]^{-1} \bovy2 \bfR = \left(\mathop{\rm diag}_{\nu=0}^N S^{-1}\right) [T_1]^{-1} \bovy1 \bfR. $$

Hence the two truncated functionals are equivalent. ☐
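The constants of the main theorem are easy to evaluate numerically. The following is a minimal sketch, not part of the original text: the function name `stability_constants` and the sample values are assumptions chosen for illustration. The sample models the explicit Euler method for a right-hand side with Lipschitz constant 1, so that $\ell=1$, $K_0=1$, $K_1=0$, with all norms and $D$ set to 1.

```python
import math

def stability_constants(norm_P1, D, norm_R1, K, h, interval_length):
    """Return (xi, xi_hat, c1, c2, c3) as defined in the main theorem.

    K is the list [K_0, ..., K_ell] of Lipschitz constants,
    interval_length stands for |b-a|.
    """
    base = norm_P1 * D * norm_R1
    xi = base * K[-1]          # xi uses only K_ell
    xi_hat = base * sum(K)     # xi_hat uses all K_i
    if abs(h) * xi >= 1:
        raise ValueError("step size violates |h| * xi < 1")
    denom = 1.0 - abs(h) * xi
    c1 = 1.0 / (1.0 + xi_hat * interval_length)
    c2 = math.exp(xi_hat * interval_length / denom) / denom
    c3 = c2 * base
    return xi, xi_hat, c1, c2, c3

# Hypothetical sample: explicit Euler, Lipschitz constant 1, interval length 1.
xi, xi_hat, c1, c2, c3 = stability_constants(1.0, 1.0, 1.0, [1.0, 0.0], 0.01, 1.0)
print(xi, xi_hat, c1, c2, c3)
```

For an explicit method $K_\ell=0$, hence $\xi=0$ and the step-size restriction is vacuous; the exponential factor in $c_2$ remains.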

5. Remarks: On the assumptions: Because the step size $h$ is restricted in terms of the constant $\xi$, the result is of practical relevance only for short integration intervals and non-stiff differential equation problems, i.e., problems with a small Lipschitz constant. For stiff problems the constants $\xi$, $\hat\xi$, $c_2$, $c_3$ quickly become unreasonably large, and $c_1$ unreasonably small. The constants $\xi$ and $\hat\xi$ contain the upper bound $D$ for the matrix powers directly as a multiplicative factor. Both in turn enter the estimate through the exponential term in $c_2$. The split into two very similar constants $\xi$ and $\hat\xi$ is made only because $\xi$ is in general smaller than $\hat\xi$ and thus yields sharper bounds. One could manage with $\hat\xi$ alone, replacing $\xi$ by $\hat\xi$ throughout. $\hat\xi$ can in turn be replaced by $\left|P_1\right| D \left|R_1\right| \left(\ell+1\right) \left(\max_{i=0}^\ell K_i\right)$; compare the corresponding lemma with the relevant estimates, see the estimation theorem. If the first companion matrix $C_1$ satisfies the condition $\left|C_1^\nu\right|\le D$, $\forall\nu\in\mathbb{N}$, then so does every matrix similar to $C_1$, though possibly with a different $D$.

Numerical results of Tischer/Sacks-Davis (1983) and Tischer/Gupta (1985) (authors: Peter E. Tischer, Ron Sacks-Davis and Gopal K. Gupta) show that even for stiff differential equations the stability functional indicates the correct order of convergence, although the condition $\mathopen|h\mathclose| \xi \lt 1$ is violated. This suggests that the results of the main theorem hold more generally than the proof supporting it.

The core ingredients of the proofs in the main theorem are the representation theorem, the discrete Gronwall lemma and the estimates in Lemma 6. The upper bound $D$ for the matrix powers $\left|C_1^i\right|$ may depend on the dimension of the underlying differential equation $\dot y=f(t,y)$, on account of the relation $\left|C_1\otimes I\right|=\left|C_1\right|$, if the row-sum norm compatible with the maximum norm is used. The Lipschitz constants $K_i$ depend on the Lipschitz constant of $f$.

On (1): The solutions of the possibly implicit difference equations need not be computed by Picard iteration. The Newton-Raphson or the Newton-Kantorovich iteration can be used just as well. Liniger's starting-error theorem provides an upper bound for the number of iterations required. It turns out that a single iteration is often entirely sufficient; further iterations bring no improvement of practical interest. For step sizes $\mathopen|h\mathclose|$ that are not sufficiently small, the two difference equations given above, and hence the corresponding Kepler-type equations, may indeed have no solution or more than one.
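To illustrate the Picard iteration mentioned under (1), here is a minimal sketch for a single implicit Euler step $u_{n+1}=u_n+hf(u_{n+1})$. The helper `picard_solve`, the test function $f(y)=-y$ and the values of `h` and `u_n` are assumptions made for this example only; they do not come from the text.

```python
def picard_solve(g, start, tol=1e-14, max_iter=200):
    """Iterate x <- g(x) until two successive iterates differ by less than tol."""
    x = start
    for iteration in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("Picard iteration did not converge")

h = 0.1      # step size; |h| * K = 0.1 < 1, so the map below is a contraction
u_n = 1.0    # current value

def f(y):
    # Hypothetical right-hand side with Lipschitz constant K = 1.
    return -y

# Fixed-point form of the implicit Euler step: x = u_n + h * f(x).
u_next, iterations = picard_solve(lambda x: u_n + h * f(x), u_n)
exact = u_n / (1.0 + h)   # closed-form solution of x = u_n - h * x
print(u_next, exact, iterations)
```

Each iteration reduces the error by the factor $\mathopen|h\mathclose|K$, which is why very few iterations are needed for small step sizes.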

On (2): The stability functionals $\left|\bopo [C_1]^{-1} \boro \bfR\right|$, $\left|[C_1]^{-1} \boro \bfR\right|$ and $\left|\bfR\right|$ are independent of $\varphi(\cdot)$ and of the Lipschitz constants $K_i$, but depend on $N$ and thus ultimately on $h$ and/or the length $\mathopen|b-a\mathclose|$ of the integration interval. The constants $c_1$, $c_2$, $c_3$ depend on the Lipschitz constants $K_i$. Since the Lipschitz constants enter the main theorem as a central assumption and are also needed in the proof above, the functionals depend on them to that extent as well. Under the given assumptions of the main theorem, the exponential growth induced by the constant $c_2$ cannot readily be improved, as the two differential equations $\dot y=0$ and $\dot y=y$ with $y(0)=1$ show when the explicit Euler method $y_{n+1}=y_n+hf_n$ is applied. That this can also grossly overestimate the qualitative behaviour is shown by the two differential equations $\dot y=0$ and $\dot y=-y$, with $y(0)=1$, when the implicit Euler method $y_{n+1}=y_n+hf_{n+1}$ is used. This behaviour is already well known from the continuous Gronwall lemma and the theorems derived from it. There, however, theorems are also known that avoid this wrong prediction of the qualitative behaviour, see Hairer/Wanner/Nørsett (1987). These use, among other things, the logarithmic norm $\mu$, defined by

$$ \mu(A) := \lim_{\varepsilon\to0,{\mskip 3mu}\varepsilon\gt 0} {\left|I+\varepsilon A\right| - 1 \over \varepsilon}, A \in \mathbb{C}^{k\times k}. $$

For the Euclidean norm, the maximum norm and the 1-norm one obtains, respectively,

$$ \eqalignno{ \mu(A) &= \lambda_{\rm max}, \qquad \lambda_{\rm max} \hbox{ largest eigenvalue of } {1\over2}(A^\top+A),\cr \mu(A) &= \max_{i=1}^k \left(\mathop{\rm Re}\nolimits a_{ii} + \sum_{j\ne i} \left|a_{ij}\right|\right),\cr \mu(A) &= \max_{j=1}^k \left(\mathop{\rm Re}\nolimits a_{jj} + \sum_{i\ne j} \left|a_{ij}\right|\right). } $$
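The closed forms for the maximum norm (row-wise) and the 1-norm (column-wise) can be checked against the defining limit. The sketch below is an illustration under assumptions: the function names and the sample matrix `A` are made up for this example, and the limit is approximated by a small but finite $\varepsilon$.

```python
def log_norm_max(A):
    """Logarithmic norm for the maximum norm: diagonal plus off-diagonal row sums."""
    n = len(A)
    return max(A[i][i].real + sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))

def log_norm_one(A):
    """Logarithmic norm for the 1-norm: diagonal plus off-diagonal column sums."""
    n = len(A)
    return max(A[j][j].real + sum(abs(A[i][j]) for i in range(n) if i != j)
               for j in range(n))

def row_sum_norm(A):
    """Matrix norm induced by the maximum norm."""
    return max(sum(abs(x) for x in row) for row in A)

def difference_quotient(A, eps):
    """(|I + eps*A| - 1) / eps for the row-sum norm, approximating the limit."""
    n = len(A)
    shifted = [[(1.0 if i == j else 0.0) + eps * A[i][j] for j in range(n)]
               for i in range(n)]
    return (row_sum_norm(shifted) - 1.0) / eps

A = [[-3.0, 1.0], [2.0, -4.0]]   # hypothetical sample matrix
mu_max = log_norm_max(A)
mu_one = log_norm_one(A)
approx = difference_quotient(A, 1e-6)
print(mu_max, mu_one, approx)
```

Unlike a norm, the logarithmic norm can be negative, which is what allows decay to be predicted for problems such as $\dot y=-y$.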

On (4): Claim (4) shows that the stability functional is independent of the basis representation and of the choice of basis. The estimates are invariant under the choice of the standard triple. Frequently suitable choices are the stability functional for the Jordan triple, for the first companion triple $(P_1,C_1,R_1)$, or for the second companion triple $(P_1B^{-1},C_2,BR_1)$, with the block Hankel matrix

$$ B := \pmatrix{ A_1 & A_2 & \ldots & A_{\ell-1} & I\cr A_2 & \unicode{x22F0} & \unicode{x22F0} & \unicode{x22F0} & \cr \vdots & \unicode{x22F0} & \unicode{x22F0} & & \cr A_{\ell-1} & I & & & \cr I & & & & \cr}. $$

On (5) and (6): Equality of the two truncated stability functionals can no longer be expected. Furthermore, one sees that the right eigenvectors, i.e., the columns of $X_1$ or $X_2$ (if one of the two belongs to a Jordan triple), have no “computational influence”, in contrast to the left eigenvectors. Of course the right eigenvectors influence the overall behaviour: if the right eigenvectors, represented by $X$, change, then by the biorthogonality relation

$$ \left(\mathop{\rm row}_{i=0}^{\ell-1} T^iY\right) {\mskip 3mu} B {\mskip 3mu} \left(\mathop{\rm col}_{i=0}^{\ell-1} XT^i\right) = I_{k\ell\times k\ell}, $$

the left eigenvectors, represented by $Y$, change as well. However, neither the right eigenvectors nor the inverse of $\mathop{\rm col}(XT^i)$ need to be computed. This is what “computationally” independent means here.

6. Corollary: Assumption: $\xi$, $c_1$, $c_2$ as in the main theorem.

Claim: $\xi\to0$, $c_1\to1$, $c_2\to1$, if $\displaystyle\max_{i=0}^\ell K_i \to 0$.

That is, the two-sided chain of inequalities degenerates into a chain of equalities if all Lipschitz constants $K_i$ tend to zero, which happens in particular for quadrature problems. An initial value problem for differential equations contains, with $\dot y=f(t)$, $y(a)=0$, $y(b)={}?$, the quadrature problem $\int_a^b f(t) dt$ as a special case.

In componentwise notation the main theorem reads as follows.

7. Main theorem: Assumption: Let
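As a worked special case, using only the constants defined in the main theorem: for a pure quadrature problem the increment function does not depend on $u$, so all Lipschitz constants vanish, and

$$ K_0=\cdots=K_\ell=0 \quad\Longrightarrow\quad \xi=\hat\xi=0,\qquad c_1={1\over1+0}=1,\qquad c_2={1\over1-0}\exp(0)=1. $$

The first two inequalities of claim (2) then collapse to

$$ \left|\hat U-U\right| = \left| \bopo [C_1]^{-1} \boro \bfR \right|. $$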

$$ \mathopen|h\mathclose| \lt {1\over \left|P_1\right| D \left|R_1\right| K_\ell} $$

Claim: (2) For the maximal deviation in norm, $\left|\hat u_n-u_n\right|$, the following two-sided estimate in terms of the additive perturbations $r_n$ holds:

$$ \eqalign{ c_1 \sup_{n=0}^N \left| P_1 C_1^n \pmatrix{r_0\cr \vdots\cr r_{\ell-1}\cr} + P_1 \sum_{\nu=0}^{n-1} C_1^{n-1-\nu} R_1 r_{\nu+\ell} \right| &\le \sup_{n=0}^N \left|\hat u_n-u_n\right| \cr &\le c_2 \sup_{n=0}^N \left| P_1 C_1^n \pmatrix{r_0\cr \vdots\cr r_{\ell-1}\cr} + P_1 \sum_{\nu=0}^{n-1} C_1^{n-1-\nu} R_1 r_{\nu+\ell} \right| \cr &\le c_3 N \sup_{n=0}^N \left|r_n\right| . \cr } $$

(4) The estimate in (2) is invariant under the choice of a standard triple, i.e.,

$$ \displaylines{ \sup_{n=0}^N \left| \hat X \hat T^n \left(\mathop{\rm row}_{i=0}^{\ell-1} \hat T^i \hat Y\right) B \pmatrix{r_0\cr \vdots\cr r_{\ell-1}\cr} + \hat X \sum_{\nu=0}^{n-1} \hat T^{n-1-\nu}\hat Y r_{\nu+\ell} \right| \cr {}= \sup_{n=0}^N \left| X T^n \left(\mathop{\rm row}_{i=0}^{\ell-1} T^i Y\right) B \pmatrix{r_0\cr \vdots\cr r_{\ell-1}\cr} + X \sum_{\nu=0}^{n-1} T^{n-1-\nu} Y r_{\nu+\ell} \right| , \cr } $$

for two standard triples $(X,T,Y)$ and $(\hat X,\hat T,\hat Y)$.

8. Remark: On (2): The componentwise representation shows that $r_n$ enters the stability functional without being “genuinely” multiplied by $R_1$ (or $Y$): $r_n$ belongs to the summand with $\nu=n-\ell$, while for $\nu\gt n-\ell$, i.e. $n-1-\nu\le\ell-2$, one recalls that $P_1C_1^{n-1-\nu}R_1=0$. In other words, for $\nu=n-\ell$ the vector $r_n$ cannot fall into the kernel of $P_1C_1^{n-1-\nu}R_1$. For discretization methods, where the $r_\nu$ are the local discretization error vectors, this has the following consequence: the order in $h$ of the $r_\nu$ can never be exceeded. Summation can of course cause a reduction of the order. If, for example, $r_\nu={\cal O}(h^{p+1})$, then $\left|\bopo [C_1]^{-1} \boro \bfR\right| = {\cal O}(h^{p+1+\varepsilon})$ with $\varepsilon\gt 0$ is impossible. For a discretization method a jump in order greater than 1 is nevertheless possible if certain components of $r_\nu\in\mathbb{C}^k$ are disregarded when determining the order. This is the case, for example, for Runge-Kutta methods.
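The vanishing of $P_1C_1^jR_1$ for small $j$ can be seen in a small scalar example ($k=1$, $\ell=3$). This is a sketch under assumptions: the coefficients in `coeffs` are arbitrary illustrative values, and `companion` and `matmul` are helper names introduced here.

```python
def companion(coeffs):
    """First companion matrix of mu^l + a_{l-1} mu^{l-1} + ... + a_0 (scalar case)."""
    ell = len(coeffs)
    C = [[0] * ell for _ in range(ell)]
    for i in range(ell - 1):
        C[i][i + 1] = 1                   # ones on the superdiagonal
    C[ell - 1] = [-a for a in coeffs]     # last row: -a_0, ..., -a_{l-1}
    return C

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

coeffs = [2, -3, 5]                       # hypothetical a_0, a_1, a_2
ell = len(coeffs)
C1 = companion(coeffs)
P1 = [[1] + [0] * (ell - 1)]              # row vector (1, 0, ..., 0)
R1 = [[0] for _ in range(ell - 1)] + [[1]]  # column vector (0, ..., 0, 1)^T

power = [[int(i == j) for j in range(ell)] for i in range(ell)]  # C1^0
products = []
for j in range(ell + 1):
    products.append(matmul(matmul(P1, power), R1)[0][0])  # P1 * C1^j * R1
    power = matmul(power, C1)
print(products)
```

The list contains zeros for $j\le\ell-2$, the value 1 for $j=\ell-1$, and $-a_2$ for $j=\ell$, so the most recent perturbation always enters with the factor 1.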

5. Projector stability functionals

In what follows we assume that the eigenvalues of $C_1$ on the unit circle consist only of 1, i.e., are not of the form $e^{i\varphi}$ [$\varphi\ne0 \pmod{2\pi}$], since otherwise the typical projector properties are lost. Let $E$ be the matrix that carries only the spectral properties of $C_1$ belonging to the dominant eigenvalue(s) $\mu=1$. Let $T$ be the matrix transforming $C_1$ to Jordan normal form, i.e.,

$$ C_1=TJT^{-1},\qquad J=\mathop{\rm diag}(1,\ldots,1,\hbox{further Jordan blocks}). $$

The matrix $E$ filters out of this representation only the eigenvalue(s) $\mu=1$, i.e.,

$$ E := T\hat JT^{-1}, \qquad \hat J := \mathop{\rm diag}(1,\ldots,1,0,\ldots,0). $$

It turns out that $E$ does not depend specifically on the matrix $C_1$.

The eigenvalues $\mu=1$ are precisely those eigenvalues that are responsible for the dominant local error. We have $\sum_{\nu=0}^N 1\to\infty$ as $N\to\infty$, whereas $\sum_{\nu=0}^\infty \mu^\nu$ converges for $\left|\mu\right|\lt 1$.

The matrix $E$ has a number of computational properties, which are collected in the following theorem. Here $\mathbb{N}$ is, as usual, the set of natural numbers starting from one.

1. Theorem: (Projector theorem) Let $E$ be defined as above. Let $S$ be an arbitrary matrix similar to $C_1$, i.e., $S=H^{-1}C_1H$. Then the following hold:

(1) $E$ is idempotent, i.e. $E^2=E$, and hence in general $E^i=E$ for all $i\in\mathbb{N}$. $E$ is therefore a projector.
(2) $E$ is independent of $C_1$. $E$ depends only on $V$ in the standard triple $(X,V,Y)$ of $C_1$.
(3) $SE=E=ES$. $S^\nu E=E=ES^\nu$, $\forall\nu\in\mathbb{N}$.
(4) $S^n=E+(S-E)^n$, $\forall n\in\mathbb{N}$.
(5) $[E]^{-1}[S]=[S-E]$, hence $\left|[E]^{-1}[S]\right| = 1+\left|S-E\right|$.
(6) $[S]^{-1}[E]=[S-E]^{-1}$, hence $\left|[S]^{-1}[E]\right| = 1 +\left|S-E\right| + \cdots + \left|(S-E)^N\right|.$
(7) We have

$$ [S-E]^{-1} \mathop{\rm diag}_{\nu=0}^N E = \mathop{\rm diag}_{\nu=0}^N E = \left(\mathop{\rm diag}_{\nu=0}^N E\right) [S-E]^{-1} $$

and

$$ [S-E] \mathop{\rm diag}_{\nu=0}^N E = \mathop{\rm diag}_{\nu=0}^N E = \left(\mathop{\rm diag}_{\nu=0}^N E\right) [S-E], \qquad [S]^{-1} = [S-E]^{-1} + [E]^{-1} - I. $$

Proof: On (1): $E^2=(T\hat JT^{-1})(T\hat JT^{-1})=T\hat J^2T^{-1}= T\hat JT^{-1}=E$.

On (2): This follows from the similarity of standard triples.

On (3): $SE=(TJT^{-1})(T\hat JT^{-1})=TJ\hat JT^{-1}=T\hat JT^{-1}=E$. The identity $E=ES$ is shown analogously.

On (4): Since $S$ and $E$ commute by (3) and $E$ is a projector by (1), one computes

$$ (S-E)^n = \sum_{\nu=0}^n {n\choose\nu} (-1)^\nu S^{n-\nu} E^\nu = S^n + \sum_{\nu=1}^n {n\choose\nu} (-1)^\nu E = S^n - E, $$

because

$$ 0 = (1-1)^n = \sum_{\nu=0}^n {n\choose\nu} (-1)^\nu \qquad\hbox{and}\qquad {n\choose 0}=1. $$

On (5): Compute the matrix product $[E]^{-1}[S]$ directly from the definition of $[X]$. The first subdiagonal always contains $E-S$, and all diagonals below it contain $E-ES$, which by claim (3) equals the zero matrix.

On (6): This follows immediately from (5). Inverting $[E]^{-1}[S]$ yields $[S]^{-1}[E]$. Invertibility is guaranteed by the definition of $[X]$. The stated norms (row-sum norm or column-sum norm) follow directly.

On (7): Clear. ☐

2. Example: On (5): For the product $[E]^{-1}[S]$ in the case $N=3$ one computes
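Properties (1), (3) and (4) of the projector theorem can be verified exactly on a small example. The following sketch uses a hypothetical $2\times2$ matrix `S` with eigenvalues 1 and 1/2; its spectral projector `E` for the eigenvalue 1 was worked out by hand for this illustration and is not taken from the text.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matpow(A, n):
    result = [[F(int(i == j)) for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        result = matmul(result, A)
    return result

# Hypothetical example: eigenvalues 1 (eigenvector (1,1)) and 1/2 (eigenvector (1,-1)).
S = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]
E = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]   # projector onto the eigenvalue 1
D = matsub(S, E)

idempotent = matmul(E, E) == E                       # property (1)
absorbing = matmul(S, E) == E == matmul(E, S)        # property (3)
power_split = all(matpow(S, n) == matadd(E, matpow(D, n))
                  for n in range(1, 8))              # property (4)
print(idempotent, absorbing, power_split)
```

Exact rational arithmetic is used so that the identities hold with `==` rather than up to rounding.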

$$ \pmatrix{ I & 0 & 0 & 0\cr E & I & 0 & 0\cr E^2 & E & I & 0\cr E^3 & E^2 & E & I\cr} \pmatrix{ I & 0 & 0 & 0\cr -S & I & 0 & 0\cr 0 & -S & I & 0\cr 0 & 0 & -S & I\cr} = \pmatrix{ I & 0 & 0 & 0\cr E-S & I & 0 & 0\cr E-ES & E-S & I & 0\cr E-ES & E-ES & E-S & I\cr} $$

We have $E-ES=0$.

The expressions following $\sup(\cdot)$ always involve a finite set, for which the maximum of course always exists. Why write $\sup(\cdot)$ rather than $\max(\cdot)$? Because later, in the convergence theorem, one wants to pass to $N\to\infty$ and then arrives at $\limsup(\cdot)$.

3. Theorem: Assumptions: $\left|S^\nu\right| \le D$, $\forall\nu\in\mathbb{N}$, $S=XJY$, where $J$ is the Jordan matrix (unique up to permutation). $X$ is the matrix carrying the right eigenvectors, and $Y$ contains the left eigenvectors in correspondingly reversed order, i.e., $SX=XJ$ and $YS=JY$. Furthermore $S=XJX^{-1}=Y^{-1}JY=XJY$.

Claims: (1) $\displaystyle \hat c_1 \left| [S]^{-1} \bfR \right| \le \left| \hat U-U \right| \le \hat c_2 \left| [S]^{-1} \bfR \right| \le \hat c_3 \left|\bfR\right|$.

(2) $\displaystyle \left|[S]^{-1}\bfR\right| \sim \left|[E]^{-1}\bfR\right| \sim \left|[J]^{-1} \ovbf Y \bfR\right|$.

(3) The constants $\hat c_1$, $\hat c_2$ and $\hat c_3$ are given by

$$ \hat c_1 = {c_1\over1+\left|S-E\right|},\qquad \hat c_2 = c_2 \sum_{\nu=0}^\infty \left|(S-E)^\nu\right|,\qquad \hat c_3 = \hat c_2 N \max\left(1,\left|E\right|\right). $$

The values $\hat c_1$ and $\hat c_2$ are independent of $N$, whereas $\hat c_3$ is not. $c_1, c_2$ are as in the main theorem.

Proof: One estimates $\left|[E]^{-1}\bfR\right|$ and $\left|[S]^{-1}\bfR\right|$ against each other. We have

$$ [E]^{-1} \bfR = [E]^{-1}[S] [S]^{-1}\bfR = [S-E] \cdot [S]^{-1} \bfR. $$

Multiplying through by $[S-E]^{-1}$ would now give $[S-E]^{-1} [E]^{-1} \bfR = [S]^{-1} \bfR$. Alternatively one could compute

$$ [S]^{-1}\bfR = [S]^{-1}[E] [E]^{-1} \bfR = [S-E]^{-1} \cdot [E]^{-1}\bfR. $$

By the projector theorem, $\left|[S-E]\right|$ and $\left|[S-E]^{-1}\right|$ are bounded for all $N$, and hence the two norms are equivalent. The last equivalence is due to

$$ [S]^{-1} \bfR = \ovbf X [J]^{-1} \ovbf Y \bfR. $$

☐

In componentwise notation the above theorem reads as follows.

4. Theorem: Assumptions: $\left|S^\nu\right|\le D$, $\forall \nu\in\mathbb{N}$. $E$ is as above, and the remaining assumptions of the main theorem hold.

Claims: (1) The stability norms

$$ \sup_{n=0}^N\left|\sum_{\nu=0}^n S^{n-\nu} r_\nu\right| \qquad\hbox{and}\qquad \sup_{n=0}^N\left|r_n + E \sum_{\nu=0}^{n-1} r_\nu\right| $$

are equivalent to each other.

(2) The two-sided error estimate

$$ \hat c_1\sup_{n=0}^N\left|r_n + E \sum_{\nu=0}^{n-1} r_\nu\right| \le \sup_{n=0}^N \left|\hat u_n-u_n\right| \le \hat c_2 \sup_{n=0}^N\left|r_n + E \sum_{\nu=0}^{n-1} r_\nu\right| \le \hat c_3 \sup_{n=0}^N \left|r_n\right|. $$
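The equivalence of the two stability norms can be observed numerically. The sketch below is an illustration under assumptions: the matrix `S`, its projector `E` and the perturbation sequence `r` are hypothetical sample data. For this particular `S` one has $\left|(S-E)^\nu\right|=2^{-\nu}$ in the row-sum norm, so the projector theorem gives the equivalence factors $1+\left|S-E\right|=3/2$ and $\sum_\nu\left|(S-E)^\nu\right|\le2$.

```python
from fractions import Fraction as F

# Hypothetical data: S has eigenvalues 1 and 1/2, E projects onto the eigenvalue 1.
S = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]
E = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def max_norm(x):
    return max(abs(a) for a in x)

def norm_full(r):
    """sup_n | sum_{nu=0}^n S^(n-nu) r_nu |, via d_n = S d_(n-1) + r_n."""
    d = [F(0), F(0)]
    best = F(0)
    for r_n in r:
        d = add(matvec(S, d), r_n)
        best = max(best, max_norm(d))
    return best

def norm_projector(r):
    """sup_n | r_n + E sum_{nu<n} r_nu |."""
    partial = [F(0), F(0)]
    best = F(0)
    for r_n in r:
        best = max(best, max_norm(add(r_n, matvec(E, partial))))
        partial = add(partial, r_n)
    return best

# Hypothetical perturbations r_0, ..., r_20.
r = [[F((-1) ** n, n + 1), F(1, (n + 1) ** 2)] for n in range(21)]
a = norm_full(r)
b = norm_projector(r)
print(float(a), float(b))
```

The second norm needs only running sums of the $r_\nu$ and one multiplication by $E$ per step, and no powers of $S$.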

5. Theorem: Assumptions: Let $(P_1,C_1,R_1)$ be the first companion triple and $(X,J,Y)$ the Jordan triple of the matrix polynomial

$$ \rho(\mu) = A_\ell\mu^\ell + A_{\ell-1}\mu^{\ell-1} + \cdots + A_0 \in \mathbb{C}^{k\times k}, $$

$\left|C_1^\nu\right|\le D$, $\forall\nu\in\mathbb{N}$. Let the matrix $\hat J$ filter out of $J$ only the Jordan blocks belonging to the eigenvalue $\mu=1$. Let $J$ itself contain no further dominant eigenvalues, so that $J$ is strongly $D$-stable. Between $C_1$ and $J$ there is the general relation

$$ C_1 \mathop{\rm col}_{i=0}^{\ell-1} XJ^i = \left(\mathop{\rm col}_{i=0}^{\ell-1} XJ^i\right) J. $$

Now let $\hat C_1$ be the matrix correspondingly similar to $\hat J$. Like $\hat J$, $\hat C_1$ is a projector.

Claim: The truncated projector stability functional is equivalent to the projector stability functional, which in turn is equivalent to the original stability functional; that is, independently of $N$,

$$ \left| [\hat C_1]^{-1} \boro \bfR \right| \sim \left| \bopo [\hat C_1]^{-1} \boro \bfR \right| \sim \left| [C_1]^{-1} \boro \bfR \right| \sim \left| \bopo [C_1]^{-1} \boro \bfR \right|. $$

These equivalences are independent of the choice of the standard triple; that is, likewise

$$ \left| [\hat J]^{-1} \ovbf Y \bfR \right| \sim \left| \ovbf X [\hat J]^{-1} \ovbf Y \bfR \right| \sim \left| [J]^{-1} \ovbf Y \bfR \right| \sim \left| \ovbf X [J]^{-1} \ovbf Y \bfR \right|. $$

Proof: The third equivalence was already stated and proved in the main theorem, as was the invariance with respect to the standard triple. For the remaining mutual estimates one computes

$$ \left| [\hat C_1]^{-1} \boro \bfR \right| = \left| [\hat C_1]^{-1} [C_1] [C_1]^{-1} \boro \bfR \right| = \left| [C_1-\hat C_1] [C_1]^{-1} \boro \bfR \right| $$

For the reverse estimate one computes

$$ \left| \bopo [C_1]^{-1} \boro \bfR \right| = \left| \bopo [C_1]^{-1} [\hat C_1] [\hat C_1]^{-1} \boro \bfR \right| = \left| \bopo [C_1-\hat C_1]^{-1} [\hat C_1]^{-1} \boro \bfR \right|. $$

The boundedness of the norms of $[C_1-\hat C_1]$ and $[C_1-\hat C_1]^{-1}$, entirely independent of $N$, was already shown earlier. At this point one then uses $\left|C_1^\nu\right|\le D$, $\forall\nu\in\mathbb{N}$. The boundedness of $\left|\bopo\right|$, independent of $N$, is likewise clear. ☐

If only a convergence result is desired, one may restrict oneself to the discrete Gronwall lemma and the representation theorem, and of the estimation theorem only the last, very easily seen estimate is needed. Finally, of the main theorem only (1), (2) and (3) are needed. Further simplifications arise if one restricts oneself to linear matrix polynomials of the form $\rho(\mu)=I\mu-S$, i.e., considers only the case $\ell=1$ throughout. The classes of methods covered remain the same, so ultimately no generality is lost. The quantities $\left|P_1\right|$, $\left|R_1\right|$ then disappear entirely. The proofs become shorter, but $S$ may have to be chosen unnecessarily large in applications to discretization problems. Admittedly, $S$ can often turn out smaller than $C_1$; for practical computations, however, it is not $C_1$ that plays the decisive role but $Y$.

6. Non-equidistant grids

1. By the consistency condition $\rho(1)=0$ for each stage of a composite method, we have $Sw=w$ ($S\in\mathbb{C}^{k\times k}$) with $w=(1,\ldots,1)^\top \in\mathbb{C}^k$, when the method is written in the form $u_{n+1}=Su_n+h\varphi(u_n,u_{n+1})$. With a cyclic or non-cyclic combination of several methods one arrives at $u_{n+1}=S_\nu S_{\nu-1} \ldots S_2 S_1 u_n + \ldots{\mskip 3mu}$. As a necessary and sufficient condition for stability one obtains

$$ \left\|S_i S_{i-1} \ldots S_{j+1} S_j\right\| \hbox{ bounded,}\qquad \forall i\gt j. $$

A typical case is the use of a base method $u_{n+1}=Su_n+h\varphi$ with variation of the step size $h$. If the above stability condition holds, the method is also called step-change stable. The stability condition in the above form is not easy to verify. Sufficient criteria, however, are much easier to handle; that is, one demands more, namely:

At a step-size change all matrices $S_i$ are identical, and/or
after a step-size change a special method with a matrix $T$ is applied, with the property that $S_iT=T$ (the "matrix-eating property").

The first case can be achieved by prescribing the $\alpha_{ij}$ coefficients of a method and then computing the $\beta_{ij}$ coefficients. The method is evidently stable (the matrix $S$ belonging to the $\alpha_{ij}$ was stable, after all), and it converges with the same order of convergence as $S$ if the $\beta_{ij}$ are chosen such that a sufficiently high order of consistency is attained. This is always possible by the dimension theorem for the consistency matrix $C_{p+1,k}$.

2. In the second case the matrix $T$, figuratively speaking, absorbs all preceding matrices. This can be arranged if the column vectors of $T$ are right eigenvectors of $S_i$ for the eigenvalue 1. Since all $S_i$ are consistent, $S_i w = w$ $\forall i$. Hence $T$ has the form

$$ T = (\varepsilon_1 w, \varepsilon_2 w, \ldots, \varepsilon_k w), \qquad \varepsilon_1,\ldots,\varepsilon_k \in \mathbb{C} $$

By the consistency of $T$ we must have $Tw=w$, that is,

$$ \varepsilon w = 1, \qquad \varepsilon=(\varepsilon_1,\ldots,\varepsilon_k), $$

so for $w=(1,\ldots,1)$ we get $\varepsilon_1+\cdots+\varepsilon_k=1$.
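The absorbing ("matrix-eating") property is easy to check for a concrete $T=(\varepsilon_1 w,\ldots,\varepsilon_k w)$. In this sketch the weights `eps` and the consistent matrix `S` (row sums equal to 1, hence $Sw=w$) are hypothetical values chosen only for illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

k = 3
w = [F(1)] * k                                  # w = (1, ..., 1)^T
eps = [F(1, 2), F(1, 3), F(1, 6)]               # hypothetical weights, sum = 1

# Column j of T is eps_j * w, i.e. T[i][j] = w_i * eps_j.
T = [[w[i] * eps[j] for j in range(k)] for i in range(k)]

# Hypothetical consistent matrix: every row sums to 1, so S w = w.
S = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(0), F(1), F(0)],
     [F(1, 3), F(1, 3), F(1, 3)]]

absorbs = matmul(S, T) == T          # S T = T
is_projector = matmul(T, T) == T     # T^2 = T
T_w = [sum(row) for row in T]        # T w for w = (1, ..., 1)^T
print(absorbs, is_projector, T_w == w)
```

Because every column of `T` is a multiple of `w`, any consistent matrix applied from the left leaves `T` unchanged, whatever the other entries of `S` are.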

Regardless of what $w$ looks like, $T$ is a projector, i.e. $T^2=T$, or equivalently: $\ker T$ and $\mathop{\rm Im} T$ are complementary subspaces of $\mathbb{C}^k$ ($\ker T\cap\mathop{\rm Im} T=\{0\}$, $\ker T+\mathop{\rm Im} T=\mathbb{C}^k$). Obviously, however, not every consistent projector is of the form $T=(\varepsilon_1 w,\ldots,\varepsilon_k w)$. Nevertheless, it turns out that for a strongly stable, consistent matrix one only has to wait for some time until it takes the desired form, i.e., one has to form a certain power of this matrix. An equivalent characterization is given by

3. Theorem: Spectral representation of step-change stable matrices.

Assumption: Let $T_\nu \in \mathbb{C}^{k\times k}$ ($\nu=1,\ldots,k-1$) with

$$ T_1 = \pmatrix{ 0&&&&\cr &0&&&\cr &&\ddots&&\cr &&&0&\cr &&&&1\cr }, \quad T_2 = \pmatrix{ 0&1&&&\cr &0&&&\cr &&\ddots&&\cr &&&0&\cr &&&&1\cr }, \quad \ldots, \quad T_{k-1} = \pmatrix{ 0&1&&&\cr &0&1&&\cr &&0&&\cr &&&\ddots&\cr &&&&1\cr }. $$

Claim: We have

$$ \left. \eqalign{ &T=(\varepsilon_1 w,\ldots,\varepsilon_k w)\cr &\varepsilon w=1\cr} \right\} \iff T\sim T_1 = T_\nu^\nu $$

Proof: “$\Rightarrow$”: The matrix $T=(\varepsilon_1 w,\ldots,\varepsilon_k w)$ has rank exactly 1: every minor with 2 or more rows vanishes (the columns are multiples of one another or zero columns), and $w\ne0$ together with $\varepsilon w=1$ implies $T\ne0$. Hence $T$ is similar to one of the matrices $T_\nu$, $\nu=1,\ldots,k-1$. Since the projector property is invariant under a change of basis ($P^2=P$ $\Rightarrow$ $S^{-1}PSS^{-1}PS=S^{-1}PS$), $T$ must be similar to $T_1$.

“$\Leftarrow$”: Let $w$ be a right eigenvector of $T$, let $X$ be the matrix of right Jordan vectors and $Y$ the matrix of left Jordan vectors, so $T=XT_1Y$, with

$$ X=(*,\ldots,*,w), \qquad Y=\pmatrix{*\cr\vdots\cr*\cr v\cr}. $$

Multiplication from the right by $T_1$ filters exactly $w$ out of $X$, multiplication from the left by $T_1$ filters exactly $v$ out of $Y$, so

$$ XT_1=(0,\ldots,0,w), \qquad T_1Y=\pmatrix{0\cr\vdots\cr0\cr v\cr}. $$

Obviously $T=(XT_1)(T_1Y)$ then has the required form $T=(v_1w,\ldots,v_kw)$ (dyadic product), with $vw=1$ because of the biorthogonality relation $XY=YX=I$. ☐

The proof shows at the same time that $\varepsilon T=\varepsilon$, i.e. $\varepsilon$ is a left eigenvector of $T$, which of course could also have been seen directly. The restriction in the second case to a single special method is immaterial by the first case, e.g. if one always applies the same special method several times. As one sees, $T_\nu$ only has to be repeated as often as the nilpotency index of the eigenvalue 0 indicates, i.e. $\nu$ times. For a $k$-step Adams-Moulton method this means $(k-1)$ times, and for a Runge-Kutta method once.

4. Theorem: Let the strongly stable matrix $T\in\mathbb{C}^{n\times n}$ have the eigenvalues $\lambda_1=1$ and $\left|\lambda_i\right|<1$ ($i=2,\ldots,n$). Let $Tw=w$, $v_1T=v_1$, $v_1w=1$, $v_1=\mathop{\rm row}(v_{1i})$, and let $T^\infty := (v_{11}w,\ldots,v_{1n}w)$. Then $T^\nu\to T^\infty$ ($\nu\to\infty$).

Proof: For $S:=T-T^\infty$ we obviously have $S^\nu=T^\nu-T^\infty$ ($\nu\in\mathbb{N}$), because $TT^\infty=T^\infty=(T^\infty)^\nu=T^\infty T$. With the left eigenvectors $v_2,\ldots,v_n$ for $\lambda_2,\ldots,\lambda_n$ one gets $v_iTT^\infty=v_iT^\infty=\lambda_iv_iT^\infty$, hence $v_iT^\infty=0$ and therefore $v_i(T-T^\infty)=v_iS=\lambda_iv_i$ for $i=2,\ldots,n$. Furthermore $v_1T^\infty=v_1=v_1T$, so $v_1(T-T^\infty)=v_1S=0$. Consequently $S$ has the eigenvalues $0,\lambda_2,\ldots,\lambda_n$, thus $\rho(S)<1$. ☐

Again $w$ need not equal $(1,\ldots,1)^\top$. $v_1w=1$ can always be achieved. For strongly stable consistent matrices either $T\sim T_\nu$, which is equivalent to $T^\nu=(\varepsilon_1w,\ldots,\varepsilon_nw)$, or at least the powers of $T$ converge to this form. This explains why a strongly stable cycle is repeated.
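The convergence $T^\nu\to T^\infty$ can be observed on a small example. This is a sketch with an assumed $2\times2$ matrix (eigenvalues $1$ and $1/4$, $w=(1,1)^\top$, $v_1=(1/3,2/3)$); none of these numbers come from the text.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(A, n):
    """n-th power of a square matrix by repeated multiplication."""
    P = [[F(int(i == j)) for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        P = matmul(P, A)
    return P

# Assumed example: consistent (rows sum to 1) and strongly stable.
T = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
w = [F(1), F(1)]
v1 = [F(1, 3), F(2, 3)]          # left eigenvector for eigenvalue 1, v1 w = 1

# T_inf = (v_11 w, v_12 w): column j equals v1[j] * w.
T_inf = [[v1[j] * w[i] for j in range(2)] for i in range(2)]
S = [[T[i][j] - T_inf[i][j] for j in range(2)] for i in range(2)]

nu = 10
T_nu = matpow(T, nu)
diff = [[T_nu[i][j] - T_inf[i][j] for j in range(2)] for i in range(2)]
print(max(abs(float(x)) for row in diff for x in row))
```

In this example $S$ has rank one with eigenvalue $1/4$, so $T^\nu-T^\infty=S^\nu=S/4^{\nu-1}$, which decays geometrically as the proof predicts.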

7. The eigenvalues of certain tridiagonal matrices

Let

$$ A = \mathop{\rm tridiag}(c,a,b) := \pmatrix{ a & b & & \cr c & \ddots & \ddots & \cr & \ddots & \ddots & b\cr & & c & a\cr } \in \mathbb{R}^{n\times n} , $$

and let $c\cdot b > 0$. Then $A$ is diagonalizable with the $n$ eigenvalues

$$ \lambda_i = a + 2\sqrt{bc}{\mskip 3mu}\cos{i\pi\over n+1}, \qquad i=1,\ldots,n. $$

Let

$$ E = \mathop{\rm tridiag}(1,0,1) = \pmatrix{ 0 & 1 & & \cr 1 & \ddots & \ddots & \cr & \ddots & \ddots & 1\cr & & 1 & 0\cr } \in \mathbb{R}^{n\times n}. $$

$E$ is diagonalizable with the eigenvalues

$$ \lambda_i = 2\cos{i\pi\over n+1},\qquad i=1,\ldots,n. $$

Furthermore,

$$ B = \mathop{\rm tridiag}(-1,a,-1) = \pmatrix{ a & -1 & & \cr -1 & \ddots & \ddots & \cr & \ddots & \ddots & -1\cr & & -1 & a\cr } \in \mathbb{C}^{n\times n},\quad \lambda_i=a-2\cos{i\pi\over n+1},\quad i=1,\ldots,n, $$

and

$$ T = \mathop{\rm tridiag}(-1,2,-1) = \pmatrix{ 2 & -1 & & \cr -1 & \ddots & \ddots & \cr & \ddots & \ddots & -1\cr & & -1 & 2\cr } \in \mathbb{R}^{n\times n},\quad \lambda_i=4\sin^2{i\pi\over 2(n+1)},\quad i=1,\ldots,n, $$

because $2-2\cos\vartheta=4\sin^2(\vartheta/2)$. With $T$ one then has for $C:=I-\alpha T=\mathop{\rm tridiag}(\alpha,1-2\alpha,\alpha)$, that is

$$ C = \pmatrix{ 1-2\alpha & \alpha & & \cr \alpha & \ddots & \ddots & \cr & \ddots & \ddots & \alpha\cr & & \alpha & 1-2\alpha\cr } \in \mathbb{R}^{n\times n},\quad \lambda_i=1-4\alpha\sin^2{i\pi\over 2(n+1)},\quad i=1,\ldots,n. $$
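The eigenvalue formula for $\mathop{\rm tridiag}(c,a,b)$ can be checked numerically. The sketch below uses the standard eigenvector ansatz $v_j=(c/b)^{j/2}\sin(j\vartheta_i)$, $\vartheta_i=i\pi/(n+1)$, which is not stated in the text and is therefore an assumption of this example, as are the sample values for $c$, $a$, $b$ and $n$.

```python
import math

def tridiag_matvec(c, a, b, v):
    """Return A v for A = tridiag(c, a, b): c below, a on, b above the diagonal."""
    n = len(v)
    out = []
    for j in range(n):
        left = c * v[j - 1] if j > 0 else 0.0
        right = b * v[j + 1] if j < n - 1 else 0.0
        out.append(left + a * v[j] + right)
    return out

def eigenpair(c, a, b, n, i):
    """Candidate eigenpair number i (1 <= i <= n), assuming b*c > 0."""
    theta = i * math.pi / (n + 1)
    # +sqrt(bc) for b, c > 0 and -sqrt(bc) for b, c < 0; since cos(theta_i)
    # runs through a symmetric set, both signs give the same set of eigenvalues.
    s = math.copysign(math.sqrt(b * c), b)
    lam = a + 2.0 * s * math.cos(theta)
    r = math.sqrt(c / b)
    vec = [r ** j * math.sin(j * theta) for j in range(1, n + 1)]
    return lam, vec

def worst_residual(c, a, b, n):
    """Largest entry of A v - lambda v over all n candidate eigenpairs."""
    worst = 0.0
    for i in range(1, n + 1):
        lam, v = eigenpair(c, a, b, n, i)
        Av = tridiag_matvec(c, a, b, v)
        worst = max(worst, max(abs(x - lam * y) for x, y in zip(Av, v)))
    return worst

n = 8
res_general = worst_residual(0.5, 1.0, 2.0, n)   # assumed sample values c, a, b
res_T = worst_residual(-1.0, 2.0, -1.0, n)       # T = tridiag(-1, 2, -1)

# For T the eigenvalues 2 - 2 cos(theta_i) coincide with 4 sin^2(theta_i / 2).
gap = max(
    abs(eigenpair(-1.0, 2.0, -1.0, n, i)[0]
        - 4.0 * math.sin(i * math.pi / (2 * (n + 1))) ** 2)
    for i in range(1, n + 1)
)
print(res_general, res_T, gap)
```

All three printed numbers should be at rounding-error level.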

8. Methods for parabolic equations

Parabolic partial differential equations can be transformed, by semi-discretization of the space variable, into a system of ordinary differential equations that is in general comparatively large. This system can then be solved numerically with the usual methods. Another way is to discretize completely. This may offer the possibility of exploiting connections between the space and time discretizations. This is briefly described here.

Consider the inhomogeneous heat equation

$$ u_t = \sigma u_{xx}+f(t,x,u), \qquad \sigma\gt 0 \hbox{ (material constant)} $$

with the boundary and initial data

$$ \eqalign{ u(0,x) &= \eta(x),\cr u(t,0) &= u(t,x_e),\cr }\quad \eqalign{ &\hbox{for all}\quad x\in[0,x_e],\cr &\hbox{for all}\quad t\in[0,t_e].\cr } $$

The differential expressions for $u_t$ and $u_{xx}$ are now replaced by difference expressions, namely as follows.

1) Forward difference in time, central difference in space at time $t$:

$$ \eqalign{ u_t &= {u(t+\tau)-u(t)\over\tau}+{\cal O}(\tau), \qquad\hbox{forward difference}\cr u_{xx} &= {u(x+h)-2u(x)+u(x-h)\over h^2}+{\cal O}(h^2),\qquad \hbox{central difference}\cr } $$

2) Central difference in time, central difference in space:

$$ \eqalign{ u_t &= {u(t+\tau)-u(t-\tau)\over2\tau}+{\cal O}(\tau^2),\qquad \hbox{central difference}\cr u_{xx} &= {u(x+h)-2u(x)+u(x-h)\over h^2}+{\cal O}(h^2),\qquad \hbox{central difference}\cr } $$

3) Central difference in time; in the central difference in space, $2u(t,x)$ is replaced by $u(t+\tau,x)+u(t-\tau,x)$:

$$ \eqalign{ u_t &= {u(t+\tau)-u(t-\tau)\over2\tau}+{\cal O}(\tau^2), \qquad\hbox{central difference}\cr u_{xx} &= {u(x+h)-\bigl(u(t+\tau,x)+u(t-\tau,x)\bigr)+u(x-h)\over h^2} +{\cal O}(h^2).\cr } $$

4) Forward difference in time, central difference in space at time $t+\tau$:

$$ \eqalign{ u_t &= {u(t+\tau)-u(t)\over\tau}+{\cal O}(\tau), \qquad\hbox{forward difference}\cr u_{xx} &= {u(t+\tau,x+h)-2u(t+\tau,x)+u(t+\tau,x-h)\over h^2} +{\cal O}(h^2), \qquad\hbox{central difference at $(t+\tau,x)$}.\cr } $$

5) Apply the trapezoidal rule ($\vartheta={1\over2}$ method), where, however, as was done throughout above, $f$ is not included in the implicit part.

$$ \eqalign{ u_t &= {u(t+\tau)-u(t)\over\tau}+{\cal O}(\tau), \qquad\hbox{forward difference}\cr u_{xx} &= \left({u(x+h)-2u+u(x-h)\over h^2} +{u(t+\tau,x+h)-2u(t+\tau,x)+u(t+\tau,x-h)\over h^2}\right)\bigg/2 +{\cal O}(h^2),\cr &\qquad\hbox{mean of two central differences}.\cr } $$

To simplify the notation, the arguments of no further interest were suppressed. The missing argument is always $t$ or $x$, respectively; e.g. $u(x+h)$ means $u(t,x+h)$ and so on. This notation $u(t,*)=u(*)$ and $u(*,x)=u(x)$ emphasizes the functional dependencies. $u$ is always a function of two variables. Of course one has to assume $u\in C^4([0,t_e]\times[0,x_e])$.

Let $N:=\lceil t_e/\tau\rceil$ be the number of time steps and let $M:=\lceil x_e/h\rceil$ be the number of space steps. Further let $t_i:=i\tau=0,\tau,2\tau,\ldots$, for $i=0,\ldots,N$, and $x_k:=kh=0,h,2h,\ldots$, for $k=0,\ldots,M$. The approximation for $u(t_i,x_k)$ is denoted by $u^i_k$. Correspondingly, $f^i_k$ is the approximation for $f(t_i,x_k,u(t_i,x_k))$. Obviously $t_N=t_e$ and $x_M=x_e$, provided $\tau$ divides $t_e$ and $h$ divides $x_e$.

1) With discretization 1) one now obtains the explicit one-step method by replacing $u_t$ and $u_{xx}$ accordingly. Collecting terms gives

$$ u^{i+1}_k=\left(1-{2\sigma\tau\over h^2}\right)u^i_k +{\sigma\tau\over h^2}\left(u^i_{k+1}+u^i_{k-1}\right)+\tau f^i_k, \qquad i=0,\ldots,N-1. $$

2) Substitution gives the explicit two-step method

$$ u^{i+1}_k=u^{i-1}_k+{2\sigma\tau\over h^2}\left(u^i_{k+1}-2u^i_k+u^i_{k-1}\right) +2\tau f^i_k,\qquad i=1,\ldots,N-1. $$

The value vector $u^1_k$ has to be obtained in some other way here, for example with the explicit one-step method above.

3) Substitution yields the implicit two-step method of DuFort/Frankel from 1953:

$$ (1+2\alpha)u^{i+1}_k=2\alpha\left(u^i_{k+1}+u^i_{k-1}\right) +(1-2\alpha)u^{i-1}_k+2\tau f^i_k,\qquad i=1,\ldots,N-1. $$

4) Substitution yields the implicit one-step method of Crank/Nicolson from 1947:

$$ -{\sigma\tau\over h^2}u^{i+1}_{k+1}+\left(1+{2\sigma\tau\over h^2}\right)u^{i+1}_k -{\sigma\tau\over h^2}u^{i+1}_{k-1}=u^i_k+\tau f^i_k, \qquad i=0,\ldots,N-1. $$

5) Substitution gives the implicit one-step method of Crank/Nicolson (II). It thus corresponds roughly to the trapezoidal rule:

$$ -{\alpha\over2}u^{i+1}_{k-1}+(1+\alpha)u^{i+1}_k-{\alpha\over2}u^{i+1}_{k+1} ={\alpha\over2}u^i_{k-1}+(1-\alpha)u^i_k+{\alpha\over2}u^i_{k+1}+\tau f^i_k, \qquad i=0,\ldots,N-1. $$

Above, the abbreviation $\alpha:=\sigma\tau/h^2$ was used, which will also be used in the following. All methods given above can be written in matrix notation, which makes it easier to study their stability behaviour than the componentwise form. For brevity let therefore from now on

$$ v^i := \pmatrix{u^i_1\cr \vdots\cr u^i_{M-1}\cr},\qquad v^0 := \pmatrix{\eta(x_1)\cr \vdots\cr \eta(x_{M-1})\cr},\qquad f^i := \pmatrix{f^i_1\cr \vdots\cr f^i_{M-1}\cr} $$

and let the matrices be defined by

$$ \displaylines{ A_1 := \mathop{\rm tridiag}(\alpha,1-2\alpha,\alpha),\quad A_2 := \mathop{\rm tridiag}(1,-2,1),\quad A_3 := \mathop{\rm tridiag}(1,0,1),\cr A_4 := \mathop{\rm tridiag}(-\alpha,1+2\alpha,-\alpha),\quad A_5 := \mathop{\rm tridiag}\left(-{\alpha\over2},1+\alpha,-{\alpha\over2}\right),\quad B_5 := \mathop{\rm tridiag}\left({\alpha\over2},1-\alpha,{\alpha\over2}\right).\cr } $$

With these vectors $v^i$, $f^i$ and the matrices $A_\nu$, $B_5$ all methods given above can now be written as follows.

1) $v^{i+1}=A_1v^i+\tau f^i$, for $i=0,\ldots,N-1$.

2) $v^{i+1}=2\alpha A_2v^i+v^{i-1}+2\tau f^i$, for $i=1,\ldots,N-1$.

3) $(1+2\alpha)v^{i+1}=2\alpha A_3v^i+(1-2\alpha)v^{i-1}+2\tau f^i$, for $i=1,\ldots,N-1$.

4) $A_4v^{i+1}=v^i+\tau f^i$, for $i=0,\ldots,N-1$.

5) $A_5v^{i+1}=B_5v^i+\tau f^i$, for $i=0,\ldots,N-1$.

The stability of all methods thus follows from the spectral data of the matrix polynomials belonging to the methods. For tridiagonal matrices of the above form, namely difference matrices, all eigenvalues can be stated explicitly. We have:

1) $A_1$ with the eigenvalues $\lambda_{1,i} := 1-4\alpha\sin^2{i\pi\over2M}$, for $i=1,\ldots,M-1$.

2) $A_2$ with the eigenvalues $\lambda_{2,i} := -4\sin^2{i\pi\over2M} < 0$, for $i=1,\ldots,M-1$.

3) $A_3$ with the eigenvalues $\lambda_{3,i} := 2\cos{i\pi\over M}$, for $i=1,\ldots,M-1$.

4) $A_4$ with the eigenvalues $\lambda_{4,i} := 1+4\alpha\sin^2{i\pi\over2M}$, for $i=1,\ldots,M-1$.

5) For $A_5$ one computes

$$ A_5=\mathop{\rm tridiag}\left(-{\alpha\over2},1+\alpha,-{\alpha\over2}\right) ={1\over2}\left[2I+\alpha\mathop{\rm tridiag}(-1,2,-1)\right] $$

and therefore $\lambda_{5a,i} := {1\over2}\left(2+4\alpha\sin^2{i\pi\over2M}\right)$, and

$$ B_5=\mathop{\rm tridiag}\left({\alpha\over2},1-\alpha,{\alpha\over2}\right) ={1\over2}\left[2I-\alpha\mathop{\rm tridiag}(-1,2,-1)\right] $$

with the eigenvalues $\lambda_{5b,i} := {1\over2}\left(2-4\alpha\sin^2{i\pi\over2M}\right)$, for $i=1,\ldots,M-1$.

Taking the above matrices into account, the following statements on stability are obtained.

1) The matrix polynomial $I\mu-A_1$ has the eigenvalues $\mu_i=\lambda_{1,i}$. These are smaller than one in modulus for every $M$ exactly when $\alpha\le1/2$. Because $\alpha=\sigma\tau/h^2$, this restricts the time step $\tau$ to

$$ \tau\le{1\over2\sigma}h^2, $$

in particular, the time discretization is not independent of the space discretization. A very fine space discretization thus automatically leads to a severely restricted time step, even though a much larger step in time might be more appropriate. This is a typical phenomenon for explicit methods and one reason for considering implicit methods.
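The step-size restriction can be made visible with a few lines of code. The sketch below iterates $v^{i+1}=A_1v^i$ with $f=0$; it assumes zero boundary values $u^i_0=u^i_M=0$ and deliberately rough start values, and the names `explicit_step` and `max_after` are chosen only for this illustration.

```python
def explicit_step(v, alpha):
    """One step v -> A_1 v with A_1 = tridiag(alpha, 1 - 2*alpha, alpha), f = 0.

    Zero boundary values are assumed on both ends."""
    padded = [0.0] + v + [0.0]
    return [
        alpha * padded[k - 1] + (1.0 - 2.0 * alpha) * padded[k] + alpha * padded[k + 1]
        for k in range(1, len(v) + 1)
    ]

def max_after(alpha, steps=200, M=20):
    """Maximum norm of the iterate after the given number of steps."""
    v = [(-1.0) ** k for k in range(1, M)]   # high-frequency start values
    for _ in range(steps):
        v = explicit_step(v, alpha)
    return max(abs(x) for x in v)

stable = max_after(0.4)     # alpha <= 1/2: the iterates stay bounded
unstable = max_after(0.6)   # alpha >  1/2: the highest mode is amplified
print(stable, unstable)
```

For $\alpha\le1/2$ all entries of $A_1$ are non-negative with row sums at most one, so the maximum norm cannot grow; for $\alpha=0.6$ the eigenvalue belonging to $i=M-1$ lies below $-1$ and the iterates blow up.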

2) For the matrix polynomial $I\mu^2-2\alpha A_2\mu-I$, multiplying through with $D$, where $D$ is the transformation matrix for $A_2$, i.e. $A_2=D\mathop{\rm diag}(\lambda_{2,i})D^{-1}$, gives the eigenvalues of the matrix polynomial from

$$ \det D\cdot\det\left(I\mu^2-2\alpha\mathop{\rm diag}(\lambda_{2,i})\mu-I\right) \cdot\det D^{-1}=0, $$

hence

$$ \mu_{i,1/2}=\alpha\lambda_{2,i}\pm\sqrt{\alpha^2\lambda_{2,i}^2+1}, \qquad i=1,\ldots,M-1. $$

Since $\mu_{i,1}\mu_{i,2}=-1$ and $\lambda_{2,i}\ne0$, the spectral radius is greater than 1 for every $\alpha>0$. The method is therefore unstable for all $\tau$ and $h$ and thus not globally convergent; in particular it is not usable as a stand-alone method. In combination with other methods, e.g. within a cyclic method, it might possibly be made convergent.

3) The matrix polynomial reads $(1+2\alpha)I\mu^2-2\alpha A_3\mu-(1-2\alpha)I$. Let $D$ be the matrix transforming $A_3$ to diagonal form. Then the $2M-2$ eigenvalues of the matrix polynomial are obtained as the roots of the equation

$$ \det D\cdot\det\left((1+2\alpha)I\mu^2 -2\alpha\mathop{\rm diag}\left(2\cos{i\pi\over M}\right)\mu-(1-2\alpha)I\right)\cdot\det D^{-1}=0, $$

hence

$$ (1+2\alpha)\mu_i^2-4\alpha\mu_i\cos\varphi-(1-2\alpha)=0, $$

with $\varphi := i\pi/M$. With the solution formula for quadratic equations of the form $ax^2+bx+c=0$, $a\ne0$, namely

$$ x_{1/2}={-b\pm\sqrt{b^2-4ac}\over2a}, $$

one obtains

$$ \mu_{i,1/2}={2\alpha\cos\varphi\pm\sqrt{1-4\alpha^2\sin^2\varphi} \over 1+2\alpha}. $$

Lengthy but elementary calculations show that the two functions $\mu_{i,\nu}\colon(\alpha,\varphi)\mapsto\mu_{i,\nu}(\alpha,\varphi)$, $\nu=1,2$, attain their extrema on the rectangle $\left[0,+\infty\right[ \times \left[0^\circ,180^\circ\right]$ for $\alpha=0$ or $\varphi=0^\circ$ or $\varphi=180^\circ$. A number of case distinctions are necessary here ($\alpha\to\infty$, radicand positive or negative, $\ldots$). Indeed, $\mathopen|\mu_{i,1/2}\mathclose| < 1$ for all $\alpha>0$ and all $\varphi \in \left]0^\circ,180^\circ\right[$. The method of DuFort/Frankel is therefore unconditionally stable.
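The claim $|\mu_{i,1/2}|<1$ can be spot-checked numerically. The following sketch evaluates both roots of the quadratic $(1+2\alpha)\mu^2-4\alpha\mu\cos\varphi-(1-2\alpha)=0$ on an assumed sample grid of $\alpha$ values and angles $\varphi=i\pi/M$; it illustrates the statement and is no substitute for the case distinctions mentioned above.

```python
import cmath
import math

def dufort_frankel_roots(alpha, phi):
    """Both roots of (1+2a) mu^2 - 4 a cos(phi) mu - (1-2a) = 0."""
    disc = cmath.sqrt(1.0 - 4.0 * alpha ** 2 * math.sin(phi) ** 2)
    denom = 1.0 + 2.0 * alpha
    centre = 2.0 * alpha * math.cos(phi)
    return (centre + disc) / denom, (centre - disc) / denom

def quadratic_residual(alpha, phi, mu):
    """Absolute value of the quadratic evaluated at mu; zero for an exact root."""
    return abs((1 + 2 * alpha) * mu ** 2
               - 4 * alpha * math.cos(phi) * mu - (1 - 2 * alpha))

M = 16
alphas = (0.01, 0.1, 0.5, 1.0, 10.0, 1000.0)    # assumed sample values
worst_modulus = 0.0
worst_residual = 0.0
for alpha in alphas:
    for i in range(1, M):
        phi = i * math.pi / M
        for mu in dufort_frankel_roots(alpha, phi):
            worst_modulus = max(worst_modulus, abs(mu))
            worst_residual = max(worst_residual, quadratic_residual(alpha, phi, mu))
print(worst_modulus, worst_residual)
```

`cmath.sqrt` is used because the radicand $1-4\alpha^2\sin^2\varphi$ becomes negative for large $\alpha$; in that case $|\mu|^2=(2\alpha-1)/(2\alpha+1)$, which is below one but approaches it as $\alpha\to\infty$.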

4) The matrix polynomial $I\mu-A_4^{-1}$ has the eigenvalues $\mu_i=\lambda_{4,i}^{-1} < 1$ for all $\alpha>0$, since $\lambda_{4,i} > 1$. The method of Crank/Nicolson is therefore stable for all $\tau$ and $h$. Together with consistency this implies convergence.

5) For the matrix polynomial corresponding to $A_5\mu=B_5$ the eigenvalues are the quotients of the eigenvalues of $B_5$ and $A_5$, i.e.

$$ \mu_i = {1-2\alpha\sin^2(i\pi/2M) \over 1+2\alpha\sin^2(i\pi/2M) }. $$

Hence this method is unconditionally stable as well, that is, independently of the two discretization parameters $\tau$ and $h$.
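One step of method 5) can be sketched as follows. The tridiagonal system $A_5v^{i+1}=B_5v^i$ is solved here with the Thomas algorithm, which is a choice made for this example and not prescribed by the text; zero boundary values and $f=0$ are assumed. Starting from the discrete sine mode $\sin(ik\pi/M)$, one step should multiply the vector by $\mu_i=(1-2\alpha s)/(1+2\alpha s)$ with $s=\sin^2(i\pi/2M)$.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system with constant bands (no pivoting)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for k in range(1, n):
        m = diag - lower * c[k - 1]
        c[k] = upper / m
        d[k] = (rhs[k] - lower * d[k - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x

def crank_nicolson_step(v, alpha):
    """Solve A_5 v_new = B_5 v for f = 0 and zero boundary values."""
    p = [0.0] + v + [0.0]
    rhs = [
        alpha / 2 * p[k - 1] + (1 - alpha) * p[k] + alpha / 2 * p[k + 1]
        for k in range(1, len(v) + 1)
    ]
    return thomas(-alpha / 2, 1 + alpha, -alpha / 2, rhs)

M, i, alpha = 10, 3, 5.0                      # assumed sample values
mode = [math.sin(i * k * math.pi / M) for k in range(1, M)]
s = math.sin(i * math.pi / (2 * M)) ** 2
mu = (1 - 2 * alpha * s) / (1 + 2 * alpha * s)
new = crank_nicolson_step(mode, alpha)
err = max(abs(a - mu * b) for a, b in zip(new, mode))
print(mu, err)
```

Even with $\alpha=5$, far beyond the limit $\alpha\le1/2$ of the explicit one-step method, the amplification factor stays below one in modulus.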

Solving Linear Systems of Equations

Mon, 10 Jun 2024 14:00:00 +0200

1. Condition numbers of matrices
2. Elementary row and column operations
3. The $LU$ decomposition
4. Gaussian elimination

In every iteration step of a Newton-Raphson method, or with every update of the iteration matrix in the Newton-Kantorovich iteration, a linear system of equations has to be solved. It is therefore important to have a reliable and at the same time efficient solution method at hand for this process.

1. Condition numbers of matrices

How could one recognize whether a computed result $x$ for a linear system $Ax=b$ is a good approximation? An obvious idea is to look at the residual $Ax-b$. As the following example shows, this measure has to be treated with caution.

1. Example: Consider the linear system

$$ \pmatrix{10&7&8&7\cr 7&5&6&5\cr 8&6&10& 9\cr 7&5&9&10\cr} \pmatrix{x_1\cr x_2\cr x_3\cr x_4\cr}= \pmatrix{32\cr 23\cr 33\cr 31\cr}. $$

Some vectors $x$ and their images $Ax$ are

$$ \pmatrix{6\cr -7.2\cr 2.9\cr -0.1\cr}\mapsto \pmatrix{32.1\cr 22.9\cr 32.9\cr 31.1\cr},\qquad \pmatrix{1.50\cr 0.18\cr 1.19\cr 0.89\cr}\mapsto \pmatrix{32.01\cr 22.99\cr 32.99\cr 31.01\cr},\qquad \hbox{correct:}\quad\pmatrix{1\cr 1\cr 1\cr 1\cr}\mapsto \pmatrix{32\cr 23\cr 33\cr 31\cr}. $$

The first two vectors might give the impression that one is already very close to the correct solution. However, in the first case none of the given decimal digits is correct, and in the second case only for 2 components a single digit after the decimal point is correct. All this is possible although the matrix is symmetric and all entries of the matrix are of the same order of magnitude.

A similar situation occurs in the following example.

2. Example: Consider now the merely two-dimensional problem with the data
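The three images can be recomputed exactly. The sketch below uses rational arithmetic and takes the $(3,3)$ entry of the matrix as $10$, i.e. the symmetric matrix whose third row sums to $33$; the helper names are chosen for this illustration only.

```python
from fractions import Fraction as F

# Symmetric 4x4 matrix of Example 1 and its right-hand side.
A = [[10, 7, 8, 7], [7, 5, 6, 5], [8, 6, 10, 9], [7, 5, 9, 10]]
b = [32, 23, 33, 31]

def image(x):
    """Exact product A x for a vector given as decimal strings."""
    xs = [F(s) for s in x]
    return [sum(a * t for a, t in zip(row, xs)) for row in A]

def max_residual(x):
    """Largest absolute entry of A x - b."""
    return max(abs(y - bi) for y, bi in zip(image(x), b))

r1 = max_residual(["6", "-7.2", "2.9", "-0.1"])
r2 = max_residual(["1.50", "0.18", "1.19", "0.89"])
r3 = max_residual(["1", "1", "1", "1"])
print(r1, r2, r3)
```

The residuals shrink by a factor of ten from the first to the second vector, while neither vector is anywhere near $(1,1,1,1)^\top$.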

$$ A:=\pmatrix{1.2969&0.8648\cr 0.2161&0.1441\cr},\qquad b:=\pmatrix{0.8642\cr 0.1440\cr},\qquad \overline x:=\pmatrix{\phantom{-}0.9911\cr -0.4870\cr}. $$

Here the residual vector $A\overline x-b$ is exactly $(-10^{-8},{\mskip 3mu}10^{-8})$, and yet the exact solution is $(2,{\mskip 3mu}-2)$.
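Both statements of Example 2 can be verified without rounding errors. This is a small sketch in exact rational arithmetic; the decimal data are read from strings so that no binary floating-point error enters.

```python
from fractions import Fraction as F

A = [[F("1.2969"), F("0.8648")], [F("0.2161"), F("0.1441")]]
b = [F("0.8642"), F("0.1440")]
x_bar = [F("0.9911"), F("-0.4870")]
x_exact = [F(2), F(-2)]

def apply(A, x):
    """Exact matrix-vector product."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

# Residual in the sense A x - b.
residual = [y - bi for y, bi in zip(apply(A, x_bar), b)]
print(residual, apply(A, x_exact) == b)
```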

The sensitivity with which a linear system reacts to changes of the input data is captured quantitatively by the condition number.

3. Theorem: Assumptions: Let the matrix $A$ be invertible and let $x$ be the exact solution of the linear system $Ax=b$. Let an approximate solution $\hat x$ be known. Let $r$ be the residual, i.e. $r:=b-A\hat x$.

Claim: The following two-sided, sharp estimate holds:

$$ {1\over|A|\cdot|A^{-1}|}{|r|\over|b|}\buildrel 1)\over\le {|x-\hat x|\over|x|}\buildrel 2)\over\le \left|A\right| \cdot \left|A^{-1}\right| {\left|r\right|\over\left|b\right|}. $$

Proof: On 1). Because $|r|=|b-A\hat x|=|Ax-A\hat x|\le|A|\cdot|x-\hat x|$, we have $|r|/|A|\le|x-\hat x|$. On the other hand $|x|=|A^{-1}b|\le|A^{-1}|\cdot|b|$, so $1/(|A^{-1}|\cdot|b|)\le1/|x|$. Multiplying corresponding sides of the two inequalities just derived gives claim 1):

$$ {1\over|A|\cdot|A^{-1}|}{|r|\over|b|}\le{|x-\hat x|\over|x|}. $$

This estimate is sharp. For brevity let $e:=x-\hat x$. The inequality becomes an equality if $|Ae|=|A|\cdot|e|$ and $|A^{-1}b|=|A^{-1}|\cdot|b|$. By the definition of matrix norms there are vectors $e^*$ and $b^*$ such that $|Ae^*|=|A|\cdot|e^*|$ and $|A^{-1}b^*|=|A^{-1}|\cdot|b^*|$. For $b^*$ and $\hat x^*:=A^{-1}b^*-e^*$ the inequality becomes an equality.

On 2). First, because $\left|b\right| = \left|Ax\right| \le \left|A\right| \cdot \left|x\right|$, we have $1 / \left|x\right| \le \left|A\right| / \left|b\right|$. Second, $\left|x-\hat x\right| = \left|A^{-1}b-A^{-1}A\hat x\right| = \left|A^{-1}r\right| \le \left|A^{-1}\right| \cdot \left|r\right|$. Altogether, multiplying corresponding sides again gives

$$ {|x-\hat x|\over|x|}\le|A|\cdot|A^{-1}|\cdot{|r|\over|b|}. $$

This inequality is sharp as well. Choose $x^*$ and $r^*$ with $|Ax^*|=|A|\cdot|x^*|$ and $\left|A^{-1}r^*\right| = \left|A^{-1}\right| \cdot \left|r^*\right|$. For $b^*:=Ax^*$ and $\hat x^*:=x^*-A^{-1}r^*$ equality holds. ☐

4. Definition: The number $\kappa(A) := \left|A\right| \cdot \left|A^{-1}\right|$ is called the condition number of the square matrix $A$ with respect to the norm $\left|\cdot\right|$.
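For the $2\times2$ data of Example 2 the condition number can be computed exactly. The sketch below uses the maximum norm with the row-sum norm as matrix norm (a choice made for this example) and checks the two-sided estimate of Theorem 3 with $x=(2,-2)$ and $\hat x=(0.9911,-0.4870)$.

```python
from fractions import Fraction as F

def norm_vec(x):
    """Maximum norm of a vector."""
    return max(abs(t) for t in x)

def norm_mat(A):
    """Row-sum norm, the matrix norm induced by the maximum norm."""
    return max(sum(abs(a) for a in row) for row in A)

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

A = [[F("1.2969"), F("0.8648")], [F("0.2161"), F("0.1441")]]
b = [F("0.8642"), F("0.1440")]
x = [F(2), F(-2)]
x_hat = [F("0.9911"), F("-0.4870")]

kappa = norm_mat(A) * norm_mat(inverse_2x2(A))
r = [bi - yi for bi, yi in zip(b, matvec(A, x_hat))]
rel_err = norm_vec([s - t for s, t in zip(x, x_hat)]) / norm_vec(x)
rel_res = norm_vec(r) / norm_vec(b)
print(float(kappa), float(rel_res / kappa), float(rel_err), float(kappa * rel_res))
```

The determinant of $A$ is exactly $10^{-8}$, which gives a condition number of about $3.3\cdot10^8$; a relative residual of about $10^{-8}$ is therefore compatible with a relative error of order one.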

For purely theoretical considerations the condition number is a good description of perturbation and sensitivity phenomena. However, the inverse, or the solution of the linear system, is exactly what one wants to compute. The condition number appearing in the analysis is therefore not known during the practical solution. There are methods with which the condition number can be estimated. Some of these methods are very closely coupled to Gaussian elimination and are therefore obtained together with the computation. Directly from the definition of the condition number one sees

5. Properties: We have $\kappa(A)\ge1$ and $\kappa(AB)\le\kappa(A)\cdot\kappa(B)$.

With the help of the condition number one can write down a simple estimate that characterizes to what extent perturbations in the matrix $A$ lead to changes in the solution $x$ actually wanted.

6. Theorem: If instead of the exact system $Ax=b$ one solves the perturbed system $\hat A\hat x=b$, then the sharp upper bound

$$ {\left|x-\hat x\right|\over\left|\hat x\right|} \le \kappa(A) {|A-\hat A|\over\left|A\right|}. $$

holds. Proof: First, $x=A^{-1}b=A^{-1}\hat A\hat x=A^{-1}(A+\hat A-A)\hat x=\hat x+A^{-1}(\hat A-A)\hat x$, so $x-\hat x=A^{-1}(\hat A-A)\hat x$ and thus $|x-\hat x|\le|A^{-1}|\cdot|A-\hat A|\cdot|\hat x|$. Extending the fraction by $|A|\ne0$ finally gives

$$ {\left|x-\hat x\right|\over\left|\hat x\right|} \le \left|A^{-1}\right| \cdot |A-\hat A| = \left|A^{-1}\right| \cdot \left|A\right| {|A-\hat A|\over\left|A\right|}. $$

The estimate is obviously sharp. ☐

At the same time the condition number characterizes the distance to all non-invertible matrices "near" $A$.

7. Theorem: For all invertible matrices $A$ we have

$$ \min\left\{{|A-B|\over|A|}:\hbox{$B$ not invertible}\right\} \ge{1\over\kappa(A)}. $$

For the maximum norm, the 1-norm and the Euclidean norm equality holds.

Proof: If $B$ is an arbitrary non-invertible matrix, then there is a vector $x\ne0$ with $Bx=0$. For this vector $x$ we have:

$$ \eqalign{ |x| = |A^{-1}Ax| &\le |A^{-1}|\cdot|Ax|=|A^{-1}|\cdot|(A-B)x|\cr &\le |A^{-1}|\cdot|A-B|\cdot|x|.\cr } $$

Therefore $1\le|A^{-1}|\cdot|A-B|$, and thus

$$ {1\over\kappa(A)}={1\over|A|\cdot|A^{-1}|}\le{|A-B|\over|A|}. $$

For the claimed equality in the case of the maximum norm one estimates in the opposite direction and thereby shows equality indirectly by bracketing.

To distinguish norms from absolute values more precisely, let $\left\|\cdot\right\|_\infty$ denote the maximum norm and $\left|\cdot\right|$ the ordinary absolute value for scalar quantities. Let $v$ be a unit vector with the two properties $\left\|v\right\|_\infty = 1$ and $\left\|A^{-1}v\right\|_\infty = \left\|A^{-1}\right\|_\infty \cdot \left\|v\right\|_\infty$. By the definition of the matrix norm belonging to a vector norm such a vector $v$ exists.

Further let $y:=A^{-1}v$ and choose $m$ with $\left\|y\right\|_\infty = \left|y_m\right|$. Let $e_m$ be the $m$-th unit vector, $z:=y_m^{-1}e_m$ and $B:=A-vz^\top$. Because $By=AA^{-1}v-y_m^{-1}y_mv=0$ and $y\ne0$, $B$ is not invertible. For arbitrary vectors $x$ we have

$$ \eqalign { \left\|(A-B)x\right\|_\infty &= \left\|y_m^{-1}x_mv\right\|_\infty = \left|y_m\right|^{-1} \cdot \left|x_m\right| \cdot \left\|v\right\|_\infty\cr &= \left|y_m\right|^{-1} \left|x_m\right|\cr &= \left\|y\right\|_\infty^{-1} \cdot \left|x_m\right|\cr &= (\left\|A^{-1}v\right\|_\infty)^{-1} \cdot \left|x_m\right|\cr &= \left\|A^{-1}\right\|_\infty^{-1} \cdot \left\|v\right\|_\infty^{-1} \cdot \left|x_m\right|\cr &\le \left\|A^{-1}\right\|_\infty^{-1} \cdot \left\|x\right\|_\infty.\cr } $$

Since $x$ was arbitrary, it follows that

$$ \left\|A-B\right\|_\infty \le {1\over\left\|A^{-1}\right\|_\infty}, $$

hence

$$ {\left\|A-B\right\|_\infty\over\left\|A\right\|_\infty} \le {1\over\left\|A^{-1}\right\|_\infty \cdot \left\|A\right\|_\infty} = {1\over\kappa_\infty(A)}. $$

For the $1$-norm the proof is carried out quite similarly, and for the Euclidean norm one uses the theorem on the existence of a singular value decomposition. The detailed proof is not carried out here. ☐

In a different formulation of the theorem already proved above one can write:

8. Theorem: Let $A$ be invertible and let $A(x+\Delta x)=b+\Delta b$. Then

$$ {\left|\Delta x\right|\over\left|x\right|} \le \kappa(A){\left|\Delta b\right|\over\left|b\right|}. $$

The next theorem shows that the symmetrized system $A^\top Ax=A^\top b$ has a larger, and therefore worse, condition number than the original system. In particular these considerations apply to the system of normal equations (least-squares fitting), although the matrices there are not square and hence certainly not invertible.

9. Theorem: For an arbitrary invertible matrix $A$ we have $\kappa_s(A)\le\kappa_s(A^\top A)$.

Proof: Let $\mu_{\rm max}$ and $\mu_{\rm min}$ be the largest and the smallest eigenvalue of $A^\top A$, respectively. Then $\left|A\right|_s=\sqrt{\mu_{\rm max}}$ and $\left|A^{-1}\right|_s=\sqrt{\mu_{\rm min}^{-1}}$. Further $\left|A^\top A\right|_s = \mu_{\rm max}$ and $\left|(A^\top A)^{-1}\right|_s = \mu_{\rm min}^{-1}$. Therefore

$$ \kappa_s(A) = \sqrt{\mu_{\rm max}\over\mu_{\rm min}} \le {\mu_{\rm max}\over\mu_{\rm min}} = \kappa_s(A^\top A). $$

☐

Which is now the best preconditioning with a diagonal matrix? It turns out that this is exactly the equilibration of the norms of all row vectors of the matrix $A$.

10. Theorem: If the invertible $(n\times n)$ matrix $A=(a_{ik})$ is normalized (equilibrated) according to

$$ \sum_{k=1}^n \left|a_{ik}\right| = 1,\qquad\hbox{for}\quad i=1,\ldots,n, $$

then for every diagonal matrix $D$ with $\det D\ne0$ the inequality

$$ \kappa_\infty(DA)\ge\kappa_\infty(A). $$

holds. Here $\kappa_\infty$ denotes the condition number with respect to the row-sum norm (the matrix norm compatible with the maximum norm $\left|\cdot\right|$).

Proof: see Werner (1975). Let $\left|\cdot\right|$ be the maximum norm for vectors and the row-sum norm for matrices. For every diagonal matrix $D=(d_{ii})$ with $\det D\ne0$ we have

$$ \left|DA\right| = \max_{i=1}^n \left(\left|d_{ii}\right| \sum_{k=1}^n \left|a_{ik}\right| \right) = \max_{i=1}^n \left|d_{ii}\right| $$

and

$$ \left|(DA)^{-1}\right| = \left|A^{-1}D^{-1}\right| = \max_{i=1}^n \left( \sum_{k=1}^n \left|\tilde a_{ik}\right| {1\over \left|d_{kk}\right|}\right) \ge \left|A^{-1}\right| \cdot \min_{i=1}^n {1\over \left|d_{ii}\right|}. $$

Here $\tilde a_{ik}$ denote the entries of the inverse matrix $A^{-1}$. From the two relations above it follows that

$$ \kappa_\infty(DA) = \left|DA\right| \cdot \left|(DA)^{-1}\right| \ge \left|A^{-1}\right| \cdot \underbrace{\max_{i=1}^n \left|d_{ii}\right| \cdot \min_{i=1}^n {1\over\left|d_{ii}\right|}} _{\displaystyle{{} = 1 = \left|A\right|}} =\kappa_\infty(A). $$

☐

If, on the other hand, one tries to achieve an equilibration on both sides of the matrix, then, in the words of Dahlquist and Björck, one may likewise get into hot water.

11. Example: For $0 < \left|\varepsilon\right| < 1$ consider the matrix

$$ A := \pmatrix{\varepsilon&-1&1\cr -1&1&1\cr 1&1&1\cr},\qquad A^{-1} = {1\over4}\pmatrix{0&-2&2\cr -2&1-\varepsilon&1+\varepsilon\cr 2&1+\varepsilon&1-\varepsilon\cr}. $$

Now let $D_1 := \mathop{\rm diag}(1,\varepsilon,\varepsilon)$ and $D_2 := \mathop{\rm diag}(\varepsilon^{-1},1,1)$. The scaled matrix

$$ B:=D_2AD_1=\pmatrix{1&-1&1\cr -1&\varepsilon&\varepsilon\cr 1&\varepsilon&\varepsilon\cr} $$

is now row-equilibrated, but its condition number in the row-sum norm is $\kappa_\infty(B)=3(1+|\varepsilon|)/(2|\varepsilon|)\approx3/(2|\varepsilon|)$, i.e. of order $1/|\varepsilon|$, whereas $\kappa_\infty(A)=3$.

2. Elementary row and column operations
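The two condition numbers of Example 11 can be computed exactly for a concrete $\varepsilon$. The sketch below assumes $\varepsilon=1/1000$ and the row-sum norm, and inverts the matrices by Gauss-Jordan elimination in rational arithmetic; the function names are chosen for this illustration.

```python
from fractions import Fraction as F

def inverse(A):
    """Exact inverse by Gauss-Jordan elimination on the augmented matrix (A | I)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Pick any row with a non-zero entry in this column as pivot row.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def norm_inf(A):
    """Row-sum norm."""
    return max(sum(abs(x) for x in row) for row in A)

def kappa_inf(A):
    return norm_inf(A) * norm_inf(inverse(A))

eps = F(1, 1000)                                   # assumed sample value
A = [[eps, -1, 1], [-1, 1, 1], [1, 1, 1]]
B = [[1, -1, 1], [-1, eps, eps], [1, eps, eps]]    # B = D_2 A D_1

kappa_A = kappa_inf(A)
kappa_B = kappa_inf(B)
print(kappa_A, float(kappa_B))
```

For this $\varepsilon$ the scaling increases the condition number from $3$ to roughly $1500$, i.e. by a factor of order $1/\varepsilon$.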

Although Cramer's rule provides an elegant representation of the solution of a linear system, even with the best possible evaluation of all determinants the effort is higher than that of the methods presented in the following. If one computed the $(n+1)$ determinants ($n$ numerator determinants and one denominator determinant) as sums of products of $n$ factors each, one would arrive at an operation count of order $n!$. Already $n=50$ would require more than $10^{64}$ floating-point multiplications, which would entail unacceptably long computing times even on the largest computers. But, as just mentioned, even with the best and most efficient evaluation of determinants still on the order of $n^4$ operations would be necessary, whereas the methods presented below are of order $n^3$.

1. A linear system with $p$ equations and $n$ unknowns $x_1$, $x_2$ to $x_n$ has the form

$$ \eqalign{ a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n &= b_1\cr a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n &= b_2\cr &\: \vdots\cr a_{p1}x_1+a_{p2}x_2+\cdots+a_{pn}x_n &= b_p\cr } $$

Here $a_{ij}$ and $b_k$ are fixed given numbers. The $a_{ij}$ are called the coefficients. The first index (here $i$) is called the row index, the second index (here $j$) is called the column index. The vector $(b_1,\ldots,b_p)$ is called the right-hand side (RHS). The linear system is called homogeneous if the right-hand side equals the zero vector, i.e. $b_k=0$ for all $k=1,\ldots,p$.

2. That one has to be careful when operating with systems of equations is shown by the following example system

$$ \eqalign{ x+y+z &= 1\cr x-y+z &= 0\cr -x+y-z &= 0 } $$

If one now draws the three consequences of, first, keeping the 1st equation, second, adding up all three equations and, third, adding the 2nd to the 3rd equation, one obtains

$$ \eqalign{ x+y+z &= 1\cr x+y+z &= 1\cr 0 &= 0\cr } $$

By construction every triple $(x,y,z)$ that solves the original system is also a solution of the new, transformed system. The converse, however, does not hold! The triple with $x=y=z=1/3$ solves the new, transformed system but not the original one. Drawing consequences can therefore add solutions!
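The example can be replayed in a few lines. This sketch encodes the original and the derived system as predicates (the function names are illustrative) and evaluates them for the triple $x=y=z=1/3$ and for one genuine solution of the original system, $(1/4,1/2,1/4)$, chosen here as sample data.

```python
from fractions import Fraction as F

def solves_original(x, y, z):
    """x + y + z = 1,  x - y + z = 0,  -x + y - z = 0."""
    return x + y + z == 1 and x - y + z == 0 and -x + y - z == 0

def solves_derived(x, y, z):
    """Equation 1 kept; sum of all three equations; equation 2 plus equation 3."""
    first = x + y + z == 1
    total = (x + y + z) + (x - y + z) + (-x + y - z) == 1 + 0 + 0
    last = (x - y + z) + (-x + y - z) == 0      # reduces to 0 = 0
    return first and total and last

third = F(1, 3)
print(solves_derived(third, third, third), solves_original(third, third, third))
print(solves_original(F(1, 4), F(1, 2), F(1, 4)))
```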

3. Are there transformations that do not change the solution set? Yes, for linear systems there are infinitely many transformations that do not change the set of all solutions. Three particularly important ones are the following:

interchanging two equations,
multiplying one of the equations by a number${}\ne0$, and
adding one equation, multiplied by an arbitrary number, to another equation.

The equations of the system that are not affected are kept as they are. The three transformations above do not change the set of all solutions. These three distinguished transformations are called elementary transformations. With the three transformations above, the mishap above would not have happened.

4. Theorem: Elementary transformations do not change the solution set.

Proof: After applying the third transformation rule, the system written down above reads

$$ \eqalign{ a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n &= b_1\cr (a_{21}+\lambda a_{11})x_1+(a_{22}+\lambda a_{12})x_2+\cdots +(a_{2n}+\lambda a_{1n})x_n &= b_2+\lambda b_1\cr &\: \vdots\cr a_{p1}x_1+a_{p2}x_2+\cdots+a_{pn}x_n &= b_p\cr } $$

The first row of the system was multiplied by $\lambda$ and added to the second equation. The remaining rows were taken over completely unchanged. Every solution of the old system is also a solution of the new, transformed system; since this transformation can be undone, the new system has exactly the same solutions. Undoing it is done by multiplying the first equation by $(-\lambda)$ and adding it to the second equation. The rest of the equations are again left as they are. ☐

5. The following so-called elementary matrices correspond to these three elementary transformations. The interchange of two rows:

$$ \begin{pmatrix} & & & & & & & \cr & 1 & & & & & & 0\cr & & \ddots & & & & \unicode{x22F0} & \cr i\to & & & 0 & \ldots & 1 & \ldots & \cr & \vdots & & \vdots & & \vdots & & \vdots\cr j\to & & \ldots & 1 & \ldots & 0 & & \cr & \vdots & \unicode{x22F0} & & & & \ddots & \vdots\cr & 0 & & & & & & 1\cr \end{pmatrix} $$

Adding to the $i$-th row of $L(\lambda)$ the $j$-th row multiplied by $f(\lambda)$ is equivalent to left multiplication by

$$ \begin{pmatrix} & & & & & j\downarrow & & \cr & 1 & & & & & & \cr & & \ddots & & & & & \cr i\to & & & 1 & \ldots & f(\lambda) & & \cr & & & & \ddots & \vdots & & \cr & & & & & 1 & & \cr & & & & & & \ddots & \cr & & & & & & & 1\cr \end{pmatrix} $$

The same operation for columns is equivalent to multiplication from the right by the transposed matrix

$$ \begin{pmatrix} & & & & & & & \cr & 1 & & & & & & \cr & & \ddots & & & & & \cr i\to & & & 1 & & & & \cr & & & \vdots & \ddots & & & \cr j\to & & & f(\lambda) & \ldots & 1 & & \cr & & & & & & \ddots & \cr & & & & & & & 1\cr \end{pmatrix} $$

Finally, multiplying the $i$-th row (column) of $L(\lambda)$ by a number $a\ne0$ is equivalent to multiplication from the left (right) by the matrix

$$ \begin{pmatrix} & & & & j\downarrow & & & \cr & 1 & & & & & & \cr & & \ddots & & & & & \cr & & & 1 & & & & \cr i\to & & & & a & & & \cr & & & & & 1 & & \cr & & & & & & \ddots & \cr & & & & & & & 1\cr \end{pmatrix} $$
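The correspondence between elementary row operations and left multiplication can be checked numerically. The following is a minimal sketch in Python with exact rational arithmetic; the helper names (`swap_matrix`, `add_matrix`, `scale_matrix`) and the test matrix `M` are illustrative assumptions, not part of the original text, and indices are 0-based.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    # plain triple-loop matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def swap_matrix(n, i, j):
    # identity with rows i and j interchanged
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def add_matrix(n, i, j, f):
    # identity with an extra entry f at position (i, j):
    # left multiplication adds f times row j to row i
    E = identity(n)
    E[i][j] = Fraction(f)
    return E

def scale_matrix(n, i, a):
    # identity with a at position (i, i): scales row i by a
    E = identity(n)
    E[i][i] = Fraction(a)
    return E

# hypothetical test matrix
M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]

swapped = matmul(swap_matrix(3, 0, 2), M)      # rows 0 and 2 interchanged
added = matmul(add_matrix(3, 1, 0, -4), M)     # row 1 minus 4 times row 0
scaled = matmul(scale_matrix(3, 2, 2), M)      # row 2 doubled
# right multiplication by the transpose acts on columns instead of rows
col_added = matmul(M, transpose(add_matrix(3, 1, 0, -4)))
```

Multiplying from the right by the transposed matrix performs the same operation on columns, here column 1 minus 4 times column 0.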

3. The $LU$ Decomposition

1. The product decomposition of a matrix $A\in\mathbb{C}^{n\times n}$ into $A=LU$, with a lower triangular matrix $L$ and an upper triangular matrix $U$, is called an $LU$ decomposition. As the example $A=({0\atop1}{1\atop0})$ shows, an $LU$ decomposition need not exist.

If an $LU$ decomposition exists, however, it is unique under suitable normalization conditions.

2. Theorem: The decomposition of an invertible matrix $A$ into the product $A=L\cdot U$ of an upper triangular matrix $U$ and a unit lower triangular matrix $L$ (lower triangular with all ones on the diagonal) is unique, if it exists.

Proof: Suppose $A=LU=\hat L\hat U$ with upper triangular matrices $U$, $\hat U$ and unit lower triangular matrices $L$, $\hat L$. Since $A$ is invertible, all four factors are invertible. The product of upper triangular matrices is again upper triangular, and so is the inverse of an invertible upper triangular matrix. By transposition the same holds for lower triangular matrices, and products and inverses of unit lower triangular matrices are again unit lower triangular. From $LU=\hat L\hat U$ it follows that $U\hat U^{-1}=L^{-1}\hat L$. The left-hand side is upper triangular and the right-hand side is unit lower triangular, so both equal $I$. Hence $\hat L=L$ and $\hat U=U$. ☐

The following table gives the leading-order number of operations and the storage required for full matrices and for symmetric $(m,m)$ band matrices, depending on whether pivoting is performed or not.

	full matrix	band matrix without pivoting	band matrix with pivoting
Storage	$n^2$	$(2m+1)n$	$(3m+1)n$
LU factorization	$n^3/3$	$(m+1)mn$	$(2m+1)mn$
Back substitution	$n^2$	$(2m+1)n$	$(3m+1)n$

3. Definition: $A\in\mathbb{C}^{n\times n}$ is called an $(m,k)$ band matrix ($m,k\in\{0,\ldots,n-1\}$) with left bandwidth $m$ and right bandwidth $k$ if $A$ has exactly $m$ subdiagonals and $k$ superdiagonals that contain nonzero elements, and otherwise consists only of zeros. A tridiagonal matrix is thus a $(1,1)$ band matrix, an upper Hessenberg matrix is a $(1,n-1)$ band matrix, and a lower Hessenberg matrix is an $(n-1,1)$ band matrix. Hermitian $(m,k)$ band matrices are always $(m,m)$ band matrices. Occasionally it is useful to call $A$ an $(m,k)$ band matrix if the corresponding sub- or superdiagonals may contain nonzero elements. Under that convention the zero matrix would be an $(m,k)$ band matrix for all $m,k\in\{0,\ldots,n-1\}$.

4. Lemma: Let $A\in\mathbb{C}^{n\times n}$ be an $(m,k)$ band matrix that has an $LU$ decomposition. Then $L$ is an $(m,0)$ band matrix and $U$ is a $(0,k)$ band matrix.

Proof: We have $a_{ij}=\sum_{\nu=1}^{\min(i,j)} \ell_{i\nu} u_{\nu j}$, hence

$$ \eqalignno{ &u_{1j} = a_{1j} \quad (j=1,\ldots,n), \qquad \ell_{i1} = {a_{i1} \over u_{11}}, \cr &i=2,\ldots,n: \cr &\qquad\qquad u_{ij} = a_{ij} - \sum_{\nu=1}^{i-1} \ell_{i\nu} u_{\nu j} \quad(j=i,i+1,\ldots,n),\cr &\qquad\qquad \ell_{ji} = {1\over u_{ii}} \left(a_{ji} - \sum_{\nu=1}^{i-1} \ell_{j\nu} u_{\nu i}\right) \quad(j=i+1,\ldots,n).\cr } $$

☐
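The recursion used in the proof can be turned into a small program. The following is a sketch in Python with exact rational arithmetic, assuming that no pivot vanishes; the function names and the $(1,2)$ band matrix `A` are hypothetical examples, not taken from the original text.

```python
from fractions import Fraction

def lu_doolittle(A):
    """LU decomposition without interchanges, L with unit diagonal."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        # row i of U: u_ij = a_ij - sum_{v<i} l_iv * u_vj
        for j in range(i, n):
            U[i][j] = Fraction(A[i][j]) - sum(L[i][v] * U[v][j] for v in range(i))
        if i + 1 < n and U[i][i] == 0:
            raise ZeroDivisionError("zero pivot: elimination needs interchanges")
        # column i of L: l_ji = (a_ji - sum_{v<i} l_jv * u_vi) / u_ii
        for j in range(i + 1, n):
            L[j][i] = (Fraction(A[j][i])
                       - sum(L[j][v] * U[v][i] for v in range(i))) / U[i][i]
    return L, U

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# hypothetical (1,2) band matrix: one subdiagonal, two superdiagonals
A = [[2, 1, 1, 0],
     [4, 5, 3, 1],
     [0, 3, 4, 2],
     [0, 0, 2, 5]]
L, U = lu_doolittle(A)
```

For this matrix `L` has entries only on the diagonal and the first subdiagonal, and `U` only on the diagonal and the first two superdiagonals, in line with the lemma.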

Conversely, the product of an $(m,0)$ band matrix and a $(0,k)$ band matrix is always at least an $(m,k)$ band matrix, i.e., it has no nonzero entries outside the $m$ subdiagonals and $k$ superdiagonals.

5. Example: The inverses of band matrices can exhibit a high degree of fill-in. This is shown by

$$ A = \pmatrix{ 1 & -1 & & & \cr -1 & 2 & -1 & & \cr & -1 & 2 & -1 & \cr & & -1 & 2 & -1\cr & & & -1 & 2\cr}, \qquad L^\top = U = \pmatrix{ 1 & -1 & & & \cr & 1 & -1 & & \cr & & 1 & -1 & \cr & & & 1 & -1\cr & & & & 1\cr}. $$

The matrix $A$ is weakly diagonally dominant with positive diagonal entries, and since $A=LL^\top$ with invertible $L$ it is positive definite. $L$ and $U$ are triangular band matrices, yet the inverse of $A$ is full, namely

$$ A^{-1} = \pmatrix{ 5 & 4 & 3 & 2 & 1\cr 4 & 4 & 3 & 2 & 1\cr 3 & 3 & 3 & 2 & 1\cr 2 & 2 & 2 & 2 & 1\cr 1 & 1 & 1 & 1 & 1\cr}. $$
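The pattern in $A^{-1}$ can be derived directly from the factorization. A short worked computation, using only $A=LU$ and $L^\top=U$ from the example: the inverse of $U$ is the upper triangular matrix of all ones, and $A^{-1}=U^{-1}(U^{-1})^\top$, so

$$ U^{-1} = \pmatrix{ 1 & 1 & 1 & 1 & 1\cr & 1 & 1 & 1 & 1\cr & & 1 & 1 & 1\cr & & & 1 & 1\cr & & & & 1\cr}, \qquad (A^{-1})_{ij} = \sum_{\nu=\max(i,j)}^{5} 1 = 6-\max(i,j). $$

Every entry of $A^{-1}$ is therefore a positive integer, which explains the complete fill-in.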

4. Gaussian Elimination

1. Definition: $B[k]:=(b_{ij})_{i,j=1,\ldots,k}$ denotes the matrix formed from the leading $k$ rows and columns. Correspondingly, $B\!\left]\ell\right[:=(b_{ij})_{i,j=n-\ell+1,\ldots,n}$ denotes the matrix formed from the last $\ell$ rows and columns.

2. Gaussian elimination with complete pivoting (GECP) proceeds as follows. Let $A$ be a complex $n\times n$ matrix.

Search the matrix $A$ for the element of largest absolute value, the so-called pivot element. Let this be $a_{ij}$. Interchange the $i$-th row with the first row and the $j$-th column with the first column. Call the resulting matrix $A^{(0)}$.
Pivot on $a_{11}^{(0)}$, i.e., add a multiple of the first row to each row below it so that the first column contains only zeros apart from its first element, the pivot. Call the resulting matrix $A^{(1)}$.
Proceed in the same way for $k=2,\ldots,n-1$: find the element of largest absolute value in $A^{(k-1)}\!\left]n-k+1\right[$, call it the pivot element, and interchange the corresponding row and column so that this element moves to position $(k,k)$. If the pivot element is nonzero, pivot; otherwise abort the procedure: the matrix $A$ is not invertible. Call the resulting matrix $A^{(k)}$.

Gaussian elimination yields either an upper triangular matrix $A^{(n-1)}$ or the information that $A$ is not invertible. As mentioned, the diagonal elements $a_{kk}^{(k-1)}$ are called pivot elements. A matrix $A$ is invertible if and only if no pivot element vanishes (up to sign, the product of the diagonal elements is the value of the determinant). If the element of largest absolute value is searched for only in the first column of the remaining matrix $A\!\left]n-k\right[$, rather than in the whole matrix $A\!\left]n-k\right[$ as above, one speaks of Gaussian elimination with partial pivoting. If one settles for an arbitrary nonvanishing element, one speaks of ordinary Gaussian elimination. In hand calculations one often uses ordinary Gaussian elimination and chooses small integers as pivots where available, e.g. 1. In programs, Gaussian elimination with partial pivoting is usually implemented, while GECP is used rather rarely.
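The GECP steps listed above can be sketched in a few lines of Python. This is an illustrative implementation with exact rational arithmetic, not a production routine; the function name `gecp` and the $2\times2$ example are assumptions made for this sketch, and indices are 0-based.

```python
from fractions import Fraction

def gecp(A):
    """Gaussian elimination with complete pivoting.

    Returns the upper triangular matrix and the list of pivots;
    raises ValueError if a zero pivot shows that A is not invertible.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    pivots = []
    for k in range(n):
        # search the trailing block for the entry of largest absolute value
        p, q = max(((i, j) for i in range(k, n) for j in range(k, n)),
                   key=lambda ij: abs(M[ij[0]][ij[1]]))
        if M[p][q] == 0:
            raise ValueError("matrix is not invertible")
        # move the pivot to position (k, k) by a row and a column interchange
        M[k], M[p] = M[p], M[k]
        for row in M:
            row[k], row[q] = row[q], row[k]
        pivots.append(M[k][k])
        # eliminate the entries below the pivot
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    return M, pivots

U, pivots = gecp([[1, 2], [3, 4]])
```

For the example the pivots are $4$ and $-1/2$; their product $-2$ equals the determinant, since the two interchanges (one row, one column) cancel in sign.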

Immediately before the elimination in step $k$, i.e., once the $k$-th pivot has been moved to position $(k,k)$, the transformed matrix has the form

$$ A^{(k-1)} = \left( \begin{array}{cccc|cccc} a_{11}^{(0)} & a_{12}^{(0)} & \ldots & a_{1,k}^{(0)} && a_{1,k+1}^{(0)} & \ldots & a_{1,n}^{(0)}\cr & a_{22}^{(1)} & \ldots & a_{2,k}^{(1)} && a_{2,k+1}^{(1)} & \ldots & a_{2,n}^{(1)}\cr & & \ddots & \vdots && \vdots & \ddots & \vdots\cr 0 & & & a_{k,k}^{(k-1)}{\mskip 5mu} && a_{k,k+1}^{(k-1)} & \ldots & a_{k,n}^{(k-1)}\cr \hline &&& * && * & \ldots & *\cr &&& * && * & \ldots & *\cr &0&& \vdots && \vdots & \ddots & \vdots\cr &&& * && * & \ldots & *\cr \end{array} \right) $$

If one knew in advance which rows and columns have to be interchanged, one could carry out these interchanges up front. With a matrix prepared in this way, no row or column interchanges would be needed during elimination. All matrices for which these interchanges have already been carried out are called CP (completely pivoted).

Let

$$ g(A) = {\displaystyle{\max_{i,j,k} |a_{ij}^{(k)}|} \over \displaystyle{\max_{i,j} \left|a_{ij}\right|} } . $$

This quantity is called the growth factor of the pivoting strategy.

With partial pivoting, $g(A)=2^{n-1}$ can occur. For complete pivoting, Wilkinson's conjecture (local copy), named after James Hardy Wilkinson (1919--1986) and disproved in 1992, states $g(A)\le n$. This bound is sharp, as one can show using so-called Hadamard matrices, named after Jacques S. Hadamard (1865--1963). For complex matrices these bounds have to be raised. On this, Jane Day and Brian Peterson write in Day/Peterson (1988):

Wilkinson's conjecture is very intriguing---easy to state, soon believed, and apparently very difficult to resolve.
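The exponential growth under partial pivoting can be observed on a classical test matrix: ones on the diagonal and in the last column, $-1$ below the diagonal. The following Python sketch computes $g(A)$ as defined above for this matrix; the function names are hypothetical, and ties in the pivot search are resolved in favour of the topmost row, so no interchange takes place here.

```python
from fractions import Fraction

def growth_partial_pivoting(A):
    """Growth factor g(A) for Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    max_orig = max(abs(v) for row in M for v in row)
    max_seen = max_orig
    for k in range(n - 1):
        # pivot search in column k only (first maximum wins on ties)
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0:
            continue  # nothing to eliminate in this column
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
        max_seen = max(max_seen, max(abs(v) for row in M for v in row))
    return max_seen / max_orig

def worst_case_matrix(n):
    # 1 on the diagonal and in the last column, -1 below the diagonal
    return [[1 if (i == j or j == n - 1) else (-1 if i > j else 0)
             for j in range(n)] for i in range(n)]

growths = [growth_partial_pivoting(worst_case_matrix(n)) for n in range(2, 9)]
```

Each elimination step doubles the entries of the last column, so the last pivot is $2^{n-1}$.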

3. Proposition: Let $A\in\mathbb{C}^{n\times n}$ be invertible and CP. Let the $k$-th elimination step of GECP be carried out, $k

$$ a_{ij}^{(k)} = {A_{1,\ldots,k,i}^{1,\ldots,k,j} \over A_{1\ldots k}^{1\ldots k}}. $$

Proof: Since $A$ is CP and since pivoting (i.e., forming linear combinations of rows) changes neither the determinant nor the leading principal minors of $A$, Laplace expansion along the $k$-th column (or row) immediately gives $\displaystyle{ A_{1,\ldots,k,i}^{1,\ldots,k,j} = a_{ij}^{(k)} A_{1\ldots k}^{1\ldots k}. }$ The claim follows because $A$ is invertible. ☐

4. Corollary: The pivot element $a_{kk}^{(k-1)}$ in GECP satisfies $a_{kk}^{(k-1)}={1\over x}$, where $x$ is the $(k,k)$ element of the inverse of the matrix $A[1\ldots k]$, i.e., of $(A[1\ldots k])^{-1}$. In particular, $a_{nn}^{(n-1)}$ is the reciprocal of an element of $A^{-1}$.

Proof: Let $B:=A[1\ldots k]$. By the proposition,

$$ (B^{-1})_{k,k} = {B_{1,\ldots,k-1}^{1,\ldots,k-1} \over B_{1\ldots k}^{1\ldots k}} = {A_{1,\ldots,k-1}^{1,\ldots,k-1} \over A_{1\ldots k}^{1\ldots k}} = {1 \over a_{kk}^{(k-1)}} . $$

☐

5. Remarks: (1) GECP can be interpreted as follows: once the first $k-1$ rows and columns have been chosen, the $k$-th row and column are chosen so that the leading $k\times k$ determinant has maximal absolute value.

(2) Alternatively one can argue: once the first $k-1$ rows and columns have been chosen, the $k$-th row and column are chosen so that $\det A^{(k)}\!\left]n-k\right[$ has minimal absolute value.

(3) Geometrically speaking: the $k$-th row and column are chosen so that the $k$-volume of the parallelepiped formed by the first $k$ row vectors and column vectors of $A[k]$ becomes maximal.

(4) Conversely: the $k$-th row and column are chosen so that the $(n-k)$-volume of the projected parallelepiped becomes minimal, since pivoting on $a_{kk}^{(k-1)}$ means projecting the parallelepiped spanned by the rows of $A^{(k-1)}\!\left]n-k+1\right[$ onto the parallelepiped spanned by the rows of $A^{(k)}\!\left]n-k\right[$.

6. Theorem: Representation theorem for Gaussian elimination after Day/Peterson (1988). Let $A\in\mathbb{C}^{n\times n}$ be invertible and $k

$$ A^{(k)}\!\left]n-k\right[ = \left(A^{-1}\!\left]n-k\right[\right)^{-1}. $$

That is, the remaining matrix $A^{(k)}\!\left]n-k\right[$, which after the $k$-th elimination step is not yet in triangular form, is nothing but the inverse of the trailing block of the actual inverse $A^{-1}$, namely $\bigl(A^{-1}\!\left]n-k\right[\bigr)^{-1}$. The remaining matrix is therefore not just some auxiliary matrix; on the contrary, it is already intimately connected with the inverse.

Proof: For $i,j>k$ let $a_{ij}^{(k)}$ be an arbitrary element of $A^{(k)}\!\left]n-k\right[$. By the preceding proposition and the theorem on minors of the inverse,

$$ a_{ij}^{(k)} = {A_{1,\ldots,k,i}^{1,\ldots,k,j} \over A_{1\ldots k}^{1\ldots k}} = {(-1)^{i+j} \left|A\right| (A^{-1})_{k+1,\ldots,\hat\imath,\ldots,n} ^{k+1,\ldots,\hat\jmath,\ldots,n}\over A_{1\ldots k}^{1\ldots k}} $$

Again by the theorem on minors of the inverse,

$$ \left|A^{-1}\!\left]n-k\right[\right| = (A^{-1})_{k+1,\ldots,n}^{k+1,\ldots,n} = {\alpha_{k+1,\ldots,n}^{k+1,\ldots,n} \over \left|A\right|} = {A_{1\ldots k}^{1\ldots k} \over \left|A\right|} $$

Substituting gives

$$ a_{ij}^{(k)} = (-1)^{i+j} {(A^{-1})_{k+1,\ldots,\hat\imath,\ldots,n}^{k+1,\ldots,\hat\jmath,\ldots,n} \over |A^{-1}\!\left]n-k\right[|} . $$

Hence $a_{ij}^{(k)}$ is exactly the $(i,j)$ entry of $(A^{-1}\!\left]n-k\right[)^{-1}$. ☐
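The representation theorem can be checked numerically. The sketch below, in Python with exact rational arithmetic, assumes a matrix whose leading principal minors are all nonzero, so that elimination needs no interchanges; the $3\times3$ matrix `A` and all function names are illustrative assumptions.

```python
from fractions import Fraction

def inverse(A):
    """Gauss-Jordan inverse with exact arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # first row with a nonzero entry in column c
        # (next() raises StopIteration if A is singular)
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def remaining_block(A, k):
    """Trailing (n-k) x (n-k) block after k elimination steps, no interchanges."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    for s in range(k):
        for i in range(s + 1, n):
            f = M[i][s] / M[s][s]
            M[i] = [a - f * b for a, b in zip(M[i], M[s])]
    return [row[k:] for row in M[k:]]

def trailing(B, l):
    """Last l rows and columns of B."""
    n = len(B)
    return [row[n - l:] for row in B[n - l:]]

# hypothetical example with nonzero leading principal minors 4, 11, 43
A = [[4, 1, 2], [1, 3, 0], [2, 0, 5]]
n = len(A)
checks = [remaining_block(A, k) == inverse(trailing(inverse(A), n - k))
          for k in range(1, n)]
```

For $k=1$ the remaining block is the Schur complement of $a_{11}$, and the comparison confirms that it coincides with the inverse of the trailing $2\times2$ block of $A^{-1}$.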

7. Corollary: If all row and column interchanges are carried out before the actual elimination (i.e., the matrix is CP), this means that for $k=1,\ldots,n-1$ the absolute value of the determinant of $A^{-1}\!\left]n-k\right[$ cannot be increased by any row and column interchanges among the last $n-k+1$ rows and columns of $A$.

Proof: By remarks (3) and (4) above, GECP amounts to minimizing $\left|\det A^{(k)}\!\left]n-k\right[\right|$ in every step $k$, and hence to maximizing $\left|\det A^{-1}\!\left]n-k\right[\right|$. ☐

For positive definite matrices $A$, Gaussian elimination (GE) is always applicable, and at the same time GE provides a simple criterion for checking positive definiteness. Namely:

8. Theorem: Assumption: Let $A\in\mathbb{C}^{n\times n}$ be hermitian. (Charles Hermite (1822--1901))

Claim: GE is feasible $\land$ $a_{ii}^{(k)}>0$ $\iff$ $A\succ0$.

Proof: $A$ positive definite $\iff$ $A_{1\ldots r}^{1\ldots r}>0$ $\iff$ $a_{ii}^{(k)}>0$, since determinants and leading principal minors do not change when multiples of rows are added to other rows. ☐

9. Corollary: Every positive definite (hermitian) matrix $A$ has exactly one $LU$ decomposition of the form $A=LU=LDL^*$, with a unit lower triangular matrix $L$ and a diagonal matrix $D$ with positive diagonal elements only.

Gaussian elimination with the diagonal strategy and positive (diagonal) pivots is feasible if and only if the matrix is positive definite. Thus, for positive definite matrices, row and/or column interchanges are not required in principle. This is of particular interest because for very large matrices ($n>1000$, say) one places special emphasis on low fill-in, which usually conflicts with a pivoting strategy. If, for positive definite matrices, one concentrates solely on keeping fill-in low, Gaussian elimination nevertheless always remains feasible.

In the same way one shows: GE is feasible $\iff$ $A_{1\ldots r}^{1\ldots r}\ne0$, since here too the leading principal minors do not change under linear combinations of rows. Hence: an invertible matrix $A$ has an $LU$ decomposition if and only if no leading principal minor vanishes. This is because the existence of an $LU$ decomposition is equivalent to the feasibility of Gaussian elimination without any row or column interchanges.
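The pivot criterion for positive definiteness lends itself to a short test routine. The following Python sketch is restricted to real symmetric matrices with rational entries, which is an assumption made for simplicity (the theorem itself covers hermitian matrices); the function name is hypothetical.

```python
from fractions import Fraction

def is_positive_definite(A):
    """Pivot test for a real symmetric matrix: GE without interchanges,
    positive definite if and only if every diagonal pivot is positive."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    for k in range(n):
        if M[k][k] <= 0:
            return False  # zero or negative pivot
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
    return True

# tridiagonal matrix from the band matrix example in the LU section
T = [[1, -1, 0, 0, 0],
     [-1, 2, -1, 0, 0],
     [0, -1, 2, -1, 0],
     [0, 0, -1, 2, -1],
     [0, 0, 0, -1, 2]]

results = (is_positive_definite(T),
           is_positive_definite([[1, 2], [2, 1]]),
           is_positive_definite([[0, 1], [1, 0]]))
```

The tridiagonal matrix passes with all pivots equal to 1, while the two $2\times2$ matrices fail at the second and first pivot, respectively.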

Projektionsmatrix eines Raumes

Sun, 09 Jun 2024 18:00:00 +0200

1. Projector

Let a linear subspace ${\cal M}\subseteq\mathbb{C}^n$ be spanned by the $s$ linearly independent vectors $a_1,\ldots,a_s\in\mathbb{C}^n$. Let $A=(a_1,\ldots,a_s)\in\mathbb{C}^{n\times s}$. Then

$$ \def\multisub#1#2{{\textstyle\!{\scriptstyle1\atop\scriptstyle#2_1} {\scriptstyle2\atop\scriptstyle#2_2} {\scriptstyle\ldots\atop\scriptstyle\ldots} {\scriptstyle#1\atop\scriptstyle#2_#1}}} \def\multisup#1#2{{\textstyle\!{\scriptstyle#2_1\atop\scriptstyle1} {\scriptstyle#2_2\atop\scriptstyle2} {\scriptstyle\ldots\atop\scriptstyle\ldots} {\scriptstyle#2_{#1}\atop\scriptstyle#1}}} \def\multisubsup#1#2#3{{\textstyle\!{\scriptstyle#3_1\atop\scriptstyle#2_1} {\scriptstyle#3_2\atop\scriptstyle#2_2} {\scriptstyle\ldots\atop\scriptstyle\ldots} {\scriptstyle#3_{#1}\atop\scriptstyle#2_{#1}}}} \def\xM{x_{\cal M}} \def\xMb{x_{{\cal M}^\bot}} \forall x\in{\cal M}: \: \dot\exists u\in\mathbb{C}^s:\quad x=Au. $$

Let ${\cal M}^\bot$ be the orthogonal complement of ${\cal M}$, i.e., $y\in{\cal M}^\bot$ if and only if

$$ x\bot y\: (\forall x\in{\cal M}) \iff Au\bot y\: (\forall u\in\mathbb{C}^s) \iff \langle x,y\rangle = 0 = \langle Au,y\rangle. $$

If the orthogonality relation (or the associated sesquilinear form) is nondegenerate, then

$$ x\in{\cal M}\cap{\cal M}^\bot \: \Rightarrow\: x\bot x \: \Rightarrow\: x=0, $$

hence ${\cal M}\oplus{\cal M}^\bot=\mathbb{C}^n$. Every vector $0\ne x\in\mathbb{C}^n$ can thus be decomposed uniquely into a component in ${\cal M}$ and a component in ${\cal M}^\bot$, i.e.,

$$ \forall 0\ne x\in\mathbb{C}^n:\: \dot\exists\xM\in{\cal M}:\: \dot\exists\xMb\in{\cal M}^\bot:\quad x = \xM + \xMb. $$

Notation: $\xM$ is called the projection of $x$ onto ${\cal M}$, and $\xMb$, the component of $x$ in ${\cal M}^\bot$, is called the orthogonal projection of $x$ with respect to ${\cal M}$.

1. Theorem: For a fixed basis $a_1,\ldots,a_s$ of ${\cal M}$, the vectors $\xM$ and $\xMb$ can be computed by

$$ \eqalign{ \xM &= Px, \qquad P=A(A^*A)^{-1}A^*,\cr \xMb &= Qx, \qquad Q=I-P=I-A(A^*A)^{-1}A^*.\cr } $$

Since $A\in\mathbb{C}^{n\times s}$ has full column rank, $A^*A$ is invertible by the Gram determinant.

Notation: $P$ is called projection matrix or projector, $Q$ is called orthogonal projector; more precisely one should speak of $P_s$ and $Q_s$.

Proof: Let

$$ x = (u_1a_1+\ldots+u_sa_s)+(u_{s+1}a_{s+1}+\ldots+u_na_n) =: Au + \hat A\hat u. $$

We have $\xM=Au$ with $u\in\mathbb{C}^s$. Left multiplication by $(A^*A)^{-1}A^*$ gives $u=(A^*A)^{-1}A^*\xM$. An element of ${\cal M}$ must have coordinates $(u_1,\ldots,u_s,0,\ldots,0)$ with respect to the basis $a_1,\ldots,a_s$, thus

$$ \xM = Px = (A,*)\pmatrix{u\cr0\cr} = Au = A(A^*A)^{-1}A^*x. $$

For $x\in{\cal M}$ we have $x=Au$ and $Px=Au=x$. For $x\in{\cal M}^\bot$ we have $A^*x=0$ because $a_i^*x=0$ ($i=1,\ldots,s$). The representation of $Q$ follows immediately from that of $P$, since $\xMb = x - \xM = x - Px$. ☐

2. Example: Let $n=3$, $s=2$, $a_1=(1,0,0)^\top$, $a_2=(0,1,0)^\top$. Then $P=A(A^\top A)^{-1}A^\top=\mathop{\rm diag}(1,1,0)$, since $A^\top A={1{\mskip 3mu}0\choose 0{\mskip 3mu}1}$. This is indeed the projection onto the first two components.
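A less trivial basis makes the formula $P=A(A^\top A)^{-1}A^\top$ more tangible. The Python sketch below treats the real case with exact rational arithmetic; the basis $a_1=(1,1,0)^\top$, $a_2=(0,1,1)^\top$ and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse(A):
    """Gauss-Jordan inverse with exact arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # first usable pivot row (StopIteration if A is singular)
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def projector(A):
    """P = A (A^T A)^{-1} A^T for a real matrix A with full column rank."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)

# columns a_1 = (1,1,0)^T and a_2 = (0,1,1)^T (hypothetical basis)
A = [[1, 0], [1, 1], [0, 1]]
P = projector(A)
Q = [[Fraction(int(i == j)) - P[i][j] for j in range(3)] for i in range(3)]
```

Here $P={1\over3}\left({2\atop{1\atop-1}}{1\atop{2\atop1}}{-1\atop{1\atop2}}\right)$; it is symmetric and idempotent, leaves the columns of $A$ fixed, and $Q=I-P$ annihilates them.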

3. Theorem: Let $\mathopen|x\mathclose|=\sqrt{\langle x,x\rangle}$. Then

(1) $P^*=P$, $Q^*=Q$ (hermiticity),

(2) $P^2=P$, $Q^2=Q$ (idempotence),

(3) $x,a_1,\ldots,a_s$ linearly dependent $\iff$ $Qx=0$,

(4) $P$, $Q$ positive semidefinite,

(6) $P_iP_k = P_{\min(i,k)}$, $Q_iQ_k = Q_{\max(i,k)}$,

(7)

$$ P_ia_k = \cases{=0, &if $i\lt k$,\cr \ne0, &otherwise,} \qquad Q_ia_k = \cases{=0, &if $k\lt i$,\cr \ne0, &otherwise.} $$

Proof: (1) and (2): elementary computation.

(3): $Qx=0$ $\iff$ $x\in{\cal M}$ $\iff$ $x,a_1,\ldots,a_s$ lie in the same $s$-dimensional space.

(4): $P$, $Q$ are hermitian with eigenvalues 0 and 1.

(6) and (7): clear. ☐

4. Example: Projections need not be hermitian, nor even normal. Projections can also increase the Euclidean norm of a vector. This is shown by $R={1{\mskip 3mu}1\choose 0{\mskip 3mu}0}$: $R^2=R$ (projector property), $R\ne R^\top$, $R^\top R\ne RR^\top$ and $R{1\choose1}={2\choose0}$, but $\mathopen|(2,0)\mathclose|=2$, $\mathopen|(1,1)\mathclose|=\sqrt2$. The theorem above says that none of this can happen with the projector $A(A^*A)^{-1}A^*$.

2. Change of Basis

1. Every vector $x\in\mathbb{C}^n$ has, with respect to the basis $a_1,\ldots,a_n$, a representation of the form $x=\sum_{1\le i\le n} a_i\hat x_i = A\hat x$, or $\hat x=A^{-1}x$, where $A=(a_1,\ldots,a_n)\in\mathbb{C}^{n\times n}$. Every vector $y\in\mathbb{C}^n$ thus has the representation $\hat y=A^{-1}y$ with respect to $a_1,\ldots,a_n$. Let $B=(b_1,\ldots,b_m)\in\mathbb{C}^{m\times m}$ be a basis matrix for $\mathbb{C}^m$ and let $L\colon\mathbb{C}^n\to\mathbb{C}^m$ be a matrix with respect to the standard bases. Changing from the standard basis in $\mathbb{C}^n$ to $A$ and simultaneously from the standard basis in $\mathbb{C}^m$ to $B$ has the effect that $L$ is replaced by $\hat L=B^{-1}LA$. A vector $\hat y=A^{-1}y$ is first transformed into standard coordinates by $A$, then $L$ acts as usual, and $B^{-1}$ yields the desired coordinate representation in the target space $\mathbb{C}^m$.

2. Example: In the case $B=A$ one has $\hat L=A^{-1}LA$, and in the case $B=I$ simply $\hat L=LA$. If only $A=I$, then $\hat L=B^{-1}L$.

Conversely, replacing the matrix $L$ by $A^{-1}LA$ can be interpreted as a change from the standard basis to the basis $A$, e.g. $L\to XJY=XJX^{-1}$, or $J\to X^{-1}LX$. That is, on changing from the standard basis to a Jordan basis (given by the right principal vectors $X$), the matrix $L$ takes the well-known Jordan normal form.

One application of the Jordan normal form, or of the Schur normal form, is the equivalence theorem for equivalent matrices.

3. Definition: (1) Two matrices $A,B\in\mathbb{C}^{m\times n}$ are called equivalent if $A=RBS$ with an invertible $(m\times m)$ matrix $R$ and an invertible $(n\times n)$ matrix $S$.

(2) Two matrices $A,B\in\mathbb{C}^{n\times n}$ are called similar if $A=SBS^{-1}$ with an invertible matrix $S$.

Similarity of two matrices means nothing other than a change to another basis. The next theorem says that equivalence of two matrices is merely a rank invariant, nothing more.

4. Theorem: (1) For every matrix $A\in\mathbb{C}^{m\times n}$ there exist invertible matrices $R\in\mathbb{C}^{m\times m}$ and $S\in\mathbb{C}^{n\times n}$ such that

$$ RAS = \mathop{\rm diag}(1,\ldots,1,0,\ldots,0), $$

where the number of ones equals $\mathop{\rm rank} A$.

(2) Two matrices $A,B\in\mathbb{C}^{m\times n}$ are equivalent if and only if they have the same rank.

Proof: Either via Gaussian elimination with row and column interchanges, or via the Jordan or Schur normal form. (2) follows immediately from (1). ☐

A further application of this is

5. Theorem: $A\in\mathbb{C}^{m\times n}$ has exact rank $r$ ($0\le r\le\min(m,n)$) $\iff$ there is at least one nonvanishing minor of order $r$, and all minors of order $r+1,r+2,\ldots,\min(m,n)$ vanish.

Proof: Follows immediately from

$$ \left(\mathop{\rm diag}(1,\ldots,1,0,\ldots,0)\right)\multisubsup sik = (RAS)\multisubsup sik. $$

☐

3. Kronecker Product

1. Definition: Let $A\in\mathbb{C}^{m\times m}$ and $B\in\mathbb{C}^{n\times n}$. Then the matrix

$$ A\otimes B = \pmatrix{ a_{11}B & \ldots & a_{1m}B\cr \vdots & \ddots & \vdots\cr a_{m1}B & \ldots & a_{mm}B\cr } \in \mathbb{C}^{mn\times mn}, $$

with entries

$$ (A\otimes B)_{k\ell} = a_{ri} b_{sj}, \qquad k=(r-1)n+s,\quad \ell=(i-1)n+j, $$

is called the Kronecker product or direct product of $A$ and $B$.
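The block definition and the index formula describe the same matrix. A minimal Python sketch, with a hypothetical helper `kron` and two arbitrary $2\times2$ example matrices, shows this; the loops are 0-based, so row $k=(r-1)n+s$ of the definition becomes `r * n + s` in the code.

```python
def kron(A, B):
    """Kronecker product of a square m x m matrix A and a square n x n matrix B."""
    m, n = len(A), len(B)
    # row index runs over (r, s), column index over (i, j)
    return [[A[r][i] * B[s][j] for i in range(m) for j in range(n)]
            for r in range(m) for s in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
K = kron(A, B)
```

The result consists of the four blocks $1\cdot B$, $2\cdot B$, $3\cdot B$, $4\cdot B$ arranged like the entries of $A$.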

Turning 60 - Now What?

Tue, 04 Jun 2024 12:00:00 +0200

I turned 60 this year. I had written a similar post ten years ago: Turning 50 - Now what? What has happened in the last ten years?

The last time I wrote:

I am married and have three children. All three children show interest in society and technology, and will likely find their way through life. Being a husband and father of three is a story on its own, meriting a separate blog post.

All three children are now adults. They show a strong interest in science and technology and have made this their profession. They earn their own money, and live their own lives, mostly with their partner, whom they have found along the way. I couldn't be more proud to see them succeed.

In 2016, I lost my father, see Arnold Klausmeier. Last year, my wife lost her father. So we are both now fatherless.

After a long and stressful project, which finished in 2019, I decided to reduce my workload to 60%. Since 2020, I only work on Wednesday, Thursday, and Friday. I should have done this way earlier. Before that decision to reduce the workload, I struggled emotionally with this decision for quite some time, although rationally things were pretty clear.

I still live in the same house and drive the same car as ten years ago. I have upgraded my various PCs, smartphones, and tablets. Advances in this technology are still breathtaking.

I traveled to Poland and Canada and visited France multiple times.

The CoVID pandemic didn't affect me personally, except that I had to postpone my planned trip to Montreal by three years. I worked from home before the pandemic and still do. Now working from home has become normal for many more people. In that respect, CoVID had acted as an accelerator for something that was ripe anyway.

According to the German mortality table, I have 21.5 years left to live. I can therefore write two more posts:

Turning 70 - Now what?
Turning 80 - Now what?

;-)

Dark Mode on Website

Mon, 03 Jun 2024 22:00:00 +0200

This blog already offered a switch between light and dark mode. But this choice was not stored anywhere, so on every page you opened you had to choose dark mode again. Now I store this choice in localStorage on the client.

While cookies could also be used for storing this choice, they are not needed for this. A cookie is typically created on the server side and then transferred to the client, which then resends it whenever communicating with this server again. LocalStorage is different: it is stored on the client and never leaves the client, i.e., localStorage is only set and read via JavaScript on the client.

1. CSS. The actual CSS for dark and light colors uses CSS variables and is as below:

:root { --bg-color:#fffff8; --bgAcolor:white; color:black; --h1Color:DarkBlue; --thColor:LightBlue; --nthChild:#f2f2f2; --klmwidth:46rem; }
.dark-mode { background-color:#22160B; color:white; --bgAcolor:black; --h1Color:LightBlue; --thColor:DarkBlue; --nthChild:#935116;
    --pagefind-ui-primary: #eeeeee; --pagefind-ui-text: #eeeeee; --pagefind-ui-background: #152028; --pagefind-ui-border: #152028; --pagefind-ui-tag: #152028;
}
body {
    background-color: var(--bg-color);
    font-family:Merriweather,"Times New Roman",ui-serif,Georgia,Cambria,Times,serif;
    /*font-size: 28px;  font-weight: 100;*/
    margin: auto;
    max-width: var(--klmwidth);
}

th { border: 1px solid Black; background-color:var(--thColor); padding:0.3rem 0.5rem 0.3rem 0.5rem; position:sticky; top:0 }
tr:nth-child(even) { background-color:var(--nthChild); }

We use below two icons for switching between dark and light.

Light	Dark

The "button" for switching between dark and light mode is:

2. JavaScript. We use the string "dark-theme" as the key for localStorage. Its value is either 1 or 0. The toggle function darkLightToggle() has a switch that controls whether the decision is stored in localStorage or not. We take care to only set a value in localStorage if the user has actually overridden the default. The default can be either light or dark, depending on the browser setting.

We use the "window load" event in JavaScript to check both:

localStorage
media query

3. Media query. In the Brave web-browser you can force the dark mode. Go to Settings, then Appearance. Brave calls it "Night Mode" instead of dark mode.

This setting will then trigger the above media query.

Added 13-Jul-2024: I am now a member of The Darktheme Club.

IGYRC5108-U: IBM Cobol Compiler Terminating

Wed, 29 May 2024 20:00:00 +0200

I stumbled on the following error message from the IBM COBOL compiler for mainframe:

IGYRC5108-U   COBOL COMPILER TERMINATING:  UNCORRECTABLE PROGRAM INTERRUPT CONDITION.

The COBOL program was not fully compiled; compilation stopped midway.

Further messages from the compile run:

PROGRAM CHECK, INTERRUPT CODE 04
PSW 078D2F00 CAA178A4

REGISTERS 0 - 15:
4A9973EE 00000084 0013421C 00000026         00081180 00000000 00069364 00079C54
00000000 00000000 00079C54 4A9C6B48         4A9972BC 4A997000 CA9C68C6 CA9C6D42

 CURRENT COBOL COMPILER PHASE: IGYCPANA (STORAGE LOC: CA9C6000)

LAST MODULE RECORDING CONTROL: IGYNXML  (STORAGE LOC: 4AA153D0)
INTERRUPT OCCURRED AT STORAGE LOCATION: 00A178A4
CURRENT LINE NUMBER: 010055 PP 5655-EC6 IBM ENTERPRISE COBOL FOR Z/OS  6.3.0 P210118       CZF02     DATE 05/17/2024  TIME 18:38:56   PAGE   231

The compiler in question is "Enterprise COBOL for z/OS 6.3.0".

Searching for this error message turned up the following entry from IBM: IGYRC5108-U COBOL Compiler Error Message for Phase IGYCPSCN. It does not point to the real cause.

The real problem is that too much space is used in the WORKING-STORAGE SECTION, i.e., the variables declared there consume too much space. Given the Memory Limitations with IBM Enterprise COBOL Compiler, this comes as no great surprise. Reducing the occupied space solved the problem. In my case I had a couple of large PIC X(256000) clauses.

Member of 1MB club

Thu, 16 May 2024 20:00:00 +0200

I am now a member of the 1MB club. The members must have websites with size below 1 MB. This new membership is not surprising as I am already a member of the two clubs:

This 1MB club has 781 members as of today.

Membership is requested by creating a patch as follows:

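git init
# printf is used because bash's echo does not expand "\n" by default
printf -- '---\npageurl: eklausmeier.goip.de\nsize: 133.1\n---\n\n' > eklausmeier.goip.de.md
git add eklausmeier.goip.de.md
git commit -m "Add eklausmeier.goip.de"    # assumed step: format-patch -1 needs a commit
git format-patch -1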

The resulting file is then sent via e-mail to patches@btxx.org.

A renewed check of my blog on https://tools.pingdom.com shows:

Performance grade: 93
Page size: 102.5 KB
Load time: 220ms from Frankfurt
Requests: 7

Content and requests by content type:

Waterfall diagram:

Performance Comparison of Wendt Website: WordPress vs. Simplified Saaze

Tue, 14 May 2024 19:00:00 +0200

In the previous post Example Theme for Simplified Saaze: Wendt I demonstrated the transition from a website using WordPress to Simplified Saaze. This very blog, which you are reading right now, also uses Simplified Saaze. This post shows how much better performance-wise this transition was. The comparison is therefore between:

Original: WordPress version, publicomag.com
Modified: Simplified Saaze version of PublicoMag

The original website https://www.publicomag.com is hosted on Cloudflare. It uses WordPress.

1. Comparison. For the comparison I use the website tools.pingdom.com, which provides various metrics to evaluate the performance of a website:

Page size
Number of requests
Load time
Concrete tips to improve performance
Waterfall diagram of requests
Breakdown of content types

The first few tests in Pingdom were conducted for Europe/Frankfurt, as I host everything on the machine below in my living room, not far from Frankfurt.

The post in question is Inspiration als Energiequelle: Neues vom grünen Hauptmann von Köpenick. The version using Simplified Saaze is here. This post contains 5 images and 13 comments. All images are served directly from https://www.publicomag.com. I.e., no side has any advantage in that respect. I had already blogged on this here: Performance Remarks on PublicoMag Website.

The results are thus:

Original (WordPress)	Modified (Simplified Saaze)

The results for the original website, based on WordPress, are indeed worse on every dimension: page size, load time, number of requests. In comparison to the modified version using Simplified Saaze the ratio is roughly:

Page size is close to 3:1
Load time is almost 8:1
Number of requests is more than 5:1

So Simplified Saaze is better in all dimensions by a considerable factor. The load time is particularly striking. This is quite noteworthy as the Simplified Saaze version is entirely self-hosted, i.e., upload to the internet is limited to 50 MBit/s!

The recommendations for the original website are therefore not overly surprising:

The missing compression is clearly an oversight on the web-server part.

The breakdown of the content type for the original WordPress website is:

One can clearly see that half of the page size is images, one third is JavaScript, fonts and CSS each account for roughly 8%, and the actual HTML content is just 2%.

I uploaded the Simplified Saaze version to Netlify, which provides CDN functionality. I measured again the WordPress post requested from San Francisco, and the Simplified Saaze version from San Francisco. The measurements are pretty similar to the Frankfurt results.

Original (WordPress) San Francisco	Modified (Simplified Saaze) San Francisco

Surprisingly, the Simplified Saaze version on Cloudflare has a load time of 5.24 seconds from San Francisco. Vercel is in line with Netlify and has load times of 385 ms.

For comparison I also hosted the Simplified Saaze version on https://www.lima-city.de. Load times are 248 ms for Frankfurt. Load times are 943 ms for San Francisco.

2. Modified website. The breakdown of the modified site, based on Simplified Saaze, is as below.

Actual loading of the modified site roughly follows the waterfall diagram below. This waterfall diagram shows that a major part of the loading time is spent loading Google's Playfair fonts. This is quite surprising. The other fonts from Google load in record time.

3. Known limitations. Alexander Wendt wrote about some general limitations of the technical solution used so far:

Nevertheless, we are confident that we will soon solve one or another technical problem, hopefully in a satisfactory way. In general, Publico needs a step-by-step renewal of its technical platform, which in its fundamentals dates from 2017.

4. Low powered devices. Dan Luu noted in How web bloat impacts users with slow devices that many so-called modern websites are more or less unusable on older or low-powered devices. Some quotes:

If you've never used a low-end device like this, the general experience is that many sites are unusable on the device and loading anything resource intensive (an app or a huge website) can cause crashes.

Software developers underestimate the impact that low-powered devices have when loading websites:

People seem to really underestimate the dynamic range in wealth and income across the world.

Example Theme for Simplified Saaze: Wendt

Mon, 13 May 2024 16:35:00 +0200

Another theme for Simplified Saaze called "Wendt". You can inspect it here.

It offers the following features:

Responsive with media breaks for large and small screens, and for printing.
Top menu with submenus.
Two column using CSS grid, "Holy Grail Layout".
Multiple blogs:
- Each category has its own blog by using filtering.
- Each author has its own blog by using filtering.
- Aggregate blog, i.e., the combination of the above.
Using the tag to showcase the initial content of a blog post.
Sitemap in HTML and XML, RSS feed.
WebAssembly based search using pagefind.
No cookies, therefore no annoying cookie banner required.

The theme looks like this:

This theme is modeled after the blog from Alexander Wendt. That blog is powered by WordPress and hosted on Cloudflare. I have written on this PublicoMag website: Performance Remarks on PublicoMag Website. Alexander Wendt started this blog in October 2017. The number of posts per year is given in the table below. Year 2024 is not complete, so its numbers will still grow as time passes.

Year	17	18	19	20	21	22	23	24
#posts	50	237	191	190	179	177	168	43
#comments	721	3999	3211	2973	2480	1300	1115	230

The number of comments was counted like this (varying the year from 2017 to 2024):

perl -ne 'if (/^(\d+) Kommentare <\/h5>/) { $s+=$1; printf("%d\t%d\t%s\n",$1,$s,$ARGV); }' 2017*
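The same counting can be sketched in Python. This is only an illustration of what the one-liner does: the function name count_comments and the sample lines are made up here, only the pattern is taken from the Perl code.

```python
import re

# Same pattern as in the Perl one-liner: a line that starts with a number,
# followed by " Kommentare </h5>".
COMMENT_RE = re.compile(r"^(\d+) Kommentare </h5>")

def count_comments(lines):
    """Return the sum of all comment counts found in the given lines."""
    total = 0
    for line in lines:
        m = COMMENT_RE.match(line)
        if m:
            total += int(m.group(1))
    return total

# Hypothetical sample input, not taken from the real HTML files.
sample = [
    "13 Kommentare </h5>",
    "<p>Some other line</p>",
    "5 Kommentare </h5>",
]
print(count_comments(sample))
```

Unlike the Perl version, this sketch does not print the running total per file; it only returns the final sum.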

1. Installation

There are two parts in the installation.

1. Install the theme including content and the Simplified Saaze static site generator using composer:

$ composer create-project eklausme/saaze-wendt
Creating a "eklausme/saaze-wendt" project at "./saaze-wendt"
Installing eklausme/saaze-wendt (v1.0)
  - Downloading eklausme/saaze-wendt (v1.0)
  - Installing eklausme/saaze-wendt (v1.0): Extracting archive
Created project in /tmp/T/saaze-wendt
Loading composer repositories with package information
Updating dependencies
Lock file operations: 1 install, 0 updates, 0 removals
  - Locking eklausme/saaze (v2.2)
Writing lock file
Installing dependencies from lock file (including require-dev)
Package operations: 1 install, 0 updates, 0 removals
  - Downloading eklausme/saaze (v2.2)
  - Installing eklausme/saaze (v2.2): Extracting archive
Generating optimized autoload files
No security vulnerability advisories found.
        real 3.08s
        user 0.48s
        sys 0
        swapped 0
        total space 0

2. The Simplified Saaze installation is described in Simplified Saaze. It documents how to check for PHP version, check for yaml-parsing, FFI, MD4C extension, etc.

Once everything is installed, just run php saaze -mor.

2. Downloading all WordPress content

We need a list of all available URLs.

The approach below did not work: it uses the month lists in WordPress.

for i in `seq 2018 2023`; do for j in `seq -w 01 12`; do curl https://www.publicomag.com/$i/$j/ > m$i-$j.html; done; done

Special cases for 2017 and 2024:

curl https://www.publicomag.com/2017/10/ -o m2017-10.html
curl https://www.publicomag.com/2017/11/ -o m2017-11.html
curl https://www.publicomag.com/2017/12/ -o m2017-12.html
...
curl https://www.publicomag.com/2024/03/ -o m2024-03.html

It turned out that the month lists lack links. To be exact: they lack more than 466 URLs.

This approach fetches all links:

$ curl https://www.publicomag.com/ -o wendt-p1.html
$ time ( for i in `seq 2 124`; do
    curl https://www.publicomag.com/page/$i/ -o wendt-p${i}.html;
  done )

This creates 124 files:

$ ls -alFt | head
total 25580
drwxr-xr-x 2 klm klm   4096 Apr  2 11:34 ./
drwxr-xr-x 4 klm klm   4096 Apr  2 11:33 ../
-rw-r--r-- 1 klm klm 208194 Apr  2 11:28 wendt-p1.html
-rw-r--r-- 1 klm klm 187908 Apr  2 11:27 wendt-p124.html
-rw-r--r-- 1 klm klm 203575 Apr  2 11:27 wendt-p123.html
-rw-r--r-- 1 klm klm 206497 Apr  2 11:27 wendt-p122.html
-rw-r--r-- 1 klm klm 207572 Apr  2 11:27 wendt-p121.html
-rw-r--r-- 1 klm klm 207970 Apr  2 11:27 wendt-p120.html
-rw-r--r-- 1 klm klm 206010 Apr  2 11:27 wendt-p119.html
...
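The paging scheme used above can be written down as a small Python sketch that only builds the (URL, file name) pairs and does not download anything. The function name page_jobs is an assumption made for this illustration; the URL layout and file names are the ones from the curl commands.

```python
BASE = "https://www.publicomag.com"

def page_jobs(last_page):
    """Return (url, filename) pairs for the overview pages 1..last_page."""
    # Page 1 is the start page itself, all further pages live under /page/N/.
    jobs = [(f"{BASE}/", "wendt-p1.html")]
    for i in range(2, last_page + 1):
        jobs.append((f"{BASE}/page/{i}/", f"wendt-p{i}.html"))
    return jobs

jobs = page_jobs(124)
print(len(jobs), jobs[0], jobs[-1])
```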

List of URLs:

perl -ne 'print $1."\n" if / allURL

Downloading all posts uses below Perl script blogwendtcurl:

#!/bin/perl -W
# Download content from www.publicomag.com (Alexander Wendt) given a list of URLs
# Elmar Klausmeier, 05-Mar-2024

use strict;
my $fn;
my @F;

while (<>) {
    chomp;
    @F = split('/');
    $F[5] =~ s/a%cc%88/ä/;
    $fn = $F[3] . '-' . $F[4] . '-' . $F[5] . '.html';
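    print $fn . "\n";	# print, not printf: the file name may contain '%'
    system('curl', $_, '-o', $fn);	# list form: URL is not interpreted by the shell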
}

This creates a list of HTML files:

$ ls -alFt | head
total 175856
drwxr-xr-x 3 klm klm   4096 Mar  7 19:16 ../
drwxr-xr-x 2 klm klm  69632 Mar  5 19:53 ./
-rw-r--r-- 1 klm klm 203580 Mar  5 19:53 2024-03-18471.html
-rw-r--r-- 1 klm klm 252784 Mar  5 19:53 2024-03-wenn-die-zukunft-ans-fenster-des-gruenen-hauses-klopft.html
-rw-r--r-- 1 klm klm 203765 Mar  5 19:53 2024-03-zeller-der-woche-niedere-gruende.html
-rw-r--r-- 1 klm klm 203337 Mar  5 19:53 2024-02-zeller-der-woche-widerstaendler.html
-rw-r--r-- 1 klm klm 231904 Mar  5 19:52 2024-02-das-nie-wieder-deutschland-und-seine-millionen-fuer-judenhasser.html
...
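How the file names in this listing come about can be sketched in Python. The function post_filename is made up for this illustration, and the example URL is an assumption inferred from one of the file names above; the field numbers and the single umlaut substitution follow the Perl script.

```python
def post_filename(url):
    """Mimic blogwendtcurl: fields 3, 4, 5 of the '/'-split URL, joined by '-'."""
    f = url.rstrip("\n").split("/")
    # f[0]='https:', f[1]='', f[2]=host, f[3]=year, f[4]=month, f[5]=slug
    slug = f[5].replace("a%cc%88", "ä", 1)  # first occurrence only, like s/// without /g
    return f"{f[3]}-{f[4]}-{slug}.html"

# Assumed URL, reconstructed from the listing above.
url = "https://www.publicomag.com/2024/03/zeller-der-woche-niedere-gruende/"
print(post_filename(url))
```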

3. Analyzing content types

1. Fonts.

Logo: Shadows Into Light Two, original uses image instead. Another contender could be Croissant One.
Text: Playfair Display

2. Categories. Categories over all posts are as follows:

$ perl -ne 'print $1."\n" if / hentry category-([-\w]+)/' *.html | sort | uniq -c | sort -rn
    595 spreu-weizen
    486 politik-gesellschaft
    122 medien-kritik
     28 fake-news
      3 hausbesuch
      1 film

Different, i.e., multiple, categories can be attributed to a single post. However, the majority of posts only has a single category attached.
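For illustration, the counting pipeline above (perl, sort, uniq -c, sort -rn) can be sketched in Python. Like the Perl pattern, it records only the first category that follows hentry on a line. The function name and the sample lines are assumptions made for this sketch.

```python
import re
from collections import Counter

# Same pattern as in the Perl one-liner.
CATEGORY_RE = re.compile(r" hentry category-([-\w]+)")

def count_categories(lines):
    """Count the first category after ' hentry ' per line, most frequent first."""
    counts = Counter()
    for line in lines:
        m = CATEGORY_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common()

# Hypothetical sample lines, not copied from the real HTML files.
sample = [
    '<article class="post hentry category-spreu-weizen category-film">',
    '<article class="post hentry category-film">',
    '<article class="post hentry category-spreu-weizen">',
    "<p>no category here</p>",
]
print(count_categories(sample))
```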

In the above list there is no category "alte-weise". I added this category.

We want to convert images in "Alte-Weise" to text. That way loading those pages should be way quicker. Therefore we need to download those images and convert them with tesseract.

3. URLs. The Perl one-liner below produces a list of URLs for the images.

perl -ne 'print "$1$2\n" if (/^ ../allAlte-WeiseURL

Downloading these images:

perl -ane 'chomp; @F=split(/\//); `curl $_ -o $F[7]`' ../allAlte-WeiseURL
curl https://www.publicomag.com/wp-content/uploads/2023/01/Alte-Weise_C.Wright-Mills-1011x715.jpg -o Alte-Weise_Wright_Mills-scaled.jpg

4. JavaScript. A huge number of JavaScript libraries are loaded. We will get rid of them all.

Google Analytics
JQuery Minimal
JQuery Migrate
WordPress User Avatar
Buzzblog Hercules Likes
Borlabs Cookies Prioritize
WordPress GDPR Compliance
Comment Reply
Contact Form
JQuery Easing for Buzzblog
JQuery MagnificPopup for Buzzblog
JQuery Plugins for Buzzblog
JQuery JustifiedGallery for Buzzblog
Buzzblog Bootstrap
Owl Carousel for Buzzblog
Buzzblog AnimatedHeader
Shariff
MailPoet
Akismet
Borlabs Cookies Minimal

4. Reducing number of images

An easy target is the logo: this was replaced with plain text. This saves one roundtrip to the web-server.

1. For the category "alte-weise" the entire image with text is converted to two elements:

An image
The actual text

The image is scanned with tesseract.

That way the text can be searched via Pagefind. Also, the required bandwidth is reduced.

Old:

New:

The new approach is to use a blockquote, where the CSS puts an image on top:

blockquote blockquote {
    background: transparent no-repeat top/30% url('/img/Alte-Weise-Kopf.svg');
    text-align:center;
    padding-left:2rem;
    padding-right:2rem;
    padding-top:12rem;
    padding-bottom:1rem;
    background-color:#b6c7c8; border-radius:2.5rem
}

The actual text in Markdown is then:

>> „Zweifel ist nicht das Gegenteil, sondern ein Element des Glaubens.“
>>
>> Paul Tillich

That way the ordinary blockquote in Markdown (single >) is left free to be used for citations.

Obviously, entering the text in >> is way easier than producing an image for each epigram.

2. Care was taken to reduce the number of images needed for the social media icons.

Old:

New:

That saves loading eight images. However, you need to load some font glyphs instead.

 🮰 Telegram

In particular this symbol U+1fbb0 is %F0%9F%AE%B0 when URL encoded:

@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+Symbols+2&text=%F0%9F%97%8F%F0%9F%AE%B0%F0%9F%96%82%F0%9F%96%A8');

Similarly, symbol U+1f5cf is %F0%9F%97%8F when URL encoded.
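The URL-encoded form is simply the percent-encoded UTF-8 byte sequence of each symbol. A minimal Python sketch, assuming the helper name css_text_param (made up here), reproduces the text= parameter of the @import line. The code points U+1F582 and U+1F5A8 are not named in the text; they are obtained by decoding the remaining two groups in that URL.

```python
from urllib.parse import quote

def css_text_param(codepoints):
    """Percent-encode the UTF-8 bytes of the given Unicode code points."""
    return quote("".join(chr(cp) for cp in codepoints))

# The four symbols requested from Noto Sans Symbols 2, in the order of the URL.
symbols = [0x1F5CF, 0x1FBB0, 0x1F582, 0x1F5A8]
print(css_text_param(symbols))
```

Requesting only these glyphs via text= keeps the font download small compared to loading the complete font.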

5. Converting WordPress HTML to Markdown

Perl script blogwendtmd is used to convert a single HTML file to Markdown.

$ time ( for i in *.html; do blogwendtmd $i; done )
        real 94.95s
        user 136.51s
        sys 0
        swapped 0
        total space 0

The long runtime is exclusively for running tesseract, i.e., the conversion from image to text. Once all WordPress posts are converted to Markdown, this script no longer needs to be run, obviously.

blogwendtmd is 180 lines of Perl code.

Listing of all authors and their corresponding directories.

$ perl -ne 'print $1."\n" if /\/author\/([^\/]+)\//' 2*.html | sort -u
alexander
archi-bechlenberg
bernd-zeller
cora-stephan
david-berger
hansjoerg-mueller
joerg-friedrich
matthias-matussek
redaktion
samuel-horn
wolfram-ackner

Each of these authors has a separate index beneath /author/.

Generating all yearly overviews:

for i in *; do ( echo $i; cd $i; blogwendtdate -gy$i *.md > index.md ) done

Perl script blogwendtdate generates a Markdown file, which contains all articles for the corresponding year. The script first stores all posts of one year in a list, and then sorts this list by the date given in the frontmatter.

my @L;	# list of posts in a year, in the beginning not necessarily sorted

sub markdownfile(@) {
    my $f = $_[0];
    my ($flag,$title,$date,$draft) = (0,"","",0);
    open(F,"<$f") || die("Cannot open $f");
    while (<F>) {
        if (/^\-\-\-\s*$/) {
            last if (++$flag >= 2);
        . . .
    }
    if ($draft == 0  &&  length($title) > 0  &&  length($date) > 0) {
        push(@L, sprintf("%s: [%s](%s%s)",$date,$title,$prefix,substr($f,0,-3)) );
    }
    close(F) || die("Cannot close $f");
}

while (<@ARGV>) {
    #printf("ARGV=|%s|\n",$_);
    next if (substr($_,-8) eq "index.md");
    markdownfile($_);
}

for (sort @L) {
    printf("%d. %s\n",++$cnt,$_);
}
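The overall flow (read the frontmatter, skip drafts and index.md, sort by date, number the lines) can be sketched in Python. The part of the Perl script that parses the frontmatter is elided above, so the handling of the keys title, date, and draft in this sketch is an assumption, as are the function names and the sample posts.

```python
def index_line(filename, text, prefix=""):
    """Return 'date: [title](link)' for one post, or None for drafts or incomplete posts."""
    title, date, draft = "", "", False
    fences = 0
    for line in text.splitlines():
        if line.rstrip() == "---":
            fences += 1
            if fences >= 2:          # second '---' ends the frontmatter
                break
            continue
        key, _, value = line.partition(":")   # split at the first colon only
        value = value.strip().strip('"')
        if key == "title":
            title = value
        elif key == "date":
            date = value
        elif key == "draft":
            draft = value == "true"
    if draft or not title or not date:
        return None
    return f"{date}: [{title}]({prefix}{filename[:-3]})"   # drop '.md'

def yearly_index(posts, prefix=""):
    """posts maps file name to file content; returns the numbered, date-sorted list."""
    lines = [index_line(name, text, prefix)
             for name, text in posts.items() if not name.endswith("index.md")]
    lines = sorted(l for l in lines if l is not None)
    return [f"{n}. {l}" for n, l in enumerate(lines, start=1)]

# Hypothetical posts for illustration.
posts = {
    "b.md": '---\ntitle: "Second"\ndate: "2024-02-01"\n---\nBody',
    "a.md": '---\ntitle: "First"\ndate: "2024-01-15"\n---\nBody',
    "c.md": '---\ntitle: "Hidden"\ndate: "2024-03-01"\ndraft: true\n---\n',
    "index.md": "---\ntitle: x\ndate: y\n---\n",
}
print("\n".join(yearly_index(posts)))
```

Sorting the finished strings works because each line starts with the date, which is the same trick as sort @L in the Perl code.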

Many HTML errors were corrected, which were reported by Nu Html Checker. See for example das-magische-sprechen-schafft-macht-fuer-den-augenblick.

6. Handling comments

The Publico blog contains comments, where readers have left their thoughts. In Perl script blogwendtmd we detect comments by checking for

tags for the beginning, and pinglist for the end of all comments.

if (/^/) { $flag = 0; next; }
elsif (//) {
    ...
    $flag = 1;
}
next if ($flag == 0);

We refrained from integrating the commenting system HashOver. It is not difficult, as we have already demonstrated in the Lemire theme. However, for a political blog a comment system is rather "dangerous", as it can attract rather unwelcome writings. Under German law the host of these comments becomes liable. Essentially, you must therefore check every comment manually:

... since all comments have to be reviewed, and the editorial team still consists of the founder Alexander Wendt and one part-time editor, they cannot go online immediately.

In light of the high volume of comments HashOver should most probably be added.

7. Running static site generator

In serial mode it takes less than 3 seconds to build 19 collections without comments. With comments it takes less than 6 seconds to process 23 thousand pages, see below. This build time can be almost halved by using parallelisation with -p16.

$ time php saaze -morb /tmp/build
Building static site in /tmp/build...
    execute(): filePath=./content/alexander.yml, nSIentries=770, totalPages=39, entries_per_page=20
    execute(): filePath=./content/alte-weise.yml, nSIentries=131, totalPages=7, entries_per_page=20
    execute(): filePath=./content/archi-bechlenberg.yml, nSIentries=5, totalPages=1, entries_per_page=20
    execute(): filePath=./content/bernd-zeller.yml, nSIentries=332, totalPages=17, entries_per_page=20
    execute(): filePath=./content/cora-stephan.yml, nSIentries=1, totalPages=1, entries_per_page=20
    execute(): filePath=./content/david-berger.yml, nSIentries=1, totalPages=1, entries_per_page=20
    execute(): filePath=./content/fake-news.yml, nSIentries=28, totalPages=2, entries_per_page=20
    execute(): filePath=./content/film.yml, nSIentries=1, totalPages=1, entries_per_page=20
    execute(): filePath=./content/hansjoerg-mueller.yml, nSIentries=2, totalPages=1, entries_per_page=20
    execute(): filePath=./content/hausbesuch.yml, nSIentries=2, totalPages=1, entries_per_page=20
    execute(): filePath=./content/joerg-friedrich.yml, nSIentries=2, totalPages=1, entries_per_page=20
    execute(): filePath=./content/mag.yml, nSIentries=1235, totalPages=62, entries_per_page=20
    execute(): filePath=./content/matthias-matussek.yml, nSIentries=1, totalPages=1, entries_per_page=20
    execute(): filePath=./content/medien-kritik.yml, nSIentries=123, totalPages=7, entries_per_page=20
    execute(): filePath=./content/politik-gesellschaft.yml, nSIentries=486, totalPages=25, entries_per_page=20
    execute(): filePath=./content/redaktion.yml, nSIentries=112, totalPages=6, entries_per_page=20
    execute(): filePath=./content/samuel-horn.yml, nSIentries=3, totalPages=1, entries_per_page=20
    execute(): filePath=./content/spreu-weizen.yml, nSIentries=596, totalPages=30, entries_per_page=20
    execute(): filePath=./content/wolfram-ackner.yml, nSIentries=6, totalPages=1, entries_per_page=20
Finished creating 19 collections, 19 with index, and 1248 entries (2.58 secs / 809.47MB)
#collections=19, parseEntry=0.7290/23712-19, md2html=1.1983, toHtml=1.2839/23712, renderEntry=0.1562/1248, renderCollection=0.0403/224, content=23712/0
    real 5.16s
    user 4.36s
    sys 0
    swapped 0
    total space 0
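The totalPages values in the log are consistent with a ceiling division of nSIentries by entries_per_page. The following Python sketch only checks this arithmetic against the log; it is an observation from the output above, not code taken from Simplified Saaze, and the function name total_pages is made up.

```python
def total_pages(n_entries, entries_per_page=20):
    """Ceiling division using integers only."""
    return -(-n_entries // entries_per_page)

# (nSIentries, totalPages) pairs copied from the build log above.
log = [(770, 39), (131, 7), (5, 1), (332, 17), (1235, 62), (486, 25), (596, 30)]
for n, pages in log:
    print(n, total_pages(n), pages)
```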

Running pagefind, i.e., indexing all keywords for the WebAssembly based search functionality:

$ time pagefind -s . --exclude-selectors aside --exclude-selectors footer --force-language=de

Running Pagefind v1.0.4
Running from: "/tmp/buildwendt"
Source:       ""
Output:       "pagefind"

[Walking source directory]
Found 1473 files matching **/*.{html}

[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.

[Reading languages]
Discovered 1 language: de

[Building search indexes]
Total:
  Indexed 1 language
  Indexed 1473 pages
  Indexed 133261 words
  Indexed 0 filters
  Indexed 0 sorts

Finished in 19.644 seconds
        real 19.87s
        user 18.28s
        sys 0
        swapped 0
        total space 0

It would take 11 seconds without comments, i.e., indexing 77168 words.

8. Collections

There are quite a number of collections at play in this theme. The most important one is mag (short for magazine). This directory contains all the blog posts. All the other collections are just symbolic links to mag, i.e., they do not contain additional content.

total 96
drwxr-xr-x  4 klm klm 4096 Apr 27 17:11 ./
drwxr-xr-x  7 klm klm 4096 May 13 13:00 ../
lrwxrwxrwx  1 klm klm    3 Mar 26 21:48 alexander -> mag/
-rw-r--r--  1 klm klm  273 Apr  2 18:56 alexander.yml
lrwxrwxrwx  1 klm klm    3 Apr 27 17:11 alte-weise -> mag/
-rw-r--r--  1 klm klm  225 Apr 27 17:10 alte-weise.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:22 archi-bechlenberg -> mag/
-rw-r--r--  1 klm klm  495 Apr  2 18:58 archi-bechlenberg.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:17 bernd-zeller -> mag/
-rw-r--r--  1 klm klm  213 Apr  2 18:01 bernd-zeller.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 15:18 cora-stephan -> mag/
-rw-r--r--  1 klm klm  707 Apr  2 19:01 cora-stephan.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 15:17 david-berger -> mag/
-rw-r--r--  1 klm klm  761 Apr  2 19:06 david-berger.yml
drwxr-xr-x  2 klm klm 4096 Apr  2 16:24 error/
-rw-r--r--  1 klm klm   88 Apr  2 16:21 error.not_used_yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 fake-news -> mag/
-rw-r--r--  1 klm klm  216 Apr  2 19:42 fake-news.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 film -> mag/
-rw-r--r--  1 klm klm  201 Apr  2 19:43 film.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:22 hansjoerg-mueller -> mag/
-rw-r--r--  1 klm klm  318 Apr  2 18:56 hansjoerg-mueller.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 hausbesuch -> mag/
-rw-r--r--  1 klm klm  219 Apr  2 19:42 hausbesuch.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 15:18 joerg-friedrich -> mag/
-rw-r--r--  1 klm klm  222 Apr  2 18:01 joerg-friedrich.yml
drwxr-xr-x 10 klm klm 4096 May 12 20:56 mag/
-rw-r--r--  1 klm klm  110 Apr  1 22:25 mag.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:22 matthias-matussek -> mag/
-rw-r--r--  1 klm klm  228 Apr  2 18:02 matthias-matussek.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 medien-kritik -> mag/
-rw-r--r--  1 klm klm  234 Apr  2 19:27 medien-kritik.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 17:47 politik-gesellschaft -> mag/
-rw-r--r--  1 klm klm  255 Apr  2 17:59 politik-gesellschaft.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:16 redaktion -> mag/
-rw-r--r--  1 klm klm  202 Apr  2 18:03 redaktion.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:21 samuel-horn -> mag/
-rw-r--r--  1 klm klm  259 Apr  2 19:03 samuel-horn.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 spreu-weizen -> mag/
-rw-r--r--  1 klm klm  231 Apr  2 19:27 spreu-weizen.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:22 wolfram-ackner -> mag/
-rw-r--r--  1 klm klm  542 Apr  2 19:05 wolfram-ackner.yml

The collection yaml files look like this. First mag.yml:

title: Publico
sort_field: date
sort_direction: desc
index_route: /
entry_route: /{slug}
more: true
rss: true

Now alexander.yml, which filters for author:

title: Publico - Autor Alexander Wendt
subtitle: "Alexander Wendt ist Herausgeber von Publico."
sort_field: date
sort_direction: desc
index_route: /author/alexander
entry: false
entry_route: /{slug}
more: true
filter: return ($entry->data['author'] === 'Alexander Wendt');

Similarly, alte-weise.yml, which filters for categories:

title: Publico - Alte & Weise
sort_field: date
sort_direction: desc
index_route: /alte-weise
entry: false
entry_route: /{slug}
more: true
filter: return (array_search('alte-weise',$entry->data['categories']) !== false);

Except mag.yml, all other yaml files set rss: false.
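The effect of the two filter expressions can be illustrated with a small Python sketch: every collection sees the same entries from mag, and the filter decides which of them show up in the index. The entries, slugs, and helper names below are hypothetical; the real filters are the PHP expressions shown in the yaml files.

```python
# Hypothetical entries; in the theme these come from the Markdown files in mag/.
entries = [
    {"slug": "post-a", "author": "Alexander Wendt", "categories": ["politik-gesellschaft"]},
    {"slug": "post-b", "author": "Bernd Zeller", "categories": ["spreu-weizen"]},
    {"slug": "post-c", "author": "Alexander Wendt", "categories": ["alte-weise", "film"]},
]

def by_author(name):
    """Counterpart of: $entry->data['author'] === name"""
    return lambda e: e.get("author") == name

def by_category(cat):
    """Counterpart of: array_search(cat, $entry->data['categories']) !== false"""
    return lambda e: cat in e.get("categories", [])

def collection(entries, predicate):
    """Slugs of all entries that pass the filter."""
    return [e["slug"] for e in entries if predicate(e)]

print(collection(entries, by_author("Alexander Wendt")))
print(collection(entries, by_category("alte-weise")))
```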

9. Templates

This theme uses the following PHP template files:

bottom-layout.php: commonalities for the bottom part
entry.php: template for the entry, i.e., the usual blog post
error.php: 404 page, or other error conditions
head.php: HTML for the first few lines for all HTML files
index.php: template for the index, i.e., the listing of posts
overview.php: HTML sitemap
rss.php: RSS feed
sitemap.php: XML sitemap
top-layout.php: commonalities for the top part

I use the following hierarchy of PHP files for my entry-template, i.e., the template for a blog post:

entry.php
- top-layout.php
  - head.php
- Actual content: $entry['content']
- bottom-layout.php

The following hierarchy is used for the index-template, i.e., the template for showing a reverse-date sorted list of blog posts:

index.php
- top-layout.php
  - head.php
- for-loop over entry-excerpts
- bottom-layout.php

Converting UNIX Timestamps to Year, Month, Day in COBOL

Fri, 03 May 2024 19:00:00 +0200

1. Task at hand. A COBOL program reads UNIX timestamps as input. Output should be the values of year, month, day, hour, minutes, seconds.

In C this is just gmtime(). gmtime accepts time_t and produces struct tm:

struct tm *gmtime(const time_t *timep);

On the mainframe, however, it is sometimes a little inconvenient to call a C routine from COBOL. It is easier to just code the short algorithm in COBOL.

2. Approach. P.J. Plauger's book "The Standard C Library" contains the source code for gmtime() and localtime(). This code is then translated to COBOL.

The C code is as below.

/* Convert UNIX timestamp to triple (year,month,day)
   Elmar Klausmeier, 01-Apr-2024
*/

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// From P.J.Plauger: The Standard C Library, Prentice Hall, 1992

static const int daytab[2][12] = {
    { 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 },	// leap year
    { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 }
};

int daysTo (int year, int mon) {	// compute extra days to start of month
    int days;

    if (year > 0) days = (year - 1) / 4;	// correct for leap year: 1801-2099
    else if (year <= -4) days = 1 + (4 - year) / 4;
    else days = 0;

    return days + daytab[year&03 || (year==0)][mon];
}


struct tm *timeTotm (struct tm *t, time_t secsarg, int isdst) {	// convert scalar time to time struct
    int year, mon;
    const int *pm;
    long i, days;
    time_t secs;
    static struct tm ts;

    secsarg += ((70 * 365LU) + 17) * 86400;	// 70 years including 17 leap days since 1900
    if (t == NULL) t = &ts;
    t->tm_isdst = isdst;

    for (secs=secsarg; ; secs=secsarg+3600) {	// loop to correct for DST (not used here)
        days = secs / 86400;
        t->tm_wday = (days + 1) % 7;
        for (year = days / 365; days < (i=daysTo(year,0)+365L*year); --year)
            ;	// correct guess and recheck
        days -= i;
        t->tm_year = year;
        t->tm_yday = days;

        pm = daytab[year&03 || (year==0)];
        for (mon=12; days < pm[--mon]; )
            ;	// step back to the month containing this day
        t->tm_mon = mon;
        t->tm_mday = days - pm[mon] + 1;

        secs %= 86400;
        t->tm_hour = secs / 3600;
        secs %= 3600;
        t->tm_min = secs / 60;
        t->tm_sec = secs % 60;

        //if (t->tm_isdst >= 0  ||  (t->tm_isdst = IsDST(t)) <= 0) return t;
        return t;
    }
}


int main (int argc, char *argv[]) {
    struct tm t;
    long secs;

    if (argc <= 1) return 0;
    secs = atol(argv[1]);

    timeTotm(&t, secs, 0);
    printf("timeTotm(): year=%d, mon=%d, day=%d, hr=%d, min=%d, sec=%d\n",
        1900+t.tm_year, 1+t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec);

    return 0;
}
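The constant ((70 * 365) + 17) * 86400 shifts the epoch from 01-Jan-1970 back to 01-Jan-1900. A short Python sketch, using only the standard datetime module, cross-checks both the 17 leap days and the resulting number of seconds. The variable names are chosen for this illustration.

```python
from datetime import date

# Days between the two epochs according to the calendar.
days = (date(1970, 1, 1) - date(1900, 1, 1)).days

# Leap years from 1900 to 1969 using the full Gregorian rule (1900 is not one).
leap_days = sum(1 for y in range(1900, 1970)
                if y % 4 == 0 and (y % 100 != 0 or y % 400 == 0))

offset_seconds = (70 * 365 + 17) * 86400

print(days, leap_days, offset_seconds)
```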

3. COBOL solution. Below is the COBOL code which was translated from above C code.

Fun fact: GNU Cobol crashed on some intermediate result, see cobc crashes on illegal COBOL source code file. This bug was fixed within a few hours by Simon Sobisch!

The source code below compiles without problems.

000010 IDENTIFICATION DIVISION.
000020 PROGRAM-ID.   Timestamp2date.
000030 AUTHOR.       Elmar Klausmeier.
000040 DATE-WRITTEN. 02-May-2024.
000050
000060 DATA DIVISION.
000070 WORKING-STORAGE SECTION.
000080*
000090 01 year    PIC S9(18) comp-5.
000100 01 mon     PIC S9(18) comp-5.
000110 01 days    PIC S9(18) comp-5.
000120*
000130* Local helper variables
000140 01 i       PIC S9(18) comp-5.
000150 01 idays   PIC S9(18) comp-5.
000160 01 daysTo  PIC S9(18) comp-5.
000170 01 yearMod4  PIC S9(9) comp-5.
000180 01 leapIx  PIC S9(9) comp-5.
000190 01 daysP1  PIC S9(18) comp-5.
000200*
000210 01 secs    PIC S9(18) comp-5.
000220 01 secsarg PIC S9(18) comp-5.
000230*
000240*
000250* struct tm:
000260*    int tm_sec;    // Seconds          [0, 60]
000270*    int tm_min;    // Minutes          [0, 59]
000280*    int tm_hour;   // Hour             [0, 23]
000290*    int tm_mday;   // Day of the month [1, 31]
000300*    int tm_mon;    // Month            [0, 11]  (January = 0)
000310*    int tm_year;   // Year minus 1900
000320*    int tm_wday;   // Day of the week  [0, 6]   (Sunday = 0)
000330*    int tm_yday;   // Day of the year  [0, 365] (Jan/01 = 0)
000340*    int tm_isdst;  // Daylight savings flag
000350 01 tm_sec  PIC S9(9).
000360 01 tm_min  PIC S9(9).
000370 01 tm_hour PIC S9(9).
000380 01 tm_mday PIC S9(9).
000390*   range: 1-12
000400 01 tm_mon  PIC S9(9).
000410 01 tm_year PIC S9(9).
000420 01 tm_wday PIC S9(9).
000430 01 tm_yday PIC S9(9).
000440*
000450*
000460 01 daytabInit.
000470*   Number of days for leap year
000480    05 daytab-1-1 pic s9(9) comp-5 value 0.
000490    05 daytab-1-2 pic s9(9) comp-5 value 31.
000500    05 daytab-1-3 pic s9(9) comp-5 value 60.
000510    05 daytab-1-4 pic s9(9) comp-5 value 91.
000520    05 daytab-1-5 pic s9(9) comp-5 value 121.
000530    05 daytab-1-6 pic s9(9) comp-5 value 152.
000540    05 daytab-1-7 pic s9(9) comp-5 value 182.
000550    05 daytab-1-8 pic s9(9) comp-5 value 213.
000560    05 daytab-1-9 pic s9(9) comp-5 value 244.
000570    05 daytab-1-10 pic s9(9) comp-5 value 274.
000580    05 daytab-1-11 pic s9(9) comp-5 value 305.
000590    05 daytab-1-12 pic s9(9) comp-5 value 335.
000600*   Number of days for non-leap year
000610    05 daytab-2-1 pic s9(9) comp-5 value 0.
000620    05 daytab-2-2 pic s9(9) comp-5 value 31.
000630    05 daytab-2-3 pic s9(9) comp-5 value 59.
000640    05 daytab-2-4 pic s9(9) comp-5 value 90.
000650    05 daytab-2-5 pic s9(9) comp-5 value 120.
000660    05 daytab-2-6 pic s9(9) comp-5 value 151.
000670    05 daytab-2-7 pic s9(9) comp-5 value 181.
000680    05 daytab-2-8 pic s9(9) comp-5 value 212.
000690    05 daytab-2-9 pic s9(9) comp-5 value 243.
000700    05 daytab-2-10 pic s9(9) comp-5 value 273.
000710    05 daytab-2-11 pic s9(9) comp-5 value 304.
000720    05 daytab-2-12 pic s9(9) comp-5 value 334.
000730 01 daytabArr redefines daytabInit.
000740    05 filler     occurs 2 times.
000750       10 filler     occurs 12 times.
000760          15 daytab  pic s9(9) comp-5.
000770*
000780*
000790
000800 PROCEDURE DIVISION.
000810******************************************************************
000820* A100-main
000830******************************************************************
000840* Function:
000850*
000860******************************************************************
000870 A100-main SECTION.
000880 A100-main-P.
000890
000900*    initialize daytabArr.
000910*    move daytabInit to daytabArr
000920*    perform varying leapIx from 1 by 1 until leapIx > 2
000930*        perform varying mon from 1 by 1 until mon > 12
000940*            display 'daytab(' leapIx ',' mon ') = '
000950*                daytab(leapIx, mon)
000960*        end-perform
000970*    end-perform.
000980
000990     ACCEPT secsarg FROM ARGUMENT-VALUE
001000     perform v910-timeToTm
001010     display '        tm_sec  = ' tm_sec
001020     display '        tm_min  = ' tm_min
001030     display '        tm_hour = ' tm_hour
001040     display '        tm_mday = ' tm_mday
001050     display '        tm_mon  = ' tm_mon
001060     display '        tm_year = ' tm_year
001070     display '        tm_wday = ' tm_wday
001080     display '        tm_yday = ' tm_yday
001090
001100     STOP RUN.
001110
001120
001130* Convert UNIX timestamp to triple (year,month,day)
001140* Converted from C program
001150* From P.J.Plauger: The Standard C Library, Prentice Hall, 1992
001160
001170******************************************************************
001180* V900-daysTo
001190******************************************************************
001200* Function: compute daysTo given year and mon
001210*	          compute extra days to start of month
001220******************************************************************
001230 V900-daysTo SECTION.
001240 V900-daysTo-P.
001250
001260* correct for leap year: 1801-2099
001270     evaluate true
001280         when year > 0
001290             compute idays = (year - 1) / 4
001300         when year <= -4
001310             compute idays = 1 + (4 - year) / 4
001320         when other
001330             move zero to idays
001340     end-evaluate
001350
001360     compute yearMod4 = function mod(year,4)
001370     if yearMod4 not= zero or year = zero then
001380         move 2 to leapIx
001390     else
001400         move 1 to leapIx
001410     end-if
001420     compute daysTo = idays + daytab(leapIx, mon)
001430
001440     CONTINUE.
001450 END-V900-daysTo.
001460     EXIT.
001470
001480
001490******************************************************************
001500* V910-timeToTm
001510******************************************************************
001520* Function: compute tmT from secsarg (seconds since 01-Jan-1970)
001530*	          convert scalar time to time struct
001540******************************************************************
001550 V910-timeToTm SECTION.
001560 V910-timeToTm-P.
001570
001580* 70 years including 17 leap days since 1900
001590     compute secsarg = secsarg + ((70 * 365) + 17) * 86400;
001600     move secsarg to secs
001610
001620     compute days = secs / 86400
001630     add 1 to days giving daysP1
001640     compute tm_wday = function mod(daysP1, 7)
001650
001660     compute year = days / 365
001670     move 1 to mon
001680     perform until 1 = 0
001690         perform v900-daysTo
001700         compute i = daysTo + 365 * year
001710         if days >= i then
001720*            exit perform
001730             go to v910-endloop
001740         end-if
001750*        correct guess and recheck
001760         subtract 1 from year
001770     end-perform.
001780 v910-endloop.
001790
001800     subtract i from days
001810     move year to tm_year
001820     move days to tm_yday
001830
001840     compute yearMod4 = function mod(year,4)
001850     if yearMod4 not= zero or year = zero then
001860         move 2 to leapIx
001870     else
001880         move 1 to leapIx
001890     end-if
001900     move 12 to mon
001910     perform until days >= daytab(leapIx, mon)
001920         subtract 1 from mon
001930     end-perform
001940     move mon to tm_mon
001950     compute tm_mday = days - daytab(leapIx, mon) + 1
001960
001970     compute secs = function mod(secs,86400)
001980     compute tm_hour = secs / 3600;
001990     compute secs = function mod(secs,3600)
002000     compute tm_min = secs / 60;
002010     compute tm_sec = function mod(secs, 60)
002020
002030     CONTINUE.
002040 END-V910-timeToTm.
002050     EXIT.
002060
002070

4. Example output. Here are two examples. The first example is Fri May 03 2024 14:16:01 GMT+0000. See https://www.unixtimestamp.com.

$ ./cobts2date 1714745761
        tm_sec  = +000000001
        tm_min  = +000000016
        tm_hour = +000000014
        tm_mday = +000000003
        tm_mon  = +000000005
        tm_year = +000000124
        tm_wday = +000000005
        tm_yday = +000000123

The second example is Thu Dec 31 1964 22:59:59 GMT+0000.

$ ./cobts2date -157770001
        tm_sec  = +000000059
        tm_min  = +000000059
        tm_hour = +000000022
        tm_mday = +000000031
        tm_mon  = +000000012
        tm_year = +000000064
        tm_wday = +000000004
        tm_yday = +000000365
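Both results can be cross-checked independently of the COBOL program. Below is a minimal Python sketch; the function name tm_fields is made up for illustration. It assumes the conventions visible in the output above: tm_year counts from 1900, tm_mon is 1-based, tm_wday counts from Sunday = 0, and tm_yday is 0-based.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def tm_fields(secs):
    """Break seconds since 01-Jan-1970 (UTC) into the fields printed above."""
    # Adding a timedelta to the epoch also handles negative timestamps.
    t = EPOCH + timedelta(seconds=secs)
    return {
        "tm_sec": t.second,
        "tm_min": t.minute,
        "tm_hour": t.hour,
        "tm_mday": t.day,
        "tm_mon": t.month,                     # 1-based, as in the output above
        "tm_year": t.year - 1900,
        "tm_wday": (t.weekday() + 1) % 7,      # Python: Monday=0; here: Sunday=0
        "tm_yday": t.timetuple().tm_yday - 1,  # 0-based day of the year
    }

for ts in (1714745761, -157770001):
    print(ts, tm_fields(ts))
```

For the two timestamps above this yields the same eight values as cobts2date.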

For a list of leap years see Schaltjahr.

Installing and Configuring the H2O Web-Server

Sat, 13 Apr 2024 18:45:00 +0200

1. Task at hand. Install H2O web-server on Arch Linux. H2O is a web-server written by Kazuho Oku et al. It supports:

HTTP/1 and HTTP/1.1,
HTTP/2,
HTTP/3 ("QUIC"),
FastCGI, therefore PHP-FPM,
Reverse proxy,
Built-in mruby, though it crashes.

It consistently ranks at the top in benchmarks. See Web Framework Benchmarks.

It works way faster than NGINX or Apache. It shines for static web content.

2. Building. The existing AUR packages for H2O do not work, i.e., they generate a binary which crashes. The PKGBUILD below produces a working H2O binary.

pkgname=h2o-master-git
pkgver=1.0
pkgrel=1
arch=('i686' 'x86_64')
pkgdesc="H2O: the optimized HTTP/1.x, HTTP/2, HTTP/3 server"
provides=(h2o)
url="https://h2o.examp1e.net"
source=("git+https://github.com/h2o/h2o.git?commit=master?signed/" h2o.service)
sha256sums=('SKIP' 734e9d045dd5568665762d48e4077208c3da8c68f87510aaa9559d495dd680fd)


build() {
    cd "$srcdir"/h2o
    cmake -DCMAKE_INSTALL_PREFIX=/usr .
    make
}

package() {
    cd "$srcdir"/h2o
    install -Dm 644 LICENSE "$pkgdir"/usr/share/licenses/$pkgname/LICENSE
    install -Dm 644 README.md "$pkgdir"/usr/share/doc/h2o/README.md
    install -Dm 644 "$srcdir"/h2o.service "$pkgdir"/usr/lib/systemd/system/h2o.service
    install -Dm 644 examples/h2o/h2o.conf "$pkgdir/etc/h2o.conf"
    make DESTDIR="$pkgdir" install
}

Compiling on AMD Ryzen 7 5700G, max clock 4.673 GHz, 64 GB RAM, finishes in less than two minutes.

$ time makepkg -f
...
==> Tidying install...
  -> Removing libtool files...
  -> Purging unwanted files...
  -> Removing static library files...
  -> Copying source files needed for debug symbols...
  -> Compressing man and info pages...
==> Checking for packaging issues...
==> Creating package "h2o-master-git"...
  -> Generating .PKGINFO file...
  -> Generating .BUILDINFO file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Leaving fakeroot environment.
==> Finished making: h2o-master-git 1.0-1 (Fri 12 Apr 2024 09:48:36 PM CEST)
        real 92.42s
        user 447.76s
        sys 0
        swapped 0
        total space 0

3. Configuration. Below is a working configuration in file /etc/h2o.conf. The configuration accomplishes the following:

it serves HTTP and HTTPS,
it compresses via gzip and brotli,
it is started as user root, then switches to user http,
the log format is similar to the Hiawatha log format,
PHP files are handled by php-fpm.

The entire configuration file is a YAML file.

listen: 80
listen: &ssl_listen
  port: 443
  ssl:
    certificate-file:    /etc/letsencrypt/live/eklausmeier.goip.de/fullchain.pem
    key-file:  /etc/letsencrypt/live/eklausmeier.goip.de/privkey.pem
    minimum-version: TLSv1.2
    cipher-preference: server
    cipher-suite: "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"
    # Oldest compatible clients: Firefox 27, Chrome 30, IE 11 on Windows 7, Edge, Opera 17, Safari 9, Android 5.0, and Java 8
    # see: https://wiki.mozilla.org/Security/Server_Side_TLS

# The following three lines enable HTTP/3
listen:
  <<: *ssl_listen
  type: quic
header.set: "Alt-Svc: h3-25=\":443\""

user: http
#pid-file: /var/run/h2o/h2o.pid
#crash-handler: /usr/local/bin/h2obacktrace
access-log:
  path: /var/log/h2o/access.log
  format: "%h|%{%Y/%m/%d:%T %z}t|%s|%b|%r|%{referer}i|%{user-agent}i|%V:%p|"
error-log: /var/log/h2o/error.log
compress: [ br, gzip ]
#file.dirlisting: ON

file.custom-handler:
  extension: .php
  fastcgi.connect:
    port: /run/php-fpm/php-fpm.sock
    type: unix

hosts:
  0:
    paths:
      /jpilot/favicon.ico:
        file.file: /home/klm/php/saaze-jpilot/public/favicon.ico
      /jpilot/img:
        file.dir: /home/klm/php/saaze-jpilot/public/img
      /jpilot/jpilot.css:
        file.file: /home/klm/php/saaze-jpilot/public/jpilot.css
      /koehntopp/assets:
        file.dir: /home/klm/php/saaze-koehntopp/public/assets
      /koehntopp/jscss:
        file.dir: /home/klm/php/saaze-koehntopp/public/jscss
      /lemire/jscss:
        file.dir: /home/klm/php/saaze-lemire/public/jscss
      /mobility/img:
        file.dir: /home/klm/php/saaze-mobility/public/img
      /nukeklaus/img:
        file.dir: /home/klm/php/saaze-nukeklaus/public/img
      /nukeklaus/jscss:
        file.dir: /home/klm/php/saaze-nukeklaus/public/jscss
      /panorama/img:
        file.dir: /home/klm/php/saaze-panorama/public/img
      /paternoster/paternoster.css:
        file.file: /home/klm/php/saaze-paternoster/public/paternoster.css
      /saaze-example/blogklm.css:
        file.file: /home/klm/php/saaze-example/public/blogklm.css
      /vonhoff/img:
        file.dir: /home/klm/php/saaze-vonhoff/public/img
      /wendt/pagefind:
        file.dir: /home/klm/php/saaze-wendt/public/pagefind
      /:
        file.dir: /srv/http
        redirect:
          status: 301
          internal: YES
          url: /index.php?
      /p:
        mruby.handler: |
          Proc.new do |env|
            [200, {'content-type' => 'text/plain'}, ["Hello world"]]
          end

As already mentioned at the top: mruby doesn't work. Once you access /p the entire web-server crashes.
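The access-log format configured above separates its fields with "|", which makes the log easy to process. Below is a minimal Python sketch for splitting one such line. The field names are my own labels for the directives in the format string, and the sample line in the usage note is made up. The sketch assumes that no field contains a "|" itself, which a hostile user-agent string could violate.

```python
from datetime import datetime

# Labels (assumed, not part of H2O) for: %h, %t, %s, %b, %r,
# %{referer}i, %{user-agent}i, %V:%p
FIELDS = ("remote", "time", "status", "bytes", "request",
          "referer", "user_agent", "vhost_port")

def parse_access_line(line):
    """Split one '|'-separated access-log line into a dict."""
    parts = line.rstrip("\n").split("|")
    # The format string ends with '|', so the last element is empty.
    if len(parts) != len(FIELDS) + 1 or parts[-1] != "":
        raise ValueError("unexpected number of fields")
    rec = dict(zip(FIELDS, parts))
    # Matches the strftime pattern %Y/%m/%d:%T %z from the configuration.
    rec["time"] = datetime.strptime(rec["time"], "%Y/%m/%d:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec
```

A hypothetical line such as 192.0.2.7|2024/04/13:18:45:00 +0200|200|5120|GET /blog/ HTTP/1.1|-|curl/8.7.1|eklausmeier.goip.de:443| would be returned as a dictionary with status 200 and a timezone-aware timestamp.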

H2O does not offer URL rewriting out of the box. The path configurations above work by prefix match, i.e., if the URL in question starts with the string provided, this is considered a match. The part of the URL after the match is appended to the directory given in file.dir.
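The prefix matching can be illustrated with a small Python sketch. This is not H2O's code. The function name resolve and the file name logo.png are made up, and the rule that the longest matching prefix wins is an assumption of this sketch.

```python
def resolve(url_path, mappings):
    """Map a URL path to a file-system path by prefix match.

    mappings: dict of URL prefix -> directory (the file.dir entries).
    Assumption: if several prefixes match, the longest one wins.
    """
    best = None
    for prefix in mappings:
        if url_path.startswith(prefix):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return None
    # Append whatever follows the matched prefix to the directory.
    rest = url_path[len(best):]
    return mappings[best].rstrip("/") + "/" + rest.lstrip("/")

mappings = {
    "/": "/srv/http",
    "/jpilot/img": "/home/klm/php/saaze-jpilot/public/img",
}
print(resolve("/jpilot/img/logo.png", mappings))
print(resolve("/blog/index.html", mappings))
```

The first call ends up below the saaze-jpilot image directory, the second one below /srv/http.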

4. Discussion. While alternatives to Apache and NGINX are highly welcome, the current state of H2O leaves many questions unanswered.

The builtin brotli compression is "stone old": it is seven years behind the official Google Brotli repository, which contains a number of serious fixes.
The builtin mruby software is two years behind, offering mruby version 3.1 instead of 3.3.
mruby crashes once called.
In the hosts part the hostname seems to have no effect.

I tried to replace the old mruby dependency with the current 3.3 version. The build of H2O then failed.

While embedding software packages directly into the H2O GitHub repo makes building the software easier, it carries the risk that the bundled software rots. That's exactly what is happening here.

Fun fact: I noticed H2O when reading about the LWAN web-server written by L. Pereira. Both Kazuho Oku and L. Pereira work at Fastly.

Also see H2O Tutorial.

In case someone wants to analyze why mruby crashes, here is the output of the where command in gdb:

Core was generated by `h2o -c h2o.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000062085dc9ae9b in mrb_str_hash (mrb=, str=...) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/string.c:1673
1673        hval ^= (uint32_t)*bp++;
[Current thread is 1 (Thread 0x7002156006c0 (LWP 18088))]
(gdb) where
#0  0x000062085dc9ae9b in mrb_str_hash (mrb=, str=...) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/string.c:1673
#1  0x000062085dc8cb6c in obj_hash_code (h=0x7001d0028660, key=..., mrb=0x1a0) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/hash.c:325
#2  ib_it_init (mrb=mrb@entry=0x7001d00015a0, it=it@entry=0x7002155fe550, h=h@entry=0x7001d0028660, key=...) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/hash.c:645
#3  0x000062085dc8cd3a in ib_init (ib_byte_size=, ib_bit=, h=0x7001d0028660, mrb=) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/hash.c:151
#4  ht_init (mrb=mrb@entry=0x7001d00015a0, h=h@entry=0x7001d0028660, size=size@entry=17, ea=0x7001d0047700, ea_capa=ea_capa@entry=25, ht=ht@entry=0x0, ib_bit=) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/hash.c:793
#5  0x000062085dc8d11a in ar_set (mrb=0x7001d00015a0, h=0x7001d0028660, key=..., val=...) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/hash.c:536
#6  0x000062085dc8c2e6 in h_set (val=..., key=..., h=0x7001d0028660, mrb=0x7001d00015a0) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/hash.c:169
#7  mrb_hash_set (mrb=0x7001d00015a0, hash=..., key=..., val=...) at /usr/src/debug/h2o-master-git/h2o/deps/mruby/src/hash.c:1245
#8  0x000062085dc67938 in iterate_headers_callback (shared_ctx=shared_ctx@entry=0x7001d0001540, pool=pool@entry=0x7001d0076958, header=header@entry=0x7002155fe8d0, cb_data=cb_data@entry=0x7001d0028660) at /usr/src/debug/h2o-master-git/h2o/lib/handler/mruby.c:748
#9  0x000062085dc67e4c in h2o_mruby_iterate_native_headers (shared_ctx=shared_ctx@entry=0x7001d0001540, pool=, headers=, cb=cb@entry=0x62085dc678a0 , cb_data=cb_data@entry=0x7001d0028660)
    at /usr/src/debug/h2o-master-git/h2o/lib/handler/mruby.c:727
#10 0x000062085dc6a76e in build_env (generator=0x7001d006cbe0) at /usr/src/debug/h2o-master-git/h2o/lib/handler/mruby.c:836
#11 on_req (_handler=, req=) at /usr/src/debug/h2o-master-git/h2o/lib/handler/mruby.c:974
#12 0x000062085dbc603a in call_handlers (req=0x7001d00765d8, handler=0x62085f2d5ef0) at /usr/src/debug/h2o-master-git/h2o/lib/core/request.c:165
#13 0x000062085dbeeb89 in handle_incoming_request (conn=) at /usr/src/debug/h2o-master-git/h2o/lib/http1.c:714
#14 0x000062085dba6293 in run_socket (sock=0x7001d009b660) at /usr/src/debug/h2o-master-git/h2o/lib/common/socket/evloop.c.h:834
#15 run_pending (loop=loop@entry=0x7001d0000b70) at /usr/src/debug/h2o-master-git/h2o/lib/common/socket/evloop.c.h:876
#16 0x000062085dba6300 in h2o_evloop_run (loop=0x7001d0000b70, max_wait=) at /usr/src/debug/h2o-master-git/h2o/lib/common/socket/evloop.c.h:925
#17 0x000062085dc5da1b in run_loop (_thread_index=) at /usr/src/debug/h2o-master-git/h2o/src/main.c:4210
#18 0x000070022b8a955a in ?? () from /usr/lib/libc.so.6
#19 0x000070022b926a3c in ?? () from /usr/lib/libc.so.6

Location of core files in Arch Linux

Wed, 10 Apr 2024 22:15:00 +0200

In the old UNIX days the core file was written where the offending program was started. The only prerequisite was that there was no limit imposed. Limits can be checked by

$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      unlimited
-u: processes                       254204
-n: file descriptors                1024
-l: locked-in-memory size (kbytes)  8192
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 254204
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15: rt cpu time (microseconds)   unlimited

The value in the "core file size" line must be greater than zero.

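In Arch Linux that alone doesn't help: core files are not written to the current directory but are handled by systemd-coredump. The command coredumpctl info shows where the core file is stored, see the Storage line: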

$ coredumpctl info
          PID: 16354 (h2o)
           UID: 33 (http)
           GID: 33 (http)
        Signal: 11 (SEGV)
     Timestamp: Wed 2024-04-10 20:02:12 CEST (2h 3min ago)
  Command Line: h2o
    Executable: /usr/bin/h2o
 Control Group: /user.slice/user-1000.slice/user@1000.service/tmux-spawn-3fc3de1b-6e2d-43bf-ad3d-bf55b4ce3a1a.scope
          Unit: user@1000.service
     User Unit: tmux-spawn-3fc3de1b-6e2d-43bf-ad3d-bf55b4ce3a1a.scope
         Slice: user-1000.slice
     Owner UID: 1000 (klm)
       Boot ID: 8b9d5dcffc3a4669b0c7fa244db334be
    Machine ID: 814e9c58b1e34999a682767020267eb0
      Hostname: chieftec
       Storage: /var/lib/systemd/coredump/core.h2o.33.8b9d5dcffc3a4669b0c7fa244db334be.16354.1712772132000000.zst (inaccessible)
       Message: Process 16354 (h2o) of user 33 dumped core.

                Stack trace of thread 16363:
                #0  0x0000777802fe7bb3 n/a (libcrypto.so.53 + 0xd0bb3)
                #1  0x00007778030efd5b SSL_CTX_flush_sessions (libssl.so.56 + 0x24d5b)
                #2  0x00005d994cc02023 cache_cleanup_thread (h2o + 0x12a023)
                #3  0x0000777802c7755a n/a (libc.so.6 + 0x8b55a)
                #4  0x0000777802cf4a3c n/a (libc.so.6 + 0x108a3c)

The command coredumpctl list lists the core dumps so far:

$ coredumpctl list
TIME                           PID UID GID SIG     COREFILE     EXE          SIZE
Sat 2024-04-06 17:55:20 CEST 24746  33  33 SIGSEGV inaccessible /usr/bin/h2o    -
Sat 2024-04-06 18:49:20 CEST 26982  33  33 SIGSEGV inaccessible /usr/bin/h2o    -
Sat 2024-04-06 18:50:04 CEST 27178  33  33 SIGSEGV inaccessible /usr/bin/h2o    -

You can start debugging with coredumpctl debug. That will call gdb.

The location and name of the core file can be changed by modifying the kernel's core_pattern setting:

$ cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
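The two checks above, the core size limit and the core_pattern setting, can be sketched in Python. The function names are made up for illustration. The only facts used are that a limit of zero disables core files and that a core_pattern starting with "|" pipes the dump to a program instead of writing a file.

```python
import resource

def core_limit_allows_dump(soft_limit):
    """True if the soft core-size limit permits writing a core file."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit > 0

def describe_core_pattern(pattern):
    """Classify a core_pattern value as piped program or file-name template."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading '|' means the kernel pipes the dump to this program.
        return ("pipe", pattern[1:].split()[0])
    return ("file", pattern)

# Current soft limit of this process, the value shown by `ulimit -c`.
soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core files allowed by limit:", core_limit_allows_dump(soft))
print(describe_core_pattern(
    "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h"))
```

To inspect the running system, pass the content of /proc/sys/kernel/core_pattern to describe_core_pattern().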

More information is here: Core dump file is not generated, coredumpctl, systemd-coredump.

CSS Naked Day

Sun, 31 Mar 2024 12:45:00 +0200

April 9th is CSS Naked Day, a day on which you do not use CSS on your web-site. In 2024 I am participating, i.e., I will deactivate the CSS on this blog.

From the CSS Naked Day:

The idea behind CSS Naked Day is to promote web standards. Plain and simple. This includes proper use of HTML, semantic markup, a good hierarchy structure, and of course, a good old play on words. In the words of 2006, it’s time to show off your <body> for what it really is.

The importance of CSS is illustrated by this humorous tweet:

1. The 50 hour window

The logic to enable or disable CSS is given by the PHP routine below, taken from CSS Naked Day:

= $start && $now <= $end ) {
        return true;
    }
    return false;
}
?>

Running this with php -a and unixtimestamp.com for the year 2024 gives the following interval:

Start: 08-Apr-2024 12:00 CEST
End: 10-Apr-2024 14:00 CEST

The rationale is:

CSS Naked Day lasts for one international day. Technically speaking, it will be April 9 somewhere in the world for 50 hours. This is to ensure that everyone’s website will be publicly nude for the entire world to see at any given time during April 9.
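The 50 hours can be reproduced with a short calculation. The sketch below is not the routine from CSS Naked Day. It is an independent Python illustration that assumes the earliest time zone is UTC+14 and the latest is UTC-12; the function names are made up.

```python
from datetime import datetime, timedelta, timezone

def naked_day_window(year, month=4, day=9):
    """Return (start, end) in UTC during which it is that date somewhere."""
    first_zone = timezone(timedelta(hours=14))   # assumed earliest zone, UTC+14
    last_zone = timezone(timedelta(hours=-12))   # assumed latest zone, UTC-12
    # The day begins at midnight in the earliest zone ...
    start = datetime(year, month, day, tzinfo=first_zone)
    # ... and ends 24 hours after midnight in the latest zone.
    end = datetime(year, month, day, tzinfo=last_zone) + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)

def is_naked_day(now, year):
    """True if the timezone-aware datetime `now` lies inside the window."""
    start, end = naked_day_window(year)
    return start <= now <= end

start, end = naked_day_window(2024)
print(start, end, end - start)
```

For 2024 this gives 08-Apr 10:00 UTC to 10-Apr 12:00 UTC, i.e., 50 hours. Shifted by the two hours of German local time in April, this is the interval listed above.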

2. Required changes in templates

For this blog I use the static site generator Simplified Saaze. All templates of this generator are written in PHP. So deactivating CSS is a pretty simple if statement.

I use the following hierarchy of PHP files for my entry-template, i.e., the template for a blog post:

entry.php
    top-layout.php
        head.php
    read_cattag_json.php
    Actual content: $entry['content']
    bottom-layout.php

The following hierarchy is used for the index-template, i.e., the template for showing a reverse-date sorted list of blog posts:

index.php
    top-layout.php
        head.php
    for-loop over entry-excerpts
    bottom-layout.php

3. Changes in <head> section

File head.php does not contain any CSS. File top-layout.php handles the majority of the HTML <head> section, and the beginning of the <body> section.

I use prism.js for syntax highlighting. This in turn uses CSS, which is surrounded by a simple if:

If I generate all the static HTML files, I use the environment variable NO_CSS. In case of dynamic generation I simply set $NO_CSS explicitly in top-layout.php, i.e., $NO_CSS=true;.

I have a separate CSS file, called blogklm.css, which I also surround with an if:

\n" ?>

\n" ?>

For galleries and Markmap I had a conditional anyway. This needed an additional clause:

I use Pagefind for searching within this blog. Pagefind in turn needs CSS, which is surrounded by an if:

4. Changes in <body> section

Still in top-layout.php. Finally, I explicitly mention that I stripped all CSS, so visitors are not surprised to find a new layout:

         April 9 is CSS Naked Day!\n"; ?>

5. History and evolution of CSS Naked Day

Below text is copied from CSS Naked Day and the Missing Wikipedia Page:

The event dates back to 2006, when Dustin Diaz, an American web developer, advertised the first CSS Naked Day in order “to promote web standards.”

During the first two years (2006 and 2007), CSS Naked Day was held on April 5, when in 2008, the date was changed to April 9.

Until 2009, the event was organized by Diaz. From 2010 to 2014, Taylor Satula, an American web designer, ...

From the first CSS Naked Day in 2006, which had 763 recorded participants, engagement went up to 2,160 participants in 2008. After another strong participation in 2009 (1,266 recorded participants), fewer people and sites are documented to have taken part.

In recent years (2020–2023), only a fraction of these participants is known, usually including a few dozen individuals and their sites. While there are no reliable ways to measure participation, it seems clear that while CSS Naked Day is still being observed, that is only the case for a small minority of people in the field. ...

In the months following the 2015 edition, and until today, Basmaison and Meiert have kept maintaining the site and promoting the event together.

The usual omnipresent Wikipedia trolls and naysayers blocked this wiki entry.

Is Binary Compiled with Frame Pointer Support?

Mon, 18 Mar 2024 14:00:00 +0100

How can you detect whether a Linux binary was compiled with

gcc -fomit-frame-pointer

Unfortunately, the ELF file itself does not contain a flag that tells you that. But looking at the assembler code can give you the answer.

First disassemble the code with

objdump -d

Check the disassembly for the pair of instructions below at the very start of each C function:

push   %rbp
mov    %rsp,%rbp

These are the instructions to set up the frame pointer on 64 bit Linux x86 systems.

Example:

0000000000001380 :
    1380:       55                      push   %rbp
    1381:       48 89 e5                mov    %rsp,%rbp

A good heuristic is then

objdump -d $binary | grep -c "mov.*%rsp,.*%rbp"

Double check with

objdump -d $binary | grep -C1 "mov.*%rsp,.*%rbp"
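The grep above counts every mov %rsp,%rbp, whether or not a push %rbp precedes it. A slightly stricter variant counts only complete pairs. Below is a minimal Python sketch of that idea; the function name is made up, and it expects the text printed by objdump -d as input.

```python
import re

PUSH_RBP = re.compile(r"\bpush\s+%rbp\b")
MOV_RSP_RBP = re.compile(r"\bmov\s+%rsp,\s*%rbp\b")

def count_prologues(disassembly):
    """Count 'push %rbp' lines directly followed by 'mov %rsp,%rbp'."""
    lines = disassembly.splitlines()
    count = 0
    # Look at each line together with its successor.
    for prev, cur in zip(lines, lines[1:]):
        if PUSH_RBP.search(prev) and MOV_RSP_RBP.search(cur):
            count += 1
    return count

sample = (
    "    1380:       55                      push   %rbp\n"
    "    1381:       48 89 e5                mov    %rsp,%rbp\n"
)
print(count_prologues(sample))
```

This remains a heuristic: a compiler may place other instructions, such as endbr64, before the pair or between the two instructions.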

This heuristic is not foolproof, as individual C routines can be annotated with

__attribute__((optimize("omit-frame-pointer")))

In the intense debate about making -fno-omit-frame-pointer the default in Fedora, see this comment from L. A. F. Pereira in Python 3.11 performance with frame pointers.

See How can I tell whether a binary is compiled with frame pointers or not on Linux?, which discusses the case for 32 bit x86 Linux systems.

Code with framepointers will always contain the both of the two instructions push %ebp and mov %esp, %ebp. ... For those working with x86_64, the registers to look for are the 64-bit equivalents: %rbp and %rsp - the concept is the same though!

The post The Return of the Frame Pointers by Brendan Gregg triggered this task.

As of today, 18-Mar-2024, Arch Linux still does not ship binaries with frame pointer support. For example:

$ objdump -d /bin/zsh | grep -c "mov.*%rsp,.*%rbp"
10

The PHP binary fails the heuristic:

$ objdump -d /bin/php | grep -c "mov.*%rsp,.*%rbp"
173

But looking at the actual disassembly shows something like this:

000000000021aff2 :
  21aff2:       f3 0f 1e fa             endbr64
  21aff6:       48 8d 05 43 9b 1e 01    lea    0x11e9b43(%rip),%rax        # 1404b40

I.e., no frame pointer handling.

Chinese Hackers #2

Tue, 05 Mar 2024 14:15:00 +0100

In 2020, in the blog post Chinese Hackers, I noticed that most attempts to hack my Linux machines come from China. These attempts look like this:

$ lastb
a        ssh:notty    209.97.163.130   Tue Mar  5 13:07 - 13:07  (00:00)
sftpuser ssh:notty    93.123.39.2      Tue Mar  5 13:05 - 13:05  (00:00)
sftpuser ssh:notty    93.123.39.2      Tue Mar  5 13:05 - 13:05  (00:00)
hzp      ssh:notty    43.156.241.167   Mon Mar  4 18:19 - 18:19  (00:00)
hzp      ssh:notty    43.156.241.167   Mon Mar  4 18:19 - 18:19  (00:00)
root     ssh:notty    8.219.249.208    Mon Mar  4 18:17 - 18:17  (00:00)
mheydary ssh:notty    118.178.132.93   Mon Mar  4 12:35 - 12:35  (00:00)
mheydary ssh:notty    118.178.132.93   Mon Mar  4 12:34 - 12:34  (00:00)
ftp1user ssh:notty    143.255.140.241  Mon Mar  4 12:34 - 12:34  (00:00)
ftp1user ssh:notty    143.255.140.241  Mon Mar  4 12:34 - 12:34  (00:00)
panisa   ssh:notty    139.224.200.60   Mon Mar  4 11:13 - 11:13  (00:00)
panisa   ssh:notty    139.224.200.60   Mon Mar  4 11:13 - 11:13  (00:00)
sina     ssh:notty    129.226.158.202  Mon Mar  4 10:45 - 10:45  (00:00)
sina     ssh:notty    129.226.158.202  Mon Mar  4 10:44 - 10:44  (00:00)
hadoop   ssh:notty    129.226.152.121  Mon Mar  4 10:43 - 10:43  (00:00)

In 2020 I used fail2ban. Since 2021 I have been using SSHGuard, which needs far fewer resources. See Analysis And Usage of SSHGuard.

I ran a quick analysis of which country is the most aggressive attacker.

1. Collecting IP addresses. SSHGuard blocks the offending intruders via ipset.

$ ipset list > i1

This collects all IP addresses.

Now I run these IP addresses through geoiplookup:

$ for i in `perl -ne 'print $1."\n" if /^(\d+\.\d+\.\d+\.\d+)\s+/' i1`; do geoiplookup $i >> i3; done

The resulting list looks like this:

$ head i3
GeoIP Country Edition: CN, China
GeoIP Country Edition: HK, Hong Kong
GeoIP Country Edition: US, United States
GeoIP Country Edition: US, United States
GeoIP Country Edition: KR, Korea, Republic of
GeoIP Country Edition: PE, Peru
GeoIP Country Edition: CA, Canada
GeoIP Country Edition: CN, China
GeoIP Country Edition: KR, Korea, Republic of
GeoIP Country Edition: KE, Kenya
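The extraction of the IP addresses from file i1 is done by the Perl one-liner above. The same filter can be sketched in Python. The sample input below is made up: it uses documentation addresses and a hypothetical set name, and real ipset output may contain further header lines.

```python
import re

# An IPv4 address at the start of the line, followed by whitespace or line end.
IPV4_LINE = re.compile(r"^(\d+\.\d+\.\d+\.\d+)(?:\s|$)")

def extract_ips(ipset_output):
    """Return the IPv4 addresses that start a line of the given text."""
    ips = []
    for line in ipset_output.splitlines():
        m = IPV4_LINE.match(line)
        if m:
            ips.append(m.group(1))
    return ips

sample = """Name: sshguard4
Type: hash:net
Members:
192.0.2.10
198.51.100.23 timeout 100
"""
print(extract_ips(sample))
```

Header lines such as Name: or Members: do not start with an address and are therefore skipped.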

2. Sorting by frequency.

cut -d: -f2 i3 | sort | uniq -c | sort -rn

The top 20 offenders are:

   4228  CN, China
   3175  US, United States
   2142  SG, Singapore
   1596  KR, Korea, Republic of
   1042  DE, Germany
    980  IN, India
    755  HK, Hong Kong
    661  BR, Brazil
    566  RU, Russian Federation
    522  VN, Vietnam
    471  ID, Indonesia
    453  JP, Japan
    403  FR, France
    396  NL, Netherlands
    354  GB, United Kingdom
    313  IR, Iran, Islamic Republic of
    307  CA, Canada
    279  TW, Taiwan
    236  AU, Australia
    173  TH, Thailand
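The counting pipeline of cut, sort and uniq can also be written as a few lines of Python. This is a minimal sketch with a made-up function name. It takes the text after the first colon, which equals the second field of cut as long as a line contains only one colon.

```python
from collections import Counter

def country_counts(lines):
    """Count geoiplookup result lines per country, most frequent first."""
    counts = Counter()
    for line in lines:
        # Everything after the first ':' is the country part.
        _, sep, country = line.partition(":")
        if sep:
            counts[country.strip()] += 1
    return counts.most_common()

sample = [
    "GeoIP Country Edition: CN, China",
    "GeoIP Country Edition: HK, Hong Kong",
    "GeoIP Country Edition: US, United States",
    "GeoIP Country Edition: CN, China",
]
for country, n in country_counts(sample):
    print(f"{n:7d}  {country}")
```

Applied to all lines of file i3, this should produce the same ranking as the shell pipeline.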

Graphically this looks like this:

Installing IBM COBOL for Linux on Arch Linux #2

Sat, 02 Mar 2024 14:15:00 +0100

I tried to install IBM COBOL for Linux multiple times. I tried to install it on Arch Linux, which is the Linux I use:

Installing IBM COBOL for Linux on Arch Linux in 2021
Testing COBOLworx gcc-cobol #2 in 2023

Initially I succeeded in installing the IBM compiler in 2021. The IBM compiler compared very favorably against the GNU Cobol compiler, see Comparing GnuCOBOL to IBM COBOL. But in 2023 this installation procedure failed. So, no IBM COBOL on Arch Linux.

Richard Nelson from IBM contacted me today and mentioned that IBM COBOL should also run on Arch Linux. So I tried to install the latest version 1.2.0.2 again. Version 1.2 is particularly appealing as it supports 64 bit. IBM COBOL compilers were notorious for lacking 64 bit support, see Memory Limitations with IBM Enterprise COBOL Compiler.

My current Arch Linux setup is given in the table below.

Type	Version
Linux	6.7.6-arch1-2 #1 SMP PREEMPT_DYNAMIC x86_64 GNU/Linux
gcc	gcc version 13.2.1 20230801 (GCC)
glibc	2.39-1
gcc-libs	13.2.1-5

1. Download. The software package is here: IBM COBOL for Linux on x86. IBM now uses an annoying two-factor authentication procedure, so you have to jump through all these hoops. This 2FA makes it essentially impossible to write an AUR package that downloads the IBM file within the PKGBUILD.

The file in question is IBM_COBOL_V1.2.0_LINUX_EVAL.x86-64.240110.tar.gz. Its size is 116 MB.

$ tar ztvf IBM_COBOL_V1.2.0_LINUX_EVAL.x86-64.240110.tar.gz
drwxr-sr-x root/root         0 2023-06-06 01:05 images/
drwxr-sr-x root/root         0 2024-01-10 16:16 images/rhel/
-rw-rw-r-- root/root  26210268 2024-01-10 16:16 images/rhel/cobol.rte.1.2.0-1.2.0.2-231215.x86_64.rpm
-rw-rw-r-- root/root   2331592 2024-01-10 16:16 images/rhel/cobol.dbg.1.2.0-1.2.0.2-231215.x86_64.rpm
-rw-rw-r-- root/root   3055224 2024-01-10 16:16 images/rhel/cobol.cmp.license-eval.1.2.0-1.2.0.2-231215.x86_64.rpm
-rw-rw-r-- root/root  11199076 2024-01-10 16:16 images/rhel/cobol.cmp.1.2.0-1.2.0.2-231215.x86_64.rpm
drwxr-sr-x root/root         0 2024-01-10 16:17 images/sles/
-rw-r--r-- root/root  22295780 2024-01-10 16:17 images/sles/cobol.rte.1.2.0-1.2.0.2-231215.x86_64.rpm
-rw-r--r-- root/root   1975984 2024-01-10 16:17 images/sles/cobol.dbg.1.2.0-1.2.0.2-231215.x86_64.rpm
-rw-r--r-- root/root   2999760 2024-01-10 16:17 images/sles/cobol.cmp.license-eval.1.2.0-1.2.0.2-231215.x86_64.rpm
-rw-r--r-- root/root   9095804 2024-01-10 16:17 images/sles/cobol.cmp.1.2.0-1.2.0.2-231215.x86_64.rpm
drwxr-sr-x root/root         0 2024-01-10 16:17 images/ubuntu/
-rw-r--r-- root/root   1957512 2024-01-10 16:17 images/ubuntu/cobol.dbg.1.2.0_1.2.0.2-231215_amd64.deb
-rw-r--r-- root/root   2992220 2024-01-10 16:17 images/ubuntu/cobol.cmp.license-eval.1.2.0_1.2.0.2-231215_amd64.deb
-rw-r--r-- root/root  10125300 2024-01-10 16:17 images/ubuntu/cobol.cmp.1.2.0_1.2.0.2-231215_amd64.deb
-rw-r--r-- root/root  22514248 2024-01-10 16:17 images/ubuntu/cobol.rte.1.2.0_1.2.0.2-231215_amd64.deb
-rwxr-xr-x root/root      6763 2024-01-10 16:32 install
-rw-r--r-- root/root    820691 2023-06-06 01:05 install.pdf
-rwxr-xr-x root/root   2694559 2023-06-06 01:12 LicenseAgreement.pdf
-rwxr-xr-x root/root    285651 2023-06-06 01:12 LicenseInformation.pdf
-rwxr-xr-x root/root     57001 2023-06-06 01:12 notices
-rw-r--r-- root/root    311858 2023-06-06 01:14 quickstart.fr_FR.pdf
-rw-r--r-- root/root    311477 2023-06-06 01:14 quickstart.ja_JP.pdf
-rw-r--r-- root/root    281309 2023-06-06 01:14 quickstart.pdf
-rwxr-xr-x root/root      2932 2023-06-06 01:12 README

2. Unpacking the Ubuntu part. We will extract the Ubuntu part, highlighted above.

$ tar zxf IBM_COBOL_V1.2.0_LINUX_EVAL.x86-64.240110.tar.gz images/ubuntu/

Change to images/ubuntu directory and run the below loop, which first unpacks the deb-files with ar, then unpacks the resulting tar.xz data file with tar Jx:

for i in *.deb; do ar xf "$i"; tar Jxf data.tar.xz; done

This creates a subdirectory opt with 188 entries.
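A deb-file is an ar archive, which is why ar is used in the loop above. The format is simple enough to read by hand: an 8-byte magic string, then for each member a 60-byte header followed by the data, padded to an even length. Below is a minimal Python sketch of such a reader. The function name is made up, and the sketch ignores the long-name extensions of some ar variants.

```python
def ar_members(data):
    """Yield (name, payload) for each member of an ar archive given as bytes."""
    if data[:8] != b"!<arch>\n":
        raise ValueError("not an ar archive")
    pos = 8
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        # Bytes 0-15: member name, bytes 48-57: size in decimal ASCII.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        pos += 60
        yield name, data[pos:pos + size]
        # Member data is padded to an even length.
        pos += size + (size % 2)
```

For a deb-file the member data.tar.xz is the interesting one. Its payload can be handed to tarfile.open(fileobj=io.BytesIO(payload), mode="r:xz") from the Python standard library.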

Move the resulting opt or opt/ibm to the "real" /opt and chown -R root:root all the files.

Installation size is 135 MB.

3. Checking the installation. Check whether all libraries are in place.

$ ldd /opt/ibm/cobol/1.2.0/bin/cob2
        linux-vdso.so.1 (0x00007ffebab8a000)
        librt.so.1 => /usr/lib/librt.so.1 (0x000070366a43e000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x000070366a439000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x000070366a434000)
        libc.so.6 => /usr/lib/libc.so.6 (0x000070366a252000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x000070366a47a000)

$ ldd /opt/ibm/cobol/1.2.0/bin/cob3
        not a dynamic executable

$ ldd cob3_64
        linux-vdso.so.1 (0x00007ffeddbf9000)
        librt.so.1 => /usr/lib/librt.so.1 (0x00007a35119f5000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007a35119f0000)
        libicuuc_64r.so => /opt/ibm/cobol/1.2.0/usr/bin/./../../../rte/usr/lib/libicuuc_64r.so (0x00007a3510600000)
        libcob2_64r.so => /opt/ibm/cobol/1.2.0/usr/bin/./../../../rte/usr/lib/libcob2_64r.so (0x00007a3510000000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007a3511904000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007a3511720000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007a350fc00000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007a35116fb000)
        libicudata_64r.so => /opt/ibm/cobol/1.2.0/usr/bin/./../../../rte/usr/lib/libicudata_64r.so (0x00007a350da00000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007a35116f6000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007a3511a31000)
        libicui18n_64r.so => /opt/ibm/cobol/1.2.0/usr/bin/./../../../rte/usr/lib/libicui18n_64r.so (0x00007a350d200000)
        libdfp_64r.so => /opt/ibm/cobol/1.2.0/usr/bin/./../../../rte/usr/lib/libdfp_64r.so (0x00007a350ca00000)
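With this many libraries a missing one is easy to overlook. ldd marks an unresolved library with "not found". Below is a minimal Python sketch that filters those lines; the function name and the library libfoo.so.1 in the sample are made up.

```python
def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.partition("=>")
        if sep and target.strip().startswith("not found"):
            missing.append(name.strip())
    return missing

sample = (
    "\tlibc.so.6 => /usr/lib/libc.so.6 (0x00007a3511720000)\n"
    "\tlibfoo.so.1 => not found\n"
)
print(missing_libraries(sample))
```

For the ldd output above the result would be an empty list, as every library is resolved.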

For convenience add the bin-directory to the PATH:

$ export PATH=$PATH:/opt/ibm/cobol/1.2.0/bin

Up to this point, running the compiler would report a license problem. The actual compiler is cob2.

Here is an example, once the license is set up correctly:

$ cob2 hello1.cob
IBM COBOL for Linux 1.2.0 compile started
End of compilation 1,  program HELLO1,  no statements flagged.

4. Getting a 60 day trial license. Richard Nelson sent me a new file libxlcmpev_64r.so. With this new library file the compiler works flawlessly.

$ license_check
Evaluation (Trial/Eval/TnB) license
Current date    Sat, 02 Mar 2024 17:54:00 GMT
Activation date Thu, 29 Feb 2024 00:00:01 GMT
Expire date     Mon, 29 Apr 2024 23:59:59 GMT
Days left       58

Thanks Richard!

Also, Richard mentioned the install shell script in the original tar file, see line 18. I didn't make use of that! My fault. Once I knew that libxlcmpev_64r.so was the culprit, I looked at the install script:

...
extendTrial="$reldir/cobol/$version/usr/bin/xlcmp xlcbl && rm $reldir/cobol/$version/usr/bin/xlcmp"
eval $extendTrial
...

Generating the license now goes like this, as user root:

/opt/ibm/cobol/1.2.0/usr/bin/xlcmp xlcbl

This generates a new 1.2.0/usr/lib/libxlcmpev_64r.so. This provides a valid 60 day license.

$ license_check
Evaluation (Trial/Eval/TnB) license
Current date    Sat, 02 Mar 2024 18:14:24 GMT
Activation date Sat, 02 Mar 2024 00:00:01 GMT
Expire date     Wed, 01 May 2024 23:59:59 GMT
Days left       60
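The dates of both license_check outputs are consistent with a trial period of 60 days. Below is a small Python sketch of the date arithmetic. It assumes that the tool counts whole calendar days from the activation date; the function names are made up.

```python
from datetime import date, timedelta

def trial_expiry(activation, days=60):
    """Last day of a trial that starts on the activation date."""
    return activation + timedelta(days=days)

def days_left(today, expiry):
    """Whole calendar days between today and the expiry date."""
    return (expiry - today).days

today = date(2024, 3, 2)
for activation in (date(2024, 2, 29), date(2024, 3, 2)):
    expiry = trial_expiry(activation)
    print(activation, expiry, days_left(today, expiry))
```

This gives 29-Apr-2024 with 58 days left for the first license, and 01-May-2024 with 60 days left for the regenerated one.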

Parallelizing the Output of Simplified Saaze

Tue, 27 Feb 2024 08:00:00 +0100

This blog uses Simplified Saaze as its static site generator. Generating all 561 HTML pages takes 0.25 seconds. The environment used is given in the table below.

Type	Value
CPU	AMD Ryzen 7 5700G
RAM	64 GB
OS	Arch Linux 6.7.6-arch1-1 #1 SMP PREEMPT_DYNAMIC
PHP	PHP 8.3.3 (cli)
PHP with JIT	PHP 8.3.3 (cli), Zend Engine v4.3.3 with Zend OPcache v8.3.3
Simplified Saaze	2.0

1. Runtimes in serial mode. In the following we use PHP with no JIT. So far runtimes for this very blog are as below:

$ time php saaze -mortb /tmp/build
Building static site in /tmp/build...
    execute(): filePath=./content/aux.yml, nSIentries=7, totalPages=1, entries_per_page=20
    execute(): filePath=./content/blog.yml, nSIentries=452, totalPages=23, entries_per_page=20
    execute(): filePath=./content/gallery.yml, nSIentries=7, totalPages=1, entries_per_page=20
    execute(): filePath=./content/music.yml, nSIentries=69, totalPages=4, entries_per_page=20
    execute(): filePath=./content/error.yml, nSIentries=0, totalPages=0, entries_per_page=20
Finished creating 5 collections, 4 with index, and 561 entries (0.25 secs / 24.46MB)
#collections=5, parseEntry=0.0103/563-5, md2html=0.0201, MathParser=0.0141/561, renderEntry=0.1573/561, renderCollection=0.0058/33, content=561/0, excerpt=0/0
    real 0.28s
    user 0.16s
    sys 0
    swapped 0
    total space 0

It can be seen that the renderEntry() function uses 0.1573 seconds of the overall 0.25 seconds, i.e., more than 60%. These 561 calls will now be parallelized. The rest stays serial.

For the Lemire blog we have:

$ time php saaze -rb /tmp/buildLemire
Building static site in /tmp/buildLemire...
        execute(): filePath=/home/klm/php/saaze-lemire/content/blog.yml, nSIentries=2771, totalPages=139, entries_per_page=20
Finished creating 1 collections, 1 with index, and 4483 entries (1.01 secs / 97.18MB)
#collections=1, parseEntry=0.0702/4483-1, md2html=0.1003, MathParser=0.0594/4483, renderEntry=0.4121/4483, renderCollection=0.0225/140, content=4483/0, excerpt=0/0
        real 1.03s
        user 0.64s
        sys 0
        swapped 0
        total space 0

In this case the output template processing takes 0.4121 of the overall 1.01 seconds, that's 40%. This shows that the Lemire templates are simpler. No wonder: they do not use categories and tags, and many other gimmicks, which I use in this blog. But still, 40% of the runtime is spent on output rendering.

In Performance Comparison Saaze vs. Hugo vs. Zola I wrote:

It would be quite easy to use threads in Saaze, i.e., so-called entries and the chunks of collections could easily be processed in parallel.

It is even easier to parallelize the generation of the output files when the PHP templating is in place. We will see that parallelizing can be done in less than 20 lines of PHP code.

2. Runtimes in serial mode with JIT enabled. Below are the runtimes with JIT and OPCache enabled for PHP.

time php saaze -mortb /tmp/build
Building static site in /tmp/build...
        execute(): filePath=./content/aux.yml, nSIentries=7, totalPages=1, entries_per_page=20
        execute(): filePath=./content/blog.yml, nSIentries=453, totalPages=23, entries_per_page=20
        execute(): filePath=./content/gallery.yml, nSIentries=7, totalPages=1, entries_per_page=20
        execute(): filePath=./content/music.yml, nSIentries=69, totalPages=4, entries_per_page=20
        execute(): filePath=./content/error.yml, nSIentries=0, totalPages=0, entries_per_page=20
Finished creating 5 collections, 4 with index, and 562 entries (0.16 secs / 20.36MB)
#collections=5, parseEntry=0.0104/564-5, md2html=0.0219, MathParser=0.0203/562, renderEntry=0.0521/562, renderCollection=0.0022/33, content=562/0, excerpt=0/0
        real 0.19s
        user 0.11s
        sys 0
        swapped 0
        total space 0

The previous massive renderEntry() part in runtime shrank from 0.1573 seconds to 0.0521 seconds. I think this is mainly due to the OPCache, which now avoids recompiling and reparsing the PHP output template.

For the Lemire blog with JIT enabled we have:

time php saaze -rb /tmp/buildLemire
Building static site in /tmp/buildLemire...
        execute(): filePath=/home/klm/php/saaze-lemire/content/blog.yml, nSIentries=2771, totalPages=139, entries_per_page=20
Finished creating 1 collections, 1 with index, and 4483 entries (0.62 secs / 96.24MB)
#collections=1, parseEntry=0.0655/4483-1, md2html=0.0974, MathParser=0.0586/4483, renderEntry=0.0707/4483, renderCollection=0.0110/140, content=4483/0, excerpt=0/0
        real 0.65s
        user 0.40s
        sys 0
        swapped 0
        total space 0

Similar picture to the above: the renderEntry() part dropped from 0.4121 seconds to 0.0707 seconds. That's massive.

3. Unix forks in PHP. As a preliminary introduction to pcntl_fork() in PHP, look at below simple PHP code.


Running this script:
$ php forktst.php
i=1, pid=15082
i=2, pid=15083
i=3, pid=15084
i=4, pid=15085

The fork and join method of parallelization is easy to use, but it has the disadvantage that communicating results from the children to the parent is "difficult".
Communicating data from the parent to its children is "easy": everything is copied over.
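The asymmetry between the two directions can be illustrated with a minimal sketch. It is written in Python rather than PHP; os.fork() wraps the same Unix system call as pcntl_fork(). The function name fork_and_collect and the pipe-based reporting are assumptions made for illustration only and are not part of Simplified Saaze.

```python
import os

def fork_and_collect(n):
    """Fork n children one after another; each reports back through a pipe."""
    prefix = "i="                  # set in the parent, visible in every child for free
    reports = []
    for i in range(1, n + 1):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:               # child: fork() returned 0
            try:
                os.close(read_fd)
                msg = f"{prefix}{i}, pid={os.getpid()}"
                os.write(write_fd, msg.encode())
            finally:
                os._exit(0)        # leave at once, never fall back into the loop
        # parent: fork() returned the process id of the child
        os.close(write_fd)
        with os.fdopen(read_fd) as pipe:
            reports.append(pipe.read())   # child -> parent needs explicit plumbing
        os.waitpid(pid, 0)         # join: reap the child
    return reports

if __name__ == "__main__":
    for line in fork_and_collect(4):
        print(line)
```

The variable prefix reaches the children without any effort, while every result travelling back to the parent has to go through the pipe.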
4. Implementation in BuildCommand.php.
The command-line version of Simplified Saaze calls buildAllStatic().
This routine iterates through all collections, and for each collection it iterates through all entries.

Function getEntries() reads Markdown files into memory and converts them to HTML by using MD4C, all in memory
Function buildEntry() uses the entry in question and writes the HTML to disk by processing it through our PHP templates.

PHP function buildEntry() is essentially:
private function buildEntry(Collection $collection, Entry $entry, string $dest) : void {
    ...
    file_put_contents($entryDir, $this->templateManager->renderEntry($entry));
}

buildEntry() is now encapsulated within beginParallel() and endParallel().
That's it.
foreach ($collections as $collection) {
    $entries    = $collection->getEntries();	# finally calls getContentAndExcerpt() and sorts
    $nentries   = count($entries);
    $nSIentries = count($collection->entriesSansIndex);
    $entries_per_page = $collection->data['entries_per_page'] ?? \Saaze\Config::$H['global_config_entries_per_page'];
    $totalPages = ceil($nSIentries / $entries_per_page);
    printf("\texecute(): filePath=%s, nSIentries=%d, totalPages=%d, entries_per_page=%d\n",$collection->filePath,$nSIentries,$totalPages,$entries_per_page);

    $this->beginParallel($nentries,$aprocs);
    $i = 0;
    foreach ($entries as $entry) {
        if ($this->nprocs > 0  &&  ($i++ % $this->nprocs) != $this->procnr) continue;	// distribute work among nprocs processes
        if ($entry->data['entry'] ?? true) {
            $this->buildEntry($collection, $entry, $dest);
            $entryCount++;
        }
    }
    $this->endParallel();

    if ($tags) {	// populate cat_and_tag[][] array
        foreach ($entries as $entry) {
            if ($entry->data['entry'] ?? true)
                $this->build_cat_and_tag($entry,$collection->draftOverride);
        }
    }

    ++$totalCollection;
    if ($this->buildCollectionIndex($collection, 0, $dest)) $collectionCount++;

    for ($page=1; $page <= $totalPages; $page++)
        $this->buildCollectionIndex($collection, $page, $dest);
}

The two PHP functions for fork and join are thus:
protected function beginParallel(int $nentries, int $aprocs) : void {
    $this->pid = 0;
    $this->procnr = 0;
    $this->nprocs = 1;
    if ($nentries < 128) return;	// too few entries to warrant forking
    $this->nprocs = $aprocs;	// aprocs = allowed procs, specified on command-line
    for ($this->procnr=0; $this->procnr<$this->nprocs; ++$this->procnr)
        if (($this->pid = pcntl_fork())) return;	// pcntl_fork() is non-zero in the forking process: it returns to work, the new child continues the loop
}

protected function endParallel() : void {
    if ($this->pid) exit(0);	// every process that forked exits here; the process with pid=0 carries on
}

This fork and join via pcntl_fork() does not work on Microsoft Windows.
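The round-robin distribution via the modulo operation can be sketched independently of PHP. The following Python sketch is a classic fork and join in which the parent only forks and then waits for all children with os.waitpid(). It is not a transcription of the PHP code above: the names share and build_parallel, and the trivial HTML written per entry, are assumptions for illustration.

```python
import os
import tempfile

def share(entries, nprocs, procnr):
    """Entries handled by process procnr: every nprocs-th one, round-robin."""
    return [e for i, e in enumerate(entries) if i % nprocs == procnr]

def build_parallel(entries, nprocs, dest):
    """Fork nprocs workers, each writing its share into dest, then wait for all."""
    pids = []
    for procnr in range(nprocs):
        pid = os.fork()
        if pid == 0:                      # child: do one share of the work
            status = 1
            try:
                for name in share(entries, nprocs, procnr):
                    with open(os.path.join(dest, name + ".html"), "w") as f:
                        f.write(f"<h1>{name}</h1>\n")   # stand-in for template rendering
                status = 0
            finally:
                os._exit(status)          # never return into the parent's code path
        pids.append(pid)                  # parent: remember the child
    for pid in pids:                      # join: block until every child has finished
        os.waitpid(pid, 0)

if __name__ == "__main__":
    entries = [f"post{i:03d}" for i in range(10)]
    with tempfile.TemporaryDirectory() as dest:
        build_parallel(entries, 4, dest)
        print(sorted(os.listdir(dest)))
```

Because each entry ends up in its own output file, the children do not have to report anything back to the parent.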
5. Benchmarking. How much of an improvement do we get by this?
For this very blog with 561 entries, the runtimes can be more than halved.
This is in line with the 60% runtime used by the output template processing.
It should be noted that this blog consists of five collections:

aux: 7 entries
blog: 452 entries, only these are parallelized!
gallery: 7 entries
music: 69 entries
error: 1 entry

The parallelization kicks in only for at least 128 entries.
I.e., only the blog-part is parallelized, the music-part and the other parts are not.
Another benchmark is the Lemire blog converted to Simplified Saaze, see Example Theme for Simplified Saaze: Lemire.
Command-lines are:
time php saaze -p16 -mortb /tmp/build
time php saaze -p16 -rb /tmp/buildLemire

We then vary the parameter -p.
All output is to /tmp, which is a RAM disk in Arch Linux.
Obviously, I do not want to measure disk read or write speed.
I want to measure the processing speed of Simplified Saaze.
Timings are from time, taking real time.



Blog entries	p=1	p=2	p=4	p=8	p=16
561 posts / this blog	0.28	0.18	0.16	0.13	0.12
561 posts with JIT	0.19	0.17	0.14	0.13	0.12
4,483 posts in Lemire	1.03	1.02	0.65	0.54	0.52
4,483 posts with JIT	0.65	0.64	0.53	0.47	0.46



Overall, with just 20 lines of PHP we can halve the runtime.
With JIT enabled, the drop in runtime is less pronounced: about 37% for this blog (0.19 to 0.12 seconds) and about 29% for the Lemire blog (0.65 to 0.46 seconds).
The very good performance of JIT, which we can see here, is in line with the findings in Phoronix: PHP 8.0 JIT Is Offering Very Compelling Performance Ahead Of Its Alpha.
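The measured timings can be turned into speedup factors and compared with a simple upper bound. The sketch below uses Amdahl's law, which is my addition and not something the measurements depend on. It assumes that only the renderEntry() share of the serial run (0.1573 of 0.25 seconds) scales with the number of processes. The function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial to parallel wall-clock time."""
    return t_serial / t_parallel

def amdahl(parallel_fraction, nprocs):
    """Upper bound on the speedup if only parallel_fraction of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nprocs)

if __name__ == "__main__":
    timings = {   # real time in seconds for p=1 and p=16, taken from the table
        "this blog": (0.28, 0.12),
        "this blog, JIT": (0.19, 0.12),
        "Lemire": (1.03, 0.52),
        "Lemire, JIT": (0.65, 0.46),
    }
    for name, (t1, t16) in timings.items():
        print(f"{name}: speedup {speedup(t1, t16):.2f}")
    f = 0.1573 / 0.25          # share of renderEntry() in the serial run without JIT
    print(f"Amdahl bound for p=16: {amdahl(f, 16):.2f}")
```

For this blog without JIT the measured factor of about 2.3 is close to the bound of about 2.4, so adding more processes cannot be expected to help much further.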




GitHub RSS Atom Feeds
Sun, 25 Feb 2024 17:45:00 +0100

Ronalds Vilcins, in his article RSS feeds for your Github releases, tags and activity, provides a handy overview of some GitHub RSS feeds.
I reproduce them here verbatim:



Type	URL
Releases	https://github.com/:owner/:repo/releases.atom
Commits	https://github.com/:owner/:repo/commits.atom
Private feed	https://github.com/:user.private.atom?token=:secret
Tags	https://github.com/:user/:repo/tags.atom
User activity	https://github.com/:user.atom



They are vaguely documented by GitHub here: Get feeds.
For example, for my saaze GitHub repository the feed for the commits is:

  tag:github.com,2008:/eklausme/saaze/commits/master
  
  
  Recent Commits to saaze:master
  2024-02-17T12:58:12Z
  
    tag:github.com,2008:Grit::Commit/48560c8bb5535cfaacdf2fc1be153c43448051d5
    
    
        Reduced CPU overhead in composer
    
    2024-02-17T12:58:12Z
    
    
      eklausme
      https://github.com/eklausme
    
    
      <pre style='white-space:pre-wrap;width:81ex'>Reduced CPU overhead in composer</pre>
    
  
  ...


The above output was produced by the following command-line:
curl https://github.com/eklausme/saaze/commits.atom
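Such a feed can be processed with any XML parser. Below is a minimal sketch in Python using only the standard library. SAMPLE is a hand-made, reduced Atom document built from the values shown above. Its element names are the standard Atom ones (feed, entry, id, title, updated, author), and it is not the literal response from GitHub. The function name list_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"   # Atom namespace in ElementTree notation

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:github.com,2008:/eklausme/saaze/commits/master</id>
  <title>Recent Commits to saaze:master</title>
  <updated>2024-02-17T12:58:12Z</updated>
  <entry>
    <id>tag:github.com,2008:Grit::Commit/48560c8bb5535cfaacdf2fc1be153c43448051d5</id>
    <title>
        Reduced CPU overhead in composer
    </title>
    <updated>2024-02-17T12:58:12Z</updated>
    <author><name>eklausme</name><uri>https://github.com/eklausme</uri></author>
  </entry>
</feed>
"""

def list_entries(xml_text):
    """Return (updated, author, title) for every entry of an Atom feed."""
    root = ET.fromstring(xml_text)
    result = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="").strip()
        updated = entry.findtext(ATOM + "updated", default="")
        author = entry.findtext(f"{ATOM}author/{ATOM}name", default="")
        result.append((updated, author, title))
    return result

if __name__ == "__main__":
    for updated, author, title in list_entries(SAMPLE):
        print(updated, author, title)
```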


		


MD4C PHP Extension
Sat, 24 Feb 2024 22:45:00 +0100

This blog uses MD4C to convert Markdown to HTML.
So far I used PHP:FFI to link PHP with the MD4C C library.
PHP:FFI is the "Foreign Function Interface" in PHP and allows calling C functions from PHP without writing a PHP extension.
Using FFI is very easy.
Previous profiling measurements with XHProf and PHPSPY indicated that the handling of the return value from MD4C via FFI::String takes some time.
So I changed FFI to a "real" PHP extension.
I measured again.
Result: No difference between FFI and PHP extension.
So the profiling measurements were misleading.
Also the following claim in the PHP manual is downright false:

it makes no sense to use the FFI extension for speed; however, it may make sense to use it to reduce memory consumption.

Nevertheless, writing a PHP extension was a good exercise to keep my acquaintance with the PHP development ecosystem up to date.
I had already written a COBOL to PHP and an IMS/DC to PHP extension:

PHP extension seg-faulting
IMS/DC MFS To PHP

Literature on writing PHP extensions:

Sara Golemon: Extending and Embedding PHP, Sams Publishing, 2006, xx+410 p.
PHP Internals: Zend extensions
https://github.com/dstogov/php-extension

The PHP extension code is in GitHub: php-md4c.
1. Walk through the C code. For this simple extension there is no need for a separate header file.
The extension starts with basic includes for PHP, for the phpinfo(), and for MD4C:
// MD4C extension for PHP: Markdown to HTML conversion

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <php.h>
#include <ext/standard/info.h>
#include <md4c-html.h>

The following code is directly from the FFI part php_md4c_toHtml.c:
struct membuffer {
    char* data;
    size_t asize;	// allocated size = max usable size
    size_t size;	// current size
};

The following routines are also almost the same as in the FFI case, except that memory allocation is using safe_pemalloc() instead of native malloc().
In our case this doesn't make any difference.
static void membuf_init(struct membuffer* buf, MD_SIZE new_asize) {
    buf->size = 0;
    buf->asize = new_asize;
    if ((buf->data = safe_pemalloc(buf->asize,sizeof(char),0,1)) == NULL)
        php_error_docref(NULL, E_ERROR, "php-md4c.c: membuf_init: safe_pemalloc() failed with asize=%ld.\n",(long)buf->asize);
}

Next routine uses safe_perealloc() instead of realloc().
static void membuf_grow(struct membuffer* buf, size_t new_asize) {
    buf->data = safe_perealloc(buf->data, new_asize, sizeof(char), 0, 1);	// new_asize bytes, persistent
    if (buf->data == NULL)
        php_error_docref(NULL, E_ERROR, "php-md4c.c: membuf_grow: realloc() failed, new_asize=%ld.\n",(long)new_asize);
    buf->asize = new_asize;
}

The rest is identical to FFI.
static void membuf_append(struct membuffer* buf, const char* data, MD_SIZE size) {
    if (buf->asize < buf->size + size)
        membuf_grow(buf, buf->size + buf->size / 2 + size);
    memcpy(buf->data + buf->size, data, size);
    buf->size += size;
}

static void process_output(const MD_CHAR* text, MD_SIZE size, void* userdata) {
    membuf_append((struct membuffer*) userdata, text, size);
}

static struct membuffer mbuf = { NULL, 0, 0 };
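The growth policy of the buffer is easier to see in isolation. The following Python sketch mirrors the three fields of the C struct and the rule used in membuf_append(): when the capacity is too small, grow to the current size plus 50% plus the incoming chunk. The class MemBuffer is an illustration only and is not part of php-md4c.

```python
class MemBuffer:
    """Growable byte buffer: data, asize (allocated size), size (used size)."""

    def __init__(self, asize):
        self.data = bytearray(asize)      # preallocated storage
        self.asize = asize
        self.size = 0

    def grow(self, new_asize):
        """Enlarge the storage to new_asize bytes, keeping the content."""
        self.data.extend(bytes(new_asize - self.asize))
        self.asize = new_asize

    def append(self, chunk):
        if self.asize < self.size + len(chunk):
            # same policy as membuf_append(): size + size/2 + incoming chunk
            self.grow(self.size + self.size // 2 + len(chunk))
        self.data[self.size:self.size + len(chunk)] = chunk
        self.size += len(chunk)

    def value(self):
        """The bytes written so far, without the unused tail."""
        return bytes(self.data[:self.size])

if __name__ == "__main__":
    buf = MemBuffer(4)
    for piece in (b"<p>", b"Hello", b"</p>"):
        buf.append(piece)
    print(buf.value(), buf.size, buf.asize)
```

Growing by a factor instead of by a fixed amount keeps the number of reallocations small when many small chunks arrive.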

Now we come to something PHP specific.
We encapsulate the C function into PHP_FUNCTION.
Furthermore, the arguments of the routine are parsed with ZEND_PARSE_PARAMETERS_START(1, 2).
This routine must have at least one argument.
It might have an optional second argument.
That is what is meant by (1,2).
The return string is allocated via estrndup().
In the FFI case we just return a pointer to a string.
/* {{{ string md4c_toHtml( string $markdown, [ int $flag ] )
 */
PHP_FUNCTION(md4c_toHtml) {	// return HTML string
    char *markdown;
    size_t markdown_len;
    int ret;
    long flag = MD_DIALECT_GITHUB | MD_FLAG_NOINDENTEDCODEBLOCKS;

    ZEND_PARSE_PARAMETERS_START(1, 2)
        Z_PARAM_STRING(markdown, markdown_len)
        Z_PARAM_OPTIONAL Z_PARAM_LONG(flag)
    ZEND_PARSE_PARAMETERS_END();

    if (mbuf.asize == 0) membuf_init(&mbuf,16777216);	// =16MB

    mbuf.size = 0;	// prepare for next call
    ret = md_html(markdown, markdown_len, process_output,
        &mbuf, (MD_SIZE)flag, 0);
    membuf_append(&mbuf,"\0",1); // make it a null-terminated C string, so PHP can deduce length
    if (ret < 0) {
        RETVAL_STRINGL("\n- - - Error in Markdown - - -\n\n",sizeof("\n- - - Error in Markdown - - -\n\n")-1);	// -1: exclude the terminating NUL
    } else {
        RETVAL_STRING(estrndup(mbuf.data,mbuf.size));
    }
}
/* }}}*/

The following two PHP extension specific functions are just for initialization and shutdown.
The following diagram from PHP internals shows the sequence of initialization and shutdown.
  
Init: Do nothing.
/* {{{ PHP_MINIT_FUNCTION
 */
PHP_MINIT_FUNCTION(md4c) {	// module initialization
    //REGISTER_INI_ENTRIES();
    //php_printf("In PHP_MINIT_FUNCTION(md4c): module initialization\n");

    return SUCCESS;
}
/* }}} */

Shutdown: Free the persistent output buffer.
/* {{{ PHP_MSHUTDOWN_FUNCTION
 */
PHP_MSHUTDOWN_FUNCTION(md4c) {	// module shutdown
    if (mbuf.data) pefree(mbuf.data,1);
    return SUCCESS;
}
/* }}} */

The following function prints out information when called via phpinfo().
/* {{{ PHP_MINFO_FUNCTION
 */
PHP_MINFO_FUNCTION(md4c) {
    php_info_print_table_start();
    php_info_print_table_row(2, "MD4C", "enabled");
    php_info_print_table_row(2, "PHP-MD4C version", "1.0");
    php_info_print_table_row(2, "MD4C version", "0.5.2");
    php_info_print_table_end();
}
/* }}} */

The output looks like this:

The following describes the argument list.
/* {{{ arginfo
 */
ZEND_BEGIN_ARG_INFO(arginfo_md4c_test, 0)
ZEND_END_ARG_INFO()

ZEND_BEGIN_ARG_INFO(arginfo_md4c_toHtml, 1)
    ZEND_ARG_INFO(0, str)
    ZEND_ARG_INFO_WITH_DEFAULT_VALUE(0, flag, "MD_DIALECT_GITHUB | MD_FLAG_NOINDENTEDCODEBLOCKS")
ZEND_END_ARG_INFO()
/* }}} */

/* {{{ php_md4c_functions[]
 */
static const zend_function_entry php_md4c_functions[] = {
    PHP_FE(md4c_toHtml,	arginfo_md4c_toHtml)
    PHP_FE_END
};
/* }}} */

The zend_module_entry is somewhat classical.
All the above is configured here.
/* {{{ md4c_module_entry
 */
zend_module_entry md4c_module_entry = {
    STANDARD_MODULE_HEADER,
    "md4c",						// Extension name
    php_md4c_functions,			// zend_function_entry
    NULL,	//PHP_MINIT(md4c),	// PHP_MINIT - Module initialization
    PHP_MSHUTDOWN(md4c),		// PHP_MSHUTDOWN - Module shutdown
    NULL,						// PHP_RINIT - Request initialization
    NULL,						// PHP_RSHUTDOWN - Request shutdown
    PHP_MINFO(md4c),			// PHP_MINFO - Module info
    "1.0",						// Version
    STANDARD_MODULE_PROPERTIES
};
/* }}} */

This seemingly innocent looking statement is important: Without it you will get PHP Startup: Unable to load dynamic library.
#ifdef COMPILE_DL_TEST
# ifdef ZTS
ZEND_TSRMLS_CACHE_DEFINE()
# endif
#endif
ZEND_GET_MODULE(md4c)

2. M4 config file.
The PHP extension requires a config.m4 file.
dnl config.m4 for php-md4c extension

PHP_ARG_WITH(md4c, [whether to enable MD4C support],
[  --with-md4c[[=DIR]]       Enable MD4C support.
                          DIR is the path to MD4C install prefix])

if test "$PHP_YAML" != "no"; then

    AC_MSG_CHECKING([for md4c headers])
    for i in "$PHP_MD4C" "$prefix" /usr /usr/local; do
        if test -r "$i/include/md4c-html.h"; then
            PHP_MD4C_DIR=$i
            AC_MSG_RESULT([found in $i])
            break
        fi
    done
    if test -z "$PHP_MD4C_DIR"; then
        AC_MSG_RESULT([not found])
        AC_MSG_ERROR([Please install md4c])
    fi

    PHP_ADD_INCLUDE($PHP_MD4C_DIR/include)
    dnl recommended flags for compilation with gcc
    dnl CFLAGS="$CFLAGS -Wall -fno-strict-aliasing"

    export OLD_CPPFLAGS="$CPPFLAGS"
    export CPPFLAGS="$CPPFLAGS $INCLUDES -DHAVE_MD4C"
    AC_CHECK_HEADERS([md4c.h md4c-html.h], [], AC_MSG_ERROR(['md4c.h' header not found]))
    #AC_CHECK_HEADER([md4c-html.h], [], AC_MSG_ERROR(['md4c-html.h' header not found]))
    PHP_SUBST(MD4C_SHARED_LIBADD)

    PHP_ADD_LIBRARY_WITH_PATH(md4c, $PHP_MD4C_DIR/$PHP_LIBDIR, MD4C_SHARED_LIBADD)
    PHP_ADD_LIBRARY_WITH_PATH(md4c-html, $PHP_MD4C_DIR/$PHP_LIBDIR, MD4C_SHARED_LIBADD)
    export CPPFLAGS="$OLD_CPPFLAGS"

    PHP_SUBST(MD4C_SHARED_LIBADD)
    AC_DEFINE(HAVE_MD4C, 1, [ ])
    PHP_NEW_EXTENSION(md4c, md4c.c, $ext_shared)
fi

3. Compiling. Run
phpize
./configure
make

Symbols are as follows:
$ nm md4c.so
0000000000002160 r arginfo_md4c_test
0000000000003d00 d arginfo_md4c_toHtml
                 w __cxa_finalize@GLIBC_2.2.5
00000000000040a0 d __dso_handle
0000000000003dc0 d _DYNAMIC
                 U _emalloc
                 U _emalloc_64
                 U _estrndup
00000000000016c8 t _fini
                 U free@GLIBC_2.2.5
00000000000016c0 T get_module
0000000000003fe8 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000021c8 r __GNU_EH_FRAME_HDR
0000000000001000 t _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000004180 b mbuf
00000000000040c0 D md4c_module_entry
                 U md_html
                 U memcpy@GLIBC_2.14
                 U php_error_docref
                 U php_info_print_table_end
                 U php_info_print_table_row
                 U php_info_print_table_start
0000000000003d60 d php_md4c_functions
                 U php_printf
0000000000001640 t process_output
0000000000001234 t process_output.cold
                 U _safe_malloc
                 U _safe_realloc
                 U __stack_chk_fail@GLIBC_2.4
                 U strlen@GLIBC_2.2.5
0000000000004168 d __TMC_END__
                 U zend_parse_arg_long_slow
                 U zend_parse_arg_str_slow
                 U zend_wrong_parameter_error
                 U zend_wrong_parameters_count_error
                 U zend_wrong_parameters_none_error
. . .
0000000000001380 T zif_md4c_toHtml
00000000000011cf t zif_md4c_toHtml.cold
0000000000001175 T zm_info_md4c
0000000000001350 T zm_shutdown_md4c
00000000000016b0 T zm_startup_md4c

4. Installing on Arch Linux. Copy the md4c.so library to /usr/lib/php/modules as root:
cp modules/md4c.so /usr/lib/php/modules

Finally activate the extension in php.ini:
extension=md4c

5. Notes on Windows. On Linux we use the installed MD4C library.
As noted in Installing Simplified Saaze on Windows 10 #2 it is advisable
to amalgamate all MD4C source files into a single file for easier compilation.

		


Let's Encrypt Certbot Usage with NGINX
Mon, 19 Feb 2024 16:35:00 +0100

Previously I used lefh to generate and update Let's Encrypt certificates for the Hiawatha webserver.
Unfortunately, this PHP script no longer works.
Therefore I installed certbot:
pacman -S certbot-nginx

Updating my domains is like this:
certbot --nginx -d eklausmeier.goip.de,klm.ddns.net,eklausmeier.mywire.org,klmport.no-ip.org,klm.no-ip.org

Its output is roughly
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Requesting a certificate for eklausmeier.goip.de and 4 more domains

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/eklausmeier.goip.de/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/eklausmeier.goip.de/privkey.pem
This certificate expires on 2024-05-19.
These files will be updated when the certificate renews.

Reference the certificate and key files in /etc/nginx/nginx.conf:
ssl_certificate      /etc/letsencrypt/live/eklausmeier.goip.de/fullchain.pem;
ssl_certificate_key  /etc/letsencrypt/live/eklausmeier.goip.de/privkey.pem;

Check with nginx -t.
If all is OK, then restart with systemctl restart nginx.
Final check is with Qualys SSL Labs:


		


Considerations on a Newsletter Program
Sun, 11 Feb 2024 17:40:00 +0100

1. Statement of the problem. This blog does not offer any newsletter functionality.
If a reader is interested to know whether I have posted new content, he must either use an RSS feed or directly visit this site.
WordPress offers the possibility of getting notified of new posts automatically.
I.e., a user can easily subscribe for new content.

On my old WordPress blog, https://eklausmeier.wordpress.com, I had 79 subscribers.
From their e-mail names, I would suspect that some of them were not really interested in my actual content but were a little bit spammy.
Nevertheless, many seemed to be legitimate.
There are a lot of professional newsletter services on the market.
For example:

https://www.mailjet.com
https://buttondown.email
https://mailchimp.com
https://omnisend.com

There are many more.
These solutions should be distinguished from mailing list software.
2. Data model. Initially, I thought of a single file used to store all information.
Something like this:
Handling of subscription-file: Read into a PHP hash table, change whatever needs change, and if there is a change required, e.g., new subscriber, then move the old file, and write a new file from the hash.
However, this file needs some protection using flock() to guard against simultaneous writing to it.
After some thought it seems more advantageous to use a simple SQLite file, i.e., a database, which already handles concurrency out of the box.
A single database table suffices. Henceforth this table is called subscription.



Nr.	Column	type	nullable	Example or meaning
1	email	text	not null	primary key, e.g., Peter.Miller@super.com
2	Firstname	text	null	e.g., Peter
3	Lastname	text	null	e.g., Miller
4	registration	date	not null	date of registration, e.g., 06-Feb-2024
5	IP	text	not null	e.g., 84.119.108.23, IP address of web client during initial subscription
6	status	int	not null	1=in-limbo, 2=active, 3=inactive, 4=bounced during registration, 5=bounced
7	token	text	not null	e.g., uIYkEk+ylks=, computed with $token = base64_encode(random_bytes(8));



State diagram for status is as below.

graph LR
    A(1=in-limbo) --> B(2=active)
    B --> C(3=inactive)
    A --> D(4=bounced during registration)
    B --> E(5=bounced)
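The state diagram translates directly into a small transition table. The following Python sketch encodes exactly the four edges of the diagram and rejects everything else. The constant and function names are hypothetical.

```python
IN_LIMBO, ACTIVE, INACTIVE, BOUNCED_REG, BOUNCED = 1, 2, 3, 4, 5

# allowed target states per current state, one entry per arrow in the diagram
ALLOWED = {
    IN_LIMBO: {ACTIVE, BOUNCED_REG},
    ACTIVE: {INACTIVE, BOUNCED},
    INACTIVE: set(),
    BOUNCED_REG: set(),
    BOUNCED: set(),
}

def next_status(current, target):
    """Return target if the diagram allows current -> target, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"transition {current} -> {target} not allowed")
    return target

if __name__ == "__main__":
    status = IN_LIMBO
    status = next_status(status, ACTIVE)      # confirmation received
    status = next_status(status, INACTIVE)    # unsubscribed
    print(status)
```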

Create script for SQLite is like this:
drop table if exists subscription;

create table subscription (
    email       text primary key,
    firstname   text,
    lastname    text,
    registration    date not null,
    IP          text not null,
    status      int not null,
    token       text not null
);

The following SQL statements will be used:

During sending out the newsletter: select email, firstname from subscription where status=2
New subscriber: insert into subscription (email,firstname,lastname,registration,IP,status,token) values (...)
Checking correct token: select token from subscription where email=:m
Updating status column: update subscription set status=:s where email=:m
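The statements can be tried out against an in-memory SQLite database. The following sketch uses Python's sqlite3 module instead of PHP. The helper names are hypothetical, and the newsletter query selects status 2 because the data model defines 2 as active. The token is generated in the same way as with base64_encode(random_bytes(8)).

```python
import base64
import secrets
import sqlite3

SCHEMA = """
create table subscription (
    email        text primary key,
    firstname    text,
    lastname     text,
    registration date not null,
    IP           text not null,
    status       int  not null,
    token        text not null
)"""

def new_token():
    """8 random bytes, base64 encoded, i.e., 12 characters."""
    return base64.b64encode(secrets.token_bytes(8)).decode("ascii")

def subscribe(db, email, firstname, lastname, ip, today):
    """Insert a new subscriber as in-limbo (status 1) and return the token."""
    token = new_token()
    db.execute(
        "insert into subscription (email,firstname,lastname,registration,IP,status,token)"
        " values (:m,:f,:l,:d,:ip,1,:t)",
        {"m": email, "f": firstname, "l": lastname, "d": today, "ip": ip, "t": token})
    return token

def token_matches(db, email, token):
    row = db.execute("select token from subscription where email=:m", {"m": email}).fetchone()
    return row is not None and secrets.compare_digest(row[0].encode(), token.encode())

def set_status(db, email, status):
    db.execute("update subscription set status=:s where email=:m", {"s": status, "m": email})

def active_recipients(db):
    return db.execute(
        "select email, firstname from subscription where status=2 order by email").fetchall()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    tok = subscribe(db, "Peter.Miller@super.com", "Peter", "Miller", "84.119.108.23", "2024-02-06")
    if token_matches(db, "Peter.Miller@super.com", tok):
        set_status(db, "Peter.Miller@super.com", 2)
    print(active_recipients(db))
```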

The following columns could be added to better cope with malicious users.



Nr.	Column	type	nullable	Example or meaning
8	lastRegist	date	null	date of last registration, relevant only for multiple subscriptions for the same e-mail
9	lastIP	text	null	last used IP of the web client, when used for multiple subscriptions



3. Sketch of solution. Here are considerations and requirements for a simple newsletter software.

Programming this application in PHP is preferred as this can be installed on many hosting providers, which offer PHP, e-mail, DNS, etc.
Have one single database table, called subscription, see above.
Periodically reads incoming e-mails for new subscribers or unsubscription requests.
New subscribers add an entry to the subscription table.
Subscription requests will generate a random token, which is sent to the e-mail address.
Unsubscribe requests set the status column to inactive in the subscription table.
During deployment of a new post on the static site, or by manual start, send an e-mail to all recipients on the subscription table, which are active.
The IP address of the registering web client is stored. With this we can defend against floods of subscriptions with e-mail addresses that all bounce. For example, this IP address can then be blocked in the firewall of the web-server.

The token does not need to be overly confidential.
Its purpose is to defend against funny/stupid/malicious actors, who want to unsubscribe people against their will.
Handling of e-mails: For reading e-mail you can use imap_headers(), for sending imap_mail().
Also see Sending email using PhpMailer with Gmail XOAUTH2, and Gmail Email Inbox using PHP with IMAP.
Subscribing to the mailing list works with an empty e-mail that states Subscribe in the subject line.
For unsubscribing you send Unsubscribe in the subject line and the token in the body part.
These two operations are also supported by a simple web-form, which essentially asks for the e-mail address and the token from the user and then sends the confirmation e-mail and sets the status in the subscription table.
Reading e-mails is done every 20 minutes, e.g., controlled by cron.
The reading process then analyses the subject field for Subscribe and Unsubscribe.
This process also checks for any bounces.
In case of a bounce the status flag is set to either bounced or bounced during registration.
No distinction is made between hard and soft bounces.
A subscription request makes an entry in the subscription table and sets the status column to in-limbo.
The sender receives an e-mail, which he must confirm by e-mail or web form.
Once the confirming e-mail is received or the web form is used to confirm then the status column is set to active.
If a new subscription request is made with an already existing e-mail address then a new token is generated and sent, and the status remains its previous status, e.g., it might remain active or in-limbo.
If a malicious user subscribes multiple e-mail addresses, which he does not own, then all these e-mail addresses are set to in-limbo.
If the legitimate owner now wants to subscribe, he can do so without fuss, because a new token is sent out for every subscription request.
This prevents unconfirmed e-mail addresses from being blocked.
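The handling of incoming e-mails described above can be sketched as a pure function on an in-memory dictionary. This is a Python sketch under several assumptions: all function names are hypothetical, the subscriptions are kept in a plain dict instead of the database table, bounce handling is left out, and confirm() stands for whichever confirmation path (e-mail or web form) is used.

```python
import base64
import secrets

IN_LIMBO, ACTIVE, INACTIVE = 1, 2, 3

def new_token():
    return base64.b64encode(secrets.token_bytes(8)).decode("ascii")

def process_mail(subs, sender, subject, body=""):
    """Apply one incoming e-mail to subs, a dict email -> {'status', 'token'}.

    Returns the token to be mailed to the sender, or None if no token is sent.
    """
    action = subject.strip().lower()
    entry = subs.get(sender)
    if action == "subscribe":
        token = new_token()
        if entry is None:
            subs[sender] = {"status": IN_LIMBO, "token": token}
        else:
            entry["token"] = token         # new token, status stays as it was
        return token
    if action == "unsubscribe":
        if entry is not None and body.strip() == entry["token"]:
            entry["status"] = INACTIVE     # only with the correct token
        return None
    return None                            # any other subject is ignored here

def confirm(subs, email, token):
    """Hypothetical confirmation step: in-limbo becomes active."""
    entry = subs.get(email)
    if entry and entry["status"] == IN_LIMBO and token == entry["token"]:
        entry["status"] = ACTIVE
        return True
    return False

if __name__ == "__main__":
    subs = {}
    tok = process_mail(subs, "a@example.org", "Subscribe")
    confirm(subs, "a@example.org", tok)
    print(subs["a@example.org"]["status"])
```

A repeated subscription request only replaces the token, so an address that somebody else has put into in-limbo can still be confirmed by its owner.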
4. Web forms. The HTML form for processing subscribe and unsubscribe requests looks very simple:

First name: Arnold

Last name:   Schwarzenegger

E-mail address: Arnold.Schwarzenegger@Terminator.org

Token:   uIYkEk+ylks= (only required for unsubscribe)

                  


Changing your e-mail address is done by subscribing to the new address, and then unsubscribing from the old one.


If you have lost or deleted the token for unsubscribing, then simply subscribe again with the same e-mail address.
A token will be sent to you, which you then can use for unsubscribing.

While the e-mail address is mandatory, the first and last name are optional.
The actual e-mailing can be done with below simple HTML form:

Greeting:   Firstname will be taken

Content:   Your content

                    

The following e-mails are sent depending on the circumstances:

Once a user has entered his name and e-mail on the HTML form, he will be sent an e-mail to confirm his e-mail address with the generated token.
If the user has unsubscribed from the mailing list, he will receive a confirmation e-mail, which confirms that he has unsubscribed. If the token is wrong then no e-mail will be sent.
The actual content is sent to all members stored in the subscription table, which are active. I.e., this is the whole purpose of maintaining this e-mail list.

5. Effort estimation. I expect the whole code for this to be no more than 1,000 lines of PHP.
I expect the following PHP programs/files:

Handling the web form.
Run through cron and checking for new subscription or unsubscription requests. Checking for bounces.
Configurations for user-id, password, and hostname for e-mail host.
Sending an e-mail to each recipient in the subscription table, either by using a web form, or via command-line, taking a text file as input.

Possible problems ahead due to hosting limitations:

If you want to use Google Mail as mail provider you will encounter their limit of 500 mails per day.
Yahoo seems to have a limit of 500 mails per day.
Outlook also has a 500 mails per day limit.
IONOS imposes a 500 mails per hour limit.
Hetzner similarly restricts to 500 mails per hour.
Amazon SES has a limit of 200 mails per day

To counter above limits somewhat, you can split your e-mails into batches, i.e., send 500 e-mails the first hour, then another 500 mails the next hour.
For this you need an additional table, which stores the batch-number, and the message text to be sent.
Obviously, you will not actually send 500 e-mails per batch, but rather 450 or so, to leave room for the confirmation mails for new subscribers or unsubscribers.
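Splitting the recipients into batches is a one-liner. Below is a minimal Python sketch. The function name batches is hypothetical, and the default of 450 is just the figure mentioned above.

```python
def batches(recipients, size=450):
    """Split recipients into consecutive batches of at most size addresses."""
    if size < 1:
        raise ValueError("size must be positive")
    return [recipients[i:i + size] for i in range(0, len(recipients), size)]

if __name__ == "__main__":
    recipients = [f"user{i}@example.org" for i in range(1000)]
    for nr, batch in enumerate(batches(recipients), start=1):
        print(f"batch {nr}: {len(batch)} recipients")
```

The batch number from enumerate() corresponds to the batch-number that would be stored in the additional table.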
I am quite surprised that a Google search didn't reveal any program that already does something similar.
The closest match is phpList.

		


Stability and Polynomials
Sat, 10 Feb 2024 11:00:00 +0100

1. Theorem:  Stability criterion of Routh/Hurwitz,
named after Routh, Edward John (1831--1907), and
Hurwitz, Adolf (1859--1919).
Assumptions: Let

$$
    p(z) = a_0z^n + a_1z^{n-1} + \cdots + a_{n-1}z + a_n
         = a_0 (z - \lambda_1) \ldots (z - \lambda_n)
$$

be an arbitrary real polynomial with coefficients $a_i\in\mathbb{R}$ and
roots $\lambda_i\in\mathbb{C}$.
Furthermore, let

$$
\displaylines{
    \Delta_1 = a_1, \qquad
    \Delta_2 = \left|\matrix{a_1&a_3\cr a_0&a_2\cr}\right|, \qquad
    \Delta_3 = \left|\matrix{a_1&a_3&a_5\cr a_0&a_2&a_4\cr 0&a_1&a_3\cr}\right|, \quad\ldots, \cr
    \Delta_n = \left|\matrix{
        a_1 & a_3 & \ldots\cr
        a_0 & a_2 & \ldots\cr
        & a_1 & a_3 & \ldots\cr
        & a_0 & a_2 & \ldots\cr
        && a_1 & a_3 & \ldots\cr
        && a_0 & a_2 & \ldots\cr
        &&& \ddots & \ddots\cr
        & 0 &&& a_1 & a_3 & \ldots\cr
        &   &&& a_0 & a_2 & \ldots\cr
    }\right|, \cr
}
$$

with the convention $a_{n+1}=a_{n+2}=\cdots=0$.
Claim: $\mathop{\rm Re}\nolimits \lambda_i<0$ for all $i$ if and only if

$$
    a_0\Delta_1\gt 0,\:  \Delta_2\gt 0,\:  a_0\Delta_3\gt 0,\:  \Delta_4\gt 0,\:  \ldots,\: 
    \cases{a_0\Delta_n\gt 0, & $n$ odd,\cr  \Delta_n\gt 0, & $n$ even.\cr}
$$

For $a_0>0$ this reads $\Delta_i>0$, $i=1,\ldots,n$.
Proof: See the book by Gantmacher, Felix Ruvimovich (1908--1964), Gantmacher (1986), §16.6,
"Matrizentheorie",
Springer-Verlag,
Berlin Heidelberg New York Tokyo, translated from the Russian by
Helmut Boseck, Dietmar Soyka and Klaus Stengert, 1986, 654 pp.
    ☐
The theorem above is a special case of the general theorem of Routh/Hurwitz,
which gives the exact number of roots with strictly negative real part.
The following theorem of Liénard/Chipart
from 1914 has the advantage over the stability criterion of
Routh/Hurwitz that only
about half as many minors have to be checked for their sign.
2. Theorem:  Stability criterion of Liénard/Chipart,
named after Chipart, A.H.,
Liénard, Alfred-Marie (1869--1958).
Claim: For $a_0>0$, $\mathop{\rm Re}\nolimits \lambda_i<0$ for all $i$ is equivalent to any one of the following 4 statements:
(1)     $a_n>0$, $a_{n-2}>0$, $\ldots$; $\Delta_1>0$, $\Delta_3>0$, $\ldots$,
(2)     $a_n>0$, $a_{n-2}>0$, $\ldots$; $\Delta_2>0$, $\Delta_4>0$, $\ldots$,
(3)     $a_n>0$, $a_{n-1}>0$, $a_{n-3}>0$, $\ldots$; $\Delta_1>0$, $\Delta_3>0$, $\ldots$,
(4)     $a_n>0$, $a_{n-1}>0$, $a_{n-3}>0$, $\ldots$; $\Delta_2>0$, $\Delta_4>0$, $\ldots.$
Proof: See again Gantmacher (1986),
§16.13.     ☐
To check a given polynomial, one conveniently picks, among the four conditions,
the one whose largest minor, $\Delta_{n-1}$ or $\Delta_n$,
has the smaller number of rows.
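The criterion is easy to try out numerically. The following Python sketch is only an illustration: the helper names `det`, `hurwitz_minors` and `is_stable` are made up here and do not come from Gantmacher. It assumes a real polynomial with $a_0>0$, builds the Hurwitz matrix with entry $a_{2c-r}$ in row $r$ and column $c$, and tests $\Delta_i>0$ using exact rational arithmetic.

```python
from fractions import Fraction

def det(m):
    """Determinant via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, d = len(m), Fraction(1)
    for c in range(n):
        # Find a pivot row; if none exists, the determinant is zero.
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d  # a row swap flips the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

def hurwitz_minors(a):
    """a = [a_0, ..., a_n]; returns [Delta_1, ..., Delta_n]."""
    n = len(a) - 1
    coef = lambda i: a[i] if 0 <= i <= n else 0
    # Row r, column c (1-based) of the Hurwitz matrix holds a_{2c-r}.
    h = [[coef(2 * c - r) for c in range(1, n + 1)] for r in range(1, n + 1)]
    return [det([row[:k] for row in h[:k]]) for k in range(1, n + 1)]

def is_stable(a):
    """Routh/Hurwitz test for a_0 > 0: all Delta_i must be positive."""
    assert a[0] > 0
    return all(d > 0 for d in hurwitz_minors(a))

# (z+1)(z+2)(z+3) = z^3 + 6z^2 + 11z + 6 has only negative roots.
print(hurwitz_minors([1, 6, 11, 6]), is_stable([1, 6, 11, 6]))
# (z-1)(z+2)(z+3) = z^3 + 4z^2 + z - 6 has the root +1.
print(is_stable([1, 4, 1, -6]))
```

For the first polynomial the minors are $6$, $60$, $360$; for the second the last minor is negative, so the test fails as expected.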

		


The Formula of Faà di Bruno
Fri, 09 Feb 2024 21:00:00 +0100

The formula of Faà di Bruno, named after Faà di Bruno, Francesco (1825--1888),
generalizes the chain rule to derivatives of arbitrary order.
1. Theorem:  Formula of Faà di Bruno.
Let $w$ depend on $u$, where $u$ is a function of $x$.
Let $D_x^k u$ denote the $k$-th derivative of $u$ with respect to $x$.
Then

$$
    D_x^n w = \sum_{j=0}^n \sum_{\scriptstyle{k_1+k_2+\cdots+k_n=j}\atop
        {\scriptstyle{k_1+2k_2+\cdots+nk_n=n}\atop
        \scriptstyle{k_1,k_2,\ldots,k_n\ge0}}}
        {n!{\mskip 3mu} D_u^j w\over k_1! (1!)^{k_1} \cdots k_n! (n!)^{k_n}}
        (D_x^1 u)^{k_1} \ldots (D_x^n u)^{k_n}.
$$

Proof: See Knuth, Donald Ervin (*1938),
The Art of Computer Programming,
Volume 1 -- Fundamental Algorithms,
Addison-Wesley Publishing Company, Reading (Massachusetts) Menlo Park
(California) London Sydney Manila, 1972, second printing,
xxi+634 pp.
See McEliece in the book by Knuth cited above,
McEliece, Robert James.
If $c(n,j,k_1,k_2,\ldots)$ denotes the fraction in the formula, then
differentiating gives

$$
\eqalignno{
    c(n+1,j,k_1,\ldots){}={}& c(n,j-1,k_1-1,k_2,\ldots)\cr
    & {}+(k_1+1){\mskip 3mu}c(n,j,k_1+1,k_2-1,k_3,\ldots)\cr
    & {}+(k_2+1){\mskip 3mu}c(n,j,k_1,k_2+1,k_3-1,k_4,\ldots) + \ldots {\mskip 3mu}.
}
$$

Here it is convenient to assume infinitely many $k_i$, although
$k_{n+1}=k_{n+2}=\cdots=0$.
In the induction step, $k_1+\cdots+k_n=j$ and $k_1+2k_2+\cdots+nk_n=n$
are invariants.
One can now cancel $n! / k_1! (1!)^{k_1} k_2! (2!)^{k_2}\ldots$ and
then arrives at $k_1+2k_2+\cdots=n+1$.
Cf. also Bourbaki and Schwartz.
    ☐
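The index set of the double sum is easiest to grasp by enumerating it. The following Python sketch is an illustration only (the function name `faa_di_bruno_terms` is hypothetical): it lists, for a given $n$, all $(k_1,\ldots,k_n)$ with $k_1+2k_2+\cdots+nk_n=n$ together with $j=k_1+\cdots+k_n$ and the integer coefficient $n!/\bigl(k_1!(1!)^{k_1}\cdots k_n!(n!)^{k_n}\bigr)$.

```python
from math import factorial

def faa_di_bruno_terms(n):
    """Return a list of (j, (k_1, ..., k_n), coefficient) for D_x^n w."""
    def solutions(i, remaining, ks):
        # Enumerate k_i, ..., k_n >= 0 with i*k_i + ... + n*k_n = remaining.
        if i > n:
            if remaining == 0:
                yield tuple(ks)
            return
        for k in range(remaining // i + 1):
            yield from solutions(i + 1, remaining - i * k, ks + [k])

    terms = []
    for ks in solutions(1, n, []):
        denom = 1
        for i, k in enumerate(ks, start=1):
            denom *= factorial(k) * factorial(i) ** k
        terms.append((sum(ks), ks, factorial(n) // denom))
    return terms

for j, ks, c in sorted(faa_di_bruno_terms(3)):
    print(j, ks, c)
```

For $n=3$ this reproduces $D_x^3 w = D_u^1 w\,D_x^3 u + 3\,D_u^2 w\,D_x^1 u\,D_x^2 u + D_u^3 w\,(D_x^1 u)^3$.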

		


Taylor's Formula for Vector Functions
Thu, 08 Feb 2024 21:00:00 +0100

From the one-dimensional case, the Lagrange and Schlömilch forms of the remainder are known.
Lagrange, Joseph Louis (1736--1813),
Schlömilch, Otto (1823--1901).

$$
\eqalignno{
    f(x) &= \sum_{k=0}^n {f^{(k)}(a)\over k!}(x-a)^k
            + {1\over n!}\int_a^x (x-t)^n f^{(n+1)}(t) dt\cr
         &= \sum_{k=0}^n {f^{(k)}(a)\over k!}(x-a)^k
            + {f^{(n+1)}(\xi)\over(n+1)!}(x-a)^{n+1} \qquad\hbox{(Lagrange)}\cr
         &= \sum_{k=0}^n {f^{(k)}(a)\over k!}(x-a)^k
            + o(\left|x-a\right|^n)\cr
         &= \sum_{k=0}^n {f^{(k)}(a)\over k!}(x-a)^k
            + {f^{(n+1)}(\xi)\over p\cdot n!}(x-\xi)^{n+1-p} (x-a)^p.
            \qquad\hbox{(Schlömilch)}\cr
}
$$

These representations of $f$ generalize accordingly to
vector-valued functions.
As in the one-dimensional case, the emphasis is again on
obtaining remainder formulas, or in the words of Mangoldt and Knopp
(Mangoldt, Hans Carl Friedrich von (1854--1925),
Knopp, Konrad Hermann Theodor (1882--1957)):

Let it be expressly emphasized once more that the essential content of
Taylor's theorem does not consist in the fact that an ansatz of the form


$$f(x_0+h)=f(x_0)+{f'(x_0)\over1!}h+{f''(x_0)\over2!}h^2+\cdots+
        {f^{(n)}(x_0)\over n!}h^n+R_n
    $$


can be made at all.
Rather, under the sole assumption that $f^{(n)}(x_0)$
exists, this is possible under all circumstances for every $h$ of sufficiently
small absolute value. $\ldots$ $R_n$ is merely an abbreviation
for the difference between the left-hand side and the sum of the
first $(n+1)$ terms on the right-hand side.
The emphasis of the problem, and hence the only essential
content of Taylor's theorem, lies exclusively in the statements that
can be made about this remainder.

1. Definition:  (Multi-indices)
For $\alpha=(\alpha_1,\ldots,\alpha_n)\in\mathbb{N}^n$ the order of the
multi-index and the
multi-factorial are defined as

$$
    \left|\alpha\right| := \alpha_1+\cdots+\alpha_n, \qquad
    \alpha! := \alpha_1! \alpha_2! \cdot\ldots\cdot \alpha_n!
$$

If $f$ is an $\left|\alpha\right|$ times continuously differentiable function,
then the multi-derivative is defined as

$$
    D^\alpha f := D_1^{\alpha_1} D_2^{\alpha_2} \ldots D_n^{\alpha_n} f =
    {\partial^{\left|\alpha\right|} f\over
        \partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n} },
$$

in particular $D_i^{\alpha_i}=D_i\ldots D_i$ ($\alpha_i$ times).
The multi-power of a vector $x$ is

$$
    x^\alpha := x_1^{\alpha_1} x_2^{\alpha_2} \cdot\ldots\cdot x_n^{\alpha_n}{\mskip 5mu}.
$$

By the theorem of H.A. Schwarz,
Schwarz, Hermann Amandus (1843--1921),
the order of differentiation with respect to different variables
is irrelevant, provided $f$ is sufficiently smooth.
2. Lemma:  We have

$$
    (x_1+x_2+\cdots+x_n)^k = \sum_{\left|\alpha\right|=k} {k!\over\alpha!} x^\alpha,
    \qquad\forall k\in\mathbb{N}.
$$

Proof:  By induction on $n$, if one takes the binomial formula
for granted.
If one does not want to use it, one argues by induction on $k$ instead.
For $n=1$ the claim is clear.
For the induction step one groups $[x_1+(x_2+\cdots+x_n)]^k$.
    ☐
Analogously,

$$
    p(x) := (h_1x_1+\cdots+h_nx_n)^k =
    \sum_{\left|\alpha\right|=k} {k!\over\alpha!} h^\alpha x^\alpha,
$$

hence

$$
    p(D)f = \left( \sum_{i=1}^n h_iD_i \right)^k f =
    \sum_{\left|\alpha\right|=k} {k!\over\alpha!} D^\alpha f{\mskip 3mu}h^\alpha.
$$
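The multinomial lemma can be spot-checked numerically. The following Python sketch is only an illustration with made-up helper names (`multi_indices`, `multinomial_sum`): it enumerates all multi-indices of order $k$ by brute force and compares both sides for integer $x$.

```python
from itertools import product
from math import factorial

def multi_indices(n, k):
    """All alpha in N^n with |alpha| = k (brute-force enumeration)."""
    return [a for a in product(range(k + 1), repeat=n) if sum(a) == k]

def multinomial_sum(x, k):
    """Right-hand side: sum over |alpha| = k of k!/alpha! * x^alpha."""
    total = 0
    for alpha in multi_indices(len(x), k):
        alpha_factorial = 1
        for a in alpha:
            alpha_factorial *= factorial(a)
        power = 1
        for xi, a in zip(x, alpha):
            power *= xi ** a
        total += factorial(k) // alpha_factorial * power
    return total

x, k = [2, 3, 5], 4
print(multinomial_sum(x, k), sum(x) ** k)
```

Both printed numbers agree, and for $n=3$, $k=4$ the sum runs over ${6\choose2}=15$ multi-indices.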

General assumption: Let $f\colon U\subset\mathbb{R}^n\rightarrow\mathbb{R}$
be $k$ times continuously differentiable on the open set $U$.
Let $x\in U$ and $h\in\mathbb{R}^n$ be such that $x+th\in U$,
$\forall t\in[0,1]$.
Let $g\colon[0,1]\rightarrow\mathbb{R}$ with $g(t):=f(x+th)$.
3. Auxiliary lemma:  The function $g$ is $k$ times continuously
differentiable and

$$
     g^{(k)}(t) = \sum_{\left|\alpha\right|=k} {k!\over\alpha!} D^\alpha f(x+th){\mskip 3mu}h^\alpha.
$$

Proof:
Induction on the order of the multi-index, i.e., on $k$.
For $k=1$ the chain rule gives

$$
    g'(t) = \mathop{\rm grad} f(x+th)\cdot h = \sum_{i=1}^n D_i f(x+th){\mskip 3mu}h_i.
$$

Induction step $(k-1)\rightarrow k$:

$$
    g^{(k-1)}(t) = \sum_{\left|\alpha\right|=k-1} {(k-1)!\over\alpha!} h^\alpha{\mskip 3mu} D^\alpha f(x+th)
    = \underbrace{\left[\left(\sum_{i=1}^n h_i D_i\right)^{k-1} f\right]}_{=:{\mskip 5mu}S} (x+th);
$$

Applying the chain rule and the lemma yields

$$
    g^{(k)}(t) = \left[\left(\sum_{i=1}^n h_i D_i\right) S\right] (x+th)
    = \left[\left(\sum_{i=1}^n h_i D_i\right)^k f \right] (x+th)
    = \sum_{\left|\alpha\right|=k} {k!\over\alpha!} h^\alpha \left(D^\alpha f\right)(x+th).
$$

    ☐
4. Theorem:  Taylor's theorem,
Taylor, Brook (1685--1731).
Now let $f$ even be $(k+1)$ times continuously differentiable.
Then there exists a $\theta\in[0,1]$ such that

$$
    f(x+h) = \sum_{\left|\alpha\right|\le k} {D^\alpha f(x)\over\alpha!} h^\alpha
        + \sum_{\left|\alpha\right|=k+1} {D^\alpha f(x+\theta h)\over\alpha!} h^\alpha.
$$

Proof: Like $f$, the function $g$ is at least $(k+1)$ times continuously differentiable.
By Taylor's formula in one variable there exists a
$\theta\in[0,1]$ such that

$$
    g(1) = \sum_{m=0}^k {g^{(m)}(0)\over m!} + {g^{(k+1)}(\theta)\over(k+1)!}.
$$

Substituting the formulas obtained in the auxiliary lemma immediately yields the result.
    ☐
5. Corollary:  Let $f$ be at least $k$ times continuously differentiable
and let $h$ be sufficiently small.
Then

$$
    f(x+h) = \sum_{\left|\alpha\right|\le k} {D^\alpha f(x)\over\alpha!} h^\alpha
        + o(\left\|h\right\|^k),
$$

where $o(\left\|h\right\|^k)$ abbreviates a
function $\varphi$ with $\varphi(0)=0$ and

$$
    \lim_{\scriptstyle h\to0\atop\scriptstyle h\ne0}
        {\varphi(h)\over\left\|h\right\|^k} = 0.
$$

Proof:  By the preceding theorem, applied with $k-1$ in place of $k$, there is a
$\theta\in[0,1]$ depending on $h$ with

$$
    f(x+h) = \sum_{\left|\alpha\right|\le k-1} {D^\alpha f(x)\over\alpha!} h^\alpha
        + \sum_{\left|\alpha\right|=k} {D^\alpha f(x+\theta h)\over\alpha!} h^\alpha
    = \sum_{\left|\alpha\right|\le k} {D^\alpha f(x)\over\alpha!} h^\alpha
        + \sum_{\left|\alpha\right|=k} r_\alpha(h){\mskip 3mu}h^\alpha,
$$

where

$$
    r_\alpha(h) = {D^\alpha f(x+\theta h) - D^\alpha f(x)\over\alpha!}.
$$

Since $D^\alpha f$ is assumed to be continuous,
$r_\alpha(\cdot)$ vanishes at 0, i.e.,
$\displaystyle\lim_{h\to0} r_\alpha(h)=0$.
Setting

$$
    \varphi(h) := \sum_{\left|\alpha\right|=k} r_\alpha(h){\mskip 3mu}h^\alpha,
$$

it follows that $\displaystyle\lim_{h\to0} {\varphi(h) / \left\|h\right\|^k} = 0$,
i.e., $\varphi(h)=o(\left\|h\right\|^k)$, since

$$
    {\left|h^\alpha\right|\over\left\|h\right\|^k}
    = { \left|h_1^{\alpha_1}\ldots h_n^{\alpha_n}\right| \over
        \left\|h\right\|^{\alpha_1}\ldots\left\|h\right\|^{\alpha_n} }
    \le 1, \qquad\hbox{for}\quad \left|\alpha\right| = k.
$$

    ☐
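As an illustration, for $n=2$ and $k=2$ the multi-indices with $\left|\alpha\right|\le2$ are $(0,0)$, $(1,0)$, $(0,1)$, $(2,0)$, $(1,1)$, $(0,2)$ with $\alpha!=1,1,1,2,1,2$. Written out, the corollary then reads

$$
    f(x+h) = f(x) + D_1f(x){\mskip 3mu}h_1 + D_2f(x){\mskip 3mu}h_2
        + {1\over2}D_1^2f(x){\mskip 3mu}h_1^2 + D_1D_2f(x){\mskip 3mu}h_1h_2
        + {1\over2}D_2^2f(x){\mskip 3mu}h_2^2 + o(\left\|h\right\|^2),
$$

which is the familiar expansion in terms of gradient and Hessian.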
Taylor's theorem for maps into $\mathbb{R}^m$ follows by applying
the previous results componentwise.
However, one needs $m$ possibly different intermediate points.
6. Example:  Let $f\colon\mathbb{R}\rightarrow\mathbb{R}^3$ with
$f(t):=(\sin t,{\mskip 3mu}\cos t,{\mskip 3mu}t)$.
Then $f'(t)=(\cos t,{\mskip 3mu}-\sin t,{\mskip 3mu}1)$, and if one allows only a single
intermediate point one obtains the contradiction

$$
    f(2\pi)-f(0) = f'(\xi)(2\pi-0) = 2\pi\pmatrix{\cos\xi\cr -\sin\xi\cr 1\cr}
    = \pmatrix{0\cr 0\cr 2\pi\cr}.
$$

From $\cos\xi=0=\sin\xi$ it would follow that $\cos^2\xi+\sin^2\xi=0$, contradicting $\cos^2\xi+\sin^2\xi=1$.
Reference: Otto Forster (*1937): Analysis 2.

		


Differentiation of Matrices and Determinants
Wed, 07 Feb 2024 07:00:00 +0100

How does one differentiate determinants that depend on a parameter?
1. Theorem:  Assumptions: Let $a_{ij}(\lambda)$ be differentiable
functions.
Let

$$
\def\multisub#1#2{{\textstyle\mskip-3mu{\scriptstyle1\atop\scriptstyle#2_1}{\scriptstyle2\atop\scriptstyle#2_2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#1\atop\scriptstyle#2_#1}}}
\def\multisup#1#2{{\textstyle\mskip-3mu{\scriptstyle#2_1\atop\scriptstyle1}{\scriptstyle#2_2\atop\scriptstyle2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#2_{#1}\atop\scriptstyle#1}}}
\def\multisubsup#1#2#3{{\textstyle\mskip-3mu{\scriptstyle#3_1\atop\scriptstyle#2_1}{\scriptstyle#3_2\atop\scriptstyle#2_2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#3_{#1}\atop\scriptstyle#2_{#1}}}}
    A(\lambda) = \left|\matrix{
        a_{11}(\lambda) & \ldots & a_{1n}(\lambda)\cr
        \vdots & \ddots & \vdots\cr
        a_{n1}(\lambda) & \ldots & a_{nn}(\lambda)\cr
    }\right| = \det(a_1,\ldots,a_n),
$$

furthermore

$$
    \alpha\multisubsup rik = (-1)^{i_1+\cdots+i_r + k_1+\cdots+k_r}
        A\multisubsup r{i'}{k'},
$$

in particular $\displaystyle{
\alpha_i^j = (-1)^{i+j} A_{1\ldots\widehat\imath\ldots n}^{1\ldots\widehat\jmath\ldots n}.
}$
Claim:

$$
\displaystyle{{\partial\over\partial\lambda}A =
(\alpha_{11},\ldots,\alpha_{nn}) \pmatrix{a_{11}'\cr \vdots\cr a_{nn}'\cr}
= \sum_{i,j=1}^n \alpha_i^j a_{ij}' }
= \sum_{i=1}^n \det(a_1,\ldots,a_{i-1},a_i',a_{i+1},\ldots,a_n)
$$

Proof:  Expanding $A(\lambda)$ along the $i$-th row by Laplace's
expansion theorem, one sees that
$\partial A/\partial(a_{ij}) = \alpha_i^j$.
Applying the chain rule yields the middle identities.
The last identity is merely a rearrangement of the previous one (Laplace's
expansion theorem read backwards).
    ☐
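For $n=2$ the claim can be checked by hand. With $A=a_{11}a_{22}-a_{12}a_{21}$ the cofactors are $\alpha_1^1=a_{22}$, $\alpha_1^2=-a_{21}$, $\alpha_2^1=-a_{12}$, $\alpha_2^2=a_{11}$, and the product rule gives

$$
    {\partial A\over\partial\lambda}
    = a_{22}a_{11}' - a_{21}a_{12}' - a_{12}a_{21}' + a_{11}a_{22}'
    = \left|\matrix{a_{11}'&a_{12}\cr a_{21}'&a_{22}\cr}\right|
    + \left|\matrix{a_{11}&a_{12}'\cr a_{21}&a_{22}'\cr}\right| .
$$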
Cf. also Bourbaki (1976):
"Éléments de mathématique: Fonctions d'une variable réelle -- Théorie élémentaire",
Hermann, Paris, 1976, 54+38+69+46+55+31+38 pp. = 331 pp.
2. The Jacobian matrices of some matrix functions, such as trace,
determinant, and matrix product.
Let $y=f(x_{11},\ldots,x_{1n},x_{21},\ldots,x_{2n},\ldots,x_{m1},\ldots,x_{mn})$
be a real function of $mn$ variables, i.e., $y=f(X)$.
Denote

$$
    {dy\over dX} := \left(\partial y\over\partial x_{ij}\right)
        _{\scriptstyle{i=1,\ldots,m}\atop\scriptstyle{j=1,\ldots,n}} .
$$

In the case $X=(x_1,\ldots,x_n)$ we have ${{dy\over dX}=\nabla y}$.
3. Theorem: (1)     $\displaystyle{{d{\mskip 5mu}ax\over dx} = a}$,    
$\displaystyle{{d{\mskip 5mu}x^\top Ax\over dx} = 2Ax}$,     ($A=A^\top$).
(2)     $\displaystyle{{d{\mskip 5mu}\ln\det X\over dX} = (X^\top)^{-1}}$,    
$\displaystyle{{d{\mskip 5mu}\det X\over dX} = (\det X){\mskip 3mu}(X^\top)^{-1}}$.
(3)     $\def\tr{\mathop{\rm tr}}\displaystyle{{d{\mskip 5mu}\tr X^{-1}A\over dX} = -(X^{-1} A X^{-1})^\top}$.
Proof: (1) is clear.
For (2) note that

$$
    {\partial\over\partial x_{ij}}\det X =
    \alpha_i^j = (-1)^{i+j} X_{1\ldots\hat\imath\ldots n}^{1\ldots\hat\jmath\ldots n}
$$

and correspondingly

$$
    {\partial\over\partial x_{ij}}\ln\det X = {1\over\det X} \alpha_i^j.
$$

For (3): We have

$$
    {d{\mskip 3mu}X^{-1}\over dx_{ij}} = -X^{-1} E_{ij} X^{-1}, \qquad
    \tr E_{ij} B = b_{ji}, \qquad
    {d{\mskip 3mu}\tr B\over dx} = \tr{dB\over dx}.
$$

    ☐
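As a quick check of the determinant derivative for $n=2$: with $\det X=x_{11}x_{22}-x_{12}x_{21}$ one gets entry by entry

$$
    {d{\mskip 3mu}\det X\over dX}
    = \pmatrix{x_{22} & -x_{21}\cr -x_{12} & x_{11}\cr}
    = (\det X){\mskip 3mu}(X^\top)^{-1},
    \qquad\hbox{since}\quad
    (X^\top)^{-1} = {1\over\det X}\pmatrix{x_{22} & -x_{21}\cr -x_{12} & x_{11}\cr}.
$$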

		


Holomorphic Matrix Functions
Tue, 06 Feb 2024 11:00:00 +0100


1. Integral definition
2. Homomorphism into upper triangular matrices

1. Integral definition
1. Let $f$ be a suitably chosen holomorphic function.
For a square matrix $A$ one then defines the
matrix function $f(A)$ by

$$
    f(A) := {1\over2\pi i}\int_\Gamma f(\lambda) (I\lambda-A)^{-1} d\lambda.
$$

By Cauchy's theorem,
Cauchy, Augustin Louis (1789--1857),
$f(A)$ does not depend on the choice of the curve $\Gamma$.
Obviously $S^{-1}f(A)S=f(S^{-1}AS)$ for every invertible
$(n\times n)$-matrix $S$.
Without loss of generality one can therefore assume in what follows that $A$ is a
Jordan matrix, Jordan, Camille (1838--1922).
So $A = J = \mathop{\rm diag}(J_\nu)_{\nu=1}^k$, where each $J_\nu$ is a Jordan block.
We have

$$
    f(J) = {1\over2\pi i}\int_\Gamma f(\lambda) (I\lambda-J)^{-1} d\lambda
    = \mathop{\rm diag}_{\nu=1}^k \left({1\over2\pi i}\int_\Gamma f(\lambda) (I\lambda-J_\nu)^{-1} d\lambda\right)
    = \mathop{\rm diag}_{\nu=1}^k f(J_\nu).
$$

Many claims thus even reduce to
considering a single Jordan block $J_\nu$, with
$J_\nu=\left(\lambda_0\delta_{xy}+\delta_{x+1,y}\right)_{x,y=1}^m$.
2. Now let $J$ be a Jordan block of size $k\times k$ for the eigenvalue $\lambda_0$.
Then

$$
  f(J) = \pmatrix{
    f(\lambda_0) & {1\over1!}f'(\lambda_0) & \ldots & {1\over(k-1)!}f^{(k-1)}(\lambda_0)\cr
    0            & f(\lambda_0)            & \ldots & \cr
    \vdots       & \vdots                  & \ddots & \vdots\cr
    0            & 0                       & \ldots & f(\lambda_0)\cr
  }
$$

In particular, for the special function $f(\lambda):=\lambda^n$ one obtains, writing $\lambda$ for $\lambda_0$,

$$
  J^n = \pmatrix{
    \lambda^n & {n\choose1}\lambda^{n-1} & \ldots & {n\choose k-1}\lambda^{n-k+1}\cr
    0         & \lambda^n                & \ldots & \cr
    \vdots    & \vdots                   & \ddots & \vdots\cr
    0         & 0                        & \ldots & \lambda^n\cr
  },
$$

where $\lambda^{-j}:=0$ for $j\in\mathbb{N}$.
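The formula for $J^n$ can be compared with plain repeated multiplication. The following Python sketch is an illustration only; the helper names are made up, and integer entries are assumed so that the comparison is exact.

```python
from math import comb

def jordan_block(lam, k):
    """k x k Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(k)]
            for i in range(k)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matpow(a, n):
    """a^n by repeated multiplication (n >= 0)."""
    result = [[int(i == j) for j in range(len(a))] for i in range(len(a))]
    for _ in range(n):
        result = matmul(result, a)
    return result

def jordan_power_formula(lam, k, n):
    """Entry (i, j) is C(n, j-i) * lam^(n-(j-i)); negative exponents give 0."""
    return [[comb(n, j - i) * lam ** (n - (j - i)) if 0 <= j - i <= n else 0
             for j in range(k)] for i in range(k)]

print(matpow(jordan_block(2, 3), 5))
print(jordan_power_formula(2, 3, 5))
```

For $\lambda=2$, $k=3$, $n=5$ both give the rows $(32, 80, 80)$, $(0, 32, 80)$, $(0, 0, 32)$, in line with $(2I+N)^5 = 32I + 80N + 80N^2$.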
3. These representations are justified by the following theorem,
although for the case $\lambda^n$ the representation can also easily be proved directly
using $J^n = (\lambda I + N)^n$, with a suitable
nilpotent block $N$, and the binomial formula.
One then does not need to take the whole route via matrix functions.
If one wants to make stronger use of the integral representation, one computes as
follows.
In general, $f(A)=(1/2\pi i)\int_\Gamma f(z)(Iz-A)^{-1}dz$.
Expanding the Cauchy kernel yields

$$
    (Iz-A)^{-1} = {1\over z} \sum_{\nu=0}^\infty \left(A\over z\right)^\nu,
    \qquad \mathopen|z\mathclose| \gt  \rho(A).
$$

One then computes the residue, by interchanging integration and
summation, as

$$
    {1\over2\pi i} \int_\Gamma z^k (Iz-A)^{-1} dz
    = {1\over2\pi i} \int_\Gamma z^k {1\over z}
        \left(I+{A\over z}+\cdots+{A^k\over z^k}+\cdots\right) dz
    = A^k.
$$

4. Theorem:  We have

$$
    {1\over2\pi i}\int_\Gamma (I\lambda-A)^{-1}d\lambda = I,\qquad
    {1\over2\pi i}\int_\Gamma \lambda(I\lambda-A)^{-1}d\lambda = A.
$$

If $f$ and $g$ are holomorphic on (possibly different) neighborhoods of the
spectrum of $A$, then

$$
    (\alpha f+\beta g)(A)=\alpha f(A)+\beta g(A),\qquad
    (f\cdot g)(A)=f(A){\mskip 3mu}g(A).
$$

Proof:  As remarked above, it suffices to restrict attention to a single
Jordan block $J$ of size $m\times m$.
Let $\Gamma$ be a positively oriented circle around $\lambda_0$.
We have

$$
\eqalign{
    (I\lambda-J)^{-1} &= {I\over\lambda-\lambda_0} + {N\over(\lambda-\lambda_0)^2}
        + \cdots + {N^{m-1}\over(\lambda-\lambda_0)^m} \cr
    &= \pmatrix{
        (\lambda-\lambda_0)^{-1} & (\lambda-\lambda_0)^{-2} & \ldots & (\lambda-\lambda_0)^{-m}\cr
                                 & \ddots                   & \ddots & \vdots\cr
                                 &                          & \ddots & (\lambda-\lambda_0)^{-2}\cr
                          0      &                          &        & (\lambda-\lambda_0)^{-1}\cr
      }, \cr
}
$$

where $N = (\delta_{x+1,y})_{x,y=1}^m$, so that $N^m=0\in\mathbb{C}^{m\times m}$.
Since $\int_\Gamma d\lambda/(\lambda-\lambda_0)=2\pi i$ and
$\int_\Gamma (\lambda-\lambda_0)^k d\lambda=0$ for $k\in\mathbb{Z}\setminus\{-1\}$,
we obviously have ${1\over2\pi i}\int_\Gamma (I\lambda-J)^{-1}d\lambda=I$ and

$$
    {1\over2\pi i}\int_\Gamma \lambda{\mskip 3mu}(I\lambda-J)^{-1}d\lambda =
    {1\over2\pi i}\int_\Gamma \left((\lambda-\lambda_0)+\lambda_0\right)(I\lambda-J)^{-1}d\lambda =
    N + I\lambda_0 = J.
$$

Linearity is clear.
For the multiplicative statement one argues as follows:
If $f(\lambda)=\sum_{k=0}^\infty (\lambda-\lambda_0)^k f_k$ and
$g(\lambda)=\sum_{k=0}^\infty (\lambda-\lambda_0)^k g_k$, then
$f(\lambda)g(\lambda)=\sum_{k=0}^\infty (\lambda-\lambda_0)^k h_k$, with
$h_k=\sum_{i=0}^k f_i g_{k-i}$.
Consequently

$$
\eqalign{
    f(J){\mskip 3mu}g(J) &= \pmatrix{
        f_0 & f_1 & \ldots & f_{m-1}\cr
            & \ddots &     & \vdots\cr
        0   &        & \ddots & f_1\cr
            &        &     & f_0\cr}
    \cdot \pmatrix{
        g_0 & g_1 & \ldots & g_{m-1}\cr
            & \ddots &     & \vdots\cr
        0   &        & \ddots & g_1\cr
            &        &     & g_0\cr} \cr
    &= \pmatrix{
        h_0 & h_1 & \ldots & h_{m-1}\cr
            & \ddots &     & \vdots\cr
        0   &        & \ddots & h_1\cr
            &        &     & h_0\cr}
    = (f\cdot g)(J). \cr
}
$$

    ☐
With the representation of $J^n$ the following fact is easily obtained.
5. Theorem:  Let $J$ be an arbitrary Jordan matrix.
Then:
(1) $J^n\to0$ if and only if $\left|\lambda\right| < 1$ for every eigenvalue $\lambda$.
(2) $\sup_{n=1}^\infty|J^n|\le\rm const$ if and only if
$\left|\lambda\right| \le 1$ for every eigenvalue $\lambda$ and
eigenvalues of modulus 1 have only linear elementary divisors, i.e.,
the Jordan blocks for eigenvalues of modulus 1 are all of size $(1\times 1)$.
Since $A=XJY$, $Y=X^{-1}$, hence $A^n=XJ^nY$, and since
$\left|A^n\right|\le\left|X\right|\cdot\left|J^n\right|\cdot\left|Y\right|$,
one therefore obtains the following theorem for an arbitrary square matrix $A$.
6. Theorem:  Let $\lambda_i$, $i=1,\ldots,k$, be the eigenvalues
of the matrix $A$.
Then
(1) $\def\mapright#1{\mathop{\longrightarrow}\limits^{#1}}|A^n|\mapright{n\to\infty}0$ if and only if $|\lambda_i|<1$ for all
$i=1,\ldots,k$, and
(2) $|A^n|$ is bounded for all $n\in\mathbb{N}$ if and only if
$|\lambda_i|\le1$ and only
$(1\times 1)$ Jordan blocks correspond to eigenvalues of modulus 1.
7. Remark:  The following equivalences hold:

$$
    \rho(A)\lt 1 \iff A^n\to0 \iff \sum_{n=0}^\infty A^n = (I-A)^{-1}
    \iff \left|\sum_{n=0}^\infty A^n\right|\lt \infty .
$$

Proof:  On $\sum_{n=0}^\infty A^n=(I-A)^{-1}$ if
$\rho(A)<1$:
If $\lambda$ is an eigenvalue of $A$, then $(1-\lambda)$ is an eigenvalue of
$(I-A)$.
Since $|\lambda|<1$, the matrix $(I-A)$ is invertible.
Furthermore

$$
\eqalign{
    & I = (I-A)(I+A+\cdots+A^n)+A^{n+1}{\mskip 3mu} \cr
    \Rightarrow{\mskip 3mu} &
    (I-A)^{-1} = (I+A+\cdots+A^n)+(I-A)^{-1}A^{n+1}. \cr
}
$$

Hence for all $n\in\mathbb{N}$

$$
    \bigl|(I-A)^{-1}-(I+A+\cdots+A^n)\bigr| \le
    \left|(I-A)^{-1}\right|\cdot\left|A^{n+1}\right|
$$

and hence the claim follows because $A^n\to0$.
The converse, $\rho(A)<1$ if $\sum A^n = (I-A)^{-1}$,
is clear from the necessary condition for the convergence of the series.
The remaining equivalences follow, among other things, from the preceding
theorem and are obvious.
    ☐
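The identity used in the proof can be verified exactly on a small example. The Python sketch below is only an illustration (the matrix $A$ and all helper names are chosen here, not taken from the text); it uses rational arithmetic, so $I=(I-A)(I+A+\cdots+A^n)+A^{n+1}$ holds without rounding error.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def neumann_partial_sum(a, n):
    """Return (I + A + ... + A^n, A^(n+1))."""
    size = len(a)
    total, power = identity(size), identity(size)
    for _ in range(n):
        power = matmul(power, a)
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
    return total, matmul(power, a)

A = [[F(1, 2), F(1, 4)], [F(0), F(1, 3)]]   # eigenvalues 1/2 and 1/3
S, A_next = neumann_partial_sum(A, 40)
I_minus_A = [[identity(2)[i][j] - A[i][j] for j in range(2)] for i in range(2)]
# Exact identity from the proof: I = (I-A)(I+A+...+A^n) + A^(n+1).
P = matmul(I_minus_A, S)
residual = [[P[i][j] + A_next[i][j] for j in range(2)] for i in range(2)]
print(residual == identity(2))
print([[float(x) for x in row] for row in S])
```

Here $(I-A)^{-1}$ has the rows $(2, 3/4)$ and $(0, 3/2)$, and the partial sum with 40 terms already agrees with it to many digits.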
8. Another application of the representation of $J^n$ is the
representation of the solutions of homogeneous linear difference equations with
constant coefficients.
Passing from the companion matrix to the Jordan matrix, one then
quickly recognizes the form of the solutions of the difference equation.
We have

$$
% \begingroup\let\oldleft=\left \let\oldright=\right \def\left#1{\oldleft|} \def\right#1{\oldright|}
\begin{vmatrix}
    && \leftarrow\lambda & \leftarrow\lambda & \ldots & \leftarrow\lambda & \leftarrow\lambda\cr
    &I\lambda & -I & 0 & \ldots & 0 & 0\cr
    &0 & I\lambda & -I & \ldots & 0 & 0\cr
    &\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\cr
    & &&&& I\lambda & -I\cr
    &A_0 & A_1 & & \ldots & A_{\ell-2} & I\lambda+A_{\ell-1}\cr
\end{vmatrix}
% \endgroup
= \left|\matrix{
    0 & -I & 0 & \ldots & 0\cr
    0 & 0 & -I & \ldots & 0\cr
    \vdots & \vdots & \vdots & \ddots & \vdots\cr
    &&&& -I\cr
    L(\lambda) & * & \ldots & * & I\lambda+A_{\ell-1}\cr
}\right|
$$

hence

$$
    \left|I\lambda-C_1\right| = \det L(\lambda).
$$

9. Theorem:  Assumption: Let
$L(\lambda)=\lambda^\ell+a_{\ell-1}\lambda^{\ell-1}+\cdots+a_0 \in \mathbb{C}[\lambda]$ have the
factorization

$$
    L(\lambda) = (\lambda-\mu_1)^{\eta_1} (\lambda-\mu_2)^{\eta_2}
        \ldots (\lambda-\mu_k)^{\eta_k}.
$$

Claim: The solution space of the homogeneous linear difference equation
$x_{m+\ell}+a_{\ell-1}x_{m+\ell-1}+\cdots+a_0x_m=0$ has dimension $\ell$
and is spanned by

$$
    x_m = \sum_{\nu=1}^k p_\nu(m) \mu_\nu^m, \qquad m=0,1,\ldots,
$$

where $\deg p_\nu=\eta_\nu-1$, $\nu=1,\ldots,k$.
The case $\deg p_\nu=0$ means a constant.
Proof:  Let $u_m:=(x_{m-1+\ell},\ldots,x_m)\in\mathbb{C}^\ell$.
The solution of the difference equation $L(E)x_m=0$ is
$u_m = C_1^m u_0 = X J^m Y u_0$, where $Y=X^{-1}$ is the matrix of
left Jordan vectors and $X$ is the matrix of right Jordan vectors.
Multiplying from the left by $X$ and from the right by $Y$ mixes
the individual Jordan blocks.
After factoring out common factors, the powers $\mu_\nu^m$ are multiplied by
sums of binomial coefficients $m\choose\rho_\nu$, $0\le\rho_\nu<\eta_\nu$,
$\nu=1,\ldots,k$, i.e., by polynomials in $m$.
Since $C_1$ is always non-derogatory
-- consider the minor $(C_1)_{1,\ldots,n-1}^{2,\ldots,n}$ --
the degree of $p_\nu$ is
exactly $\eta_\nu-1$, because $\deg{m\choose\eta_\nu-1}=\eta_\nu-1$.
Because of $\sum\eta_\nu=\ell$ one has $\ell$ free parameters in total.
Still to be shown: the linear independence of the given solutions.
    ☐
10. Corollary:  The sequences $(m^i{\mskip 3mu}\mu_\nu^m)$, $i=0,\ldots,\eta_\nu-1$,
for $\nu=1,\ldots,k$, form a basis of the solution space of the
difference equation.
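A small example with a double root shows the basis at work. The Python sketch below is only an illustration (the function name `iterate` and the sample equation are chosen here): for $L(\lambda)=(\lambda-2)^2$ the basis sequences are $2^m$ and $m{\mskip 3mu}2^m$, and the recurrence is iterated directly and compared with the closed form.

```python
def iterate(a, start, count):
    """Iterate x_{m+l} = -(a_{l-1} x_{m+l-1} + ... + a_0 x_m), a = [a_0, ..., a_{l-1}]."""
    x = list(start)
    while len(x) < count:
        m = len(x) - len(a)
        x.append(-sum(a[i] * x[m + i] for i in range(len(a))))
    return x

# L(lambda) = lambda^2 - 4 lambda + 4 = (lambda - 2)^2, so mu_1 = 2, eta_1 = 2.
a = [4, -4]
# Fitting x_0 = 1, x_1 = 6 to (c_0 + c_1 m) 2^m gives x_m = (1 + 2m) 2^m.
closed_form = [(1 + 2 * m) * 2 ** m for m in range(10)]
print(iterate(a, [1, 6], 10) == closed_form)
```

The same function reproduces the Fibonacci numbers for `a = [-1, -1]` with start values `[0, 1]`.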
2. Homomorphism into upper triangular matrices
1. There is also another approach to holomorphic matrix functions,
see the article by the two authors Yasuhiko Ikebe and Toshiyuki Inagaki,
Ikebe/Inagaki (1986),
"An Elementary Approach to the Functional Calculus for Matrices",
The American Mathematical Monthly, Vol 93, No 3, May 1986,
pp.390--392.
Let $f$ be sufficiently often differentiable in a neighborhood of
$\{\lambda_1,\ldots,\lambda_r\}$.
For a fixed $n\in\mathbb{N}$ one sets

$$
    f^*(z) := \pmatrix{
        f(z) & f'(z) & f''(z)/2! & \ldots & f^{(n-1)}(z)/(n-1)!\cr
             & f(z)  & f'(z)     & \ldots & \vdots\cr
             &       & \ddots    & \ddots & f''(z)/2!\cr
        0    &       &           & \ddots & f'(z)\cr
             &       &           &        & f(z)\cr
    }
$$

For $f(z)=z$ one obtains

$$
    f^*(\lambda) = \pmatrix{
        \lambda & 1      & \ldots & 0\cr
                & \ddots & \ddots & \vdots\cr
                &        & \ddots & 1\cr
                &        &        & \lambda\cr
    } = J,
$$

i.e., a simple Jordan block of size $n\times n$ for the eigenvalue $\lambda$.
By $J$ we always mean such a Jordan block.
If $f(z)\equiv c=\rm const$, then $f^*(z)=cI$.
The map $*\colon f\rightarrow f^*$ is a homomorphism from the algebra
of functions analytic in a neighborhood of $\{\lambda_1,\ldots,\lambda_r\}$
into the commutative algebra of upper triangular matrices of the above form.
2. Theorem:  (Homomorphism theorem) We have
(1)      $(f+g)^* = f^* + g^*$,     additivity,
(2)      $(cf)^* = c{\mskip 3mu}f^*$, $c\in\mathbb{C}$ fixed,     homogeneity,
(3)      $(fg)^* = f^* {\mskip 3mu} g^* = g^* {\mskip 3mu} f^*$,     multiplication and commutativity,
(4)      $(f/g)^* = f^* {\mskip 3mu} (g^*)^{-1} = (g^*)^{-1}{\mskip 3mu} f^*$, if $g(z)\ne0$,
    quotients and commutativity,
(5)      $(1/g)^* = (g^*)^{-1}$, if $g(z)\ne0$,     inverses.
Repeated application of (1), (2), (3) and (4) immediately yields
3. Corollary:  Let $f$ be a rational function without a pole at $\lambda$
and let $f=p/q$ be its representation in lowest terms, i.e., with coprime
polynomials $p$ and $q$.
Then

$$
    f^*(\lambda) = p(J){\mskip 3mu} \left[q(J)\right]^{-1} = \left[q(J)\right]^{-1} p(J).
$$

For power series, too, one computes as expected.
This is shown by the
4. Consequence:  Let $f(z)=a_0+a_1z+\cdots{\mskip 3mu}$ be a power series
with radius of convergence strictly greater than $\left|\lambda\right|$.
Then
$
f^*(\lambda) = a_0I+a_1J+\cdots{\mskip 3mu}.
$
For a given fixed square matrix $A$ consider the Jordan normal form (unique up to
permutation)
$X^{-1}AX=\mathop{\rm diag}\left(J_1,\ldots,J_m\right)$.
Here $X$ (the matrix of right Jordan chains) is invertible.
$J_i$ denotes a simple Jordan block for the eigenvalue $\mu_i$,
$i=1,\ldots,m$.
The $\mu_i$ need not be distinct.
If $f$ is an analytic function in a neighborhood of $\{\mu_1,\ldots,\mu_m\}$,
one defines $f(A)$ by

$$
    X^{-1} f(A) X := \mathop{\rm diag}\left[f^*(\mu_1), \ldots, f^*(\mu_m)\right].
$$

The corollary and the consequence show that $f(A)$ agrees with what
one usually expects, at least for rational functions and
for power series.
Directly from the definition follows
5. Theorem: Identity theorem.
Let $\lambda_1,\ldots,\lambda_r$ be the distinct eigenvalues of $A$.
Let the functions $f$ and $g$ be analytic in a neighborhood of
$\{\lambda_1,\ldots,\lambda_r\}$.
Then $f(A)=g(A)$ holds if and only if the derivatives at the
eigenvalues agree up to the appropriate order, i.e.,

$$
    f^{(i)}(\lambda_k) = g^{(i)}(\lambda_k),\qquad i=0,\ldots,m_k-1,\quad k=1,\ldots,r,
$$

where $m_k$ denotes the size of the largest Jordan block for the eigenvalue $\lambda_k$.
The integral formula used above as the definition of $f(A)$ can now,
since functions of matrices have been defined differently, also be proved.
6. Theorem: Integral representation of $f(A)$.
Let $\Gamma$ be a simple closed curve that encloses
all eigenvalues of $A$ in its interior.
Let $f$ be holomorphic on $\Gamma$ and in the interior of $\Gamma$.
Then

$$
    f(A) = {1\over2\pi i} \int_\Gamma f(\tau) (I\tau-A)^{-1} d\tau.
$$

Proof:  As usual, the proof reduces to considering a
single Jordan block $J$ of size $n\times n$.
One computes

$$
\def\fracstrut{}
\eqalignno{
    f(J) &= f^*(\lambda_k) \qquad\hbox{(by the corollary and the consequence)}\cr
         &= {1\over2\pi i} \int_\Gamma f(\tau) \pmatrix{
            \displaystyle{1\over\tau-\lambda_k} &
                \displaystyle{1\over(\tau-\lambda_k)^2} &
                \ldots & \displaystyle{1\over(\tau-\lambda_k)^n}\fracstrut\cr
            & \ddots & \ddots & \vdots\fracstrut\cr
            0 &       & \ddots & \displaystyle{1\over(\tau-\lambda_k)^2}\fracstrut\cr
            &        &        & \displaystyle{1\over\tau-\lambda_k}\fracstrut\cr} d\tau \cr
         &= {1\over2\pi i} \int_\Gamma f(\tau) (I\tau-J)^{-1} d\tau.\cr
}
$$

In passing from the first line to the second we used

$$
    f^{(\nu)}(z) = {\nu!\over2\pi i} \int_\Gamma {f(\tau)\over(\tau-z)^{\nu+1}} d\tau,
    \qquad \nu=0,\ldots,n-1
$$

and in passing from the $2^{\rm nd}$ to the $3^{\rm rd}$ line, that the
inverse of $I\tau-J$ has exactly this form.
    ☐
The theorem before last (identity theorem for matrix functions) shows that for a
fixed matrix $A$ the matrix function can be represented as a matrix polynomial,
since agreement is required only for finitely many derivatives.
If the $m_k$ ($k=1,\ldots,r$) are known, then for a fixed matrix
$A\in\mathbb{C}^{n\times n}$ one can make an ansatz of the form
$g(\lambda) = a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$
and obtains the $a_i$ as the solution of a
Hermite interpolation problem.
If all eigenvalues are distinct, i.e., $m_k=1$ ($k=1,\ldots,r$), then
this is an ordinary interpolation problem.
It can be solved, for example, with Newton's divided differences or
Lagrange's formula, possibly also via Cramer's rule.
One has to check whether $f(\lambda)$ is actually defined at the eigenvalues
$\lambda_1,\ldots,\lambda_r$.
Problems arise, e.g., for $f(\lambda)=\sqrt\lambda$,
$f(\lambda)=\ln\lambda$, for $\lambda\notin\mathbb{R}^+$.
For $A\in\mathbb{C}^{1\times1}$ the identity theorem degenerates into an
empty statement, namely $f=g\iff f=g$.
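For distinct eigenvalues the interpolation approach can be sketched in a few lines. The following Python code is an illustration under stated assumptions: the eigenvalues are assumed to be known and pairwise distinct, the function name `matrix_function` is made up here, and the Lagrange form of the interpolation polynomial is evaluated directly at the matrix, i.e., $g(A)=\sum_k f(\lambda_k)\prod_{j\ne k}(A-\lambda_jI)/(\lambda_k-\lambda_j)$.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matrix_function(a, eigenvalues, f):
    """g(A) for the Lagrange interpolation polynomial g of f at distinct eigenvalues."""
    n = len(a)
    result = [[F(0)] * n for _ in range(n)]
    for k, lk in enumerate(eigenvalues):
        # term = f(lk) * prod_{j != k} (A - lj I) / (lk - lj)
        term = [[F(f(lk)) if r == c else F(0) for c in range(n)] for r in range(n)]
        for j, lj in enumerate(eigenvalues):
            if j != k:
                factor = [[F(a[r][c] - (lj if r == c else 0), lk - lj)
                           for c in range(n)] for r in range(n)]
                term = matmul(term, factor)
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

A = [[1, 1], [0, 2]]                      # eigenvalues 1 and 2, so m_1 = m_2 = 1
print(matrix_function(A, [1, 2], lambda z: z ** 5))
```

With $f(\lambda)=\lambda^5$ the interpolation polynomial is $g(\lambda)=31\lambda-30$, and $g(A)=31A-30I$ has the rows $(1, 31)$ and $(0, 32)$, which is $A^5$.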
Simple consequences directly from the definition of matrix functions are
the following results.
7. Theorem:  Theorem of Cayley/Hamilton, 1st version,
Cayley, Arthur (1821--1895),
Hamilton, William Rowan (1805--1865).
The characteristic polynomial $\chi(z)=\det(Iz-A)$ of $A\in\mathbb{C}^{n\times n}$,
regarded as a matrix polynomial, annihilates $A$, i.e.,
$\chi(A)=0\in\mathbb{C}^{n\times n}$.
Proof: Following Charles A. McCarthy (1975):
"The Cayley-Hamilton Theorem",
The American Mathematical Monthly, April 1975, Vol 82, No 4,
pp.390--391.
The inverse of $Iz-A$ is
$\left[\det(Iz-A)\right]^{-1} M_{\mu\nu}(z)$, where $\deg M_{\mu\nu}\le n-1$.
The integral representation of $\chi(A)$ yields

$$
      \left.\chi(A)\right|_{\mu\nu}
    = {1\over2\pi i} \int_\Gamma \chi(z) \left.(Iz-A)^{-1}\right|_{\mu\nu}{\mskip 3mu}dz
    = {1\over2\pi i} \int_\Gamma \det(Iz-A)
        \left[\det(Iz-A)\right]^{-1} M_{\mu\nu}(z){\mskip 3mu}dz
    = 0,
$$

by Cauchy's integral theorem ($\int_\Gamma f=0$ for holomorphic $f$).
    ☐
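The statement $\chi(A)=0$ can be checked exactly for a concrete matrix. The Python sketch below is an illustration only and is not McCarthy's argument: it computes the coefficients of $\chi$ with the Faddeev-LeVerrier recursion (a technique swapped in here for convenience) and then evaluates $\chi(A)$ by Horner's scheme in rational arithmetic. All function names are made up.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def char_poly(a):
    """Coefficients [c_0, ..., c_n] of det(zI - A) via Faddeev-LeVerrier."""
    n = len(a)
    c = [F(0)] * n + [F(1)]
    m = [[F(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        # M_k = A M_{k-1} + c_{n-k+1} I
        m = [[am[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        am = matmul(a, m)
        # c_{n-k} = -trace(A M_k) / k
        c[n - k] = -sum(am[i][i] for i in range(n)) / k
    return c

def poly_at_matrix(c, a):
    """Evaluate c_0 I + c_1 A + ... + c_n A^n by Horner's scheme."""
    n = len(a)
    result = [[F(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        result = matmul(result, a)
        result = [[result[i][j] + (coeff if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

A = [[2, 1, 0], [0, 2, 0], [1, 3, 5]]     # chi(z) = (z-2)^2 (z-5)
chi = char_poly(A)
print(chi)
print(poly_at_matrix(chi, A))
```

For this matrix $\chi(z)=z^3-9z^2+24z-20$, and the second print shows the zero matrix.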
8. Definition:  For a matrix $A\in\mathbb{C}^{n\times n}$ with
eigenvalues $\lambda_1,\ldots,\lambda_r$ ($r\le n$) and Jordan normal form
$A\sim\mathop{\rm diag}(J_1,\ldots,J_s)$ ($r\le s\le n$),

$$
    \hat\chi(z) = (z-\lambda_1)^{m_1}\cdot\ldots\cdot(z-\lambda_r)^{m_r}
$$

is called the minimal polynomial of $A$.
Here $m_\nu$ is the order of the largest Jordan block for the
eigenvalue $\lambda_\nu$.
For $A={1{\mskip 3mu}0\choose0{\mskip 3mu}2}$ we have $\hat\chi(z)=(z-1)(z-2)$, for $A=I$ we have
$\hat\chi(z)=z-1$ independently of $n$, and for $A=\mathop{\rm diag}[{1{\mskip 3mu}1\choose0{\mskip 3mu}1},1,1,1]$
the minimal polynomial is $\hat\chi(z)=(z-1)^2$.
9. Similar matrices have the same Jordan normal form up to
renumbering of the Jordan blocks, hence the same minimal polynomial and also
the same characteristic polynomial.
Offensichtlich verschwindet jeder Faktor $(J-\lambda_\nu I)^k$
($\forall k\ge m_\nu$) für jeden Jordanblock $J$ zum Eigenwert $\lambda_\nu$,
also $\hat\chi(A)=0\in\mathbb{C}^{n\times n}$, aber $(J-\lambda_\nu I)^k\ne0$
($\forall k
10. Satz:  Satz von Cayley/Hamilton, 2.te Fassung,
Cayley, Arthur (1821--1895),
Hamilton, William Rowan (1805--1865).
$\chi(A)=0\in\mathbb{C}^{n\times n}$.
In Worten: Die Matrix $A$ annulliert ihr eigenes charakteristisches Polynom.
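The statement $\chi(A)=0$ can be checked in exact arithmetic. The following is a minimal sketch, not part of the original text: it computes the coefficients of $\det(Iz-A)$ with the Faddeev-LeVerrier recursion (a technique chosen here for convenience) and evaluates the polynomial at $A$ by Horner's scheme. The helper names `matmul`, `char_poly` and `poly_at_matrix` and the sample matrix are assumptions made for illustration.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(zI - A) (Faddeev-LeVerrier)."""
    n = len(A)
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # M_1 = I
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k   # c_{n-k} = -tr(A M_k) / k
        coeffs.append(c)
        # M_{k+1} = A M_k + c_{n-k} I
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

def poly_at_matrix(coeffs, A):
    """Evaluate a polynomial (highest coefficient first) at the matrix A (Horner)."""
    n = len(A)
    P = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        P = matmul(P, A)
        P = [[P[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return P

A = [[2, 1, 0], [0, 2, 1], [1, 0, 3]]   # arbitrary sample matrix
chi = char_poly(A)
chi_of_A = poly_at_matrix(chi, A)
print(chi)        # coefficients of the characteristic polynomial
print(chi_of_A)   # should be the zero matrix
```

Because `Fraction` is used throughout, the zero matrix is obtained exactly and not only up to rounding.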

		


Stetigkeit der Eigenwerte in Abhängigkeit der Matrixkomponenten
Mon, 05 Feb 2024 11:15:00 +0100

Die Eigenwerte einer Matrix hängen stetig von den Komponenten der Matrix ab.
Dies soll hier bewiesen werden.
Man kann sogar noch weitere Abhängigkeitssätze beweisen, jedoch werden die
Begründungen dann länger, siehe
das Buch von Gohberg/Lancaster/Rodman (1982),
Autoren sind Gohberg, Izrael' TSudikovich,
Lancaster, Peter
und Rodman, Leiba.
1. Theorem:  Rouché's theorem,
Rouché, Eugène (1832--1910).
Hypotheses: Let $f$ and $g$ be meromorphic, let $\Gamma$ be a closed curve traversed once, and let $Z_f,Z_g$ and $P_f,P_g$ be
the numbers of zeros and poles of $f,g$ inside $\Gamma$,
counted with multiplicity.
Claim: If
$\mathopen|f+g\mathclose|<\mathopen|f\mathclose|+\mathopen|g\mathclose|<\infty$
on $\Gamma$, then $Z_f-P_f=Z_g-P_g$ inside $\Gamma$.
Proof:  Following John B. Conway (1978),
"Functions of One Complex Variable",
Springer-Verlag, New York Heidelberg Berlin, Second Edition, 1978, xiii+317 pp.,
and
Irving Leonard Glicksberg:
"A Remark on Rouché's Theorem",
The American Mathematical Monthly, March 1976, Vol 83, No 3, pp.186--187.
Because the triangle inequality is strict, $f$ and $g$ have neither poles nor
zeros on $\Gamma$.
Moreover,

$$
    \left|{f(z)\over g(z)}+1\right| \lt  \left|f(z)\over g(z)\right| + 1,
    \qquad\forall z\in\Gamma.
$$

The meromorphic function $\lambda=f/g$ maps $\Gamma$ into
$\Omega=\mathbb{C}\setminus\left[0,\infty\right[$, since otherwise a non-negative
real value $\lambda(z)$ would have to satisfy $\lambda(z)+1<\lambda(z)+1$.
Let $\ell$ be a branch of the logarithm on $\Omega$.
Then $\ell(f/g)$ is an antiderivative of $(f/g)^{-1}\cdot(f/g)'$.
Hence

$$
    0 = {1\over2\pi i}\int_\Gamma (f/g)^{-1}\cdot(f/g)'
      = {1\over2\pi i}\int_\Gamma \left({f'\over f} - {g'\over g}\right)
      = (Z_f-P_f) - (Z_g-P_g).
$$

    ☐
If $\Gamma$ is traversed several times, the statement has to be modified
accordingly.
According to Glicksberg (1976)
the result holds more generally in commutative,
semisimple Banach algebras with identity.
Better known is the weaker statement for holomorphic $f$ and $g$: if
$\mathopen|f+g\mathclose|<\mathopen|f\mathclose|<\infty$ on $\Gamma$, then
$Z_f=Z_g$ inside $\Gamma$.
2. Beispiel:  Für $p(z)=z^n+a_1z^{n-1}+\cdots+a_n$ gilt

$$
    {p(z)\over z^n} = 1 + {a_1\over z} + \cdots + {a_n\over z^n}
    \longrightarrow 1 \quad(\mathopen|z\mathclose|\to\infty).
$$

Also

$$
    \left|{p(z)\over z^n}-1\right| \lt  1, \qquad\hbox{oder}\qquad
    \left|p(z)-z^n\right| \lt  \left|z^n\right|,
$$

für $\mathopen|z\mathclose|\ge R$, $R$ geeignet groß.
Der Satz von Rouché sagt, daß die Polynome $p(z)$ und $z^n$
gleichviele Nullstellen innerhalb der Kreisscheibe mit Radius $R$
haben.
Dies ist der Fundamentalsatz der Algebra.
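The radius $R$ in this example can be made explicit. The sketch below is an illustration and not part of the original argument: it assumes the choice $R=1+\sum_i|a_i|$, for which $|p(z)-z^n|\le\sum_i|a_i|\,R^{n-1}<R^n$ on $|z|=R$, and it checks the inequality only at finitely many sample points of the circle. The function names and the sample polynomial are hypothetical.

```python
import cmath

def rouche_radius(a):
    """Radius R = 1 + sum |a_i| for p(z) = z^n + a[0] z^(n-1) + ... + a[n-1]."""
    return 1.0 + sum(abs(c) for c in a)

def dominated_on_circle(a, R, samples=720):
    """Check |p(z) - z^n| < |z^n| at sample points of the circle |z| = R."""
    n = len(a)
    for k in range(samples):
        z = cmath.rect(R, 2 * cmath.pi * k / samples)
        rest = sum(c * z ** (n - 1 - i) for i, c in enumerate(a))  # p(z) - z^n
        if abs(rest) >= abs(z) ** n:
            return False
    return True

a = [0, -2, 5]            # p(z) = z^3 - 2 z + 5
R = rouche_radius(a)
print(R, dominated_on_circle(a, R))
```

A sampled check is of course no proof; the inequality itself follows from the estimate stated above.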
Der nächste Satz besagt: Wenn sich die Koeffizienten zweier Polynome
wenig unterscheiden, so differieren auch die Nullstellen nur wenig.
Erinnert sei daran, daß eine Implikation wahr sein kann, falls die Prämisse
falsch ist.
3. Satz:  (Stetigkeit der Wurzeln von Polynomen)
Voraussetzungen: Es seien
$p(\lambda):=\lambda^n+a_{n-1}\lambda^{n-1}+\cdots+a_1\lambda+a_0$ und
$q(\mu):=\mu^n+b_{n-1}\mu^{n-1}+\cdots+b_1\mu+b_0$ zwei
komplexe Polynome mit den Nullstellen $\lambda_1,\ldots,\lambda_n$ für $p$
und $\mu_1,\ldots,\mu_n$ für $q$.
Die Koeffizienten $a_i$ und $b_i$ sind beliebige komplexe Zahlen.
Behauptung: $\forall\varepsilon>0: \exists\delta>0:\mskip 5mu$
$\left|a_i-b_i\right|<\delta{\mskip 3mu}\Longrightarrow{\mskip 3mu}\left|\lambda_i-\mu_i\right|<
\varepsilon$, bei geeigneter Numerierung der Nullstellen
$\lambda_i$ und $\mu_i$.
Beweis:  Nach Ortega, James McDonough,
Ortega (1972):
"Numerical Analysis---A Second Course",
Academic Press, New York and London, 1972, xiii+201 S.
Es seien $\gamma_1,\ldots,\gamma_k$ ($k\ge1$) die verschiedenen Wurzeln
von $p$.
Sei $\varepsilon$ kleiner gewählt als der kleinste halbe Abstand zwischen
allen verschiedenen Nullstellen, also

$$
    0\lt \varepsilon\lt {1\over2}\left|\gamma_i-\gamma_j\right|, \qquad
    \hbox{für}\quad i,j=1,\ldots,k \quad i\ne j.
$$

Around each $\gamma_i$ place a closed disk $D_i$ of radius $\varepsilon$, that is

$$
    D_i := \left\{z: \left|z-\gamma_i\right|\le\varepsilon\right\}, \qquad
    \hbox{für}\quad i=1,\ldots,k \quad (k\ge1)
$$

$p$ vanishes on none of the $k$ disk boundaries, i.e. $p(z)\ne0$,
$\forall z\in\partial D_i$, $\forall i=1,\ldots,k$.
Since $p$ is continuous and the boundaries are compact, $|p|$ attains
its minimum and maximum on each of them.
Hence there are numbers $m_i>0$ [$i=1,\ldots,k$, the minima] such that

$$
    \left|p(z)\right|\ge m_i, \qquad\hbox{für}\quad
        \forall z\in\partial D_i,{\mskip 3mu}\forall i=1,\ldots,k.
$$

Furthermore let

$$
    M_i := \max_{z\in\partial D_i}
        \left\{\left|z^{n-1}\right|+\cdots+\left|z\right|+1\right\}
$$

be the maximum of the polynomial “remainders” on the respective disk boundaries.
Now choose $\delta$ so small that $\delta M_i\lt m_i$ for all $i$.
If $\left|a_j-b_j\right|<\delta$ for all coefficients, then

$$
    \left|p(z) - q(z)\right| \le \delta M_i \lt  m_i \le \left|p(z)\right|,
    \qquad\forall z\in\partial D_i, \quad i=1,\ldots,k.
$$

Rouché's theorem above (with $f=p$ and $g=-q$) is now applicable
and says that $p$ and $q$ have the same number of zeros in each full disk.
In other words, the zeros have not “run away”; they have
only moved within their respective disks.
    ☐
Der Satz sagt nicht, daß die Wurzeln reell bleiben, sofern sie reell waren,
bei Variation der Koeffizienten.
Eine solche Aussage gilt so nicht.
Hierzu bräuchte man stärkere Voraussetzungen.
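A small numerical experiment illustrates both the theorem and the remark. The sketch below is an illustration under assumed sample data: the polynomial $z^2-2z+1$ with double root $1$ is perturbed to $z^2-2z+(1+\delta)$, whose roots are $1\pm i\sqrt\delta$. They converge to $1$ as $\delta\to0$, but they leave the real axis. The function names are hypothetical.

```python
import cmath

def quadratic_roots(b, c):
    """Both roots of z^2 + b z + c."""
    d = cmath.sqrt(b * b - 4 * c)
    return (-b - d) / 2, (-b + d) / 2

def root_shift(delta):
    """Largest distance of the roots of z^2 - 2z + (1 + delta) from the double root 1."""
    return max(abs(mu - 1) for mu in quadratic_roots(-2.0, 1.0 + delta))

for delta in (1e-2, 1e-4, 1e-6):
    # the shift behaves like sqrt(delta): continuous in delta, but not Lipschitz
    print(delta, root_shift(delta), quadratic_roots(-2.0, 1.0 + delta))
```

The shift of order $\sqrt\delta$ shows that the $\delta$ belonging to a given $\varepsilon$ can be much smaller than $\varepsilon$ when multiple roots are present.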
4. Corollar:  Die Eigenwerte einer Matrix hängen stetig von
sämtlichen Matrixelementen ab.
Beweis:  Die Eigenwerte der Matrix sind die Nullstellen des
charakteristischen Polynomes.
Die Koeffizienten des charakteristischen Polynoms hängen als
Determinantenfunktion stetig von den Matrixelementen ab.
Die Verkettung stetiger Funktionen ist wiederum stetig.
    ☐
Das obige Corollar gilt nicht unbedingt für die Eigenvektoren.
5. Beispiel: Siehe
Ortega (1972):
Die Matrix nach J.W. Givens

$$
    A(\varepsilon) := \pmatrix{
        1+\varepsilon\cos{2\over\varepsilon} & -\varepsilon\sin{2\over\varepsilon}\cr
        -\varepsilon\sin{2\over\varepsilon} & 1-\varepsilon\cos{2\over\varepsilon}\cr
    }, \qquad\quad\varepsilon\ne0,
$$

hat die Eigenwerte $1\pm\varepsilon$ und die beiden Eigenvektoren

$$
    \left(\sin{1\over\varepsilon},{\mskip 3mu}\cos{1\over\varepsilon}\right)^\top,\qquad\qquad
    \left(\cos{1\over\varepsilon}, -\sin{1\over\varepsilon}\right)^\top,
$$

welche offensichtlich gegen keinerlei Grenzwert streben ($\varepsilon\to0$),
jedoch $A(\varepsilon)\to{1{\mskip 3mu}0\choose 0{\mskip 3mu}1}$ und dies obwohl die Eigenräume
jeweils eindimensional und gut separiert sind.
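The eigenpairs of the Givens example are easy to confirm numerically. The following sketch uses only the matrix and the vectors given above; the function names are assumptions for illustration. It evaluates the residual $A v-\lambda v$ for $\lambda=1-\varepsilon$ with $v=(\sin{1\over\varepsilon},\cos{1\over\varepsilon})^\top$ and for $\lambda=1+\varepsilon$ with $v=(\cos{1\over\varepsilon},-\sin{1\over\varepsilon})^\top$.

```python
import math

def givens_matrix(eps):
    """The matrix A(eps) from the example, eps != 0."""
    c, s = math.cos(2 / eps), math.sin(2 / eps)
    return [[1 + eps * c, -eps * s], [-eps * s, 1 - eps * c]]

def residual(A, lam, v):
    """Max-norm of A v - lam v for a 2x2 matrix A."""
    return max(abs(A[i][0] * v[0] + A[i][1] * v[1] - lam * v[i]) for i in range(2))

def eigen_residuals(eps):
    """Residuals of the two eigenpairs stated in the text."""
    A, t = givens_matrix(eps), 1 / eps
    v_minus = (math.sin(t), math.cos(t))     # belongs to 1 - eps
    v_plus = (math.cos(t), -math.sin(t))     # belongs to 1 + eps
    return residual(A, 1 - eps, v_minus), residual(A, 1 + eps, v_plus)

for eps in (0.1, 0.01, 0.001):
    t = 1 / eps
    # the residuals are at rounding level, while the eigenvector keeps rotating
    print(eps, eigen_residuals(eps), (math.cos(t), -math.sin(t)))
```

The printed eigenvectors jump around the unit circle as $\varepsilon$ shrinks, although the eigenvalues $1\pm\varepsilon$ converge to $1$.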
6. Folgerung:  Der Nullstellengrad eines Polynomes ist lokal
konstant.
Als ein Teilergebnis für Eigenvektoren erhält man
7. Satz:  Voraussetzungen: Sei $\lambda$ ein einfacher
Eigenwert von $A\in\mathbb{C}^{n\times n}$ und $x\ne0$ der zu $\lambda$ gehörige
Eigenvektor.
Weiter sei $E_\nu\in\mathbb{C}^{n\times n}$ beliebig aber derart, daß
$\lambda(E_\nu)\to\lambda$, falls $E_\nu\to0$, wobei $\lambda(E_\nu)$ ein
zu $A+E_\nu$ korrespondierender Eigenwert ist.
Die $\left|E_\nu\right|$ seien so klein, daß $\lambda(E_\nu)$ ebenfalls
einfacher Eigenwert ist und $A+E_\nu-\lambda(E_\nu)I$ den Rang $(n-1)$ hat,
für alle $\nu$.
Behauptung: $\def\mapright#1{\mathop{\longrightarrow}\limits^{#1}}\displaystyle\lambda(E_\nu)\mapright{\nu\to\infty}\lambda$
und $\displaystyle x(E_\nu)\mapright{\nu\to\infty}x$, falls
$\displaystyle E_\nu\to0$.
Proof:  Because $\lambda$ is a simple eigenvalue, inspection
of a Jordan normal form of $A$ shows that $A-\lambda I$ has rank $(n-1)$.
Hence there are indices $i$ and $j$ such that

$$
    \sum_{m\ne j} \left(a_{km} - \lambda\delta_{km}\right) x_m =
    -\left(a_{kj} - \lambda\delta_{kj}\right) x_j, \qquad k\ne i,
$$

($\delta_{km}$ Kronecker delta)
where the coefficient matrix in front of the $x_m$ is invertible.
Assume without loss of generality $x_j=1$,

$$
    \begin{pmatrix}
        & & & j\downarrow & & & \cr
        & * & * & & & & \cr
        & * & * & & & & \cr
        k\rightarrow& & & \lambda & & & \cr
        & & & & * & * & *\cr
        & & & & * & * & *\cr
        & & & & * & * & *\cr
    \end{pmatrix}
$$

Now let $\lambda(E_\nu)$ be the eigenvalue of $A+E_\nu$ with
$\lambda(E_\nu)\to\lambda$ as $E_\nu\to0$; note the continuous
dependence established in the corollary above.
By the consequence above, the order of a zero is locally constant.
For sufficiently small $E_\nu$ the matrix $A+E_\nu-\lambda(E_\nu)I$ with the $i$-th row
and $j$-th column deleted is likewise an invertible $(n-1)\times(n-1)$ matrix.
Hence the linear system

$$
    \sum_{m\ne j} \left(a_{km} + e_{km} - \lambda(E_\nu)\delta_{km}\right) x_m(E_\nu)
    = -\left(a_{kj} + e_{kj} -\lambda(E_\nu)\delta_{kj}\right), \qquad k\ne i
$$

has exactly one solution $x_m(E_\nu)$ ($m\ne j$).
This uniquely determined solution is a continuous function
of $E_\nu$ (Cramer's rule).
    ☐
Wenn also die Folge der Matrizen $(E_\nu)$ so beschaffen ist, daß
$A+E_\nu-\lambda_\nu I$ stets den Rang $(n-1)$ hat, so überträgt sich die
stetige Abhängigkeit der Eigenwerte von den Matrixelementen auf eine stetige
Abhängigkeit der Eigenvektoren von den Matrixelementen.
Falls $(E_\nu)$ nicht der obigen Rangeinschränkung unterliegt, so liefert
der Satz keine Information.

		


Die Spur einer Matrix
Sun, 04 Feb 2024 11:00:00 +0100

1. Die Spur (engl./franz.: trace) einer Matrix $A\in\mathbb{C}^{n\times n}$
ist definiert zu $\def\tr{\mathop{\rm tr}}\tr A=a_{11}+\cdots+a_{nn}$, somit die Summe der
Hauptdiagonalelemente.
Durch elementare Rechnung zeigt man $\tr AB=\tr BA$, für zwei beliebige
Matrizen $A\in\mathbb{C}^{n\times m}$, $B\in\mathbb{C}^{m\times n}$.
$A$ und $B$ brauchen nicht zu kommutieren oder quadratisch sein.
Insbesondere gilt $\def\adj#1{#1^*}\adj ab=\tr b\adj a$, für zwei beliebige
Vektoren $a,b\in\mathbb{C}^n$.
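Both identities can be tried out on small examples. The sketch below is an illustration with arbitrarily chosen sample data: a $2\times3$ and a $3\times2$ matrix, for which $AB$ and $BA$ do not even have the same size, and two complex vectors for $a^*b=\mathop{\rm tr} b\,a^*$. The helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    """Sum of the main diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
B = [[1, 0], [2, -1], [0, 3]]       # 3 x 2
AB, BA = matmul(A, B), matmul(B, A) # 2 x 2 and 3 x 3
print(trace(AB), trace(BA))

a = [1 + 2j, 3j]
b = [2 - 1j, 4]
inner = sum(x.conjugate() * y for x, y in zip(a, b))      # a* b
outer = [[y * x.conjugate() for x in a] for y in b]       # b a*
print(inner, trace(outer))
```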
$\tr\adj AB$ ist das Skalarprodukt für zwei quadratische Matrizen
$A,B\in\mathbb{C}^{n\times n}$.
Deswegen gilt: $\forall B:\tr\adj AB=0$ $\Rightarrow$ $A=0$
(Nichtausgeartetheit des Skalarproduktes/Anisotropie).
From the Riesz representation theorem,
Riesz, Friedrich (1880--1956),
follows the equivalence: $g$ is a linear form on $\mathbb{C}^{n\times n}$ if and only if
$\exists B:$ $g(A)=\tr BA$ for all $A$.
Weiterhin gilt
2. Satz:  Die folgenden beiden Aussagen sind äquivalent:
(1) $g\colon\mathbb{C}^{n\times n}\to\mathbb{C}$ ist (komplexes) Vielfaches der
Spurfunktion.
(2) $g\colon\mathbb{C}^{n\times n}\to\mathbb{C}$ ist eine Linearform, also
$g(\lambda A+\mu B)=\lambda g(A)+\mu g(B)$ und es gilt $g(AB)=g(BA)$, für
alle $\lambda,\mu\in\mathbb{C}$ und alle $A,B\in\mathbb{C}^{n\times n}$.
Proof: “(1)$\Rightarrow$(2)”: These are simple rules of calculation for the
trace function.
“(2)$\Rightarrow$(1)”: see Nicolas Bourbaki (1970):
"Éléments de mathématique: Algèbre", Hermann, Paris, 1970, 167+210+258 pp. = 635 pp.
Für $n=1$ ist dies klar.
Für $n\ge2$ sei $A=E_{ij}$ und $B=E_{jk}$ mit $i\ne k$.
Hierbei ist $E_{\rho\tau}$ diejenige Matrix, welche an der Stelle $(\rho,\tau)$
eine 1 enthält und sonst nur Nullen.
Für derartige Matrizen bestätigt man leicht $E_{ik} E_{j\ell} = 0$, falls
$k\ne j$ und $E_{ik} E_{k\ell} = E_{i\ell}$.
Damit gilt $g(E_{ik})=0$ $(i\ne k)$ und mit $A=E_{ij}$ und $B=E_{ji}$ ergibt
sich $g(E_{ii})=g(E_{jj})$.
Da die $E_{\rho\tau}$ eine Basis von $\mathbb{C}^{n\times n}$ bilden, folgt
$g(A)=\lambda\tr A$ $\forall A$, mit geeignetem, festem $\lambda$.
    ☐
The theorem shows that there are not many linear forms on the algebra $\mathbb{C}^{n\times n}$
that are invariant under interchanging the factors of a product.
By a normalization, for instance $g(E_{11})=1$ or $g(I)=n$, the trace function
is uniquely determined.
3. Lemma:  $\forall C,D\in\mathbb{C}^{n\times n}$:
$\mathop{\rm Re}\nolimits \tr CD\le{1\over2}\left(\tr C\adj C+\tr D\adj D\right)$.
Beweis: Siehe Sha, Hu-yun (1986):
"Estimation of the Eigenvalues of $AB$ for $A>0$, $B>0$", Linear Algebra and Its Applications, Vol 73,
January 1986, pp.147--150.
Es ist $\mathop{\rm Re}\nolimits \tr CD=\mathop{\rm Re}\nolimits \sum_{i,k}c_{ik}d_{ki}={1\over2}\sum_{i,k}\bigl(
c_{ik}d_{ki}+\overline{c_{ik}d_{ki}}\bigr)$, und weiter ist
${1\over2}\bigl(\tr C\adj C+\tr D\adj D\bigr)={1\over2}\sum_{i,k}\bigl(
c_{ik}\overline{c_{ik}}+d_{ik}\overline{d_{ik}}\bigr)=
{1\over2}\sum_{i,k}\bigl(c_{ik}\overline{c_{ik}}+d_{ki}\overline{d_{ki}}\bigr)$.
In abkürzender Schreibweise sei $c_{ik}=e+fi$ und $d_{ki}=g+hi$.
Damit hat man

$$
\eqalignno{
    c_{ik}d_{ki}+\overline{c_{ik}d_{ki}} &= (e+fi)(g+hi)+(e-fi)(g-hi)
        = 2eg-2fh,\cr
    c_{ik}\overline{c_{ik}}+d_{ki}\overline{d_{ki}} &= (e+fi)(e-fi)+(g+hi)(g-hi)
        = e^2+f^2+g^2+h^2,\cr
}
$$

hence $c_{ik}d_{ki}+\overline{c_{ik}d_{ki}} \le
c_{ik}\overline{c_{ik}}+d_{ki}\overline{d_{ki}}$, because $e^2+f^2+g^2+h^2-(2eg-2fh)=(e-g)^2+(f+h)^2\ge0$, and therefore
${1\over2}\sum_{i,k}\left(c_{ik}d_{ki}+\overline{c_{ik}d_{ki}}\right) \le
{1\over2}\sum_{i,k}\left(c_{ik}\overline{c_{ik}}+d_{ki}\overline{d_{ki}}\right)$.
    ☐
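The lemma lends itself to a quick randomized check. The following sketch is an illustration only: it draws random complex $3\times3$ matrices with a fixed seed and evaluates the difference between the right-hand and the left-hand side, using $\mathop{\rm tr} CC^*=\sum_{i,k}|c_{ik}|^2$. It also evaluates the case $D=C^*$, in which both sides coincide. All function names are assumptions.

```python
import random

def re_trace_product(C, D):
    """Re tr(C D) for square matrices stored as lists of rows."""
    n = len(C)
    return sum((C[i][k] * D[k][i]).real for i in range(n) for k in range(n))

def frobenius_sq(C):
    """tr(C C*) = sum of |c_ik|^2."""
    return sum(abs(z) ** 2 for row in C for z in row)

def lemma_gap(C, D):
    """Right-hand side minus left-hand side of the lemma; should be >= 0."""
    return 0.5 * (frobenius_sq(C) + frobenius_sq(D)) - re_trace_product(C, D)

def random_matrix(n, rng):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

rng = random.Random(0)
gaps = [lemma_gap(random_matrix(3, rng), random_matrix(3, rng)) for _ in range(1000)]
print(min(gaps))

C = random_matrix(3, rng)
C_adj = [[C[k][i].conjugate() for k in range(3)] for i in range(3)]  # C*
print(lemma_gap(C, C_adj))   # equality case, zero up to rounding
```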
If a Hermitian matrix $A$ is invertible, then its inverse $A^{-1}$
is Hermitian as well: with $B=A^{-1}$ we have $AB=I=\adj B\adj A=\adj BA=A\adj B$, hence
$B=\adj B$, because an invertible matrix always commutes with its
inverse.
Likewise, the inverse of a normal matrix is normal.
($A=UD\adj U\Rightarrow A^{-1}=(UD\adj U)^{-1}=(\adj U)^{-1} D^{-1} U^{-1}
=UD^{-1}\adj U$.)
It follows immediately that the inverse of a positive definite matrix is
again positive definite.
Correspondingly, the inverse of a negative definite matrix is itself
negative definite.
It now turns out that the product of two positive definite matrices
at least has positive eigenvalues again.
4. Satz:  Voraussetzungen: Es seien $A\succ0$, $B\succ0$ zwei
positiv definite (hermitesche) Matrizen aus $\mathbb{C}^{n\times n}$ mit
Eigenwerten $0<\mu_1\le\cdots\le\mu_n$
bzw. $0<\nu_1\le\cdots\le\nu_n$.
Behauptung: (1) $AB$ hat nur positive reelle Eigenwerte
$0<\lambda_1\le\cdots\le\lambda_n$.
(2)      $\displaystyle{{2\over\sum_i\mu_i^{-2}+\sum_i\nu_i^{-2}} \le \tr AB \le
{1\over2}\left(\sum_i\mu_i^2+\sum_i\nu_i^2\right)}.$
Da alle Eigenwerte $\lambda_i$ von $AB$ echt positiv sind, gilt insbesondere
als Vergröberung

$$
    {2\over n}{\mu_1^2 \nu_1^2 \over \mu_1^2 + \nu_1^2} \lt  \lambda_i \lt 
    {n\over2} \left(\mu_n^2 + \nu_n^2\right).
$$

Proof: See Sha, Hu-yun (1986):
For $A^{-1}\succ0$ there exists an invertible $P$ with $A^{-1}=P\adj P$.
Since $B\succ0$, also $P^{-1}B(\adj P)^{-1}\succ0$, so there exists a
unitary matrix $U$ such that

$$
    P^{-1}B(\adj P)^{-1}=U\mathop{\rm diag}(x_1,\ldots,x_n)\adj U,
$$

with eigenvalues $x_i>0$.
Now we have

$$
\eqalign{
    0 \lt  x_1+\cdots+x_n &= \tr P^{-1}B(\adj P)^{-1} \cr
                       &=\tr(\adj P)^{-1}P^{-1}B \cr
                       &= \tr AB\le{1\over2}\left(\tr A\adj A+\tr B\adj B\right) \cr
                       & ={1\over2}\left( \sum_i\mu_i^2+\sum_i\nu_i^2\right). \cr
}
$$

Die $x_i$ sind die Eigenwerte von $AB$, da

$$
\eqalign{
    \left|\lambda I-AB\right| &= \left|A\right| \left|\lambda A^{-1}-B\right| \cr
    &= \left|A\right| \bigl|\lambda P\adj P-PU\mathop{\rm diag}(x_1,\ldots,x_n)\adj{(PU)}\bigr| \cr
    &=\left|A\right| \left|PU\right| \left|\mathop{\rm diag}(\lambda-x_1,\ldots,\lambda-x_n)\right| \bigl|\adj{(PU)}\bigr|. \cr
}
$$

Nach dem selben Muster setzt man $B=Q\adj Q$, $Q^{-1}A^{-1}(\adj Q)^{-1}=
V\mathop{\rm diag}(y_1,\ldots,y_n)\adj V$.
Also

$$
\eqalign{
    0\lt y_1+\cdots+y_n &= \tr Q^{-1}A^{-1}(\adj Q)^{-1} \cr
    &= \tr(\adj Q)^{-1}Q^{-1}A^{-1}=\tr B^{-1}A^{-1}\le {1\over2}\left(\tr A^{-1}\adj{(A^{-1})}+\tr B^{-1}\adj{(B^{-1})}\right) \cr
    &= {1\over2}\left(\sum_i \mu_i^{-2} + \sum_i \nu_i^{-2}\right).
}
$$

Die $y_i$ sind zugleich Eigenwerte von $(AB)^{-1}$, denn

$$
\eqalign{
    \left|\lambda I-AB\right| &= \left|A\right| {\mskip 3mu} \left|\lambda A^{-1}-B\right| \cr
    &= \left|A\right| {\mskip 3mu} \bigl|\lambda QV\mathop{\rm diag}(y_1,\ldots,y_n)\adj{(QV)} - Q\adj Q\bigr| \cr
    &= \left|A\right| {\mskip 3mu} \left|QV\right| {\mskip 3mu}
        \left|\mathop{\rm diag}(\lambda y_1-1,\ldots,\lambda y_n-1)\right|
        {\mskip 3mu} \bigl|\adj{(QV)}\bigr|. \cr
}
$$

    ☐
5. Example:  For $A={1{\mskip 3mu}0\choose0{\mskip 3mu}3}$, $B={2,{\mskip 3mu}-1\choose-1,{\mskip 3mu}2}$,
$AB={2,{\mskip 3mu}-1\choose-3,{\mskip 3mu}6}$ the eigenvalues are $1,3$ for $A$, $1,3$ for $B$ and $4\pm\sqrt7$ for $AB$;
in particular $AB$ need not be Hermitian.
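The example can be recomputed directly. The sketch below is an illustration: it obtains the eigenvalues of the three $2\times2$ matrices from trace and determinant and compares $\mathop{\rm tr} AB$ with the two bounds of part (2) of the theorem. The helper name `eig_2x2_real` is an assumption, and the function presupposes a real spectrum.

```python
import math

def eig_2x2_real(M):
    """Eigenvalues of a 2x2 matrix with real spectrum, in ascending order."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(tr * tr - 4 * det)
    return (tr - root) / 2, (tr + root) / 2

A = [[1, 0], [0, 3]]
B = [[2, -1], [-1, 2]]
AB = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

mu, nu, lam = eig_2x2_real(A), eig_2x2_real(B), eig_2x2_real(AB)
trace_AB = AB[0][0] + AB[1][1]
lower = 2 / (sum(m ** -2 for m in mu) + sum(v ** -2 for v in nu))
upper = 0.5 * (sum(m ** 2 for m in mu) + sum(v ** 2 for v in nu))
print(AB, mu, nu, lam)
print(lower, trace_AB, upper)   # lower <= tr AB <= upper
```

Both eigenvalues of $AB$ are positive although $AB$ is not symmetric, in line with part (1) of the theorem.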

		


Hermitesche, unitäre und normale Matrizen
Sat, 03 Feb 2024 14:40:00 +0100

Hermitesche Matrizen $(\def\adj#1{#1^*}\adj A=A)$, unitäre Matrizen $(\adj A=A^{-1})$
und normale Matrizen $(\adj AA=A\adj A)$ lassen sich unitär
diagonalisieren.
Dies ist das zentrale Ergebnis dieses Abschnittes.
While the Jordan normal form gives every complex matrix an
almost diagonal shape [more precisely a $(0,1)$ band matrix form with the
eigenvalues as diagonal elements], the following lemma of
Schur yields a triangular shape, but with respect to a completely unitary basis.
Just like the Jordan normal form, the Schur normal form does not hold
for real matrices in real form if the characteristic polynomial
does not split over $\mathbb{R}$.
In that case real $(2\times2)$ blocks arise.
However, real matrices play no significant role here and in what follows.
1. Satz:  Satz über eine Schursche Normalform,
Schur, Issai (10.01.1875--10.01.1941).
$\forall A\in\mathbb{C}^{n\times n}$: $\exists U$ unitär:

$$
    \adj UAU=
    \pmatrix{\lambda_1&*&\ldots&*\cr &\lambda_2&&*\cr &&\ddots&\vdots\cr 0&&&\lambda_n},
$$

mit $\lambda_i$ Eigenwerte von $A$.
Proof:  Let $\lambda_1$ be an eigenvalue of $A$ and $x_1$ a corresponding normalized
eigenvector, $\def\iadj#1{#1^*}x_1^* x_1 = 1$.
There exist pairwise orthonormal
$y_2,\ldots,y_n\in\mathbb{C}^n$ such that $X_1:=(x_1,y_2,\ldots,y_n)$ is unitary
(basis extension theorem, Schmidt orthogonalization procedure).
Schmidt, Erhard (1876--1959).
Thus $\iadj{X_1} X_1 = I$, so $x_1^* y_i = 0$ $(i=2,\ldots,n)$, hence

$$
    \iadj{X_1} A X_1 = \pmatrix{x_1^*\cr y_2^*\cr \vdots\cr y_n^*\cr}
        (Ax_1, Ay_2, \ldots, Ay_n)
    = \pmatrix{\lambda_1&*&\ldots&*\cr 0&&&\cr \vdots&&A_1&\cr 0&&&\cr}.
$$

Apart from one occurrence of $\lambda_1$, the matrix $A_1\in\mathbb{C}^{(n-1)\times(n-1)}$ has,
because of the similarity transformation, exactly the same eigenvalues as $A$.
Now proceed as above: to the eigenvalue $\lambda_2$ of $A_1$
(and also of $A$) belongs a normalized eigenvector $x_2\in\mathbb{C}^{n-1}$,
$A_1 x_2 = \lambda_2 x_2$, with $x_2^* x_2 = 1$.
Again extend to a pairwise orthonormal system of vectors
$x_2,z_3,\ldots,z_n\in\mathbb{C}^{n-1}$, and set accordingly

$$
    X_2 := \pmatrix{1&0&0&\ldots&0\cr 0&x_2&z_3&\ldots&z_n\cr} \in \mathbb{C}^{n\times n}
$$

so that

$$
    \iadj{X_2}\iadj{X_1}A X_1 X_2 = \pmatrix{
        \lambda_1 & * & * & \ldots & *\cr
        0 & \lambda_2 & * & \ldots & *\cr
        0 & 0 &&&\cr
        \vdots & \vdots && A_2 &\cr
        0 & 0 &&&\cr
    } .
$$

Da unitäre Matrizen eine multiplikative, nicht-abelsche Gruppe (sogar kompakte
Gruppe) bilden, insbesondere abgeschlossen sind, folgt nach nochmaliger
$(n-2)$-facher Wiederholung die behauptete Darstellung.
    ☐
Aus dem Lemma von Schur folgt übrigens sofort der Dimensionssatz

$$
    A:\mathbb{C}^m\to\mathbb{C}^n, \qquad
    m = \dim\ker A + \dim\mathop{\rm Im} A,
$$

wenn man bei nicht quadratischen Matrizen, $A$ zu einer
quadratischen Matrix aus $\mathbb{C}^{(m\lor n)\times(m\lor n)}$ durch
Nullauffüllung ergänzt.
2. $A$ heißt normal, falls $\adj AA=A\adj A$, also $A$ und
$\adj A$ kommutieren.
Beispielsweise sind hermitesche, schiefhermitesche und (komplexe) Vielfache
unitärer Matrizen normal

$$
    \adj A=A^{-1}{\mskip 5mu}\Rightarrow{\mskip 5mu}\adj AA=I=A\adj A.
$$

“Kleine” und spezielle normale Matrizen lassen sich leicht klassifizieren,
wie man durch elementare Rechnung leicht nachweist.
3. Lemma:  (1) Real normal $(2\times2)$ matrices are either
symmetric or real multiples of orthogonal matrices.
(2) A triangular matrix is normal if and only if it is a diagonal matrix.
The kind of diagonalizability uniquely determines normality,
Hermiticity and unitarity.
4. Satz:  (1) $A$ normal $\iff$ $A$ unitär diagonalisierbar.
(2) $A$ hermitesch $\iff$ $A$ unitär reell-diagonalisierbar.
(3) $A$ schiefhermitesch $\iff$ $A$ unitär imaginär-diagonalisierbar.
(4) $A$ unitär $\iff$ $A$ unitär unimodular-diagonalisierbar.
Beweis:  zu (1): “$\Rightarrow$”: Anwendung des vorstehenden Lemmas auf
eine Schursche Normalform von $A$.
“$\Leftarrow$”: Mit $A=UD\adj U$, Diagonalmatrix $D$ und unitärem $U$
($\adj UU=I$) rechnet man

$$
\eqalign{
    \adj AA &= U\overline D\adj U{\mskip 3mu}UD\adj U = U\overline DD\adj U,\cr
    A\adj A &= UD\adj U{\mskip 3mu}U\overline D\adj U = UD\overline D\adj U.\cr
}
$$

On (2): “$\Rightarrow$”: A Hermitian matrix is normal, hence unitarily diagonalizable by (1). From $A=\adj A$ and $Ax=\lambda x$ follows $\adj xAx=\lambda\langle x,x\rangle
=\adj x\adj Ax=\adj{(Ax)}x=\overline\lambda\langle x,x\rangle$, hence
$\lambda=\overline\lambda$.
“$\Leftarrow$”: $Ax_i=\lambda_i x_i=\overline{\lambda_i} x_i=\adj Ax_i$ $\forall i$,
so $A$ and $\adj A$ agree on an orthonormal eigenbasis $x_1,\ldots,x_n$,
hence $A=\adj A$ in every basis.
On (3): “$\Rightarrow$”: $A=-\adj A$, so $A\adj A=-A^2=\adj AA$, hence $A$ is normal.
$\adj xAx=\lambda \adj xx=-\adj x\adj Ax=-\overline\lambda \adj xx$, so
$\lambda=-\overline\lambda$ and therefore $\lambda\in i\mathbb{R}$.
“$\Leftarrow$”: With $A=UD\adj U$ and a diagonal matrix $D=-\adj D$ we get
$-\adj A=-U\overline D\adj U=UD\adj U=A$.
zu (4): “$\Rightarrow$”: Wegen $\adj AA=I$ ist $A$ invertierbar.
Für ein Eigenelement $(\lambda,x)$ zu $A$, also $Ax=\lambda x$, ergibt sich
$\adj Ax=\overline\lambda x=A^{-1}x={1\over\lambda}x$, somit
$\lambda\overline\lambda=1=\left|\lambda\right|$, für unitäre Matrizen $A$
sind sämtliche Eigenwerte daher unimodular.
“$\Leftarrow$”: Eine unimodulare Diagonalmatrix ist unitär.
Unitäre Matrizen bilden eine (nicht-abelsche) Gruppe.
    ☐
Because $AX=XD$, $X$ is the matrix of right eigenvectors, and because
$X^{-1}A=DX^{-1}$, $X^{-1}$ is the matrix of left eigenvectors.
A consequence of (1) of the theorem: the minimal polynomial of a normal matrix
has only simple zeros.
The converse fails: a matrix whose minimal polynomial has only simple zeros is diagonalizable,
but diagonalizable matrices need not be Hermitian,
unitary or normal, as $B={1{\mskip 3mu}2\choose0{\mskip 3mu}3}$ shows
($BB^\top\ne B^\top B$).
Ist $A$ hermitesch, so ist $\adj AA=A^2$ positiv semidefinit und positiv
definit genau dann, wenn $A$ invertierbar ist, da alle Eigenwerte von $A^2$
nichtnegativ (bzw. positiv) sind.
Der Rang einer schiefsymmetrischen Matrix ist wegen
$\left|A\right|=(-1)^n\left|A\right|$, immer gerade.
Dies hätte man auch mit Hilfe von (3) erkennen können, da die Determinante
einer Diagonalmatrix das Produkt der Diagonalelemente ist.
Während die Schursche Normalform eine beliebige Matrix unitär zu triangulieren
vermochte, so kann man sogar jede beliebige Matrix $A$ “unitär-diagonalisieren”,
wenn man darauf verzichtet auf beiden Seiten der Matrix $A$ die gleiche unitäre
Matrix $U$ bzw. $\adj U$ zu verlangen.
5. Proposition:  $\forall A\in\mathbb{C}^{n\times n}$:
$\exists U,V$ unitär: $A=UDV$, mit $D=\mathop{\rm diag}\sqrt{\lambda_i}$,
mit $\lambda_i$ Eigenwerte von $\adj AA$.
Proof:  The matrix $\adj AA$ is Hermitian, so
$\adj AA = W\hat D\adj W$ with unitary $W$ and real
diagonal matrix $\hat D=\mathop{\rm diag}\lambda_i$.
We have

$$
    \lambda_i = e_i^\top \hat D e_i = e_i^\top \adj W \adj A AWe_i =
    \left\|AWe_i\right\|_2^2 \ge 0 ,
$$

with strict inequality for invertible $A$, which is assumed for the rest of the proof.
Set $D=\mathop{\rm diag}\sqrt{\lambda_i}$.
Then $D^{-1} \adj W \adj A AWD^{-1}=I$, so $U:=AWD^{-1}$ is unitary.
$V:=W^{-1}$ is unitary as well and $UDV=AWD^{-1}DW^{-1}=A$.
    ☐
Zur Notation siehe Das äußere Produkt und Determinanten.
Für positiv definite (hermitesche) Matrizen erkennt man auch gleich die
Existenz einer beliebigen Wurzel, also $\root r \of A$.
Insbesondere für eine reelle symmetrische Matrix $A$ mit lauter nicht-negativen
Eigenwerten ($\Longleftrightarrow$ positiv semidefinit) gilt:
$\exists Q:$ $QQ=A$.
Ist $A$ nicht quadratisch, so kann man durch Ergänzen von Nullspalten oder
Nullzeilen quadratische Form erreichen und man erhält
6. Satz:  Singulärwertzerlegung.
$\forall A\in\mathbb{C}^{m\times n}$: $\exists U\in\mathbb{C}^{m\times m}$ unitär,
$V\in\mathbb{C}^{n\times n}$ unitär: $A=UDV$, mit $D\in\mathbb{C}^{m\times n}$:
$D=\mathop{\rm row}(\mathop{\rm diag}\sqrt{\lambda_i},0) \lor D=\mathop{\rm col}(\mathop{\rm diag}\sqrt{\lambda_i},0)$,
mit $\lambda_i$ Eigenwerte von $\adj AA$.
The square roots of the eigenvalues of $\adj AA$ are called singular values,
the decomposition $A=UDV$ (as above) a singular value decomposition.
The pseudoinverse can be read off from it directly: $A^+=\adj V D^+\adj U$, where
$D^+\in\mathbb{C}^{n\times m}$ arises from $D$ by transposing, inverting all nonzero values and
keeping the zeros.
7. Theorem:  Hurwitz criterion, Adolf Hurwitz (1859--1919).
Hypotheses: Let $A$ be Hermitian, and let $A\succ0$, $A\succeq0$,
$A\prec0$, $A\preceq0$
denote positive definiteness, positive semidefiniteness, negative definiteness and negative semidefiniteness.
Let $r$ always run from 1 to $n$ and let the multi-index $i=(i_1,\ldots,i_r)$
always be in natural order, i.e. $i_1\lt \cdots\lt i_r$.
Claim:

$$
\def\multisub#1#2{{\textstyle\mskip-3mu{\scriptstyle1\atop\scriptstyle#2_1}{\scriptstyle2\atop\scriptstyle#2_2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#1\atop\scriptstyle#2_#1}}}
\def\multisup#1#2{{\textstyle\mskip-3mu{\scriptstyle#2_1\atop\scriptstyle1}{\scriptstyle#2_2\atop\scriptstyle2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#2_{#1}\atop\scriptstyle#1}}}
\def\multisubsup#1#2#3{{\textstyle\mskip-3mu{\scriptstyle#3_1\atop\scriptstyle#2_1}{\scriptstyle#3_2\atop\scriptstyle#2_2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#3_{#1}\atop\scriptstyle#2_{#1}}}}
\displaylines{
    A\succ0   \iff A_{1\ldots r}^{1\ldots r}\gt 0
              \iff A_{r\ldots n}^{r\ldots n}\gt 0
              \iff A\multisubsup rii\gt 0, \cr
    A\succeq0 \iff A\multisubsup rii\ge0, \cr
    A\prec0   \iff (-1)^r A_{1\ldots r}^{1\ldots r}\gt 0
              \iff (-1)^{n-r} A_{r\ldots n}^{r\ldots n}\gt 0
              \iff (-1)^r A\multisubsup rii\gt 0, \cr
    A\preceq0 \iff (-1)^r A\multisubsup rii\ge0. \cr
}
$$

Proof:  First let $A$ be real symmetric, $A=XDX^{-1}$ with orthogonal $X$, i.e. $X^{-1}=X^\top$,
and let $D$ be the diagonal matrix of the eigenvalues.
We have

$$
    1 = (XX^{-1})_i^i = \sum_\ell X_i^\ell (X^{-1})_\ell^i = \sum_\ell (X_i^\ell)^2,
$$

so not all $X_i^\ell$ can vanish.
Aus $A=XDX^{-1}=X(XD)^\top$ folgt

$$
    A_i^i = \sum_\ell X_i^\ell (XD)_i^\ell = \sum_{k,\ell} X_i^\ell X_i^k D_k^\ell
    = \sum_\ell (X_i^\ell)^2 D_\ell^\ell,
$$

da $D_k^\ell=0$, für $k\ne\ell$.
An dieser Darstellung von $A_i^i$ als Summe von Quadraten liest man nun alles
ab.
Für den Fall einer hermiteschen Matrix führt man die Überlegungen genauso mit
unitärem $X$ $(\adj X=X^{-1})$ unter Beachtung von
$\overline{\det C}=\det\overline C$.
    ☐
8. Bemerkung:  Kann man eine Matrix nicht unitär diagonalisieren
oder treten nicht-lineare Elementarteiler auf, so hat man nicht mehr die
Darstellung als Summe von Quadraten (Summe von Beträgen) und man kann dann
nicht mehr so einfach entscheiden, ob alle Eigenwerte positiv oder dergleichen
sind.
Beispielsweise für die Begleitmatrix zu $(\lambda-1)(\lambda-2)(\lambda-3)=
\lambda^3-6\lambda^2+11\lambda-6$ verschwinden die ersten beiden Hauptminoren.
Ist man nur an dem Vorzeichenverhalten einer Form $\adj xAx$ interessiert,
so kann man das Hurwitz-Kriterium anwenden auf die hermitesche Matrix
${1\over2}(\adj A + A)$.
Es gilt zwar $A\succeq0{\mskip 5mu}\Rightarrow{\mskip 5mu}A_{1\ldots r}^{1\ldots r}\ge0 \land a_{ii}\ge0$,
jedoch die Rückrichtung stimmt nicht, wie man erkennt anhand der Matrix

$$
    A = \pmatrix{0&0&0&1\cr 0&0&0&0\cr 0&0&0&0\cr 1&0&0&0\cr},
$$

mit Eigenwerten 0 (zweifach), $(+1)$ und $(-1)$.
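The two counterexamples of this remark can be verified in exact arithmetic. The sketch below is an illustration: it computes the leading principal minors by Gaussian elimination over `Fraction` and evaluates the form $x^\top Ax$ at the hypothetical test vector $x=(1,0,0,-1)^\top$. The layout chosen for the companion matrix (coefficients in the last row) is an assumption, since the text does not fix a convention.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)          # no pivot in this column: singular
        if p != c:
            M[c], M[p] = M[p], M[c]     # a row swap flips the sign
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d

def leading_minors(A):
    """Determinants of the upper left r x r blocks, r = 1, ..., n."""
    return [det([row[:r] for row in A[:r]]) for r in range(1, len(A) + 1)]

A = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
x = [1, 0, 0, -1]
form = sum(x[i] * A[i][j] * x[j] for i in range(4) for j in range(4))
print(leading_minors(A), form)    # all minors >= 0, but the form is negative

companion = [[0, 1, 0], [0, 0, 1], [6, -11, 6]]   # for z^3 - 6z^2 + 11z - 6
print(leading_minors(companion))  # the first two minors vanish
```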

		


Elementarsymmetrische Polynome
Wed, 31 Jan 2024 21:00:00 +0100

1. Definition:  A polynomial $f(x_1,\ldots,x_n)$ in the indeterminates
$x_1,\ldots,x_n$ is called symmetric
if it remains invariant under every permutation
of the indeterminates.
2. Example:  $f(x_1,x_2)=x_1^2+x_2^2$ and $f(x_1,x_2)=x_1^3+x_2^3$
are symmetric polynomials, since interchanging the roles of $x_1$ and
$x_2$ changes nothing.
3. Besonders wichtig sind die sogenannten elementarsymmetrischen Polynome

$$
\def\tr{\mathop{\rm tr}}
\eqalignno{
    s_1 &= x_1 + x_2 + \cdots + x_n,\cr
    s_2 &= x_1x_2 + x_1x_3 + \cdots + x_1x_n + \cdots + x_{n-1}x_n,\cr
    s_3 &= x_1x_2x_3 + x_1x_2x_4 + \cdots + x_{n-2}x_{n-1}x_n,\cr
    \vdots\:  & \qquad\vdots\qquad\vdots\cr
    s_n &= x_1\ldots x_n.\cr
}
$$

Das Polynom $s_i$ heißt $i$-tes elementarsymmetrisches Polynom.
Die $s_i$ üben gewisse Basisfunktionen im Raum der symmetrischen Polynome aus.
4. Satz:  (Hauptsatz über elementarsymmetrische Polynome)
Zu jedem symmetrischen $n$-stelligen Polynom $f(x_1,\ldots,x_n)$ existiert genau ein
Polynom $F(x_1,\ldots,x_n)$, sodaß $f(x_1,\ldots,x_n)=F(s_1,\ldots,s_n)$,
$\forall x_1,\ldots,x_n$.
Beweis: Siehe László Rédei (1967), Algebra I,
László Rédei (15 November 1900 – 21 November 1980).
Existence: Let $f(x_1,\ldots,x_n)$ be ordered lexicographically with respect to the
exponents and let $q=ax_1^{k_1}\ldots x_n^{k_n}$ be the lexicographically last
term; by symmetry $k_1\ge\cdots\ge k_n$.
Considering

$$
    a s_1^{k_1-k_2} s_2^{k_2-k_3} \ldots{\mskip 3mu} s_{n-1}^{k_{n-1}-k_n} s_n^{k_n},
$$

one sees that the leading term of this expression is

$$
    a x_1^{k_1-k_2} (x_1x_2)^{k_2-k_3} \ldots (x_1\ldots x_{n-1})^{k_{n-1}-k_n}
        (x_1\ldots x_n)^{k_n} \tag{*}
$$

which is obviously equal to $q$.
Also enthält

$$
    f_1(x_1,\ldots,x_n) := f(x_1,\ldots,x_n) - a s_1^{k_1-k_2} \ldots s_n^{k_n}
$$

nur Terme, die lexikographisch vor $q$ kommen, man beachte $(*)$.
$f_1(x_1,\ldots,x_n)$ ist symmetrisch und man wiederholt das Verfahren,
welches irgendwann abbricht, da es nur endlich viele Terme der Form
$b x_1^{\ell_1} \ldots x_n^{\ell_n}$ ($\ell_1\ge\cdots\ge\ell_n$) gibt.
Eindeutigkeit: Ist $f=F_1=F_2$, so ist $F_1-F_2$ identisch gleich Null, also
das Nullpolynom.
    ☐
5. Remark:  $f$ is symmetric, whereas $F$ is in general not symmetric,
as $x_1^2+x_2^2=s_1^2-2s_2$ or $x_1^3+x_2^3=s_1^3-3s_1s_2$ show;
here $s_1=x_1+x_2$, $s_2=x_1x_2$.
The symmetry of $f$ is thus shifted into the symmetry of the basis
polynomials.
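The identities of the remark can be tested on integers. The sketch below is an illustration with arbitrarily chosen sample values: `elementary_symmetric` (a hypothetical helper name) forms $s_k$ as the sum over all products of $k$ distinct arguments, and the power sums are then compared with their expressions in $s_1,s_2$.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(xs, k):
    """k-th elementary symmetric polynomial evaluated at the numbers xs."""
    return sum(prod(c) for c in combinations(xs, k))

x1, x2 = 3, 5
s1 = elementary_symmetric((x1, x2), 1)
s2 = elementary_symmetric((x1, x2), 2)
print(x1**2 + x2**2, s1**2 - 2 * s2)         # both sides of the first identity
print(x1**3 + x2**3, s1**3 - 3 * s1 * s2)    # both sides of the second identity

# F(s1, s2) = s1^2 - 2 s2 is not symmetric in its own arguments:
F = lambda u, v: u**2 - 2 * v
print(F(s1, s2), F(s2, s1))
```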
6. Definition:
Es seien $f(x)=a_0x^m+\cdots+a_m$ und $g(x)=b_0x^n+\cdots+b_n$ zwei Polynome.
Dann nennt man die Determinante

$$
    \def\abc{\phantom{\matrix{\imath_1\cr \imath_1\cr \imath_1\cr}}}
    R = \left|\matrix{
        a_0 & \ldots & a_m\cr
        & \ddots & \ddots & \ddots\cr
        && a_0 & \ldots & a_m\cr
        b_0 & \ldots & b_n\cr
        & \ddots & \ddots & \ddots\cr
        && b_0 & \ldots & b_n\cr
    }\right|
    \eqalign{
        \left.\abc\right\} & \hbox{$n$ Zeilen}\cr
        \left.\abc\right\} & \hbox{$m$ Zeilen}\cr
    }
$$

die Resultante von $f(x)$ und $g(x)$, für
$m,n\ge1$, $a_0\ne0$, $b_0\ne0$.
7. Let $u$ be a common zero, i.e.
$f(u)=0$ and $g(u)=0$.
Then

$$
\eqalign{
    a_0u^{m+n-1} + \cdots + a_mu^{n-1}\qquad &= 0\cr
    \qquad\ddots\qquad\ddots\qquad\ddots\quad & \phantom{=0}\kern-1pt\vdots\cr
    \qquad\qquad a_0u^m + \cdots + a_m &= 0\cr
    b_0u^{n+m-1} + \cdots + b_nu^{m-1}\qquad &= 0\cr
    \qquad\ddots\qquad\ddots\qquad\ddots\quad & \phantom{=0}\kern-1pt\vdots\cr
    \qquad\qquad b_0u^n + \cdots + b_n &= 0\cr
}
$$

This homogeneous linear system has the nontrivial solution vector

$$
    \left(u^{n+m-1}, u^{n+m-2}, \ldots, u^2, u, 1\right)^\top \in \mathbb{C}^{n+m}
$$

Hence: if a common zero $u$ exists, then $R=0$.
The converse also holds: if $R=0$, then there is a common zero.
8. Lemma:  Let $d(x)=\gcd(f(x),g(x))$, where
$\deg f(x)=m\ge1$, $\deg g(x)=n\ge1$.
Then

$$
    d(x)\ne\hbox{const}\iff f(x)g_1(x)+g(x)f_1(x)=0\quad\cases{
        \deg f_1(x)\lt m,&$f_1(x)\ne0$,\cr
        \deg g_1(x)\lt n,&$g_1(x)\ne0$.\cr}
$$

Proof: “$\Rightarrow$”: Clearly $f(x)=d(x)f_1(x)$ and
$g(x)=-d(x)g_1(x)$ with two polynomials $f_1(x)$ and $g_1(x)$ that have all
the properties required above.
“$\Leftarrow$”: see Rédei (1967).
    ☐
9. Theorem:  The resultant $R$ satisfies $f(x)F(x)+g(x)G(x)=R$,
where $\deg F(x)\lt n$ and $\deg G(x)\lt m$.
Proof:  For $j=1,2,\ldots,m+n-1$ add the $j$-th column, multiplied
by $x^{m+n-j}$, to the last, i.e. the $(m+n)$-th, column of $R$, which thereby becomes

$$
    \left(x^{n-1}f(x), \ldots, f(x), {\mskip 5mu} x^{m-1}g(x), \ldots, g(x)\right)^\top
$$

Expanding along the last column and then factoring out $f(x)$ resp. $g(x)$
yields the stated representation.
    ☐
Together with the lemma it follows that $R$ vanishes if and only if $f(x)$ and
$g(x)$ have a common factor.
10. Theorem:  If $f(x)=a_0(x-y_1)\ldots(x-y_m)$ and
$g(x)=b_0(x-z_1)\ldots(x-z_n)$, $m,n\ge1$, then the resultant has
the three representations

$$
    R = a_0^n b_0^m \prod_{1\le k\le m} \prod_{1\le\ell\le n}
        (y_k-z_\ell)
      = a_0^n \prod_{1\le k\le m} g(y_k)
      = (-1)^{mn} b_0^m \prod_{1\le\ell\le n} f(z_\ell).
$$
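
The three representations can be checked numerically on a small example. The following is only a sketch: the helper names (`det`, `sylvester`, `poly_from_roots`, `evaluate`) and the sample polynomials are assumptions chosen for illustration, and the determinant uses plain cofactor expansion, which is adequate only for small matrices.

```python
from math import prod

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def poly_from_roots(lead, roots):
    """Coefficients (highest power first) of lead * (x - r_1) ... (x - r_k)."""
    coeffs = [lead]
    for r in roots:
        # multiply the current polynomial by (x - r)
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given highest power first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def sylvester(f, g):
    """Matrix of the resultant: n shifted rows of f, then m shifted rows of g."""
    m, n = len(f) - 1, len(g) - 1
    rows = [[0] * i + f + [0] * (n - 1 - i) for i in range(n)]
    rows += [[0] * i + g + [0] * (m - 1 - i) for i in range(m)]
    return rows

# hypothetical sample data: f = 2(x-1)(x-2), g = 3(x-3)(x+1)(x-5)
a0, ys = 2, [1, 2]
b0, zs = 3, [3, -1, 5]
f, g = poly_from_roots(a0, ys), poly_from_roots(b0, zs)
m, n = len(ys), len(zs)

R = det(sylvester(f, g))                                  # determinant definition
R1 = a0 ** n * b0 ** m * prod(y - z for y in ys for z in zs)
R2 = a0 ** n * prod(evaluate(g, y) for y in ys)
R3 = (-1) ** (m * n) * b0 ** m * prod(evaluate(f, z) for z in zs)
print(R, R1, R2, R3)
```

All four printed numbers should agree; exact integer arithmetic avoids rounding issues.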

11. The Newton identities.
Newton, Sir Isaac (1643--1727), Urbain Le Verrier (1811--1877).
For a matrix $A\in\mathbb{C}^{n\times n}$ with eigenvalues $\lambda_i$, let
$p_k=\sum\lambda_i^k=\tr A^k$, and let the characteristic polynomial be
$f(x)=x^n+c_1x^{n-1}+\cdots+c_n=(x-\lambda_1)\ldots(x-\lambda_n)$.
By the product rule,

$$
    f'(x) = {f(x)\over x-\lambda_1} + \cdots + {f(x)\over x-\lambda_n},
$$

and polynomial division verifies

$$
    f(x):(x-\lambda) = x^{n-1} + (\lambda+c_1)x^{n-2} +
        (\lambda^2+c_1\lambda+c_2)x^{n-3} + \cdots +
        (\lambda^{n-1}+c_1\lambda^{n-2}+\cdots+c_{n-1}).
$$

Summation yields $f'(x)=nx^{n-1}+(p_1+nc_1)x^{n-2}+(p_2+c_1p_1+nc_2)x^{n-3}
+\cdots+(p_{n-1}+c_1p_{n-2}+\cdots+nc_{n-1})$.
Comparing coefficients with $f'(x)=nx^{n-1}+(n-1)c_1x^{n-2}+\cdots+2c_{n-2}x+c_{n-1}$
gives

$$
\eqalignno{
    &p_1 + c_1 = 0\cr
    &p_2 + c_1p_1 + 2c_2 = 0\cr
    &\qquad\vdots\qquad\qquad\ddots\cr
    &p_{n-1} + c_1p_{n-2} + \cdots + c_{n-2}p_1 + (n-1)c_{n-1} = 0\cr
}
$$

and $\lambda_1^k f(\lambda_1) + \cdots + \lambda_n^k f(\lambda_n) = 0$
gives

$$
    p_{n+k} + c_1p_{n-1+k} + \cdots + c_{n-1}p_{1+k} + c_np_k = 0,
    \qquad k=0,1,2,\ldots,\quad p_0:=n.
$$
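
Read as a recursion, the identities determine $c_1,\ldots,c_n$ from the power sums $p_1,\ldots,p_n$ (the idea behind Le Verrier's method). The sketch below illustrates this with exact fractions; the function names and the sample eigenvalues are assumptions made for illustration only.

```python
from fractions import Fraction

def power_sums(eigenvalues, count):
    """Return [p_1, ..., p_count] with p_k = sum of lambda_i^k."""
    return [sum(lam ** k for lam in eigenvalues) for k in range(1, count + 1)]

def coefficients_from_power_sums(p):
    """Solve p_k + c_1 p_{k-1} + ... + c_{k-1} p_1 + k c_k = 0 for c_k, k = 1..n."""
    c = []
    for k in range(1, len(p) + 1):
        s = p[k - 1] + sum(c[j - 1] * p[k - j - 1] for j in range(1, k))
        c.append(Fraction(-s, k))
    return c

# hypothetical sample spectrum: f(x) = (x-1)(x-2)(x-3) = x^3 - 6x^2 + 11x - 6
eigenvalues = [1, 2, 3]
n = len(eigenvalues)
p = power_sums(eigenvalues, n + 2)
c = coefficients_from_power_sums(p[:n])

# residuals of p_{n+k} + c_1 p_{n+k-1} + ... + c_n p_k for k = 1, 2
residuals = [p[n + k - 1] + sum(c[j - 1] * p[n + k - j - 1] for j in range(1, n + 1))
             for k in (1, 2)]
print(c, residuals)
```

The division by $k$ is the reason for using `Fraction`: over fields of positive characteristic this recursion may break down.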


		


The Exterior Product and Determinants
Tue, 30 Jan 2024 14:20:00 +0100


1. The exterior product
2. Definition of a determinant
3. Properties of a determinant
4. The Laplace expansion theorem
5. Further consequences of the Cauchy/Binet theorem

1. The exterior product
There are many ways to introduce determinants.
One way is via the exterior product.
The following exposition closely follows the book Matrizenrechnung
by Wolfgang Gröbner (1966).
Let $K$ be an arbitrary field.
Every vector of an $n$-dimensional vector space over $K$ can be
written as a linear combination of the basis vectors (called units
in what follows)

$$
\eqalign{
    a &= a_1\varepsilon_1+a_2\varepsilon_2+\cdots+a_n\varepsilon_n,\cr
    b &= b_1\varepsilon_1+b_2\varepsilon_2+\cdots+b_n\varepsilon_n,\cr
}
\qquad a_i, b_i\in K.
$$

The exterior product (symbol $\land$) is first defined for the
units:

$$
    \varepsilon_i\land\varepsilon_k := \varepsilon_{ik} = -\varepsilon_{ki}
$$

For arbitrary vectors it is extended bilinearly, so that

$$
    a\land b = \sum a_ib_k(\varepsilon_i\land\varepsilon_k)
    = \sum a_ib_k\varepsilon_{ik} = \sum_{i\lt k} (a_ib_k - a_kb_i)\varepsilon_{ik}
$$

Hence

$$
    a\land b=-(b\land a)
$$

and in particular

$$
\displaylines{
    a\land a=0, \qquad
    (\lambda a)\land b = a\land(\lambda b) = \lambda\cdot(a\land b),\cr
    a\land(b+c) = (a\land b)+(a\land c), \qquad
    (b+c)\land a = (b\land a)+(c\land a).\cr
}
$$

In $\mathbb{C}^3$ the exterior product can be given an intuitive
meaning.
If one identifies

$$
    \varepsilon_{12}=\varepsilon_3, \quad
    \varepsilon_{23}=\varepsilon_1, \quad
    \varepsilon_{31}=\varepsilon_2,
$$

so that the units of higher order again lie in the original
vector space, then in this case the exterior product, which is
also called the vector product (notation: $a\times b$), satisfies

$$
    a\land b = a\times b = (a_2b_3-a_3b_2)\varepsilon_1
        +(a_3b_1-a_1b_3)\varepsilon_2+(a_1b_2-a_2b_1)\varepsilon_3.
$$

The generalization to the exterior product of vectors of higher order
follows the rule

$$
    \varepsilon_{i_1}\land\varepsilon_{i_2}\land\cdots\land\varepsilon_{i_k}
    := \varepsilon_{i_1i_2\ldots i_k},
$$

and correspondingly

$$
    \varepsilon_{i_1i_2\ldots i_k}\land\varepsilon_{j_1j_2\ldots j_\ell} =
    \varepsilon_{i_1}\land\varepsilon_{i_2}\land\cdots\land\varepsilon_{i_k}
       \: \land\: \varepsilon_{j_1}\land\varepsilon_{j_2}\land\cdots\land\varepsilon_{j_\ell}.
$$

A vector of order $k$ is, in general, a linear form in
the $n\choose k$ units of order $k$.
Sum, difference and inner product of such vectors are defined by the usual
rules of algebra.
Hence vectors of the same order may be multiplied by scalars
and added arbitrarily.
1. Theorem:  If $a_1,a_2,\ldots,a_k$ are vectors of first order, then
the exterior product does not change when a linear combination of the
remaining vectors is added to one of them, say $a_1$:

$$
    a_1\land a_2\land\cdots\land a_k = (a_1+\lambda_2a_2+\cdots+\lambda_ka_k)
        \land a_2\land\cdots\land a_k,\qquad\forall\lambda_2,\ldots,\lambda_k\in K.
$$

The proof follows by directly multiplying out the right-hand side.
All summands except the first vanish, because in every
product other than the first two equal vectors are
multiplied exteriorly.
$
\def\multisub#1#2{{\textstyle\mskip-3mu{\scriptstyle1\atop\scriptstyle#2_1}{\scriptstyle2\atop\scriptstyle#2_2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#1\atop\scriptstyle#2_#1}}}
\def\multisup#1#2{{\textstyle\mskip-3mu{\scriptstyle#2_1\atop\scriptstyle1}{\scriptstyle#2_2\atop\scriptstyle2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#2_{#1}\atop\scriptstyle#1}}}
\def\multisubsup#1#2#3{{\textstyle\mskip-3mu{\scriptstyle#3_1\atop\scriptstyle#2_1}{\scriptstyle#3_2\atop\scriptstyle#2_2}{\scriptstyle\ldots\atop\scriptstyle\ldots}{\scriptstyle#3_{#1}\atop\scriptstyle#2_{#1}}}}
\def\diag{\mathop{\rm diag}}
\def\tridiag{\mathop{\rm tridiag}}
\def\col{\mathop{\rm col}}
\def\row{\mathop{\rm row}}
\def\dcol{\mathop{\rm col\vphantom {dg}}}
\def\drow{\mathop{\rm row\vphantom {dg}}}
\def\rank{\mathop{\rm rank}}
\def\grad{\mathop{\rm grad}}
\def\adj#1{#1^*}
\def\iadj#1{#1^*}
\def\tr{\mathop{\rm tr}}
\def\mapright#1{\mathop{\longrightarrow}\limits^{#1}}
\def\fracstrut{}
$
2. Definition of a determinant
1. While the product of $k$ vectors of first order has
$n\choose k$ components altogether, the product of $n$
vectors has only a single component.
This component is called a determinant.

$$
    a_1\land a_2\land\cdots\land a_n = \sum a_{1i_1}a_{2i_2}\ldots a_{ni_n}
        \varepsilon_{i_1}\land\varepsilon_{i_2}\land\cdots\land\varepsilon_{i_n}.
$$

All terms that contain the product of two units with the same index
vanish.
For the determinant one writes

$$
    \left|A\right|=\left|a_{ik}\right|=\sum\pm a_{1i_1}a_{2i_2}\cdots a_{ni_n},
$$

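where the sign $\pm$ is the sign of the permutation $(i_1,\ldots,i_n)$ of $(1,\ldots,n)$: $+$ for an even and $-$ for an odd permutation.
2. Example:  $n=2$: One has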

$$
    \left|\matrix{a_{11}&a_{12}\cr a_{21}&a_{22}\cr}\right| =
    a_{11}a_{22} - a_{21}a_{12}.
$$

$n=3$: Here $\left|A\right|$ evaluates to

$$
    a_{11}a_{22}a_{33}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32}
    -a_{31}a_{22}a_{13}-a_{32}a_{23}a_{11}-a_{33}a_{21}a_{12}.
$$

Because of the large number of summands, namely $n!$ (each summand is
a product of $n$ factors), one normally uses determinant rules for the actual
computation of determinants from $n\ge3$ on.
3. With the help of determinants, the products of fewer than
$n$ vectors can also be written out more explicitly.
The product of

$$
    a = a_1\varepsilon_1+a_2\varepsilon_2+\cdots+a_n\varepsilon_n,\qquad
    b = b_1\varepsilon_1+b_2\varepsilon_2+\cdots+b_n\varepsilon_n,
$$

is

$$
    a\land b = \sum_{i\lt k} \left|a_i,b_k\right|\varepsilon_{ik},
$$

where $\left|a_i,b_k\right|=\left|{a_i,a_k\atop b_i,b_k}\right|$.
For a further, third vector

$$
    c=c_1\varepsilon_1+c_2\varepsilon_2+\cdots+c_n\varepsilon_n,
$$

one has

$$
    a\land b\land c=\sum_{i\lt j\lt k}\left|a_i,b_j,c_k\right|\varepsilon_{ijk},
$$

with

$$
    \left|a_i,b_j,c_k\right|=\left|\matrix{
        a_i & a_j & a_k\cr  b_i & b_j & b_k\cr  c_i & c_j & c_k\cr
    }\right|
$$

3. Properties of a determinant
1. Remark:  The following hold:
(1) The determinant of a square matrix $A=(a_{ik})$

$$
    \left|A\right|=\left|a_{ik}\right|=\sum\pm a_{1i_1}a_{2i_2}\cdots a_{ni_n},
$$

is a homogeneous linear function of the elements of each row and of
each column.
(2) A determinant changes its sign if two rows or two
columns are interchanged.
(3) Properties (1) and (2) characterize the determinant.
Up to normalization by a scalar there are no other
multilinear alternating forms of this kind.
2. Theorem:  If $\Phi\colon \mathop{\rm GL}(n,K)\rightarrow K^\times$ is a
map with $\Phi(AB)=\Phi(A)\Phi(B)$ for all $A,B\in \mathop{\rm GL}(n,K)$, then
there is a map $\varphi\colon K^\times\rightarrow K^\times$ with
$\varphi(\alpha\beta)=\varphi(\alpha)\varphi(\beta)$ for all
$\alpha,\beta\in K^\times$, and $\Phi(A)=\varphi(\det A)$ for all
$A\in \mathop{\rm GL}(n,K)$.
Proof: see Max Koecher (1985), p. 119.
    ☐
3. See Wolfgang Gröbner (1966).
Let $A=(a_{ik})$ and $B=(b_{ik})$ be two square matrices with $n$ rows,
and let $C=(c_{ik})=AB$ be the product matrix.
Then $c_{ik}=\sum_j a_{ij}b_{jk}$.
The row vectors of $C$ are

$$
    c_i = \sum_k c_{ik}\varepsilon_k = \sum_{j,k} a_{ij}b_{jk}\varepsilon_k
        = \sum_j a_{ij}b_j,
$$

where $b_j=\sum b_{jk}\varepsilon_k$ denote the row vectors of $B$.
Now

$$
\eqalign{
    c_1\land c_2\land\cdots\land c_n &= \left|C\right|\varepsilon_{12\ldots n}\cr
    &= (a_{11}b_1+a_{12}b_2+\cdots+a_{1n}b_n)\land\cdots\land
       (a_{n1}b_1+a_{n2}b_2+\cdots+a_{nn}b_n)\cr
    &= \left|A\right| b_1\land b_2\land\cdots\land b_n
     = \left|A\right| \left|B\right| \varepsilon_{12\ldots n}.
}
$$

Comparing the first and the last line gives
$\left|C\right| = \left|A\right| \left|B\right|$, i.e.
$\left|AB\right| = \left|A\right| \left|B\right|$.
4. The determinant product theorem derived above, and ultimately also the
canonical scalar product, is a special case of the formula of
Cauchy/Binet, also called the determinant product theorem for rectangular
matrices. Cauchy, Augustin Louis (1789--1857),
Binet, Jacques Philippe Marie (1786--1856).
Let $A=(a_{ik})$ be an $m\times n$ matrix and $B=(b_{k\ell})$ an
$n\times s$ matrix.
Their product $AB=C=(c_{i\ell})$ is an $m\times s$ matrix with the
entries

$$
    c_{i\ell} = \sum_k a_{ik}b_{k\ell},\qquad i=1,\ldots,m,\quad \ell=1,\ldots,s.
$$

Each row vector $c_i$ of $C$ is

$$
    c_i = \sum_\ell c_{i\ell}\varepsilon_\ell
        = \sum_{k,\ell} a_{ik}b_{k\ell}\varepsilon_\ell
        = \sum_k a_{ik}b_k,
$$

with the row vectors $b_k$ of the matrix $B$ given by
$b_k = \sum_\ell b_{k\ell}\varepsilon_\ell$.
Now

$$
\eqalign{
    c_1\land c_2\land\cdots\land c_m
    &= \sum_\ell C\multisup m\ell \varepsilon_{\ell_1\ell_2\ldots\ell_m}\cr
    &= \sum_k A\multisup mk (b_{k_1}\land b_{k_2}\land\cdots\land b_{k_m})\cr
    &= \sum_{k,\ell} A\multisup mk B\multisubsup mk\ell \varepsilon_{\ell_1\ell_2\ldots\ell_m},\cr
}
$$

Comparing the coefficients of
$\varepsilon_{\ell_1\ell_2\ldots\ell_m}$ gives

$$
    \sum_k A\multisup mk B\multisubsup mk\ell
    = C\multisup m\ell
$$

5. This formula can be generalized slightly by evaluating, instead of
$c_1\land c_2\land\cdots\land c_m$, the exterior product of any
$r$ row vectors $c_{i_1}\land c_{i_2}\land\cdots\land c_{i_r}$ in
exactly the same way:

$$
    \sum_k A\multisubsup rik B\multisubsup rk\ell
    = C\multisubsup ri\ell
$$

In words: every $r$-rowed minor of the product matrix can be written as a
sum of products of $r$-rowed minors of $A$ and $B$, combined such that
the column indices of the first agree with the
row indices of the second, while the row indices of the
first and the column indices of the second agree with the corresponding indices
in the product matrix.
6. We now examine special cases of the above formula.
If $r=m=s$ (so $C$ is square), one has

$$
    \left|C\right| = \sum_k A\multisup mk B\multisub mk
$$

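If $n\lt m$, this sum is empty and $\left|C\right|=0$.
In the special case $B=A^\top$ the entries of $C$ are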

$$
    c_{i\ell} = \sum_k a_{ik}a_{\ell k} = a_i\cdot a_\ell,
$$

with the row vectors $a_i$ of the matrix $A$.
On the other hand $B\multisub mk = A\multisup mk$,
and together with

$$
    a_{i_1}\land a_{i_2}\land\cdots\land a_{i_r}
    = \sum_k A\multisubsup rik \varepsilon_{k_1k_2\ldots k_r},
$$

one obtains

$$
    \left|AA^\top\right| = \left|a_i\cdot a_k\right|
    = \sum \left(A\multisup mk\right)^2
    = \left|a_1\land a_2\land\cdots\land a_m\right|^2.
    \tag{*}
$$

Applying this formula with $m=2$ and
$A={a_1a_2\ldots a_n\choose b_1b_2\ldots b_n}$ yields the formula of Lagrange,
Lagrange, Joseph Louis (1736--1813)

$$
    \left|a\times b\right|^2 = \sum_{i\lt k} \left(a_ib_k-a_kb_i\right)^2
    = \left(\sum a_i^2\right) \left(\sum b_k^2\right) - \left(\sum a_ib_k\right)^2
    = \left|a\right|^2 \left|b\right|^2 - (ab)^2.
$$
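
Lagrange's formula is easy to confirm on concrete vectors. The sketch below computes the components $a_ib_k-a_kb_i$ of the exterior product and compares both sides; the function names `wedge` and `dot` and the sample vectors are assumptions made for illustration.

```python
from itertools import combinations

def wedge(a, b):
    """Components a_i b_k - a_k b_i (i < k, 1-based index pairs) of the exterior product."""
    return {(i + 1, k + 1): a[i] * b[k] - a[k] * b[i]
            for i, k in combinations(range(len(a)), 2)}

def dot(u, v):
    """Canonical scalar product."""
    return sum(x * y for x, y in zip(u, v))

# hypothetical sample vectors in dimension 4
a = [1, 2, 3, 4]
b = [2, 0, -1, 5]

components = wedge(a, b)
lhs = sum(value ** 2 for value in components.values())   # sum of squared 2x2 minors
rhs = dot(a, a) * dot(b, b) - dot(a, b) ** 2              # |a|^2 |b|^2 - (ab)^2
print(lhs, rhs)
```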

If $a_1,a_2,\ldots,a_m$ are pairwise orthogonal, i.e.

$$
    a_i\cdot a_k = \cases{0,& for $i\ne k$,\cr \left|a_i\right|^2,& for $i=k$,}
$$

then it follows immediately from $(*)$ that

$$
    \left|a_1\land a_2\land\cdots\land a_m\right| =
    \left|a_1\right|\cdot\left|a_2\right|\ldots\left|a_m\right|.
$$

In words: the absolute value of the exterior product of pairwise orthogonal
vectors equals the product of their absolute values.
This is the intuitive meaning of the scalar triple product.
The volume spanned by pairwise orthogonal vectors
equals the product of the edge lengths.
4. The Laplace expansion theorem
1. See Wolfgang Gröbner (1966).
Let $(i_1,\ldots,i_r)$ and $(i'_1,\ldots,i'_s)$ be mutually complementary
arrangements, i.e. $r+s=n$,

$$
    i_1 \lt  i_2 \lt  \cdots \lt  i_r, \qquad i'_1 \lt  i'_2 \lt  \cdots \lt  i'_s,
$$

and $(i_1,\ldots,i_r, i'_1,\ldots,i'_s)$ is a permutation of
$(1,2,\ldots,n)$, so $s=n-r$.
Complementary ordered arrangements $(i_1,\ldots,i_r, i'_1,\ldots,i'_s)$ need

$$
    (i_1-1)+(i_2-2)+\cdots+(i_r-r) = i_1+i_2+\cdots+i_r - {r\over2}(r+1)
$$

transpositions to reach the natural order $(1,\ldots,n)$.
Grouping the row vectors of $A$, one computes

$$
\eqalignno{
    \left|A\right| \varepsilon_{1\ldots n} &= a_1\land\cdots\land a_n \cr
    &= (-1)^p \left(a_{i_1}\land\cdots\land a_{i_r}\right) \land
         \left(a_{i'_1}\land\cdots\land a_{i'_{n-r}}\right) \cr
    &= (-1)^p \left(\sum_k A\multisubsup rik \varepsilon_{k_1\ldots k_r}\right) \land
         \left(\sum_k A\multisubsup {n-r}{i'}{k'} \varepsilon_{k'_1\ldots k'_{n-r}}\right) \cr
    &= \sum_k (-1)^{m+p} A\multisubsup rik A\multisubsup {n-r}{i'}{k'}
         \varepsilon_{1\ldots n}. \cr
}
$$

with

$$
    p = i_1+\cdots+i_r - {r\over2}(r+1),\qquad
    m = k_1+\cdots+k_r - {r\over2}(r+1),
$$

and we used

$$
    \varepsilon_{k_1\ldots k_r}\land\varepsilon_{k'_1\ldots k'_{n-r}} =
    \varepsilon_{k_1\ldots k_r k'_1\ldots k'_{n-r}} =
    (-1)^m \varepsilon_{1\ldots n},
$$

or more generally

$$
    \varepsilon_{k_1\ldots k_r}\land\varepsilon_{\nu_1\ldots\nu_{n-r}} =
    \varepsilon_{k_1\ldots k_r\nu_1\ldots\nu_{n-r}} = \cases{
        (-1)^m \varepsilon_{1\ldots n}, & if $\nu_1=k'_1,\ldots,\nu_{n-r}=k'_{n-r}$,\cr
        0, & otherwise.\cr
    }
$$

To simplify the notation, one defines the algebraic
complement $\alpha\multisubsup rik$ by

$$
    \alpha\multisubsup rik := (-1)^{i_1+\cdots+i_r+k_1+\cdots+k_r}
        A\multisubsup {n-r}{i'}{k'}.
$$

Instead of algebraic complement one also says adjunct of the minor
$A\multisubsup rik$.
With this notation one obtains the
2. Theorem:  General Laplace expansion theorem.
Laplace, Pierre Simon (1749--1827).
The value of the $n$-rowed determinant $\left|A\right|$,
expanded along the rows $i_1,i_2,\ldots,i_r$
($1\le i_1\lt i_2\lt\cdots\lt i_r\le n$), respectively along the columns
$k_1,k_2,\ldots,k_r$ ($1\le k_1\lt k_2\lt\cdots\lt k_r\le n$), is

$$
\eqalignno{
    \left|A\right| &= \sum_k A\multisubsup rik \alpha\multisubsup rik,\cr
    \left|A\right| &= \sum_i A\multisubsup rik \alpha\multisubsup rik.\cr
}
$$

The sums extend over all $n\choose r$ combinations $(k_1,\ldots,k_r)$,
respectively $(i_1,\ldots,i_r)$.
3. By what was derived above, the following slightly more general statement obviously holds

$$
    \sum_k A\multisubsup rik \alpha\multisubsup r\ell k = \cases{
        \left|A\right|, & if $i_\nu=\ell_\nu$ $\forall\nu$,\cr
        0, & otherwise.\cr
    }
$$

For $r=1$ one obtains the usual expansion along a row or column,
in particular

$$
    \pmatrix{a_{11} & \ldots & a_{1n}\cr \vdots & \ddots & \vdots\cr a_{n1} & \ldots & a_{nn}\cr}
    \pmatrix{\alpha_1^1 & \ldots & \alpha_n^1\cr \vdots & \ddots & \vdots\cr
        \alpha_1^n & \ldots & \alpha_n^n\cr} =
    \pmatrix{\left|A\right| && 0\cr &\ddots&\cr 0&&\left|A\right|\cr}.
$$

As usual for $\xi_i^j$: $i$ is the row index and $j$ the column index,
so for $(\alpha)$ this is the transposed matrix.
This gives an explicit description of the inverse matrix, namely
$\alpha_i^j / \left|A\right|$ for the $(j,i)$ element of the inverse.
4. Theorem: (minors of the inverse) Let $B=A^{-1}$, where $A$
is invertible.
Every minor of the inverse can be expressed through the adjunct of the
original matrix:

$$
    B\multisubsup rik = {\alpha\multisubsup rik\over\left|A\right|}
    = {(-1)^m\over\left|A\right|} A\multisubsup {n-r}{i'}{k'},
    \qquad m = i_1+\cdots+i_r + k_1+\cdots+k_r.
$$

Proof:  By Cauchy/Binet,

$$
    \sum_k A\multisubsup rik B\multisubsup rk\ell = \cases{
        1, & if $i_\nu=\ell_\nu$ $\forall\nu$,\cr
        0, & otherwise.\cr} \tag{*}
$$

By the Laplace expansion theorem,

$$
    \sum_k A\multisubsup rik \alpha\multisubsup r\ell k = \cases{
        \left|A\right|, & if $i_\nu=\ell_\nu$ $\forall\nu$,\cr
        0, & otherwise.\cr}
$$

Both $(A\multisubsup rik)_k$ and $(\alpha\multisubsup rk\ell)_k$ are
matrices with ${n\choose r}={n\choose n-r}$ rows and columns.
By $(*)$, $(B\multisubsup rk\ell)_k$ is obviously the inverse, and so
is $\alpha\multisubsup rik / \left|A\right|$.
Since inverses are uniquely determined, equality follows.
    ☐
5. Example:  For Cauchy/Binet, the Laplace
expansion theorem and the minors of the inverse alike.
Let

$$
    A = \pmatrix{
        13 & 14 & 6 & 4\cr
        8 & -1 & 13 & 9\cr
        6 & 7 & 3 & 2\cr
        9 & 5 & 16 & 11\cr
    }, \qquad
    A^{-1} = \pmatrix{
        1 & 0 & -2 & 0\cr
        -5 & 1 & 11 & -1\cr
        287 & -67 & -630 & 65\cr
        -416 & 97 & 913 & -94\cr
    }.
$$

(1) The determinant of $A$ can be computed, for example, as follows:

$$
    \left|A\right| = A_{12}^{12} A_{34}^{34} - A_{12}^{13} A_{34}^{24}
        + A_{12}^{14} A_{34}^{23} + A_{12}^{23} A_{34}^{14}
        - A_{12}^{24} A_{34}^{13} + A_{12}^{34} A_{34}^{12}
    = 1.
$$

Here, unlike in the Laplace expansion along a single
row (or column), the sign need not alternate from one term to the
next.
(2) With $B:=A^{-1}$ we have $AB=:C=I$. So by Cauchy/Binet, as above with $4\choose2$ summands,

$$
    C_{23}^{34} = \left|\matrix{0&0\cr 1&0\cr}\right| =
    A_{23}^{12} B_{12}^{34} + A_{23}^{13} B_{13}^{34} + A_{23}^{14} B_{14}^{34}
        + A_{23}^{23} B_{23}^{34} + A_{23}^{24} B_{24}^{34} + A_{23}^{34} B_{34}^{34}
    = 0.
$$

(3) For the minor $B_{12}^{34}$ of the inverse $B$ one computes

$$
    B_{12}^{34} = \left|\matrix{-2&0\cr 11&-1\cr}\right|
    = {(-1)^{10}\over1} A_{12}^{34} = \left|\matrix{6&4\cr 13&9\cr}\right| = 2,
$$

likewise

$$
    B_{23}^{24} = \left|\matrix{1&-1\cr -67&65\cr}\right|
    = (-1)^{11} A_{13}^{14} = -\left|\matrix{13&4\cr 6&2\cr}\right| = -2.
$$
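
The three parts of the example can be re-checked mechanically. The following sketch uses the matrices $A$ and $A^{-1}$ given above; the helper names (`det`, `minor`, `complement`) are assumptions for illustration. For the minors of the inverse, the complementary minor of $A$ is taken with rows $k'$ and columns $i'$, which is the index placement used in the numbers of part (3).

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def minor(M, rows, cols):
    """Minor with the given 1-based row and column indices."""
    return det([[M[i - 1][k - 1] for k in cols] for i in rows])

def complement(index, n):
    """Complementary increasing arrangement in (1, ..., n)."""
    return tuple(j for j in range(1, n + 1) if j not in index)

A = [[13, 14, 6, 4], [8, -1, 13, 9], [6, 7, 3, 2], [9, 5, 16, 11]]
B = [[1, 0, -2, 0], [-5, 1, 11, -1], [287, -67, -630, 65], [-416, 97, 913, -94]]
n, r = 4, 2
pairs = list(combinations(range(1, n + 1), r))

# (1) Laplace expansion of |A| along rows 1 and 2
rows = (1, 2)
det_A = sum((-1) ** (sum(rows) + sum(k)) * minor(A, rows, k)
            * minor(A, complement(rows, n), complement(k, n)) for k in pairs)

# (2) Cauchy/Binet for C = AB = I: every 2-rowed minor of C
cauchy_binet_ok = all(
    sum(minor(A, i, k) * minor(B, k, l) for k in pairs) == (1 if i == l else 0)
    for i in pairs for l in pairs)

# (3) minors of the inverse expressed by complementary minors of A
inverse_minors_ok = all(
    minor(B, i, k) * det_A
    == (-1) ** (sum(i) + sum(k)) * minor(A, complement(k, n), complement(i, n))
    for i in pairs for k in pairs)

print(det_A, cauchy_binet_ok, inverse_minors_ok)
```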

5. Further consequences of the Cauchy/Binet theorem
Because of its great importance, a further proof of the determinant multiplication
theorem of Cauchy/Binet is given, one that does not refer
to the exterior product.
1. Theorem:  (Cauchy/Binet theorem) Let $C=AB$.
Then $C_{1\ldots r}^{1\ldots r} = \sum_i A\multisup ri B\multisub ri$.
Proof: (of Cauchy/Binet)
see Gantmacher, Felix R. (1908--1964), Gantmacher (1986).
One computes
$$
\eqalignno{
    \left|\matrix{
        c_{11} & \ldots & c_{1r}\cr
        \vdots & \ddots & \vdots\cr
        c_{r1} & \ldots & c_{rr}\cr
    }\right| &=
    \left|\matrix{
        \sum_{i_1=1}^n a_{1i_1}b_{i_11} & \ldots & \sum_{i_r=1}^n a_{1i_r}b_{i_rr}\cr
        \vdots & \ddots & \vdots\cr
        \sum_{i_1=1}^n a_{ri_1}b_{i_11} & \ldots & \sum_{i_r=1}^n a_{ri_r}b_{i_rr}\cr
    }\right| &\cr &= \sum_{i_1,\ldots,i_r=1}^n \left|\matrix{
        a_{1i_1}b_{i_11} & \ldots & a_{1i_r}b_{i_rr}\cr
        \vdots & \ddots & \vdots\cr
        a_{ri_1}b_{i_11} & \ldots & a_{ri_r}b_{i_rr}\cr
    }\right| &\cr &= \sum_{i_1,\ldots,i_r=1}^n A\multisup ri b_{i_11}\ldots b_{i_rr}. &\cr
}
$$

Among all $n^r$ summands only $n(n-1)\ldots(n-r+1)={n\choose r}r!$
summands are of interest, namely those in which the minors $A\multisup ri$ do not contain two,
three, $\ldots$, $r$ equal columns.
Of these ${n\choose r}r!$ summands only $n\choose r$ are genuinely different;
the remaining ones arise merely by permuting columns.
Hence one continues

$$
\eqalignno{
    &\phantom{{}={}} \sum_{1\le i_1\lt \cdots\lt i_r\le n} \:  \sum_{(\nu_1,\ldots,\nu_r)\in{\rm Perm}(i_1,\ldots,i_r)}
        \sigma(\nu_1,\ldots,\nu_r) A\multisup ri b_{\nu_11}\ldots b_{\nu_rr} \cr
    &= \sum_{1\le i_1\lt \cdots\lt i_r\le n} A\multisup ri \sum \sigma(\nu_1,\ldots,\nu_r)
        b_{\nu_11}\ldots b_{\nu_rr} \cr
    &= \sum_{1\le i_1\lt \cdots\lt i_r\le n} A\multisup ri B\multisub ri . \cr
}
$$

    ☐
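
For rectangular factors the theorem can be illustrated by brute force: sum the products of corresponding minors over all column selections of $A$ (equal to row selections of $B$). The sketch below assumes small integer sample matrices and hypothetical helper names; it is an illustration, not an efficient method.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# hypothetical sample data: A is 2x3, B is 3x2, so C = AB is 2x2
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
r, n = len(A), len(B)

lhs = det(matmul(A, B))
rhs = sum(
    det([[A[i][k] for k in cols] for i in range(r)])      # minor of A, columns `cols`
    * det([[B[k][j] for j in range(r)] for k in cols])    # minor of B, rows `cols`
    for cols in combinations(range(n), r))
print(lhs, rhs)
```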
2. For more than two matrices the Cauchy/Binet theorem reads
as follows

$$
\eqalignno{
    (AB)_i^j &= \sum_k A_i^k B_k^j, \cr
    (ABC)_i^j &= \sum_{k,\ell} A_i^k B_k^\ell C_\ell^j, \cr
    (ABCD)_i^j &= \sum_{k,\ell,m} A_i^k B_k^\ell C_\ell^m D_m^j, \cr
    (ABCDE)_i^j &= \sum_{k,\ell,m,p} A_i^k B_k^\ell C_\ell^m D_m^p E_p^j. \cr
}
$$

3. Let

$$
    {\cal A}_p := (A\multisubsup pik)_{i_1\lt \cdots\lt i_p,{\mskip 3mu}k_1\lt \cdots\lt k_p}
    \in \mathbb{C}[\textstyle{{n\choose p}\times{n\choose p}}]
$$

be the $p$-th associated matrix of $A$.
The arrangements are traversed in lexicographic order.
For example, for a $4\times4$ matrix $A$ and $p=2$ one obtains the $6\times6$ matrix

$$
    {\cal A}_2 = \pmatrix{
        A_{12}^{12} & A_{12}^{13} & \ldots & A_{12}^{34}\cr
        \vdots      & \vdots      & \ddots & \vdots\cr
        A_{34}^{12} & A_{34}^{13} & \ldots & A_{34}^{34}\cr
    }
$$

A reformulation of the Cauchy/Binet theorem is:
$C=AB$ implies ${\cal C}_p = {\cal A}_p {\cal B}_p$, $p=1,2,\ldots,n$.
In particular: $B=A^{-1}$ implies ${\cal B}_p = {\cal A}_p^{-1}$,
$p=1,2,\ldots,n$.
4. Theorem:  Let $A=(a_{ij})_{i,j=1}^n$ and

$$
    \left|A-\lambda I\right| = (-\lambda)^n + c_{n-1}(-\lambda)^{n-1} +
        c_{n-2}(-\lambda)^{n-2} + \cdots + c_1(-\lambda) + c_0.
$$

Then

$$
    c_{n-1} = \sum_{1\le i\le n} a_{ii}, \qquad
    c_{n-2} = \sum_{1\le i_1\lt i_2\le n} A_{i_1i_2}^{i_1i_2}, \qquad
    c_{n-3} = \sum_{1\le i_1\lt i_2\lt i_3\le n} A_{i_1i_2i_3}^{i_1i_2i_3}, \quad
    \ldots,\quad
    c_0 = A_{1\ldots n}^{1\ldots n}=\left|A\right|.
$$

Proof: See Felix Ruvimovich Gantmacher (1908--1964),
Gantmacher (1986), "Matrizentheorie",
§3.7.
The power $(-\lambda)^{n-p}$ occurs in those terms of
$\left|A-\lambda I\right|$ that contain the factors

$$
    a_{k_1k_1}-\lambda, {\mskip 5mu} a_{k_2k_2}-\lambda, {\mskip 5mu} \ldots, {\mskip 5mu}
    a_{k_{n-p}k_{n-p}}-\lambda, \qquad k_1\lt \cdots\lt k_{n-p}
$$

Applying the general Laplace expansion theorem, expanding along
$(k_1,\ldots,k_{n-p})$, yields

$$
    \left|A-\lambda I\right| = (a_{k_1k_1}-\lambda) (a_{k_2k_2}-\lambda) \ldots
        (a_{k_{n-p}k_{n-p}}-\lambda) A_{i_1\ldots i_p}^{i_1\ldots i_p} + \hbox{remainder},
$$

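where $(i_1,\ldots,i_p)$ is the arrangement complementary to $(k_1,\ldots,k_{n-p})$,
i.e. $\{k_1,\ldots,k_{n-p},{\mskip 3mu}i_1,\ldots,i_p\} = \{1,\ldots,n\}$.
Forming all possible ${n\choose n-p}={n\choose p}$ combinations of
$n-p$ elements $k_1\lt\cdots\lt k_{n-p}$, one obtains the coefficient of $(-\lambda)^{n-p}$ as the sum of all $p$-rowed principal minors, i.e. $c_{n-p}=\sum_{i_1\lt\cdots\lt i_p} A_{i_1\ldots i_p}^{i_1\ldots i_p}$.
    ☐
5. Example:  for $c_{n-k}=\sum_i A\multisubsup kii$ in the case $n=3$.
For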

$$
    \left|\matrix{
        a_{11}-\lambda & a_{12} & a_{13}\cr
        a_{21} & a_{22}-\lambda & a_{23}\cr
        a_{31} & a_{32} & a_{33}-\lambda\cr
    }\right|
$$

one obtains

$$
\eqalignno{
    &\phantom{=} (-\lambda)^3 + (a_{11}+a_{22}+a_{33})\lambda^2 + (a_{11}a_{22}-a_{21}a_{12}
        +a_{11}a_{33}-a_{31}a_{13}+a_{22}a_{33}-a_{32}a_{23})(-\lambda) +
        \left|A\right| &\cr
    &= (-\lambda)^3 + (A_1^1+A_2^2+A_3^3)\lambda^2 +
        (A_{12}^{12}+A_{13}^{13}+A_{23}^{23})(-\lambda) + \left|A\right|. &\cr
}
$$
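
The statement $c_{n-p}=\sum A_{i_1\ldots i_p}^{i_1\ldots i_p}$ can be tested by comparing both sides of the expansion of $\left|A-\lambda I\right|$ at several integer values of $\lambda$. The sketch below does this for an arbitrarily chosen $3\times3$ integer matrix; matrix and helper names are assumptions for illustration.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def principal_minor_sum(M, p):
    """Sum of all p-rowed principal minors of M (1 for p = 0)."""
    if p == 0:
        return 1
    return sum(det([[M[i][k] for k in idx] for i in idx])
               for idx in combinations(range(len(M)), p))

# hypothetical sample matrix
A = [[2, -1, 0], [3, 4, 1], [5, 0, -2]]
n = len(A)

# c[n-p] is the coefficient of (-lambda)^(n-p); c[n] = 1
c = {n - p: principal_minor_sum(A, p) for p in range(n + 1)}

def char_det(lam):
    """|A - lam*I| computed directly as a determinant."""
    return det([[A[i][k] - (lam if i == k else 0) for k in range(n)] for i in range(n)])

checks = [char_det(lam) == sum(c[j] * (-lam) ** j for j in range(n + 1))
          for lam in range(-3, 4)]
print(c, all(checks))
```

Seven sample points are more than enough here, since two polynomials of degree 3 that agree at four points are identical.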

6. A direct consequence is Vieta's theorem on roots.
Vieta (see Viète), François Viète (1540--1603).
One uses either a Jordan normal form ($A=XJX^{-1}$) or a
Schur normal form ($A=UT\adj U$).
The characteristic polynomial is invariant under a similarity
transformation, hence

$$
    c_{n-k} = \sum_{i_1\lt \cdots\lt i_k} \lambda_{i_1}\ldots\lambda_{i_k}
        = \sum_{i_1\lt \cdots\lt i_k} A\multisubsup kii.
$$

It is not claimed that in general $\lambda_{i_1}\ldots\lambda_{i_k}=A\multisubsup kii$.
For example, for an invertible companion matrix $C_1\in\mathbb{C}^{n\times n}$
one has $\lambda_1\ldots\lambda_k\ne (C_1)_{1\ldots k}^{1\ldots k}=0$
for $k\lt n$.
7. Theorem:  Suppose two diagonalizable matrices $A,B\in\mathbb{C}^{n\times n}$
have all their eigenvectors in common.
Then $AB=BA$, i.e. $A$ and $B$ commute.
Proof: Let $X$ contain all eigenvectors, $D_1=\mathop{\rm diag}\lambda_i$,
$D_2=\mathop{\rm diag}\mu_i$, $A=XD_1X^{-1}$, $B=XD_2X^{-1}$.
Hence $AB=XD_1X^{-1}XD_2X^{-1}=XD_1D_2X^{-1}=XD_2X^{-1}XD_1X^{-1}=BA$.
    ☐
8. Theorem:  Let $AB=BA$.
Then $A$ and $B$ have a common eigenvector.
Proof: See James H. Wilkinson (1919--1986),
Wilkinson (1965) "The Algebraic Eigenvalue Problem",
see Gantmacher, Felix R. (1908--1964), Gantmacher (1986) "Matrizentheorie", §9.10.
For an arbitrary eigenpair $(\lambda,x)$ of $A$ one has
$AB^kx=\lambda B^kx$, $k=0,1,2,\ldots$
In the sequence of vectors $x$, $Bx$, $B^2x$, $\ldots$ let the first $p$ vectors be
linearly independent, while the $(p+1)$-th vector $B^px$ is a linear combination
of the $p$ preceding ones.
The subspace ${\cal S}:=\left\langle x,Bx,\ldots,B^{p-1}x\right\rangle$ is invariant
under $B$, i.e. $B{\cal S}\subseteq\cal S$, hence there is an eigenvector
$y\in\cal S$ of $B|\cal S$, and therefore of $B$.
$AB^kx=\lambda B^kx$ shows that $x$, $Bx$, $B^2x$, $\ldots$, as far as they are nonzero, are eigenvectors of $A$ for the
same eigenvalue $\lambda$.
In particular every nonzero linear combination of these vectors is an eigenvector of $A$,
hence so is $y\in\cal S$.
    ☐
9. Remark:  It was essential for the proof that $B$ has an
eigenvector.
For complex matrices this is clear from the fundamental theorem of
algebra.
For real matrices (over $\mathbb{R}$) a real eigenvalue need not
exist, and hence no real eigenvector either.
The rotation matrix $T={\cos\alpha{\mskip 3mu}-\sin\alpha\choose\sin\alpha{\mskip 3mu}\cos\alpha}$
has no real eigenvalue for suitable $\alpha$.
Geometrically this is plausible because not every rotation stretches or compresses a direction
or has fixed points other than the origin.
Algebraically this is clear because $\det(T-\lambda I)=
\lambda^2-2\lambda\cos\alpha+1=(\lambda-\cos\alpha)^2+(1-\cos^2\alpha)$
does not split over $\mathbb{R}$ for every $\alpha$.
Over $\mathbb{C}$, however, $T$ does have the two eigenvalues
$\lambda=\cos\alpha\pm i\sin\alpha$.
The theorem remains valid over the reals if one additionally requires that $B$
has only real eigenvalues, e.g. if $B$ is Hermitian.
The theorem also remains valid under the assumption that
$A$ and $B$ have $1\times1$ Jordan blocks (linear elementary divisors).
10. Theorem:  Let $A\in\mathbb{C}^{m\times n}$ and $B\in\mathbb{C}^{n\times m}$.
If both matrices are square ($m=n$), then $AB$ and $BA$ have the same
characteristic polynomial and hence the same eigenvalues including
multiplicities.
In the case $m\ne n$, $AB$ and $BA$ have the same eigenvalues including multiplicities,
except that the product of higher order has $\left|m-n\right|$ additional zeros
in its spectrum.
Proof: see Wilkinson, J.H., Wilkinson (1965).
One has

$$
    \left|\matrix{ I&0\cr -B&\mu I\cr }\right|
    \left|\matrix{ \mu I&A\cr B&\mu I\cr }\right| =
    \left|\matrix{ \mu I&A\cr 0&\mu^2I-BA\cr }\right|
$$

and

$$
    \left|\matrix{ \mu I&-A\cr 0&I\cr }\right|
    \underbrace{ \left|\matrix{\mu I&A\cr B&\mu I\cr}\right| }_{{}=:\alpha} =
    \left|\matrix{ \mu^2I-AB&0\cr B&\mu I\cr }\right|.
$$

Hence

$$
    \mu^n \alpha = \mu^n \left|\mu^2I-BA\right| = \mu^n \left|\mu^2I-AB\right|.
$$

For $\mu=0$ note that $\left|AB\right|=\left|BA\right|$.
The case $m\ne n$ is proved in the same way.
    ☐
The proof could also have been carried out directly via the coefficients of the characteristic
polynomial.
Namely, with

$$
\eqalignno{
    \left|AB-\lambda I\right| &= (-\lambda)^n+c_{n-1}(-\lambda)^{n-1}+
        \cdots+c_1(-\lambda)+c_0,\cr
    \left|BA-\lambda I\right| &= (-\lambda)^n+d_{n-1}(-\lambda)^{n-1}+
        \cdots+d_1(-\lambda)+d_0,\cr
}
$$

one computes the $c_i$ and $d_i$ as

$$
    c_{n-k} = \sum_i (AB)\multisubsup kii = \sum_{i,\ell} A_i^\ell B_\ell^i,
    \qquad d_{n-k} = \sum_i (BA)_i^i = \sum_{i,\ell} B_i^\ell A_\ell^i.
$$

Interchanging $i$ and $\ell$ in one of the two sums shows equality,
apart from possible “index shifts”.
Thus $c_{n-k+\ell}=d_{n-k}$, which is precisely multiplication of the characteristic
polynomial by $\lambda^\ell$.
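
The coefficient argument can be tried out on a rectangular example: the sums of $k$-rowed principal minors of $AB$ and $BA$ agree, and the larger product only gains vanishing sums, i.e. additional zero eigenvalues. The following sketch assumes a $2\times3$ and a $3\times2$ integer matrix chosen for illustration; all helper names are hypothetical.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def principal_minor_sum(M, p):
    """Sum of all p-rowed principal minors; 0 if p exceeds the size of M."""
    return sum(det([[M[i][k] for k in idx] for i in idx])
               for idx in combinations(range(len(M)), p))

# hypothetical sample data: m = 2, n = 3
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
AB, BA = matmul(A, B), matmul(B, A)

sums_AB = [principal_minor_sum(AB, p) for p in range(1, 4)]
sums_BA = [principal_minor_sum(BA, p) for p in range(1, 4)]
print(sums_AB, sums_BA)
```

Here $BA$ is the product of higher order; its third sum, the determinant, vanishes, which corresponds to the one additional zero in its spectrum.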

		


Lines of Code of various Open-Source Projects
Wed, 24 Jan 2024 20:10:00 +0100

As of today the following open-source projects have the below lines of code (LOC).



| Name | LOC in million |
|---|---|
| Linux kernel | 34.987 |
| Chrome | 30.992 |
| PHP | 1.814 |
| Apache HTTP Server | 1.659 |
| WordPress | 1.157 |
| Slurm | 0.844 |
| Git | 0.580 |
| X server | 0.511 |
| bash | 0.249 |
| Zola | 0.022 |
| Simplified Saaze | 0.002 |




		


Matrix Polynomials
Tue, 23 Jan 2024 19:45:00 +0100

Matrix polynomials (occasionally also called $\lambda$-matrices) are
polynomials whose coefficients are matrices, square or
rectangular; for the moment this does not matter.
That is,

$$
    L(\lambda) = A_\ell\lambda^\ell + A_{\ell-1}\lambda^{\ell-1} +
        \cdots + A_1\lambda + A_0, \qquad
    A_\ell,A_{\ell-1},\ldots,A_1,A_0\in\mathbb{C}^{m\times n}.
$$

In the case $\ell=1$ one frequently has $L(\lambda)=I\lambda-A$.
1. Vector spaces and linear maps
1. Definition:  (1) A vector $a_1$ is called linearly dependent on
the vectors $a_2,\ldots,a_n$ if and only if $a_1$ is a linear combination
of these $(n-1)$ vectors, i.e.

$$
    a_1 = \lambda_2a_2 + \cdots + \lambda_na_n, \qquad
    \lambda_2,\ldots,\lambda_n\in\mathbb{C}.
$$

In symbols: $a_1{\mathrel{\underline\perp}}(a_2,\ldots,a_n)$.
The $n$ vectors $a_1,\ldots,a_n$ are then likewise called linearly dependent,
in symbols ${\mathrel{\underline\perp}}(a_1,\ldots,a_n)$.
(2) $a_1$ is linearly independent of $a_2,\ldots,a_n$ if and only if $a_1$
is not linearly dependent on $a_2,\ldots,a_n$, i.e., $a_1$ cannot be written as a linear
combination of the other $(n-1)$ vectors.
In symbols: $a_1{\mathrel{\underline{\not\perp}}}(a_2,\ldots,a_n)$.
(3) If $a_1{\mathrel{\underline{\not\perp}}} a_2,\ldots,a_n$,
$a_2{\mathrel{\underline{\not\perp}}} a_1,a_3,\ldots,a_n$, $\ldots$, $a_n{\mathrel{\underline{\not\perp}}} a_1,\ldots,a_{n-1}$,
then the family of vectors $(a_1,\ldots,a_n)$ is called linearly independent (outright),
in symbols ${\mathrel{\underline{\not\perp}}}(a_1,\ldots,a_n)$.
If $a_1$ is linearly dependent on $a_2,\ldots,a_n$, then $a_1$ is in a certain
sense redundant, since $a_1$ can be composed from the other
vectors.
If $a_2,\ldots,a_n$ lie in a plane, then $a_1$ naturally lies
in the same plane.
Note that two things were defined: a (binary) relation between one vector and
$(n-1)$ other vectors, and a property of
$n$ vectors, i.e., an $n$-ary relation.
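
The difference between the relation "one vector against the others" and the property of the whole family can be made concrete with a rank computation. This is a minimal sketch, assuming vectors with rational entries; the helper names `rank`, `depends_on` and `family_independent` are made up for this illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def depends_on(a1, others):
    """True if a1 is a linear combination of the vectors in `others`."""
    return rank(others + [a1]) == rank(others)

def family_independent(vectors):
    """True if no vector of the family depends on the remaining ones."""
    return rank(vectors) == len(vectors)

# a1 lies in the plane spanned by a2 and a3, so it depends on them.
print(depends_on([2, 3, 0], [[1, 0, 0], [0, 1, 0]]))

# a1 = (0,0,1) does not depend on a2 = (1,0,0), a3 = (2,0,0), yet the family
# (a1, a2, a3) is not independent, because a3 depends on a2.
print(depends_on([0, 0, 1], [[1, 0, 0], [2, 0, 0]]))
print(family_independent([[0, 0, 1], [1, 0, 0], [2, 0, 0]]))
```

The second example shows why part (3) of the definition requires the independence relation for every vector of the family, not only for $a_1$.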
2. Definition and properties of standard triples
Consider the monic matrix polynomial

$$
    L(\lambda)=\sum_{i=0}^\ell A_i\lambda^i,
    \qquad A_\ell=I,\quad A_i\in\mathbb{C}^{n\times n}.
$$

The family of vectors $x_0,\ldots,x_k$, with $x_0\ne\bf0$,
$x_i\in\mathbb{C}^{n\times1}$, is called a right Jordan chain (or
right Keldysh chain, after M.V. Keldysh) of length $(k+1)$ for the
matrix polynomial $L(\lambda)$ at the eigenvalue $\lambda_0$ if and only if

$$
 \pmatrix{
 L(\lambda_0) & & & \llap{0}\cr
 L'(\lambda_0) & L(\lambda_0) & & \cr
 \vdots & \vdots & \ddots & \cr
 {1\over k!}L^{(k)}(\lambda_0) & {1\over(k-1)!}L^{(k-1)}(\lambda_0) & \ldots & L(\lambda_0)\cr}
 \pmatrix{x_0\cr x_1\cr \vdots\cr x_k\cr}
 =
 \pmatrix{0\cr 0\cr \vdots\cr 0\cr}.
$$

The matrix $\mathbb{P}$ appearing on the left is of course not
invertible, because $L(\lambda_0)$ is not invertible.
The family of vectors $y_0,\ldots,y_k$, with $y_0\ne\bf0^\top$,
$y_i\in\mathbb{C}^{1\times n}$, is called a left Jordan chain of
length $(k+1)$ for the
matrix polynomial $L(\lambda)$ at the eigenvalue $\lambda_0$ if and only if

$$
    (y_0,\,\ldots,\,y_k)\cdot\mathbb{P}=(0^\top,\,\ldots,\,0^\top),
$$

i.e., if $y_0^\top,\ldots,y_k^\top$ is a right Jordan
chain.
The pair of matrices $(X,T)$, with $X$ of size
$n\times n\ell$ and $T$ of size $n\ell\times n\ell$, is called a
standard pair if and only if:

$\mathop{\rm col}(XT^i)_{i=0}^{\ell-1}$ is invertible,
$\sum_{i=0}^\ell A_iXT^i=\bf 0$.

If $T$ is a Jordan matrix, the pair $(X,T)$ is also called a
Jordan pair.
The matrix triple $(X,T,Y)$, with $X$ of size $n\times n\ell$, $T$ of
size $n\ell\times n\ell$ and $Y$ of size $n\ell\times n$, is called a
standard triple of the matrix polynomial $L(\lambda)$ if and only
if:

$(X,T)$ is a standard pair,



$$
    Y = \pmatrix{X\cr XT\cr \vdots\cr XT^{\ell-1}\cr}^{-1}
        \pmatrix{0\cr \vdots\cr 0\cr I\cr}.
$$

If $T$ is again a Jordan matrix, $(X,T,Y)$ is also called a
Jordan triple.
If $(X,T,Y)$ is a Jordan triple, then the columns of $X$ are right
Jordan chains (Keldysh chains) of $L(\lambda)$, provided $X$
is partitioned into blocks consistent with the
partition of the Jordan matrix $J$.
Dually, the rows of $Y$ are left Jordan chains
of $L(\lambda)$.
In summary, the required dimensions of the matrices
$X$, $T$ and $Y$ can be read off the scheme

$$
    \left(X, T, Y\right): \qquad
    \eqalign{X\colon{}&n\times n\ell\cr
             T\colon{}&n\ell\times n\ell\cr
             Y\colon{}&n\ell\times n\cr} \qquad
    \eqalign{X\colon{}&\mathbb{C}^{n\ell}\rightarrow\mathbb{C}^n\cr
             T\colon{}&\mathbb{C}^{n\ell}\rightarrow\mathbb{C}^{n\ell}\cr
             Y\colon{}&\mathbb{C}^n\rightarrow\mathbb{C}^{n\ell}\cr} \qquad
    \eqalign{X\colon{}&\mathbb{R}^\ell\rightarrow\mathbb{R}\cr
             T\colon{}&\mathbb{R}^\ell\rightarrow\mathbb{R}^\ell\cr
             Y\colon{}&\mathbb{R}\rightarrow\mathbb{R}^\ell\cr}
$$

If $(X,T,Y)$ is a standard triple, then

$$
    XT^iY=\cases{0,&for $i=0,\ldots,\ell-2$\cr I,&for $i=\ell-1$.\cr}
$$
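
The identity for $XT^iY$ can be checked with the companion form, a well-known choice of standard triple: $X=(I,0,\ldots,0)$, $T$ the block companion matrix of $L(\lambda)$, and $Y=\mathop{\rm col}(0,\ldots,0,I)$. The sketch below assumes this particular triple. The coefficients `A0`, `A1`, `A2` are made-up $2\times2$ matrices, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]

def hstack(blocks):
    """Place equally tall blocks side by side."""
    return [sum(rows, []) for rows in zip(*blocks)]

def vstack(blocks):
    """Place equally wide blocks on top of each other."""
    return [list(row) for block in blocks for row in block]

def companion_triple(coeffs):
    """Companion triple (X, T, Y) of the monic L with coeffs [A_0, ..., A_{l-1}]."""
    l, n = len(coeffs), len(coeffs[0])
    I, Z = identity(n), zeros(n, n)
    X = hstack([I] + [Z] * (l - 1))
    shift_rows = [hstack([I if j == i + 1 else Z for j in range(l)])
                  for i in range(l - 1)]
    last_row = hstack([[[-a for a in row] for row in A] for A in coeffs])
    T = vstack(shift_rows + [last_row])
    Y = vstack([Z] * (l - 1) + [I])
    return X, T, Y

# Hypothetical coefficients of L(lambda) = I*lambda^3 + A2*lambda^2 + A1*lambda + A0.
A0 = [[2, 1], [0, 1]]
A1 = [[1, 0], [1, 3]]
A2 = [[0, 1], [1, 0]]
X, T, Y = companion_triple([A0, A1, A2])

# moments[i] = X T^i Y for i = 0, 1, 2
moments, P = [], Y
for i in range(3):
    moments.append(matmul(X, P))
    P = matmul(T, P)

print(moments)
```

For $\ell=3$ the first two products are the zero matrix and the third is the identity, as the formula states.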

1. Equivalent characterizations of standard triples. The
following properties hold.
The matrix triple $(X,T,Y)$ is a standard triple if and only if
the inverse of the matrix polynomial $L(\lambda)$ has the representation

$$
    L^{-1}(\lambda) = X (I\lambda-T)^{-1} Y, \qquad\lambda\notin\sigma(L).
$$
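
A quick exact check of this resolvent form, again only as a sketch: it assumes the companion triple for a made-up quadratic polynomial $L(\lambda)=I\lambda^2+A_1\lambda+A_0$ with $2\times2$ coefficients, evaluates at the arbitrarily chosen point $\lambda=3$, and uses a small Gauss-Jordan routine written for this example.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Gauss-Jordan inverse over the rationals (raises StopIteration if singular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical coefficients of L(lambda) = I*lambda^2 + A1*lambda + A0.
A0 = [[2, 1], [0, 1]]
A1 = [[1, 0], [1, 3]]

# Companion triple: X = [I 0], T = [[0, I], [-A0, -A1]], Y = [0; I].
X = [[1, 0, 0, 0], [0, 1, 0, 0]]
T = [[0, 0, 1, 0], [0, 0, 0, 1], [-2, -1, -1, 0], [0, -1, -1, -3]]
Y = [[0, 0], [0, 0], [1, 0], [0, 1]]

lam = Fraction(3)  # assumed not to be an eigenvalue of L

# X (lambda*I - T)^{-1} Y
lamI_minus_T = [[(lam if i == j else 0) - T[i][j] for j in range(4)] for i in range(4)]
resolvent = matmul(matmul(X, inverse(lamI_minus_T)), Y)

# L(lambda) evaluated directly
L_lam = [[(lam**2 if i == j else 0) + lam * A1[i][j] + A0[i][j] for j in range(2)]
         for i in range(2)]

# If the representation holds, L(lambda) times the resolvent is the identity.
product = matmul(L_lam, resolvent)
print(product)
```

Working with `Fraction` keeps the comparison exact, so no rounding tolerance is needed.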

$L^{-1}(\lambda)$ can be regarded as the transfer function, from the input ${\bf u}$
to the output $y$, of the linear system

$$
    {d{\bf x}\over dt} = T{\bf x}+Y{\bf u},\qquad y=X{\bf x},\quad{\bf x}(0)=0.
$$

Furthermore,

$$
    {1\over2\pi i}\int_\Gamma f(\lambda)L^{-1}(\lambda)d\lambda
    = X f(T) Y,
$$

where $\Gamma$ is a rectifiable curve such that $\sigma(L)$ lies inside
$\Gamma$, and $f$ is a function holomorphic inside
$\Gamma$ and in a neighborhood of $\Gamma$.
2. Linearizations.
The matrix polynomial $I\mu-A$ of size $(n+p)\times(n+p)$ is a
linearization of the matrix polynomial $L(\mu)$ of
size $n\times n$ and degree $\ell$ if and only if

$$
    I\mu-A\sim\pmatrix{L(\mu) & 0\cr 0 & I\cr}.
$$

Two matrix polynomials $M_1(\mu)$ and $M_2(\mu)$ are equivalent, written
$M_1(\mu)\sim M_2(\mu)$, if and only if

$$
    M_1(\mu) = E(\mu) M_2(\mu) F(\mu), \qquad\forall\mu\in\mathbb{C},
$$

with matrix polynomials $E(\mu)$ and $F(\mu)$ whose determinants are
constant and nonzero.
Obviously $n+p=n\ell$ must hold.
Any two linearizations are similar to each other.
Every matrix similar to a linearization is again a
linearization.
As an aside, every square matrix
is similar to its transpose.
Furthermore, we have the following
3. Theorem:  Given a matrix $T\in\mathbb{C}^{m\times m}$,
$T$ is
a linearization of a monic matrix polynomial of degree $\ell$
and size $n\times n$ if and only if the following two conditions
hold:

$m=n\ell$ and
$\displaystyle\max_{\lambda\in\mathbb{C}}\dim\ker(I\lambda-T)\le n$.

The proof reduces to the Smith normal form theorem.
For proofs of this and other facts relevant here, see the
book by Gohberg/Lancaster/Rodman (1982),
which also gives further references on this
topic.
The authors are Gohberg, Izrael' TSudikovich;
Lancaster, Peter;
and Rodman, Leiba.
4. Matrix difference equations and standard triples.
Linear multistep methods of the form

$$
    \alpha_0y_n+\alpha_1y_{n+1}+\cdots+\alpha_ky_{n+k} =
    h\left(\beta_0f_n+\beta_1f_{n+1}+\cdots+\beta_kf_{n+k}\right),
    \qquad\alpha_k\ne0,
$$

give rise to scalar difference equations in a natural way.
Cyclic linear methods, e.g. of the form

$$
\begin{align}
-2y_{3m-2}&+&9y_{3m-1}&-&18y_{3m}&+&11y_{3m+1}&&&&%
    &=&6h\dot y_{3m+1},\cr
&-&2y_{3m-1}&+&9y_{3m}&-&18y_{3m+1}&+&11y_{3m+2}&&%
    &=&&&6h\dot y_{3m+2},\cr
&&&&&&9y_{3m+1}&-&12y_{3m+2}&+&3y_{3m+3}%
    &=&h\bigl(-4\dot y_{3m+1}&-&4\dot y_{3m+2}+2\dot y_{3m+3}\bigr).\cr
\end{align}
$$

give rise to matrix difference equations of the form

$$
    u_{\ell+r}+A_{\ell-1}u_{\ell-1+r}+\cdots+A_1u_{1+r}+A_0u_r = f_r,
    \qquad r=0,1,\ldots
$$

in an equally natural way.
Occasionally it is advantageous to have a representation of the solution of the
difference equation that makes clear how all previously
computed values enter into subsequent values.
5. Theorem:
The solution of the matrix difference equation

$$
    Iu_{\ell+r}+\sum_{i=0}^{\ell-1}A_iu_{i+r}=f_r,\qquad r=0,1,\ldots,
$$

has the representation

$$
    u_{m+1}=XT^{m+1}c+X\sum_{i=0}^m T^{m-i}Yf_i,\qquad m=0,1,\ldots,
$$

where $(X,T,Y)$ is a standard triple of the matrix polynomial

$$
    L(\lambda)=I\lambda^\ell+\sum_{i=0}^{\ell-1}A_i\lambda^i.
$$

The vector $c\in\mathbb{C}^{n\ell}$ is uniquely determined by prescribing the starting values

$$
    u_r=a_r,\qquad r=0,\ldots,\ell-1
$$

and is given by

$$
    c = \pmatrix{Y,&TY,&\ldots,&T^{\ell-1}Y}\pmatrix{
        A_1 & A_2    & \ldots & I\cr
        A_2 & \vdots & \unicode{x22F0} & 0\cr
        \vdots & I   &        & \vdots\cr
        I   & 0      & \ldots & 0\cr}
    \pmatrix{a_0\cr a_1\cr \vdots\cr a_{\ell-1}\cr}
    = \left(\mathop{\rm col}_{i=0}^{\ell-1} XT^i\right)^{-1}\mathop{\rm col}_{\nu=0}^{\ell-1} a_\nu.
$$

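Setting $R=\mathop{\rm row}_{i=0}^{\ell-1}T^iY$ and $Q=\mathop{\rm col}_{i=0}^{\ell-1}XT^i$, and writing $B$ for the block matrix of the $A_i$ in the formula above and $a=\mathop{\rm col}_{\nu=0}^{\ell-1}a_\nu$, one has
$RBQ=I$ and $c=RBa=Q^{-1}a$.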
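
The solution formula can be compared with the plain recursion on a small example. This is a sketch under stated assumptions: $n=2$, $\ell=2$, made-up coefficients `A0`, `A1`, a made-up right-hand side `f`, and the companion triple, for which $\mathop{\rm col}(XT^i)$ is the identity, so that $c$ is simply the stacked starting values. The index `r` in the code corresponds to $m+1$ in the theorem.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def power_apply(M, k, v):
    """Return M^k v by k repeated matrix-vector products."""
    for _ in range(k):
        v = matvec(M, v)
    return v

# Hypothetical coefficients of L(lambda) = I*lambda^2 + A1*lambda + A0.
A0 = [[2, 1], [0, 1]]
A1 = [[1, 0], [1, 3]]

# Companion triple: X = [I 0], T = [[0, I], [-A0, -A1]], Y = [0; I].
X = [[1, 0, 0, 0], [0, 1, 0, 0]]
T = [[0, 0, 1, 0], [0, 0, 0, 1], [-2, -1, -1, 0], [0, -1, -1, -3]]
Y = [[0, 0], [0, 0], [1, 0], [0, 1]]

def f(r):
    """Hypothetical right-hand side f_r."""
    return [r, 1]

a0, a1 = [1, 0], [0, 2]  # starting values u_0 and u_1
c = a0 + a1              # col(X T^i) is the identity here, so c = col(a_0, a_1)

def u_closed(r):
    """u_r = X T^r c + X * sum_{i<r} T^(r-1-i) Y f_i."""
    state = power_apply(T, r, c)
    for i in range(r):
        term = power_apply(T, r - 1 - i, matvec(Y, f(i)))
        state = [s + t for s, t in zip(state, term)]
    return matvec(X, state)

# Direct recursion u_{r+2} = f_r - A1 u_{r+1} - A0 u_r for comparison.
u_direct = [a0, a1]
for r in range(6):
    t1 = matvec(A1, u_direct[r + 1])
    t0 = matvec(A0, u_direct[r])
    u_direct.append([x - y - z for x, y, z in zip(f(r), t1, t0)])

print(all(u_closed(r) == u_direct[r] for r in range(len(u_direct))))
```

The closed form makes visible that every earlier $f_i$ enters $u_r$ through the factor $XT^{r-1-i}Y$, which is the point of the representation.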

		


Member of 250KB club
Sat, 20 Jan 2024 19:10:00 +0100

I am now a member of the 250KB club. See "Proud member":

eklausmeier.goip.de
Proud member of the exclusive 250KB Club!
Added: 2024-01-19 | Last updated: 2024-01-19
eklausmeier.goip.de is a member of the exclusive 250KB Club. The page weighs only 78kb and has a content-to-bloat ratio of 13%.
They are now entitled to add one of those shiny badges to your page. But don't forget, even though I tried to make them as small as possibe, a badge will add some kilobytes to your page weight. A code snipped can be found by clicking on the respective badge.
  

  

  

  

While the overall compressed size of 78kb is OK, the content-to-bloat ratio of 13% is not so good: 87% of the page is effectively bloat.
In my case the major contributing factors are:

Google fonts, through no fault of Google's
JavaScript for Pagefind, which provides instant search

For example, the post Moved Blog To eklausmeier.goip.de measured with tools.pingdom.com loads in 244ms from Frankfurt and needs 8 requests.

The distribution by content type is as follows.

Again, 90% is fonts, scripts, and CSS, i.e., bloat.
Without losing any information, but at the cost of appearance and slickness, I could save 80%!
Looking at the waterfall diagram one can see that dropping the fonts would not make the website significantly faster.
This is because Google is pretty fast at serving all those fonts.
Similarly, Pagefind's processing overlaps with the other processing, so it adds little waiting.

Still, I am also a little guilty of contributing to the overall website obesity crisis.

Most of the talk about web performance is similarly technical, involving compression, asynchronous loading, sequencing assets, batching HTTP requests, pipelining, and minification.
All of it obscures a simpler solution.
If you're only going to the corner store, ride a bicycle.
If you're only displaying five sentences of text, use vanilla HTML. Hell, serve a textfile! Then you won't need compression hacks, integral signs, or elaborate Gantt charts of what assets load in what order.
Browsers are really, really good at rendering vanilla HTML.
We have the technology.

Being a member of the 250KB club is not very surprising as I am already a member of the 512KB club, in particular their "green team", i.e., the team with websites smaller than 100kB uncompressed.

		


Performance Comparison of Lemire Website: WordPress vs. Simplified Saaze
Sun, 14 Jan 2024 20:00:00 +0100

In the previous post Example Theme for Simplified Saaze: Lemire I demonstrated the transition from a website using WordPress to Simplified Saaze. This very blog also uses Simplified Saaze. This post shows how much the transition improved performance. The comparison is therefore between:

Original: WordPress version, lemire.me
Modified: Simplified Saaze version of Lemire

The original website is hosted by SiteGround and Cloudflare. It uses WordPress.
1. Comparison. For the comparison I use the website tools.pingdom.com, which provides various metrics to evaluate the performance of a website:

Page size
Number of requests
Load time
Concrete tips to improve performance
Waterfall diagram of requests
Breakdown of content types

All Pingdom tests were run from Europe/Frankfurt, as I host everything on the machine below, in my living room not far from Frankfurt.

The post in question is Fast integer compression with Stream VByte on ARM Neon processors.
The version using Simplified Saaze is here.
This post has no comments, so the WordPress site is at no disadvantage compared with the Simplified Saaze powered site.
This post contains C code shown in syntax-highlighted form.
The results are thus:



[Pingdom result summaries, side by side: Original (WordPress) and Modified (Simplified Saaze)]


The results for the original website, based on WordPress, are indeed worse on every dimension: page size, load time, number of requests. In comparison to the modified version using Simplified Saaze the ratio is roughly:

Page size is more than 4:1
Load time is almost 3:1
Number of requests is 4:1

So Simplified Saaze is better in all dimensions by a factor of roughly three to four. This is particularly striking as the Simplified Saaze version is entirely self-hosted, with an upload bandwidth to the internet limited to 50 MBit/s!
The recommendations for the original website are therefore not overly surprising:

The missing compression is clearly an oversight on the web-server part.
The breakdown of the content type for the original website is:

I uploaded the Simplified Saaze version to Netlify, which provides CDN functionality.
I then measured again, this time requesting both the WordPress post and the Simplified Saaze version from San Francisco.
The measurements are pretty similar to the Frankfurt results.



[Pingdom result summaries, side by side: Original (WordPress) San Francisco and Modified (Simplified Saaze) San Francisco]


2. Modified website. The breakdown of the modified site, based on Simplified Saaze, is as below.

Actual loading of the modified site will roughly follow the waterfall diagram below.
This waterfall diagram shows that a major part of the loading time is spent in syntax highlighting (prism.js) and searching (pagefind).
The fonts from Google load in record time.

3. Security considerations. Prof. Lemire's blog was the target of a hack in 2008: My blog got hacked.
Using a static site this attack could probably have been prevented, assuming HashOver is not affected.
At the end of 2008 the problems still persisted: Need help protecting my blog.
A site using Markdown files as input is easy to back up.
It is far easier to back up than a database.
Just think about any schema changes in the databases during version upgrades.
See Simplified Saaze:

Simplified Saaze works with ordinay files in your filesystem. No database required. This means less setup and maintenance, better security and more speed.

Storing your Markdown files in Git is one option.
4. Caching content. Prof. Lemire reported caching problems:

I estimate that I get somewhere between 30,000 and 50,000 unique visitors a month. Despite my efforts, my blog keeps on failing under the load. It becomes unavailable for hours.

These caching problems would go away with a static site. Obviously.
The static site would handle the "Slashdot effect" quite effectively.

		


Vodafone Internet Outage
Mon, 08 Jan 2024 20:10:00 +0100

Today, 08-Jan-2024, starting at 18:49 (CET), internet provided by Vodafone was unavailable.
I called the hotline of Vodafone and they confirmed that they had a major outage in my region.
This means: my homepage, i.e., this blog, is unavailable.

In 2022 the internet router was defective.
This time Vodafone confirmed that the router is fine. The fault is on their end.
BetterUptime notified me of the error in a timely fashion via e-mail, which, of course, I could not read, as I had no internet:
Monitor: eklausmeier.goip.de/…txt
Checked URL: GET https://eklausmeier.goip.de/betterUptime.txt
Cause: Failure when receiving data from the peer

Started at: 8 Jan 2024 at 06:53pm CET

Since around 22:00 (CET) the internet has been available again.
BetterUptime reported a resolved incident at 22:55 (CET):
Monitor: eklausmeier.goip.de/…txt
Checked URL: GET https://eklausmeier.goip.de/betterUptime.txt
Cause: Failure when receiving data from the peer

Started at: 8 Jan 2024 at 06:53pm CET
Resolved at: 8 Jan 2024 at 10:55pm CET (automatically)
Length: 3 hours and 58 seconds

So overall, BetterUptime did a good job here.

Type	URL
Releases	https://github.com/:owner/:repo/releases.atom
Commits	https://github.com/:owner/:repo/commits.atom
Private feed	https://github.com/:user.private.atom?token=:secret
Tags	https://github.com/:user/:repo/tags.atom
User activity	https://github.com/:user.atom

Nr.	Column	type	nullable	Example or meaning
1	email	text	not null	primary key, e.g., Peter.Miller@super.com
2	Firstname	text	null	e.g., Peter
3	Lastname	text	null	e.g., Miller
4	registration	date	not null	date of registration, e.g., 06-Feb-2024
5	IP	text	not null	e.g., 84.119.108.23, IP address of web client during initial subscription
6	status	int	not null	1=in-limbo 2=active 3=inactive 4=bounced during registration 5=bounced
7	token	text	not null	e.g., `uIYkEk+ylks=` computed with `$token = base64_encode(random_bytes(8));`

Nr.	Column	type	nullable	Example or meaning
8	lastRegist	date	null	date of last registration, relevant only for multiple subscriptions for the same e-mail
9	lastIP	text	null	last used IP of the web client, when used for multiple subscriptions
