OpenRadioss 2025.1.11
OpenRadioss project
slaqr6.f
1 SUBROUTINE slaqr6( JOB, WANTT, WANTZ, KACC22, N, KTOP, KBOT,
2 $ NSHFTS, SR, SI, H, LDH, ILOZ, IHIZ, Z, LDZ,
3 $ V, LDV, U, LDU, NV, WV, LDWV, NH, WH, LDWH )
4*
5* Contribution from the Department of Computing Science and HPC2N,
6* Umea University, Sweden
7*
8* -- ScaLAPACK auxiliary routine (version 2.0.2) --
9* Univ. of Tennessee, Univ. of California Berkeley, Univ. of Colorado Denver
10* May 1 2012
11*
12 IMPLICIT NONE
13*
14* .. Scalar Arguments ..
15 CHARACTER JOB
16 INTEGER IHIZ, ILOZ, KACC22, KBOT, KTOP, LDH, LDU, LDV,
17 $ ldwh, ldwv, ldz, n, nh, nshfts, nv
18 LOGICAL WANTT, WANTZ
19* ..
20* .. Array Arguments ..
21 REAL H( LDH, * ), SI( * ), SR( * ), U( LDU, * ),
22 $ V( LDV, * ), WH( LDWH, * ), WV( LDWV, * ),
23 $ z( ldz, * )
24* ..
25*
26* This auxiliary subroutine called by PSLAQR5 performs a
27* single small-bulge multi-shift QR sweep, moving the chain
28* of bulges from top to bottom in the submatrix
29* H(KTOP:KBOT,KTOP:KBOT), collecting the transformations in the
30* matrix HV *or* accumulating the transformations in the matrix
31* Z (see below).
32*
33* This is a modified version of DLAQR5 from LAPACK 3.1.
34*
35* ======================================================================
36*
37* JOB (input) character scalar
38* Set the kind of job to do in SLAQR6, as follows:
39* JOB = 'I': Introduce and chase bulges in submatrix
40* JOB = 'C': Chase bulges from top to bottom of submatrix
41* JOB = 'O': Chase bulges off submatrix; 'A': all of the above
42*
43* WANTT (input) logical scalar
44* WANTT = .true. if the quasi-triangular Schur factor
45* is being computed. WANTT is set to .false. otherwise.
46*
47* WANTZ (input) logical scalar
48* WANTZ = .true. if the orthogonal Schur factor is being
49* computed. WANTZ is set to .false. otherwise.
50*
51* KACC22 (input) integer with value 0, 1, or 2.
52* Specifies the computation mode of far-from-diagonal
53* orthogonal updates.
54* = 0: SLAQR6 does not accumulate reflections and does not
55* use matrix-matrix multiply to update far-from-diagonal
56* matrix entries.
57* = 1: SLAQR6 accumulates reflections and uses matrix-matrix
58* multiply to update the far-from-diagonal matrix entries.
59* = 2: SLAQR6 accumulates reflections, uses matrix-matrix
60* multiply to update the far-from-diagonal matrix entries,
61* and takes advantage of 2-by-2 block structure during
62* matrix multiplies.
63*
64* N (input) integer scalar
65* N is the order of the Hessenberg matrix H upon which this
66* subroutine operates.
67*
68* KTOP (input) integer scalar
69* KBOT (input) integer scalar
70* These are the first and last rows and columns of an
71* isolated diagonal block upon which the QR sweep is to be
72* applied. It is assumed without a check that
73* either KTOP = 1 or H(KTOP,KTOP-1) = 0
74* and
75* either KBOT = N or H(KBOT+1,KBOT) = 0.
76*
77* NSHFTS (input) integer scalar
78* NSHFTS gives the number of simultaneous shifts. NSHFTS
79* must be positive and even.
80*
81* SR (input) REAL array of size (NSHFTS)
82* SI (input) REAL array of size (NSHFTS)
83* SR contains the real parts and SI contains the imaginary
84* parts of the NSHFTS shifts of origin that define the
85* multi-shift QR sweep.
86*
87* H (input/output) REAL array of size (LDH,N)
88* On input H contains a Hessenberg matrix. On output a
89* multi-shift QR sweep with shifts SR(J)+i*SI(J) is applied
90* to the isolated diagonal block in rows and columns KTOP
91* through KBOT.
92*
93* LDH (input) integer scalar
94* LDH is the leading dimension of H just as declared in the
95* calling procedure. LDH.GE.MAX(1,N).
96*
97* ILOZ (input) INTEGER
98* IHIZ (input) INTEGER
99* Specify the rows of Z to which transformations must be
100* applied if WANTZ is .TRUE.. 1 .LE. ILOZ .LE. IHIZ .LE. N
101*
102* Z (input/output) REAL array of size (LDZ,IHI)
103* If WANTZ = .TRUE., then the QR Sweep orthogonal
104* similarity transformation is accumulated into
105* Z(ILOZ:IHIZ,ILO:IHI) from the right.
106* If WANTZ = .FALSE., then Z is unreferenced.
107*
108* LDZ (input) integer scalar
109* LDZ is the leading dimension of Z just as declared in
110* the calling procedure. LDZ.GE.N.
111*
112* V (workspace) REAL array of size (LDV,NSHFTS/2)
113*
114* LDV (input) integer scalar
115* LDV is the leading dimension of V as declared in the
116* calling procedure. LDV.GE.3.
117*
118* U (workspace) REAL array of size
119* (LDU,3*NSHFTS-3)
120*
121* LDU (input) integer scalar
122* LDU is the leading dimension of U just as declared in the
123* calling subroutine. LDU.GE.3*NSHFTS-3.
124*
125* NH (input) integer scalar
126* NH is the number of columns in array WH available for
127* workspace. NH.GE.1 is required for usage of this
128* workspace, otherwise the far-from-diagonal elements
129* will be updated without level 3 BLAS.
130*
131* WH (workspace) REAL array of size (LDWH,NH)
132*
133* LDWH (input) integer scalar
134* Leading dimension of WH just as declared in the
135* calling procedure. LDWH.GE.3*NSHFTS-3.
136*
137* NV (input) integer scalar
138* NV is the number of rows in WV available for workspace.
139* NV.GE.1 is required for usage of this
140* workspace, otherwise the far-from-diagonal elements
141* will be updated without level 3 BLAS.
142*
143* WV (workspace) REAL array of size
144* (LDWV,3*NSHFTS-3)
145*
146* LDWV (input) integer scalar
147* LDWV is the leading dimension of WV as declared in the
148* calling subroutine. LDWV.GE.NV.
149*
150*
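* Illustrative calling sketch (hypothetical values; the actual
* driver is PSLAQR5, which supplies all arguments). With
* NSHFTS = 4 shifts the workspace shapes described above become
* V(LDV,2) with LDV.GE.3 and U(LDU,9) with LDU.GE.9, and the
* level 3 BLAS updates are enabled by NH.GE.1 and NV.GE.1, e.g.
*
*    CALL SLAQR6( 'A', .TRUE., .TRUE., 2, N, KTOP, KBOT, 4,
*   $             SR, SI, H, LDH, 1, N, Z, LDZ, V, LDV, U, LDU,
*   $             NV, WV, LDWV, NH, WH, LDWH )
*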
151* ================================================================
152* Based on contributions by
153* Karen Braman and Ralph Byers, Department of Mathematics,
154* University of Kansas, USA
155*
156* Robert Granat, Department of Computing Science and HPC2N,
157* Umea University, Sweden
158*
159* ============================================================
160* Reference:
161*
162* K. Braman, R. Byers and R. Mathias, The Multi-Shift QR
163* Algorithm Part I: Maintaining Well Focused Shifts, and
164* Level 3 Performance, SIAM Journal on Matrix Analysis and Applications,
165* volume 23, pages 929--947, 2002.
166*
167* ============================================================
168* .. Parameters ..
169 REAL ZERO, ONE
170 PARAMETER ( ZERO = 0.0e0, one = 1.0e0 )
171* ..
172* .. Local Scalars ..
173 REAL ALPHA, BETA, H11, H12, H21, H22, REFSUM,
174 $ SAFMAX, SAFMIN, SCL, SMLNUM, SWAP, TST1, TST2,
175 $ ulp
176 INTEGER I, I2, I4, INCOL, J, J2, J4, JBOT, JCOL, JLEN,
177 $ JROW, JTOP, K, K1, KDU, KMS, KNZ, KRCOL, KZS,
178 $ m, m22, mbot, mend, mstart, mtop, nbmps, ndcol,
179 $ ns, nu, sincol, eincol, uincol, iphv, chunk,
180 $ threads, jlen2, jcol2, gchunk, jrow2, maxchunk
181 LOGICAL ACCUM, BLK22, BMP22, INTRO, CHASE, OFF, ALL
182* ..
183* .. External Functions ..
184 LOGICAL LSAME
185 INTEGER PILAENVX
186 REAL SLAMCH
187 EXTERNAL lsame, slamch, pilaenvx
188* ..
189* .. Intrinsic Functions ..
190*
191 INTRINSIC abs, float, max, min, mod
192* ..
193* .. Local Arrays ..
194 REAL VT( 3 )
195* ..
196* .. External Subroutines ..
197 EXTERNAL sgemm, slabad, slamov, slaqr1, slarfg, slaset,
198 $ strmm
199* ..
200* .. Executable Statements ..
201*
202* ==== If there are no shifts, then there is nothing to do. ====
203*
204 IF( nshfts.LT.2 )
205 $ RETURN
206*
207* ==== If the active block is empty or 1-by-1, then there
208* . is nothing to do. ====
209*
210 IF( ktop.GE.kbot )
211 $ RETURN
212 threads = 1
213*
214* ==== Shuffle shifts into pairs of real shifts and pairs
215* . of complex conjugate shifts assuming complex
216* . conjugate shifts are already adjacent to one
217* . another. ====
218*
219 DO 10 i = 1, nshfts - 2, 2
220 IF( si( i ).NE.-si( i+1 ) ) THEN
221*
222 swap = sr( i )
223 sr( i ) = sr( i+1 )
224 sr( i+1 ) = sr( i+2 )
225 sr( i+2 ) = swap
226*
227 swap = si( i )
228 si( i ) = si( i+1 )
229 si( i+1 ) = si( i+2 )
230 si( i+2 ) = swap
231 END IF
232 10 CONTINUE
233*
234* ==== NSHFTS is supposed to be even, but if it is odd,
235* . then simply reduce it by one. The shuffle above
236* . ensures that the dropped shift is real and that
237* . the remaining shifts are paired. ====
238*
239 ns = nshfts - mod( nshfts, 2 )
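* ==== Example: for NSHFTS = 5 the shuffle above keeps complex
* . conjugate pairs adjacent, MOD( NSHFTS, 2 ) = 1, and the
* . trailing (real) shift is dropped, so NS = 4 paired
* . shifts remain. ====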
240*
241* ==== Machine constants for deflation ====
242*
243 safmin = slamch( 'SAFE MINIMUM' )
244 safmax = one / safmin
245 CALL slabad( safmin, safmax )
246 ulp = slamch( 'PRECISION' )
247 smlnum = safmin*( float( n ) / ulp )
248*
249* ==== Use accumulated reflections to update far-from-diagonal
250* . entries? This is only performed if both NH and NV are
251* . at least 1. ====
252*
253 accum = ( kacc22.EQ.1 ) .OR. ( kacc22.EQ.2 )
254 accum = accum .AND. nh.GE.1 .AND. nv.GE.1
255*
256* ==== If so, exploit the 2-by-2 block structure? ====
257*
258 blk22 = ( ns.GT.2 ) .AND. ( kacc22.EQ.2 )
259*
260* ==== Decode JOB ====
261*
262 all = lsame( job, 'A' )
263 IF( .NOT. all )
264 $ intro = lsame( job, 'I' )
265 IF( .NOT. all .AND. .NOT. intro )
266 $ chase = lsame( job, 'C' )
267 IF( .NOT. all .AND. .NOT. intro .AND. .NOT. chase ) THEN
268 off = lsame( job, 'o' )
269 IF( .NOT. OFF )
270 $ RETURN
271 END IF
272*
273* ==== clear trash ====
274*
275 IF( INTRO.OR.ALL .AND. KTOP+2.LE.KBOT )
276 $ H( KTOP+2, KTOP ) = ZERO
277*
278* ==== NBMPS = number of 2-shift bulges in the chain ====
279*
280 NBMPS = NS / 2
281*
282* ==== KDU = width of slab ====
283*
284 KDU = 6*NBMPS - 3
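* ==== Example: NS = 4 paired shifts give NBMPS = 2 bulges and a
* . slab of width KDU = 6*2 - 3 = 9 columns. ====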
285*
286* Set loop limits for bulge-chasing depending on working mode
287*
288 IF( ALL ) THEN
289 SINCOL = 3*( 1-NBMPS ) + KTOP - 1
290 EINCOL = KBOT - 2
291 UINCOL = 3*NBMPS - 2
292 ELSEIF( INTRO ) THEN
293 SINCOL = 3*( 1-NBMPS ) + KTOP - 1
294 EINCOL = KBOT - 3*NBMPS - 1
295 UINCOL = 3*NBMPS - 2
296 ELSEIF( CHASE ) THEN
297 SINCOL = KTOP
298 EINCOL = KBOT - 3*NBMPS - 1
299 UINCOL = 3*NBMPS - 2
300 ELSEIF( OFF ) THEN
301 SINCOL = KTOP
302 EINCOL = KBOT - 2
303 UINCOL = 3*NBMPS - 2
304 END IF
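* ==== Example: with JOB = 'C', KTOP = 1, KBOT = 30 and NBMPS = 2,
* . SINCOL = 1, EINCOL = 30 - 6 - 1 = 23 and UINCOL = 4, so
* . the chase loop below visits INCOL = 1, 5, 9, ..., 21. ====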
305 IPHV = 0
306*
307* ==== Create and/or chase chains of NBMPS bulges ====
308*
309 DO 220 INCOL = SINCOL, EINCOL, UINCOL
310 NDCOL = MIN( INCOL + KDU, EINCOL )
311 IF( ACCUM )
312 $ CALL SLASET( 'all', KDU, KDU, ZERO, ONE, U, LDU )
313*
314* ==== Near-the-diagonal bulge chase. The following loop
315* . performs the near-the-diagonal part of a small bulge
316* . multi-shift QR sweep. Each 6*NBMPS-2 column diagonal
317* . chunk extends from column INCOL to column NDCOL
318* . (including both column INCOL and column NDCOL). The
319* . following loop chases a 3*NBMPS column long chain of
320* . NBMPS bulges 3*NBMPS-2 columns to the right. (INCOL
321* . may be less than KTOP and NDCOL may be greater than
322* . KBOT indicating phantom columns from which to chase
323* . bulges before they are actually introduced or to which
324* . to chase bulges beyond column KBOT.) ====
325*
326 DO 150 KRCOL = INCOL, MIN( EINCOL, INCOL+3*NBMPS-3, KBOT-2 )
327*
328* ==== Bulges number MTOP to MBOT are active double implicit
329* . shift bulges. There may or may not also be a small
330* . 2-by-2 bulge, if there is room. The inactive bulges
331* . (if any) must wait until the active bulges have moved
332* . down the diagonal to make room. The phantom matrix
333* . paradigm described above helps keep track. ====
334*
335 MTOP = MAX( 1, ( ( KTOP-1 )-KRCOL+2 ) / 3+1 )
336 MBOT = MIN( NBMPS, ( KBOT-KRCOL ) / 3 )
337 M22 = MBOT + 1
338 BMP22 = ( MBOT.LT.NBMPS ) .AND. ( KRCOL+3*( M22-1 ) ).EQ.
339 $ ( KBOT-2 )
340*
341* ==== Generate reflections to chase the chain right
342* . one column. (The minimum value of K is KTOP-1.) ====
343*
344 DO 20 M = MTOP, MBOT
345 K = KRCOL + 3*( M-1 )
346 IF( K.EQ.KTOP-1 ) THEN
347 CALL SLAQR1( 3, H( KTOP, KTOP ), LDH, SR( 2*M-1 ),
348 $ SI( 2*M-1 ), SR( 2*M ), SI( 2*M ),
349 $ V( 1, M ) )
350 ALPHA = V( 1, M )
351 CALL SLARFG( 3, ALPHA, V( 2, M ), 1, V( 1, M ) )
352 ELSE
353 BETA = H( K+1, K )
354 V( 2, M ) = H( K+2, K )
355 V( 3, M ) = H( K+3, K )
356 CALL SLARFG( 3, BETA, V( 2, M ), 1, V( 1, M ) )
357*
358* ==== A Bulge may collapse because of vigilant
359* . deflation or destructive underflow. In the
360* . underflow case, try the two-small-subdiagonals
361* . trick to try to reinflate the bulge. ====
362*
363 IF( H( K+3, K ).NE.ZERO .OR. H( K+3, K+1 ).NE.
364 $ ZERO .OR. H( K+3, K+2 ).EQ.ZERO ) THEN
365*
366* ==== Typical case: not collapsed (yet). ====
367*
368 H( K+1, K ) = BETA
369 H( K+2, K ) = ZERO
370 H( K+3, K ) = ZERO
371 ELSE
372*
373* ==== Atypical case: collapsed. Attempt to
374* . reintroduce ignoring H(K+1,K) and H(K+2,K).
375* . If the fill resulting from the new
376* . reflector is too large, then abandon it.
377* . Otherwise, use the new one. ====
378*
379 CALL SLAQR1( 3, H( K+1, K+1 ), LDH, SR( 2*M-1 ),
380 $ SI( 2*M-1 ), SR( 2*M ), SI( 2*M ),
381 $ VT )
382 ALPHA = VT( 1 )
383 CALL SLARFG( 3, ALPHA, VT( 2 ), 1, VT( 1 ) )
384 REFSUM = VT( 1 )*( H( K+1, K )+VT( 2 )*
385 $ H( K+2, K ) )
386*
387 IF( ABS( H( K+2, K )-REFSUM*VT( 2 ) )+
388 $ ABS( REFSUM*VT( 3 ) ).GT.ULP*
389 $ ( ABS( H( K, K ) )+ABS( H( K+1,
390 $ K+1 ) )+ABS( H( K+2, K+2 ) ) ) ) THEN
391*
392* ==== Starting a new bulge here would
393* . create non-negligible fill. Use
394* . the old one with trepidation. ====
395*
396 H( K+1, K ) = BETA
397 H( K+2, K ) = ZERO
398 H( K+3, K ) = ZERO
399 ELSE
400*
401* ==== Starting a new bulge here would
402* . create only negligible fill.
403* . Replace the old reflector with
404* . the new one. ====
405*
406 H( K+1, K ) = H( K+1, K ) - REFSUM
407 H( K+2, K ) = ZERO
408 H( K+3, K ) = ZERO
409 V( 1, M ) = VT( 1 )
410 V( 2, M ) = VT( 2 )
411 V( 3, M ) = VT( 3 )
412 END IF
413 END IF
414 END IF
415 20 CONTINUE
416*
417* ==== Generate a 2-by-2 reflection, if needed. ====
418*
419 K = KRCOL + 3*( M22-1 )
420 IF( BMP22 ) THEN
421 IF( K.EQ.KTOP-1 ) THEN
422 CALL SLAQR1( 2, H( K+1, K+1 ), LDH, SR( 2*M22-1 ),
423 $ SI( 2*M22-1 ), SR( 2*M22 ), SI( 2*M22 ),
424 $ V( 1, M22 ) )
425 BETA = V( 1, M22 )
426 CALL SLARFG( 2, BETA, V( 2, M22 ), 1, V( 1, M22 ) )
427 ELSE
428 BETA = H( K+1, K )
429 V( 2, M22 ) = H( K+2, K )
430 CALL SLARFG( 2, BETA, V( 2, M22 ), 1, V( 1, M22 ) )
431 H( K+1, K ) = BETA
432 H( K+2, K ) = ZERO
433 END IF
434 ELSE
435*
436* ==== Initialize V(1,M22) here to avoid possible undefined
437* . variable problems later. ====
438*
439 V( 1, M22 ) = ZERO
440 END IF
441*
442* ==== Multiply H by reflections from the left ====
443*
444 IF( ACCUM ) THEN
445 JBOT = MIN( MAX(INCOL+KDU,NDCOL), KBOT )
446 ELSE IF( WANTT ) THEN
447 JBOT = N
448 ELSE
449 JBOT = KBOT
450 END IF
451 DO 40 J = MAX( KTOP, KRCOL ), JBOT
452 MEND = MIN( MBOT, ( J-KRCOL+2 ) / 3 )
453 DO 30 M = MTOP, MEND
454 K = KRCOL + 3*( M-1 )
455 REFSUM = V( 1, M )*( H( K+1, J )+V( 2, M )*
456 $ H( K+2, J )+V( 3, M )*H( K+3, J ) )
457 H( K+1, J ) = H( K+1, J ) - REFSUM
458 H( K+2, J ) = H( K+2, J ) - REFSUM*V( 2, M )
459 H( K+3, J ) = H( K+3, J ) - REFSUM*V( 3, M )
460 30 CONTINUE
461 40 CONTINUE
462 IF( BMP22 ) THEN
463 K = KRCOL + 3*( M22-1 )
464 DO 50 J = MAX( K+1, KTOP ), JBOT
465 REFSUM = V( 1, M22 )*( H( K+1, J )+V( 2, M22 )*
466 $ H( K+2, J ) )
467 H( K+1, J ) = H( K+1, J ) - REFSUM
468 H( K+2, J ) = H( K+2, J ) - REFSUM*V( 2, M22 )
469 50 CONTINUE
470 END IF
471*
472* ==== Multiply H by reflections from the right.
473* . Delay filling in the last row until the
474* . vigilant deflation check is complete. ====
475*
476 IF( ACCUM ) THEN
477 JTOP = MAX( KTOP, INCOL )
478 ELSE IF( WANTT ) THEN
479 JTOP = 1
480 ELSE
481 JTOP = KTOP
482 END IF
483 DO 90 M = MTOP, MBOT
484 IF( V( 1, M ).NE.ZERO ) THEN
485 K = KRCOL + 3*( M-1 )
486 DO 60 J = JTOP, MIN( KBOT, K+3 )
487 REFSUM = V( 1, M )*( H( J, K+1 )+V( 2, M )*
488 $ H( J, K+2 )+V( 3, M )*H( J, K+3 ) )
489 H( J, K+1 ) = H( J, K+1 ) - REFSUM
490 H( J, K+2 ) = H( J, K+2 ) - REFSUM*V( 2, M )
491 H( J, K+3 ) = H( J, K+3 ) - REFSUM*V( 3, M )
492 60 CONTINUE
493*
494 IF( ACCUM ) THEN
495*
496* ==== Accumulate U. (If necessary, update Z later
497* . with with an efficient matrix-matrix
498* . multiply.) ====
499*
500 KMS = K - INCOL
501 DO 70 J = MAX( 1, KTOP-INCOL ), KDU
502 REFSUM = V( 1, M )*( U( J, KMS+1 )+V( 2, M )*
503 $ U( J, KMS+2 )+V( 3, M )*U( J, KMS+3 ) )
504 U( J, KMS+1 ) = U( J, KMS+1 ) - REFSUM
505 U( J, KMS+2 ) = U( J, KMS+2 ) - REFSUM*V( 2, M )
506 U( J, KMS+3 ) = U( J, KMS+3 ) - REFSUM*V( 3, M )
507 70 CONTINUE
508 ELSE IF( WANTZ ) THEN
509*
510* ==== U is not accumulated, so update Z
511* . now by multiplying by reflections
512* . from the right. ====
513*
514 DO 80 J = ILOZ, IHIZ
515 REFSUM = V( 1, M )*( Z( J, K+1 )+V( 2, M )*
516 $ Z( J, K+2 )+V( 3, M )*Z( J, K+3 ) )
517 Z( J, K+1 ) = Z( J, K+1 ) - REFSUM
518 Z( J, K+2 ) = Z( J, K+2 ) - REFSUM*V( 2, M )
519 Z( J, K+3 ) = Z( J, K+3 ) - REFSUM*V( 3, M )
520 80 CONTINUE
521 END IF
522 END IF
523 90 CONTINUE
524*
525* ==== Special case: 2-by-2 reflection (if needed) ====
526*
527 K = KRCOL + 3*( M22-1 )
528 IF( BMP22 ) THEN
529 IF( V( 1, M22 ).NE.ZERO ) THEN
530 DO 100 J = JTOP, MIN( KBOT, K+3 )
531 REFSUM = V( 1, M22 )*( H( J, K+1 )+V( 2, M22 )*
532 $ H( J, K+2 ) )
533 H( J, K+1 ) = H( J, K+1 ) - REFSUM
534 H( J, K+2 ) = H( J, K+2 ) - REFSUM*V( 2, M22 )
535 100 CONTINUE
536*
537 IF( ACCUM ) THEN
538 KMS = K - INCOL
539 DO 110 J = MAX( 1, KTOP-INCOL ), KDU
540 REFSUM = V( 1, M22 )*( U( J, KMS+1 ) +
541 $ V( 2, M22 )*U( J, KMS+2 ) )
542 U( J, KMS+1 ) = U( J, KMS+1 ) - REFSUM
543 U( J, KMS+2 ) = U( J, KMS+2 ) -
544 $ REFSUM*V( 2, M22 )
545 110 CONTINUE
546 ELSE IF( WANTZ ) THEN
547 DO 120 J = ILOZ, IHIZ
548 REFSUM = V( 1, M22 )*( Z( J, K+1 )+V( 2, M22 )*
549 $ Z( J, K+2 ) )
550 Z( J, K+1 ) = Z( J, K+1 ) - REFSUM
551 Z( J, K+2 ) = Z( J, K+2 ) - REFSUM*V( 2, M22 )
552 120 CONTINUE
553 END IF
554 END IF
555 END IF
556*
557* ==== Vigilant deflation check ====
558*
559 MSTART = MTOP
560 IF( KRCOL+3*( MSTART-1 ).LT.KTOP )
561 $ MSTART = MSTART + 1
562 MEND = MBOT
563 IF( BMP22 )
564 $ MEND = MEND + 1
565 IF( KRCOL.EQ.KBOT-2 )
566 $ MEND = MEND + 1
567 DO 130 M = MSTART, MEND
568 K = MIN( KBOT-1, KRCOL+3*( M-1 ) )
569*
570* ==== The following convergence test requires that
571* . the traditional small-compared-to-nearby-diagonals
572* . criterion and the Ahues & Tisseur (LAWN 122, 1997)
573* . criteria both be satisfied. The latter improves
574* . accuracy in some examples. Falling back on an
575* . alternate convergence criterion when TST1 or TST2
576* . is zero (as done here) is traditional but probably
577* . unnecessary. ====
578*
579 IF( H( K+1, K ).NE.ZERO ) THEN
580 TST1 = ABS( H( K, K ) ) + ABS( H( K+1, K+1 ) )
581 IF( TST1.EQ.ZERO ) THEN
582 IF( K.GE.KTOP+1 )
583 $ TST1 = TST1 + ABS( H( K, K-1 ) )
584 IF( K.GE.KTOP+2 )
585 $ TST1 = TST1 + ABS( H( K, K-2 ) )
586 IF( K.GE.KTOP+3 )
587 $ TST1 = TST1 + ABS( H( K, K-3 ) )
588 IF( K.LE.KBOT-2 )
589 $ TST1 = TST1 + ABS( H( K+2, K+1 ) )
590 IF( K.LE.KBOT-3 )
591 $ TST1 = TST1 + ABS( H( K+3, K+1 ) )
592 IF( K.LE.KBOT-4 )
593 $ TST1 = TST1 + ABS( H( K+4, K+1 ) )
594 END IF
595 IF( ABS( H( K+1, K ) ).LE.MAX( SMLNUM, ULP*TST1 ) )
596 $ THEN
597 H12 = MAX( ABS( H( K+1, K ) ), ABS( H( K, K+1 ) ) )
598 H21 = MIN( ABS( H( K+1, K ) ), ABS( H( K, K+1 ) ) )
599 H11 = MAX( ABS( H( K+1, K+1 ) ),
600 $ ABS( H( K, K )-H( K+1, K+1 ) ) )
601 H22 = MIN( ABS( H( K+1, K+1 ) ),
602 $ ABS( H( K, K )-H( K+1, K+1 ) ) )
603 SCL = H11 + H12
604 TST2 = H22*( H11 / SCL )
605*
606 IF( TST2.EQ.ZERO .OR. H21*( H12 / SCL ).LE.
607 $ MAX( SMLNUM, ULP*TST2 ) )H( K+1, K ) = ZERO
608 END IF
609 END IF
610 130 CONTINUE
611*
612* ==== Fill in the last row of each bulge. ====
613*
614 MEND = MIN( NBMPS, ( KBOT-KRCOL-1 ) / 3 )
615 DO 140 M = MTOP, MEND
616 K = KRCOL + 3*( M-1 )
617 REFSUM = V( 1, M )*V( 3, M )*H( K+4, K+3 )
618 H( K+4, K+1 ) = -REFSUM
619 H( K+4, K+2 ) = -REFSUM*V( 2, M )
620 H( K+4, K+3 ) = H( K+4, K+3 ) - REFSUM*V( 3, M )
621 140 CONTINUE
622*
623* ==== End of near-the-diagonal bulge chase. ====
624*
625 150 CONTINUE
626*
627* ==== Use U (if accumulated) to update far-from-diagonal
628* . entries in H. If required, use U to update Z as
629* . well. ====
630*
631 IF( ACCUM ) THEN
632 IF( WANTT ) THEN
633 JTOP = 1
634 JBOT = N
635 ELSE
636 JTOP = KTOP
637 JBOT = KBOT
638 END IF
639 K1 = MAX( 1, KTOP-INCOL )
640 NU = ( KDU-MAX( 0, MAX(INCOL+KDU,NDCOL)-KBOT ) ) - K1 + 1
641 IF( ( .NOT.BLK22 ) .OR. ( INCOL.LT.KTOP ) .OR.
642 $ ( NDCOL.GT.KBOT ) .OR. ( NS.LE.2 ) .OR.
643 $ NU.LT.KDU ) THEN
644*
645* ==== Updates not exploiting the 2-by-2 block
646* . structure of U. K1 and NU keep track of
647* . the location and size of U in the special
648* . cases of introducing bulges and chasing
649* . bulges off the bottom. In these special
650* . cases and in case the number of shifts
651* . is NS = 2, there is no 2-by-2 block
652* . structure to exploit. ====
653*
654* ==== Horizontal Multiply ====
655*
656 DO 160 JCOL = MIN(MAX(INCOL+KDU,NDCOL),KBOT)+ 1, JBOT, NH
657 JLEN = MIN( NH, JBOT-JCOL+1 )
658 CALL SGEMM( 'c', 'n', NU, JLEN, NU, ONE, U( K1, K1 ),
659 $ LDU, H( INCOL+K1, JCOL ), LDH, ZERO, WH,
660 $ LDWH )
661 CALL SLAMOV( 'all', NU, JLEN, WH, LDWH,
662 $ H( INCOL+K1, JCOL ), LDH )
663 160 CONTINUE
664*
665* ==== Vertical multiply ====
666*
667 DO 170 JROW = JTOP, MAX( KTOP, INCOL ) - 1, NV
668 JLEN = MIN( NV, MAX( KTOP, INCOL )-JROW )
669 CALL SGEMM( 'n', 'n', JLEN, NU, NU, ONE,
670 $ H( JROW, INCOL+K1 ), LDH, U( K1, K1 ),
671 $ LDU, ZERO, WV, LDWV )
672 CALL SLAMOV( 'all', JLEN, NU, WV, LDWV,
673 $ H( JROW, INCOL+K1 ), LDH )
674 170 CONTINUE
675*
676* ==== Z multiply (also vertical) ====
677*
678 IF( WANTZ ) THEN
679 DO 180 JROW = ILOZ, IHIZ, NV
680 JLEN = MIN( NV, IHIZ-JROW+1 )
681 CALL SGEMM( 'n', 'n', JLEN, NU, NU, ONE,
682 $ Z( JROW, INCOL+K1 ), LDZ, U( K1, K1 ),
683 $ LDU, ZERO, WV, LDWV )
684 CALL SLAMOV( 'all', JLEN, NU, WV, LDWV,
685 $ Z( JROW, INCOL+K1 ), LDZ )
686 180 CONTINUE
687 END IF
688 ELSE
689*
690* ==== Updates exploiting U's 2-by-2 block structure.
691* . (I2, I4, J2, J4 are the last rows and columns
692* . of the blocks.) ====
693*
694 I2 = ( KDU+1 ) / 2
695 I4 = KDU
696 J2 = I4 - I2
697 J4 = KDU
698*
699* ==== KZS and KNZ deal with the band of zeros
700* . along the diagonal of one of the triangular
701* . blocks. ====
702*
703 KZS = ( J4-J2 ) - ( NS+1 )
704 KNZ = NS + 1
705*
706* ==== Horizontal multiply ====
707*
708 DO 190 JCOL = MIN(MAX(INCOL+KDU,NDCOL),KBOT)+ 1, JBOT, NH
709 JLEN = MIN( NH, JBOT-JCOL+1 )
710*
711* ==== Copy bottom of H to top+KZS of scratch ====
712* . (The first KZS rows get multiplied by zero.) ====
713*
714 CALL SLAMOV( 'all', KNZ, JLEN, H( INCOL+1+J2, JCOL ),
715 $ LDH, WH( KZS+1, 1 ), LDWH )
716 CALL SLASET( 'all', KZS, JLEN, ZERO, ZERO, WH, LDWH )
717*
718* ==== Multiply by U21' ====
719*
720 CALL STRMM( 'l', 'u', 'c', 'n', KNZ, JLEN, ONE,
721 $ U( J2+1, 1+KZS ), LDU, WH( KZS+1, 1 ),
722 $ LDWH )
723*
724* ==== Multiply top of H by U11' ====
725*
726 CALL SGEMM( 'c', 'N', i2, jlen, j2, one, u, ldu,
727 $ h( incol+1, jcol ), ldh, one, wh, ldwh )
728*
729* ==== Copy top of H to bottom of WH ====
730*
731 CALL slamov( 'ALL', j2, jlen, h( incol+1, jcol ), ldh,
732 $ wh( i2+1, 1 ), ldwh )
733*
734* ==== Multiply by U21' ====
735*
736 CALL strmm( 'L', 'L', 'C', 'N', j2, jlen, one,
737 $ u( 1, i2+1 ), ldu, wh( i2+1, 1 ), ldwh )
738*
739* ==== Multiply by U22 ====
740*
741 CALL sgemm( 'C', 'N', i4-i2, jlen, j4-j2, one,
742 $ u( j2+1, i2+1 ), ldu,
743 $ h( incol+1+j2, jcol ), ldh, one,
744 $ wh( i2+1, 1 ), ldwh )
745*
746* ==== Copy it back ====
747*
748 CALL slamov( 'ALL', kdu, jlen, wh, ldwh,
749 $ h( incol+1, jcol ), ldh )
750 190 CONTINUE
751*
752* ==== Vertical multiply ====
753*
754 DO 200 jrow = jtop, max( incol, ktop ) - 1, nv
755 jlen = min( nv, max( incol, ktop )-jrow )
756*
757* ==== Copy right of H to scratch (the first KZS
758* . columns get multiplied by zero) ====
759*
760 CALL slamov( 'ALL', jlen, knz, h( jrow, incol+1+j2 ),
761 $ ldh, wv( 1, 1+kzs ), ldwv )
762 CALL slaset( 'ALL', jlen, kzs, zero, zero, wv, ldwv )
763*
764* ==== Multiply by U21 ====
765*
766 CALL strmm( 'R', 'U', 'N', 'N', jlen, knz, one,
767 $ u( j2+1, 1+kzs ), ldu, wv( 1, 1+kzs ),
768 $ ldwv )
769*
770* ==== Multiply by U11 ====
771*
772 CALL sgemm( 'N', 'N', jlen, i2, j2, one,
773 $ h( jrow, incol+1 ), ldh, u, ldu, one, wv,
774 $ ldwv )
775*
776* ==== Copy left of H to right of scratch ====
777*
778 CALL slamov( 'ALL', jlen, j2, h( jrow, incol+1 ), ldh,
779 $ wv( 1, 1+i2 ), ldwv )
780*
781* ==== Multiply by U21 ====
782*
783 CALL strmm( 'R', 'L', 'N', 'N', jlen, i4-i2, one,
784 $ u( 1, i2+1 ), ldu, wv( 1, 1+i2 ), ldwv )
785*
786* ==== Multiply by U22 ====
787*
788 CALL sgemm( 'N', 'N', jlen, i4-i2, j4-j2, one,
789 $ h( jrow, incol+1+j2 ), ldh,
790 $ u( j2+1, i2+1 ), ldu, one, wv( 1, 1+i2 ),
791 $ ldwv )
792*
793* ==== Copy it back ====
794*
795 CALL slamov( 'ALL', jlen, kdu, wv, ldwv,
796 $ h( jrow, incol+1 ), ldh )
797 200 CONTINUE
798*
799* ==== Multiply Z (also vertical) ====
800*
801 IF( wantz ) THEN
802 DO 210 jrow = iloz, ihiz, nv
803 jlen = min( nv, ihiz-jrow+1 )
804*
805* ==== Copy right of Z to left of scratch (first
806* . KZS columns get multiplied by zero) ====
807*
808 CALL slamov( 'ALL', jlen, knz,
809 $ z( jrow, incol+1+j2 ), ldz,
810 $ wv( 1, 1+kzs ), ldwv )
811*
812* ==== Multiply by U12 ====
813*
814 CALL slaset( 'ALL', jlen, kzs, zero, zero, wv,
815 $ ldwv )
816 CALL strmm( 'R', 'U', 'N', 'N', jlen, knz, one,
817 $ u( j2+1, 1+kzs ), ldu, wv( 1, 1+kzs ),
818 $ ldwv )
819*
820* ==== Multiply by U11 ====
821*
822 CALL sgemm( 'N', 'N', jlen, i2, j2, one,
823 $ z( jrow, incol+1 ), ldz, u, ldu, one,
824 $ wv, ldwv )
825*
826* ==== Copy left of Z to right of scratch ====
827*
828 CALL slamov( 'ALL', jlen, j2, z( jrow, incol+1 ),
829 $ ldz, wv( 1, 1+i2 ), ldwv )
830*
831* ==== Multiply by U21 ====
832*
833 CALL strmm( 'R', 'L', 'N', 'N', jlen, i4-i2, one,
834 $ u( 1, i2+1 ), ldu, wv( 1, 1+i2 ),
835 $ ldwv )
836*
837* ==== Multiply by U22 ====
838*
839 CALL sgemm( 'N', 'N', jlen, i4-i2, j4-j2, one,
840 $ z( jrow, incol+1+j2 ), ldz,
841 $ u( j2+1, i2+1 ), ldu, one,
842 $ wv( 1, 1+i2 ), ldwv )
843*
844* ==== Copy the result back to Z ====
845*
846 CALL slamov( 'ALL', jlen, kdu, wv, ldwv,
847 $ z( jrow, incol+1 ), ldz )
848 210 CONTINUE
849 END IF
850 END IF
851 END IF
852 220 CONTINUE
853*
854* ==== Clear out workspaces and return. ====
855*
856 IF( n.GE.5 )
857 $ CALL slaset( 'Lower', n-4, n-4, zero, zero, h(5,1), ldh )
858*
859* ==== End of SLAQR6 ====
860*
861 END