Uploaded image for project: 'Grouper'
  1. Grouper
  2. GRP-2041

PSPNG - Catch, log and rethrow Runtime exceptions

    XMLWordPrintable

Details

    • Improvement
    • Resolution: Won't Fix
    • Minor
    • 2.4.0.patch
    • None
    • provisioning
    • None

    Description

      PSPNG's changelog processing has in the past failed and not logged details, perhaps because only PspExceptions are caught and RunTimeExceptions are passed up through the provisioning harnesses.

      This ticket is requesting additional logging be added to Provisioner.provisionBatchOfitems so that at least some information will be logged about unexpected problems.

       

      An example observation from Scott Koranda, Jan 2017:

      > > ... Did not get all the way through the batch! 88720 != 88721 ...
      >
      > Changelog entry 88721 failed to be processed. It might help to know what that
      > change was, and it might help try to duplicate the problem. Can you query the
      > changelog for some details about it?

      mysql> SELECT * FROM grouper_change_log_entry WHERE sequence_number = 88721\G

                                                          • 1. row ***************************
                                                            change_log_type_id: f677ebad71ab4c0c9cb63f14ad9928c2
                                                                    context_id: 1cf47f5cb1da428b84759f5b01bfa112
                                                                    created_on: 1484182869667000
                                                               sequence_number: 88721
                                                                      string01: 0660320dd7144056842f3b1be2f1856d
                                                                      string02: members
                                                                      string03: I46500
                                                                      string04: RegistryLDAP
                                                                      string05: flattened
                                                                      string06: 686e09db47e14726bb2b976dc25c1c19
                                                                      string07: My Stem:MyPrefix_TM Foo, Bar, Blarf
                                                                      string08: 0028e6b1edbd46a4b8285d4c1c4ef31d
                                                                      string09: 17123756633e40daab179ad09dd89bf6
                                                                      string10: NULL
                                                                      string11: NULL
                                                                      string12: NULL

      (I obfuscated the name of the group of course...)

      > Of course, PSP should have logged something about its failure to process that
      > change; I would have expected an error message to be there before the log lines
      > you sent.

      There are no ERRORs before that in grouper_error.log, nor are there any
      WARN entries.

      > Also, it's a long shot, but might there be some information in the
      > grouper_event log? 

      There are no ERRORs or WARNs in grouper_event.log, and an inspection of
      the time around 2017-01-12 01:01:15 in the grouper_event.log does
      now show anything unusual--just "normal" INFO event log entries.

      > Or search for 88720 and 88721 in the log directory,
      > obviously ignoring the "Did not get all the way through the batch!" messages.

      /var/log/grouperApi# grep 88720 grouper_error.log | grep -v "Did not get all the way through the batch"
      2017-01-12 05:45:30,014: [DefaultQuartzScheduler_Worker-7] INFO  EventLog.info(156) -  - [74202005c5234594b3288720d1205685,'GrouperSystem','application'] session: start (3ms)

      So, nothing in grouper_error.log other than the ERRORs about
      the batch.

      > The "Did not get all the way through the batch! 88720 != 88721" repeated over
      > and over, right?

      Correct, until the restart of the grouper_loader process.

      > If not, then retryOnError might not have been working.
      >
      >
      > Finally, if there aren't any other error messages, then we'll probably have to
      > try to duplicate the problem with more verbose logging.

      We will see if we can reproduce on dev, though since the problem is moving
      from the temp change log to the change log I am not sure how we can
      cause the issue easily other than repeatedly making changes to
      that particular group. We will try that.

      If we can reproduce it we will turn up debugging on the dev tier and send
      you some more information.

      Since a restart of the Grouper loader process made the error state
      disappear, I am assuming it cannot be a problem in the table(s), ie. the
      data in that row is not problematic.

      I should add that the loader had been up for 2 days previously, but before
      that it had been up for about 30 days. So in general the loader is
      running as expected.

      Thanks,

      Scott K

      Attachments

        Activity

          People

            bert.beelindgren@at.internet2.edu Bert Bee-Lindgren (gatech.edu)
            bert.beelindgren@at.internet2.edu Bert Bee-Lindgren (gatech.edu)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: