Uploaded image for project: 'Grouper'
  1. Grouper
  2. GRP-2040

Query produced by GrouperDaemonDeleteMultipleCorruption.fixAssign is expensive causing job to fail



    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.3.0, 2.4.0
    • 2.4.0.patch, 2.4.1
    • grouperLoader
    • None
    • Grouper 2.3.0 patches up to 109 applied., probably effects 2.4.0, running ubuntu.


      We are using oracle for the backend.   Our  "Maintenence_cleanLogs" job is "hanging" without error.  I did some debugging, and determine we have a really expensive query running, and the network is killing our connection to oracle, and hibernate isn't detecting the dead connection.  I'm not expecting the team to fix our network connection, but rather look to optimize the query so it returns faster.  I've set the flag in our system to turn off the "fixAssigns" method call in the job, so we have worked around it.

      In the method  fixAssigns startingon line 252, there is a hibernate/jpa query that is really expensive.  I translated it to SQL, and  in our prod database, the cost on the explain plan is around 197,000.  In our test database the cost is around 50,000, and in our dev database, the cost is around 5,000.  The major difference between environments is prod has alot more groups that test, and test has more groups that dev.  

      The offending query starts on line 267.

      List<Object[]> listOfObjects = HibernateSession.byHqlStatic()
      .createQuery("SELECT distinct gaa, gadn, gad FROM AttributeAssign gaa, "
      + " AttributeDefName gadn, AttributeDef gad WHERE "
      + " gaa.attributeDefNameId = gadn.id AND gadn.attributeDefId = gad.id AND gad.multiAssignableDb = 'F' "
      + " AND EXISTS (SELECT 1 FROM AttributeAssign gaa2 "
      + " WHERE gaa2.id != gaa.id"
      + " AND gaa2.attributeAssignActionId = gaa.attributeAssignActionId"
      + " AND gaa2.attributeDefNameId = gaa.attributeDefNameId"
      + " AND gaa2.attributeAssignTypeDb = gaa.attributeAssignTypeDb"
      + " AND ((gaa2.attributeAssignTypeDb = 'group' AND gaa2.ownerGroupId = gaa.ownerGroupId) "
      + " OR (gaa2.attributeAssignTypeDb = 'stem' AND gaa2.ownerStemId = gaa.ownerStemId) "
      + " OR (gaa2.attributeAssignTypeDb = 'member' AND gaa2.ownerMemberId = gaa.ownerMemberId) "
      + " OR (gaa2.attributeAssignTypeDb = 'attr_def' AND gaa2.ownerAttributeDefId = gaa.ownerAttributeDefId) "
      + " OR (gaa2.attributeAssignTypeDb = 'imm_mem' AND gaa2.ownerMembershipId = gaa.ownerMembershipId) "
      + " OR (gaa2.attributeAssignTypeDb = 'any_mem' AND gaa2.ownerMemberId = gaa.ownerMemberId AND gaa2.ownerGroupId = gaa.ownerGroupId) "
      + " OR (gaa2.attributeAssignTypeDb in ('any_mem_asgn', 'attr_def_asgn', 'group_asgn', 'imm_mem_asgn', 'mem_asgn', 'stem_asgn') AND gaa2.ownerAttributeAssignId = gaa.ownerAttributeAssignId) )"
      + ")")



      It appears the code eventually throw the data into a map, does some processing, and cleans up some bad records.  Playing around with the generated SQL,the cost increases dramatically once more than one of the "OR"'ed records are added.  Would there be a problem breaking this query into about 7 different smaller queries, and aggregate the data?  I suppose another option could be to generate a view that used 'union' clauses instead of the 'OR' clauses.

       I do know just using one of the 'OR'ed clauses at a time, the cost of the query is around 600 in our production environment and returns almost instantly for each one.




            shilen.patel@at.internet2.edu Shilen Patel (duke.edu)
            althor David Malia (Inactive)
            0 Vote for this issue
            1 Start watching this issue



              Smart Checklist