Details
-
Improvement
-
Resolution: Fixed
-
Minor
-
2.2.0
-
None
Description
From: Andrew Morgan
Sent: Monday, September 15, 2014 7:43 PM
To: David Langenberg
Cc: Chris Hyzer; Erica Lomax; Shilen Patel
Subject: Re: [grouper-users] grouper loader performance
Update:
I have a performance comparison between Oracle and MySQL for (almost) the
same SQL Loader run. The SQL Loader job is populating Grouper with course
groups for Fall term. The source data comes from 2 Oracle views:
COURSE_CREATION_GROUPER_OSU 21986 rows (groups with perms)
STUDENT_ENRL_GROUPER_OSU 148896 rows (memberships)
Due to addIncludeExclude logic, this turns into roughly 110,000 groups in
Grouper.
I ran a refresh of the groups a couple days after the initial loads.
Oracle (about 2hours, 45minutes):
loader ran successfully, inserted 4670 memberships, deleted 924 memberships, total membership count: 148704
started time: 15-SEP-2014 11:55:54
ended time: 15-SEP-2014 14:37:55
MySQL (about 5 hours):
loader ran successfully, inserted 4378 memberships, deleted 666 memberships, total membership count: 148555
started_time: 2014-09-14 09:04:59
ended_time: 2014-09-14 14:12:46
Our Oracle DBA and I monitored the Oracle refresh using Enterprise
Manager. We didn't see any slow queries, so I think we may be hitting the
limits of the hardware and the logic within the Grouper loader.
When I monitored the MySQL refresh, I saw at least one query (already
given below) that was slow. I think that query is executed many thousands
of times, so it probably has a big impact on performance.
I'd like to get your thoughts on optimizing the loader job. This may be a
use case that will always be painful because there are so many groups that
must be checked. Still, it seems like some of the MySQL queries can be
improved.
Thanks,
Andy
On Thu, 11 Sep 2014, Andrew Morgan wrote:
> I now have a third instance of grouper storing it's registry in Oracle, so I
> can run the same SQL Loader job against 3 different databases. I just
> started the Oracle loader job, so I don't have anything to report there yet.
>
> I haven't been in touch with our SQL Server DBA yet to try any tuning
> suggestions. First, I'll ask him to set READ_COMMITTED_SNAPSHOT. Hopefully
> that will fix the deadlock errors I've been seeing in the logs.
>
> I had to stop the SQL loader on MySQL because I was running out of disk space
> on the MySQL server. I plan to add more disk and increase the innoDB buffer
> pool size from 1GB to 4GB this weekend.
>
> While I was running a refresh of the SQL loader on MySQL, I was monitoring
> the queries using "show processlist". One of the queries that showed up
> often was this:
>
> select distinct group0_.id as id20_, group0_.hibernate_version_number as
> hibernate2_20_, group0_.last_membership_change as last3_20_,
> group0_.last_imm_membership_change as last4_20_, group0_.parent_stem as
> parent5_20_, group0_.creator_id as creator6_20_, group0_.create_time as
> create7_20_, group0_.modifier_id as modifier8_20_, group0_.modify_time as
> modify9_20_, group0_.name as name20_, group0_.display_name as display11_20_,
> group0_.extension as extension20_, group0_.display_extension as
> display13_20_, group0_.description as descrip14_20_, group0_.context_id as
> context15_20_, group0_.alternate_name as alternate16_20_,
> group0_.type_of_group as type17_20_, group0_.id_index as id18_20_ from
> grouper_groups group0_ where (group0_.name in
> ('osu:ref:courses:201501:MUP:193:200:instructor_systemOfRecord') or
> group0_.alternate_name in
> ('osu:ref:courses:201501:MUP:193:200:instructor_systemOfRecord') or exists
> (select attributea2_.id from grouper_attribute_assign attributea1_ cross join
> grouper_attribute_assign_value attributea2_ cross join
> grouper_attribute_def_name attributed3_ where group0_.type_of_group='entity'
> and group0_.id=attributea1_.owner_group_id and
> attributea1_.attribute_def_name_id=attributed3_.id and
> attributed3_.name='etc:attribute:entities:entitySubjectIdentifier' and
> attributea2_.attribute_assign_id=attributea1_.id and
> (attributea2_.value_string in
> ('osu:ref:courses:201501:MUP:193:200:instructor_systemOfRecord')))) and
> (group0_.type_of_group in ('entity')) order by group0_.display_name asc;
>
>
> It takes a long time to run (> 1sec), so I ran EXPLAIN on it:
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | id | select_type | table | type | possible_keys | key
> | key_len | ref | rows | Extra |
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | 1 | PRIMARY | group0_ | ALL |
> group_name_idx,group_alternate_name_idx | NULL | NULL
> | NULL | 202896 | Using where; Using filesort |
> | 2 | DEPENDENT SUBQUERY | attributed3_ | const |
> PRIMARY,attribute_def_name_name_idx |
> attribute_def_name_name_idx | 767 | const | 1 |
> |
> | 2 | DEPENDENT SUBQUERY | attributea1_ | ref |
> PRIMARY,attribute_asgn_attr_name_idx,attr_asgn_own_group_idx |
> attribute_asgn_attr_name_idx | 122 | const | 1 | Using where
> |
> | 2 | DEPENDENT SUBQUERY | attributea2_ | ref |
> attribute_val_assign_idx,attribute_val_string_idx |
> attribute_val_assign_idx | 122 | grouper-dev-test.attributea1_.id |
> 1 | Using where |
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>
> Sorry for the brutal wordwrap. This query appears to do a full table scan of
> grouper_groups.
>
>
> If there is any particular testing you'd like me to try, just say so!