A cup of Data Science with pandas

Using NSFG data, I have concluded that the mean pregnancy length is nearly the same within families (respondents with more than one child) and when comparing different families (first babies vs. other babies across the whole dataset). Example code for the within-family calculation is below (38.62 vs. 38.52 weeks).

import itertools
import nsfg

# Read the NSFG pregnancy file and keep live births only (outcome == 1)
preg = nsfg.ReadFemPreg()
live = preg[preg.outcome == 1]
# Map each respondent (caseid) to the index labels of her live births
preg_map = nsfg.MakePregMap(live)

# Keep only caseids with more than one baby
preg_map = {caseid: index for caseid, index in preg_map.items() if len(index) > 1}

# Calculate the mean pregnancy length for the first baby
first_index_list = [index[0] for index in preg_map.values()]
mean_first = preg.loc[first_index_list, "prglngth"].mean()

# Calculate the mean pregnancy length for the second and later babies
other_index_list = list(itertools.chain.from_iterable(index[1:] for index in preg_map.values()))
mean_others = preg.loc[other_index_list, "prglngth"].mean()

print("Mean pregnancy length for first baby [weeks]: {0}".format(round(mean_first, 2)))
print("Mean pregnancy length for second and later babies [weeks]: {0}".format(round(mean_others, 2)))

# Mean pregnancy length for first baby [weeks]: 38.62
# Mean pregnancy length for second and later babies [weeks]: 38.52
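Since the title promises pandas, here is a minimal sketch of the same comparison using groupby instead of index lists. It assumes a DataFrame of live births with caseid and prglngth columns, in which each respondent's births appear in pregnancy order (as they do in the NSFG file). The function name first_vs_others and the toy data are hypothetical, not NSFG values. Setting within_families=False is one reading of the between-families comparison: first babies vs. later babies across all respondents.

```python
import pandas as pd


def first_vs_others(live, within_families=True):
    """Return (mean first-birth length, mean later-birth length) in weeks.

    live: DataFrame with one row per live birth and columns caseid, prglngth,
    where each respondent's births appear in pregnancy order.
    within_families=True keeps only respondents with two or more live births.
    """
    if within_families:
        # Number of live births per respondent, broadcast back to every row
        births_per_case = live["caseid"].map(live["caseid"].value_counts())
        live = live[births_per_case >= 2]
    # cumcount() numbers each respondent's births 0, 1, 2, ... in row order
    birth_index = live.groupby("caseid").cumcount()
    first = live.loc[birth_index == 0, "prglngth"]
    others = live.loc[birth_index > 0, "prglngth"]
    return float(first.mean()), float(others.mean())


# Hypothetical toy data (not NSFG values): respondent 2 has only one baby
live = pd.DataFrame({
    "caseid": [1, 1, 2, 3, 3, 3],
    "prglngth": [39, 38, 43, 41, 39, 37],
})
print(first_vs_others(live))                         # (40.0, 38.0)
print(first_vs_others(live, within_families=False))  # (41.0, 38.0)
```

The within-family filter drops respondent 2 entirely, which is why the first-baby mean differs between the two calls while the later-baby mean does not.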


Concurrent statistics in Oracle 12.1

Concurrent statistics can significantly reduce statistics gathering time in a database. It works like a "global parallel mode": the gathering task is divided into several jobs that run concurrently on many objects. From previous releases we know the DEGREE parameter of several procedures and functions in the DBMS_STATS package. Parallelism was possible, but only within one object at a time. Concurrent statistics remove this limitation and allow parallelism at the global level. The price is resource usage (as always with parallel operations). As stated in the Oracle blog, there are two methods to limit resource utilization for concurrent statistics: job_queue_processes or Resource Manager.
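To picture the difference from per-object DEGREE, here is a loose Python analogy (not Oracle's implementation): each object becomes an independent job, and a bounded worker pool caps how many jobs run at once, much as job_queue_processes caps concurrent scheduler jobs. The function gather_all, the sleep standing in for real work, and the table names are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def gather_all(objects, max_concurrent_jobs):
    """Run one simulated 'gather statistics' job per object, with at most
    max_concurrent_jobs jobs running at the same time.

    Returns the processed objects (in input order) and the peak number of
    jobs observed running simultaneously.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def gather(obj):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the real per-object statistics work
        with lock:
            running -= 1
        return obj

    # The pool size caps concurrency, loosely like job_queue_processes
    with ThreadPoolExecutor(max_workers=max_concurrent_jobs) as pool:
        done = list(pool.map(gather, objects))
    return done, peak


# Hypothetical table names
tables = ["SALES", "CUSTOMERS", "ORDERS", "PRODUCTS", "RETURNS"]
done, peak = gather_all(tables, max_concurrent_jobs=2)
print(done, peak)
```

Raising max_concurrent_jobs shortens total time until the machine runs out of resources, which is exactly the trade-off the limits are meant to control.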
In Oracle 12cR1 concurrent statistics are not enabled by default; CONCURRENT is a global preference.
Below I present an example configuration using Database Resource Manager.


-- First, check whether the current statistics gathering time is acceptable or needs tuning
-- (the expression converts the END_TIME - START_TIME interval into seconds)
select operation, round(avg((sysdate + (end_time - start_time) * 86400 - sysdate)), 2) as avg_seconds
from dba_optstat_operations
group by operation;
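The SYSDATE + (END_TIME - START_TIME)*86400 - SYSDATE expression is, as far as I can tell, a trick for turning the INTERVAL produced by subtracting two timestamps into a plain number: scaling the interval by 86400 and then doing DATE arithmetic yields a day count that numerically equals the original duration in seconds. As a cross-check, here is a small Python sketch of the same aggregation (average duration per operation, rounded to two decimals). The rows are invented sample data, not real DBA_OPTSTAT_OPERATIONS output.

```python
from collections import defaultdict
from datetime import datetime


def avg_seconds_by_operation(rows):
    """rows: iterable of (operation, start_time, end_time) with datetime values.

    Returns {operation: average duration in seconds, rounded to 2 decimals},
    mirroring round(avg(...), 2) ... group by operation.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for operation, start_time, end_time in rows:
        totals[operation] += (end_time - start_time).total_seconds()
        counts[operation] += 1
    return {op: round(totals[op] / counts[op], 2) for op in totals}


# Invented sample rows for illustration only
sample = [
    ("gather_schema_stats", datetime(2016, 1, 1, 22, 0, 0), datetime(2016, 1, 1, 22, 30, 0)),
    ("gather_schema_stats", datetime(2016, 1, 2, 22, 0, 0), datetime(2016, 1, 2, 22, 10, 0)),
    ("gather_table_stats", datetime(2016, 1, 3, 9, 0, 0), datetime(2016, 1, 3, 9, 0, 45)),
]
print(avg_seconds_by_operation(sample))
# {'gather_schema_stats': 1200.0, 'gather_table_stats': 45.0}
```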

-- Enable concurrent statistics
set serveroutput on
set lines 200 pages 9999
begin
  dbms_stats.set_global_prefs('CONCURRENT', 'ALL');
end;
/

-- Parallel statements that would push usage above this target are queued instead of running.
-- By default, statements do not time out while waiting in the queue.
ALTER SYSTEM SET PARALLEL_SERVERS_TARGET = 16;

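-- Create a Resource Manager plan that limits parallel server usage per consumer group
begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.create_plan(
    plan    => 'CONCURRENT_STATS',
    comment => 'CONCURRENT_STATS');
  dbms_resource_manager.create_plan_directive(
    plan                       => 'CONCURRENT_STATS',
    group_or_subplan           => 'OTHER_GROUPS',
    comment                    => 'OTHER_GROUPS directive for concurrent statistics',
    parallel_target_percentage => 50);
  dbms_resource_manager.create_plan_directive(
    plan                       => 'CONCURRENT_STATS',
    group_or_subplan           => 'SYS_GROUP',
    comment                    => 'SYS_GROUP directive for concurrent statistics',
    parallel_target_percentage => 70);
  dbms_resource_manager.submit_pending_area();
end;
/

-- Activate the plan; a plan has no effect until it is set as the active plan
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'CONCURRENT_STATS';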
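parallel_target_percentage is expressed relative to PARALLEL_SERVERS_TARGET, so with the target of 16 set earlier the two directives translate roughly into the server counts below. This is a minimal Python sketch of that arithmetic only; the helper name parallel_server_share is hypothetical, and how Oracle handles a fractional share such as 11.2 is not modeled.

```python
def parallel_server_share(parallel_servers_target, parallel_target_percentage):
    """Parallel servers a consumer group may use before its statements queue,
    computed as a percentage of PARALLEL_SERVERS_TARGET.

    Returned as a float: how Oracle rounds a fractional share is not modeled.
    """
    return parallel_servers_target * parallel_target_percentage / 100


PARALLEL_SERVERS_TARGET = 16  # value set in the example above
for group, pct in [("OTHER_GROUPS", 50), ("SYS_GROUP", 70)]:
    print(group, parallel_server_share(PARALLEL_SERVERS_TARGET, pct))
# OTHER_GROUPS 8.0
# SYS_GROUP 11.2
```

Giving SYS_GROUP the larger share keeps administrative sessions from being starved while the concurrent statistics jobs run in other groups.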