Blog de l’ETL Informatica PowerCenter: Exploitation de la Repository Database par requêtage

Comme nous l'avons décrit dans notre précédent article composants serveur de PowerCenter, le Repository PowerCenter est une base de données relationnelle gérée par le Repository Service et qui se compose de tables de bases de données qui stockent les métadonnées. Les métadonnées décrivent différents types d'objets, tels que des applications et des transformations, que vous pouvez créer ou modifier en utilisant les clients PowerCenter. Le référentiel stocke également des informations telles que les autorisations pour les utilisateurs et les informations sur les exécutions des traitements.

La base de donnée Repository contient des tables et des vues pré-créées et peut être utilisée à des fins de contrôle et de suivi, notamment dans les situations suivantes :

Développement: Consultations en masse et contrôle des développements

Le chef de projet ou le Team Lead peut s'appuyer sur les tables et les vues du Repository afin de mener des contrôles sur les développements notamment pour vérifier le bon nommage des objets, les commentaires et vérifier le respect des normes de développement et d'optimisation (ex: surcharge des lookups, sorted input...etc).

La méthode la plus pratique est de créer des vues mises à jour en temps réel et consultables directement.

Production: Suivi des traitements

Nous pouvons suivre l'exécution des traitements et voir les statistiques dans le Workflow Monitor, mais ce client n'est pas pratique pour extraire ces informations à des fins de calcul et/ou analyse massifs. Cela reste visuel et la transcription des données sous Excel est pénible.

La BDD du Repository est pratique par le fait que l'on peut extraire les informations des exécutions des workflows et les formater comme on le souhaite sous Excel ou utilisant un outil de reporting à temps réel.

Les équipes de production n'ont pas toujours accès aux Repository ni à l'ensemble des clients PowerCenter et même s'ils ont accès à ces derniers, l'outil Workflow Monitor ne permet pas de récupérer les données dans Excel pour suivre l'exécution et calculer les durées des jobs. D'où la solution qui consiste à mettre en place des Reportings dédiés (Business Objects, autres outils de reporting ou même Toad) qui récupèrent les données dans une base de réplication (pour se dissocier du Repository) constituée à partir des tables du Repository, et qui seront formatés dans un "beau" rapport web avec, pourquoi pas, des invites et accessible à tous les membres de l'équipe.

Exemples de requêtes :

- Lister tous les mapping et leurs propriétés en effectuant un simple select sur la vue REP_ALL_MAPPINGS.

- Lister les Folders du Repository, les utilisateurs propriétaires et l'information sur le partage du Folder:
SELECT TSB.SUBJ_NAME FOLDER,
TSB.SUBJ_ID SUBJECT_ID,
TSB.PERMISSIONS PERMISSIONS,
RUS.USER_ID,
RUS.USER_NAME,
TSB.IS_SHARED
FROM REP_USERS RUS, --- c'est une vue
OPB_SUBJECT TSB      --- c'est une table
WHERE TSB.OWNER_ID = RUS.USER_ID

- Lister les objets transformations, tâches, mapplets et mappings et voir les commentaires associés:
SELECT SUB.SUBJ_NAME ,
'TRANSFORMATION',
RAT.WIDGET_ID,
RAT.WIDGET_NAME,
RAT.WIDGET_DESCRIPTION
FROM REP_ALL_TRANSFORMS RAT, --- c'est une vue
OPB_SUBJECT SUB --- c'est une table
WHERE SUB.SUBJ_ID = RAT.SUBJECT_ID
UNION
SELECT SUB.SUBJ_NAME ,
'TASK',
RATK.TASK_ID,
RATK.TASK_NAME,
RATK.DESCRIPTION
FROM REP_ALL_TASKS RATK, OPB_SUBJECT SUB
WHERE SUB.SUBJ_ID   = RATK.SUBJECT_ID
AND RATK.TASK_TYPE <> 62
UNION
SELECT SUB.SUBJ_NAME ,
'MAPPLET',
RAMP.MAPPLET_ID,
RAMP.MAPPLET_NAME,
RAMP.MAPPLET_DESCRIPTION
FROM REP_ALL_MAPPLETS RAMP, OPB_SUBJECT SUB
WHERE SUB.SUBJ_ID = RAMP.SUBJECT_ID
UNION
SELECT SUB.SUBJ_NAME ,
'MAPPING',
RAM.MAPPING_ID,
RAM.MAPPING_NAME,
RAM.MAPPING_DESCRIPTION
FROM REP_ALL_MAPPINGS RAM, OPB_SUBJECT SUB
WHERE SUB.SUBJ_ID = RAM.SUBJECT_ID;

- Identifier les valeurs actuelles (Current Value) des séquences réutilisables :
SELECT a.widget_id,
b.widget_name,
a.attr_value,
b.subject_id
FROM OPB_WIDGET_ATTR a, opb_widget b
WHERE a.widget_id=b.widget_id
AND a.attr_id    =4      --- Current Value
AND b.is_reusable=1   --- est réutilisable
AND b.widget_type=7 --- Séquence
ORDER BY 1;

- Extraire les dates de début/fin, le statut et les messages d'erreurs d'une tâche ou un workflow:

SELECT OPB_SUBJECT.SUBJ_NAME SUBJECT_AREA,
OPB_WFLOW_RUN.WORKFLOW_NAME,
OPB_TASK_INST_RUN.WORKFLOW_RUN_ID,
OPB_TASK_INST_RUN.INSTANCE_NAME,
OPB_TASK_INST_RUN.TASK_ID,
OPB_TASK_INST_RUN.TASK_TYPE,
to_char(OPB_TASK_INST_RUN.START_TIME, 'DD/MM/YYYY HH24:MI.SS') DATE_DEBUT,
to_char(OPB_TASK_INST_RUN.END_TIME, 'DD/MM/YYYY HH24:MI.SS') DATE_FIN,
OPB_TASK_INST_RUN.RUN_ERR_CODE,
OPB_TASK_INST_RUN.RUN_ERR_MSG,
OPB_TASK_INST_RUN.RUN_STATUS_CODE,
OPB_TASK_INST_RUN.TASK_NAME,
OPB_TASK_INST_RUN.VERSION_NUMBER TASK_VERSION_NUMBER,
OPB_TASK_INST_RUN.SERVER_NAME
FROM OPB_TASK_INST_RUN, OPB_WFLOW_RUN, OPB_SUBJECT
WHERE OPB_SUBJECT.SUBJ_ID = OPB_TASK_INST_RUN.SUBJECT_ID
AND OPB_WFLOW_RUN.WORKFLOW_ID = OPB_TASK_INST_RUN.WORKFLOW_ID
AND OPB_WFLOW_RUN.WORKFLOW_RUN_ID = OPB_TASK_INST_RUN.WORKFLOW_RUN_ID
ORDER BY 1, 2, 3;

- Extraire en plus des le nombre des enregistrements traités dans chacune des sessions:

SELECT
OPB_SUBJECT.SUBJ_NAME SUBJECT_AREA,
OPB_WFLOW_RUN.WORKFLOW_NAME,
OPB_TASK_INST_RUN.WORKFLOW_RUN_ID,
OPB_TASK_INST_RUN.INSTANCE_NAME,
to_char(OPB_TASK_INST_RUN.START_TIME, 'DD/MM/YYYY HH24:MI.SS') DATE_DEBUT,
to_char(OPB_TASK_INST_RUN.END_TIME, 'DD/MM/YYYY HH24:MI.SS') DATE_FIN,
OPB_TASK_INST_RUN.RUN_ERR_CODE,
OPB_TASK_INST_RUN.RUN_ERR_MSG,
OPB_TASK_INST_RUN.RUN_STATUS_CODE,
OPB_TASK_INST_RUN.TASK_NAME,
OPB_SWIDGINST_LOG.INSTANCE_NAME,
OPB_SWIDGINST_LOG.WIDGET_TYPE,
APPLIED_ROWS SUCCESSFUL_ROWS,
AFFECTED_ROWS SUCCESSFUL_AFFECTED_ROWS,
REJECTED_ROWS FAILED_ROWS,
LAST_ERR_MSG LAST_ERROR,
LAST_ERR_CODE LAST_ERROR_CODE
FROM OPB_TASK_INST_RUN, OPB_WFLOW_RUN, OPB_SUBJECT, OPB_SWIDGINST_LOG
WHERE OPB_SUBJECT.SUBJ_ID = OPB_TASK_INST_RUN.SUBJECT_ID
AND OPB_WFLOW_RUN.WORKFLOW_ID = OPB_TASK_INST_RUN.WORKFLOW_ID
AND OPB_WFLOW_RUN.WORKFLOW_RUN_ID = OPB_TASK_INST_RUN.WORKFLOW_RUN_ID
AND OPB_SWIDGINST_LOG.WORKFLOW_RUN_ID = OPB_TASK_INST_RUN.WORKFLOW_RUN_ID
AND OPB_SWIDGINST_LOG.TASK_INSTANCE_ID = OPB_TASK_INST_RUN.INSTANCE_ID
AND OPB_SWIDGINST_LOG.WIDGET_TYPE IN (3, 2) --- Source/Source Qualifier ou Target
ORDER BY 1, 2, 3;

- Lister tous les workflows, sessions, mappings, sources, lookups et cibles au sein d'un même folder :

SELECT DISTINCT OPB_SUBJECT.SUBJ_NAME FOLDER_NAME,
WORKFLOW.TASK_NAME WORKFLOW_NAME,
SESS.TASK_NAME SESSION_NAME,
OPB_MAPPING.MAPPING_NAME MAPPING_NAME,
OPB_MAPPING.LAST_SAVED MAPPING_LAST_SAVED,
OPB_SRC.SOURCE_NAME SOURCE_NAME,
OPB_TARG.TARGET_NAME TARGET_NAME,
TARGET_INSTANCES.INSTANCE_NAME TARGET_INSTANCE_NAME,
LOOKUP.INSTANCE_NAME LOOKUP_NAME,
TABLE_LOOKUP.ATTR_VALUE LOOKUP_TABLE_NAME
FROM OPB_SRC, OPB_TARG, OPB_SUBJECT, OPB_MAPPING, OPB_WIDGET_INST TARGET_INSTANCES, OPB_WIDGET_INST SOURCE_INSTANCES,
(SELECT WA.WIDGET_ID, WA.ATTR_VALUE FROM OPB_WIDGET_ATTR WA
WHERE WA.ATTR_ID=2) TABLE_LOOKUP,
(SELECT WI.MAPPING_ID, WI.INSTANCE_NAME, WI.WIDGET_ID FROM OPB_WIDGET_INST WI
WHERE WI.WIDGET_TYPE = 11) LOOKUP,
OPB_MAP_TARG_INFO, OPB_SESSION, OPB_TASK SESS, OPB_TASK WORKFLOW, OPB_TASK_INST
WHERE OPB_SUBJECT.SUBJ_ID= OPB_MAPPING.SUBJECT_ID
AND OPB_MAPPING.MAPPING_ID= TARGET_INSTANCES.MAPPING_ID
AND OPB_MAPPING.MAPPING_ID= SOURCE_INSTANCES.MAPPING_ID
AND OPB_MAPPING.MAPPING_ID= LOOKUP.MAPPING_ID (+)
AND LOOKUP.WIDGET_ID= TABLE_LOOKUP.WIDGET_ID (+)
AND OPB_MAPPING.MAPPING_ID= OPB_MAP_TARG_INFO.MAPPING_ID
AND OPB_MAPPING.MAPPING_ID= OPB_SESSION.MAPPING_ID
AND OPB_SESSION.SESSION_ID= SESS.TASK_ID
AND SESS.SUBJECT_ID= OPB_SUBJECT.SUBJ_ID
AND SESS.TASK_TYPE= 68 -- SESSION
AND WORKFLOW.TASK_ID= OPB_TASK_INST.WORKFLOW_ID
AND SESS.TASK_ID= OPB_TASK_INST.TASK_ID
AND WORKFLOW.TASK_TYPE= 71 -- WORKFLOW
AND TARGET_INSTANCES.WIDGET_TYPE= 2
AND TARGET_INSTANCES.WIDGET_ID= OPB_TARG.TARGET_ID
AND OPB_MAP_TARG_INFO.TARGET_INSTANCE_ID = TARGET_INSTANCES.INSTANCE_ID
AND SOURCE_INSTANCES.WIDGET_TYPE= 1 --- SOURCE
AND OPB_MAP_TARG_INFO.OBJECT_INSTANCE_ID = SOURCE_INSTANCES.INSTANCE_ID
AND OPB_MAP_TARG_INFO.OBJECT_TYPE= 1AND OPB_MAP_TARG_INFO.TARGET_OBJECT_TYPE = 2
AND OPB_SRC.SRC_ID= SOURCE_INSTANCES.WIDGET_ID
AND OPB_SRC.IS_VISIBLE= 1
AND OPB_MAPPING.REF_WIDGET_ID= 0
AND OPB_MAPPING.VERSION_NUMBER= SOURCE_INSTANCES.VERSION_NUMBER
AND OPB_MAPPING.VERSION_NUMBER= TARGET_INSTANCES.VERSION_NUMBER
AND OPB_MAPPING.VERSION_NUMBER= OPB_MAP_TARG_INFO.VERSION_NUMBER
AND OPB_MAPPING.IS_VISIBLE= 1
AND OPB_TARG.IS_VISIBLE= 1
AND OPB_SUBJECT.SUBJ_ID= 7 --- Mon FOLDER
ORDER BY OPB_MAPPING.MAPPING_NAME,
OPB_SRC.SOURCE_NAME

- Calculer la durée du dernier traitement pour chaque workflow :

SELECT OPB_SUBJECT.SUBJ_NAME,
OPB_WFLOW_RUN.WORKFLOW_NAME,
TRUNC(OPB_WFLOW_RUN.START_TIME) LOAD_DATE,
TO_CHAR(OPB_WFLOW_RUN.START_TIME,'HH24:MI.SS') WF_START_TIME,
TO_CHAR(OPB_WFLOW_RUN.END_TIME,'HH24:MI.SS') WF_END_TIME,
TRUNC((OPB_WFLOW_RUN.END_TIME - OPB_WFLOW_RUN.START_TIME) * 1440,2) DURATION_IN_MINS

FROM OPB_SUBJECT, OPB_WFLOW_RUN
    WHERE OPB_SUBJECT.SUBJ_ID = opb_wflow_run.subject_id
    AND opb_subject.subj_id = 7
    AND OPB_WFLOW_RUN.start_time =
                (SELECT MAX(wr.start_time)
                FROM OPB_WFLOW_RUN wr
               WHERE wr.workflow_id = OPB_WFLOW_RUN.workflow_id
                )
--- LE DERNIER RUN
ORDER BY 1,2,4

- Lister toutes les sequences réutisables ou non, les mappings associés et les "current value"

--- Les sequences non reutilisables ont des mappings associés
Select b.subject_id, s.subj_name as Folder, a.widget_id, b.widget_name, m.mapping_name, decode(b.is_reusable, 0, 'Sequence non reutisable', 'Sequence reutisable') as IS_REUSABLE, a.attr_value as Current_Value
from OPB_WIDGET_ATTR a,opb_widget b, opb_subject s, opb_mapping m
where
a.widget_id=b.widget_id
and b.subject_id = s.subj_id
and a.widget_type=b.widget_type
and m.mapping_id (+) = b.ru_parent_id
and a.attr_id=4
and b.is_reusable <> 1
and b.widget_type=7
union
--- Les séquences reutilisables et les mappings associés
Select b.subject_id, s.subj_name as Folder, a.widget_id, b.widget_name, m.mapping_name, decode(b.is_reusable, 0, 'Sequence non reutisable', 'Sequence reutisable') as IS_REUSABLE, a.attr_value as Current_Value
from OPB_WIDGET_ATTR a,opb_widget b, opb_subject s, opb_mapping m, rep_widget_inst wi
where
a.widget_id=b.widget_id
and b.subject_id = s.subj_id
and wi.instance_name = b.widget_name
and wi.subject_id = b.subject_id
and a.widget_type=b.widget_type
and m.mapping_id (+) = wi.mapping_id
and m.subject_id (+) = wi.subject_id
and a.attr_id=4
and b.is_reusable = 1
and b.widget_type=7
union
--- Les séquences reutilisables orphelines ==> code mort à supprimer
Select b.subject_id, s.subj_name as Folder, a.widget_id, b.widget_name, 'N/A : Sequence Orpheline' as mapping_name, decode(b.is_reusable, 0, 'Sequence non reutisable', 'Sequence reutisable') as IS_REUSABLE, a.attr_value as Current_Value
from OPB_WIDGET_ATTR a,opb_widget b, opb_subject s
where
a.widget_id=b.widget_id
and b.subject_id = s.subj_id
and a.attr_id=4
and a.widget_type=b.widget_type
and b.widget_type=7
and b.widget_id not in (
          Select a.widget_id
          from OPB_WIDGET_ATTR a, opb_widget b, opb_subject s
          where
          a.widget_id=b.widget_id
          and b.subject_id = s.subj_id
          and a.attr_id=4
          and b.is_reusable <> 1
          and a.widget_type=b.widget_type
          and b.widget_type=7
          union
          Select a.widget_id
          from OPB_WIDGET_ATTR a,opb_widget b, opb_subject s, rep_widget_inst wi
          where
          a.widget_id=b.widget_id
          and b.subject_id = s.subj_id
          and wi.subject_id = b.subject_id
          and wi.instance_name = b.widget_name
          and a.attr_id=4
          and b.is_reusable = 1
          and a.widget_type=b.widget_type
          and b.widget_type=7
)
order by 1, 2 , 3, 4;

4 commentaires:

mazagan19 février 2019 à 11:40
inta m3alem , ca m'a servi ce jour .

Merci Houssine
RépondreSupprimer
Réponses
Unknown20 mai 2019 à 23:38
Bonjour,

Comment peut-on retrouver l'emplacement (workflow, session ou mapping) d'une requête lorsque l'on dispose d'une requête, d'un SQL ID, d'une date d'exécution, du nom du projet? l'objectif est de retrouver cet emplacement pour aller optimiser la requête qui sature la mémoire.

D'avance merci de votre aide.
RépondreSupprimer
Réponses
Lhoussine13 juillet 2019 à 00:39
Bonjour,
Désolé pour le retour un peu tardif, je viens de voir votre demande.
A ma connaissance, le SQL ID est une information base de données non transcrite dans les logs Informatica.
Je vous propose de récupérer la date/heure d'exécution de la requête SQL et aller regarder le workflow/task en cours en requêtant les tables OPB_WFLOW_RUN et OPB_TASK_INST_RUN.

cdt
Lhoussine
RépondreSupprimer
Réponses

Ajouter un commentaire

8 juil. 2014

Exploitation de la Repository Database par requêtage

4 commentaires: