2019년 8월 6일 화요일

NiFi: Processors for Querying Databases

Database Extract with NiFi

Processors for Querying Databases

  • ExecuteSQL: 쿼리를 직접 작성하고 전송할 목적, pagination이 적용되는 쿼리에 적합. 
    executes an arbitrary SQL statement and returns the results as one FlowFile, in Avro format, containing all of the result records. Very simple and flexible, works with a broad set of statements including stored procedure calls. Designed for general-purpose use, does not have specific features for incremental extraction. ExecuteSQL can accept incoming FlowFiles, and FlowFile attributes may be used in expression language statements to make the SQL.
  • QueryDatabaseTable: 증분 쿼리, 최근 변경 사항만 가져오는 용도
    designed specifically for incremental extraction. Computes SQL queries based on a given table name and incrementing column. Maintains NiFi state data tracking the last incremental value retrieved. Results are formatted as Avro files.
  • GenerateTableFetch: Pagination 적용 쿼리, 데이터가 큰 경우
    New in NiFi 1.0.0. May be used to generate a sequence of paged query statements for use with ExecuteSQL, making it practical to query very large data sets in manageable chunks.
If you are doing an incremental extract, trying to get only the latest records, then QueryDatabaseTable is probably your processor. If you need to customize the SQL statement to individual input FlowFiles, ExecuteSQL is the only way to go. If you wish to run a routine query on a scheduleExecuteSQL is probably a better fit.
NiFi에서 DB connection test를 하기 위해서는 ExecuteSQL이 적합할 것으로 예상됨. Oracle을 예로 들면 다음과 같이 connection test를 할 것임
SELECT 1 FROM DUAL
이러한 custom SQL statement를 작성하고 결과를 받기위해서 사용하는 것이 ExecuteSQL이므로 가장 적합해 보임

Dealing with Avro Files

  • SplitAvro: 여러 레코드가 저장된 avro file을 개별 레코드 또는 일정한 크기의 FlowFile로 분리시킴
    splits an Avro file with multiple records into individual FlowFiles, or FlowFiles of an arbitrary size. Since an Avro file may contain more than one record, using SplitAvro can give you consistent size of the FlowFiles in your flow.
  • ConvertAvroToJSON: appropriately named processor that converts Avro files to -- wait for it -- JSON. This is extremely useful both for the flexibility of processing JSON and the ease of visually inspecting JSON data. Combined with the SplitAvro processor, you can easily convert a stream of database records into a stream of single-record JSON FlowFiles.

Large Result Sets

  • 대용량 데이터 조회시 QueryDatabaseTable은 사용하지 말 것
  • ExecuteSQL + GenerateTableFetch를 사용할 것
Due to the wide variance in SQL support for limiting result sets, NiFi does not provide a means to automagically keep result sets down to hundreds or thousands of rows. Reading a large table with QueryDatabaseTable may not be practical.
But remember that ExecuteSQL allows you to customize the SQL statement with an incoming FlowFile. If you have NiFi 1.0.0, you can use the **GenerateTableFetch **processor to do build a stream of incremental queries...

Reference

댓글 없음:

댓글 쓰기