Posts

Refreshing Dremio Assets

Dremio refreshes datasets at the interval configured at the data source level. There are two metadata refresh settings.

Dataset Discovery: the refresh interval for top-level source object names, such as the names of databases and tables. This is a lightweight operation.

Dataset Details: the metadata Dremio needs for query planning, such as information on fields, types, shards, statistics and locality.

Any new data ingested into the source should be available immediately. However, some use cases involve changes to the DDL. In that case, the new records will not appear until Dremio refreshes the Dataset Details. Refreshing this metadata frequently adds overhead. So, when only particular datasets need frequent refreshes, it is better to refresh those on demand rather than making the whole data source busier.

Dremio provides a few ways to refresh physical datasets (PDS). Let's see.

REST API

Fetch the Entity Id

    pds = 'Hive.demoschema.demotable'
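As a rough sketch of the on-demand refresh described above, the dotted dataset name can be turned into a catalog lookup URL, and the entity id from that lookup into a refresh call. The endpoint shapes used here (`/api/v3/catalog/by-path/...` and `/api/v3/catalog/{id}/refresh`), the `id` field in the lookup response, and all helper names are assumptions based on Dremio's v3 catalog API, not something this excerpt confirms; check them against your Dremio version.

```python
from urllib.parse import quote


def catalog_by_path_url(base_url: str, pds: str) -> str:
    """Build the (assumed) v3 catalog lookup URL for a dotted dataset name."""
    # 'Hive.demoschema.demotable' -> 'Hive/demoschema/demotable'
    # Naive split: names that themselves contain dots are not handled here.
    parts = [quote(part, safe="") for part in pds.split(".")]
    return f"{base_url.rstrip('/')}/api/v3/catalog/by-path/{'/'.join(parts)}"


def refresh_url(base_url: str, entity_id: str) -> str:
    """Build the (assumed) v3 metadata refresh URL for a catalog entity id."""
    # Entity ids may contain ':' or '/', so the id is fully percent-encoded.
    return f"{base_url.rstrip('/')}/api/v3/catalog/{quote(entity_id, safe='')}/refresh"


def refresh_pds(base_url: str, auth_value: str, pds: str) -> int:
    """Look up the entity id for `pds`, request a refresh, return the HTTP status."""
    import requests  # third-party; imported lazily so the URL helpers work without it

    # auth_value is the full Authorization header value (hypothetical parameter).
    headers = {"Authorization": auth_value, "Content-Type": "application/json"}
    lookup = requests.get(catalog_by_path_url(base_url, pds), headers=headers)
    lookup.raise_for_status()
    entity_id = lookup.json()["id"]  # assumption: the id is returned under "id"
    response = requests.post(refresh_url(base_url, entity_id), headers=headers)
    return response.status_code
```

Keeping the URL construction separate from the network calls makes it easy to check the paths before pointing the script at a real cluster.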

Azure Purview - Search By Glossary Terms using Atlas API

Azure Purview's support for the Atlas API makes it possible to build custom reporting applications tailored to our needs. In this blog we'll see how we can leverage the API to create a report based on assigned glossary terms.

Prerequisite:

The service principal we'll use to read should have the Purview Data Reader role.

Find the Enterprise Application Id for Purview: Copy the application id.

Generate an OAuth token:

    import requests

    url = "https://login.microsoftonline.com/{{Directory (Tenant) ID}}/oauth2/token"
    body = {
        "grant_type": "client_credentials",
        "client_id": "{{Service Client Id}}",
        "client_secret": "{{Service Client Credential}}",
        "resource": "{{Enterprise Application Id for Purview}}"
    }
    response = requests.post(url=url, data=body)
    token = response.json()
    # expires_on arrives as a string, so convert it before comparing times
    expires_on = int(token['expires_on'])
    access_token = token['access_token']

Fetch all Glossary Terms:

    url = "ht
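Since the token response carries an `expires_on` value, one way to avoid requesting a new token for every call is to reuse it until shortly before expiry. The sketch below assumes `expires_on` is a Unix timestamp in seconds (as the `int(...)` conversion above suggests) and that the API accepts a standard `Bearer` header; the helper names and the 60-second safety margin are hypothetical.

```python
import time
from typing import Optional


def token_is_expired(token: dict, now: Optional[float] = None,
                     skew_seconds: int = 60) -> bool:
    """Return True if the token has expired or will within `skew_seconds`."""
    current = time.time() if now is None else now
    # expires_on is a string in the token response, hence the int() conversion.
    return current >= int(token["expires_on"]) - skew_seconds


def auth_headers(token: dict) -> dict:
    """Build the Authorization header for subsequent API requests."""
    return {"Authorization": f"Bearer {token['access_token']}"}


# Usage with a hand-written token dict (not a real response):
sample = {"expires_on": "1700000000", "access_token": "abc"}
print(token_is_expired(sample, now=1699999000))  # well before expiry
print(auth_headers(sample))
```

The `now` parameter exists only so the expiry check can be exercised with fixed timestamps; in normal use it can be omitted.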

Writing a Dremio client

Dremio can be accessed through its REST API, ODBC and JDBC. In this short blog, we'll see how we can use REST and ODBC.

REST API

Generating a Dremio token

    import requests, json

    url = "https://{{Dremio URL}}/apiv2/login"
    token = ""
    # json.dumps escapes quotes and backslashes in the credentials
    loginData = json.dumps({"userName": username, "password": password})
    headers = {'content-type': 'application/json'}
    # verify=False skips TLS certificate verification
    response = requests.post(url, headers=headers,
                             data=loginData, verify=False)
    response_status = response.status_code
    if response_status == 401:
        print("Exception: 401!")
        raise ValueError("Exception: 401!")
    elif response_status == 500:
        print("Exception: 500!")
        raise ValueError("Exception: 500!")
    elif response_status == 200:
        data = json.loads(response.text)
        # retrieve the login token
        token = '_dremio
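The login step can be split into two small pure helpers, which makes it easier to test without a running Dremio instance. This is only a sketch: the helper names are hypothetical, and reading the token from a `token` field of the login response is an assumption, since the snippet above is cut off before that point. The `_dremio` prefix follows the truncated snippet.

```python
import json


def build_login_payload(username: str, password: str) -> str:
    """Serialize the login body; json.dumps handles quotes in credentials."""
    return json.dumps({"userName": username, "password": password})


def auth_header_from_login(response_body: str) -> dict:
    """Turn a login response body into headers for later requests."""
    data = json.loads(response_body)
    # Assumption: the login response exposes the token under "token".
    return {
        "Authorization": "_dremio" + data["token"],
        "Content-Type": "application/json",
    }


# Usage with made-up values (not real credentials or a real response):
print(build_login_payload("ann", 'pa"ss'))
print(auth_header_from_login('{"token": "abc"}')["Authorization"])
```

Building the body with `json.dumps` rather than string concatenation keeps the payload valid even when a password contains a quote or backslash.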