How to Read Data From Google Drive in Python

· 14 min read · Updated Feb 2022 · Application Programming Interfaces

Google Drive enables you to store your files in the cloud, which you can access anytime and anywhere in the world. In this tutorial, you will learn how to list your Google Drive files, search over them, download stored files, and even upload local files into your drive programmatically using Python.

Here is the table of contents:

  • Enable the Drive API
  • List Files and Directories
  • Upload Files
  • Search for Files and Directories
  • Download Files

To get started, let's install the required libraries for this tutorial:

    pip3 install google-api-python-client google-auth-httplib2 google-auth-oauthlib tabulate requests tqdm

Enable the Drive API

Enabling the Google Drive API is very similar to other Google APIs such as the Gmail API, YouTube API, or Google Custom Search Engine API. First, you need to have a Google account with Google Drive enabled. Head to this page and click the "Enable the Drive API" button as shown below:

Enable the Drive API

A new window will pop up; choose your type of application. I will stick with the "Desktop app" and then hit the "Create" button. After that, you'll see another window appear saying you're all set:

Drive API is enabled

Download your credentials by clicking the "Download Client Configuration" button and then "Done".

Finally, you need to put the downloaded credentials.json file in your working directory (i.e., where you execute the upcoming Python scripts).

List Files and Directories

Before we do anything, we need to authenticate our code to our Google account. The function below does that:

    import pickle
    import os
    from googleapiclient.discovery import build
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.auth.transport.requests import Request
    from tabulate import tabulate

    # If modifying these scopes, delete the file token.pickle.
    SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']

    def get_gdrive_service():
        creds = None
        # The file token.pickle stores the user's access and refresh tokens, and is
        # created automatically when the authorization flow completes for the first
        # time.
        if os.path.exists('token.pickle'):
            with open('token.pickle', 'rb') as token:
                creds = pickle.load(token)
        # If there are no (valid) credentials available, let the user log in.
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                flow = InstalledAppFlow.from_client_secrets_file(
                    'credentials.json', SCOPES)
                creds = flow.run_local_server(port=0)
            # Save the credentials for the next run
            with open('token.pickle', 'wb') as token:
                pickle.dump(creds, token)
        # return Google Drive API service
        return build('drive', 'v3', credentials=creds)

We've imported the necessary modules. The above function was grabbed from the Google Drive quickstart page. It basically looks for the token.pickle file to authenticate with your Google account. If it doesn't find that file, it'll use credentials.json to prompt you for authentication in your browser. After that, it'll initiate the Google Drive API service and return it.

Going to the main function, let's define a function that lists files in our drive:

    def main():
        """Shows basic usage of the Drive v3 API.
        Prints the names and ids of the first 5 files the user has access to.
        """
        service = get_gdrive_service()
        # Call the Drive v3 API
        results = service.files().list(
            pageSize=5, fields="nextPageToken, files(id, name, mimeType, size, parents, modifiedTime)").execute()
        # get the results
        items = results.get('files', [])
        # list all 5 files & folders
        list_files(items)

So we used the service.files().list() method to return the first five files/folders the user has access to by specifying pageSize=5. We passed some useful fields to the fields parameter to get details about the listed files, such as mimeType (type of file), size in bytes, parent directory IDs, and the last modified date time. Check this page to see all other fields.
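A single list() call returns at most pageSize results; the nextPageToken field tells you whether more pages exist. Below is a minimal sketch of paging through everything in a drive; list_all_files() is a hypothetical helper built on the same get_gdrive_service() shown above:

    def list_all_files(service):
        """Fetches every file the account can see, one page at a time."""
        items = []
        page_token = None
        while True:
            response = service.files().list(
                pageSize=100,
                fields="nextPageToken, files(id, name, mimeType, size, parents, modifiedTime)",
                pageToken=page_token).execute()
            items.extend(response.get("files", []))
            # nextPageToken is absent on the last page
            page_token = response.get("nextPageToken")
            if not page_token:
                break
        return items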

Notice we used the list_files(items) function, which we haven't defined yet. Since results is now a list of dictionaries, it isn't that readable. We pass items to this function to print them in a human-readable format:

    def list_files(items):
        """given items returned by Google Drive API, prints them in a tabular way"""
        if not items:
            # empty drive
            print('No files found.')
        else:
            rows = []
            for item in items:
                # get the File ID
                id = item["id"]
                # get the name of file
                name = item["name"]
                try:
                    # parent directory ID
                    parents = item["parents"]
                except KeyError:
                    # has no parents
                    parents = "N/A"
                try:
                    # get the size in nice bytes format (KB, MB, etc.)
                    size = get_size_format(int(item["size"]))
                except KeyError:
                    # not a file, may be a folder
                    size = "N/A"
                # get the Google Drive type of file
                mime_type = item["mimeType"]
                # get last modified date time
                modified_time = item["modifiedTime"]
                # append everything to the list
                rows.append((id, name, parents, size, mime_type, modified_time))
            print("Files:")
            # convert to a human-readable table
            table = tabulate(rows, headers=["ID", "Name", "Parents", "Size", "Type", "Modified Time"])
            # print the table
            print(table)

We converted that list of dictionaries (the items variable) into a list of tuples (the rows variable) and then passed them to the tabulate module we installed earlier to print them in a nice format. Let's call the main() function:

    if __name__ == '__main__':
        main()

Here is my output:

    Files:
    ID                                 Name                            Parents                  Size      Type                          Modified Time
    ---------------------------------  ------------------------------  -----------------------  --------  ----------------------------  ------------------------
    1FaD2BVO_ppps2BFm463JzKM-gGcEdWVT  some_text.txt                   ['0AOEK-gp9UUuOUk9RVA']  31.00B    text/plain                    2020-05-15T13:22:20.000Z
    1vRRRh5OlXpb-vJtphPweCvoh7qYILJYi  google-drive-512.png            ['0AOEK-gp9UUuOUk9RVA']  15.62KB   image/png                     2020-05-14T23:57:18.000Z
    1wYY_5Fic8yt8KSy8nnQfjah9EfVRDoIE  bbc.zip                         ['0AOEK-gp9UUuOUk9RVA']  863.61KB  application/x-zip-compressed  2019-08-19T09:52:22.000Z
    1FX-KwO6EpCMQg9wtsitQ-JUqYduTWZub  Nasdaq 100 Historical Data.csv  ['0AOEK-gp9UUuOUk9RVA']  363.10KB  text/csv                      2019-05-17T16:00:44.000Z
    1shTHGozbqzzy9Rww9IAV5_CCzgPrO30R  my_python_code.py               ['0AOEK-gp9UUuOUk9RVA']  1.92MB    text/x-python                 2019-05-13T14:21:10.000Z

These are the files in my Google Drive. Notice the Size column is scaled in bytes; that's because we used the get_size_format() function in list_files(). Here is the code for it:

    def get_size_format(b, factor=1024, suffix="B"):
        """
        Scale bytes to its proper byte format
        e.g:
            1253656 => '1.20MB'
            1253656678 => '1.17GB'
        """
        for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
            if b < factor:
                return f"{b:.2f}{unit}{suffix}"
            b /= factor
        return f"{b:.2f}Y{suffix}"

The above function should be defined before running the main() method. Otherwise, it'll raise an error. For convenience, check the full code.
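As a quick sanity check, here is what the helper returns for a few byte counts (the expected values follow from the 1024 factor):

    print(get_size_format(31))          # 31.00B
    print(get_size_format(1253656))     # 1.20MB
    print(get_size_format(1253656678))  # 1.17GB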

Remember, after you run the script, you'll be prompted in your default browser to select your Google account and grant your application the scopes you specified earlier. Don't worry, this will only happen the first time you run it; after that, token.pickle will be saved and authentication details will be loaded from there instead.

Note: Sometimes you'll see a "This app isn't verified" warning (since Google didn't verify your app) after choosing your Google account. It's okay to go to the "Advanced" section and allow the application access to your account.

Upload Files

To upload files to our Google Drive, we need to change the SCOPES list we specified earlier; we need to add the permission to add files/folders:

    from __future__ import print_function
    import pickle
    import os.path
    from googleapiclient.discovery import build
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.auth.transport.requests import Request
    from googleapiclient.http import MediaFileUpload

    # If modifying these scopes, delete the file token.pickle.
    SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly',
              'https://www.googleapis.com/auth/drive.file']

A different scope means different privileges, and you need to delete the token.pickle file in your working directory and rerun the code to authenticate with the new scope.
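If you find yourself switching scopes often, you can also drop the stale token from the script itself before authenticating; a tiny sketch:

    import os

    # remove the cached token so the next run re-authenticates with the new scopes
    if os.path.exists("token.pickle"):
        os.remove("token.pickle")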

We will use the same get_gdrive_service() function to authenticate our account. Let's make a function to create a folder and upload a sample file to it:

    def upload_files():
        """
        Creates a folder and upload a file to it
        """
        # authenticate account
        service = get_gdrive_service()
        # folder details we want to make
        folder_metadata = {
            "name": "TestFolder",
            "mimeType": "application/vnd.google-apps.folder"
        }
        # create the folder
        file = service.files().create(body=folder_metadata, fields="id").execute()
        # get the folder id
        folder_id = file.get("id")
        print("Folder ID:", folder_id)
        # upload a text file
        # first, define file metadata, such as the name and the parent folder ID
        file_metadata = {
            "name": "test.txt",
            "parents": [folder_id]
        }
        # upload
        media = MediaFileUpload("test.txt", resumable=True)
        file = service.files().create(body=file_metadata, media_body=media, fields='id').execute()
        print("File created, id:", file.get("id"))

We used the service.files().create() method to create a new folder. We passed the folder_metadata dictionary, which has the type and name of the folder we want to create, and we passed fields="id" to retrieve the folder ID so we can upload a file into that folder.

Next, we used the MediaFileUpload class to upload the sample file and passed it to the same service.files().create() method. Make sure you have a test file of your choice called test.txt; this time we specified the "parents" attribute in the metadata dictionary, which is simply the folder we just created. Let's run it:

    if __name__ == '__main__':
        upload_files()

After I ran the code, a new folder was created in my Google Drive:

A folder created using Google Drive API in Python

And indeed, after I enter that folder, I see the file we just uploaded:

File Uploaded using Google Drive API in Python

We used a text file for demonstration, but you can upload any type of file you want. Check the full code of uploading files to Google Drive.
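For example, to upload a binary file such as an image, you can pass an explicit mimetype to MediaFileUpload. A minimal sketch (the file name here is a placeholder, and folder_id is the ID returned when we created the folder):

    # define the metadata of the image, putting it in the folder we created
    file_metadata = {
        "name": "screenshot.png",
        "parents": [folder_id]
    }
    # upload with an explicit MIME type
    media = MediaFileUpload("screenshot.png", mimetype="image/png", resumable=True)
    file = service.files().create(body=file_metadata, media_body=media, fields="id").execute()
    print("Image uploaded, id:", file.get("id"))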

Search for Files and Directories

Google Drive enables us to search for files and directories using the previously used list() method, just by passing the 'q' parameter. The below function takes the Drive API service and a query and returns the filtered items:

    def search(service, query):
        # search for the file
        result = []
        page_token = None
        while True:
            response = service.files().list(q=query,
                                            spaces="drive",
                                            fields="nextPageToken, files(id, name, mimeType)",
                                            pageToken=page_token).execute()
            # iterate over filtered files
            for file in response.get("files", []):
                result.append((file["id"], file["name"], file["mimeType"]))
            page_token = response.get('nextPageToken', None)
            if not page_token:
                # no more files
                break
        return result

Let's see how to use this function:

    def main():
        # filter to text files
        filetype = "text/plain"
        # authenticate Google Drive API
        service = get_gdrive_service()
        # search for files that have the type of text/plain
        search_result = search(service, query=f"mimeType='{filetype}'")
        # convert to table to print well
        table = tabulate(search_result, headers=["ID", "Name", "Type"])
        print(table)

So we're filtering for text/plain files here by using "mimeType='text/plain'" as the query parameter. If you want to filter by name instead, you can simply use "name='filename.ext'" as the query parameter. See the Google Drive API documentation for more detailed information.
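The q parameter supports a richer query language than exact matches; the Drive API documentation lists operators such as contains, comparisons on modifiedTime, and in for parent folders. A few example queries you could pass to our search() function (FOLDER_ID is a placeholder):

    # files whose name contains a substring
    search(service, query="name contains 'report'")
    # text files modified after a given date
    search(service, query="mimeType='text/plain' and modifiedTime > '2020-01-01T00:00:00'")
    # everything inside a specific folder (replace FOLDER_ID with a real ID)
    search(service, query="'FOLDER_ID' in parents")
    # all folders
    search(service, query="mimeType='application/vnd.google-apps.folder'")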

Let's execute the main() function:

    if __name__ == '__main__':
        main()

Output:

    ID                                 Name           Type
    ---------------------------------  -------------  ----------
    15gdpNEYnZ8cvi3PhRjNTvW8mdfix9ojV  test.txt       text/plain
    1FaE2BVO_rnps2BFm463JwPN-gGcDdWVT  some_text.txt  text/plain

Check the full code here.

Related: How to Use Gmail API in Python.

Download Files

To download files, we first need to get the file we want to download. We can either search for it using the previous code or manually grab its drive ID. In this section, we are going to search for the file by name and download it to our local disk:

    import pickle
    import os
    import re
    import io
    from googleapiclient.discovery import build
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.auth.transport.requests import Request
    from googleapiclient.http import MediaIoBaseDownload
    import requests
    from tqdm import tqdm

    # If modifying these scopes, delete the file token.pickle.
    SCOPES = ['https://www.googleapis.com/auth/drive.metadata',
              'https://www.googleapis.com/auth/drive',
              'https://www.googleapis.com/auth/drive.file'
              ]

I've added two scopes here. That's because we need to create a permission to make files shareable and downloadable. Here is the main function:

    def download():
        service = get_gdrive_service()
        # the name of the file you want to download from Google Drive
        filename = "bbc.zip"
        # search for the file by name
        search_result = search(service, query=f"name='{filename}'")
        # get the GDrive ID of the file
        file_id = search_result[0][0]
        # make it shareable
        service.permissions().create(body={"role": "reader", "type": "anyone"}, fileId=file_id).execute()
        # download file
        download_file_from_google_drive(file_id, filename)

You saw the first three lines in previous recipes; we simply authenticate with our Google account and search for the desired file to download.

After that, we extract the file ID and create a new permission that will allow us to download the file. This is the same as clicking the shareable link button in the Google Drive web interface.

Finally, we use our defined download_file_from_google_drive() function to download the file. There you have it:

    def download_file_from_google_drive(id, destination):
        def get_confirm_token(response):
            for key, value in response.cookies.items():
                if key.startswith('download_warning'):
                    return value
            return None

        def save_response_content(response, destination):
            CHUNK_SIZE = 32768
            # get the file size from Content-length response header
            file_size = int(response.headers.get("Content-Length", 0))
            # extract Content disposition from response headers
            content_disposition = response.headers.get("content-disposition")
            # parse filename
            filename = re.findall("filename=\"(.+)\"", content_disposition)[0]
            print("[+] File size:", file_size)
            print("[+] File name:", filename)
            progress = tqdm(response.iter_content(CHUNK_SIZE), f"Downloading {filename}", total=file_size, unit="Byte", unit_scale=True, unit_divisor=1024)
            with open(destination, "wb") as f:
                for chunk in progress:
                    if chunk: # filter out keep-alive new chunks
                        f.write(chunk)
                        # update the progress bar
                        progress.update(len(chunk))
            progress.close()

        # base URL for download
        URL = "https://docs.google.com/uc?export=download"
        # init a HTTP session
        session = requests.Session()
        # make a request
        response = session.get(URL, params={'id': id}, stream=True)
        print("[+] Downloading", response.url)
        # get confirmation token
        token = get_confirm_token(response)
        if token:
            params = {'id': id, 'confirm': token}
            response = session.get(URL, params=params, stream=True)
        # download to disk
        save_response_content(response, destination)

I've grabbed a part of the above code from the downloading files tutorial; it simply makes a GET request to the target URL we constructed by passing the file ID as params in the session.get() method.
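As a side note, the io and MediaIoBaseDownload imports above hint at an alternative that skips the docs.google.com endpoint and the sharing permission entirely: the Drive API can stream the file itself via files().get_media(). A minimal sketch of that approach (it requires the drive scope we already requested):

    def download_via_api(service, file_id, destination):
        """Downloads a file through the Drive API itself (no public link needed)."""
        request = service.files().get_media(fileId=file_id)
        with io.FileIO(destination, "wb") as fh:
            downloader = MediaIoBaseDownload(fh, request)
            done = False
            while not done:
                # next_chunk() pulls one chunk and reports overall progress
                status, done = downloader.next_chunk()
                print(f"[+] Download progress: {int(status.progress() * 100)}%")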

I've used the tqdm library to print a progress bar to see when the download will finish, which comes in handy for large files. Let's execute it:

    if __name__ == '__main__':
        download()

This will search for the bbc.zip file, download it, and save it in your working directory. Check the full code.
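One caveat: the script left the file readable by anyone with the link. If you want to undo that after the download, here is a sketch using the same permissions API (remove_public_permission() is a hypothetical helper):

    def remove_public_permission(service, file_id):
        """Deletes the 'anyone' permission we created before downloading."""
        permissions = service.permissions().list(fileId=file_id).execute()
        for permission in permissions.get("permissions", []):
            if permission.get("type") == "anyone":
                service.permissions().delete(fileId=file_id, permissionId=permission["id"]).execute()
                print("[+] Public permission removed")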

Conclusion

Alright, there you have it. These are basically the core functionalities of Google Drive. Now you know how to do them in Python without manual mouse clicks!

Remember, whenever you change the SCOPES list, you need to delete the token.pickle file to authenticate to your account again with the new scopes. See this page for further information, along with a list of scopes and their explanations.

Feel free to edit the code to accept file names as parameters to download or upload them. Go and try to make the script as dynamic as possible by introducing the argparse module to make some useful scripts. Let's see what you build!
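For instance, a minimal argparse wrapper around the download routine might look like the sketch below (it assumes the functions defined earlier are in scope):

    import argparse

    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description="Download a Google Drive file by name")
        parser.add_argument("filename", help="exact name of the file in your drive")
        args = parser.parse_args()
        service = get_gdrive_service()
        # look the file up by its exact name
        search_result = search(service, query=f"name='{args.filename}'")
        if not search_result:
            raise SystemExit(f"No file named {args.filename} found")
        file_id = search_result[0][0]
        # make it shareable, then download
        service.permissions().create(body={"role": "reader", "type": "anyone"}, fileId=file_id).execute()
        download_file_from_google_drive(file_id, args.filename)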

Below is a list of other Google API tutorials, if you want to check them out:

  • How to Extract Google Trends Data in Python.
  • How to Use Google Custom Search Engine API in Python.
  • How to Extract YouTube Data using YouTube API in Python.
  • How to Use Gmail API in Python.

Happy Coding ♥
