> ## Documentation Index
> Fetch the complete documentation index at: https://langchain-5e9cc07a-preview-opensw-1782332329-96d87c7.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Google drive integration

> Integrate with the Google drive document loader using LangChain Python.

> [Google Drive](https://en.wikipedia.org/wiki/Google_Drive) is a file storage and synchronization service developed by Google.

This notebook covers how to load documents from `Google Drive`. Currently, only `Google Docs` are supported.

## Prerequisites

1. Create a Google Cloud project or use an existing project
2. Enable the [Google Drive API](https://console.cloud.google.com/flows/enableapi?apiid=drive.googleapis.com)
3. [Authorize credentials for desktop app](https://developers.google.com/drive/api/quickstart/python#authorize_credentials_for_a_desktop_application)
4. `pip install -U google-api-python-client google-auth-httplib2 google-auth-oauthlib`

## 🧑 Instructions for ingesting your Google docs data

Set the environmental variable `GOOGLE_APPLICATION_CREDENTIALS` to an empty string (`""`).

By default, the `GoogleDriveLoader` expects the `credentials.json` file to be located at `~/.credentials/credentials.json`, but this is configurable using the `credentials_path` keyword argument. Same thing with `token.json` - default path: `~/.credentials/token.json`, constructor param: `token_path`.

The first time you use GoogleDriveLoader, you will be displayed with the consent screen in your browser for user authentication. After authentication, `token.json` will be created automatically at the provided or the default path. Also, if there is already a `token.json` at that path, then you will not be prompted for authentication.

`GoogleDriveLoader` can load from a list of Google Docs document ids or a folder id. You can obtain your folder and document id from the URL:

* Folder: [drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5](https://drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5) -> folder id is `"1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5"`
* Document: [docs.google.com/document/d/1bfaMQ18\_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw/edit](https://docs.google.com/document/d/1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw/edit) -> document id is `"1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw"`

```python theme={null}
pip install -qU langchain-google-community[drive]
```

```python theme={null}
from langchain_google_community import GoogleDriveLoader
```

```python theme={null}
loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    token_path="/path/where/you/want/token/to/be/created/google_token.json",
    # Optional: configure whether to recursively fetch files from subfolders. Defaults to False.
    recursive=False,
)
```

```python theme={null}
docs = loader.load()
```

When you pass a `folder_id` by default all files of type document, sheet and pdf are loaded. You can modify this behaviour by passing a `file_types` argument

```python theme={null}
loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    file_types=["document", "sheet"],
    recursive=False,
)
```

## Passing in optional file loaders

When processing files other than Google Docs and Google Sheets, it can be helpful to pass an optional file loader to `GoogleDriveLoader`. If you pass in a file loader, that file loader will be used on documents that do not have a Google Docs or Google Sheets MIME type. Here is an example of how to load an Excel document from Google Drive using a file loader.

<Warning>
  The `langchain-community` package is no longer maintained. Examples that import from `langchain_community` may be outdated or broken. Use with caution.
</Warning>

```python theme={null}
from langchain_community.document_loaders import UnstructuredFileIOLoader
from langchain_google_community import GoogleDriveLoader
```

```python theme={null}
file_id = "1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz"
loader = GoogleDriveLoader(
    file_ids=[file_id],
    file_loader_cls=UnstructuredFileIOLoader,
    file_loader_kwargs={"mode": "elements"},
)
```

```python theme={null}
docs = loader.load()
```

```python theme={null}
docs[0]
```

You can also process a folder with a mix of files and Google Docs/Sheets using the following pattern:

```python theme={null}
folder_id = "1asMOHY1BqBS84JcRbOag5LOJac74gpmD"
loader = GoogleDriveLoader(
    folder_id=folder_id,
    file_loader_cls=UnstructuredFileIOLoader,
    file_loader_kwargs={"mode": "elements"},
)
```

```python theme={null}
docs = loader.load()
```

```python theme={null}
docs[0]
```

```python theme={null}
```

## Extended usage

An external (unofficial) component can manage the complexity of Google Drive : `langchain-googledrive`
It's compatible with the ̀`langchain_community.document_loaders.GoogleDriveLoader` and can be used
in its place.

To be compatible with containers, the authentication uses an environment variable `̀GOOGLE_ACCOUNT_FILE` to credential file (for user or service).

```python theme={null}
pip install -qU  langchain-googledrive
```

```python theme={null}
folder_id = "root"
# folder_id='1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5'
```

```python theme={null}
# Use the advanced version.
from langchain_googledrive.document_loaders import GoogleDriveLoader
```

```python theme={null}
loader = GoogleDriveLoader(
    folder_id=folder_id,
    recursive=False,
    num_results=2,  # Maximum number of file to load
)
```

By default, all files with these mime-type can be converted to `Document`.

* text/text
* text/plain
* text/html
* text/csv
* text/markdown
* image/png
* image/jpeg
* application/epub+zip
* application/pdf
* application/rtf
* application/vnd.google-apps.document (GDoc)
* application/vnd.google-apps.presentation (GSlide)
* application/vnd.google-apps.spreadsheet (GSheet)
* application/vnd.google.colaboratory (Notebook colab)
* application/vnd.openxmlformats-officedocument.presentationml.presentation (PPTX)
* application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)

It's possible to update or customize this. See the documentation of `GDriveLoader`.

The corresponding packages must be installed.

```python theme={null}
pip install -qU  unstructured
```

```python theme={null}
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")
```

### Loading auth identities

Authorized identities for each file ingested by Google Drive Loader can be loaded along with metadata per Document.

```python theme={null}
from langchain_google_community import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id=folder_id,
    load_auth=True,
    # Optional: configure whether to load authorized identities for each Document.
)

doc = loader.load()
```

You can pass load\_auth=True, to add Google Drive document access identities to metadata.

```python theme={null}
doc[0].metadata
```

### Loading extended metadata

Following extra fields can also be fetched within metadata of each Document:

* full\_path - Full path of the file/s in google drive.
* owner - owner of the file/s.
* size - size of the file/s.

```python theme={null}
from langchain_google_community import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id=folder_id,
    load_extended_matadata=True,
    # Optional: configure whether to load extended metadata for each Document.
)

doc = loader.load()
```

You can pass load\_extended\_matadata=True, to add Google Drive document extended details to metadata.

```python theme={null}
doc[0].metadata
```

### Customize the search pattern

All parameter compatible with Google [`list()`](https://developers.google.com/drive/api/v3/reference/files/list)
API can be set.

To specify the new pattern of the Google request, you can use a `PromptTemplate()`.
The variables for the prompt can be set with `kwargs` in the constructor.
Some pre-formated request are proposed (use `{query}`, `{folder_id}` and/or `{mime_type}`):

You can customize the criteria to select the files. A set of predefined filter are proposed:

| template                               | description                                                         |
| -------------------------------------- | ------------------------------------------------------------------- |
| gdrive-all-in-folder                   | Return all compatible files from a `folder_id`                      |
| gdrive-query                           | Search `query` in all drives                                        |
| gdrive-by-name                         | Search file with name `query`                                       |
| gdrive-query-in-folder                 | Search `query` in `folder_id` (and sub-folders if `recursive=true`) |
| gdrive-mime-type                       | Search a specific `mime_type`                                       |
| gdrive-mime-type-in-folder             | Search a specific `mime_type` in `folder_id`                        |
| gdrive-query-with-mime-type            | Search `query` with a specific `mime_type`                          |
| gdrive-query-with-mime-type-and-folder | Search `query` with a specific `mime_type` and in `folder_id`       |

```python theme={null}
loader = GoogleDriveLoader(
    folder_id=folder_id,
    recursive=False,
    template="gdrive-query",  # Default template to use
    query="machine learning",
    num_results=2,  # Maximum number of file to load
    supportsAllDrives=False,  # GDrive `list()` parameter
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")
```

You can customize your pattern.

```python theme={null}
from langchain_core.prompts.prompt import PromptTemplate

loader = GoogleDriveLoader(
    folder_id=folder_id,
    recursive=False,
    template=PromptTemplate(
        input_variables=["query", "query_name"],
        template="fullText contains '{query}' and name contains '{query_name}' and trashed=false",
    ),  # Default template to use
    query="machine learning",
    query_name="ML",
    num_results=2,  # Maximum number of file to load
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")
```

The conversion can manage in Markdown format:

* bullet
* link
* table
* titles

Set the attribute `return_link` to `True` to export links.

#### Modes for GSlide and GSheet

The parameter mode accepts different values:

* "document": return the body of each document
* "snippets": return the description of each file (set in metadata of Google Drive files).

The parameter `gslide_mode` accepts different values:

* "single" : one document with \<PAGE BREAK>
* "slide" : one document by slide
* "elements" : one document for each elements.

```python theme={null}
loader = GoogleDriveLoader(
    template="gdrive-mime-type",
    mime_type="application/vnd.google-apps.presentation",  # Only GSlide files
    gslide_mode="slide",
    num_results=2,  # Maximum number of file to load
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")
```

The parameter `gsheet_mode` accepts different values:

* `"single"`: Generate one document by line
* `"elements"` : one document with markdown array and \<PAGE BREAK> tags.

```python theme={null}
loader = GoogleDriveLoader(
    template="gdrive-mime-type",
    mime_type="application/vnd.google-apps.spreadsheet",  # Only GSheet files
    gsheet_mode="elements",
    num_results=2,  # Maximum number of file to load
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")
```

### Advanced usage

All Google File have a 'description' in the metadata. This field can be used to memorize a summary of the document or others indexed tags (See method `lazy_update_description_with_summary()`).

If you use the `mode="snippet"`, only the description will be used for the body. Else, the `metadata['summary']` has the field.

Sometime, a specific filter can be used to extract some information from the filename, to select some files with specific criteria. You can use a filter.

Sometimes, many documents are returned. It's not necessary to have all documents in memory at the same time. You can use the lazy versions of methods, to get one document at a time. It's better to use a complex query in place of a recursive search. For each folder, a query must be applied if you activate `recursive=True`.

```python theme={null}
import os

loader = GoogleDriveLoader(
    gdrive_api_file=os.environ["GOOGLE_ACCOUNT_FILE"],
    num_results=2,
    template="gdrive-query",
    filter=lambda search, file: "#test" not in file.get("description", ""),
    query="machine learning",
    supportsAllDrives=False,
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")
```

***

<div className="source-links">
  <Callout icon="terminal-2">
    [Connect these docs](/use-these-docs) to Claude, VSCode, and more via MCP for real-time answers.
  </Callout>

  <Callout icon="edit">
    [Edit this page on GitHub](https://github.com/langchain-ai/docs/edit/main/src/oss/python/integrations/document_loaders/google_drive.mdx) or [file an issue](https://github.com/langchain-ai/docs/issues/new/choose).
  </Callout>
</div>
