Tagging and Summarizing Articles with OpenAI and LangChain
In this post, we build a simple AI-enabled application for tagging and summarizing articles by leveraging OpenAI and LangChain.
Under the hood, our application will use OpenAI function calling to interact with a Large Language Model and get a structured output with a summary of the article, its language, and associated tags.
First, we store our OpenAI API key in an environment variable and add the necessary imports.
import os
import getpass
def _set_env(var: str):
    # Prompt for the value only if the variable is not already set.
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")
_set_env("OPENAI_API_KEY")
from pydantic import BaseModel, Field
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers.openai_functions import JsonOutputFunctionsParser
Then, we initialize the model with temperature=0 to keep its responses as deterministic as possible and discourage random or creative variation.
model = ChatOpenAI(temperature=0)
Next, we load the document we would like to summarize.
loader = WebBaseLoader("https://blogs.nvidia.com/blog/ces-2025-jensen-huang/")
documents = loader.load()
doc = documents[0]
Next, we create a Pydantic data model to structure the output. The Pydantic library offers a concise way to define data structures and provide validation support.
We must add a docstring to the model and descriptions to all the fields to help the LLM understand the desired responses. We use Pydantic data models to create a JSON schema, allowing easy integration with OpenAI models.
class Overview(BaseModel):
    """Overview of an article."""
    summary: str = Field(description="Provide an excerpt of the content in 60 words.")
    language: str = Field(description="Provide the language that the content is written in.")
    keywords: str = Field(description="Provide keywords related to the content. All the keywords should be in lowercase.")
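To make the target of this step concrete, the sketch below hand-writes roughly what an OpenAI function definition for a model like Overview looks like. It is an approximation written for illustration, not captured library output: the variable name overview_function_sketch is hypothetical, and the exact keys and ordering produced by LangChain may vary by version.

```python
import json

# Hand-written approximation of an OpenAI function definition for Overview.
# Assumption: the docstring becomes "description" and each Field description
# becomes the description of the matching property.
overview_function_sketch = {
    "name": "Overview",
    "description": "Overview of an article.",
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {
                "type": "string",
                "description": "Provide an excerpt of the content in 60 words.",
            },
            "language": {
                "type": "string",
                "description": "Provide the language that the content is written in.",
            },
            "keywords": {
                "type": "string",
                "description": "Provide keywords related to the content. "
                               "All the keywords should be in lowercase.",
            },
        },
        # Fields without defaults are required.
        "required": ["summary", "language", "keywords"],
    },
}

print(json.dumps(overview_function_sketch, indent=2))
```

The docstring and field descriptions are the only guidance the LLM receives about each field, which is why they are worth writing carefully.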
Next, we use the convert_to_openai_function function to create the JSON schema and bind it to the model.
overview_tagging_function = [
    convert_to_openai_function(Overview)
]
We then bind the function to the model, forcing it to call Overview, and create a simple prompt and an output parser.
tagging_model = model.bind(
    functions=overview_tagging_function,
    function_call={"name": "Overview"}
)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Think carefully, and then tag the text as instructed"),
    ("user", "{input}")
])
json_output_function_parser = JsonOutputFunctionsParser()
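Conceptually, the output parser's job is small: with function calling, the model returns the function's arguments as a JSON-encoded string, and the parser decodes that string into a dictionary. The sketch below mimics this step using only the standard library. The payload shape and the helper name parse_function_arguments are simplified assumptions for illustration, not LangChain's actual implementation.

```python
import json

# Hypothetical, simplified stand-in for the function_call payload that
# accompanies a chat response when function calling is used.
sample_function_call = {
    "name": "Overview",
    "arguments": (
        '{"summary": "A short summary.", '
        '"language": "english", '
        '"keywords": "ai, gpu"}'
    ),
}


def parse_function_arguments(function_call: dict) -> dict:
    """Decode the JSON-encoded arguments string into a dictionary."""
    return json.loads(function_call["arguments"])


parsed = parse_function_arguments(sample_function_call)
print(parsed["language"])  # english
```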
Then, we create a LangChain runnable named tagging_chain by piping together prompt, tagging_model, and the json_output_function_parser output parser.
tagging_chain = prompt | tagging_model | json_output_function_parser
Finally, we invoke the runnable with the contents of the page to get a well-structured JSON output like the following:
tagging_chain.invoke({"input": doc.page_content})
{
    'summary': 'NVIDIA CEO Jensen Huang discussed the advancements in AI at CES 2025, unveiling new products like the NVIDIA Cosmos platform and Blackwell RTX 50 Series GPUs. The focus was on physical AI, AI tools for PCs, and innovations in autonomous vehicles and robotics.',
    'language': 'english',
    'keywords': 'nvidia, ces 2025, ai advancements, jensen huang, nvidia cosmos, blackwell rtx 50 series gpus, physical ai, robotics, autonomous vehicles'
}
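Because keywords is declared as a single string, the tags come back as one comma-separated value. If a list is more convenient downstream, a small post-processing step can split it. The helper name split_keywords below is hypothetical, and the sketch assumes the model separates keywords with commas, as in the output above.

```python
def split_keywords(keywords: str) -> list[str]:
    """Split a comma-separated keyword string into a clean list of tags."""
    # Strip whitespace, lowercase defensively, and drop empty entries.
    return [k.strip().lower() for k in keywords.split(",") if k.strip()]


# Keyword string copied from the sample output above.
sample_keywords = (
    "nvidia, ces 2025, ai advancements, jensen huang, nvidia cosmos, "
    "blackwell rtx 50 series gpus, physical ai, robotics, autonomous vehicles"
)

tags = split_keywords(sample_keywords)
print(tags[0])    # nvidia
print(len(tags))  # 9
```

An alternative would be to declare the field as a list of strings in the Pydantic model, so that the schema itself asks for an array. The string form is kept here to match the model defined earlier.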
Links
https://platform.openai.com/docs/guides/function-calling
https://openai.com/index/function-calling-and-other-api-updates/