It’s truly amazing what developers can build these days by hacking together a few APIs. Almost seems like the original promise of Web 2.0 mashups (remember those? 👴) has finally been fulfilled. Just a few years ago, it would have taken weeks or months to develop certain kinds of data analytics apps. Today, they can be built in less than an hour.

In this post I will show how to build a simple web app that extracts analytics data with GPT-3 from unstructured text and renders tables, charts, and maps based on that data. To build it, we will use Streamlit, a Python framework that allows developers to quickly create analytics and ML apps without having to write extensive frontend code. To process unstructured text, we’ll use GalaxyBrain, my LLM workflows framework that provides primitives for OpenAI completions, prompts, memory, rules, and more.

This is what we are aiming for by the end of this post.

Before we start, I have to admit that, to my shame, I’ve mostly ignored recent developments in AI and LLMs (large language models) until I saw ChatGPT just over a month ago (not a particularly uncommon story). After playing with it, I felt a wave of excitement that was reminiscent of growing up in the 90s when I learned about the Internet and used Netscape Navigator for the first time. Since discovering ChatGPT, I’ve been tinkering with LLMs to learn how they work and to see how they can be used to build real-world applications. I’m going to be sharing more findings about all things LLM, so if you are interested please subscribe and follow.

Let’s start by writing the app in Streamlit. At the high level, we want it to do the following:

• Take unstructured input from the user.
• Query the GPT-3 model with a request to process the input and return it as a JSON object.
• Parse the input into a DataFrame.
• Use the DataFrame to render visualizations like tables, charts, and maps.

Time to write some code!

# We want OpenAI to extract analytics data and return it as a valid JSON object,
# so let's defined some rules
rules = [
json_rules.return_valid_json(),
Rule("act as an analyst and extract quantitative data from your inputs")
]

# Now, let's define a default workflow driver that will be automatically used
# to make requests to GPT.
driver = OpenAiCompletionDriver(temperature=0.5, user="demo")

# Finally, setup the workflow
workflow = Workflow(rules=rules, completion_driver=driver)

if "is_data_processed" not in st.session_state:
st.session_state.is_data_processed = False

st.title("GPT Analyst")

with st.expander("Raw Data", expanded=True):
raw_text = st.text_area(
height=400
)

# Once the "Process" button is clicked, we kick off the magic.
if st.button("Process Raw Data") or st.session_state.is_data_processed:
try:
# Add a completion step to the workflow
CompletionStep(input=Prompt(raw_text))
)

# Start the workflow. This will execute out only Step defined above
workflow.start()

# Only proceed if the LLM result was validated against our rules
validator = Validator(step.output, rules)

if validator.validate():
st.session_state.is_data_processed = True

processed_text = step.output.value

# Add a text field for the OpenAI output that the user can tweak.
# When this field is updated all charts and tables are automatically
# refreshed.
with st.expander("Processed Data", expanded=True):
edited_data = st.text_area(
"Edit processed data",
processed_text,
height=400
)

# Finally, render default Streamlit visualizations
tab1, tab2, tab3, tab4 = st.tabs(
["DataFrame", "Bar Chart", "Line Chart", "Map"]
)

with tab1:
try:
st.dataframe(to_dataframe(edited_data), use_container_width=True)
except Exception as err:
st.error(err)
with tab2:
try:
st.bar_chart(to_dataframe(edited_data))
except Exception as err:
st.error(err)
with tab3:
try:
st.line_chart(to_dataframe(edited_data))
except Exception as err:
st.error(err)
pass
with tab4:
try:
st.map(to_dataframe(edited_data))
pass
except Exception as err:
st.error(err)
else:
failed_rules = "\n".join([rule.value for rule in validator.failed_rules()])

st.error(f"The following rules failed: {failed_rules}")

except Exception as err:
st.error(err)
logging.error(err.with_traceback, exc_info=True)


After the user presses the submit button, we add a CompletionStep to Workflow and start it. Then we validate the output and load it into a new text field that the user can tweak. Finally, we use Streamlit to render an interactive table, some charts, and a map. All of those primitives use Pandas DataFrames—a common data structure for data analytics and ML tasks.

Depending on the requirements, we can always play more with different rules. For example, we can add a rule to change our JSON output keys:

rules = [
json_rules.return_valid_json(),
Rule("act as an analyst and extract quantitative data from your inputs"),
Rule("all JSON keys should be in PascalCase")
]


We can also write a custom validator, but it’s beyond the scope of this article.

Finally, we’ll use Pandas to parse the JSON object into a DataFrame. Here is how we do it:

def to_dataframe(raw_data: str) -> pd.DataFrame: