Word of Life News

Weaviate is an open-source search engine powered by ML, vectors, graphs, and GraphQL | ZDNet

April 7, 2021
in Technology
7 min read


Bob van Luijt’s career in technology started at age 15, building websites to help people sell toothbrushes online. Not many 15-year-olds do that. Apparently, this gave van Luijt enough of a head start to arrive at the confluence of technology trends today.

Van Luijt went on to study arts but ended up working full time in technology anyway. In 2015, when Google introduced its RankBrain algorithm, the quality of search results jumped up. It was a watershed moment, as it introduced machine learning in search. A few people noticed, including van Luijt, who saw a business opportunity and decided to bring this to the masses. 

ZDNet connected with van Luijt to find out more.

Weaviate, a B2B search engine modeled after Google

Does Google’s RankBrain machine learning improve search results for users? People were wondering at the time RankBrain was introduced. As ZDNet’s own Eileen Brown noted: Yes, and results delivered by RankBrain will get better as it learns what we are trying to ask of it.

For van Luijt, this was an “Aha” moment. Like everyone else working in technology, he had to deal with lots of unstructured data. In his words, relating data is a problem. Data integration is hard to do, even for structured data. When you have unstructured data from different sources, it becomes extremely challenging.

Van Luijt read up on RankBrain and figured it uses word vectorization to infer relations in the queries and then try to present results. Vectors are how machine learning models understand the world. Where people see images, for example, machine learning models see image representations, in the form of vectors.

The introduction of Google’s RankBrain algorithm was a watershed moment for search, as it introduced machine learning to search. Image: Search Engine Journal

A vector is a list of numbers, often a very long one, which can be thought of as coordinates in a geometrical space. Three-dimensional vectors, i.e. vectors of the form (X, Y, Z), correspond to a space humans are familiar with. But machine learning models typically work with vectors of many more dimensions, and this complicates things:

“There are many dimensions, but to paint a mental picture, you can say there’s just three dimensions. The problem now is, it’s great that you can use a vector to recognize a pattern in a photo and then say, yes, it’s a cat, or no, it’s not a cat. But then, what if you want to do that for one hundred thousand photos or for a million photos? Then you need a different solution, you need to have a way to look into the space and find similar things.”

This is what Google did with RankBrain for text. Van Luijt was intrigued. He started experimenting with Natural Language Processing (NLP) models. He even got to ask Google’s people directly: Were they going to build a B2B search engine solution? Since their reply was “no,” he set out to do that with Weaviate.

Searching the document space with vectors

NLP machine learning models output vectors: They place individual words in a vector space. The idea behind Weaviate was: What if we take a document (an email, a product, a post, whatever), look at all the individual words that describe it, and calculate a vector for those words?

This will be where the document sits in the vector space. And then, if you ask, for example: What publications are most related to fashion? The search engine should look into the vector space, and find publications like Vogue, as being close to “fashion” in this space.
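To make the idea concrete, here is a minimal sketch of placing documents in a vector space and searching it. It is not Weaviate's implementation: the three-dimensional word vectors, the sample documents, and the function names are all invented for illustration, and averaging word vectors is only one simple way to derive a document vector.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# Real NLP models produce vectors with hundreds of dimensions.
WORD_VECTORS = {
    "fashion": [0.9, 0.1, 0.0],
    "style": [0.8, 0.2, 0.1],
    "clothing": [0.85, 0.05, 0.1],
    "football": [0.0, 0.9, 0.1],
    "league": [0.1, 0.8, 0.2],
    "goal": [0.05, 0.85, 0.1],
}


def document_vector(text):
    """Average the vectors of all known words in the text."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in text")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, documents):
    """Return document names ranked by similarity to the query, closest first."""
    query_vec = document_vector(query)
    scored = [
        (cosine_similarity(query_vec, document_vector(text)), name)
        for name, text in documents.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical publications, each described by a few words.
DOCUMENTS = {
    "Vogue": "style clothing fashion",
    "Sports Weekly": "football league goal",
}

ranking = search("fashion", DOCUMENTS)
print(ranking)
```

This sketch compares the query against every document in turn, which is exactly what stops being practical at the scale of a million items; a search engine needs an index over the vector space instead of a linear scan.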

This is at the core of what Weaviate does. In addition, data in Weaviate are stored in a graph format. When nodes in the graph are located, users can traverse further and find other nodes in the graph.


Weaviate uses vectors to search for documents in spaces comprising many dimensions. (Image: Weaviate)

It’s not that it isn’t possible to store vectors in traditional databases. It is, and people do that. But after a certain point, it becomes impractical. Besides performance, complexity is also a barrier. For example, van Luijt mentioned, in most cases, people are not privy to the details of how vectorization happens.

Weaviate comes with a number of built-in vectorizers. Some are general-purpose, some are tailored to specific domains such as cybersecurity or healthcare. A modular structure enables people to plug in their own vectorizers, too.

Weaviate also works with popular machine learning frameworks such as PyTorch or TensorFlow. However, there is a catch: At this time, if you train your model, or use one provided by Weaviate, you’re stuck with it.

If a model changes in a way that influences the way it generates vectors, Weaviate would have to re-index its data to work. This is not currently supported. Van Luijt mentioned it was not required in their current use cases, but they are looking into ways of supporting that.

As a startup, SeMI Technologies, the company van Luijt founded around Weaviate, is navigating the market for traction. Currently, the retail and FMCG industry is working well for them, with Metro AG being a prominent use case.

The challenge that Metro had was how to find new opportunities in the market. Weaviate helped them do that by combining data from their CRM and OpenStreetMap. If a location where a business exists could not be associated with a customer in the CRM, that indicated an opportunity.
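The logic of that use case, flagging businesses on the map that have no counterpart in the CRM, can be sketched as follows. This is a deliberate simplification that matches records by geographic proximity only; the records, field names, and the 50-meter radius are assumptions made for the example, not details of the Metro project.

```python
import math

EARTH_RADIUS_M = 6371000.0


def distance_m(a, b):
    """Approximate great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def find_opportunities(map_businesses, crm_customers, radius_m=50.0):
    """Return map businesses with no CRM customer within radius_m meters."""
    return [
        business
        for business in map_businesses
        if not any(
            distance_m(business["location"], customer["location"]) <= radius_m
            for customer in crm_customers
        )
    ]


# Hypothetical records: businesses found on the map and customers in the CRM.
map_businesses = [
    {"name": "Cafe A", "location": (52.5200, 13.4050)},
    {"name": "Bakery B", "location": (52.5300, 13.4200)},
]
crm_customers = [
    {"name": "Cafe A GmbH", "location": (52.5201, 13.4051)},
]

opportunities = find_opportunities(map_businesses, crm_customers)
print([b["name"] for b in opportunities])
```

Here the cafe sits a few meters from a known customer and is treated as already covered, while the bakery has no match and surfaces as a lead.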

GraphQL makes for good API UX

Across industries, van Luijt noted, the problem is always the same at the root level: unstructured data needs to be related to something internally structured. Graphs are well-known for helping leverage connections. But it turns out that even the inability to find connections can generate business value, as the Metro use case exemplifies.

Van Luijt is a firm believer in the value of graphs for leveraging connections — or lack thereof. Stacking up data in data warehouses and data lakes and lakehouses and whatnot does have value. But, to get value from connections in the data, it’s the graph model that makes the most sense, he noted.

Then, the question becomes: How are we going to get people access to this? To give people a lot of capabilities so they can do “a tremendous amount of stuff,” a graph query language like SPARQL may make sense, van Luijt said.


GraphQL’s meteoric rise among developers has attracted interest in using it as an access layer for databases, too. Image: Apollo

But if you want to make it simple for people to access graphs so they have a very short learning curve, GraphQL becomes interesting, he went on to add: “Most developers who are unfamiliar with graph technology, if they see SPARQL, they start sweating and they get nervous. If they see GraphQL, they go like, ‘Hey, I understand this. This makes sense.’”

There’s another upside to GraphQL: the community around it. There are many libraries available, and because Weaviate uses GraphQL, these libraries can be used as well. Van Luijt described the decision to use GraphQL as a user experience (UX) decision — the UX to access an API should be smooth.
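As a rough illustration of what such a query can look like, the sketch below builds a GraphQL request body asking for publications close to a concept. The `Get` and `nearText` shape follows Weaviate's GraphQL API as commonly documented, but it may differ between versions and depends on a text vectorizer being enabled; the `Publication` class, the `name` property, and the helper function are hypothetical.

```python
import json


def build_near_text_query(class_name, concept, properties, limit=5):
    """Build a GraphQL query string for a concept-based similarity search."""
    fields = " ".join(properties)
    # json.dumps gives a correctly quoted and escaped GraphQL string list.
    concepts = json.dumps([concept])
    return "{ Get { %s(nearText: {concepts: %s}, limit: %d) { %s } } }" % (
        class_name,
        concepts,
        limit,
        fields,
    )


# Hypothetical class and property names.
query = build_near_text_query("Publication", "fashion", ["name"])

# GraphQL requests are conventionally sent as a JSON body with a "query" key.
payload = json.dumps({"query": query})
print(query)
```

Because the request is ordinary GraphQL, any generic GraphQL client library can send it, which is the community benefit van Luijt points to.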

Weaviate also supports the notion of schemas. When an instance starts running, the API endpoint becomes available, and the first thing users need to do is to create a class property schema. It can be as simple or as complex as it needs to be, and existing schemas can also be imported.
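A class property schema can be pictured as a small JSON document that names a class and lists its properties. The sketch below is an assumption-laden example, not an official one: the `class`, `properties`, `name`, and `dataType` keys follow Weaviate's schema format as commonly documented, while the `Publication` class and its properties are made up, and details may vary by version.

```python
import json

# Hypothetical class definition with two text properties.
publication_class = {
    "class": "Publication",
    "description": "A magazine, newspaper, or other publication",
    "properties": [
        {
            "name": "name",
            "dataType": ["text"],
            "description": "The name of the publication",
        },
        {
            "name": "headquarters",
            "dataType": ["text"],
            "description": "Where the publication is based",
        },
    ],
}


def property_names(class_definition):
    """List the property names declared in a class definition."""
    return [prop["name"] for prop in class_definition["properties"]]


# Serialized form, as it would be sent to a schema endpoint.
schema_json = json.dumps(publication_class, indent=2)
print(property_names(publication_class))
```

Starting with one class and a couple of properties keeps the schema simple; more classes and cross-references between them can be added as the data model grows.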

A pragmatic approach

Van Luijt has very pragmatic views when it comes to the limitations of vectors, as well as to the use of open source. To quote Gary Marcus and Ray Mooney before him, “You can’t cram the meaning of a whole $&!#* sentence into a single $!#&* vector”.

That much is true, but does it matter if you can get practical results out of using vectors? Not much, argues van Luijt. The problem Weaviate is trying to solve is finding things. So, if the similarity search does a good job in finding things using vectors, that’s good enough. The idea, he went on to add, is to turn vectorization-based search from a data science problem into an engineering problem.

The same pragmatic approach is taken when it comes to open source. There are many reasons why people choose to go with open source. For Weaviate, open source, or rather open core, was chosen as a mechanism for transparency towards customers and users.

Perhaps surprisingly, van Luijt noted Weaviate is not necessarily looking for contributors. That would be nice to have, but the main purpose being open source serves is enabling audits. When clients ask their experts to audit Weaviate, being open source enables this.

Weaviate is available both as Software-as-a-Service and on-premises. Counter to conventional wisdom, it seems most Weaviate users are interested in on-premises deployments.

In practice, however, this often means their own project in one of the major cloud providers, with services from the Weaviate team. As the team and the product scale up, a shift toward the self-service model may be called for.

Disclosure: SeMI Technologies has worked with the author as a client.
