TIL 2025-06-19
AI-related stuff
Semantic search
Humans like words and images, but computers find it so much freaking easier to deal with primitive data types like numbers. That's why, when we want computers to compare things for similarity search, we should give them data in a format they can understand and work with more easily.
Two new words come in here: “vector” and “embedding”.
A “vector” is just an array of numbers, but in the computer's world, those numbers represent the characteristics of the original data.
“Embedding” is the process of converting raw data into those arrays of numbers. That’s it!
So again, the key point: computers can compare numbers much more easily than words or images.
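To make the “words into numbers” idea concrete, here is a toy sketch in Python. It assumes a tiny hand-picked vocabulary and uses plain word counts as the “embedding”. Real semantic search uses a trained embedding model instead, so that texts with similar meaning get similar vectors even when they share no words, which simple counting cannot do. The point is only that once text becomes a vector, comparing it is plain arithmetic (cosine similarity here).

```python
import math

# Toy vocabulary (an assumption for illustration; real embedding models learn their own dimensions)
VOCAB = ["cat", "dog", "pet", "car", "engine"]


def embed(text: str) -> list[float]:
    """Toy 'embedding': count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = embed("my cat is a pet")
print(query)  # [1.0, 0.0, 1.0, 0.0, 0.0]
print(cosine_similarity(query, embed("a dog is a pet")))  # about 0.5: shares "pet"
print(cosine_similarity(query, embed("the car engine")))  # 0.0: no shared words
```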
Retrieval
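It’s just getting the data. In semantic search, that means fetching the stored items whose vectors are most similar to the query’s vector.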
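A minimal retrieval sketch in Python: compare the query vector with every stored vector and keep the closest few. It assumes the documents were already embedded; the three-number vectors below are made up by hand for illustration (a real setup would get them from an embedding model and often keep them in a vector database), and `retrieve` is a hypothetical helper, not a library function.

```python
import math

# Hypothetical pre-computed vectors; in practice these come from an embedding model.
DOCUMENTS = {
    "How to feed a kitten": [0.9, 0.1, 0.0],
    "Dog training basics": [0.7, 0.3, 0.1],
    "Changing a car tire": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Similarity of two vectors by the angle between them (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def retrieve(query_vector, documents, k=2):
    """Return the k document titles whose vectors are most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]


# Pretend this is the embedding of the query "pet care tips".
query_vector = [0.8, 0.2, 0.0]
print(retrieve(query_vector, DOCUMENTS))  # ['How to feed a kitten', 'Dog training basics']
```

This brute-force loop checks every document, which is fine for a handful of items; large collections usually rely on an index built for nearest-neighbor search instead.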
Prompt
It’s just the string input to an LLM. Because LLM responses are nondeterministic, a well-written prompt helps you get more reliable and consistent responses.
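One common way to get more consistent answers is to keep the prompt in a fixed template: the same instructions and the same output format every time, with only the changing part filled in. A minimal sketch in Python; the template wording and the `build_prompt` helper are hypothetical, and the resulting string would then be sent to whichever LLM API you use (not shown here).

```python
# Hypothetical template: fixed role, fixed rules, fixed output format.
PROMPT_TEMPLATE = """You are a helpful assistant that summarizes text.
Rules:
- Answer in at most {max_sentences} sentences.
- If the text is empty, reply exactly: NO INPUT

Text:
{text}

Summary:"""


def build_prompt(text: str, max_sentences: int = 2) -> str:
    """Fill in the variable parts of the template; everything else stays identical across calls."""
    return PROMPT_TEMPLATE.format(text=text.strip(), max_sentences=max_sentences)


prompt = build_prompt("Vectors are arrays of numbers that represent data.")
print(prompt)
```

Keeping the instructions fixed and only swapping in the input makes outputs easier to compare across calls, although it does not make the model itself deterministic.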