Artificial intelligence engineering (AI engineering) is a technical discipline that focuses on the design, development, and deployment of AI systems. AI engineering involves applying engineering principles and methodologies to create scalable, efficient, and reliable AI-based solutions. It merges aspects of data engineering and software engineering to create real-world applications in diverse domains such as healthcare, finance, autonomous systems, and industrial automation. == Terminology ambiguity == According to Chip Huyen's book AI Engineering: Building Applications with Foundation Models, the term AI engineering refers to the process of building applications that use foundation models, which are typically models developed by a small number of research laboratories and made available as a service. Huyen distinguishes this from machine learning (ML) engineering, which involves building and deploying models developed in-house. She notes that most practical AI systems combine both approaches. For example, a customer-support chatbot may use a generative model to produce responses while also incorporating locally built components such as request classifiers or scoring mechanisms to assess response quality. As a result, the terms AI engineering and ML engineering are often used together or interchangeably in practice. The distinction and broader usage of the term have been discussed in industry publications and interviews, where AI engineering has been described as an emerging discipline focused on productionizing applications built with foundation models. == Key components == AI engineering integrates a variety of technical domains and practices, all of which are essential to building scalable, reliable, and ethical AI systems. === Data engineering and infrastructure === Data serves as the cornerstone of AI systems, necessitating careful engineering to ensure premium quality, wide spread availability, and usability. AI engineers gather large, diverse datasets from multiple sources such as databases, APIs, and real-time streams. This data undergoes cleaning, normalization, and preprocessing, often facilitated by automated data pipelines that manage extraction, transformation, and loading (ETL) processes. Efficient storage solutions, such as SQL (or NoSQL) databases and data lakes, must be selected based on data characteristics and use cases. Security measures, including encryption and access controls, are critical for protecting sensitive information and ensuring compliance with regulations like GDPR. Scalability is essential, frequently involving cloud services and distributed computing frameworks to handle growing data volumes effectively. === Algorithm selection and optimization === Selecting the appropriate algorithm is crucial for the success of any AI system. Engineers evaluate the problem (which could be classification or regression, for example) to determine the most suitable machine learning algorithm, including deep learning paradigms. Once an algorithm is chosen, optimizing it through hyperparameter tuning is essential to enhance efficiency and accuracy. Techniques such as grid search or Bayesian optimization are employed, and engineers often utilize parallelization to expedite training processes, particularly for large models and datasets. For existing models, techniques like transfer learning can be applied to adapt pre-trained models for specific tasks, reducing the time and resources needed for training. === Deep learning engineering === Deep learning is particularly important for tasks involving large and complex datasets. Engineers design neural network architectures tailored to specific applications, such as convolutional neural networks for visual tasks or recurrent neural networks for sequence-based tasks. Transfer learning, where pre-trained models are fine-tuned for specific use cases, helps streamline development and often enhances performance. Optimization for deployment in resource-constrained environments, such as mobile devices, involves techniques like pruning and quantization to minimize model size while maintaining performance. Engineers also mitigate data imbalance through augmentation and synthetic data generation, ensuring robust model performance across various classes. === Natural language processing === Natural language processing (NLP) is a crucial component of AI engineering, focused on enabling machines to understand and generate human language. The process begins with text preprocessing to prepare data for machine learning models. Recent advancements, particularly transformer-based models like BERT and GPT, have greatly improved the ability to understand context in language. AI engineers work on various NLP tasks, including sentiment analysis, machine translation, and information extraction. These tasks require sophisticated models that utilize attention mechanisms to enhance accuracy. Applications range from virtual assistants and chatbots to more specialized tasks like named-entity recognition (NER) and Part of speech (POS) tagging. === Reasoning and decision-making systems === Developing systems capable of reasoning and decision-making is a significant aspect of AI engineering. Whether starting from scratch or building on existing frameworks, engineers create solutions that operate on data or logical rules. Symbolic AI employs formal logic and predefined rules for inference, while probabilistic reasoning techniques like Bayesian networks help address uncertainty. These models are essential for applications in dynamic environments, such as autonomous vehicles, where real-time decision-making is critical. === Security === Security is a critical consideration in AI engineering, particularly as AI systems become increasingly integrated into sensitive and mission-critical applications. AI engineers implement robust security measures to protect models from adversarial attacks, such as evasion and poisoning, which can compromise system integrity and performance. Techniques such as adversarial training, where models are exposed to malicious inputs during development, help harden systems against these attacks. Additionally, securing the data used to train AI models is of paramount importance. Encryption, secure data storage, and access control mechanisms are employed to safeguard sensitive information from unauthorized access and breaches. AI systems also require constant monitoring to detect and mitigate vulnerabilities that may arise post-deployment. In high-stakes environments like autonomous systems and healthcare, engineers incorporate redundancy and fail-safe mechanisms to ensure that AI models continue to function correctly in the presence of security threats. === Ethics and compliance === As AI systems increasingly influence societal aspects, ethics and compliance are vital components of AI engineering. Engineers design models to mitigate risks such as data poisoning and ensure that AI systems adhere to legal frameworks, such as data protection regulations like GDPR. Privacy-preserving techniques, including data anonymization and differential privacy, are employed to safeguard personal information and ensure compliance with international standards. Ethical considerations focus on reducing bias in AI systems, preventing discrimination based on race, gender, or other protected characteristics. By developing fair and accountable AI solutions, engineers contribute to the creation of technologies that are both technically sound and socially responsible. == Workload == An AI engineer's workload revolves around the AI system's life cycle, which is a complex, multi-stage process. This process may involve building models from scratch or using pre-existing models through transfer learning, depending on the project's requirements. Each approach presents unique challenges and influences the time, resources, and technical decisions involved. === Problem definition and requirements analysis === Regardless of whether a model is built from scratch or based on a pre-existing model, the work begins with a clear understanding of the problem. The engineer must define the scope, understand the business context, and identify specific AI objectives that align with strategic goals. This stage includes consulting with stakeholders to establish key performance indicators (KPIs) and operational requirements. When developing a model from scratch, the engineer must also decide which algorithms are most suitable for the task. Conversely, when using a pre-trained model, the workload shifts toward evaluating existing models and selecting the one most aligned with the task. The use of pre-trained models often allows for a more targeted focus on fine-tuning, as opposed to designing an entirely new model architecture. === Data acquisition and preparation === Data acquisition and preparation are critical stages regardless of the development method chosen, as the performance of any AI system relies heavily on high-quality, re
Outline of robotics
The following outline is provided as an overview of and topical guide to robotics: Robotics is a branch of mechanical engineering, electrical engineering and computer science that deals with the design, construction, operation, and application of robots, as well as computer systems for their control, sensory feedback, and information processing. These technologies deal with automated machines that can take the place of humans in dangerous environments or manufacturing processes, or resemble humans in appearance, behaviour, and or cognition. Many of today's robots are inspired by nature contributing to the field of bio-inspired robotics. The word "robot" was introduced to the public by Czech writer Karel Čapek in his play R.U.R. (Rossum's Universal Robots), published in 1920. The term "robotics" was coined by Isaac Asimov in his 1941 science fiction short-story "Liar!" == Nature of robotics == Robotics can be described as: An applied science – scientific knowledge transferred into a physical environment. A branch of computer science – A branch of electrical engineering – A branch of mechanical engineering – Research and development – A branch of technology – == Branches of robotics == Adaptive control – control method used by a controller which must adapt to a controlled system with parameters which vary, or are initially uncertain. For example, as an aircraft flies, its mass will slowly decrease as a result of fuel consumption; a control law is needed that adapts itself to such changing conditions. Aerial robotics – development of unmanned aerial vehicles (UAVs), commonly known as drones, aircraft without a human pilot aboard. Their flight is controlled either autonomously by onboard computers or by the remote control of a pilot on the ground or in another vehicle. Android science – interdisciplinary framework for studying human interaction and cognition based on the premise that a very humanlike robot (that is, an android) can elicit human-directed social responses in human beings. Anthrobotics – science of developing and studying robots that are either entirely or in some way human-like. Artificial intelligence – the intelligence of machines and the branch of computer science that aims to create it. Artificial neural networks – a mathematical model inspired by biological neural networks. Autonomous car – an autonomous vehicle capable of fulfilling the human transportation capabilities of a traditional car Autonomous research robotics – Bayesian network – BEAM robotics – a style of robotics that primarily uses simple analogue circuits instead of a microprocessor in order to produce an unusually simple design (in comparison to traditional mobile robots) that trades flexibility for robustness and efficiency in performing the task for which it was designed. Behavior-based robotics – the branch of robotics that incorporates modular or behavior based AI (BBAI). Bio-inspired robotics – making robots that are inspired by biological systems. Biomimicry and bio-inspired design are sometimes confused. Biomimicry is copying the nature while bio-inspired design is learning from nature and making a mechanism that is simpler and more effective than the system observed in nature. Biomimetic – see Bionics. Biomorphic robotics – a sub-discipline of robotics focused upon emulating the mechanics, sensor systems, computing structures and methodologies used by animals. Bionics – also known as biomimetics, biognosis, biomimicry, or bionical creativity engineering is the application of biological methods and systems found in nature to the study and design of engineering systems and modern technology. Biorobotics – a study of how to make robots that emulate or simulate living biological organisms mechanically or even chemically. Cloud robotics – is a field of robotics that attempts to invoke cloud technologies such as cloud computing, cloud storage, and other Internet technologies centered around the benefits of converged infrastructure and shared services for robotics. Cognitive robotics – views animal cognition as a starting point for the development of robotic information processing, as opposed to more traditional Artificial Intelligence techniques. Clustering – Computational neuroscience – study of brain function in terms of the information processing properties of the structures that make up the nervous system. Robot control – a study of controlling robots Robotics conventions – Data mining Techniques – Degrees of freedom – in mechanics, the degree of freedom (DOF) of a mechanical system is the number of independent parameters that define its configuration. It is the number of parameters that determine the state of a physical system and is important to the analysis of systems of bodies in mechanical engineering, aeronautical engineering, robotics, and structural engineering. Developmental robotics – a methodology that uses metaphors from neural development and developmental psychology to develop the mind for autonomous robots Digital control – a branch of control theory that uses digital computers to act as system controllers. Digital image processing – the use of computer algorithms to perform image processing on digital images. Dimensionality reduction – the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. Distributed robotics – Electronic stability control – is a computerized technology that improves the safety of a vehicle's stability by detecting and reducing loss of traction (skidding). Evolutionary computation – Evolutionary robotics – a methodology that uses evolutionary computation to develop controllers for autonomous robots Extended Kalman filter – Flexible Distribution functions – Feedback control and regulation – Human–computer interaction – a study, planning and design of the interaction between people (users) and computers Human robot interaction – a study of interactions between humans and robots Intelligent vehicle technologies – comprise electronic, electromechanical, and electromagnetic devices - usually silicon micromachined components operating in conjunction with computer controlled devices and radio transceivers to provide precision repeatability functions (such as in robotics artificial intelligence systems) emergency warning validation performance reconstruction. Computer vision – Machine vision – Kinematics – study of motion, as applied to robots. This includes both the design of linkages to perform motion, their power, control and stability; also their planning, such as choosing a sequence of movements to achieve a broader task. Laboratory robotics – the act of using robots in biology or chemistry labs Robot learning – learning to perform tasks such as obstacle avoidance, control and various other motion-related tasks Direct manipulation interface – In computer science, direct manipulation is a human–computer interaction style which involves continuous representation of objects of interest and rapid, reversible, and incremental actions and feedback. The intention is to allow a user to directly manipulate objects presented to them, using actions that correspond at least loosely to the physical world. Manifold learning – Microrobotics – a field of miniature robotics, in particular mobile robots with characteristic dimensions less than 1 mm Motion planning – (a.k.a., the "navigation problem", the "piano mover's problem") is a term used in robotics for the process of detailing a task into discrete motions. Motor control – information processing related activities carried out by the central nervous system that organize the musculoskeletal system to create coordinated movements and skilled actions. Nanorobotics – the emerging technology field creating machines or robots whose components are at or close to the scale of a nanometer (10−9 meters). Passive dynamics – refers to the dynamical behavior of actuators, robots, or organisms when not drawing energy from a supply (e.g., batteries, fuel, ATP). Programming by Demonstration – an End-user development technique for teaching a computer or a robot new behaviors by demonstrating the task to transfer directly instead of programming it through machine commands. Quantum robotics – a subfield of robotics that deals with using quantum computers to run robotics algorithms more quickly than digital computers can. Rapid prototyping – automatic construction of physical objects via additive manufacturing from virtual models in computer aided design (CAD) software, transforming them into thin, virtual, horizontal cross-sections and then producing successive layers until the items are complete. As of June 2011, used for making models, prototype parts, and production-quality parts in relatively small numbers. Reinforcement learning – an area of machine learning in computer science, concerned with how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward. Robot
Latent semantic analysis
Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents. An information retrieval technique using latent semantic structure was patented in 1988 by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI). == Overview == === Occurrence matrix === LSA can use a document-term matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to terms and whose columns correspond to documents. A typical example of the weighting of the elements of the matrix is tf-idf (term frequency–inverse document frequency): the weight of an element of the matrix is proportional to the number of times the terms appear in each document, where rare terms are upweighted to reflect their relative importance. This matrix is also common to standard semantic models, though it is not necessarily explicitly expressed as a matrix, since the mathematical properties of matrices are not always used. === Rank lowering === After the construction of the occurrence matrix, LSA finds a low-rank approximation to the term-document matrix. There could be various reasons for these approximations: The original term-document matrix is presumed too large for the computing resources; in this case, the approximated low rank matrix is interpreted as an approximation (a "least and necessary evil"). The original term-document matrix is presumed noisy: for example, anecdotal instances of terms are to be eliminated. From this point of view, the approximated matrix is interpreted as a de-noisified matrix (a better matrix than the original). The original term-document matrix is presumed overly sparse relative to the "true" term-document matrix. That is, the original matrix lists only the words actually in each document, whereas we might be interested in all words related to each document—generally a much larger set due to synonymy. The consequence of the rank lowering is that some dimensions are combined and depend on more than one term: {(car), (truck), (flower)} → {(1.3452 car + 0.2828 truck), (flower)} This mitigates the problem of identifying synonymy, as the rank lowering is expected to merge the dimensions associated with terms that have similar meanings. It also partially mitigates the problem with polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Conversely, components that point in other directions tend to either simply cancel out, or, at worst, to be smaller than components in the directions corresponding to the intended sense. === Derivation === Let X {\displaystyle X} be a matrix where element ( i , j ) {\displaystyle (i,j)} describes the occurrence of term i {\displaystyle i} in document j {\displaystyle j} (this can be, for example, the frequency). X {\displaystyle X} will look like this: d j ↓ t i T → [ x 1 , 1 … x 1 , j … x 1 , n ⋮ ⋱ ⋮ ⋱ ⋮ x i , 1 … x i , j … x i , n ⋮ ⋱ ⋮ ⋱ ⋮ x m , 1 … x m , j … x m , n ] {\displaystyle {\begin{matrix}&{\textbf {d}}_{j}\\&\downarrow \\{\textbf {t}}_{i}^{T}\rightarrow &{\begin{bmatrix}x_{1,1}&\dots &x_{1,j}&\dots &x_{1,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{m,1}&\dots &x_{m,j}&\dots &x_{m,n}\\\end{bmatrix}}\end{matrix}}} Now a row in this matrix will be a vector corresponding to a term, giving its relation to each document: t i T = [ x i , 1 … x i , j … x i , n ] {\displaystyle {\textbf {t}}_{i}^{T}={\begin{bmatrix}x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\end{bmatrix}}} Likewise, a column in this matrix will be a vector corresponding to a document, giving its relation to each term: d j = [ x 1 , j ⋮ x i , j ⋮ x m , j ] {\displaystyle {\textbf {d}}_{j}={\begin{bmatrix}x_{1,j}\\\vdots \\x_{i,j}\\\vdots \\x_{m,j}\\\end{bmatrix}}} Now the dot product t i T t p {\displaystyle {\textbf {t}}_{i}^{T}{\textbf {t}}_{p}} between two term vectors gives the correlation between the terms over the set of documents. The matrix product X X T {\displaystyle XX^{T}} contains all these dot products. Element ( i , p ) {\displaystyle (i,p)} (which is equal to element ( p , i ) {\displaystyle (p,i)} ) contains the dot product t i T t p {\displaystyle {\textbf {t}}_{i}^{T}{\textbf {t}}_{p}} ( = t p T t i {\displaystyle ={\textbf {t}}_{p}^{T}{\textbf {t}}_{i}} ). Likewise, the matrix X T X {\displaystyle X^{T}X} contains the dot products between all the document vectors, giving their correlation over the terms: d j T d q = d q T d j {\displaystyle {\textbf {d}}_{j}^{T}{\textbf {d}}_{q}={\textbf {d}}_{q}^{T}{\textbf {d}}_{j}} . Now, from the theory of linear algebra, there exists a decomposition of X {\displaystyle X} such that U {\displaystyle U} and V {\displaystyle V} are orthogonal matrices and Σ {\displaystyle \Sigma } is a diagonal matrix. This is called a singular value decomposition (SVD): X = U Σ V T {\displaystyle {\begin{matrix}X=U\Sigma V^{T}\end{matrix}}} The matrix products giving us the term and document correlations then become X X T = ( U Σ V T ) ( U Σ V T ) T = ( U Σ V T ) ( V T T Σ T U T ) = U Σ V T V Σ T U T = U Σ Σ T U T X T X = ( U Σ V T ) T ( U Σ V T ) = ( V T T Σ T U T ) ( U Σ V T ) = V Σ T U T U Σ V T = V Σ T Σ V T {\displaystyle {\begin{matrix}XX^{T}&=&(U\Sigma V^{T})(U\Sigma V^{T})^{T}=(U\Sigma V^{T})(V^{T^{T}}\Sigma ^{T}U^{T})=U\Sigma V^{T}V\Sigma ^{T}U^{T}=U\Sigma \Sigma ^{T}U^{T}\\X^{T}X&=&(U\Sigma V^{T})^{T}(U\Sigma V^{T})=(V^{T^{T}}\Sigma ^{T}U^{T})(U\Sigma V^{T})=V\Sigma ^{T}U^{T}U\Sigma V^{T}=V\Sigma ^{T}\Sigma V^{T}\end{matrix}}} Since Σ Σ T {\displaystyle \Sigma \Sigma ^{T}} and Σ T Σ {\displaystyle \Sigma ^{T}\Sigma } are diagonal we see that U {\displaystyle U} must contain the eigenvectors of X X T {\displaystyle XX^{T}} , while V {\displaystyle V} must be the eigenvectors of X T X {\displaystyle X^{T}X} . Both products have the same non-zero eigenvalues, given by the non-zero entries of Σ Σ T {\displaystyle \Sigma \Sigma ^{T}} , or equally, by the non-zero entries of Σ T Σ {\displaystyle \Sigma ^{T}\Sigma } . Now the decomposition looks like this: X U Σ V T ( d j ) ( d ^ j ) ↓ ↓ ( t i T ) → [ x 1 , 1 … x 1 , j … x 1 , n ⋮ ⋱ ⋮ ⋱ ⋮ x i , 1 … x i , j … x i , n ⋮ ⋱ ⋮ ⋱ ⋮ x m , 1 … x m , j … x m , n ] = ( t ^ i T ) → [ [ u 1 ] … [ u l ] ] ⋅ [ σ 1 … 0 ⋮ ⋱ ⋮ 0 … σ l ] ⋅ [ [ v 1 ] ⋮ [ v l ] ] {\displaystyle {\begin{matrix}&X&&&U&&\Sigma &&V^{T}\\&({\textbf {d}}_{j})&&&&&&&({\hat {\textbf {d}}}_{j})\\&\downarrow &&&&&&&\downarrow \\({\textbf {t}}_{i}^{T})\rightarrow &{\begin{bmatrix}x_{1,1}&\dots &x_{1,j}&\dots &x_{1,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{m,1}&\dots &x_{m,j}&\dots &x_{m,n}\\\end{bmatrix}}&=&({\hat {\textbf {t}}}_{i}^{T})\rightarrow &{\begin{bmatrix}{\begin{bmatrix}\,\\\,\\{\textbf {u}}_{1}\\\,\\\,\end{bmatrix}}\dots {\begin{bmatrix}\,\\\,\\{\textbf {u}}_{l}\\\,\\\,\end{bmatrix}}\end{bmatrix}}&\cdot &{\begin{bmatrix}\sigma _{1}&\dots &0\\\vdots &\ddots &\vdots \\0&\dots &\sigma _{l}\\\end{bmatrix}}&\cdot &{\begin{bmatrix}{\begin{bmatrix}&&{\textbf {v}}_{1}&&\end{bmatrix}}\\\vdots \\{\begin{bmatrix}&&{\textbf {v}}_{l}&&\end{bmatrix}}\end{bmatrix}}\end{matrix}}} The values σ 1 , … , σ l {\displaystyle \sigma _{1},\dots ,\sigma _{l}} are called the singular values, and u 1 , … , u l {\displaystyle u_{1},\dots ,u_{l}} and v 1 , … , v l {\displaystyle v_{1},\dots ,v_{l}} the left and right singular vectors. Notice the only part of U {\displaystyle U} that contributes to t i {\displaystyle {\textbf {t}}_{i}} is the i 'th {\displaystyle i{\textrm {'th}}} row. Let this row vector be called t ^ i T {\displaystyle {\hat {\textrm {t}}}_{i}^{T}} . Likewise, the only part of V T {\displaystyle V^{T}} that contributes to d j {\displaystyle {\textbf {d}}_{j}} is the j 'th {\displaystyle j{\textrm {'th}}} column, d ^ j {\displaystyle {\hat {\textrm {d}}}_{j}} . These are not the eigenvectors, but depend on all the eigenvectors. I
Query understanding
Query understanding is the process of inferring the intent of a search engine user by extracting semantic meaning from the searcher’s keywords. Query understanding methods generally take place before the search engine retrieves and ranks results. It is related to natural language processing but specifically focused on the understanding of search queries. == Methods == === Stemming and lemmatization === Many languages inflect words to reflect their role in the utterance they appear in. The variation between various forms of a word is likely to be of little importance for the relatively coarse-grained model of meaning involved in a retrieval system, and for this reason the task of conflating the various forms of a word is a potentially useful technique to increase recall of a retrieval system. Stemming algorithms, also known as stemmers, typically use a collection of simple rules to remove suffixes intended to model the language’s inflection rules. For some languages, there are simple lemmatisation methods to reduce a word in query to its lemma or root form or its stem; for others, this operation involves non-trivial string processing and may require recognizing the word's part of speech or referencing a lexical database. The effectiveness of stemming and lemmatization varies across languages. === Query Segmentation === Query segmentation is a key component of query understanding, aiming to divide a query into meaningful segments. Traditional approaches, such as the bag-of-words model, treat individual words as independent units, which can limit interpretative accuracy. For languages like Chinese, where words are not separated by spaces, segmentation is essential, as individual characters often lack standalone meaning. Even in English, the BOW model may not capture the full meaning, as certain phrases—such as "New York"—carry significance as a whole rather than as isolated terms. By identifying phrases or entities within queries, query segmentation enhances interpretation, enabling search engines to apply proximity and ordering constraints, ultimately improving search accuracy and user satisfaction. === Entity recognition === Entity recognition is the process of locating and classifying entities within a text string. Named-entity recognition specifically focuses on named entities, such as names of people, places, and organizations. In addition, entity recognition includes identifying concepts in queries that may be represented by multi-word phrases. Entity recognition systems typically use grammar-based linguistic techniques or statistical machine learning models. === Query rewriting === Query rewriting is the process of automatically reformulating a search query to more accurately capture its intent. Query expansion adds additional query terms, such as synonyms, in order to retrieve more documents and thereby increase recall. Query relaxation removes query terms to reduce the requirements for a document to match the query, thereby also increasing recall. Other forms of query rewriting, such as automatically converting consecutive query terms into phrases and restricting query terms to specific fields, aim to increase precision. === Spelling Correction === Automatic spelling correction is a critical feature of modern search engines, designed to address common spelling errors in user queries. Such errors are especially frequent as users often search for unfamiliar topics. By correcting misspelled queries, search engines enhance their understanding of user intent, thereby improving the relevance and quality of search results and overall user experience.
Color normalization
Color normalization is a topic in computer vision concerned with artificial color vision and object recognition. In general, the distribution of color values in an image depends on the illumination, which may vary depending on lighting conditions, cameras, and other factors. Color normalization allows for object recognition techniques based on color to compensate for these variations. == Main concepts == === Color constancy === Color constancy is a feature of the human internal model of perception, which provides humans with the ability to assign a relatively constant color to objects even under different illumination conditions. This is helpful for object recognition as well as identification of light sources in an environment. For example, humans see an object approximately as the same color when the sun is bright or when the sun is dim. === Applications === Color normalization has been used for object recognition on color images in the field of robotics, bioinformatics and general artificial intelligence, when it is important to remove all intensity values from the image while preserving color values. One example is in case of a scene shot by a surveillance camera over the day, where it is important to remove shadows or lighting changes on same color pixels and recognize the people that passed. Another example is automated screening tools used for the detection of diabetic retinopathy as well as molecular diagnosis of cancer states, where it is important to include color information during classification. == Known issues == The main issue about certain applications of color normalization is that the result looks unnatural or too distant from the original colors. In cases where there is a subtle variation between important aspects, this can be problematic. More specifically, the side effect can be that pixels become divergent and not reflect the actual color value of the image. A way of combating this issue is to use color normalization in combination with thresholding to correctly and consistently segment a colored image. == Transformations and algorithms == There is a vast array of different transformations and algorithms for achieving color normalization and a limited list is presented here. The performance of an algorithm is dependent on the task and one algorithm which performs better than another in one task might perform worse in another (no free lunch theorem). Additionally, the choice of the algorithm depends on the preferences of the user for the end-result, e.g. they may want a more natural-looking color image. === Grey world === The grey world normalization makes the assumption that changes in the lighting spectrum can be modelled by three constant factors applied to the red, green and blue channels of color. More specifically, a change in illuminated color can be modelled as a scaling α, β and γ in the R, G and B color channels and as such the grey world algorithm is invariant to illumination color variations. Therefore, a constancy solution can be achieved by dividing each color channel by its average value as shown in the following formula: ( α R , β G , γ B ) → ( α R α n ∑ i R , β G β n ∑ i G , γ B γ n ∑ i B ) {\displaystyle \left(\alpha R,\beta G,\gamma B\right)\rightarrow \left({\frac {\alpha R}{{\frac {\alpha }{n}}\sum _{i}R}},{\frac {\beta G}{{\frac {\beta }{n}}\sum _{i}G}},{\frac {\gamma B}{{\frac {\gamma }{n}}\sum _{i}B}}\right)} As mentioned above, grey world color normalization is invariant to illuminated color variations α, β and γ, however it has one important problem: it does not account for all variations of illumination intensity and it is not dynamic; when new objects appear in the scene it fails. To solve this problem there are several variants of the grey world algorithm. Additionally there is an iterative variation of the grey world normalization, however it was not found to perform significantly better. === Histogram equalization === Histogram equalization is a non-linear transform which maintains pixel rank and is capable of normalizing for any monotonically increasing color transform function. It is considered to be a more powerful normalization transformation than the grey world method. The results of histogram equalization tend to have an exaggerated blue channel and look unnatural, due to the fact that in most images the distribution of the pixel values is usually more similar to a Gaussian distribution, rather than uniform. === Histogram specification === Histogram specification transforms the red, green and blue histograms to match the shapes of three specific histograms, rather than simply equalizing them. It refers to a class of image transforms which aims to obtain images of which the histograms have a desired shape. As specified, firstly it is necessary to convert the image so that it has a particular histogram. Assume an image x. The following formula is the equalization transform of this image: y = f ( x ) = ∫ 0 x p x ( u ) d u {\displaystyle y=f(x)=\int \limits _{0}^{x}p_{x}(u)du} Then assume wanted image z. The equalization transform of this image is: y ′ = g ( z ) = ∫ 0 z p z ( u ) d u {\displaystyle y'=g(z)=\int \limits _{0}^{z}p_{z}(u)du} Of course p z ( u ) {\displaystyle p_{z}(u)} is the histogram of the output image. The formula to find the inverse of the above transform is: z = g − 1 ( y ′ ) {\displaystyle z=g^{-1}(y')} Therefore, since images y and y' have the same equalized histogram they are actually the same image, meaning y = y' and the transform from the given image x to the wanted image z is: z = g − 1 ( y ′ ) = g − 1 ( y ) = g − 1 ( f ( x ) ) {\displaystyle z=g^{-1}(y')=g^{-1}(y)=g^{-1}(f(x))} Histogram specification has the advantage of producing more realistic looking images, as it does not exaggerate the blue channel like histogram equalization. === Comprehensive Color Normalization === The comprehensive color normalization is shown to increase localization and object classification results in combination with color indexing. It is an iterative algorithm which works in two stages. The first stage is to use the red, green and blue color space with the intensity normalized, to normalize each pixel. The second stage is to normalize each color channel separately, so that the sum of the color components is equal to one third of the number of pixels. The iterations continue until convergence, meaning no additional changes. Formally: Normalize the color image f ( t ) = [ f i j ( t ) ] i = 1... N , j = 1... M {\displaystyle f^{(t)}=[f_{ij}^{(t)}]_{i=1...N,j=1...M}} which consists of color vectors f i j ( t ) = ( r i j ( t ) , g i j ( t ) , b i j ( t ) ) T . {\displaystyle f_{ij}^{(t)}=(r_{ij}^{(t)},g_{ij}^{(t)},b_{ij}^{(t)})^{T}.} For the first step explained above, compute: S i j := r i j ( t ) + g i j ( t ) + b i j ( t ) {\displaystyle S_{ij}:=r_{ij}^{(t)}+g_{ij}^{(t)}+b_{ij}^{(t)}} which leads to r i j ( t + 1 ) = r i j ( t ) S i j , g i j ( t + 1 ) = g i j ( t ) S i j {\displaystyle r_{ij}^{(t+1)}={\frac {r_{ij}^{(t)}}{S_{ij}}},g_{ij}^{(t+1)}={\frac {g_{ij}^{(t)}}{S_{ij}}}} and b i j ( t + 1 ) = b i j ( t ) S i j . {\displaystyle b_{ij}^{(t+1)}={\frac {b_{ij}^{(t)}}{S_{ij}}}.} For the second step explained above, compute: r ′ = 3 N M ∑ i = 1 N ∑ j = 1 M r i j ( t + 1 ) {\displaystyle r'={\frac {3}{NM}}\sum _{i=1}^{N}\sum _{j=1}^{M}r_{ij}^{(t+1)}} and normalize r i j ( t + 2 ) = r i j ( t + 1 ) r ′ . {\displaystyle r_{ij}^{(t+2)}={\frac {r_{ij}^{(t+1)}}{r'}}.} Of course the same process is done for b' and g'. Then these two steps are repeated until the changes between iteration t and t+2 are less than some set threshold. Comprehensive color normalization, just like the histogram equalization method previously mentioned, produces results that may look less natural due to the reduction in the number of color values.
Digital image
A digital image is an image composed of picture elements, also known as pixels, each with finite, discrete quantities of numeric representation for its intensity or gray level that is an output from its two-dimensional functions fed as input by its spatial coordinates denoted with x, y on the x-axis and y-axis, respectively. An image can be vector or raster type. By itself, the term "digital image" usually refers to raster images or bitmapped images (as opposed to vector images). == Raster == Raster images have a finite set of digital values, called picture elements or pixels. The digital image contains a fixed number of rows and columns of pixels. Pixels are the smallest individual element in an image, holding quantized values that represent the brightness of a given color at any specific point. Typically, the pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers. These values are often transmitted or stored in a compressed form. Raster images can be created by a variety of input devices and techniques, such as digital cameras, scanners, coordinate-measuring machines, seismographic profiling, airborne radar, and more. They can also be synthesized from arbitrary non-image data, such as mathematical functions or three-dimensional geometric models; the latter being a major sub-area of computer graphics. The field of digital image processing is the study of algorithms for their transformation. === Raster file formats === Most users come into contact with raster images through digital cameras, which use any of several image file formats. Some digital cameras give access to almost all the data captured by the camera, using a raw image format. The Universal Photographic Imaging Guidelines (UPDIG) suggests these formats be used when possible since raw files produce the best quality images. These file formats allow the photographer and the processing agent the greatest level of control and accuracy for output. Their use is inhibited by the prevalence of proprietary information (trade secrets) for some camera makers, but there have been initiatives such as OpenRAW to influence manufacturers to release these records publicly. An alternative may be Digital Negative (DNG), a proprietary Adobe product described as "the public, archival format for digital camera raw data". Although this format is not yet universally accepted, support for the product is growing, and increasingly professional archivists and conservationists, working for respectable organizations, variously suggest or recommend DNG for archival purposes. == Vector == Vector images resulted from mathematical geometry (vector). In mathematical terms, a vector consists of both a magnitude, or length, and a direction. Often, both raster and vector elements will be combined in one image; for example, in the case of a billboard with text (vector) and photographs (raster). Example of vector file types are EPS, PDF, and AI. == Image viewing == Image viewer software displayed on images. Web browsers can display standard internet images formats including JPEG, GIF and PNG. Some can show SVG format which is a standard W3C format. In the past, when the Internet was still slow, it was common to provide "preview" images that would load and appear on the website before being replaced by the main image (to give a preliminary impression). Now Internet is fast enough and this preview image is seldom used. Some scientific images can be very large (for instance, the 46 gigapixel size image of the Milky Way, about 194 GB in size). Such images are difficult to download and are usually browsed online through more complex web interfaces. Some viewers offer a slideshow utility to display a sequence of images. == History == Early digital fax machines such as the Bartlane cable picture transmission system preceded digital cameras and computers by decades. The first picture to be scanned, stored, and recreated in digital pixels was displayed on the Standards Eastern Automatic Computer (SEAC) at NIST. The advancement of digital imagery continued in the early 1960s, alongside development of the space program and in medical research. Projects at the Jet Propulsion Laboratory, MIT, Bell Labs and the University of Maryland, among others, used digital images to advance satellite imagery, wirephoto standards conversion, medical imaging, videophone technology, character recognition, and photo enhancement. Rapid advances in digital imaging began with the introduction of MOS integrated circuits in the 1960s and microprocessors in the early 1970s, alongside progress in related computer memory storage, display technologies, and data compression algorithms. The invention of computerized axial tomography (CAT scanning), using x-rays to produce a digital image of a "slice" through a three-dimensional object, was of great importance to medical diagnostics. As well as origination of digital images, digitization of analog images allowed the enhancement and restoration of archaeological artifacts and began to be used in fields as diverse as nuclear medicine, astronomy, law enforcement, defence and industry. Advances in microprocessor technology paved the way for the development and marketing of charge-coupled devices (CCDs) for use in a wide range of image capture devices and gradually displaced the use of analog film and tape in photography and videography towards the end of the 20th century. The computing power necessary to process digital image capture also allowed computer-generated digital images to achieve a level of refinement close to photorealism. === Digital image sensors === The first semiconductor image sensor was the CCD, developed by Willard S. Boyle and George E. Smith at Bell Labs in 1969. While researching MOS technology, they realized that an electric charge was the analogy of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. Early CCD sensors suffered from shutter lag. This was largely resolved with the invention of the pinned photodiode (PPD). It was invented by Nobukazu Teranishi, Hiromitsu Shiraki and Yasuo Ishihara at NEC in 1980. It was a photodetector structure with low lag, low noise, high quantum efficiency and low dark current. In 1987, the PPD began to be incorporated into most CCD devices, becoming a fixture in consumer electronic video cameras and then digital still cameras. Since then, the PPD has been used in nearly all CCD sensors and then CMOS sensors. The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. By 2007, sales of CMOS sensors had surpassed CCD sensors. === Digital image compression === An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. DCT compression is used in JPEG, which was introduced by the Joint Photographic Experts Group in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. == Mosaic == In digital imaging, a mosaic is a combination of non-overlapping images, arranged in some tessellation. Gigapixel images are an example of such digital image mosaics. Satellite imagery are often mosaicked to cover Earth regions. Interactive viewing is provided by virtual-reality photography.
Phase correlation
Phase correlation is an approach to estimate the relative translative offset between two similar images (digital image correlation) or other data sets. It is commonly used in image registration and relies on a frequency-domain representation of the data, usually calculated by fast Fourier transforms. The term is applied particularly to a subset of cross-correlation techniques that isolate the phase information from the Fourier-space representation of the cross-correlogram. == Example == The following image demonstrates the usage of phase correlation to determine relative translative movement between two images corrupted by independent Gaussian noise. The image was translated by (20,23) pixels. Accordingly, one can clearly see a peak in the phase-correlation representation at approximately (20,23). == Method == Given two input images g a {\displaystyle \ g_{a}} and g b {\displaystyle \ g_{b}} : Apply a window function (e.g., a Hamming window) on both images to reduce edge effects (this may be optional depending on the image characteristics). Then, calculate the discrete 2D Fourier transform of both images. G a = F { g a } , G b = F { g b } {\displaystyle \ \mathbf {G} _{a}={\mathcal {F}}\{g_{a}\},\;\mathbf {G} _{b}={\mathcal {F}}\{g_{b}\}} Calculate the cross-power spectrum by taking the complex conjugate of the second result, multiplying the Fourier transforms together elementwise, and normalizing this product elementwise. R = G a ∘ G b ∗ | G a ∘ G b ∗ | {\displaystyle \ R={\frac {\mathbf {G} _{a}\circ \mathbf {G} _{b}^{}}{|\mathbf {G} _{a}\circ \mathbf {G} _{b}^{}|}}} Where ∘ {\displaystyle \circ } is the Hadamard product (entry-wise product) and the absolute values are taken entry-wise as well. Written out entry-wise for element index ( j , k ) {\displaystyle (j,k)} : R j k = G a , j k ⋅ G b , j k ∗ | G a , j k ⋅ G b , j k ∗ | {\displaystyle \ R_{jk}={\frac {G_{a,jk}\cdot G_{b,jk}^{}}{|G_{a,jk}\cdot G_{b,jk}^{}|}}} Obtain the normalized cross-correlation by applying the inverse Fourier transform. r = F − 1 { R } {\displaystyle \ r={\mathcal {F}}^{-1}\{R\}} Determine the location of the peak in r {\displaystyle \ r} . ( Δ x , Δ y ) = arg max ( x , y ) { r } {\displaystyle \ (\Delta x,\Delta y)=\arg \max _{(x,y)}\{r\}} === Subpixel registration === Commonly, interpolation methods are used to estimate the peak location in the cross-correlogram to non-integer values, despite the fact that the data are discrete, and this procedure is often termed 'subpixel registration'. A large variety of subpixel interpolation methods are given in the technical literature. Common peak interpolation methods such as parabolic interpolation have been used, and the OpenCV computer vision package uses a centroid-based method, though these generally have inferior accuracy compared to more sophisticated methods. Because the Fourier representation of the data has already been computed, it is especially convenient to use the Fourier shift theorem with real-valued (sub-integer) shifts for this purpose, which essentially interpolates using the sinusoidal basis functions of the Fourier transform. An especially popular FT-based estimator is given by Foroosh et al. In this method, the subpixel peak location is approximated by a simple formula involving peak pixel value and the values of its nearest neighbors, where r ( 0 , 0 ) {\displaystyle r_{(0,0)}} is the peak value and r ( 1 , 0 ) {\displaystyle r_{(1,0)}} is the nearest neighbor in the x direction (assuming, as in most approaches, that the integer shift has already been found and the comparand images differ only by a subpixel shift). Δ x = r ( 1 , 0 ) r ( 1 , 0 ) ± r ( 0 , 0 ) {\displaystyle \ \Delta x={\frac {r_{(1,0)}}{r_{(1,0)}\pm r_{(0,0)}}}} The Foroosh et al. method is quite fast compared to most methods, though it is not always the most accurate. Some methods shift the peak in Fourier space and apply non-linear optimization to maximize the correlogram peak, but these tend to be very slow since they must apply an inverse Fourier transform or its equivalent in the objective function. It is also possible to infer the peak location from phase characteristics in Fourier space without the inverse transformation, as noted by Stone. These methods usually use a linear least squares (LLS) fit of the phase angles to a planar model. The long latency of the phase angle computation in these methods is a disadvantage, but the speed can sometimes be comparable to the Foroosh et al. method depending on the image size. They often compare favorably in speed to the multiple iterations of extremely slow objective functions in iterative non-linear methods. Since all subpixel shift computation methods are fundamentally interpolative, the performance of a particular method depends on how well the underlying data conform to the assumptions in the interpolator. This fact also may limit the usefulness of high numerical accuracy in an algorithm, since the uncertainty due to interpolation method choice may be larger than any numerical or approximation error in the particular method. Subpixel methods are also particularly sensitive to noise in the images, and the utility of a particular algorithm is distinguished not only by its speed and accuracy but its resilience to the particular types of noise in the application. == Rationale == The method is based on the Fourier shift theorem. Let the two images g a {\displaystyle \ g_{a}} and g b {\displaystyle \ g_{b}} be circularly-shifted versions of each other: g b ( x , y ) = d e f g a ( ( x − Δ x ) mod M , ( y − Δ y ) mod N ) {\displaystyle \ g_{b}(x,y)\ {\stackrel {\mathrm {def} }{=}}\ g_{a}((x-\Delta x){\bmod {M}},(y-\Delta y){\bmod {N}})} (where the images are M × N {\displaystyle \ M\times N} in size). Then, the discrete Fourier transforms of the images will be shifted relatively in phase: G b ( u , v ) = G a ( u , v ) e − 2 π i ( u Δ x M + v Δ y N ) {\displaystyle \mathbf {G} _{b}(u,v)=\mathbf {G} _{a}(u,v)e^{-2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}} One can then calculate the normalized cross-power spectrum to factor out the phase difference: R ( u , v ) = G a G b ∗ | G a G b ∗ | = G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | = G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | G a G a ∗ | = e 2 π i ( u Δ x M + v Δ y N ) {\displaystyle {\begin{aligned}R(u,v)&={\frac {\mathbf {G} _{a}\mathbf {G} _{b}^{}}{|\mathbf {G} _{a}\mathbf {G} _{b}^{}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{}|}}\\&=e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}\end{aligned}}} since the magnitude of an imaginary exponential always is one, and the phase of G a G a ∗ {\displaystyle \ \mathbf {G} _{a}\mathbf {G} _{a}^{}} always is zero. The inverse Fourier transform of a complex exponential is a Dirac delta function, i.e. a single peak: r ( x , y ) = δ ( x + Δ x , y + Δ y ) {\displaystyle \ r(x,y)=\delta (x+\Delta x,y+\Delta y)} This result could have been obtained by calculating the cross correlation directly. The advantage of this method is that the discrete Fourier transform and its inverse can be performed using the fast Fourier transform, which is much faster than correlation for large images. === Benefits === Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects typical of medical or satellite images. The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation. === Limitations === In practice, it is more likely that g b {\displaystyle \ g_{b}} will be a simple linear shift of g a {\displaystyle \ g_{a}} , rather than a circular shift as required by the explanation above. In such cases, r {\displaystyle \ r} will not be a simple delta function, which will reduce the performance of the method. In such cases, a window function (such as a Gaussian or Tukey window) should be employed during the Fourier transform to reduce edge effects, or the images should be zero padded so that the edge effects can be ignored. If the images consist of a flat background, with all detail situated away from the edges, then a linear shift will be equivalent to a circular shift, and the above derivation will hold exactly. The peak can be sharpened by using edge or vector correlation. For periodic images (such as a chessboard or picket fence), phase correlation may yield ambiguous results with several peaks in the resulting output. == Applications == Phase correlation is the preferred m