Greedy Algorithms
Real-life problems: mostly optimization problems.
If we assume, "local optimal -> global optimal" "Greedy".
Grabs data items in sequence, each time with BEST choice, without thinking Future.
-> BEST choice: local optimal choice (o), global optima(x)
-> In terms of viewpoint(intuition, heuristics(경험))
Example: MST(Minimum Spanning Tree)
Design
while NOT solved yet {
Step 1. Selection (local optimal choice, ranking)
Step 2. Feasibility (contraints, optional)
Step 3. Solution check (termination)
}
Greedy_MST (input: graph G=(V,E))
F=empty set
while NOT solved yet {
Select an edge 'e' (Local Optimal Choice) --- > ranking(scoring) function
if e creates NO cycle then add e to F
if T=(V,F) is a Spanning Tree, then stop/exit
}
---> output: MST T=(V,F) F is a subset of E
---- > how to select 'e'? : Prim's alg, Kruskal's alg.
Prim's Algorithm : node(vertex)-based approach(ranking)
F=empty set
Y=(v1}
while NOT solved yet {
select v in V-Y nearest to Y; // selection
feasibility test (cycle?) // 불필요
add v to Y and add corresponding edge to F
if V == Y, then stop/exit
}
Prim's Algorithm은 항상 트리 외부의 정점을 트리에 포함시키는 방식으로 동작하기 때문에, 사이클이 생길 수 없다
(커버하지 않았던 노드로 하나를 연결시키기 때문에 사이클이 만들어질 수 없다)

<Analysis>
1. Time complexity T(|V|)=T(n) ~= O(n(n-1)) ~= O(n^2)
2 . Proof required (반드시 proof 되어야 하는 것은 아님)
Prim's Theorem: "Prim's alg produces MST"
Proof: .......... by induction. (greedy approach, incremental greedy algorithm)
(Base) F=empty is promising. (MST)
(Hypothesis) Assume: F is promising라고 할 때, (MST T=(Y,F))
(Induction) F + next edge (그 다음 생성되는 MST) is MST.
Proof on Induction is required -> Lemma.
(technique: proof by contradiction)
Contradiction : NOT F + next edge( 그 다음 생성되는 MST ) is MST.
T=(V,F) ---> next tree: prim's alg. T(prim)=(Y+v, F+e) MST (X)
--> there exists another optimal MST: T(opt)=(Y+v', F+e') MST O
T(opt = MST)이 T(prim)보다 총 weight 합이 작다
---> But, weight(T(opt)) > weight(T(Prim)) ... e' > e ---> conflict
weight(T(opt)) > weight (T(prim))
: e’이 더 작았다면 프림이 e’ 엣지를 선택해 왔을 것, 프림이 e를 선택했다는 것은
남아있는 outgoing edge 중에 e가 가장 작았던 것
Proof
Theorm: Prim’s algorithm produces MST
Base: F = {} is promising (MST)
Hypothesis: F is promising이라고 할 때,
Induction: F+next_edge is promising(MST)
Proof by contradiction
F+next_edge가 MST가 아니라고 가정,
Weight(T(prim)) > Weight(T(X))를 만족하는 x가 존재한다는 것
이때, T(prim) = (Y+v, F+e), T(Y+v’, F+e’), e > e’인데,
Prim’s algorithm은 남아있는 outgoing edge 중에서 최소인 edge를 선택하므로 모순이다
따라서 Prim’s alg는 MST를 만들어내는 알고리즘이다.
Kruskal's Algorithm
Edge 집합 e를 sorting한 다음에 sorted된 edege를 중심으로 MST를 구성해 나가자는 것
Question: how to check 'Cycle' in Kruskal?
- set of tree: Forest data structure
- two key operations in Forest: find and union.
- find: root finding in a tree
- union : connect two roots
----> find each root of i and j.
--------> see/check if the two roots are the same
===> IF YES, cycle O
===> IF NO, cycle X, add 'e' -> repeat ...
Kruskal_MST(input G) : edge-based approach
1. F <- empty
2. Create 'n' disjoint sets: (v1), (v2), .., (vn). // init
3. Sort Edge set 'E'
4. while NOT solved yet {
select next edge 'e': from sorted list
IF (cycle check == NO)
// YES => remove/pass
'e' connects two subsets, then, merge them and add e to F.
IF (stop condition == YES) stop
}
CYCLE CHECK
e=next edge: (i, j)
p=find(i)
q=find(j)
IF p != q --> connect/merge,
add e to F.
find(j) != find(k)

Analysis:
T(n) = ?, |V|=n, [E|=m
1. init: theta(n)
2. sort: depending on the specific alg: O(mlogm)
3. while loop: O(m * alpha(n,m)) -> O(mlogn)
: find-union operations에 대한 연구들...
4. remaining ops.
---> Kruskal algo. T(n,m)=O(mlogm)
--- > # edges: n-1 <= m <= n(n-1)/2
일반적으로 vertex(n)보다 edge(m)의 개수가 더 많다
크루스칼 알고리즘에서 sorting이 가장 큰 complexity를 차지한다
Prim's alg vs. Kruskal's alg.
: proof done.
: time complexity? O(n^2) vs O(mlogm)
---> more research is required
---> advance O(n^2)
Dijkstra algorithm for SSSP
1) In terms of RELAX function/concept
2) In terms of MST(Prim's algorithm)
Greedy algorithm:
1) optimal substructure
2) local optima -> global optima. ==> Greedy(fast) > DP
Dijkstra ~~ similar to Prim's algorithm
while (NOT solved yet) {
select v from V-Y nearest from v1(source/start)
using only Y as intermediates(relax);
}